[infra] fetch-zuul-cloner and permissions (was: redefining devstack)
On Tue, Jun 04, 2019 at 05:32:41PM +0000, Jeremy Stanley wrote:
> On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote:
> > I have been trying to limit this behaviour for nearly 4 years 
> > (it can actually add 10-15 mins sometimes depending on what source trees
> > I have mounted via NFS into a devstack VM when doing dev)
> >  - https://review.opendev.org/#/c/203698
> Similar I suppose, though the problem mentioned in this subthread is
> actually not about the mass permission change itself, rather about
> the resulting permissions. In particular the fetch-zuul-cloner role
> makes the entire set of provided repositories world-writeable
> because the zuul-cloner v2 compatibility shim performs clones from
> those file paths and Git wants to hardlink them if they're being
> cloned within the same filesystem. This is necessary to support
> occasions where the original copies aren't owned by the same user
> running the zuul-cloner shim, since you can't hardlink files for
> which your account lacks write access.
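
As a side note on why the resulting permissions matter: a hardlink shares the same inode as the original, so mode bits changed on one path show up on every link. Below is a minimal Python sketch of that behaviour, not the role's actual implementation. The file names and helper functions are hypothetical, and 0o666 stands in for whatever mode the recursive chmod applies.

```python
import os
import stat
import tempfile


def is_world_writable(path):
    """Return True if the 'other' write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


def share_inode(a, b):
    """Return True if a and b are hardlinks to the same underlying file."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo():
    # Hypothetical stand-ins for an object in the source repository and
    # the hardlinked copy a same-filesystem clone would create.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "object")
        dst = os.path.join(tmp, "object.link")
        with open(src, "w") as f:
            f.write("pretend this is a Git object\n")
        os.chmod(src, 0o644)
        before = is_world_writable(src)

        # Hardlink within the same filesystem, as a local clone would.
        os.link(src, dst)
        linked = share_inode(src, dst)

        # Assumption: this approximates the effect of the chmod task.
        os.chmod(src, 0o666)
        after = is_world_writable(dst)
        return before, linked, after
```

Calling `demo()` reports whether the file was world-writable before the chmod, whether the two paths share an inode, and whether the link is world-writable afterwards.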
> I've done a bit of digging into the history of this now, so the
> following is probably boring to the majority of you. If you want to
> help figure out why it's still there at the moment and what's left
> to do, read on...
> Change https://review.openstack.org/512285 which added the chmod
> task includes a rather prescient comment from Paul about not adding
> it to the mirror-workspace-git-repos role because "we might not want
> to chmod 777 on no-legacy jobs." Unfortunately I think we failed to
> realize that it already would because we had added fetch-zuul-cloner
> to our base job a month earlier in
> https://review.openstack.org/501843 for reasons which are not
> recorded in the change (presumably a pragmatic compromise related to
> the scramble to convert our v2 jobs at the time; I have not resorted
> to digging in IRC history just yet). Soon after, we added
> fetch-zuul-cloner to the main "legacy" pre playbook with
> https://review.opendev.org/513067 and prepared to test its removal
> from the base job with https://review.opendev.org/513079 but that
> was never completed and I can't seem to find the results of the
> testing (or even any indication it was ever actually performed).
Testing was done; you can see that in
https://review.opendev.org/513506/. However, the issue at the time was
that projects using tools/tox_install.sh would break (I have no idea
if that is still the case).
For anyone interested,
https://etherpad.openstack.org/p/zuulv3-remove-zuul-cloner was the
etherpad used to capture this work.
Eventually I abandoned the patch because I wasn't able to keep pushing
on it.
> At this point, I feel like we probably just need to re-propose an
> equivalent of 513079 in our base-jobs repository, exercise it with
> some DNM changes running a mix of legacy imported v2 and modern v3
> native jobs, announce a flag day for the cut over, and try to help
> address whatever fallout we're unable to predict ahead of time. This
> is somewhat complicated by the need to also do something similar
> in https://review.opendev.org/656195 with the bindep "fallback"
> packages list, so we're going to need to decide how those two
> efforts will be sequenced, or whether we want to combine them into a
> single (and likely doubly-painful) event.
> Jeremy Stanley