[infra] fetch-zuul-cloner and permissions (was: redefining devstack)
On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote:
> I have been trying to limit this behaviour for nearly 4 years 
> (it can actually add 10-15 mins sometimes depending on what source trees
> I have mounted via NFS into a devstack VM when doing dev)
>  - https://review.opendev.org/#/c/203698
Similar, I suppose, though the problem raised in this subthread is
not actually the mass permission change itself but rather the
resulting permissions. In particular, the fetch-zuul-cloner role
makes the entire set of provided repositories world-writable because
the zuul-cloner v2 compatibility shim performs clones from those
file paths, and Git wants to hardlink the objects when they're
cloned within the same filesystem. The loose permissions are
necessary to support cases where the original copies aren't owned by
the user running the zuul-cloner shim, since you can't hardlink
files for which your account lacks write access.
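To make the hardlink mechanics concrete, here's a minimal Python
sketch (illustrative only, not zuul-cloner code): a hardlink is just a
second directory entry for the same inode, which is how Git avoids
copying object files during a same-filesystem clone. On modern Linux
with fs.protected_hardlinks=1, the os.link() call below would fail
with EPERM if the source file belonged to another user and weren't
writable by you, which is the failure mode the chmod works around.

```python
import os
import tempfile

# Illustrative sketch, not zuul-cloner itself: demonstrate that a
# hardlink (what a local `git clone` creates per object file) shares
# one inode with the original rather than copying it.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "source-object")
    with open(original, "w") as f:
        f.write("packed object data")

    clone = os.path.join(tmp, "cloned-object")
    # Analogous to what `git clone` does within one filesystem; with
    # fs.protected_hardlinks=1 this raises PermissionError when the
    # source is another user's non-writable file.
    os.link(original, clone)

    # Both names now point at the same inode, with a link count of 2.
    same_inode = os.stat(original).st_ino == os.stat(clone).st_ino
    link_count = os.stat(original).st_nlink

print(same_inode, link_count)
```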
I've done a bit of digging into the history of this now, so the
following is probably boring to the majority of you. If you want to
help figure out why it's still there at the moment and what's left
to do, read on...
Change https://review.openstack.org/512285, which added the chmod
task, includes a rather prescient comment from Paul about not adding
it to the mirror-workspace-git-repos role because "we might not want
to chmod 777 on no-legacy jobs." Unfortunately, I think we failed to
realize it already would, because we had added fetch-zuul-cloner to
our base job a month earlier in https://review.openstack.org/501843
for reasons which are not recorded in the change (presumably a
pragmatic compromise related to the scramble to convert our v2 jobs
at the time; I haven't yet resorted to digging through IRC history).
Soon after, we added fetch-zuul-cloner to the main "legacy" pre
playbook with https://review.opendev.org/513067 and prepared to test
its removal from the base job with https://review.opendev.org/513079,
but that work was never completed and I can't find the results of
the testing (or any indication it was ever actually performed).
At this point, I feel we probably just need to re-propose an
equivalent of 513079 in our base-jobs repository, exercise it with
some DNM changes running a mix of legacy imported v2 and modern v3
native jobs, announce a flag day for the cutover, and try to help
address whatever fallout we're unable to predict ahead of time. This
is somewhat complicated by the need to also do something similar in
https://review.opendev.org/656195 with the bindep "fallback"
packages list, so we'll need to decide how those two efforts will be
sequenced, or whether to combine them into a single (and likely
doubly painful) event.