git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Pinning dependencies for Apache Airflow


pip-tools can definitely help here to ship a reference [locked]
`requirements.txt` that can be used in [all or part of] the CI. It's
actually kind of important to get CI to fail when a new [backward
incompatible] lib comes out and break things while allowing version ranges.

I think there may be challenges around pip-tools and projects that run in
both python2.7 and python3.6. You sometimes need to have 2 requirements.txt
lock files.

Max

On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
wrote:

> It's a nice one :). However I think when/if we go to pinned dependencies
> the way poetry/pip-tools do it, this will be suddenly lot-less useful It
> will be very easy to track dependency changes (they will be always
> committed as a change in the .lock file or requirements.txt) and if someone
> has a problem while upgrading a dependency (always consciously, never
> accidentally) it will simply fail during CI build and the change won't get
> merged/won't break the builds of others in the first place :).
>
> J.
>
> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd.deng.r@xxxxxxxxx> wrote:
>
> > Hi folks,
> >
> > On top of this discussion, I was thinking we should have the ability to
> > quickly monitor dependency release as well. Previously, it happened for a
> > few times that CI kept failing for no reason and eventually turned out it
> > was due to dependency release. But it took us some time, sometimes a few
> > days, to realise the failure was because of dependency release.
> >
> > To partially address this, I tried to develop a mini tool to help us
> check
> > the latest release of Python packages & the release date-time on PyPi.
> So,
> > by comparing it with our CI failure history, we may be able to
> troubleshoot
> > faster.
> >
> > Output Sample (ordered by upload time in desc order):
> >                                Latest Version          Upload Time
> > Package Name
> > awscli                    1.16.28
> 2018-10-05T23:12:45
> > botocore                1.12.18                      2018-10-05T23:12:39
> > promise                   2.2.1
> 2018-10-04T22:04:18
> > Keras                     2.2.4
>  2018-10-03T20:59:39
> > bleach                    3.0.0
> 2018-10-03T16:54:27
> > Flask-AppBuilder         1.12.0                2018-10-03T09:03:48
> > ... ...
> >
> > It's a minimal tool (not perfect yet but working). I have hosted this
> tool
> > at https://github.com/XD-DENG/pypi-release-query.
> >
> >
> > XD
> >
> > On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
> > wrote:
> >
> > > Hello Erik,
> > >
> > > I understand your concern. It's a hard one to solve in general (i.e.
> > > dependency-hell). It looks like in this case you treat Airflow as
> > > 'library', where for some other people it might be more like 'end
> > product'.
> > > If you look at the "pinning" philosophy - the "pin everything" is good
> > for
> > > end products, but not good for libraries. In the case you have Airflow
> is
> > > treated as a bit of both. And it's perfectly valid case at that (with
> > > custom python DAGs being central concept for Airflow).
> > > However, I think it's not as bad as you think when it comes to exact
> > > pinning.
> > >
> > > I believe - a bit counter-intuitively - that tools like
> pip-tools/poetry
> > > with exact pinning result in having your dependencies upgraded more
> > often,
> > > rather than less - especially in complex systems where dependency-hell
> > > creeps-in. If you look at Airflow's setup.py now - It's a bit scary to
> > make
> > > any change to it. There is a chance it will blow at your face if you
> > change
> > > it. You never know why there is 0.3 < ver < 1.0 - and if you change it,
> > > whether it will cause chain reaction of conflicts that will ruin your
> > work
> > > day.
> > >
> > > On the contrary - if you change it to exact pinning in
> > > .lock/requirements.txt file (poetry/pip-tools) and have much simpler
> (and
> > > commented) exclusion/avoidance rules in your .in/.tml file, the whole
> > setup
> > > might be much easier to maintain and upgrade. Every time you prepare
> for
> > > release (or even once in a while for master) one person might
> consciously
> > > attempt to upgrade all dependencies to latest ones. It should be almost
> > as
> > > easy as letting poetry/pip-tools help with figuring out what are the
> > latest
> > > set of dependencies that will work without conflicts. It should be
> rather
> > > straightforward (I've done it in the past for fairly complex systems).
> > What
> > > those tools enable is - doing single-shot upgrade of all dependencies.
> > > After doing it you can make sure that all tests work fine (and fix any
> > > problems that result from it). And then you test it thoroughly before
> you
> > > make final release. You can do it in separate PR - with automated
> testing
> > > in Travis which means that you are not disturbing work of others
> > > (compilation/building + unit tests are guaranteed to work before you
> > merge
> > > it) while doing it. It's all conscious rather than accidental. Nice
> side
> > > effect of that is that with every release you can actually "catch-up"
> > with
> > > latest stable versions of many libraries in one go. It's better than
> > > waiting until someone deliberately upgrades to newer version (and the
> > rest
> > > remain terribly out-dated as is the case for Airflow now).
> > >
> > > So a bit counterintuitively I think tools like pip-tools/poetry help
> you
> > to
> > > catch up faster in many cases. That is at least my experience so far.
> > >
> > > Additionally, Airflow is an open system - if you have very specific
> needs
> > > for requirements, you might actually - in the very same way with
> > > pip-tools/poetry - upgrade all your dependencies in your local fork of
> > > Airflow before someone else does it in master/release. Those tools kind
> > of
> > > democratise dependency management. It should be as easy as `pip-compile
> > > --upgrade` or `poetry update` and you will get all the
> "non-conflicting"
> > > latest dependencies in your local fork (and poetry especially seems to
> do
> > > all the heavy lifting of figuring out which versions will work). You
> > should
> > > be able to test and publish it locally as your private package for
> local
> > > installations. You can even mark the specific dependency you want to
> use
> > > specific version and let pip-tools/poetry figure out exact versions of
> > > other requirements. You can even make a PR with such upgrade eventually
> > to
> > > get it faster in master. You can even downgrade in case newer
> dependency
> > > causes problems for you in similar way. Guided by the tools, it's much
> > > faster than figuring the versions out by yourself.
> > >
> > > As long as we have simple way of managing it and document how to
> > > upgrade/downgrade dependencies in your own fork, and mention how to
> > locally
> > > release Airflow as a package, I think your case could be covered even
> > > better than now. What do you think ?
> > >
> > > J.
> > >
> > > On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > <EKC@xxxxxxxxxxxxx.invalid> wrote:
> > >
> > > > For us, exact pinning of versions would be problematic. We have DAG
> > code
> > > > that shares direct and indirect dependencies with Airflow, e.g. lxml,
> > > > requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our
> > DAG
> > > > code for some reason needs a newer point release due to a bug that's
> > > fixed,
> > > > then we can't cleanly build a virtual environment containing the
> fixed
> > > > version. For us, it's already a problem that Airflow has quite strict
> > > (and
> > > > sometimes old) requirements in setup.py.
> > > >
> > > > Erik
> > > > ________________________________
> > > > From: Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
> > > > Sent: Friday, October 5, 2018 2:01:15 PM
> > > > To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > >
> > > > I think one solution to release approach is to check as part of
> > automated
> > > > Travis build if all requirements are pinned with == (even the deep
> > ones)
> > > > and fail the build in case they are not for ALL versions (including
> > > > dev). And of course we should document the approach of
> > releases/upgrades
> > > > etc. If we do it all the time for development versions (which seems
> > quite
> > > > doable), then transitively all the releases will also have pinned
> > > versions
> > > > and they will never try to upgrade any of the dependencies. In poetry
> > > > (similarly in pip-tools with .in file) it is done by having a .lock
> > file
> > > > that specifies exact versions of each package so it can be rather
> easy
> > to
> > > > manage (so it's worth trying it out I think  :D  - seems a bit more
> > > > friendly than pip-tools).
> > > >
> > > > There is a drawback - of course - with manually updating the module
> > that
> > > > you want, but I really see that as an advantage rather than drawback
> > > > especially for users. This way you maintain the property that it will
> > > > always install and work the same way no matter if you installed it
> > today
> > > or
> > > > two months ago. I think the biggest drawback for maintainers is that
> > you
> > > > need some kind of monitoring of security vulnerabilities and cannot
> > rely
> > > on
> > > > automated security upgrades. With >= requirements those security
> > updates
> > > > might happen automatically without anyone noticing, but to be honest
> I
> > > > don't think such upgrades are guaranteed even in current setup for
> all
> > > > security issues for all libraries anyway.
> > > >
> > > > Finding the need to upgrade because of security issues can be quite
> > > > automated. Even now I noticed Github started to inform owners about
> > > > potential security vulnerabilities in used libraries for their
> project.
> > > > Those notifications can be sent to devlist and turned into JIRA
> issues
> > > > followed bvy  minor security-related releases (with only few library
> > > > dependencies upgraded).
> > > >
> > > > I think it's even easier to automate it if you have pinned
> > dependencies -
> > > > because it's generally easy to find applicable vulnerabilities for
> > > specific
> > > > versions of libraries by static analysers - when you have >=, you
> never
> > > > know which version will be used until you actually perform the
> > > > installation.
> > > >
> > > > There is one big advantage for maintainers for "pinned" case. Your
> > users
> > > > always have the same dependencies - so when issue is raised, you can
> > > > reproduce it more easily. It's hard to know which version user has
> (as
> > > the
> > > > user could install it month ago or yesterday) and even if you find
> out
> > by
> > > > asking the user, you might not be able to reproduce the set of
> > > requirements
> > > > easily (simply because there are already newer versions of the
> > libraries
> > > > released and they are used automatically). You can ask the user to
> run
> > > pip
> > > > --upgrade but that's dangerous and pretty lame ("check the latest
> > > version -
> > > > maybe it fixes your problem ? ") and sometimes not possible (e.g.
> > someone
> > > > has pre-built docker image with dependencies from few months ago and
> > > cannot
> > > > rebuild the image easily).
> > > >
> > > > J.
> > > >
> > > > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <ash@xxxxxxxxxx>
> > > wrote:
> > > >
> > > > > One thing to point out here.
> > > > >
> > > > > Right now if you `pip install apache-airflow=1.10.0` in a clean
> > > > > environment it will fail.
> > > > >
> > > > > This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> > >=
> > > > > 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > >
> > > > > So I do think there is maybe something to be said about pinning for
> > > > > releases. The down side to that is that if there are updates to a
> > > module
> > > > > that we want then we have to make a point release to let people get
> > it
> > > > >
> > > > > Both methods have draw-backs
> > > > >
> > > > > -ash
> > > > >
> > > > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> arthur.wiedmer@xxxxxxxxx>
> > > > > wrote:
> > > > > >
> > > > > > Hi Jarek,
> > > > > >
> > > > > > I will +1 the discussion Dan is referring to and George's advice.
> > > > > >
> > > > > > I just want to double check we are talking about pinning in
> > > > > > requirements.txt only.
> > > > > >
> > > > > > This offers the ability to
> > > > > > pip install -r requirements.txt
> > > > > > pip install --no-deps airflow
> > > > > > For a guaranteed install which works.
> > > > > >
> > > > > > Several different requirement files can be provided for specific
> > use
> > > > > cases,
> > > > > > like a stable dev one for instance for people wanting to work on
> > > > > operators
> > > > > > and non-core functions.
> > > > > >
> > > > > > However, I think we should proactively test in CI against
> unpinned
> > > > > > dependencies (though it might be a separate case in the matrix) ,
> > so
> > > > that
> > > > > > we get advance warning if possible that things will break.
> > > > > > CI downtime is not a bad thing here, it actually caught a problem
> > :)
> > > > > >
> > > > > > We should unpin as possible in setup.py to only maintain minimum
> > > > required
> > > > > > compatibility. The process of pinning in setup.py is extremely
> > > > > detrimental
> > > > > > when you have a large number of python libraries installed with
> > > > different
> > > > > > pinned versions.
> > > > > >
> > > > > > Best,
> > > > > > Arthur
> > > > > >
> > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > <ddavydov@xxxxxxxxxxx.invalid
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Relevant discussion about this:
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-airflow%2Fpull%2F1809%23issuecomment-257502174&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=MM%2FoNwkPYR8UtBUczXLfZD2lCp7Ig%2BI%2FL2rFszcoJi8%3D&amp;reserved=0
> > > > > >>
> > > > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > Jarek.Potiuk@xxxxxxxxxxx>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> TL;DR; A change is coming in the way how
> > dependencies/requirements
> > > > are
> > > > > >>> specified for Apache Airflow - they will be fixed rather than
> > > > flexible
> > > > > >> (==
> > > > > >>> rather than >=).
> > > > > >>>
> > > > > >>> This is follow up after Slack discussion we had with Ash and
> > Kaxil
> > > -
> > > > > >>> summarising what we propose we'll do.
> > > > > >>>
> > > > > >>> *Problem:*
> > > > > >>> During last few weeks we experienced quite a few downtimes of
> > > > TravisCI
> > > > > >>> builds (for all PRs/branches including master) as some of the
> > > > > transitive
> > > > > >>> dependencies were automatically upgraded. This because in a
> > number
> > > of
> > > > > >>> dependencies we have  >= rather than == dependencies.
> > > > > >>>
> > > > > >>> Whenever there is a new release of such dependency, it might
> > cause
> > > > > chain
> > > > > >>> reaction with upgrade of transitive dependencies which might
> get
> > > into
> > > > > >>> conflict.
> > > > > >>>
> > > > > >>> An example was Flask-AppBuilder vs flask-login transitive
> > > dependency
> > > > > with
> > > > > >>> click. They started to conflict once AppBuilder has released
> > > version
> > > > > >>> 1.12.0.
> > > > > >>>
> > > > > >>> *Diagnosis:*
> > > > > >>> Transitive dependencies with "flexible" versions (where >= is
> > used
> > > > > >> instead
> > > > > >>> of ==) is a reason for "dependency hell". We will sooner or
> later
> > > hit
> > > > > >> other
> > > > > >>> cases where not fixed dependencies cause similar problems with
> > > other
> > > > > >>> transitive dependencies. We need to fix-pin them. This causes
> > > > problems
> > > > > >> for
> > > > > >>> both - released versions (cause they stop to work!) and for
> > > > development
> > > > > >>> (cause they break master builds in TravisCI and prevent people
> > from
> > > > > >>> installing development environment from the scratch.
> > > > > >>>
> > > > > >>> *Solution:*
> > > > > >>>
> > > > > >>>   - Following the old-but-good post
> > > > > >>>
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fnvie.com%2Fposts%2Fpin-your-packages%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=PVE3S4mgki7L%2BcAe104o2cf68wRXolvYXRFmAyiX8gA%3D&amp;reserved=0
> > > > we are going to fix the
> > > > > >>> pinned
> > > > > >>>   dependencies to specific versions (so basically all
> > dependencies
> > > > are
> > > > > >>>   "fixed").
> > > > > >>>   - We will introduce mechanism to be able to upgrade
> > dependencies
> > > > with
> > > > > >>>   pip-tools (
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjazzband%2Fpip-tools&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=Kt9CjWrolvpjp7MwIR2nn8EIf9CW9HW02U7GVGyOXMo%3D&amp;reserved=0
> > > ).
> > > > We might also
> > > > > >> take a
> > > > > >>>   look at pipenv:
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpipenv.readthedocs.io%2Fen%2Flatest%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=1tiY6pgX3IbRYC5W0HKr0ER2qMZ3GKYrwmWg%2BUo0tqs%3D&amp;reserved=0
> > > > > >>>   - People who would like to upgrade some dependencies for
> their
> > > PRs
> > > > > >> will
> > > > > >>>   still be able to do it - but such upgrades will be in their
> PR
> > > thus
> > > > > >> they
> > > > > >>>   will go through TravisCI tests and they will also have to be
> > > > > specified
> > > > > >>> with
> > > > > >>>   pinned fixed versions (==). This should be part of review
> > process
> > > > to
> > > > > >>> make
> > > > > >>>   sure new/changed requirements are pinned.
> > > > > >>>   - In release process there will be a point where an upgrade
> > will
> > > be
> > > > > >>>   attempted for all requirements (using pip-tools) so that we
> are
> > > not
> > > > > >>> stuck
> > > > > >>>   with older releases. This will be in controlled PR
> environment
> > > > where
> > > > > >>> there
> > > > > >>>   will be time to fix all dependencies without impacting others
> > and
> > > > > >> likely
> > > > > >>>   enough time to "vet" such changes (this can be done for
> > > alpha/beta
> > > > > >>> releases
> > > > > >>>   for example).
> > > > > >>>   - As a side effect dependencies specification will become far
> > > > simpler
> > > > > >>>   and straightforward.
> > > > > >>>
> > > > > >>> Happy to hear community comments to the proposal. I am happy to
> > > take
> > > > a
> > > > > >> lead
> > > > > >>> on that, open JIRA issue and implement if this is something
> > > community
> > > > > is
> > > > > >>> happy with.
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > >>> --
> > > > > >>>
> > > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>> Mobile: +48 660 796 129
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > > > --
> > > >
> > > > *Jarek Potiuk, Principal Software Engineer*
> > > > Mobile: +48 660 796 129
> > > >
> > >
> > >
> > > --
> > >
> > > *Jarek Potiuk, Principal Software Engineer*
> > > Mobile: +48 660 796 129
> > >
> >
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>