git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Pinning dependencies for Apache Airflow


For us, exact pinning of versions would be problematic. We have DAG code that shares direct and indirect dependencies with Airflow, e.g. lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG code for some reason needs a newer point release due to a bug that's fixed, then we can't cleanly build a virtual environment containing the fixed version. For us, it's already a problem that Airflow has quite strict (and sometimes old) requirements in setup.py.

Erik
________________________________
From: Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
Sent: Friday, October 5, 2018 2:01:15 PM
To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Pinning dependencies for Apache Airflow

I think one solution to release approach is to check as part of automated
Travis build if all requirements are pinned with == (even the deep ones)
and fail the build in case they are not for ALL versions (including
dev). And of course we should document the approach of releases/upgrades
etc. If we do it all the time for development versions (which seems quite
doable), then transitively all the releases will also have pinned versions
and they will never try to upgrade any of the dependencies. In poetry
(similarly in pip-tools with .in file) it is done by having a .lock file
that specifies exact versions of each package so it can be rather easy to
manage (so it's worth trying it out I think  :D  - seems a bit more
friendly than pip-tools).

There is a drawback - of course - with manually updating the module that
you want, but I really see that as an advantage rather than drawback
especially for users. This way you maintain the property that it will
always install and work the same way no matter if you installed it today or
two months ago. I think the biggest drawback for maintainers is that you
need some kind of monitoring of security vulnerabilities and cannot rely on
automated security upgrades. With >= requirements those security updates
might happen automatically without anyone noticing, but to be honest I
don't think such upgrades are guaranteed even in current setup for all
security issues for all libraries anyway.

Finding the need to upgrade because of security issues can be quite
automated. Even now I noticed Github started to inform owners about
potential security vulnerabilities in used libraries for their project.
Those notifications can be sent to devlist and turned into JIRA issues
followed bvy  minor security-related releases (with only few library
dependencies upgraded).

I think it's even easier to automate it if you have pinned dependencies -
because it's generally easy to find applicable vulnerabilities for specific
versions of libraries by static analysers - when you have >=, you never
know which version will be used until you actually perform the
installation.

There is one big advantage for maintainers for "pinned" case. Your users
always have the same dependencies - so when issue is raised, you can
reproduce it more easily. It's hard to know which version user has (as the
user could install it month ago or yesterday) and even if you find out by
asking the user, you might not be able to reproduce the set of requirements
easily (simply because there are already newer versions of the libraries
released and they are used automatically). You can ask the user to run pip
--upgrade but that's dangerous and pretty lame ("check the latest version -
maybe it fixes your problem ? ") and sometimes not possible (e.g. someone
has pre-built docker image with dependencies from few months ago and cannot
rebuild the image easily).

J.

On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <ash@xxxxxxxxxx> wrote:

> One thing to point out here.
>
> Right now if you `pip install apache-airflow=1.10.0` in a clean
> environment it will fail.
>
> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >=
> 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
>
> So I do think there is maybe something to be said about pinning for
> releases. The down side to that is that if there are updates to a module
> that we want then we have to make a point release to let people get it
>
> Both methods have draw-backs
>
> -ash
>
> > On 4 Oct 2018, at 17:13, Arthur Wiedmer <arthur.wiedmer@xxxxxxxxx>
> wrote:
> >
> > Hi Jarek,
> >
> > I will +1 the discussion Dan is referring to and George's advice.
> >
> > I just want to double check we are talking about pinning in
> > requirements.txt only.
> >
> > This offers the ability to
> > pip install -r requirements.txt
> > pip install --no-deps airflow
> > For a guaranteed install which works.
> >
> > Several different requirement files can be provided for specific use
> cases,
> > like a stable dev one for instance for people wanting to work on
> operators
> > and non-core functions.
> >
> > However, I think we should proactively test in CI against unpinned
> > dependencies (though it might be a separate case in the matrix) , so that
> > we get advance warning if possible that things will break.
> > CI downtime is not a bad thing here, it actually caught a problem :)
> >
> > We should unpin as possible in setup.py to only maintain minimum required
> > compatibility. The process of pinning in setup.py is extremely
> detrimental
> > when you have a large number of python libraries installed with different
> > pinned versions.
> >
> > Best,
> > Arthur
> >
> > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <ddavydov@xxxxxxxxxxx.invalid
> >
> > wrote:
> >
> >> Relevant discussion about this:
> >>
> >>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-airflow%2Fpull%2F1809%23issuecomment-257502174&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=MM%2FoNwkPYR8UtBUczXLfZD2lCp7Ig%2BI%2FL2rFszcoJi8%3D&amp;reserved=0
> >>
> >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
> >> wrote:
> >>
> >>> TL;DR; A change is coming in the way how dependencies/requirements are
> >>> specified for Apache Airflow - they will be fixed rather than flexible
> >> (==
> >>> rather than >=).
> >>>
> >>> This is follow up after Slack discussion we had with Ash and Kaxil -
> >>> summarising what we propose we'll do.
> >>>
> >>> *Problem:*
> >>> During last few weeks we experienced quite a few downtimes of TravisCI
> >>> builds (for all PRs/branches including master) as some of the
> transitive
> >>> dependencies were automatically upgraded. This because in a number of
> >>> dependencies we have  >= rather than == dependencies.
> >>>
> >>> Whenever there is a new release of such dependency, it might cause
> chain
> >>> reaction with upgrade of transitive dependencies which might get into
> >>> conflict.
> >>>
> >>> An example was Flask-AppBuilder vs flask-login transitive dependency
> with
> >>> click. They started to conflict once AppBuilder has released version
> >>> 1.12.0.
> >>>
> >>> *Diagnosis:*
> >>> Transitive dependencies with "flexible" versions (where >= is used
> >> instead
> >>> of ==) is a reason for "dependency hell". We will sooner or later hit
> >> other
> >>> cases where not fixed dependencies cause similar problems with other
> >>> transitive dependencies. We need to fix-pin them. This causes problems
> >> for
> >>> both - released versions (cause they stop to work!) and for development
> >>> (cause they break master builds in TravisCI and prevent people from
> >>> installing development environment from the scratch.
> >>>
> >>> *Solution:*
> >>>
> >>>   - Following the old-but-good post
> >>>   https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fnvie.com%2Fposts%2Fpin-your-packages%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=PVE3S4mgki7L%2BcAe104o2cf68wRXolvYXRFmAyiX8gA%3D&amp;reserved=0 we are going to fix the
> >>> pinned
> >>>   dependencies to specific versions (so basically all dependencies are
> >>>   "fixed").
> >>>   - We will introduce mechanism to be able to upgrade dependencies with
> >>>   pip-tools (https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjazzband%2Fpip-tools&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=Kt9CjWrolvpjp7MwIR2nn8EIf9CW9HW02U7GVGyOXMo%3D&amp;reserved=0). We might also
> >> take a
> >>>   look at pipenv: https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpipenv.readthedocs.io%2Fen%2Flatest%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=1tiY6pgX3IbRYC5W0HKr0ER2qMZ3GKYrwmWg%2BUo0tqs%3D&amp;reserved=0
> >>>   - People who would like to upgrade some dependencies for their PRs
> >> will
> >>>   still be able to do it - but such upgrades will be in their PR thus
> >> they
> >>>   will go through TravisCI tests and they will also have to be
> specified
> >>> with
> >>>   pinned fixed versions (==). This should be part of review process to
> >>> make
> >>>   sure new/changed requirements are pinned.
> >>>   - In release process there will be a point where an upgrade will be
> >>>   attempted for all requirements (using pip-tools) so that we are not
> >>> stuck
> >>>   with older releases. This will be in controlled PR environment where
> >>> there
> >>>   will be time to fix all dependencies without impacting others and
> >> likely
> >>>   enough time to "vet" such changes (this can be done for alpha/beta
> >>> releases
> >>>   for example).
> >>>   - As a side effect dependencies specification will become far simpler
> >>>   and straightforward.
> >>>
> >>> Happy to hear community comments to the proposal. I am happy to take a
> >> lead
> >>> on that, open JIRA issue and implement if this is something community
> is
> >>> happy with.
> >>>
> >>> J.
> >>>
> >>> --
> >>>
> >>> *Jarek Potiuk, Principal Software Engineer*
> >>> Mobile: +48 660 796 129
> >>>
> >>
>
>

--

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129