
Re: [VOTE] Airflow 1.10.0rc3


I tried to install this version of Airflow with Python 3.7.
The `async` keyword is still present in setup.py.

I think it should include AIRFLOW-2713
<https://issues.apache.org/jira/browse/AIRFLOW-2713> to support Python
3.7.
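
For context on why the leftover keyword breaks the install: under Python 3.7, `async` and `await` became reserved words, so any code still using them as identifiers (as the old setup.py did) fails to parse at all. A minimal illustration, not the actual setup.py code:

```python
import ast

# `async = True` was a legal assignment up to Python 3.6; from 3.7 on
# it is a SyntaxError before any code even runs.
try:
    ast.parse("async = True")
    print("parsed")
except SyntaxError:
    print("SyntaxError")  # this branch is taken on Python 3.7+
```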

Regards,
Yongjun

On Wed, Aug 8, 2018 at 8:08 AM, George Leslie-Waksman <waksman@xxxxxxxxx> wrote:

> We just tried to upgrade a 1.8.1 install to 1.10rc3 and ran into a critical
> error on alembic migration execution. I have captured the issue in JIRA:
> https://issues.apache.org/jira/browse/AIRFLOW-2870
>
> I would consider this a critical blocker for release because it hard blocks
> upgrading.
>
> George
>
> On Tue, Aug 7, 2018 at 7:58 AM Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>
> > Done. When I roll rc4 it will be part of it.
> >
> >
> > > On 7 Aug 2018, at 16:26, Naik Kaxil <k.naik@xxxxxxxxx> wrote:
> > >
> > > @bolke Can we also include the following commit to 1.10 release as we
> > would need this commit to generate docs at ReadTheDocs?
> > >
> > > -
> >
> https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
> > >
> > > Regards,
> > > Kaxil
> > >
> > > On 06/08/2018, 14:59, "James Meickle" <jmeickle@xxxxxxxxxxxxxx
> .INVALID>
> > wrote:
> > >
> > >    Not a vote, but a comment: it might be worth noting that the new
> > >    environment variable is also required if you have any Airflow plugin
> > test
> > >    suites that install Airflow as part of their dependencies. In my
> > case, I
> > >    had to set the new env var outside of tox and add this:
> > >
> > >    ```
> > >    [testenv]
> > >    passenv = SLUGIFY_USES_TEXT_UNIDECODE
> > >    ```
> > >
> > >    (`setenv` did not work as that provides env vars at runtime but not
> > >    at install time, as far as I can tell.)
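
Spelled out, the tox behaviour above looks like this (a sketch of the relevant tox.ini section; the variable name is from the thread, the comments are editorial):

```ini
[testenv]
# passenv forwards the variable from the invoking shell, so it is already
# present while pip installs Airflow as a dependency (install time).
passenv = SLUGIFY_USES_TEXT_UNIDECODE

# setenv, by contrast, only applies to commands run inside the built
# environment, which is too late for the pip install step:
# setenv = SLUGIFY_USES_TEXT_UNIDECODE=yes
```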
> > >
> > >
> > >    On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin <bdbruin@xxxxxxxxx>
> > wrote:
> > >
> > >> +1 :-)
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <
> > >> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
> > >>>
> > >>> Yup, just worked out the same thing.
> > >>>
> > >>> I think as "punishment" for me finding bugs so late in two RCs (this,
> > >> and 1.9) I should run the release for the next release.
> > >>>
> > >>> -ash
> > >>>
> > >>>> On 5 Aug 2018, at 22:05, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
> > >>>>
> > >>>> Yeah I figured it out. Originally I was using a different
> > >> implementation of UTCDateTime, but that was unmaintained. I switched,
> > >> but this version changed or has a different contract: while it
> > >> converts to UTC on storing, it does not do so when it receives
> > >> timezone-aware fields from the db. Hence the issue.
> > >>>>
> > >>>> I will prepare a PR that removes the dependency and implements our
> own
> > >> extension of DateTime. Probably tomorrow.
> > >>>>
> > >>>> Good catch! Just in time :-(.
> > >>>>
> > >>>> B.
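
A minimal sketch of the normalization such a custom extension would need on the read path. This is plain stdlib Python with hypothetical names, not the actual PR; in the real version this logic would live in a SQLAlchemy TypeDecorator's process_result_value:

```python
from datetime import datetime, timezone, timedelta

def normalize_to_utc(dt):
    """Coerce a datetime read back from the DB to UTC.

    Naive values are assumed to already be UTC (Airflow's storage
    contract); aware values are converted, whatever offset the
    connection handed back.
    """
    if dt is None:
        return None
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

# a +02:00 value from a non-UTC connection comes back rendered in UTC
cest = timezone(timedelta(hours=2))
print(normalize_to_utc(datetime(2018, 8, 4, 2, 0, tzinfo=cest)))
# 2018-08-04 00:00:00+00:00
```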
> > >>>>
> > >>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <
> > >> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
> > >>>>>
> > >>>>> Entirely possible, though I wasn't even dealing with the scheduler
> -
> > >> the issue I was addressing was entirely in the webserver for a
> > pre-existing
> > >> Task Instance.
> > >>>>>
> > >>>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It
> > >> appears that isn't working right/as expected. This line:
> > >> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
> > >> doesn't look right for us - as you mentioned the TZ is set to
> > >> something (rather than having no TZ value).
> > >>>>>
> > >>>>> Some background on how Pg handles TZs. It always returns DTs in
> > >> the TZ of the connection. I'm not sure if this is unique to postgres
> > >> or if other DBs behave the same.
> > >>>>>
> > >>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time
> > >> zone;
> > >>>>>   timestamptz
> > >>>>> ------------------------
> > >>>>> 2018-08-03 01:00:00+01
> > >>>>>
> > >>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time
> zone;
> > >>>>>   timestamptz
> > >>>>> ------------------------
> > >>>>> 2018-08-03 01:00:00+01
> > >>>>>
> > >>>>> The server will always return TZs in the connection timezone.
> > >>>>>
> > >>>>> postgres=# set timezone=utc;
> > >>>>> SET
> > >>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time
> zone;
> > >>>>>   timestamptz
> > >>>>> ------------------------
> > >>>>> 2018-08-03 00:00:00+00
> > >>>>> (1 row)
> > >>>>>
> > >>>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time
> zone;
> > >>>>>   timestamptz
> > >>>>> ------------------------
> > >>>>> 2018-08-03 00:00:00+00
> > >>>>> (1 row)
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> -ash
> > >>>>>
> > >>>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <bdbruin@xxxxxxxxx>
> wrote:
> > >>>>>>
> > >>>>>> This is the issue:
> > >>>>>>
> > >>>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE:
> > >> 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
> > >>>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun
> > >> example_http_operator @ 2018-08-03 02:00:00+02:00:
> > >> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
> > >>>>>>
> > >>>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE:
> > >> 2018-08-04 02:00:00+02:00 tzinfo:
> > >> psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
> > >>>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun
> > >> example_http_operator @ 2018-08-04 02:00:00+02:00:
> > >> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
> > >>>>>>
> > >>>>>> Notice in lines 1+2 that the next run date is correctly in UTC,
> > >> but from the DB it gets a +2. In the next bit (3+4) we get a
> > >> psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according
> > >> to the specs of https://github.com/spoqa/sqlalchemy-utc <
> > >> https://github.com/spoqa/sqlalchemy-utc>, but it isn’t.
> > >>>>>>
> > >>>>>> So changing your setting of the DB to UTC fixes the symptom but
> not
> > >> the cause.
> > >>>>>>
> > >>>>>> B.
> > >>>>>>
> > >>>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <
> > >> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> Sorry for being terse before.
> > >>>>>>>
> > >>>>>>> So the issue is that the ts loaded from the DB is not in UTC,
> it's
> > >> in GB/+01 (the default of the DB server)
> > >>>>>>>
> > >>>>>>> For me, on a currently running 1.9 (no TZ) db:
> > >>>>>>>
> > >>>>>>> airflow=# select * from task_instance;
> > >>>>>>> get_op            | example_http_operator | 2018-07-23 00:00:00
> > >>>>>>>
> > >>>>>>> This date time appears in the log url, and the path it looks at
> on
> > >> S3 is
> > >>>>>>>
> > >>>>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
> > >>>>>>>
> > >>>>>>> If my postgres server has a default timezone of GB (which the one
> > >> running on my laptop does), and I then apply the migration then it is
> > >> converted to that local time.
> > >>>>>>>
> > >>>>>>> airflow=# select * from task_instance;
> > >>>>>>> get_op            | example_http_operator | 2018-07-23
> 01:00:00+01
> > >>>>>>>
> > >>>>>>> airflow=# set timezone=UTC;
> > >>>>>>> airflow=# select * from task_instance;
> > >>>>>>> get_op            | example_http_operator | 2018-07-23
> 00:00:00+00
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> This is all okay so far. The migration has kept the column at the
> > >> same moment in time.
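
That "same moment in time" claim is easy to double-check with aware datetimes (a quick stdlib check, not Airflow code):

```python
from datetime import datetime, timezone, timedelta

# what a GB (+01) connection shows after the migration...
migrated = datetime(2018, 7, 23, 1, 0, tzinfo=timezone(timedelta(hours=1)))
# ...versus the original naive value interpreted as UTC
original = datetime(2018, 7, 23, 0, 0, tzinfo=timezone.utc)

# aware datetimes compare by instant, not by rendering
print(migrated == original)  # True - same instant, different rendering
```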
> > >>>>>>>
> > >>>>>>> The issue comes when the UI tries to display logs for this old
> > >> task: because the timezone of the connection is not UTC, PG returns a
> > >> date with a +01 TZ. Thus after the migration this old task tries to
> > >> look for a log file of
> > >>>>>>>
> > >>>>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
> > >>>>>>>
> > >>>>>>> which doesn't exist - it's changed the time it has rendered from
> > >> midnight (in v1.9) to 1am (in v1.10).
> > >>>>>>>
> > >>>>>>> (This is with my change to log_filename_template from UPDATING.md
> > in
> > >> my other branch)
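
The path mismatch described above is just isoformat() applied to the same instant in two zones (an illustrative stdlib sketch, not Airflow's actual rendering code):

```python
from datetime import datetime, timezone, timedelta

utc = datetime(2018, 7, 23, 0, 0, tzinfo=timezone.utc)
bst = utc.astimezone(timezone(timedelta(hours=1)))  # the +01 connection TZ

print(utc.isoformat())  # 2018-07-23T00:00:00+00:00 -> the path v1.9 wrote
print(bst.isoformat())  # 2018-07-23T01:00:00+01:00 -> what v1.10 looks for
```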
> > >>>>>>>
> > >>>>>>> Setting the timezone to UTC per connection means the behaviour of
> > >> Airflow doesn't change depending on how the server is configured.
> > >>>>>>>
> > >>>>>>> -ash
> > >>>>>>>
> > >>>>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbruin@xxxxxxxxx>
> > wrote:
> > >>>>>>>>
> > >>>>>>>> Digging in a bit further.
> > >>>>>>>>
> > >>>>>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
> > >>>>>>>>
> > >>>>>>>> is the format
> > >>>>>>>>
> > >>>>>>>> ts = execution_date.isoformat() and should be in UTC afaik.
> > >>>>>>>>
> > >>>>>>>> something is weird tbh.
> > >>>>>>>>
> > >>>>>>>> B.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbruin@xxxxxxxxx>
> > wrote:
> > >>>>>>>>>
> > >>>>>>>>> Ash,
> > >>>>>>>>>
> > >>>>>>>>> Reading your proposed changes on your “set-timezone-to-utc”
> > >> branch and the analysis below, I am not sure what you are perceiving
> > >> as an issue.
> > >>>>>>>>>
> > >>>>>>>>> For conversion we assume everything is stored in UTC and in a
> > >> naive format. Conversion then adds the timezone information. This
> > results
> > >> in the following
> > >>>>>>>>>
> > >>>>>>>>> postgres timezone = “Europe/Amsterdam”
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> airflow=# select * from task_instance;
> > >>>>>>>>> get_op            | example_http_operator | 2018-07-27
> > 02:00:00+02
> > >>>>>>>>>
> > >>>>>>>>> airflow=# set timezone=UTC;
> > >>>>>>>>> airflow=# select * from task_instance;
> > >>>>>>>>> get_op            | example_http_operator | 2018-07-27
> > 00:00:00+00
> > >>>>>>>>>
> > >>>>>>>>> If we don’t set the timezone in the connection, postgres
> > >> assumes the server timezone (in my case “Europe/Amsterdam”). So every
> > >> datetime Airflow receives will be in the “Europe/Amsterdam” timezone.
> > >> However, as we defined the model to use UTCDateTime, it will always
> > >> convert the returned DateTime to UTC.
> > >>>>>>>>>
> > >>>>>>>>> If we have configured Airflow to use something other than UTC
> > >> as the default timezone, or a DAG has an associated timezone, we only
> > >> convert to that timezone when calculating the next runtime (not for
> > >> cron btw). Nowhere else, and thus we are UTC everywhere.
> > >>>>>>>>>
> > >>>>>>>>> What do you think is inconsistent?
> > >>>>>>>>>
> > >>>>>>>>> Bolke
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <
> > >> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Relating to 2): I'm not sure that the upgrade from
> > >> timezoneless to timezone-aware columns in the task instance is right,
> > >> or at least it's not what I expected.
> > >>>>>>>>>>
> > >>>>>>>>>> Before weren't all TZs from schedule dates etc in UTC? For the
> > >> same task instance (these outputs from psql directly):
> > >>>>>>>>>>
> > >>>>>>>>>> before: execution_date=2017-09-04 00:00:00
> > >>>>>>>>>> after: execution_date=2017-09-04 01:00:00+01
> > >>>>>>>>>>
> > >>>>>>>>>> **Okay the migration is fine**. It appears that the migration
> > has
> > >> done the right thing, but my local DB I'm testing with has a Timezone
> > of GB
> > >> set, so Postgres converts it to that TZ on returning an object.
> > >>>>>>>>>>
> > >>>>>>>>>> 3) Do we need to set the TZ of the connection to UTC in
> > >> SQLAlchemy to have consistent behaviour? Is this possible somehow? I
> > >> don't know SQLAlchemy that well.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> -ash
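
It is possible in SQLAlchemy - one common pattern (a sketch only, untested here, with a hypothetical DSN) is a "connect" event listener that pins every new DBAPI connection to UTC:

```python
from sqlalchemy import create_engine, event

# hypothetical connection string, not Airflow's actual config
engine = create_engine("postgresql://airflow@localhost/airflow")

@event.listens_for(engine, "connect")
def set_utc_timezone(dbapi_connection, connection_record):
    # fires once for each raw connection the pool creates
    cursor = dbapi_connection.cursor()
    cursor.execute("SET TIME ZONE 'UTC'")
    cursor.close()
```

Because the listener runs per raw connection, every session then sees timestamptz values rendered in UTC regardless of the server's default timezone.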
> > >>>>>>>>>>
> > >>>>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor <
> > >> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1.) Missing UPDATING note about the change of
> > >> task_log_reader to now always being "task" (was "s3.task" before).
> > >> Logging config is much simpler now though. This may be particular to
> > >> my logging config, but given how much of a pain it was to set up S3
> > >> logging in 1.9 I have shared my config with some people in the Gitter
> > >> chat, so it's not just me.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2) The path that log-files are written to in S3 has changed
> > >> (again - this happened from 1.8 to 1.9). I'd like to avoid having to
> > move
> > >> all of my log files again to continue viewing them. The change is that
> > the
> > >> path now (in 1.10) has a timezone in it, and the date is in local
> time,
> > >> before it was UTC:
> > >>>>>>>>>>>
> > >>>>>>>>>>> before: 2018-07-23T00:00:00/1.log
> > >>>>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
> > >>>>>>>>>>>
> > >>>>>>>>>>> We can possibly get away with an updating note about this to
> > set
> > >> a custom log_filename_template. Testing this now.
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <
> > ash@xxxxxxxxxxxxxx>
> > >> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -1(binding) from me.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Installed with:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install '
> > >>
> >
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
> > >> <
> > >>
> >
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
> > >,
> > >> s3, crypto]>=1.10'
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Install went fine.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing as
> there
> > >> is now a hard dependency on the Kubernetes client libs, but the `emr`
> > group
> > >> doesn't mention this.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Introduced in
> > >> https://github.com/apache/incubator-airflow/pull/3112 <
> > >> https://github.com/apache/incubator-airflow/pull/3112>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I see two options for this - either conditionally enable
> > >> k8s:// support if the import works, or (less preferred) add
> > >> kube-client to the emr deps.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Sorry - this is the first time I've been able to test it.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I will install this dep manually and continue testing.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -ash
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (Normally no time at home due to new baby, but I got a
> > standing
> > >> desk, and a carrier meaning she can sleep on me and I can use my
> laptop.
> > >> Win!)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbruin@xxxxxxxxx
> > >> <mailto:bdbruin@xxxxxxxxx>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Bump.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Committers please cast your vote.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> B.
> > >>>>>>>>>>>>> Sent from my iPhone
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko
> > >> <fokko@xxxxxxxxxxxxxx <mailto:fokko@xxxxxxxxxxxxxx>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> +1 Binding
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip
> > >> install
> > >>>>>>>>>>>>>>
> > >>
> >
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
> > >> <
> > >>
> >
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
> > >>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers, Fokko
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <
> bdbruin@xxxxxxxxx
> > >:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hey all,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a
> vote
> > >> on the release,
> > >>>>>>>>>>>>>>> which will last for 72 hours. Consider this my (binding)
> > +1.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ <
> > >>>>>>>>>>>>>>>
> > >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a
> > >> source release that
> > >>>>>>>>>>>>>>> comes with INSTALL instructions.
> > >>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the
> > binary
> > >> Python
> > >>>>>>>>>>>>>>> "sdist"
> > >>>>>>>>>>>>>>> release.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Public keys are available at:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> > >>>>>>>>>>>>>>>
> > >> https://dist.apache.org/repos/dist/release/incubator/airflow/>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a
> > >> look at the changelog.
> > >>>>>>>>>>>>>>> Since RC2 the following has been fixed:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
> > >>>>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
> > >>>>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Please note that the version number excludes the `rcX`
> > >> string as well
> > >>>>>>>>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0.
> > This
> > >> will allow us
> > >>>>>>>>>>>>>>> to rename the artifact without modifying the artifact
> > >> checksums when we
> > >>>>>>>>>>>>>>> actually release.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to
> set
> > >>>>>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
> > >>>>>>>>>>>>>>> installing or upgrading. We will try to remove this
> > >> requirement for the
> > >>>>>>>>>>>>>>> next release.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>> Bolke
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> > >
> > >
> > >
> > >
> > >
> > > Kaxil Naik
> > >
> > > Data Reply
> > > 2nd Floor, Nova South
> > > 160 Victoria Street, Westminster
> > > London SW1E 5LB - UK
> > > phone: +44 (0)20 7730 6000 <+44%2020%207730%206000>
> > > k.naik@xxxxxxxxx
> > > www.reply.com
> >
> >
>