
Re: [VOTE] Airflow 1.10.0rc3


Done. When I roll rc4 it will be part of it.


> On 7 Aug 2018, at 16:26, Naik Kaxil <k.naik@xxxxxxxxx> wrote:
> 
> @bolke Can we also include the following commit in the 1.10 release, as we would need it to generate docs at ReadTheDocs?
> 
> - https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
> 
> Regards,
> Kaxil
> 
> On 06/08/2018, 14:59, "James Meickle" <jmeickle@xxxxxxxxxxxxxx.INVALID> wrote:
> 
>    Not a vote, but a comment: it might be worth noting that the new
>    environment variable is also required if you have any Airflow plugin test
>    suites that install Airflow as part of their dependencies. In my case, I
>    had to set the new env var outside of tox and add this:
> 
>    ```
>    [testenv]
>    passenv = SLUGIFY_USES_TEXT_UNIDECODE
>    ```
> 
>    (`setenv` did not work, as that provides env vars at runtime but not at
>    install time, as far as I can tell.)
> 
> 
>    On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
> 
>> +1 :-)
>> 
>> Sent from my iPhone
>> 
>>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <
>> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
>>> 
>>> Yup, just worked out the same thing.
>>> 
>>> I think as "punishment" for finding bugs so late in two RCs (this,
>> and 1.9) I should run the release process for the next one.
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 22:05, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>>>> 
>>>> Yeah, I figured it out. Originally I was using a different
>> implementation of UTCDateTime, but that was unmaintained. I switched, but
>> this version changed or has a different contract: while it transforms to
>> UTC on storing, it does not do so when it receives timezone-aware fields
>> from the DB. Hence the issue.
>>>> 
>>>> I will prepare a PR that removes the dependency and implements our own
>> extension of DateTime. Probably tomorrow.
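>>>> 
>>>> For reference, a minimal sketch of what such an extension could look
>>>> like as a SQLAlchemy TypeDecorator (illustrative only, not the actual
>>>> PR): normalise to UTC on the way in, and force whatever tz-aware value
>>>> the driver hands back into UTC on the way out.
>>>> 
>>>> ```python
>>>> from datetime import timezone
>>>> from sqlalchemy.types import DateTime, TypeDecorator
>>>> 
>>>> class UtcDateTime(TypeDecorator):
>>>>     """Store tz-aware datetimes as UTC and always return them in UTC."""
>>>>     impl = DateTime(timezone=True)
>>>> 
>>>>     def process_bind_param(self, value, dialect):
>>>>         # Reject naive datetimes and convert everything to UTC on write.
>>>>         if value is not None:
>>>>             if value.tzinfo is None:
>>>>                 raise ValueError("naive datetime is not allowed")
>>>>             value = value.astimezone(timezone.utc)
>>>>         return value
>>>> 
>>>>     def process_result_value(self, value, dialect):
>>>>         # The driver (e.g. psycopg2) may return a FixedOffsetTimezone in
>>>>         # the connection's zone; convert back so callers always see UTC.
>>>>         if value is not None and value.tzinfo is not None:
>>>>             value = value.astimezone(timezone.utc)
>>>>         return value
>>>> ```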
>>>> 
>>>> Good catch! Just in time :-(.
>>>> 
>>>> B.
>>>> 
>>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <
>> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> Entirely possible, though I wasn't even dealing with the scheduler -
>> the issue I was addressing was entirely in the webserver for a pre-existing
>> Task Instance.
>>>>> 
>>>>> Ah, I hadn't noticed/twigged that we are using sqlalchemy-utc. It appears
>> that it isn't working right / as expected. This line:
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>> doesn't look right for us - as you mentioned, the TZ is set to something
>> (rather than having no TZ value).
>>>>> 
>>>>> Some background on how Postgres handles TZs: it always returns datetimes in the TZ
>> of the connection. I'm not sure if this is unique to Postgres or if other
>> DBs behave the same.
>>>>> 
>>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time
>> zone;
>>>>>   timestamptz
>>>>> ------------------------
>>>>> 2018-08-03 01:00:00+01
>>>>> 
>>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>>>   timestamptz
>>>>> ------------------------
>>>>> 2018-08-03 01:00:00+01
>>>>> 
>>>>> The server will always return timestamps in the connection's timezone.
>>>>> 
>>>>> postgres=# set timezone=utc;
>>>>> SET
>>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>>>   timestamptz
>>>>> ------------------------
>>>>> 2018-08-03 00:00:00+00
>>>>> (1 row)
>>>>> 
>>>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>>>   timestamptz
>>>>> ------------------------
>>>>> 2018-08-03 00:00:00+00
>>>>> (1 row)
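>>>>> 
>>>>> To make the same point from the Python side (a sketch with psycopg2;
>>>>> the DSN is hypothetical): the tzinfo that psycopg2 attaches to a
>>>>> timestamptz value follows the session's TimeZone setting.
>>>>> 
>>>>> ```python
>>>>> import psycopg2
>>>>> 
>>>>> conn = psycopg2.connect("dbname=airflow")  # hypothetical DSN
>>>>> cur = conn.cursor()
>>>>> 
>>>>> cur.execute("select '2018-08-03 00:00:00+00'::timestamptz")
>>>>> print(cur.fetchone()[0])  # tzinfo is a FixedOffsetTimezone in the session's zone
>>>>> 
>>>>> cur.execute("set timezone = 'UTC'")
>>>>> cur.execute("select '2018-08-03 00:00:00+00'::timestamptz")
>>>>> print(cur.fetchone()[0])  # 2018-08-03 00:00:00+00:00, offset 0
>>>>> ```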
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>>>>>> 
>>>>>> This is the issue:
>>>>>> 
>>>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE:
>> 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun
>> example_http_operator @ 2018-08-03 02:00:00+02:00:
>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>>>>> 
>>>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE:
>> 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120,
>> name=None)
>>>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun
>> example_http_operator @ 2018-08-04 02:00:00+02:00:
>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>>>>> 
>>>>>> Notice in lines 1+2 that the next run date is correctly in UTC, but
>> from the DB it comes back with a +02 offset. In the next bit (lines 3+4) we get a
>> psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according to the
>> specs of https://github.com/spoqa/sqlalchemy-utc, but it isn’t.
>>>>>> 
>>>>>> So changing your setting of the DB to UTC fixes the symptom but not
>> the cause.
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <
>> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> Sorry for being terse before.
>>>>>>> 
>>>>>>> So the issue is that the ts loaded from the DB is not in UTC; it's
>> in GB/+01 (the default of the DB server).
>>>>>>> 
>>>>>>> For me, on a currently running 1.9 (no TZ) db:
>>>>>>> 
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op            | example_http_operator | 2018-07-23 00:00:00
>>>>>>> 
>>>>>>> This datetime appears in the log URL, and the path it looks at on
>> S3 is
>>>>>>> 
>>>>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>>>>> 
>>>>>>> If my Postgres server has a default timezone of GB (which the one
>> running on my laptop does) and I then apply the migration, the column is
>> converted to that local time.
>>>>>>> 
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op            | example_http_operator | 2018-07-23 01:00:00+01
>>>>>>> 
>>>>>>> airflow=# set timezone=UTC;
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op            | example_http_operator | 2018-07-23 00:00:00+00
>>>>>>> 
>>>>>>> 
>>>>>>> This is all okay so far. The migration has kept the column at the
>> same moment in time.
>>>>>>> 
>>>>>>> The issue comes when the UI tries to display logs for this old task:
>> because the timezone of the connection is not UTC, PG returns a date with a
>> +01 TZ. Thus, after the migration, this old task tries to look for a log file
>> of
>>>>>>> 
>>>>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>>>>> 
>>>>>>> which doesn't exist - the rendered time has changed from
>> midnight (in v1.9) to 1am (in v1.10).
>>>>>>> 
>>>>>>> (This is with my change to log_filename_template from UPDATING.md in
>> my other branch)
>>>>>>> 
>>>>>>> Setting the timezone to UTC per connection means the behaviour of
>> Airflow doesn't change depending on how the server is configured.
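>>>>>>> 
>>>>>>> For what it's worth, one way to do that (a sketch only, assuming
>>>>>>> SQLAlchemy engine events and a Postgres/psycopg2 connection; not
>>>>>>> necessarily how the final patch should be done):
>>>>>>> 
>>>>>>> ```python
>>>>>>> # Pin every new DBAPI connection to UTC, regardless of the server default.
>>>>>>> from sqlalchemy import create_engine, event
>>>>>>> 
>>>>>>> engine = create_engine("postgresql+psycopg2://airflow@localhost/airflow")
>>>>>>> 
>>>>>>> @event.listens_for(engine, "connect")
>>>>>>> def set_utc(dbapi_connection, connection_record):
>>>>>>>     cursor = dbapi_connection.cursor()
>>>>>>>     cursor.execute("SET TIME ZONE 'UTC'")
>>>>>>>     cursor.close()
>>>>>>> ```
>>>>>>> 
>>>>>>> (For Postgres specifically, passing connect_args={"options": "-c timezone=utc"}
>>>>>>> to create_engine should achieve the same thing.)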
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>>>>>>>> 
>>>>>>>> Digging in a bit further.
>>>>>>>> 
>>>>>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>>>>>> 
>>>>>>>> is the format
>>>>>>>> 
>>>>>>>> ts = execution_date.isoformat() and should be in UTC, afaik.
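>>>>>>>> 
>>>>>>>> To illustrate how the same instant can render two different path
>>>>>>>> components depending on the tzinfo it comes back with (a stdlib-only
>>>>>>>> sketch):
>>>>>>>> 
>>>>>>>> ```python
>>>>>>>> from datetime import datetime, timezone, timedelta
>>>>>>>> 
>>>>>>>> dt_utc = datetime(2018, 8, 3, tzinfo=timezone.utc)
>>>>>>>> dt_plus2 = dt_utc.astimezone(timezone(timedelta(hours=2)))  # same moment
>>>>>>>> 
>>>>>>>> print(dt_utc.isoformat())    # 2018-08-03T00:00:00+00:00
>>>>>>>> print(dt_plus2.isoformat())  # 2018-08-03T02:00:00+02:00 -> different log path
>>>>>>>> ```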
>>>>>>>> 
>>>>>>>> something is weird tbh.
>>>>>>>> 
>>>>>>>> B.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>>>>>>>>> 
>>>>>>>>> Ash,
>>>>>>>>> 
>>>>>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch
>> and the analysis below, I am not sure what you are perceiving as an issue.
>>>>>>>>> 
>>>>>>>>> For conversion we assume everything is stored in UTC and in a
>> naive format. Conversion then adds the timezone information. This results
>> in the following:
>>>>>>>>> 
>>>>>>>>> postgres timezone = “Europe/Amsterdam”
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>> get_op            | example_http_operator | 2018-07-27 02:00:00+02
>>>>>>>>> 
>>>>>>>>> airflow=# set timezone=UTC;
>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>> get_op            | example_http_operator | 2018-07-27 00:00:00+00
>>>>>>>>> 
>>>>>>>>> If we don’t set the timezone on the connection, Postgres assumes the
>> server timezone (in my case “Europe/Amsterdam”). So every datetime Airflow
>> receives will be in “Europe/Amsterdam”. However, as we defined the
>> model to use UTCDateTime, it will always convert the returned DateTime to
>> UTC.
>>>>>>>>> 
>>>>>>>>> If we have configured Airflow to use something other than UTC as
>> the default timezone, or a DAG has an associated timezone, we only convert to
>> that timezone when calculating the next run time (not for cron, btw). Nowhere
>> else, and thus we are UTC everywhere.
>>>>>>>>> 
>>>>>>>>> What do you think is inconsistent?
>>>>>>>>> 
>>>>>>>>> Bolke
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <
>> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>> 
>>>>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless
>> to timezone-aware columns in the task instance is right, or at least it's
>> not what I expected.
>>>>>>>>>> 
>>>>>>>>>> Before, weren't all TZs from schedule dates etc. in UTC? For the
>> same task instance (these outputs are from psql directly):
>>>>>>>>>> 
>>>>>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>>>>> 
>>>>>>>>>> **Okay, the migration is fine.** It appears that the migration has
>> done the right thing, but the local DB I'm testing with has a timezone of GB
>> set, so Postgres converts it to that TZ when returning an object.
>>>>>>>>>> 
>>>>>>>>>> 3) Do we need to set the TZ of the connection to UTC in
>> SQLAlchemy to have consistent behaviour? Is this possible somehow? I don't
>> know SQLAlchemy that well.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -ash
>>>>>>>>>> 
>>>>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor <
>> ash_airflowlist@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now
>> always being "task" (it was "s3.task" before). The logging config is much simpler
>> now, though. This may be particular to my logging config, but given how much
>> of a pain it was to set up S3 logging in 1.9, I have shared my config with
>> some people in the Gitter chat, so it's not just me.
>>>>>>>>>>> 
>>>>>>>>>>> 2) The path that log files are written to in S3 has changed
>> (again - this also happened from 1.8 to 1.9). I'd like to avoid having to move
>> all of my log files again to continue viewing them. The change is that the
>> path now (in 1.10) has a timezone in it, and the date is in local time,
>> whereas before it was UTC:
>>>>>>>>>>> 
>>>>>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>>>>> 
>>>>>>>>>>> We can possibly get away with an UPDATING note about this to set
>> a custom log_filename_template. Testing this now.
>>>>>>>>>>> 
>>>>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <ash@xxxxxxxxxxxxxx>
>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> -1 (binding) from me.
>>>>>>>>>>>> 
>>>>>>>>>>>> Installed with:
>>>>>>>>>>>> 
>>>>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install '
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,
>> s3, crypto]>=1.10'
>>>>>>>>>>>> 
>>>>>>>>>>>> Install went fine.
>>>>>>>>>>>> 
>>>>>>>>>>>> Our DAGs that use SparkSubmitOperator are failing, as there
>> is now a hard dependency on the Kubernetes client libs, but the `emr` group
>> doesn't declare this.
>>>>>>>>>>>> 
>>>>>>>>>>>> Introduced in
>> https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>>>>> 
>>>>>>>>>>>> I see two options for this: either conditionally enable k8s://
>> support only if the import works (see the sketch below), or (less preferred)
>> add the kube client to the emr deps.
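>>>>>>>>>>>> 
>>>>>>>>>>>> Roughly what I mean by the first option (a sketch only; the real
>>>>>>>>>>>> guard would live in the Spark submit hook):
>>>>>>>>>>>> 
>>>>>>>>>>>> ```python
>>>>>>>>>>>> # Only allow k8s:// masters when the kubernetes client is importable.
>>>>>>>>>>>> try:
>>>>>>>>>>>>     import kubernetes  # noqa: F401
>>>>>>>>>>>>     _has_kubernetes = True
>>>>>>>>>>>> except ImportError:
>>>>>>>>>>>>     _has_kubernetes = False
>>>>>>>>>>>> 
>>>>>>>>>>>> def validate_spark_master(master):
>>>>>>>>>>>>     if master.startswith("k8s://") and not _has_kubernetes:
>>>>>>>>>>>>         raise RuntimeError("k8s:// master requires the kubernetes client libs")
>>>>>>>>>>>>     return master
>>>>>>>>>>>> ```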
>>>>>>>>>>>> 
>>>>>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>>>>> 
>>>>>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>>>>> 
>>>>>>>>>>>> -ash
>>>>>>>>>>>> 
>>>>>>>>>>>> (Normally no time at home due to new baby, but I got a standing
>> desk, and a carrier meaning she can sleep on me and I can use my laptop.
>> Win!)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Bump.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Committers please cast your vote.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> B.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko
>> <fokko@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1 Binding
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip
>> install
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <bdbruin@xxxxxxxxx>:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote
>> on the release,
>>>>>>>>>>>>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a
>> source release that
>>>>>>>>>>>>>>> comes with INSTALL instructions.
>>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary
>> Python "sdist" release.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at
>> the changelog.
>>>>>>>>>>>>>>> Since RC2 the following has been fixed:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Please note that the version number excludes the `rcX`
>> string as well
>>>>>>>>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0. This
>> will allow us
>>>>>>>>>>>>>>> to rename the artifact without modifying the artifact
>> checksums when we
>>>>>>>>>>>>>>> actually release.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set
>>>>>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>>>>>>>>>>>>> installing or upgrading. We will try to remove this
>> requirement for the
>>>>>>>>>>>>>>> next release.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Bolke
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> 
> 
> 
> Kaxil Naik 
> 
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK 
> phone: +44 (0)20 7730 6000
> k.naik@xxxxxxxxx
> www.reply.com