Re: execution_date - can we stop the confusion?


Hi Bolke

Speaking as a consultant who is constantly training other teams on how to use
Airflow, I do frequently see this confusion.
Another one is how the run actually happens at batch_date + interval, or as
the docs make quite clear:

"*Let’s Repeat That* The scheduler runs your job one schedule_interval AFTER
the start date, at the END of the period."

Renaming it would make it simpler for newbies, but essentially they will
still need to understand how Airflow behaves: execution_date is the batch
execution date, not the run_date of the DAG.
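
A minimal sketch of that rule in plain Python, assuming a fixed daily
interval. It does not import Airflow, and `earliest_run_time` is a
hypothetical helper made up for illustration, not an Airflow API:

```python
from datetime import datetime, timedelta


def earliest_run_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Earliest wall-clock time at which the run stamped `execution_date` can start.

    The run covers the period [execution_date, execution_date + schedule_interval)
    and is only scheduled once that period has ended.
    """
    return execution_date + schedule_interval


# Illustrative values only: a daily DAG whose start date is 2018-09-25.
start_date = datetime(2018, 9, 25)
interval = timedelta(days=1)

# The first run is stamped with the start of its period...
execution_date = start_date
# ...but it cannot start before the period is over.
print(execution_date, "->", earliest_run_time(execution_date, interval))
# prints: 2018-09-25 00:00:00 -> 2018-09-26 00:00:00
```

So a run stamped 2018-09-25 is the one that processes the data for
2018-09-25, and it starts on 2018-09-26 at the earliest.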

I am actually in the process of writing a blog post
<https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/>
about this, on which I could use people's feedback.

If it helps, I find that explaining how backfills work and why they are
important will drive home what the execution_date is :)


Regards
Sam



On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:

> I don't think this makes sense and I don't think that anyone had a real
> issue with this. Execution date has been clearly documented and is part of
> the core principles of Airflow. Renaming will create more confusion.
>
> Please note that I do think that as an anonymous user you cannot speak for
> any "new airflow user". That is a contradiction to me.
>
> Thanks
> Bolke
>
> Sent from my iPhone
>
> > On 26 Sep 2018, at 07:59, airflowuser <airflowuser@xxxxxxxxxxxxxx.INVALID> wrote:
> >
> > One of the most annoying and hard-to-understand behaviors, and one that
> > goes against all common sense, is that of execution_date. I assume that
> > any new Airflow user has struggled with it.
> > The number of questions with answers referring to
> > https://airflow.apache.org/scheduler.html?scheduling-triggers is
> > countless.
> >
> > Most people mistakenly think that execution_date is the datetime at
> > which the DAG started to run.
> >
> > I suggest the following changes:
> > 1. Rename execution_date to something else, like run_stamped. This name
> > won't cause people to get confused.
> > 2. Add a new variable that indicates the actual datetime when the DAG run
> > was generated. Call it execution_start_date. People seem to want to know
> > when the DAG actually started to be executed/run.
> >
> > These are only naming changes. There is no need to actually change the
> > behavior. This will only make things simpler: when a user encounters
> > run_stamped, they won't be confused by the name the way they are by
> > execution_date.
>