Re: execution_date - can we stop the confusion?


This comes up a lot. I've seen it on this mailing list multiple times and
it's something that I have to explicitly call out to every single person
that I've helped train up on Airflow.

If we take a moment to set aside why things are the way they are, what the
documentation says, and how experienced users feel things should behave,
the fact remains that a lot of new users get confused by how
"execution_date" works.

Whether it's a problem, whether we need to do something, and what we could
do are all separate questions but I think it's important that we
acknowledge and start from:

A lot of new users get confused by how "execution_date" works.

I recognize that some of this is a learning curve issue and some of this is
a mindset issue, but it raises the question: do enough users benefit from the
current structure to justify the harm to new users?

--George

On Wed, Sep 26, 2018 at 1:40 PM Brian Greene <brian@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> It took a minute to grok, but in the larger context of how Airflow works it
> makes perfect sense the way it is.  A change that fundamentally breaks
> every DAG in existence should bring a comparable benefit.
> Beyond avoiding teaching a concept you disagree with, what benefits
> does the proposal bring to offset the cost of change?
>
> I’m gonna make a meme - “do you even airflow bro?”
>
> Sent from a device with less than stellar autocorrect
>
> > On Sep 26, 2018, at 8:33 AM, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx> wrote:
> >
> > I think if you have a functional mindset (as in "functional data engineering"
> > <https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a>)
> > as opposed to a cron mindset, using the left bound of the time interval
> > makes a lot of sense. Things like your daily table partition keys align
> > with your Airflow execution_date.
> >
> > The main thing is that whatever we do, we cannot break backwards
> > compatibility. Offering both views (left bound/right bound), as has been
> > proposed before, either as an environment setting or as a per-user
> > preference, is even more confusing to me personally. Users would have to
> > switch context as they help each other or change environments.
> >
> > Also note that your intuition may differ from other people's intuition,
> > and that "unlearning" something is way harder than learning something.
> >
> > My personal take on this is to make this a rite of passage. This is just
> > one of the many things you have to learn when learning Airflow.
> >
> > Max
> >
> >> On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin <hussam.elamin@xxxxxxxxx> wrote:
> >>
> >> Hi Bolke
> >>
> >> Speaking as a consultant who is constantly training other teams how to
> >> use Airflow, I do frequently see this confusion.
> >> Another one is how the actual run time is always batch_date + interval,
> >> or, as the docs make quite clear:
> >>
> >> "*Let’s Repeat That* The scheduler runs your job one schedule_interval
> >> AFTER the start date, at the END of the period."
> >>
> >> Renaming it would make it simpler for newbies, but essentially they will
> >> still need to understand how Airflow behaves: execution_date is the batch
> >> execution date, not the run_date of the DAG.
> >>
> >> I am actually in the process of writing a blog post
> >> <https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/>
> >> about this, on which I could use people's feedback.
> >>
> >> If it helps, I find that explaining how backfills work and why they are
> >> important will drive home what the execution_date is :)
> >>
> >>
> >> Regards
> >> Sam
> >>
> >>
> >>
> >>> On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
> >>>
> >>> I don't think this makes sense and I don't think that anyone has had a
> >>> real issue with this. Execution date has been clearly documented and is
> >>> part of the core principles of Airflow. Renaming will create more
> >>> confusion.
> >>>
> >>> Please note that I do think that as an anonymous user you cannot speak
> >>> for any "new airflow user". That is a contradiction to me.
> >>>
> >>> Thanks
> >>> Bolke
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 26 Sep 2018, at 07:59, airflowuser <airflowuser@xxxxxxxxxxxxxx.INVALID> wrote:
> >>>>
> >>>> One of the most annoying and hard-to-understand things in Airflow, and
> >>>> one that goes against all common sense, is the execution_date behavior.
> >>>> I assume that any new Airflow user has struggled with it.
> >>>> The number of questions with answers referring to
> >>>> https://airflow.apache.org/scheduler.html?scheduling-triggers is
> >>>> uncountable.
> >>>>
> >>>> Most people mistakenly think that execution_date is the datetime at
> >>>> which the DAG started to run.
> >>>>
> >>>> I suggest the following changes:
> >>>> 1. Rename execution_date to something else, like run_stamped.
> >>>> This name won't cause people to get confused.
> >>>> 2. Add a new variable which indicates the actual datetime when the
> >>>> DAG run was generated. Call it execution_start_date. People seem to
> >>>> want to know when the DAG actually started to be executed/run.
> >>>>
> >>>> These are only naming changes. There is no need to actually change the
> >>>> behavior. This will only make things simpler: when a user encounters
> >>>> run_stamped, they won't be confused by the name the way they are by
> >>>> execution_date.
> >>>
> >>
>
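
The scheduling rule quoted in the thread (a run covering a period is triggered
one schedule_interval after that period starts, and its execution_date is the
left bound of the period) can be illustrated with plain datetime arithmetic.
This is a minimal sketch, not Airflow code: describe_run and its parameters
are hypothetical names invented here for illustration, and it assumes a fixed
timedelta interval with no catchup, cron expressions, or timezone handling.

```python
from datetime import datetime, timedelta


def describe_run(start_date, schedule_interval, run_index):
    """Return (execution_date, earliest_trigger_time) for the n-th run.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # execution_date is the LEFT bound of the interval the run covers.
    execution_date = start_date + run_index * schedule_interval
    # The run can only be triggered once that interval has ended,
    # i.e. at the RIGHT bound of the interval.
    earliest_trigger = execution_date + schedule_interval
    return execution_date, earliest_trigger


start = datetime(2018, 9, 25)
daily = timedelta(days=1)

execution_date, triggered_at = describe_run(start, daily, run_index=0)
print(execution_date.date(), triggered_at.date())  # 2018-09-25 2018-09-26
```

Under these assumptions, the first daily run is labelled with 2018-09-25 but
cannot start before 2018-09-26, which is the gap between "batch date" and
"run date" that the thread describes as confusing for new users.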