Re: Is `airflow backfill` dysfunctional?


The request was for opposition, but I’d like to weigh in on the side of “it’s a better behavior” [to have failed tasks re-run when cleared in a backfill].
On Jun 5, 2018, 4:16 PM -0700, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx> wrote:
> @Jeremiah Lowin <jlowin@xxxxxxxxx> & @Bolke de Bruin <bdbruin@xxxxxxxxx> I
> think you may have some context on why this may have changed at some point.
> I'm assuming that when DagRun handling was added to the backfill logic, the
> behavior just happened to change to what it is now.
>
> Any opposition to moving back towards re-running failed tasks when starting
> a backfill? I think it's a better behavior, though it's a change in
> behavior that we should mention in UPDATE.md.
>
> One of our goals is to make sure that a failed or killed backfill can be
> restarted and just seamlessly pick up where it left off.
>
> Max
>
> On Tue, Jun 5, 2018 at 3:25 PM Tao Feng <fengtao04@xxxxxxxxx> wrote:
>
> > After discussing with Max, we think it would be great if `airflow backfill`
> > could automatically pick up and rerun failed tasks. Currently, it throws
> > an exception
> > (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2489)
> > without rerunning the failed tasks.
> >
> > But since this breaks some of the previous assumptions for backfill, we
> > would like to get some feedback and see if anyone has any concerns (the PR
> > can be found at https://github.com/apache/incubator-airflow/pull/3464/files).
> >
> > Thanks,
> > -Tao
> >
> > On Thu, May 24, 2018 at 10:26 AM, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx> wrote:
> >
> > > So I'm running a backfill for what feels like the first time in years,
> > > using a simple `airflow backfill --local` command.
> > >
> > > First I start getting a ton of `logging.info` messages, one for each task
> > > that cannot be started just yet, at every tick, flooding my terminal with
> > > the keyword `FAILED`. It looks like a million lines like this one:
> > >
> > > [2018-05-24 14:33:07,852] {models.py:1123} INFO - Dependencies not met for <TaskInstance: some_dag.some_task_id 2018-01-28 00:00:00 [scheduled]>, dependency 'Trigger Rule' FAILED: Task's trigger rule 'all_success' requires all upstream tasks to have succeeded, but found 1 non-success(es). upstream_tasks_state={'successes': 0L, 'failed': 0L, 'upstream_failed': 0L, 'skipped': 0L, 'done': 0L}, upstream_task_ids=['some_other_task_id']
> > >
> > > Good thing I triggered 1 month and not the 2 years I actually need; the
> > > logs alone would have been "big data". Now I'm unsure whether anything is
> > > actually running or whether I did something wrong, so I kill the process
> > > so I can set a smaller date range and get a better picture of what's up.
> > >
> > > I check my logging level: am I in DEBUG? Nope, just INFO. So I make a
> > > note that I'll need to find that log-flooding line and demote it to DEBUG
> > > in a quick PR, no biggie.
> > >
> > > Now I restart with just a single schedule and get the error
> > > `Dag {some_dag} has reached maximum amount of 3 dag runs`. Hmmm, I wish
> > > backfill could just pick up where it left off. Maybe I need to run an
> > > `airflow clear` command and restart? OK, I ran my clear command, and the
> > > same error shows up. Dead end.
> > >
> > > Maybe there is some new `airflow clear --reset-dagruns` option? Doesn't
> > > look like it... Maybe `airflow backfill` has some new switches to pick up
> > > where it left off? Can't find any. Am I supposed to clear the DAG Runs
> > > manually in the UI? This is a pre-production, in-development DAG, so it's
> > > not on the production web server. Am I supposed to fire up my own web
> > > server to go and manually handle the backfill-related DAG Runs? Connect
> > > to my staging MySQL and manually clear some DAG runs?
> > >
> > > So: fire up a web server, navigate to my dag_id, delete the DAG runs, and
> > > it appears I can finally start over.
> > >
> > > My next thought was: "Alright, looks like I need to go Linus on the
> > > mailing list".
> > >
> > > What am I missing? I'm really hoping these issues are specific to 1.8.2!
> > >
> > > Backfilling is core to Airflow and should work very well. I want to
> > > restate some requirements for Airflow backfill:
> > > * when failing / interrupted, it should seamlessly be able to pick up
> > >   where it left off
> > > * terminal logging at the INFO level should be a clear, human-consumable
> > >   indicator of progress
> > > * backfill-related operations (including restarts) should be doable
> > >   through CLI interactions, and not require web server interactions, as
> > >   the typical sandbox (dev environment) shouldn't assume the existence of
> > >   a web server
> > > Let's fix this.
> > >
> > > Max
> > >
> >