Re: airflow.exceptions.AirflowException dag_id not found


DagBag import timeouts happen when people do more than just "configuration
as code" in module scope (say, doing actual compute there, which is a
no-no). They may also happen if you read from flaky external systems that
introduce delays: say you read pipeline configuration from Zookeeper, a
database, or a network drive, and that operation times out.
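
One way to look for such files is to time how long each one takes to
import. The sketch below is only an approximation using the standard
library; it is not Airflow's own loader, and the function names and the
threshold are made up for illustration. Note that it executes every file
it finds, so run it somewhere that is safe.

```python
import importlib.util
import time
from pathlib import Path


def time_module_imports(dags_folder):
    """Import every .py file under dags_folder; return {path: seconds}."""
    timings = {}
    for index, path in enumerate(sorted(Path(dags_folder).rglob("*.py"))):
        # Hypothetical module name, only used to give each file a unique name.
        spec = importlib.util.spec_from_file_location(f"_timed_dag_{index}", path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        try:
            # Runs the file's module-scope code, which is what we want to time.
            spec.loader.exec_module(module)
        except Exception as exc:
            print(f"{path}: import failed: {exc!r}")
        timings[str(path)] = time.perf_counter() - start
    return timings


def slow_files(timings, threshold_seconds):
    """Return the paths whose import took longer than threshold_seconds."""
    return sorted(p for p, secs in timings.items() if secs > threshold_seconds)
```

Calling slow_files(time_module_imports("/path/to/dags"), 5.0) would list
the files worth a closer look; both the path and the 5-second threshold
are placeholders to adjust against your own dagbag_import_timeout.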

Also, with Airflow (at the moment) you are responsible for synchronizing the
pipeline definitions (DAGS_FOLDER) across all machines in the cluster. If
they are out of sync, you'll have problems whose symptoms may look like
"dag_id not found". That happens when the scheduler is aware of DAGs that
the workers are not aware of.
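
A quick way to check for drift is to compute a fingerprint of DAGS_FOLDER
on the scheduler and on each worker and compare the results. This is a
minimal standard-library sketch, not something Airflow provides; the
function name is hypothetical, and it only considers .py files.

```python
import hashlib
from pathlib import Path


def dags_folder_fingerprint(dags_folder):
    """Return a SHA-256 hex digest over the relative paths and contents
    of all .py files under dags_folder, in sorted order."""
    root = Path(dags_folder)
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.py")):
        # Hash the relative path so renames and moves change the fingerprint.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

If the digest printed on a worker differs from the scheduler's, that
worker is loading a different set of pipeline definitions.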

Max

On Mon, Jun 11, 2018 at 12:42 PM Stephane Bonneaud <stephane@xxxxxxxxxxxxxxx>
wrote:

> Hi there,
>
> We’re using Airflow in our startup and it’s been great in many ways,
> thanks for the work you guys are doing!
>
> Unfortunately, we’re hitting a bunch of issues with ops timing out, DAGs
> failing for unclear reasons, with no logs or the following error:
> "airflow.exceptions.AirflowException: dag_id could not be found". This
> seems to happen when enough DAGs are running at the same time, though it
> can also happen more rarely here and there. But, the best way to reproduce
> the error with our setup is to run enough DAGs at once. Most of the time,
> clearing the DAG run or ops that have failed and letting the DAG re-run is
> enough to fix the problem.
>
> I found resources pointing to the dagbag_import_timeout setting (e.g.,
> https://stackoverflow.com/questions/43235130/airflow-dag-id-could-not-be-found
> ).
> I did play with that parameter, and other parameters as well. And it does
> seem that they help, i.e., I can run more DAGs at once, but
>         (1) if I run enough DAGs at once, I still see ops and DAGs
> failing, so the problem is not fixed;
>         (2) more importantly, I don’t fully understand the problem. I have
> some ideas on what is happening, but maybe I’m totally wrong?
>
> Any recommendations on how I should investigate that?
>
> Thank you very much!
> Have a nice rest of the day,
> Stéphane
> http://stephanebonneaud.com