

programmatically creating DAGs and airflow quirks


Hey Airflow devs,
In our organization, we have built a Machine Learning Workbench with Airflow as
the orchestrator of ML workflows, and we have wrapped Airflow Python
operators to customize their behaviour. These workflows are specified in
YAML.

We drop a DAG loader (written in Python) in the default location where Airflow
expects DAG files. This loader reads the specified YAML files and
converts them into Airflow DAG objects; essentially, we are
programmatically creating the DAG objects. To support multiple
parsers (YAML, JSON, etc.), we separated DAG creation from loading. But
when a DAG is created in a separate module and made available to the DAG
loader, Airflow does not pick it up. For example, suppose I create a DAG
and pickle it, and the loader simply unpickles the DAG and hands it to
Airflow.
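A minimal sketch of the split described above, under stated assumptions: `WorkflowDAG` is a hypothetical stand-in for Airflow's `DAG` class (so the example runs without Airflow installed), and `register_parser`, `create_and_pickle`, and `load_into` are illustrative helpers, not part of Airflow. The creation side parses a spec and pickles the resulting object; the loader side only unpickles it and binds it to a name in a namespace. Airflow's documentation notes that a DAG must appear in the DAG file's `globals()` to be loaded, which is why the loader writes into a namespace instead of just returning the object.

```python
import json
import pickle
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-in for airflow.models.DAG so the sketch runs without Airflow.
@dataclass
class WorkflowDAG:
    dag_id: str
    tasks: List[str] = field(default_factory=list)


# Creation side: one parser per input format, all producing the same DAG type.
PARSERS: Dict[str, Callable[[str], WorkflowDAG]] = {}


def register_parser(ext: str):
    """Register a parser function for a file extension (illustrative helper)."""
    def decorator(fn: Callable[[str], WorkflowDAG]) -> Callable[[str], WorkflowDAG]:
        PARSERS[ext] = fn
        return fn
    return decorator


@register_parser(".json")
def parse_json(text: str) -> WorkflowDAG:
    spec = json.loads(text)
    return WorkflowDAG(dag_id=spec["dag_id"], tasks=list(spec["tasks"]))

# A YAML parser would register under ".yaml" the same way (e.g. via PyYAML's
# yaml.safe_load); it is omitted to keep this sketch stdlib-only.


def create_and_pickle(text: str, ext: str) -> bytes:
    """Parse a workflow spec and serialize the resulting DAG object."""
    return pickle.dumps(PARSERS[ext](text))


def load_into(namespace: dict, blob: bytes) -> WorkflowDAG:
    """Loader side: unpickle the DAG and bind it under its dag_id."""
    dag = pickle.loads(blob)
    namespace[dag.dag_id] = dag
    return dag
```

In a DAG-folder file the loader would call `load_into(globals(), blob)`. Whether a real Airflow `DAG` with its operators survives pickling depends on what the tasks reference (lambdas, for instance, cannot be pickled by the standard `pickle` module), so this only shows the control flow.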

However, in the current avatar of Airflow, the DAG has to be created in
the loader itself. As far as I am concerned, Airflow should not care
where or how the DAG object is created, as long as it is a valid DAG
object. Our workaround is to combine the parser and loader in the same file
and drop it into Airflow's default dags folder. During dag_bag creation,
this file is loaded with the import_modules utility and shows up in the UI.
This works, but it is not clean.
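To make the discovery step concrete, here is a simplified, hypothetical imitation of "import the file, then scan it for DAGs" (it is not Airflow's actual `DagBag` code): the file is executed as a module and only DAG objects bound to module-level names are collected. `DAG`, `collect_dags`, `EXAMPLE_DAG_FILE`, and `demo` are illustrative names; injecting `DAG` into the module before execution stands in for `from airflow import DAG` inside a real DAG file.

```python
import importlib.util
import os
import tempfile


class DAG:
    """Hypothetical stand-in for airflow.models.DAG."""

    def __init__(self, dag_id: str):
        self.dag_id = dag_id


def collect_dags(path: str, dag_cls: type) -> dict:
    """Execute a file as a module and return the DAG objects in its globals."""
    spec = importlib.util.spec_from_file_location("dag_file_module", path)
    module = importlib.util.module_from_spec(spec)
    module.DAG = dag_cls  # stands in for `from airflow import DAG` in the file
    spec.loader.exec_module(module)
    return {
        obj.dag_id: obj
        for obj in vars(module).values()
        if isinstance(obj, dag_cls)
    }


EXAMPLE_DAG_FILE = '''
def build(dag_id):
    return DAG(dag_id)

visible = build("visible_dag")  # bound at module level: collected

def _not_collected():
    hidden = build("hidden_dag")  # only a local variable: not collected
    return hidden

_not_collected()
'''


def demo() -> dict:
    """Write the example DAG file to a temp dir and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_dag_file.py")
        with open(path, "w") as f:
            f.write(EXAMPLE_DAG_FILE)
        return collect_dags(path, DAG)
```

Only `visible_dag` is collected; `hidden_dag` is built but never bound at module level. That is consistent with the workaround above: the combined parser-and-loader file works because it leaves its DAGs in the module's globals.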

What do the devs think about a solution to this problem? Would saving the DAG to
the DB and reading it back from the DB work? Or do some core changes need to happen
in dag_bag creation? Could dag_bag accept a bunch of already "created" DAGs?
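To make the last question concrete, here is one hypothetical shape for a bag that accepts already-created DAGs instead of only discovering them by importing files. `CreatedDagBag`, `add_dags`, `get_dag`, and the frozen `DAG` dataclass are illustrative names for this sketch; this is not Airflow's `DagBag` API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class DAG:
    """Hypothetical stand-in for an already-built Airflow DAG."""
    dag_id: str


class CreatedDagBag:
    """Hypothetical bag that is handed DAG objects directly by the caller."""

    def __init__(self, dags: Iterable[DAG] = ()):
        self.dags: Dict[str, DAG] = {}
        self.add_dags(dags)

    def add_dags(self, dags: Iterable[DAG]) -> None:
        """Register pre-built DAGs, rejecting duplicate dag_ids."""
        for dag in dags:
            if dag.dag_id in self.dags:
                raise ValueError(f"duplicate dag_id: {dag.dag_id}")
            self.dags[dag.dag_id] = dag

    def get_dag(self, dag_id: str) -> Optional[DAG]:
        """Return the DAG with this id, or None if it was never added."""
        return self.dags.get(dag_id)
```

Rejecting duplicate ids is a design choice for the sketch: once DAGs can come from several parsers (YAML, JSON, etc.), two specs declaring the same `dag_id` is an error worth surfacing rather than silently overwriting.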

thanks,
-soma