Re: programmatically creating and airflow quirks


It’s because of this:

“When searching for DAGs, Airflow will only consider files where the string
‘airflow’ and ‘DAG’ both appear in the contents of the .py file.”
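
To make the quoted rule concrete, here is a minimal sketch of such a content check applied to the loader snippet from the quoted message below. The function name might_contain_dag is hypothetical, and the plain case-sensitive substring test is an assumption about how the check behaves, not Airflow's actual code.

```python
def might_contain_dag(source: str) -> bool:
    """Return True only if both marker strings occur in the file contents."""
    # Assumption: a plain, case-sensitive substring test on the raw text.
    return all(marker in source for marker in ("airflow", "DAG"))


# The loader as posted in the quoted message: it only mentions lower-case
# "dag" and never mentions "airflow".
LOADER_AS_POSTED = '''\
from my_module import create_dag

for file in yaml_files:
    dag = create_dag(file)
    globals()[dag.dag_id] = dag
'''

# The same loader with one extra import line that mentions both strings.
LOADER_WITH_IMPORT = "from airflow import DAG\n" + LOADER_AS_POSTED

print(might_contain_dag(LOADER_AS_POSTED))    # False
print(might_contain_dag(LOADER_WITH_IMPORT))  # True
```

Under this reading, a loader file that only imports create_dag from another module would be skipped before it is ever executed, while a file that mentions both strings anywhere in its text (an import, or even a comment) would be considered.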

On Thu, Nov 22, 2018 at 2:27 AM soma dhavala <soma.dhavala@xxxxxxxxx> wrote:

>
>
> On Nov 22, 2018, at 3:37 PM, Alex Guziel <alex.guziel@xxxxxxxxxx> wrote:
>
> I think this is what is going on. The DAGs are picked up through the
> module-level variables they are bound to, i.e. if you do
>
>     dag = DAG(...)
>     dag = DAG(...)
>
> only the second DAG will be picked up.
>
>
> from my_module import create_dag
>
> # yaml_files: the YAML workflow specs to load (defined elsewhere)
> for file in yaml_files:
>     dag = create_dag(file)
>     globals()[dag.dag_id] = dag
>
> Notice that create_dag is in a different module. If it is in the same
> scope (file), it will be fine.
>
> On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala <soma.dhavala@xxxxxxxxx>
> wrote:
>
>> Hey Airflow Devs:
>> In our organization, we are building a Machine Learning WorkBench with
>> Airflow as the orchestrator of the ML workflows, and have wrapped the
>> Airflow Python operators to customize their behaviour. These workflows
>> are specified in YAML.
>>
>> We drop a DAG loader (written in Python) in the default location where
>> Airflow expects DAG files. This DAG loader reads the specified YAML files
>> and converts them into Airflow DAG objects. Essentially, we are
>> programmatically creating the DAG objects. In order to support multiple
>> parsers (YAML, JSON, etc.), we separated DAG creation from loading. But
>> when a DAG is created (in a separate module) and made available to the DAG
>> loaders, Airflow does not pick it up. As an example, suppose I create a
>> DAG, pickle it, and then simply unpickle the DAG and give it to Airflow.
>>
>> However, in the current avatar of Airflow, the very creation of the DAG
>> has to happen in the loader itself. As far as I am concerned, Airflow
>> should not care where and how the DAG object is created, so long as it is
>> a valid DAG object. The workaround for us is to mix parser and loader in
>> the same file and drop it in the default Airflow dags folder. During
>> dag_bag creation, this file is loaded with the import_modules utility and
>> shows up in the UI. While this is a solution, it is not clean.
>>
>> What do the devs think about a solution to this problem? Will saving the
>> DAG to the db and reading it back from the db work? Or do some core
>> changes need to happen in the dag_bag creation? Can dag_bag take a bunch
>> of "created" DAGs?
>>
>> thanks,
>> -soma
>>
>
>
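
The variable-binding behaviour discussed in the quoted messages can be sketched without Airflow installed. In the sketch below, FakeDag is a hypothetical stand-in for the real DAG class, and collect_dags is an assumed, simplified model of the collection step (execute the file as a module, then keep every DAG object still bound to a module-level name). It is not Airflow's actual implementation.

```python
import types


class FakeDag:
    """Hypothetical stand-in for the DAG class, so the sketch runs without Airflow."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id


def create_dag(name: str) -> FakeDag:
    """Hypothetical factory, playing the role of a create_dag imported from another module."""
    return FakeDag(name)


def collect_dags(source: str) -> list:
    """Execute `source` as a module and return the dag_ids still bound at module level."""
    module = types.ModuleType("loader_sketch")
    module.__dict__.update(FakeDag=FakeDag, create_dag=create_dag)
    exec(source, module.__dict__)
    found = {obj.dag_id for obj in module.__dict__.values() if isinstance(obj, FakeDag)}
    return sorted(found)


# Rebinding the same name: the first object is no longer reachable from the module.
REBOUND = 'dag = FakeDag("first")\ndag = FakeDag("second")\n'

# Registering each object under its own name keeps all of them reachable,
# even though they are built by a factory defined outside the loader source.
REGISTERED = (
    'for name in ("first", "second"):\n'
    "    dag = create_dag(name)\n"
    "    globals()[dag.dag_id] = dag\n"
)

print(collect_dags(REBOUND))     # ['second']
print(collect_dags(REGISTERED))  # ['first', 'second']
```

In this simplified model, where the DAG object is constructed makes no difference. What matters is that each object ends up bound to a distinct module-level name in the loader file.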