Re: programmatically creating DAGs and airflow quirks


I believe that is mostly because we want to skip parsing/loading .py files
that don't contain DAG definitions, to save time; the scheduler is going to
parse/load the .py files over and over again, and some files can take quite
a long time to load.
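
For reference, the guard is roughly the following (a simplified sketch based
on the check in DagBag.process_file in airflow/models.py that is quoted
further down in this thread; the helper name here is made up, not the exact
implementation):

# Simplified sketch of the heuristic applied before importing a .py file
# from the dags folder. If either byte string is missing, the file is
# skipped and never imported.
def might_contain_dag(file_path):
    with open(file_path, 'rb') as f:
        content = f.read()
    return all(s in content for s in (b'DAG', b'airflow'))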

Cheers,
Kevin Y

On Fri, Nov 23, 2018 at 12:44 AM soma dhavala <soma.dhavala@xxxxxxxxx>
wrote:

> happy to report that the “fix” worked. thanks Alex.
>
> btw, wondering why it was there in the first place? how does it help?
> saves time, early termination, what?
>
>
> > On Nov 23, 2018, at 8:18 AM, Alex Guziel <alex.guziel@xxxxxxxxxx> wrote:
> >
> > Yup.
> >
> > On Thu, Nov 22, 2018 at 3:16 PM soma dhavala <soma.dhavala@xxxxxxxxx> wrote:
> >
> >
> >> On Nov 23, 2018, at 3:28 AM, Alex Guziel <alex.guziel@xxxxxxxxxx> wrote:
> >>
> >> It’s because of this
> >>
> >> “When searching for DAGs, Airflow will only consider files where the
> >> string “airflow” and “DAG” both appear in the contents of the .py file.”
> >>
> >
> > I had not noticed it. From airflow/models.py, in process_file (both in
> > 1.9 and 1.10):
> > ..
> > if not all([s in content for s in (b'DAG', b'airflow')]):
> > ..
> > it looks for those strings and, if they are not found, returns without
> > loading the DAGs.
> >
> >
> > So having the dummy strings “airflow” and “DAG” placed somewhere in the
> > file will make it work?
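> >
> > i.e., would something like the below be enough? (a sketch of our generated
> > loader with a marker comment added; yaml_files stands in for our list of
> > spec files)
> >
> > # Loader for airflow DAG objects generated from YAML specs.
> > # The words "airflow" and "DAG" in this comment satisfy the string check.
> > from my_module import create_dag
> >
> > for file in yaml_files:
> >     dag = create_dag(file)
> >     globals()[dag.dag_id] = dag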
> >
> >
> >> On Thu, Nov 22, 2018 at 2:27 AM soma dhavala <soma.dhavala@xxxxxxxxx> wrote:
> >>
> >>
> >>> On Nov 22, 2018, at 3:37 PM, Alex Guziel <alex.guziel@xxxxxxxxxx> wrote:
> >>>
> >>> I think this is what is going on. The DAGs are picked up from the
> >>> variables defined in the DAG file, i.e. if you do
> >>> dag = DAG(...)
> >>> dag = DAG(...)
> >>
> >> from my_module import create_dag
> >>
> >> for file in yaml_files:
> >>      dag = create_dag(file)
> >>      globals()[dag.dag_id] = dag
> >>
> >> Notice that create_dag is in a different module. If it is defined in the
> >> same scope (the same file), it works fine.
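> >>
> >> For context, create_dag is roughly of this shape (a hypothetical,
> >> trimmed-down version; the real one maps our YAML schema onto our wrapped
> >> operators):
> >>
> >> # my_module.py -- hypothetical, trimmed-down create_dag
> >> import yaml
> >> from datetime import datetime
> >>
> >> from airflow import DAG
> >> from airflow.operators.dummy_operator import DummyOperator
> >>
> >> def create_dag(yaml_path):
> >>     # read the YAML spec and turn it into a plain airflow DAG object
> >>     with open(yaml_path) as f:
> >>         spec = yaml.safe_load(f)
> >>     dag = DAG(dag_id=spec['name'],
> >>               start_date=datetime(2018, 1, 1),
> >>               schedule_interval=spec.get('schedule'))
> >>     for task_name in spec['tasks']:
> >>         DummyOperator(task_id=task_name, dag=dag)
> >>     return dag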
> >>
> >>>
> >>
> >>> Only the second dag will be picked up.
> >>>
> >>> On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala <soma.dhavala@xxxxxxxxx> wrote:
> >>> Hey AirFlow Devs:
> >>> In our organization, we build a Machine Learning WorkBench with AirFlow
> >>> as an orchestrator of the ML workflows, and have wrapped the AirFlow
> >>> Python operators to customize their behaviour. These workflows are
> >>> specified in YAML.
> >>>
> >>> We drop a DAG loader (written in Python) in the default location where
> >>> airflow expects the DAG files. This DAG loader reads the specified YAML
> >>> files and converts them into airflow DAG objects. Essentially, we are
> >>> programmatically creating the DAG objects. In order to support multiple
> >>> parsers (yaml, json etc.), we separated the DAG creation from the
> >>> loading. But when a DAG is created (in a separate module) and made
> >>> available to the DAG loader, airflow does not pick it up. As an example,
> >>> consider that I created a DAG, pickled it, and will simply unpickle the
> >>> DAG and give it to airflow.
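> >>>
> >>> Something along these lines, as a sketch (the pickle path is
> >>> hypothetical):
> >>>
> >>> # loader dropped into the dags folder; it only unpickles a ready-made DAG
> >>> import pickle
> >>>
> >>> with open('/path/to/dag.pkl', 'rb') as f:
> >>>     dag = pickle.load(f)
> >>> globals()[dag.dag_id] = dag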
> >>>
> >>> However, in the current avatar of airflow, the very creation of the DAG
> >>> has to happen in the loader itself. As far as I am concerned, airflow
> >>> should not care where and how the DAG object is created, so long as it
> >>> is a valid DAG object. The workaround for us is to mix the parser and
> >>> the loader in the same file and drop it in the airflow default dags
> >>> folder. During dag_bag creation, this file is loaded up with the
> >>> import_modules utility and shows up in the UI. While this is a solution,
> >>> it is not clean.
> >>>
> >>> What do DEVs think about a solution to this problem? Will saving the DAG
> >>> to the db and reading it back from the db work? Or do some core changes
> >>> need to happen in the dag_bag creation? Can dag_bag take a bunch of
> >>> "created" DAGs?
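> >>>
> >>> (For reference, this is how we check what airflow actually picked up; a
> >>> sketch that only assumes DagBag and its dags attribute, with a
> >>> hypothetical folder path.)
> >>>
> >>> from airflow.models import DagBag
> >>>
> >>> dag_bag = DagBag(dag_folder='/path/to/dags')
> >>> # dag_id -> DAG, but only for DAGs found at module level in those files
> >>> print(dag_bag.dags)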
> >>>
> >>> thanks,
> >>> -soma
> >>
> >
>
>