Re: Interesting things about how to know it's a DAG file


I think that the idea of a Dag Fetcher is a great one.

Manifests are a good idea. Alternatively, the fetcher could look to a
specific Airflow DAG index to drive the lookup behaviour, if it needs to be
programmatic.
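
To make the manifest idea concrete, here is a minimal sketch. The JSON
format, the "dag_files" key and the load_dag_manifest name are all
assumptions made up for illustration; nothing like this exists in Airflow
today.

```python
import json
import os


def load_dag_manifest(manifest_path):
    """Return absolute paths of the DAG files named in a JSON manifest.

    Hypothetical format: {"dag_files": ["etl/daily.py", ...]}, with each
    path relative to the directory that holds the manifest.
    """
    base_dir = os.path.dirname(os.path.abspath(manifest_path))
    with open(manifest_path) as f:
        manifest = json.load(f)
    # Resolve every entry against the manifest's own directory.
    return [
        os.path.normpath(os.path.join(base_dir, rel_path))
        for rel_path in manifest["dag_files"]
    ]
```

With something like this, the scheduler would only need to parse the files
the manifest names, instead of guessing from file contents.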

We may want to offer a simple Dag Fetcher which follows the current
behaviour for backward compatibility, if we want to target 2.0 for the Dag
Fetcher implementation.
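
As a rough sketch of what such a backward-compatible fetcher might look
like: the SimpleDagFetcher class and its fetch() method are hypothetical
names, and the sketch is simplified to plain .py files. The
might_contain_dag function mirrors the heuristic from the dag_processing.py
snippet quoted further down in this thread.

```python
import os
import zipfile


def might_contain_dag(file_path, safe_mode=True):
    """Guess whether a file may define a DAG, as the quoted heuristic does."""
    if not safe_mode or zipfile.is_zipfile(file_path):
        # Outside safe mode, and for zip archives, nothing is filtered out.
        return True
    with open(file_path, "rb") as f:
        content = f.read()
    # Both byte strings must appear somewhere in the file.
    return all(s in content for s in (b"DAG", b"airflow"))


class SimpleDagFetcher:
    """Hypothetical fetcher that lists candidate DAG files the current way."""

    def __init__(self, dag_folder, safe_mode=True):
        self.dag_folder = dag_folder
        self.safe_mode = safe_mode

    def fetch(self):
        """Return the sorted paths of .py files that pass the heuristic."""
        candidates = []
        for root, _dirs, files in os.walk(self.dag_folder):
            for name in files:
                path = os.path.join(root, name)
                if name.endswith(".py") and might_contain_dag(
                        path, self.safe_mode):
                    candidates.append(path)
        return sorted(candidates)
```

Note that a file which only uses a subclass such as MyPipeline is skipped
in safe mode unless the string DAG also appears somewhere in it, for
example in a comment.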

Best,
Arthur


On Thu, May 10, 2018 at 10:37 AM Gabriel Silk <gsilk@xxxxxxxxxxx.invalid>
wrote:

> What about a manifest file that names all the DAGs? Or a naming convention
> for the DAG files themselves?
>
> Alternatively, there could be a single entry point (e.g., index.py) from
> which all the DAGs are instantiated. There's probably some complexity in
> making that work with the multi-process scheduler model, but it doesn't
> seem insurmountable.
>
> On Thu, May 10, 2018 at 10:31 AM, Arthur Wiedmer
> <arthur.wiedmer@xxxxxxxxx> wrote:
>
> > Hi Song,
> >
> > I agree that this is not ideal, but it is difficult to do otherwise
> > without parsing/executing the Python code.
> >
> > Note that an import from airflow, together with the string DAG anywhere
> > in the file (even in a comment), should be enough. I think we are open
> > to other solutions, if anyone on the list has better ideas.
> >
> >
> > Best,
> > Arthur
> >
> >
> >
> > On Thu, May 10, 2018 at 12:59 AM Song Liu <songliu@xxxxxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > I just created a custom DAG class named "MyPipeline" by extending the
> > > "DAG" class, but Airflow fails to identify the file as a DAG file.
> > >
> > > After digging into the Airflow implementation around the
> > > dag_processing.py file:
> > >
> > > ```
> > > # Heuristic that guesses whether a Python file contains an
> > > # Airflow DAG definition.
> > > might_contain_dag = True
> > > if safe_mode and not zipfile.is_zipfile(file_path):
> > >     with open(file_path, 'rb') as f:
> > >         content = f.read()
> > >     might_contain_dag = all(
> > >         [s in content for s in (b'DAG', b'airflow')])
> > > ```
> > >
> > > So if a file contains both keywords, "DAG" and "airflow", it is
> > > treated as a DAG file.
> > >
> > > Is there a more rigorous way to do this?
> > >
> > > Thanks,
> > > Song
> > >
> >
>