Re: Fundamental change - Separate DAG name and id.


I'm personally against having some kind of auto-increment numeric ID for
DAGs. While this makes a lot of sense for systems where creation is a
database activity (like a POST request), in Airflow, DAG creation is
actually a code ship activity. There are all kinds of complex scenarios
around that:

- I revert a commit and a DAG disappears or is renamed
- I run the same file twice, with different parameters, to create two DAGs
- I create the DAG in both staging and prod, but they wind up with
different IDs

It's just too hard to automatically track these scenarios.
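
To make the staging/prod case concrete, here is a rough sketch. It is not
Airflow code: AutoIncrementRegistry and build_dag_names are made-up names, and
the registry is only a stand-in for a metadata table that hands out numeric IDs
the first time it sees a DAG. The region suffixes are invented for illustration.

```python
from itertools import count


class AutoIncrementRegistry:
    """Hypothetical metadata table that assigns a numeric ID on first sight."""

    def __init__(self):
        self._next_id = count(1)
        self.ids = {}  # DAG name -> numeric ID

    def sync(self, dag_names):
        # Give every name not seen before the next free ID, in discovery order.
        for name in dag_names:
            if name not in self.ids:
                self.ids[name] = next(self._next_id)


def build_dag_names(regions):
    # One file, run with different parameters, yields several DAGs.
    return ["cost_report_daily_" + region for region in regions]


staging = AutoIncrementRegistry()
prod = AutoIncrementRegistry()

# Staging saw an experimental DAG first; prod never did.
staging.sync(["experimental_dag"])
staging.sync(build_dag_names(["eu", "us"]))
prod.sync(build_dag_names(["eu", "us"]))

print(staging.ids["cost_report_daily_eu"])  # 2
print(prod.ids["cost_report_daily_eu"])     # 1
```

Same code, same DAG, two different IDs, purely because the environments
shipped code in a different order.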

If we really wanted to put something like this in place, it would first
make more sense to decouple DAG creation from code shipping, and instead
prefer creation of a DAG outside of code (but with a definition that
references which git repo/committish/file/arguments/etc. to use). Then if
you do something like rename a file, the DAG breaks, but it at least still
exists in the db under that ID, and its history still makes sense once you
update the DAG definition with the new code location.
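
As a sketch of what I mean, assuming a hypothetical store that owns DAG
records (none of these classes exist in Airflow; DagStore, DagRecord and
CodeLocation are invented names, the repo URL and file names are placeholders,
and arguments are left out for brevity):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class CodeLocation:
    """Where a DAG's code lives. All field names here are made up."""
    repo: str
    committish: str
    file: str


@dataclass
class DagRecord:
    dag_id: int             # stable ID, assigned once at creation
    name: str               # display name, free to change
    location: CodeLocation  # pointer to the code, free to change
    run_history: List[str] = field(default_factory=list)


class DagStore:
    """Stand-in for the metadata db; DAGs are created here, not by parsing files."""

    def __init__(self) -> None:
        self._records: Dict[int, DagRecord] = {}
        self._next_id = 1

    def create(self, name: str, location: CodeLocation) -> DagRecord:
        record = DagRecord(self._next_id, name, location)
        self._records[record.dag_id] = record
        self._next_id += 1
        return record

    def relocate(self, dag_id: int, location: CodeLocation) -> None:
        # Repoint an existing DAG at new code; ID and history are untouched.
        self._records[dag_id].location = location

    def get(self, dag_id: int) -> DagRecord:
        return self._records[dag_id]


store = DagStore()
dag = store.create(
    "cost_report_daily",
    CodeLocation("git@example.com:team/dags.git", "master", "cost_report.py"),
)
dag.run_history.append("2018-09-20")

# The file is renamed in git: update the definition instead of creating a new DAG.
store.relocate(
    dag.dag_id,
    CodeLocation("git@example.com:team/dags.git", "master", "cost_expenses_report.py"),
)

same_dag = store.get(dag.dag_id)
print(same_dag.dag_id, same_dag.location.file, same_dag.run_history)
```

The ID is only ever assigned by an explicit create call, so reverts, renames
and per-environment deploys can't silently mint or lose IDs.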

On Thu, Sep 20, 2018 at 4:52 AM airflowuser
<airflowuser@xxxxxxxxxxxxxx.invalid> wrote:

> Hi,
> Though this could have been raised on Jira, I think it should be
> discussed here first.
>
> The problem:
> Airflow mixes the DAG name with its id. It uses the same field for both purposes.
>
> I assume that most of you use the dag_id to describe what the DAG actually
> does.
> For example:
>
> dag = DAG(
>     dag_id='cost_report_daily',
> ...
> )
>
> This dag_id is shown in the DAG id column in the UI.
> Now, let's say that you want to add another task to this specific DAG. You
> have to be extremely careful if you change the dag_id to represent the new
> functionality, for example dag_id='cost_expenses_reports_daily'. This
> will break the history of the DAG.
>
> Or take an even simpler use case: the user just wants to change the name
> shown in the UI.
>
> I suggest we discuss whether the dag_id should be split into an id (an
> actual identifier) and a name that reflects what the DAG does. When the
> "connection" is made by ids, names can change as much as you want without
> breaking anything. The name essentially becomes a field used for display
> purposes only.
>
> * I also didn't mention the issue of the DAG file name, which can cause
> trouble if someone wants to change it.