Re: Using Too Many Airflow Variables in Dag is Good thing ?


Why don't we cache variables? We can fairly assume that variables won't
change very frequently (not as frequently as the scheduler's DAG run
interval). We can set the default cache timeout to a few times the scheduler
run interval. This would help control the number of connections to the
database and reduce load on both the scheduler and the database.
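
To make the idea concrete, here is a minimal sketch of a time-based cache in
front of variable lookups. It is only an illustration, not existing Airflow
code: TTLVariableCache, fake_fetch and the 60-second timeout are hypothetical
names and values, and fake_fetch stands in for the real database lookup
(something like Variable.get) so the sketch runs without Airflow installed.

```python
import time
from typing import Callable, Dict, Tuple


class TTLVariableCache:
    """Cache variable lookups for ttl_seconds to avoid repeated database hits."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # assumption: the real lookup, e.g. Variable.get
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no database round trip
        value = self._fetch(key)  # missing or expired: one real lookup
        self._entries[key] = (now, value)
        return value


# Stand-in for the metadata database; records every lookup it serves.
lookups = []


def fake_fetch(key: str) -> str:
    lookups.append(key)
    return {"test_owner_de": "data-eng"}[key]


cache = TTLVariableCache(fake_fetch, ttl_seconds=60.0)
cache.get("test_owner_de")  # first call goes to the backing store
cache.get("test_owner_de")  # second call is served from the cache
```

A cache like this lives inside a single Python process, so how much it helps
would depend on how long the processes that parse the DAG files stay alive.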

On Mon 22 Oct, 2018, 13:34 Marcin Szymański, <ms32035@xxxxxxxxx> wrote:

> Hi
>
> You are right, it's a sure way to saturate DB connections, as a connection
> is established every few seconds when the DAGs are parsed. The same happens
> when you use variables in the __init__ of an operator. An OS environment
> variable would be safer for your needs.
>
> Marcin
>
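
For reference, the environment-variable approach suggested above could look
roughly like the sketch below. The names DAG_OWNER and DAG_ALERT_EMAIL, the
fallback values, and the build_default_args helper are all made up for
illustration; the remaining keys are copied from the original default_args.
Reading the process environment does not open a database connection, so
parsing the DAG file no longer touches the variable table.

```python
import os
from datetime import datetime, timedelta
from typing import Any, Dict, Mapping


def build_default_args(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build default_args from environment variables instead of Variable.get."""
    return {
        # DAG_OWNER and DAG_ALERT_EMAIL are hypothetical variable names.
        "owner": environ.get("DAG_OWNER", "airflow"),
        "depends_on_past": False,
        "start_date": datetime(2018, 10, 17),
        "email": environ.get("DAG_ALERT_EMAIL", "alerts@example.com"),
        "email_on_failure": True,
        "email_on_retry": True,
        "retries": 2,
        "retry_delay": timedelta(minutes=1),
    }


# Evaluated at parse time, but only reads the process environment.
default_args = build_default_args()
```

Changing the owner or email then means changing the environment of the
scheduler and workers and restarting them, rather than editing every DAG.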
>
> On Mon, 22 Oct 2018, 08:34 Pramiti Goel, <pramitigoel20@xxxxxxxxx> wrote:
>
> > Hi,
> >
> > We want to make the owner and email ID generic, so we don't want to
> > hard-code them in the Airflow DAG. Using variables will help us change
> > the email/owner later if there are a lot of DAGs with the same owner.
> >
> > For example:
> >
> >
> > default_args = {
> >     'owner': Variable.get('test_owner_de'),
> >     'depends_on_past': False,
> >     'start_date': datetime(2018, 10, 17),
> >     'email': Variable.get('de_infra_email'),
> >     'email_on_failure': True,
> >     'email_on_retry': True,
> >     'retries': 2,
> >     'retry_delay': timedelta(minutes=1)}
> >
> >
> > Looking at the Airflow code, it opens a connection session every time
> > a variable is retrieved, and then closes it. (Let me know if I have
> > misunderstood.) If there are many DAGs with variables in default_args
> > running in parallel and querying the variable table in MySQL, is there
> > any limit on the number of SQLAlchemy sessions? Will that make the DAGs
> > slow, as there will be many queries to MySQL for each DAG? Is the above
> > approach good?
> >
> > Using Airflow 1.9
> >
> > Thanks,
> > Pramiti.
> >
>