git.net


Re: SubdagOperator and Pools


Hi Andreas,

The default executor for SubDagOperator is SequentialExecutor, which makes
sure all the tasks within a subdag are executed in sequential order. But if
you have many subdags within a single DAG and want to limit them with
pooling (https://airflow.apache.org/concepts.html#pools), SubDagOperator
unfortunately doesn't respect pools
(https://issues.apache.org/jira/browse/AIRFLOW-2371) at the moment. My
understanding is that Airflow runs a SubDagOperator through the backfill
scheduler rather than the normal scheduler, and the backfill scheduler
does not handle pools the same way the normal scheduler does.
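To make this distinction concrete, here is a toy simulation in plain Python
(asyncio only, no Airflow). It is NOT Airflow's implementation. The names
`simulate`, `run_subdag`, `run_task` and the `honor_pool` flag are
hypothetical. A pool is modelled as a semaphore, and "pool ignored" stands in
for the backfill path described above. It shows why a SequentialExecutor keeps
a single subdag to one task at a time, while several subdags running side by
side can exceed the pool size if the pool is not enforced.

```python
import asyncio

# Toy model only, not Airflow code. Assumptions:
#  * each subdag runs its own tasks one after another (SequentialExecutor-style);
#  * a pool is modelled as an asyncio.Semaphore with a fixed number of slots;
#  * the hypothetical `honor_pool` flag switches the pool check on or off,
#    standing in for "normal scheduler" vs. "backfill path ignores the pool".


async def run_task(state, pool, honor_pool):
    """Run one simulated task, holding a pool slot only if the pool is honored."""
    async def body():
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # pretend to do some work
        state["running"] -= 1

    if honor_pool:
        async with pool:  # wait for a free slot before running
            await body()
    else:
        await body()  # pool ignored: run immediately


async def run_subdag(n_tasks, state, pool, honor_pool):
    """Run a subdag's tasks strictly in sequence, like a SequentialExecutor."""
    for _ in range(n_tasks):
        await run_task(state, pool, honor_pool)


async def simulate(n_subdags, n_tasks, pool_slots, honor_pool):
    """Return the peak number of simultaneously running tasks across all subdags."""
    state = {"running": 0, "peak": 0}
    pool = asyncio.Semaphore(pool_slots)
    # All subdags run in parallel with each other.
    await asyncio.gather(
        *(run_subdag(n_tasks, state, pool, honor_pool) for _ in range(n_subdags))
    )
    return state["peak"]


if __name__ == "__main__":
    # One subdag: sequential execution alone limits it to one task at a time.
    print(asyncio.run(simulate(1, 5, 2, honor_pool=False)))  # -> 1
    # Four subdags in parallel, pool ignored: each subdag runs a task at once.
    print(asyncio.run(simulate(4, 5, 2, honor_pool=False)))  # -> 4
    # Same four subdags with the pool enforced: capped at the pool size.
    print(asyncio.run(simulate(4, 5, 2, honor_pool=True)))  # -> 2
```

Under these assumptions, the peak concurrency equals the number of subdags
running at once when the pool is ignored, and the pool size when it is
enforced.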

Best,
-Tao

On Wed, Aug 8, 2018 at 9:14 AM, Andreas Koeltringer <
andreas.koeltringer@xxxxxxxxx> wrote:

> Hi,
>
> we have a SubdagOperator with lots of tasks in it. We want to limit the
> parallelism with which these tasks execute. Therefore we created a pool
> and added the tasks within the SubdagOperator to this pool.
>
> However, this setting is not respected (see image attached).
>
> Now we are wondering why that is. In 'subdag_operator.py' on the master
> branch there is a comment that
>
>     "Airflow pool is not honored by SubDagOperator."
>
> This comment is not in the file in v1.9.0 (which I am using).
>
> So does this mean that pools are not respected for subdags?
>
> On the other hand, it states that Subdags use the SequentialExecutor,
> which *should* execute tasks sequentially?
>
> Can anyone clarify this, please?
> And if pools do not work, what options do we have to limit parallelism in
> a Subdag?
>
> Thanks in advance,
> Andreas
>