Re: SubdagOperator and Pools


Hi,

To clarify, I created a Gist with instructions for reproducing this issue:

https://gist.github.com/akoeltringer/63fcf0340ae219c112b2a5377e6d2715

thanks, regards
Andreas


On 08/09/2018 07:41 AM, Andreas Koeltringer wrote:
Hi Tao,

thanks for your response.

That's just the thing: I am talking about ONE SubDagOperator, and the tasks within it execute in parallel. That's what confuses me.


Kind regards,
Andreas


On 08/08/2018 06:41 PM, Tao Feng wrote:
Hi Andreas,

The default executor for SubDagOperator is SequentialExecutor, which makes
sure all the tasks within a subdag are executed in sequential order. But if
you have too many subdags within a single DAG and want to control them with
pooling (https://airflow.apache.org/concepts.html#pools), SubDagOperator
unfortunately doesn't respect pooling
(https://issues.apache.org/jira/browse/AIRFLOW-2371) at the moment. My
understanding is that Airflow uses the backfill scheduler to schedule
SubDagOperator instead of the normal scheduler, and the backfill scheduler
has certain discrepancies with the normal scheduler in its pooling support.

Best,
-Tao

On Wed, Aug 8, 2018 at 9:14 AM, Andreas Koeltringer <
andreas.koeltringer@xxxxxxxxx> wrote:

Hi,

we have a SubDagOperator with lots of tasks in it. We want to limit the
parallelism with which these tasks execute. Therefore we created a pool
and added the tasks within the SubDagOperator to this pool.

However, this setting is not respected (see image attached).

Now we are wondering why that is. In 'subdag_operator.py' on the master
branch there is a comment that

     "Airflow pool is not honored by SubDagOperator."

This comment is not in the file in v1.9.0 (which I am using).

So this means that Pools are not respected for Subdags?

On the other hand, it states that Subdags use the SequentialExecutor,
which *should* execute tasks sequentially?

Can anyone clarify this, please?
And if pools do not work, what options do we have to limit parallelism in
a Subdag?

Thanks in advance,
Andreas
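
As a rough illustration of what the pool in the question above is expected to enforce, here is a toy model in plain Python. It does not use Airflow and says nothing about how Airflow implements pools or schedules subdags; `run_with_pool`, `pool_slots`, `workers` and `task_seconds` are hypothetical names chosen for this sketch. A semaphore stands in for the pool: each dummy task must hold a slot while it runs, so the peak number of tasks running at once should never exceed the slot count, even when more workers are available.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(num_tasks, pool_slots, workers=8, task_seconds=0.02):
    """Run num_tasks dummy tasks and return the peak number running at once.

    Toy model only: the semaphore plays the role of a pool with
    `pool_slots` slots. None of this is Airflow code.
    """
    slots = threading.BoundedSemaphore(pool_slots)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def task(_):
        with slots:  # a task must hold a slot for as long as it runs
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(task_seconds)  # placeholder for real work
            with lock:
                state["running"] -= 1

    # More workers than slots, so only the semaphore limits concurrency.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(task, range(num_tasks)))

    return state["peak"]


if __name__ == "__main__":
    print("peak with 2 slots:", run_with_pool(num_tasks=10, pool_slots=2))
    print("peak with 1 slot:", run_with_pool(num_tasks=5, pool_slots=1))
```

With one slot the tasks run strictly one after another, which is the behaviour the question expected from a SequentialExecutor; with two slots at most two overlap. The behaviour reported in this thread corresponds to the slot limit not being applied at all.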





