How to wait for external process


Hello,

I have a DAG (externally triggered) where some processing is done on an
external system (EC2 instance). The processing is started by an Airflow
task (via HTTP request). The DAG should only continue once that
processing is completed. In a first, naive implementation I created a
sensor that polls the progress (via HTTP request) and returns true only
when the status is "finished", so that the DAG run continues. That works, but...

... the external processing can take hours or days, and during that time
a worker is occupied doing nothing but HTTP GET and sleep. There will be
hundreds of DAG runs in parallel, which means hundreds of workers are
occupied.
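
To make the pattern concrete, here is a minimal sketch of what such a
sensor effectively does, written in plain Python rather than as actual
Airflow code. The JSON "status" field, the URL passed in, and all
function names are assumptions for illustration only; the real endpoint
and sensor class are not shown in this post.

```python
import json
import time
from urllib.request import urlopen


def fetch_status(url):
    """GET the status endpoint and return its 'status' field.

    Assumes (hypothetically) that the external system answers with JSON
    such as {"status": "running"} or {"status": "finished"}.
    """
    with urlopen(url, timeout=10) as resp:
        return json.load(resp).get("status")


def poke(url, fetch=fetch_status):
    """One sensor check: True only once the external job reports 'finished'."""
    return fetch(url) == "finished"


def wait_until_finished(url, poke_interval=60, fetch=fetch_status, sleep=time.sleep):
    """Poll, sleep, repeat. The caller is blocked for the whole duration,
    which is why one worker stays occupied per running DAG run.

    Returns the number of status checks that were made.
    """
    polls = 1
    while not poke(url, fetch):
        sleep(poke_interval)
        polls += 1
    return polls
```

With a processing time of hours or days, nearly all of that time is
spent inside the sleep call while the worker remains occupied.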

I looked into other operators that do computation on external systems
(ECSOperator, AWSBatchOperator), but they follow the same pattern and
just wait/sleep.

So I want to ask if there is a more efficient way to build such a
workflow with Airflow?

Kind Regards,
Stefan