How to wait for an external process
I have an externally triggered DAG in which some processing is done on an
external system (an EC2 instance). The processing is started by an Airflow
task via an HTTP request, and the DAG should only continue once that
processing has completed. In a first, naive implementation I created a
sensor that fetches the progress via an HTTP request and returns true only
when the status is "finished", at which point the DAG run continues. That works, but...
... the external processing can take hours or days, and during that time
a worker is occupied doing nothing but HTTP GET and sleep. There
will be hundreds of DAG runs in parallel, which means hundreds of occupied workers.
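
To make the pattern concrete, here is a minimal, framework-free sketch of what the sensor described above does. It is not Airflow code: `wait_until_finished`, `is_finished`, `get_status` and `poke_interval` are hypothetical names chosen for illustration, and `get_status` stands in for the HTTP GET against the external system.

```python
import time
from typing import Callable


def is_finished(status: str) -> bool:
    """Same check as the sensor: only "finished" lets the DAG run continue."""
    return status == "finished"


def wait_until_finished(
    get_status: Callable[[], str],
    poke_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the external job reports "finished"; return the number of polls.

    The caller is blocked for the entire wait, which is exactly the cost
    described above: one occupied worker per waiting DAG run.
    """
    polls = 0
    while True:
        polls += 1
        # get_status is assumed to wrap the HTTP GET to the external system.
        if is_finished(get_status()):
            return polls
        # Nothing useful happens here; the worker just sleeps.
        sleep(poke_interval)
```

Passing `sleep` as a parameter is only there so the loop can be exercised without real waiting, for example with a fake `get_status` that returns "running" twice and then "finished".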
I looked into other operators that run computation on external systems
(ECSOperator, AWSBatchOperator), but they also follow that pattern.
So I want to ask: is there a more efficient way to build such a
workflow with Airflow?