How Airflow imports modules as it executes tasks


To start off, here is my project structure:
├── dags
│   ├── __init__.py
│   ├── core
│   │   ├── __init__.py
│   │   ├── operators
│   │   │   ├── __init__.py
│   │   │   ├── first_operator.py
│   │   └── util
│   │       ├── __init__.py
│   │       ├── db.py
│   ├── my_dag.py

Here are the versions and details of the Airflow Docker setup:

In my DAG, several tasks connect to a database (not the Airflow db). I've set up db connection pooling and expected db.py to be loaded once for the whole DagRun. However, the logs show that every task imports the module again and opens new db connections. I can tell that db.py is loaded in each task because I put the line below in db.py:

logging.info("I was loaded {}".format(random.randint(0,100)))

I understand that each operator can technically run on a separate machine, so it makes sense that tasks run more or less independently. However, I'm not sure whether this also applies when using the LocalExecutor. So my question is: how can I share resources (db connections) across tasks when using the LocalExecutor?
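The behaviour in the logs would be consistent with each task running in its own Python process, although that is an assumption and not something confirmed here. The sketch below only demonstrates the general Python behaviour: module-level code runs again in every fresh interpreter, so nothing created at import time carries over. MODULE_BODY and load_in_new_process are hypothetical names used for the demonstration.

```python
import subprocess
import sys

# Stand-in for db.py: module-level code that creates a unique token on load.
MODULE_BODY = "import uuid; TOKEN = uuid.uuid4().hex; print(TOKEN)"


def load_in_new_process():
    """Run the module body in a fresh interpreter and return its token."""
    result = subprocess.run(
        [sys.executable, "-c", MODULE_BODY],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


first = load_in_new_process()
second = load_in_new_process()
# Each interpreter executed the module body itself, so the tokens differ.
print(first != second)
```

If tasks do run this way, an in-memory pool created in db.py would be rebuilt for every task, whichever executor is in use.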


