Re: Mocking airflow (similar to moto for AWS)


I am also looking for (I think) a similar workflow. Maybe someone has
done something similar and can give some hints on the easiest way to do
it?

Context:

While developing operators I am using example test DAGs that talk to GCP.
So far our "integration tests" require copying the DAG folder, restarting
the Airflow servers, unpausing the DAG and waiting for it to start. That
takes a lot of time, sometimes just to find out that you missed one
import.
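A cheap first check for the "missed one import" case, before any servers are involved, is to import the DAG file directly. This is a stdlib-only sketch; `import_dag_file` is a hypothetical helper, not an Airflow API. It still executes the file's top-level code, so Airflow must be importable in the environment, but no scheduler or webserver needs to be running.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def import_dag_file(path: str) -> ModuleType:
    """Import one DAG file as a throwaway module (hypothetical helper).

    A missing import or a syntax error raises immediately, with a normal
    traceback, instead of surfacing after a deploy and a server restart.
    """
    file_path = Path(path).resolve()
    # A private module name avoids clashing with real packages on sys.path.
    module_name = "_dag_check_" + file_path.stem
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError("cannot build an import spec for " + str(file_path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        # Runs the file's top-level code, roughly as DAG-file parsing does.
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module registered.
        sys.modules.pop(module_name, None)
        raise
    return module


if __name__ == "__main__" and len(sys.argv) > 1:
    import_dag_file(sys.argv[1])
    print("imported OK:", sys.argv[1])
```

Saved as a small script and run with a DAG file path as its argument, it reports the first broken import without touching the Airflow servers.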

Ideal workflow:

Ideally I'd love to have a "unit" test (i.e. one that can run via nosetests
or an IDE integration such as PyCharm) that:

   - does not need the Airflow scheduler/webserver to be running. I guess
   we need a DB, but an in-memory database created on demand might be a
   good solution
   - loads the DAG from a specified file (not from the /dags directory)
   - builds the internal dependencies between the DAG tasks (as specified
   in the DAG)
   - runs the DAG immediately and fully (i.e. runs all the "execute"
   methods as needed and passes XCom values between tasks)
   - ideally produces log output in the console rather than in per-task
   files.
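The "run immediately, pass XCom, log to the console" requirements can be modelled without Airflow at all, which makes them concrete. This is a toy sketch under stated assumptions: `run_dag`, the plain-callable tasks and the dict used as "XCom" are hypothetical stand-ins; real operators implement `execute(context)` and exchange XComs through the metadata database. It needs Python 3.9+ for `graphlib`.

```python
import logging
import sys
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List

# Log to the console instead of per-task files.
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")
log = logging.getLogger("toy_dag_runner")

# Toy "task": a callable that receives the results of tasks that already ran.
Task = Callable[[Dict[str, Any]], Any]


def run_dag(tasks: Dict[str, Task], upstream: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run every task once, in dependency order, inside the current process."""
    # Every task becomes a node, even if it has no upstream dependencies.
    graph = {task_id: upstream.get(task_id, []) for task_id in tasks}
    xcom: Dict[str, Any] = {}
    # static_order() yields each task after all of its upstream tasks
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for task_id in TopologicalSorter(graph).static_order():
        log.info("running %s", task_id)
        # The return value plays the role of an XCom push for downstream tasks.
        xcom[task_id] = tasks[task_id](xcom)
    return xcom


if __name__ == "__main__":
    results = run_dag(
        tasks={
            "extract": lambda xcom: [1, 2, 3],
            "transform": lambda xcom: [v * 2 for v in xcom["extract"]],
            "load": lambda xcom: sum(xcom["transform"]),
        },
        upstream={"transform": ["extract"], "load": ["transform"]},
    )
    log.info("load returned %s", results["load"])
```

The topological sort is what "build internal dependencies between the DAG tasks" amounts to; the in-process loop replaces the scheduler and executor.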

I thought about using DagRun/DagBag but have not tried it yet, and I am not
sure whether the whole environment needs to be set up (and if so, which
parts). Any help appreciated :)

J.

On Thu, Oct 18, 2018 at 1:08 AM bielllobera@xxxxxxxxx <bielllobera@xxxxxxxxx>
wrote:

> I think it would be great to have a way to mock airflow for unit tests.
> The way I approached this was to create a context manager that creates a
> temporary directory, sets the AIRFLOW_HOME environment variable to this
> directory (only within the scope of the context manager) and then renders
> an airflow.cfg to that location. This creates an SQLite database just for
> the test, so you can add the variables and connections it needs without
> affecting the real Airflow installation.
>
> The first thing I realized is that this didn't work if the imports were
> outside the context manager, since airflow.configuration and
> airflow.settings perform all their initialization when they are imported,
> so AIRFLOW_HOME has already been resolved to the real installation before
> execution enters the context manager.
>
> The workaround was to reload those modules, which works for the tests I
> have written. However, when I tried to use it for something more complex
> (I have a plugin that I'm importing), I noticed that inside the operator
> in this plugin AIRFLOW_HOME is still set to the real installation, not the
> temporary one for the test. I thought this must be related to the imports,
> but I haven't been able to figure out a way to fix the issue. I tried
> patching some methods, but I must have been missing something because the
> database initialization failed.
>
> Does anyone have an idea on the best way to mock/patch airflow so that
> EVERYTHING that is executed inside the context manager uses the temporary
> installation?
>
> PS: This is my current attempt, which works for the tests I defined but not
> for external plugins:
> https://github.com/biellls/airflow_testing
>
> For an example of how it works:
> https://github.com/biellls/airflow_testing/blob/master/tests/mock_airflow_test.py
>
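The context-manager approach from the quoted message could look roughly like this. It is a sketch, not the code from the airflow_testing repository: `temporary_airflow_home` and `CFG_TEMPLATE` are hypothetical, and the config keys are assumed to mirror Airflow 1.10's `[core]` section. As the quoted message notes, it only helps code that imports (or reloads) airflow after entering the block.

```python
import contextlib
import os
import tempfile
from typing import Iterator

# Hypothetical minimal config; keys assumed from Airflow 1.10's [core] section.
CFG_TEMPLATE = """\
[core]
airflow_home = {home}
dags_folder = {home}/dags
sql_alchemy_conn = sqlite:///{home}/airflow.db
load_examples = False
unit_test_mode = True
"""


@contextlib.contextmanager
def temporary_airflow_home() -> Iterator[str]:
    """Point AIRFLOW_HOME at a throwaway directory for the duration of the block."""
    previous = os.environ.get("AIRFLOW_HOME")
    with tempfile.TemporaryDirectory() as home:
        os.makedirs(os.path.join(home, "dags"))
        # Render the config; the SQLite file lives inside the temp directory.
        with open(os.path.join(home, "airflow.cfg"), "w") as cfg:
            cfg.write(CFG_TEMPLATE.format(home=home))
        os.environ["AIRFLOW_HOME"] = home
        try:
            yield home
        finally:
            # Restore the caller's environment exactly as it was.
            if previous is None:
                os.environ.pop("AIRFLOW_HOME", None)
            else:
                os.environ["AIRFLOW_HOME"] = previous
    # Leaving the TemporaryDirectory block deletes the config and the database.
```

Because `home` is an absolute path, the rendered URL has the four slashes SQLAlchemy expects for an absolute SQLite path.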

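The plugin symptom described in the quoted message has a plausible explanation in plain Python semantics (an assumption; it has not been checked against that particular plugin): `importlib.reload` re-executes a module's body, but any module that earlier did `from module import name` keeps the object it bound at that time. This stdlib-only demo uses two hypothetical stand-in modules, `fake_settings` and `fake_plugin`, written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for airflow.settings: reads AIRFLOW_HOME once, at import time.
SETTINGS_SRC = "import os\nAIRFLOW_HOME = os.environ.get('AIRFLOW_HOME', 'unset')\n"

# Stand-in for a plugin module: binds the value with "from ... import".
PLUGIN_SRC = (
    "from fake_settings import AIRFLOW_HOME\n"
    "\n"
    "def home_seen_by_operator():\n"
    "    return AIRFLOW_HOME\n"
)


def demo():
    """Return (value before reload, value after reload, value the plugin sees)."""
    previous = os.environ.get("AIRFLOW_HOME")
    src_dir = tempfile.mkdtemp()
    for name, src in (("fake_settings", SETTINGS_SRC), ("fake_plugin", PLUGIN_SRC)):
        with open(os.path.join(src_dir, name + ".py"), "w") as f:
            f.write(src)
    sys.path.insert(0, src_dir)
    importlib.invalidate_caches()
    try:
        os.environ["AIRFLOW_HOME"] = "/real/airflow"
        settings = importlib.import_module("fake_settings")
        plugin = importlib.import_module("fake_plugin")

        # Changing the variable afterwards has no effect: it was read at import time.
        os.environ["AIRFLOW_HOME"] = "/tmp/test_home"
        before_reload = settings.AIRFLOW_HOME

        # reload() re-runs the module body, so settings now sees the new value...
        after_reload = importlib.reload(settings).AIRFLOW_HOME

        # ...but the plugin still holds the object it bound when it was imported.
        plugin_view = plugin.home_seen_by_operator()
        return before_reload, after_reload, plugin_view
    finally:
        sys.path.remove(src_dir)
        for name in ("fake_settings", "fake_plugin"):
            sys.modules.pop(name, None)
        if previous is None:
            os.environ.pop("AIRFLOW_HOME", None)
        else:
            os.environ["AIRFLOW_HOME"] = previous


if __name__ == "__main__":
    print(demo())  # ('/real/airflow', '/tmp/test_home', '/real/airflow')
```

In this demo, reloading `fake_plugin` as well (after `fake_settings`), or importing both only inside the context manager, gives the plugin the new value; real Airflow modules hold more state (e.g. database sessions), so the same idea may need more care there.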

-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129