Re: Mocking airflow (similar to moto for AWS)


I have pylint set up in my IDE, which catches most silly errors like missing
imports.
I also use a Docker image so I can start up Airflow locally and manually
test any changes before trying to deploy them. I use a slightly modified
version of https://github.com/puckel/docker-airflow to control it. This
only works for connections I have access to from my machine.
Finally, I have a suite of tests based on
https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c
which I can run to check that DAGs are valid, plus any unit tests I can put
in. The tests run in a Docker container with a local DB instance, so I have
access to XComs etc.

As part of my deployment pipeline, I run pylint and the tests again before
deploying anywhere, to make sure nobody has forgotten to run them locally.

Gerard - I like the suggestion about using mocked hooks and BDD. I will
look into this further.

On Thu, 18 Oct 2018 at 15:12, Gerard Toonstra <gtoonstra@xxxxxxxxx> wrote:

> There was a discussion about a unit testing approach last year (2017, I
> believe). If you dig through the mail archives, you can find it.
>
> My take is:
>
> - You should test "hooks" against some real system, which can be a docker
> container. Make sure the behavior is predictable when talking against that
> system. Hook tests are not part of general CI tests because of the
> complexity of the CI setup you'd have to make, so they are run on local
> boxes.
> - Maybe add additional "mock" hook tests, mocking out the connected
> systems.
> - When hooks are tested, operators can use 'mocked' hooks that no longer
> need access to actual systems. You can then set up an environment where you
> have predictable inputs and outputs and test how the operators act on them.
> I've used "behave" to do that with very simple record sets, but you can
> make these as complex as you want.
> - Then you know your hooks and operators work functionally. Testing whether
> your workflow works in general can be implemented by adding "check"
> operators. The benefit here is that you don't test the workflow once, but
> you test for data consistency every time the DAG runs. If you have complex
> workflows where the correct behavior of the flow is worrisome, then you may
> need to go deeper into it.
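The mocked-hook idea above can be illustrated with the stdlib alone; in the sketch below, MyHook and MyOperator are hypothetical stand-ins for real Airflow hook/operator classes:

```python
# Stdlib-only sketch of testing an operator against a mocked hook.
# MyHook and MyOperator are hypothetical stand-ins for Airflow classes.
from unittest import mock


class MyHook:
    """Stands in for a hook that would talk to a real external system."""

    def get_records(self, sql):
        raise RuntimeError("would need a real connection")


class MyOperator:
    """Stands in for an operator that delegates all I/O to its hook."""

    def __init__(self, sql):
        self.sql = sql

    def execute(self, context):
        hook = MyHook()
        # drop empty rows -- the operator logic we actually want to test
        return [row for row in hook.get_records(self.sql) if row]


def test_operator_with_mocked_hook():
    # predictable input -> predictable output, no real system required
    with mock.patch.object(MyHook, "get_records",
                           return_value=[(1,), (), (2,)]):
        assert MyOperator(sql="SELECT 1").execute(context={}) == [(1,), (2,)]


test_operator_with_mocked_hook()
```

Once hook behavior is pinned down by its own tests, patching it like this lets operator tests run anywhere, with no connections configured.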
>
> The above doesn't depend on DAGs that need to be scheduled and the delays
> involved in that.
>
> All of the above is implemented in my repo
> https://github.com/gtoonstra/airflow-hovercraft, using "behave" as a BDD
> method of testing, so you can peruse that.
>
> Rgds,
>
> G>
>
>
> On Thu, Oct 18, 2018 at 2:43 PM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
> wrote:
>
> > I am also looking to have (I think) a similar workflow. Maybe someone has
> > done something similar and can give some hints on how to do it the
> > easiest way?
> >
> > Context:
> >
> > While developing operators I am using example test DAGs that talk to GCP.
> > So far our "integration tests" require copying the dag folder,
> > restarting the airflow servers, unpausing the dag and waiting for it to
> > start. That takes a lot of time, sometimes just to find out that you
> > missed one import.
> >
> > Ideal workflow:
> >
> > Ideally I'd love to have a "unit" test (i.e. possible to run via
> > nosetests or IDE integration/PyCharm) that:
> >
> >    - should not need to have the airflow scheduler/webserver started. I
> >    guess we need a DB, but possibly an in-memory, on-demand created
> >    database might be a good solution
> >    - loads the DAG from a specified file (not really from the /dags
> >    directory)
> >    - builds internal dependencies between the DAG tasks (as specified in
> >    the DAG)
> >    - runs the DAG immediately and fully (i.e. runs all the "execute"
> >    methods as needed and passes XComs between tasks)
> >    - ideally produces log output in the console rather than in per-task
> >    files.
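The heart of that wish list can be sketched with the stdlib alone: call each task's execute() in dependency order and pass XComs through a plain dict, so no scheduler, webserver, or per-task log files are involved. FakeTI, Extract, and Load below are hypothetical stand-ins, not Airflow classes:

```python
# Stdlib-only sketch of "run the DAG immediately and fully": execute tasks
# in dependency order and pass XComs through an in-memory dict.
# FakeTI, Extract and Load are hypothetical stand-ins for Airflow classes.


class FakeTI:
    """Minimal stand-in for a TaskInstance: just enough for XComs."""

    def __init__(self, store, task_id):
        self._store = store
        self.task_id = task_id

    def xcom_push(self, key, value):
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        return self._store[(task_ids, key)]


class Extract:
    task_id = "extract"

    def execute(self, context):
        return [1, 2, 3]  # the return value becomes an XCom


class Load:
    task_id = "load"

    def execute(self, context):
        data = context["ti"].xcom_pull(task_ids="extract")
        return sum(data)


def run_tasks(tasks):
    """Run tasks (assumed pre-sorted in dependency order) in-process."""
    store, results = {}, {}
    for task in tasks:
        ti = FakeTI(store, task.task_id)
        value = task.execute(context={"ti": ti})
        ti.xcom_push("return_value", value)
        results[task.task_id] = value
    return results


print(run_tasks([Extract(), Load()]))  # {'extract': [1, 2, 3], 'load': 6}
```

A real harness would build the order from the DAG's task dependencies and back the XCom store with an in-memory SQLite metadata DB, but the control flow is the same.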
> >
> > I thought about using DagRun/DagBag but have not tried it yet, and am not
> > sure if you need to have the whole environment set up (which parts?).
> > Any help appreciated :)
> >
> > J.
> >
> > On Thu, Oct 18, 2018 at 1:08 AM bielllobera@xxxxxxxxx <bielllobera@xxxxxxxxx>
> > wrote:
> >
> > > I think it would be great to have a way to mock airflow for unit tests.
> > > The way I approached this was to create a context manager that creates
> > > a temporary directory, sets the AIRFLOW_HOME environment variable to
> > > this directory (only within the scope of the context manager) and then
> > > renders an airflow.cfg to that location. This creates an SQLite
> > > database just for the test, so you can add the variables and
> > > connections needed for the test without affecting the real Airflow
> > > installation.
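The core of that context manager can be sketched with the stdlib alone; a real version would also render an airflow.cfg into the directory and reload Airflow's configuration modules afterwards:

```python
# Sketch of a temporary-AIRFLOW_HOME context manager, stdlib only. A real
# version would also write an airflow.cfg into the directory and reload
# airflow.configuration / airflow.settings.
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def mock_airflow_home():
    old = os.environ.get("AIRFLOW_HOME")
    with tempfile.TemporaryDirectory() as tmp:
        os.environ["AIRFLOW_HOME"] = tmp
        try:
            yield tmp
        finally:
            # restore the previous value (or unset it) on exit
            if old is None:
                os.environ.pop("AIRFLOW_HOME", None)
            else:
                os.environ["AIRFLOW_HOME"] = old


with mock_airflow_home() as home:
    assert os.environ["AIRFLOW_HOME"] == home
```

Because TemporaryDirectory cleans up on exit, the throwaway SQLite database and config vanish with the test.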
> > >
> > > The first thing I realized is that this didn't work if the imports were
> > > outside the context manager, since airflow.configuration and
> > > airflow.settings perform all their initialization when they are
> > > imported, so the AIRFLOW_HOME variable is already set to the real
> > > installation before getting inside the context manager.
> > >
> > > The workaround for this was to reload those modules, and this works for
> > > the tests I have written. However, when I tried to use it for something
> > > more complex (I have a plugin that I'm importing) I noticed that inside
> > > the operator in this plugin, AIRFLOW_HOME is still set to the real
> > > installation, not the temporary one for the test. I thought this must
> > > be related to the imports, but I haven't been able to figure out a way
> > > to fix the issue. I tried patching some methods, but I must have been
> > > missing something because the database initialization failed.
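The import-time snapshot problem behind that workaround can be demonstrated with the stdlib alone; the fake_airflow_config module below is hypothetical and only mimics what a module that reads AIRFLOW_HOME at import time does:

```python
# Stdlib-only demonstration of why reloading is needed: a module that reads
# an environment variable at import time keeps the stale value until its
# body is re-executed. fake_airflow_config is a hypothetical stand-in.
import os
import sys
import types

os.environ.pop("AIRFLOW_HOME", None)

# a throwaway module that snapshots AIRFLOW_HOME at import time, mimicking
# what a config module does when it is first imported
src = "import os\nHOME = os.environ.get('AIRFLOW_HOME')\n"
fake = types.ModuleType("fake_airflow_config")
exec(src, fake.__dict__)
sys.modules["fake_airflow_config"] = fake

os.environ["AIRFLOW_HOME"] = "/tmp/test_home"
print(fake.HOME)  # None -- still the value captured at "import" time

# re-executing the module body is what importlib.reload does for a real module
exec(src, fake.__dict__)
print(fake.HOME)  # /tmp/test_home
```

This also hints at the plugin problem: any module that copied the old value into its own namespace before the reload keeps the stale value, because reloading only re-executes the config module itself.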
> > >
> > > Does anyone have an idea on the best way to mock/patch airflow so that
> > > EVERYTHING that is executed inside the context manager uses the
> > > temporary installation?
> > >
> > > PS: This is my current attempt, which works for the tests I defined but
> > > not for external plugins:
> > > https://github.com/biellls/airflow_testing
> > >
> > > For an example of how it works:
> > > https://github.com/biellls/airflow_testing/blob/master/tests/mock_airflow_test.py
> > >
> >
> >
> > --
> >
> > *Jarek Potiuk, Principal Software Engineer*
> > Mobile: +48 660 796 129
> >
>


-- 

Anthony Brown
Data Engineer BI Team - John Lewis
Tel : 0787 215 7305