Re: Airflow - High Availability and Scale Up vs Scale Out


We are just starting out, but our setup is two EC2 instances: one runs the
web server and scheduler, and the other runs multiple workers. Both connect
to an RDS database, as well as to Redis on AWS ElastiCache for the Celery
connection.
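
A rough sketch of how both instances could be pointed at the same database and
Redis, assuming a PostgreSQL RDS instance (the engine is not stated above) and
Airflow's `AIRFLOW__SECTION__KEY` environment-variable convention. The helper
name and the host names in the usage note are hypothetical, and the exact
option names and sections vary between Airflow versions, so check them
against the version you run:

```python
from urllib.parse import quote_plus


def airflow_celery_env(db_host, db_user, db_password, db_name, redis_host,
                       db_port=5432, redis_port=6379):
    """Build env vars that point Airflow at a shared DB and a Redis broker.

    Assumes PostgreSQL via psycopg2; swap the URL scheme for another engine.
    """
    # URL-encode the credentials so characters like '@' do not break the URL.
    db_url = "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote_plus(db_user), quote_plus(db_password),
        db_host, db_port, db_name)
    return {
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": db_url,
        "AIRFLOW__CELERY__BROKER_URL": "redis://{}:{}/0".format(
            redis_host, redis_port),
    }
```

Calling `airflow_celery_env("db.example.internal", "airflow", "secret",
"airflow", "redis.example.internal")` on both instances would give the web
server, scheduler and workers the same view of the metadata database and the
Celery queue.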

All four services run in containers managed by systemd. We deploy with
CodeDeploy and sync the code by mapping volumes from the local filesystem
into the containers. We are not yet heavy users of Airflow, so I can't speak
to performance and scaling up just yet.

In general, I think an AMI with baked-in code can be brittle and hard to
maintain and update. Containers are the way to go, since you can still bake
the code into the image if you want. We have chosen not to do that and
instead rely on volume mapping to get the latest code into the container,
which means we don't need to keep creating new images.
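
To make the volume-mapping idea concrete, here is a minimal sketch that builds
the `docker run` argument list a systemd unit might execute. The function
name, image name and paths are hypothetical; only the `--rm`, `--name` and
`-v host:container` flags are standard Docker CLI options:

```python
def docker_run_argv(name, image, command, host_dags, container_dags):
    """Return the argv for running one Airflow service with mapped DAG code.

    The host directory is mounted into the container, so deploying new code
    to the host updates what the container sees without rebuilding the image.
    """
    return [
        "docker", "run", "--rm",
        "--name", name,
        # Map the deployed code on the host into the container.
        "-v", "{}:{}".format(host_dags, container_dags),
        image, command,
    ]
```

For example, `docker_run_argv("airflow-scheduler", "example/airflow",
"scheduler", "/opt/airflow/dags", "/usr/local/airflow/dags")` yields a command
where a deploy only has to refresh `/opt/airflow/dags` on the host.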

Arash

On Sat, Jun 9, 2018 at 9:47 AM Naik Kaxil <k.naik@xxxxxxxxx> wrote:

> Let us know your findings after trying the beefy box approach.
>
> On 08/06/2018, 12:24, "Sam Sen" <sxs@xxxxxxxxxxxxxxxx> wrote:
>
>     We are facing this now. We have tried the CeleryExecutor and it adds more
>     moving parts. While we have not thrown out this idea, we are going to give
>     one big beefy box a try.
>
>     To handle the HA side of things, we are putting the server in an
>     auto-scaling group (we use AWS) with a min and max of 1 server. We deploy
>     from an AMI that has Airflow baked in and we point the DB config to an RDS
>     using service discovery (Consul).
>
>     As for the DAG code, we can either bake it into the AMI as well or install
>     it on bootup. We haven't decided what to do for this but either way, we
>     realize it could take a few minutes to fully recover in the event of a
>     catastrophe.
>
>     The other option is to have a standby server if using Celery isn't ideal.
>     With that, I have tried using HashiCorp Nomad to handle the services. In my
>     limited trial, it did what we wanted but we need more time to test.
>
>     On Fri, Jun 8, 2018, 4:23 AM Naik Kaxil <k.naik@xxxxxxxxx> wrote:
>
>     > Hi guys,
>     >
>     >
>     >
>     > I have 2 specific questions for the guys using Airflow in production:
>     >
>     >
>     >
>     >    1. How have you achieved high availability? What does the architecture
>     >    look like? Do you replicate the master node as well?
>     >    2. Scale up vs. scale out?
>     >       1. What is your preferred approach: 1 beefy Airflow VM with
>     >       Worker, Scheduler and Webserver using the LocalExecutor, or a
>     >       cluster with multiple workers using the CeleryExecutor?
>     >
>     >
>     >
>     > I think this thread should help others with similar questions as well.
>     >
>     >
>     >
>     >
>     >
>     > Regards,
>     >
>     > Kaxil
>     >
>     >
>     >
>     >
>     > Kaxil Naik
>     >
>     > Data Reply
>     > 2nd Floor, Nova South
>     > 160 Victoria Street, Westminster
>     > London SW1E 5LB - UK
>     > phone: +44 (0)20 7730 6000 <+44%2020%207730%206000>
>     > k.naik@xxxxxxxxx
>     > www.reply.com
>     >
>     >