Re: Airflow cli to remote host


A quick side note: it's common to deploy one or more Airflow sandboxes,
hosts configured the same way as a worker but with no worker process
actually running on them. It's similar to the concept of a "gateway node"
in Hadoop.

Users typically work in user space with a modified `airflow.cfg` that may
point to an alternate metadata database (to insulate production). Depending
on policy, that database may or may not have alternate connections
registered that point to staging / dev counterparts, where those exist.
You'll typically find the same Airflow package and Python environment as
the one used in production, with similar connectivity to other systems and
databases. From there you can run any cli commands and even fire up your
own Airflow webserver that you can tunnel into if need be.
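
To make the "modified `airflow.cfg`" idea concrete, here is a minimal sketch
of deriving a sandbox config from the production one. The function name, the
database URI and the paths are hypothetical, and it assumes the
`sql_alchemy_conn` and `dags_folder` keys live under `[core]`, as they do in
Airflow 1.x. It is not how any particular company does it.

```python
import configparser
import io


def make_sandbox_cfg(production_cfg_text, sandbox_db_uri, dags_folder):
    """Return airflow.cfg text with the metadata DB and DAG folder overridden.

    Hypothetical helper: everything except the two overridden keys is
    copied from the production config unchanged.
    """
    # interpolation=None because airflow.cfg values may contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(production_cfg_text)
    if not parser.has_section("core"):
        parser.add_section("core")
    # Point at an alternate metadata database to insulate production.
    parser.set("core", "sql_alchemy_conn", sandbox_db_uri)
    # Use the user's own clone of the pipeline repo.
    parser.set("core", "dags_folder", dags_folder)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    prod = "[core]\nsql_alchemy_conn = postgresql://prod-db/airflow\nexecutor = CeleryExecutor\n"
    print(make_sandbox_cfg(
        prod,
        "sqlite:////home/alice/airflow/airflow.db",  # assumed sandbox DB
        "/home/alice/pipelines/dags",                # assumed repo clone
    ))
```

The result would be written to the user's own Airflow home (for example
`$AIRFLOW_HOME/airflow.cfg`), so cli commands run from the sandbox pick up
the alternate database instead of the production one.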

For example, at Lyft there's a simple cli application that prepares your
remote home and hooks things up (provides a working `airflow.cfg`,
syncs/clones the pipeline repo for you, ...) so that it all works and feels
similar to other development workflows specific to Lyft. It basically
automates the whole "setting up a dev env" process with the proper policies.

At Airbnb, the "data sandboxes" act both as Airflow sandboxes that you can
ssh into and as JupyterHub nodes, so you find the same home directory
whether you ssh in or access Jupyter.

In the Kubernetes world, it seems like there should be an easy way to order
or "lease" an Airflow sandbox that would have your home directory persisted
and mounted on that pod just for the time that you need it.

Max

On Wed, May 23, 2018 at 3:12 PM Luke Diment <Luke.Diment@xxxxxxxxxxxxx>
wrote:

> Fabric looks perfect for this.
> ________________________________________
> From: Kyle Hamlin <hamlin.kn@xxxxxxxxx>
> Sent: Thursday, May 24, 2018 6:22 AM
> To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Airflow cli to remote host
>
> I'd suggest using something like Fabric <http://www.fabfile.org/> for
> this.
> This is how I accomplish the same task.
>
> On Wed, May 23, 2018 at 2:19 PM Frank Maritato <fmaritato@xxxxxxxxxxxxx>
> wrote:
>
> > Hi All,
> >
> > I need to be able to run backfill for my jobs against our production
> > airflow server. Is there a way to run
> >
> > airflow backfill job_name -s 2018-05-01
> >
> > against a remote server? I didn’t see a -h option to specify a hostname.
> >
> > If not, is there a way through the ui to do this? I'd rather not have to
> > ssh into the production server to run these jobs.
> >
> > Thanks!
> > --
> > Frank Maritato
> >
> >
>
> --
> Kyle Hamlin
>