Re: Question on Running Airflow 1.10 in Kubernetes


Hi Daniel,
Thank you for the reply. In our current deployment, the Airflow workers run
in Kubernetes, but we are not using the Kubernetes operator in our DAGs, so
our workers are long-running pods. That means we have to restart/kill the
workers whenever we do a new deployment (add/update DAGs), which leaves me
with the following questions.

Is killing the Airflow workers (stopping and starting the airflow worker
service) many times a day good and advisable? What risks are involved if a
worker doesn't shut down gracefully (which I have seen quite a few times)?

Also, as per the replies above, one issue is updating a DAG definition
while its tasks are running. But by risk I also meant this: if we kill the
worker in Kubernetes to do a new deployment (add/update DAGs) while tasks
are running, and the worker doesn't do a warm shutdown, can this lead to
zombie tasks, or to tasks whose status never gets updated?
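
For context on what a warm shutdown means here: when a pod is deleted, Kubernetes sends SIGTERM to the container process, waits for the termination grace period, and then sends SIGKILL. A task still running at SIGKILL never gets to report its final state. The sketch below is not Airflow code; it is a minimal stand-in worker loop (the names `GracefulWorker`, `extract` and `load` are hypothetical) that shows the pattern of finishing the in-flight task and picking up no new ones after SIGTERM.

```python
import os
import signal


class GracefulWorker:
    """Hypothetical stand-in for a long-running worker process."""

    def __init__(self):
        self.stop_requested = False
        self.completed = []
        # Kubernetes sends SIGTERM first when a pod is deleted.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Only record the request; the in-flight task is allowed to finish.
        self.stop_requested = True

    def run(self, tasks):
        for task in tasks:
            if self.stop_requested:
                break  # warm shutdown: do not pick up new work
            task()
            self.completed.append(task.__name__)
        return self.completed


def extract():
    # Simulate the pod being deleted while this task is running.
    os.kill(os.getpid(), signal.SIGTERM)


def load():
    pass


worker = GracefulWorker()
print(worker.run([extract, load]))  # prints ['extract']
```

In Kubernetes the time available for this is bounded by the pod's `terminationGracePeriodSeconds` (30 seconds by default), so a task that runs longer than that is still killed mid-run. That is the case the retries and idempotency discussed in the quoted replies are meant to cover.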

On Mon, Oct 15, 2018 at 10:37 AM Michael Ghen <mike@xxxxxxxxxxxx> wrote:

> We have a similar setup with Kubernetes. We deploy (often several times)
> during the day when DAG runs are active and it does kill them. Like a few
> others mentioned, we do a few things to mitigate any issues this would
> cause:
>
> 1. DAGs are idempotent, can be rerun with no issues (we have a few
> exceptions to this, so it goes)
> 2. We set retries on all DAGs so when they are killed during a deploy, they
> will retry before alerting us
> 3. We log to a GCS bucket
>
> We often do a few deployments in a day because we don't have our local
> development environments set up as well as we should. We are getting better
> at building and testing DAGs locally using Docker. Still, it's not uncommon to
> do 1 or 2 deploys to production in a day. We have DAG runs every hour,
> 24/7, and deploying while they're running hasn't been an issue given the 3
> precautions taken above.
>
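
To make precautions 1 and 2 concrete, here is a minimal sketch; it is not taken from the setup described above. `retries` and `retry_delay` are standard Airflow operator arguments that can be supplied through a DAG's `default_args`. The values, the `write_partition` helper, and the file layout are assumptions made for illustration. The helper is idempotent because it writes to a path derived only from the execution date and replaces the file atomically, so a retry after a killed worker overwrites rather than duplicates.

```python
import json
import os
import tempfile
from datetime import timedelta

# Would be passed as DAG(..., default_args=default_args) in a DAG file.
# The values are illustrative assumptions.
default_args = {
    "retries": 2,                         # retry before alerting
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}


def write_partition(base_dir, execution_date, rows):
    """Hypothetical idempotent task body: same inputs, same final file."""
    os.makedirs(base_dir, exist_ok=True)
    target = os.path.join(base_dir, f"{execution_date}.json")
    # Write to a temp file first so a kill mid-write leaves no partial target.
    fd, tmp_path = tempfile.mkstemp(dir=base_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(rows, handle)
    # Rename within the same directory; overwrites an earlier attempt.
    os.replace(tmp_path, target)
    return target
```

Running `write_partition` twice for the same execution date leaves exactly one file with the same contents, which is what makes an automatic retry safe.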
> On Sun, Oct 14, 2018 at 4:48 PM Jeff Payne <jpayne@xxxxxxxxxxx> wrote:
>
> > We have a similar airflow system, except that everything is in the same
> > container image. We use GCS for task log file storage, cloudsql postgres
> > for the airflow db, and conda to package our DAGs and dependencies. We
> > redeploy the entire system any time we want to deploy new DAGs or changes
> > to any existing DAGs, which works out to once every week or two, often in
> > the middle of active DAG runs. We are careful to try to keep the DAGs
> > idempotent, which helps. Regardless, being conscious of what the DAGs are
> > doing at each stage also helps.
> >
> > I'm curious about your use cases that require multiple deployments in a
> > single day...
> >
> > ________________________________
> > From: Daniel Imberman <daniel.imberman@xxxxxxxxx>
> > Sent: Sunday, October 14, 2018 8:41:58 AM
> > To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: Question on Running Airflow 1.10 in Kubernetes
> >
> > Hi Pramiti,
> >
> > We're in the process of allowing baked-in images for the k8s executor
> > (should be merged soon/possibly already merged). With this added, you can
> > specify the worker image in airflow.cfg pretty easily. The only
> > potential issue with re-launching multiple times a day would be if a DAG
> > was mid-execution. Otherwise it should be fine.
> >
> > WRT worker failures with the k8s executor you don't even need to shut down
> > the workers since the workers only last as long as the tasks do. We also
> > use the k8s event stream to bubble up any worker failures to the airflow UI
> >
> > On Sun, Oct 14, 2018, 3:56 AM Pramiti Goel <pramitigoel20@xxxxxxxxx>
> > wrote:
> >
> > > Hi,
> > >
> > > We are trying to run Airflow 1.10 in Kubernetes.
> > > 1) We are running our scheduler, worker and webserver services in
> > > individual containers.
> > > 2) We are using a Docker image which has Airflow 1.10 and Python 3.x. We
> > > are deploying our DAGs in the Docker image.
> > >
> > > With the above architecture of the Airflow setup in Kubernetes, whenever
> > > we deploy DAGs we need to create a new Docker image, kill the currently
> > > running workers in Airflow, and restart them with the new Docker image.
> > >
> > > My question is: is killing the Airflow worker (starting/stopping the
> > > airflow worker service) many times a day good and advisable? What risks
> > > are involved if a worker doesn't shut down gracefully (which I have seen
> > > quite a few times)?
> > >
> > > Let me know if this is not the correct place to ask.
> > >
> > > Thanks,
> > > Pramiti
> > >
> >
>