Re: Making Airflow Fault-Tolerant when running Airflow on Kubernetes

Hi Kevin,

Have you looked into the KubernetesExecutor? We achieve fault tolerance
using the Kubernetes resourceVersion to ensure that all state is

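A rough sketch of the general resourceVersion idea, for illustration only. This is not the KubernetesExecutor's actual code, and every name in it (resume_watch, stream_events, load_version, save_version) is hypothetical. It assumes the watcher checkpoints the resourceVersion of the last event it finished handling, so that after a restart it asks for events newer than that checkpoint rather than starting from scratch.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def resume_watch(
    stream_events: Callable[[Optional[str]], Iterable[Dict]],
    load_version: Callable[[], Optional[str]],
    save_version: Callable[[str], None],
) -> Iterator[Dict]:
    """Yield events newer than the last checkpointed resourceVersion.

    All three callables are hypothetical hooks:
      stream_events(rv) returns events that occurred after version rv
                        (None means "from the beginning"),
      load_version()    returns the persisted checkpoint, or None,
      save_version(rv)  persists a new checkpoint.
    """
    last_seen = load_version()
    for event in stream_events(last_seen):
        yield event
        # Runs only when the consumer asks for the next event, i.e. after it
        # has finished handling this one. A crash mid-handling leaves the old
        # checkpoint in place, so the event is redelivered (at-least-once).
        save_version(event["object"]["metadata"]["resourceVersion"])


if __name__ == "__main__":
    # In-memory stand-ins for the API server and the checkpoint store.
    events = [
        {"type": "MODIFIED",
         "object": {"metadata": {"name": "task-pod", "resourceVersion": str(n)}}}
        for n in (1, 2, 3)
    ]
    checkpoint: Dict[str, str] = {}

    def fake_stream(since: Optional[str]) -> Iterable[Dict]:
        # Only this fake server compares versions numerically; real clients
        # should treat resourceVersion as an opaque string.
        start = int(since) if since else 0
        return [e for e in events
                if int(e["object"]["metadata"]["resourceVersion"]) > start]

    for ev in resume_watch(fake_stream, lambda: checkpoint.get("rv"),
                           lambda rv: checkpoint.update(rv=rv)):
        print(ev["type"], ev["object"]["metadata"]["resourceVersion"])
```

In a real deployment, stream_events would wrap a Kubernetes watch request, which accepts a resourceVersion parameter, and the checkpoint would live somewhere that survives a scheduler restart.
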
On Wed, Sep 12, 2018 at 1:08 PM Kevin Lam <kevin@xxxxxxxxxxxxxxx> wrote:

> Hi all,
> We currently run Airflow as a Deployment in a Kubernetes cluster. We also
> use a variant of KubernetesOperator to run our DAGs.
> We are investigating how best to make Airflow fault-tolerant, partly because
> we are evaluating preemptible VMs [1]. *Has there been much
> discussion about how to deploy Airflow in a fault-tolerant way? Are
> there any best practices? Ideally we'd like our Kubernetes-hosted Airflow
> to support rolling updates for Docker image updates and also recover from
> components (worker, scheduler, web) going down temporarily, including when
> DAGs are in flight.*
> Any advice, ideas and/or feedback appreciated!
> [1]