
Re: Airflow - YARN as an executor?

Kubernetes is a "monolithic" one-level scheduler that can't handle some of
what YARN can, for example scheduling tasks local to the data.
Hadoop has multiple levels of data locality (node-local, rack-local), so
computation happens close to the data to minimize network
data transfer, which is expensive.
K8s wasn't designed to handle these scheduling scenarios, as far as I know.
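
To make the locality levels concrete, here is a minimal sketch of how a
locality-aware scheduler might rank free nodes for a task. This is not YARN
code; `Node`, `locality_level` and `pick_node` are hypothetical names made up
for illustration, and a real scheduler also weighs queues, capacity and
delay scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str


# Lower level number means less data has to cross the network.
LEVEL_NAMES = ("node-local", "rack-local", "off-rack")


def locality_level(candidate, replicas):
    """Return 0 if the node holds a replica, 1 if its rack does, else 2."""
    if any(candidate.name == r.name for r in replicas):
        return 0
    if any(candidate.rack == r.rack for r in replicas):
        return 1
    return 2


def pick_node(free_nodes, replicas):
    """Pick the free node closest to the data (illustrative policy only)."""
    best = min(free_nodes, key=lambda n: locality_level(n, replicas))
    return best, LEVEL_NAMES[locality_level(best, replicas)]


# The block's replicas live on n1 (rack r1) and n4 (rack r2).
replicas = [Node("n1", "r1"), Node("n4", "r2")]
free = [Node("n7", "r3"), Node("n2", "r1")]
node, level = pick_node(free, replicas)
print(node.name, level)
```

Here n2 wins over n7 because it shares a rack with one of the replicas, even
though neither free node holds the data itself.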

For cloud deployments where we don't have the data locality problem (because
S3 is used instead of storage local
to the servers), k8s might be okay.

Here is a nice comparison [1] of k8s vs. two-level schedulers like YARN and
Mesos, although I think it's off-topic.

We're mostly on-prem and we don't see Kubernetes taking over YARN any time
soon.



*2.3.2 Monolithic Schedulers*

Monolithic schedulers use a single, centralized scheduling algorithm for
all jobs. All workload is run through the same scheduler and same
scheduling logic. Swarm, Fleet, Borg and Kubernetes adopt monolithic
schedulers. Kubernetes improvised on basic monolithic version of Borg and
Swarm schedulers. This type of schedulers are not suitable for running
heterogeneous modern workloads which include Spark jobs, containers, and
other long running jobs,

*2.3.3 Two Level Schedulers*

Two-level schedulers address the drawbacks of a monolithic scheduler by
separating concerns of resource allocation and task placement. An active
resource manager offers compute resources to multiple parallel, independent
“scheduler frameworks”. The Mesos cluster manager pioneered this approach,
and YARN supports a limited version of it. In Mesos, resources are offered
to application-level schedulers. This allows for custom, workload-specific
scheduling policies. The drawback with this type of scheduling architecture
is that the application level frameworks cannot see all the possible
placement options anymore. Instead, they only see those options that
correspond to resources offered (Mesos) or allocated (YARN) by the resource
manager component. This makes priority preemption (higher priority tasks
kick out lower priority ones) difficult.
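
As a rough illustration of the offer model described in the excerpt above,
the sketch below has a resource manager hand its free CPUs to each framework
in turn, and each framework place tasks only on what it was offered. All
class and method names (`ResourceManager`, `Framework`, `Offer`,
`offer_round`, `consider`) are made up for this example and are not the
Mesos or YARN APIs; real resource managers use fairness policies rather than
a fixed framework order.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    node: str
    cpus: int


class Framework:
    """Application-level scheduler with its own (trivial) placement policy."""

    def __init__(self, name, cpus_per_task, pending_tasks):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.pending_tasks = pending_tasks

    def consider(self, offers):
        """Return {node: cpus_used}; only offered nodes can be chosen."""
        accepted = {}
        for offer in offers:
            while (self.pending_tasks > 0
                   and offer.cpus - accepted.get(offer.node, 0) >= self.cpus_per_task):
                accepted[offer.node] = accepted.get(offer.node, 0) + self.cpus_per_task
                self.pending_tasks -= 1
        return accepted


class ResourceManager:
    """First level: tracks free CPUs and offers them to frameworks."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)

    def offer_round(self, frameworks):
        placements = {}
        for fw in frameworks:
            # A framework sees only what is still free at this moment.
            offers = [Offer(n, c) for n, c in self.free_cpus.items() if c > 0]
            accepted = fw.consider(offers)
            for node, used in accepted.items():
                self.free_cpus[node] -= used
            placements[fw.name] = accepted
        return placements


rm = ResourceManager({"n1": 4, "n2": 2})
spark = Framework("spark", cpus_per_task=2, pending_tasks=2)
batch = Framework("batch", cpus_per_task=1, pending_tasks=5)
placements = rm.offer_round([spark, batch])
print(placements, batch.pending_tasks)
```

In this run the second framework is never offered n1, so it cannot even
consider preempting the first framework's tasks there, which is the
visibility drawback the excerpt mentions.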

Ruslan Dautkhanov

On Tue, Apr 24, 2018 at 2:22 PM, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:

> Happy to have it as a contrib executor. However, I personally think yarn
> is a dead end. It has a lot of catching up to do and all the momentum is
> with kubernetes.
> B.
> Sent from my iPad
> > On 24 Apr 2018, at 22:13, Ruslan Dautkhanov <dautkhanov@xxxxxxxxx> wrote:
> >
> > With Hadoop 3's Docker on YARN support, I think YARN becomes
> > somewhat of a competitor to Kubernetes.
> >
> > Great job on adding k8s support to Airflow.
> >
> > Very similarly, I could see Airflow integrating with YARN and using
> > its infrastructure as an "executor". Has anyone explored the feasibility
> > of this approach?
> >
> >
> > Thanks!
> > Ruslan Dautkhanov