Re: Single Airflow Instance Vs Multiple Airflow Instance


At Slack, we follow a similar pattern of deploying multiple Airflow
instances. Since the Airflow UI and the scheduler are coupled, this
introduces friction, as users need to know the underlying deployment
strategy (for example: which Airflow URL to visit to see their DAGs, how
multiple teams collaborate on the same DAG, pipeline operations, etc.).

In one of the forum questions, Max mentioned renaming the scheduler to
supervisor, as the scheduler does more than just scheduling.
It would be super cool if we could make multiple supervisors share the same
Airflow metadata storage and the Airflow UI (maybe by introducing a unique
config param `supervisor.id` for each instance).

This approach would help us scale the Airflow scheduler horizontally while
keeping things simple from the user's perspective.
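
To make the idea a bit more concrete, here is a rough sketch in plain
Python. It is only an illustration of the proposal, not anything that exists
in Airflow: `SharedMetadataStore`, `register`, `dags_for` and `all_dags` are
made-up names, and the `supervisor_id` values stand in for the suggested
`supervisor.id` config param. Each supervisor would schedule only the DAGs
it owns, while the single UI would read everything from the shared store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedMetadataStore:
    """Stand-in for the shared Airflow metadata database (hypothetical)."""

    # dag_id -> supervisor_id that owns (schedules) the DAG
    owners: Dict[str, str] = field(default_factory=dict)

    def register(self, dag_id: str, supervisor_id: str) -> None:
        """Record that `supervisor_id` owns `dag_id`; reject conflicting owners."""
        current = self.owners.setdefault(dag_id, supervisor_id)
        if current != supervisor_id:
            raise ValueError(
                f"{dag_id} is already owned by supervisor {current!r}"
            )

    def dags_for(self, supervisor_id: str) -> List[str]:
        """What one supervisor would schedule: only its own DAGs."""
        return sorted(d for d, s in self.owners.items() if s == supervisor_id)

    def all_dags(self) -> List[str]:
        """What the single shared UI would list: every DAG, whoever owns it."""
        return sorted(self.owners)


# Two supervisors (assumed ids) writing to the same store.
store = SharedMetadataStore()
store.register("billing_daily", supervisor_id="team-a")
store.register("ads_hourly", supervisor_id="team-b")

print(store.dags_for("team-a"))  # only team-a's DAGs
print(store.all_dags())          # one view across both supervisors
```

The point of keying ownership by supervisor id is that users would no longer
need to know which instance runs their DAG; they would just open the one UI.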


Regards,
Ananth.P,






On 7 June 2018 at 04:08, Arturo Michel <Arturo.Michel@xxxxxxxxxxxxxx> wrote:

> We have had up to 50 DAGs with multiple tasks each. Many of them run in
> parallel. We've had some issues with compute, as it was meant to be a
> temporary deployment but somehow it's now the permanent production one and
> resources are not great.
> Organisationally it is very similar to what Gerard described: more than one
> group working with different engineering practices and standards. This is
> probably one of the sources of problems.
>
> -----Original Message-----
> From: Gerard Toonstra <gtoonstra@xxxxxxxxx>
> Sent: Wednesday, June 6, 2018 5:02 PM
> To: dev@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Single Airflow Instance Vs Multiple Airflow Instance
>
> We are using two cluster instances. One cluster is for the engineering
> teams that are in the "tech" wing and rigorously follow tech
> principles; the other is for business analysts and more ad-hoc,
> experimental work, where people do not necessarily follow those
> principles. We have a nomad engineer helping out the ad-hoc cluster:
> setting it up, connecting it to all systems and resolving programming
> questions. All clusters are fully puppetized, so we reuse configs and the
> way things are configured, and have a common "platform code" package that
> is reused across both clusters.
>
> G>
>
>
> On Wed, Jun 6, 2018 at 5:50 PM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
> wrote:
>
> > An important consideration here is that there are several settings
> > that are cluster-wide. In particular, cluster-wide concurrency
> > settings could result in Team B's DAG refusing to schedule based on an
> > error in Team A's DAG.
> >
> > Do your teams follow similar practices in how eagerly they ship code,
> > or have similar SLAs for resolving issues? If so, you are probably
> > fine using co-tenancy. If not, you should probably talk about it first
> > to make sure the teams are okay with co-tenancy.
> >
> > On Wed, Jun 6, 2018 at 11:24 AM, gauthiermartin86@xxxxxxxxx <
> > gauthiermartin86@xxxxxxxxx> wrote:
> >
> > > Hi Everyone,
> > >
> > > We have been experimenting with Airflow for about 6 months now.
> > > We are planning to have multiple departments use it. Since we
> > > don't have any internal experience with Airflow, we are wondering
> > > whether a single instance per department is better suited than a
> > > single instance with multi-tenancy. We are aware of the upcoming
> > > release of Airflow 1.10 and the changes that will be made to RBAC,
> > > which will be better suited for multi-tenancy.
> > >
> > > Any advice on this? Any tips would be helpful to us.
> > >
> >
>