Re: About the project support in Airflow


We've discussed internally something like having groups or "folders" for
DAGs in the UI.  Nothing functional on the backend, purely a front end
aesthetic.  Something like having DAGs named "foo/bar" and "foo/baz" would
be grouped like a tree visually in the UI:

- Group foo
  - DAG bar
  - DAG baz
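
To make the idea concrete, here is a minimal sketch of how a front end might derive that tree from DAG names alone. The helper names `group_dag_ids` and `render_tree` and the "/" separator are assumptions for illustration only; nothing like this exists in Airflow as far as this thread states, and whether a DAG id may contain "/" is not settled here.

```python
from collections import defaultdict


def group_dag_ids(dag_ids, sep="/"):
    """Group DAG ids by the text before the first separator.

    Hypothetical helper: ids without a separator go under the "" key.
    """
    groups = defaultdict(list)
    for dag_id in sorted(dag_ids):
        prefix, found, name = dag_id.partition(sep)
        if found:
            groups[prefix].append(name)
        else:
            groups[""].append(dag_id)  # no separator: leave ungrouped
    return dict(groups)


def render_tree(groups):
    """Render the grouping as the indented list shown above."""
    lines = []
    for group in sorted(groups):
        if group:
            lines.append(f"- Group {group}")
            lines.extend(f"  - DAG {name}" for name in groups[group])
        else:
            lines.extend(f"- DAG {name}" for name in groups[group])
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree(group_dag_ids(["foo/bar", "foo/baz"])))
```

Because the grouping is computed only from the names, nothing on the backend would need to change, which matches the "purely front end" framing.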

Is that what you're looking for?

Best,
Taylor

On Thu, Apr 26, 2018 at 1:51 AM 刘松(Cycle++开发组) <liusong02@xxxxxxxxxx> wrote:

> Hi Feng,
>
> Thanks for the information; I had indeed noticed this work as well.
>
> But if I understand correctly, it focuses on permissions (edit/read,
> etc.) on the DAG itself.
>
> The “project” concept is a kind of “group”, but it is more meaningful than
> a “tag”. If we don’t want to support the “project” concept, is there any
> other solution for this requirement, or some consideration behind that
> decision?
>
> Many thanks for your help.
>
> Thanks,
> Song
>
> On 26/04/2018, 12:28 PM, "Tao Feng" <fengtao04@xxxxxxxxx> wrote:
>
>     Hi Song,
>
>     Just a note that we are also working on DAG-level access on top of
>     RBAC (AIRFLOW-2267), which should provide DAG-level ACL functionality.
>     The WIP PR can be found at
>     https://github.com/apache/incubator-airflow/pull/3197
>
>     On Wed, Apr 25, 2018 at 7:42 PM, 刘松(Cycle++开发组) <liusong02@xxxxxxxxxx>
>     wrote:
>
>     > Hi Taylor,
>     >
>     > Yes, I know that this RBAC feature will ship in the 1.10 release.
>     >
>     > # About multi-user support
>     >
>     > But why not deploy one instance of Airflow per user?
>     > With this feature, don’t you think Airflow becomes more of a platform
>     > that serves many different users?
>     > The multi-user case would also exhaust Airflow's resources more easily,
>     > if we are talking about Airflow's scalability.
>     >
>     > # About multi-project support
>     >
>     > The “project” concept is a kind of logical grouping of DAGs that lets
>     > them be organized in a more structured way.
>     > I can’t see how it would hurt Airflow's “scalability”; it just makes
>     > the user experience friendlier.
>     >
>     > That is why I bring up the “multi-user support” case to question the
>     > suggestion of using multiple instances for “multi-project”:
>     > “multi-user” support pushes Airflow toward having to be more scalable,
>     > whereas “multi-project” just makes it more intuitive and user-friendly.
>     >
>     > Thanks,
>     > Song
>     >
>     > On 26/04/2018, 4:50 AM, "Taylor Edmiston" <tedmiston@xxxxxxxxx> wrote:
>     >
>     >     Something else that might be relevant for your multi-user use case is
>     >     the new RBAC support that Joy Gao added.
>     >
>     >     https://github.com/apache/incubator-airflow/pull/3015
>     >
>     >     *Taylor Edmiston*
>     >     Blog <http://blog.tedmiston.com> | Stack Overflow CV
>     >     <https://stackoverflow.com/story/taylor> | LinkedIn
>     >     <https://www.linkedin.com/in/tedmiston/> | AngelList
>     >     <https://angel.co/taylor>
>     >
>     >
>     >     On Wed, Apr 25, 2018 at 3:04 PM, James Meickle <jmeickle@xxxxxxxxxxxxxx>
>     >     wrote:
>     >
>     >     > Another reason you would want separated infrastructure is that there
>     >     > are a lot of ways to exhaust Airflow resources or otherwise cause
>     >     > contention - like having too many sensors or sub-DAGs using up all
>     >     > available tasks.
>     >     >
>     >     > Doesn't seem like a great idea to push for having different teams with
>     >     > co-tenancy until there is also per-team control over resource use...
>     >     >
>     >     > On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) <liusong02@xxxxxxxxxx>
>     >     > wrote:
>     >     >
>     >     > > It seems that all the current approaches point to multiple instances
>     >     > > of Airflow, but the project concept is very natural, since one user
>     >     > > might handle different types of tasks.
>     >     > >
>     >     > > As for multi-user support, one way is also to deploy multiple
>     >     > > instances, yet Airflow provides multi-user functionality built in.
>     >     > >
>     >     > > So I am not convinced that multiple instances are the right answer
>     >     > > for multiple projects.
>     >     > >
>     >     > > Thanks,
>     >     > > Song
>     >     > >
>     >     > >
>     >     > >
>     >     > >
>     >     > > On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaidrey@xxxxxxxxx>
>     >     > > wrote:
>     >     > >
>     >     > >
>     >     > > Looks neat Taylor!
>     >     > >
>     >     > > And regarding the original question, going off of what Maxime and Bolke
>     >     > > said, at Pandora, it made more sense for us to have an instance per team
>     >     > > since each team has its own system user for prod and the instance can run
>     >     > > all processes as that user. Alternatively you could have a super user that
>     >     > > can sudo as those other system users, and have many teams on a single
>     >     > > instance, but that is a security concern (what if one team sudos as the
>     >     > > other team and accidentally overwrites data? There is nothing stopping
>     >     > > them from doing it). It depends on what your org setup is, but let me know
>     >     > > if there are any questions I can help with.
>     >     > >
>     >     > > Ace
>     >     > >
>     >     > >
>     >     > > > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
>     >     > > >
>     >     > > > We use an approach similar to the one Bolke mentioned, running multiple
>     >     > > > Airflow instances.
>     >     > > >
>     >     > > > I haven't read the Pandora article yet, but we have an Astronomer Open
>     >     > > > Edition (fully open source) that bundles similar tools like Prometheus,
>     >     > > > Grafana, Celery, etc. with Airflow and a Docker Compose file if you're
>     >     > > > looking to get a setup like that up and running quickly.
>     >     > > >
>     >     > > > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
>     >     > > > https://github.com/astronomerio/astronomer
>     >     > > >
>     >     > > > *Taylor Edmiston*
>     >     > > > Blog | Stack Overflow CV | LinkedIn | AngelList
>     >     > > >
>     >     > > >
>     >     > > >
>     >     > > > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
>     >     > > > maximebeauchemin@xxxxxxxxx> wrote:
>     >     > > >
>     >     > > >> Related blog post about multi-tenant Airflow deployment out of Pandora:
>     >     > > >>
>     >     > > >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>     >     > > >>
>     >     > > >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
>     >     > > >> wrote:
>     >     > > >>
>     >     > > >>> My suggestion would be to deploy Airflow per project. You could even
>     >     > > >>> use Airflow to manage your CI/CD pipeline.
>     >     > > >>>
>     >     > > >>> B.
>     >     > > >>>
>     >     > > >>> Sent from my iPhone
>     >     > > >>>
>     >     > > >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauchemin@xxxxxxxxx>
>     >     > > >>>> wrote:
>     >     > > >>>>
>     >     > > >>>> People have been talking about namespacing DAGs in the past. I'd
>     >     > > >>>> recommend using tags (many to many) instead of categories/projects
>     >     > > >>>> (one to many).
>     >     > > >>>>
>     >     > > >>>> It should be fairly easy to add this feature. One question is whether
>     >     > > >>>> tags are defined as code or in the UI/db only.
>     >     > > >>>>
>     >     > > >>>> Max
>     >     > > >>>>
>     >     > > >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
>     >     > > >>>>>
>     >     > > >>>>> Hi,
>     >     > > >>>>>
>     >     > > >>>>> DAGs are basically created for a particular project, so if I have many
>     >     > > >>>>> different projects, will Airflow support a Project concept and
>     >     > > >>>>> organize them separately?
>     >     > > >>>>>
>     >     > > >>>>> Is this a known requirement, or is there already a plan for this?
>     >     > > >>>>>
>     >     > > >>>>> Thanks,
>     >     > > >>>>> Song
>     >     > > >>>>>