Re: Best Practice of Airflow Setting-Up & Usage


Hi Xiaodong,

Thanks for preparing the questions.

Setting-Up: In containers (previously Swarm, now Kubernetes)
Executor: CeleryExecutor
Scale: Two Airflow workers
Queue: No
SLA: We don't have a hard limit, but it would be unacceptable for a DAG to
take more than one minute to be scheduled.

Airflow has been running steadily, and the Web UI is great for monitoring
DAG status (though we added a button that lets users upload their DAG files).
The main frustration is that everything is in UTC (we are in GMT+8),
although we can now set up a DAG in a local timezone.
This has been confusing and inconvenient, since users' data is usually
partitioned by local time.
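
As an illustration of the mismatch described above, here is a minimal
sketch using only the Python standard library. The function name
`local_partition_date` and the fixed UTC+8 offset are assumptions made for
this example; they are not part of Airflow or of the setup described here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for the local zone (no DST).
LOCAL_TZ = timezone(timedelta(hours=8))


def local_partition_date(execution_date_utc: datetime) -> str:
    """Return the local-time date partition (YYYY-MM-DD) for a UTC timestamp."""
    if execution_date_utc.tzinfo is None:
        # Assumption for this sketch: naive timestamps are interpreted as UTC.
        execution_date_utc = execution_date_utc.replace(tzinfo=timezone.utc)
    return execution_date_utc.astimezone(LOCAL_TZ).strftime("%Y-%m-%d")


# 20:00 UTC on 5 Sep 2018 is already 04:00 on 6 Sep 2018 in GMT+8,
# so the UTC date and the local partition date differ.
print(local_partition_date(datetime(2018, 9, 5, 20, 0)))
```

Any run whose UTC timestamp falls at or after 16:00 belongs to the next
day's partition in GMT+8, which is where the confusion usually shows up.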

Thanks,
Manu Zhang


On Wed, Sep 5, 2018 at 9:31 PM airflowuser
<airflowuser@xxxxxxxxxxxxxx.invalid> wrote:

> Hi,
>
> Setting up Airflow for the first time is a BIG DEAL.
> Despite the community's original intention of an easy install with SQLite
> and SequentialExecutor, for an actually working environment you need to
> change a lot of settings. It doesn't help much that the demo install went
> smoothly.
>
> Support for issues and problems is very limited. There is no real
> community on Stack Overflow, and on Gitter no one replies other than Ash
> (and maybe a few others occasionally).
>
> Don't take this as criticism. In the end, all of you are donating your
> time. I am simply writing my impressions. To be honest, we were very close
> to abandoning this project. May I suggest a paid "premium support" model,
> which would be a contribution to the community? Support in terms of
> questions, installation help, etc.
>
>
> To your questions:
> 1. one-time
> 2. LocalExecutor
>
> These are not what we wanted; they were the only options we could get
> to work. Hopefully we will install 1.10.1 from scratch and try to solve
> all the issues we encountered.
>
> 3. I use Queues.
> 4. Don't use SLAs.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On September 5, 2018 3:56 PM, Deng Xiaodong <xd.deng.r@xxxxxxxxx> wrote:
>
> > Hi folks,
> >
> > Could you kindly share how your organization sets up and uses Airflow,
> > especially in terms of architecture? For example,
> >
> > -   Setting-Up: Do you install Airflow in a "one-time" fashion, or
> >     containerization fashion?
> >
> > -   Executor: Which executor are you using (LocalExecutor,
> >     CeleryExecutor, etc.)? I believe most production environments are
> >     using CeleryExecutor?
> >
> > -   Scale: If using Celery, how many worker nodes do you normally add?
> >     (For sure this depends on the workloads and the performance of your
> >     worker nodes.)
> >
> > -   Queue: Is the Queue feature
> >     https://airflow.apache.org/concepts.html#queues used in your
> >     architecture? For what advantage? (For example, explicitly assigning
> >     network-bound tasks to a worker node whose parallelism can be much
> >     higher than its number of cores.)
> >
> > -   SLA: Do you have any SLA for your scheduling? (This is inspired by
> >     @yrqls21's PR 3830
> >     https://github.com/apache/incubator-airflow/pull/3830)
> >
> > -   etc.
> >
> > Airflow's setup can be quite flexible, but I believe there is some
> > sort of best practice, especially in organisations where scalability is
> > essential.
> >
> > Thanks for sharing in advance!
> >
> > Best regards,
> > XD
> >
>
>
>