

Re: Airflow 1.10 Migration Duration


I haven't done 1.8.x to 1.10.x in one go, but multiple hours seems long for
running a handful of Alembic migrations on 10M rows. It would help to know
whether you're using MySQL or Postgres and how your DB is hosted... I
wonder if there's a bottleneck at play here.
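To put the 2-hour figure in perspective, here is a quick back-of-envelope calculation. It assumes "about 2 hours" means exactly 2 hours and spreads that time evenly over the 10M task_instance rows; the function name is just for illustration.

```python
# Back-of-envelope throughput implied by the figures in this thread.
# Assumption: "about 2 hours" is treated as exactly 2 hours (7,200 s).
def rows_per_second(rows: int, seconds: float) -> float:
    """Average rows processed per second over the whole upgrade."""
    return rows / seconds


clover_rate = rows_per_second(10_000_000, 2 * 60 * 60)
print(f"{clover_rate:,.0f} rows/s")  # 1,389 rows/s
```

That average lumps together every migration between 1.8 and 1.10, including ones that never touch task_instance, so it is only a rough indicator. Sid's few-second 1.8.x to 1.9.x upgrade also applies fewer migrations than a 1.8 to 1.10 upgrade, so the two numbers are not directly comparable.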

Also, are you running the migrations in online or offline mode?

You may see a performance improvement if you collapse all migrations into
one and then apply that (https://stackoverflow.com/a/34492022/149428).

Personally, I prefer to keep all of my metadata in place, but the db-cleanup
DAG in https://github.com/teamclairvoyant/airflow-maintenance-dags has been
brought up before.
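For anyone weighing the cleanup route, the core idea is a retention window: delete task_instance rows whose execution_date is older than some cutoff. The sketch below is a minimal illustration of that idea, not the db-cleanup DAG itself. It assumes a simplified, hypothetical stand-in table in an in-memory SQLite database (not Airflow's real schema), and `purge_old_task_instances`, `TS_FORMAT` and the 30-day window are made-up names and values.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed-width timestamp format, so string comparison matches time order.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def purge_old_task_instances(conn, retention_days, now):
    """Delete rows older than the retention window; return how many were removed.

    Hypothetical helper run against a simplified stand-in table, not
    Airflow's real task_instance schema.
    """
    cutoff = (now - timedelta(days=retention_days)).strftime(TS_FORMAT)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM task_instance WHERE execution_date < ?", (cutoff,)
        )
    return cur.rowcount


# Usage: four runs aged 1, 10, 40 and 100 days, with a 30-day retention window.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT)"
)
now = datetime(2018, 9, 25)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [("example_dag", "t1", (now - timedelta(days=d)).strftime(TS_FORMAT))
     for d in (1, 10, 40, 100)],
)
deleted = purge_old_task_instances(conn, retention_days=30, now=now)
print(deleted)  # 2
```

A real cleanup would likely need to cover related metadata tables as well, and on a table with millions of rows you would probably want to delete in smaller batches to avoid holding long locks.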

T

*Taylor Edmiston*
Blog <https://blog.tedmiston.com/>
LinkedIn <https://www.linkedin.com/in/tedmiston/>
Stack Overflow <https://stackoverflow.com/users/149428/taylor-edmiston>
Developer Story <https://stackoverflow.com/story/taylor>


On Tue, Sep 25, 2018 at 8:30 PM, Sid Anand <sanand@xxxxxxxxxx> wrote:

> I checked with our Ops guy and he mentioned that when he upgraded from
> 1.8.x to 1.9.x, it took a few seconds. We had 3M rows in the task_instance
> table and run MySQL 5.7.
>
> -s
>
> On Tue, Sep 25, 2018 at 4:54 PM Matt Davis <jiffyclub@xxxxxxxxx> wrote:
>
> > Hi folks,
> >
> > Here at Clover we're excitedly migrating to Airflow 1.10 (thanks for
> > everyone's hard work on that!). We're finding that it's taking about 2
> > hours to apply all the migrations to go from Airflow 1.8 to 1.10, largely
> > driven by the 10 million rows in our task_instance table. That got us
> > wondering what kind of maintenance people do on their Airflow metadata
> > databases. Do folks mostly put up with long migrations and generally
> > longer queries, or are y'all doing periodic cleanups of your metadata DB
> > to keep it fairly light?
> >
> > Thanks,
> > Matt Davis
> >
>