Re: Airflow 1.10 Migration Duration


Ruiqin, re: backwards compatibility, I'm not sure, but my guess is that the
major versions include breaking schema changes, so an upgraded DB isn't
guaranteed to work with an older (e.g., 1.8.x) cluster.
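One way to avoid pointing an older cluster at an upgraded DB is to check the revision Alembic records in its `alembic_version` table before starting, and refuse to run on a mismatch. A minimal sketch, assuming an in-memory SQLite DB as a stand-in for the metadata DB; the helper names and the revision strings are hypothetical placeholders, not real Airflow code or revision IDs:

```python
import sqlite3


def current_revision(conn):
    """Return the revision stored in Alembic's alembic_version table, or None."""
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    return row[0] if row else None


def schema_matches(conn, expected_revision):
    """True only if the DB is at exactly the revision the running code expects."""
    return current_revision(conn) == expected_revision


# Demo against an in-memory SQLite DB standing in for the metadata DB.
# The revision strings are hypothetical placeholders, not real Airflow IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('rev_1_10_head')")

new_code_ok = schema_matches(conn, "rev_1_10_head")  # upgraded code: matches
old_code_ok = schema_matches(conn, "rev_1_8_head")   # older cluster: mismatch
```

The exact-match check is deliberately strict: the revision alone doesn't tell you whether older code tolerates a newer schema, so treating any mismatch as unsafe errs on the side of caution.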

Matt, here's where Airflow supports offline mode, plus the Alembic docs on it:

- https://github.com/apache/incubator-airflow/blob/f4f8027cbf61ce2ed6a9989facf6c99dffb12f66/airflow/migrations/env.py#L49-L66
- https://alembic.zzzcomputing.com/en/latest/offline.html

I haven't compared the two performance-wise, but I would expect online mode
against a database with nothing else going on to be comparable.
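To find where the time actually goes, one option is to time each migration statement separately against a throwaway copy of the DB rather than timing the whole upgrade. A minimal sketch, assuming SQLite as a stand-in; `time_statements`, the simplified `task_instance` table, and the two schema changes are hypothetical illustrations, not Airflow's actual migrations:

```python
import sqlite3
import time


def time_statements(conn, statements):
    """Run each statement in order, commit it, and return
    (seconds, statement) pairs sorted slowest first."""
    timings = []
    for stmt in statements:
        start = time.perf_counter()
        conn.execute(stmt)
        conn.commit()
        timings.append((time.perf_counter() - start, stmt))
    return sorted(timings, reverse=True)


# Stand-in table with a few of task_instance's columns (simplified, hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, "
    "execution_date TEXT, state TEXT, "
    "PRIMARY KEY (task_id, dag_id, execution_date))"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    ((f"task_{i % 50}", "example_dag", f"run_{i:06d}", "success")
     for i in range(10_000)),
)
conn.commit()

# Illustrative schema changes, not Airflow's actual migration steps.
steps = [
    "ALTER TABLE task_instance ADD COLUMN try_count INTEGER DEFAULT 0",
    "CREATE INDEX ti_state_idx ON task_instance (state)",
]
timings = time_statements(conn, steps)
for seconds, stmt in timings:
    print(f"{seconds:8.4f}s  {stmt}")
```

Against a real metadata DB, the statements would come from the migration scripts (or from the SQL that offline mode renders), and the slowest entries point at the likely bottleneck.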


*Taylor Edmiston*
Blog <https://blog.tedmiston.com/> | LinkedIn
<https://www.linkedin.com/in/tedmiston/> | Stack Overflow
<https://stackoverflow.com/users/149428/taylor-edmiston> | Developer Story
<https://stackoverflow.com/story/taylor>


On Tue, Sep 25, 2018 at 11:00 PM, Matt Davis <jiffyclub@xxxxxxxxx> wrote:

> Good point about mentioning the database specifics, thanks. It's a Postgres
> 9.6.6 DB running in AWS RDS on a db.r3.large instance (2 vCPUs, 15 GB of
> RAM).
>
> Not sure what you mean by online/offline, but we timed the migrations in a
> test run against a database with nothing else going on at the time.
>
> - Matt
>
> On Tue, Sep 25, 2018 at 7:54 PM Ruiqin Yang <yrqls21@xxxxxxxxx> wrote:
>
> > Thank you Taylor, the db-cleanup DAG is very nice! I've got a question for
> > you: should we expect the DB migration to be backward compatible, i.e.,
> > would a 1.8.x cluster run fine with an upgraded DB?
> >
> > Thank you!
> > Kevin Y
> >
> > On Tue, Sep 25, 2018 at 6:14 PM Taylor Edmiston <tedmiston@xxxxxxxxx>
> > wrote:
> >
> > > I haven't done 1.8.x to 1.10.x in one go, but multiple hours seems long
> > > for running a handful of Alembic migrations on 10M rows. It might be
> > > worth noting whether you're using MySQL or Postgres and how your DB is
> > > hosted. I wonder if there's a bottleneck at play here.
> > >
> > > Also, are you running the migrations in online or offline mode?
> > >
> > > You may see a performance improvement if you collapse all migrations
> > > into one and then apply that (https://stackoverflow.com/a/34492022/149428).
> > >
> > > Personally, I prefer to keep all of my metadata in place, but the
> > > db-cleanup DAG in https://github.com/teamclairvoyant/airflow-maintenance-dags
> > > has been brought up before.
> > >
> > > T
> > >
> > >
> > > On Tue, Sep 25, 2018 at 8:30 PM, Sid Anand <sanand@xxxxxxxxxx> wrote:
> > >
> > > > I checked with our Ops guy and he mentioned that when he upgraded from
> > > > 1.8.x to 1.9.x, it took a few seconds. We had 3M rows in the
> > > > task_instance table and run MySQL 5.7.
> > > >
> > > > -s
> > > >
> > > > On Tue, Sep 25, 2018 at 4:54 PM Matt Davis <jiffyclub@xxxxxxxxx> wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > Here at Clover we're excitedly migrating to Airflow 1.10 (thanks for
> > > > > everyone's hard work on that!). We're finding that it's taking about
> > > > > 2 hours to apply all the migrations to go from Airflow 1.8 to 1.10,
> > > > > largely driven by the 10 million rows in our task_instance table. That
> > > > > got us wondering what kind of maintenance people do on their Airflow
> > > > > metadata databases. Do folks mostly put up with long migrations and
> > > > > generally longer queries, or are y'all doing periodic cleanups of your
> > > > > metadata DB to keep it fairly light?
> > > > >
> > > > > Thanks,
> > > > > Matt Davis
> > > > >
> > > >
> > >
> >
>
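For the periodic-cleanup approach Matt asks about, the core is a retention-window delete on the large tables. A minimal sketch, assuming a simplified `task_instance` schema in SQLite with ISO-8601 timestamps; `purge_old_task_instances` and the 30-day window are hypothetical, and this is not how the teamclairvoyant db-cleanup DAG is implemented. Deleting in batches keeps each transaction short instead of holding one huge delete open:

```python
import sqlite3
from datetime import datetime, timedelta


def purge_old_task_instances(conn, cutoff, batch_size=1000):
    """Delete task_instance rows whose execution_date is before cutoff.

    Deletes in batches of batch_size and returns the total number of rows
    deleted. Uses SQLite's rowid; on Postgres or MySQL you would batch on
    the primary key instead.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM task_instance WHERE rowid IN ("
            "SELECT rowid FROM task_instance WHERE execution_date < ? LIMIT ?)",
            (cutoff.isoformat(), batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:  # last (partial or empty) batch
            return total


# Simplified stand-in schema; ISO-8601 strings in one format compare in the
# same order as the datetimes they encode.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, "
    "execution_date TEXT, state TEXT)"
)
now = datetime(2018, 9, 25)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    (("t", "example_dag", (now - timedelta(hours=h)).isoformat(), "success")
     for h in range(2500)),
)
conn.commit()

# Keep a 30-day retention window (an arbitrary choice for illustration).
cutoff = now - timedelta(days=30)
deleted = purge_old_task_instances(conn, cutoff, batch_size=500)
remaining = conn.execute("SELECT COUNT(*) FROM task_instance").fetchone()[0]
```

In a real deployment this would typically run on a schedule (for example, as its own DAG) so the table never grows large enough to make migrations painful.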