Re: Built in trigger: double-write for app migration


Adding a new DC and then splitting it off is one way, but you have to wait
for it to stream, and then how do you know the DC coherence is good enough
to switch the targeted DC for LOCAL_QUORUM? And once we split it we'd have
downtime to "change the name" and do the other work that would distinguish
it from the original cluster, from what I'm told by the people who do the
DC / cluster setup and AWS provisioning. It is a tool in the toolchest...

We might be able to get stats of the queries and updates impacting the
cluster in a centralized manner with a trigger too.

We will probably do a stream-to-kafka trigger, based on what is on the
intarweb and since we already have kafka here.
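
Roughly, the shape of the mirroring idea is sketched below. This is a plain
Python stand-in, not a Cassandra trigger (real triggers are Java classes
implementing ITrigger) and not a real Kafka client: InMemoryTopic,
MirroringWriter, replay and the JSON message layout are all made-up names
for illustration, and the dicts merely stand in for the old and new
clusters. It only shows the flow: apply the write, publish a copy, count it
per table, and let a consumer apply the copies to the target.

```python
import json
from collections import Counter


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic; it only collects messages."""

    def __init__(self):
        self.messages = []

    def send(self, key, value):
        self.messages.append((key, value))


class MirroringWriter:
    """Applies each write to the primary store, then publishes a copy."""

    def __init__(self, primary, topic):
        self.primary = primary   # dict standing in for the original cluster
        self.topic = topic       # where mirrored writes are published
        self.stats = Counter()   # per-table write counts, collected centrally

    def write(self, table, key, row):
        self.primary[(table, key)] = row
        payload = json.dumps({"table": table, "key": key, "row": row})
        self.topic.send(key, payload)
        self.stats[table] += 1


def replay(topic, target):
    """Consumer side: apply mirrored writes, in order, to the target store."""
    for _key, payload in topic.messages:
        msg = json.loads(payload)
        target[(msg["table"], msg["key"])] = msg["row"]


old_cluster, new_cluster = {}, {}
topic = InMemoryTopic()
writer = MirroringWriter(old_cluster, topic)
writer.write("users", "u1", {"name": "a"})
writer.write("users", "u1", {"name": "b"})   # later write to the same key
writer.write("orders", "o1", {"total": 5})
replay(topic, new_cluster)
```

The sketch ignores everything that makes this hard in practice (ordering
across partitions, timestamps, deletes/tombstones, retries when the sink is
down), so treat it as a picture of the data flow only.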

I will look at CDC.

Thank you everybody!


On Fri, Oct 19, 2018 at 3:29 AM Antonis Papaioannou <papaioan@xxxxxxxxxxxx>
wrote:

> It reminds me of “shadow writes” described in [1].
> During data migration the coordinator forwards a copy of any write
> request regarding tokens that are being transferred to the new node.
>
> [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17,
> https://ieeexplore.ieee.org/document/8069080
>
>
> > On 18 Oct 2018, at 18:53, Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx.INVALID>
> wrote:
> >
> > tl;dr: a generic trigger on TABLES that will mirror all writes to
> > facilitate data migrations between clusters or systems. What is necessary
> > to ensure full write mirroring/coherency?
> >
> > When cassandra clusters have several "apps" aka keyspaces serving
> > applications colocated on them, but the app/keyspace bandwidth and size
> > demands begin impacting other keyspaces/apps, then one strategy is to
> > migrate the keyspace to its own dedicated cluster.
> >
> > With backups/sstableloading, this will entail a delay and therefore a
> > "coherency" shortfall between the clusters. So typically one would
> > employ a "double write, read once":
> >
> > - all updates are mirrored to both clusters
> > - reads come from the currently most coherent cluster.
> >
> > Often two sstable loads are done:
> >
> > 1) first load
> > 2) turn on double writes/write mirroring
> > 3) a second load is done to finalize coherency
> > 4) switch the app to point to the new cluster now that it is coherent
> >
> > The double writes and read is the sticking point. We could do it at the
> > app layer, but if the app wasn't written with that in mind, it is a lot
> > of testing and customization specific to the framework.
> >
> > We could theoretically do some sort of proxying of the java-driver
> > somehow, but all the async structures and complex interfaces/APIs would
> > be difficult to proxy. Maybe there is a lower level in the java-driver
> > where that is possible. This also would only apply to the java-driver,
> > and not python/go/javascript/other drivers.
> >
> > Finally, I suppose we could do a trigger on the tables. It would be
> > really nice if we could add to the cassandra toolbox the basics of a
> > write mirroring trigger that could be activated "fairly easily"... now I
> > know there are the complexities of inter-cluster access, and of whether
> > we are even using cassandra as the target mirror system (for example
> > there is an article on triggers write-mirroring to kafka:
> > https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
> >
> > And this starts to get into the complexities of hinted handoff as well.
> > But fundamentally this seems something that would be a very nice feature
> > (especially when you NEED it) to have in the core of cassandra.
> >
> > Finally, is the mutation hook in triggers sufficient to track all
> > incoming mutations (outside of "shudder" other triggers generating data)?
>
>