Re: Built in trigger: double-write for app migration


Thanks. At a minimum I'll probably start writing something soon for
trigger-based write mirroring. We will probably support Kafka and another
Cassandra cluster as mirror targets, so if those seem to work I will
contribute them.
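
To make the "double write, read once" sequence from my original mail below
concrete, here is a minimal in-memory sketch. Everything in it is a
hypothetical stand-in: InMemoryCluster, bulk_load and MigrationClient are
illustration names only, not Cassandra or driver APIs, and the
last-write-wins timestamp check is just an assumption that mimics how a
reloaded row should not clobber a newer mirrored write.

```python
import itertools


class InMemoryCluster:
    """Hypothetical stand-in for a cluster; stores key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: an older timestamp never replaces a newer one.
        current = self.rows.get(key)
        if current is None or timestamp >= current[0]:
            self.rows[key] = (timestamp, value)

    def read(self, key):
        cell = self.rows.get(key)
        return None if cell is None else cell[1]


def bulk_load(source, target):
    """Stand-in for a backup/sstable load: replay every row with its timestamp."""
    for key, (timestamp, value) in list(source.rows.items()):
        target.write(key, value, timestamp)


class MigrationClient:
    """Writes to the old cluster, optionally mirrors to the new one, reads from one."""

    def __init__(self, old, new):
        self.old = old
        self.new = new
        self.mirroring = False   # step 2 turns this on
        self.read_from = old     # step 4 switches this to the new cluster
        self._clock = itertools.count(1)

    def write(self, key, value):
        timestamp = next(self._clock)
        self.old.write(key, value, timestamp)
        if self.mirroring:
            self.new.write(key, value, timestamp)

    def read(self, key):
        return self.read_from.read(key)
```

Usage follows the four steps: call bulk_load(old, new), set
client.mirroring = True, call bulk_load(old, new) again to pick up writes
that landed between the first load and the start of mirroring, then set
client.read_from = new. A real trigger would also have to deal with failed
or delayed mirror writes, which this sketch ignores.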

On Thu, Oct 18, 2018 at 11:27 AM Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:

> The write sampling is adding an extra instance with the same schema to
> test things like yaml params or compaction without impacting reads or
> correctness - it’s different than what you describe
>
>
>
> --
> Jeff Jirsa
>
>
> > On Oct 18, 2018, at 5:57 PM, Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx.INVALID> wrote:
> >
> > I guess there is also write-survey-mode from cass 1.1:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-3452
> >
> > Were triggers intended to supersede this capability? I can't find a lot of
> > "user level" info on it.
> >
> >
> > On Thu, Oct 18, 2018 at 10:53 AM Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx>
> > wrote:
> >
> >> tl;dr: a generic trigger on TABLES that will mirror all writes to
> >> facilitate data migrations between clusters or systems. What is necessary
> >> to ensure full write mirroring/coherency?
> >>
> >> When cassandra clusters have several "apps" aka keyspaces serving
> >> applications colocated on them, but the app/keyspace bandwidth and size
> >> demands begin impacting other keyspaces/apps, then one strategy is to
> >> migrate the keyspace to its own dedicated cluster.
> >>
> >> With backups/sstableloading, this will entail a delay and therefore a
> >> "coherency" shortfall between the clusters. So typically one would employ a
> >> "double write, read once":
> >>
> >> - all updates are mirrored to both clusters
> >> - reads come from whichever cluster is currently the most coherent.
> >>
> >> Often two sstable loads are done:
> >>
> >> 1) first load
> >> 2) turn on double writes/write mirroring
> >> 3) a second load is done to finalize coherency
> >> 4) switch the app to point to the new cluster now that it is coherent
> >>
> >> The double writes and single read are the sticking point. We could do it at
> >> the app layer, but if the app wasn't written with that in mind, it takes a
> >> lot of testing and customization specific to the framework.
> >>
> >> We could theoretically do some sort of proxying of the java-driver
> >> somehow, but all the async structures and complex interfaces/apis would be
> >> difficult to proxy. Maybe there is a lower level in the java-driver where
> >> this is possible. This also would only apply to the java-driver, and not
> >> python/go/javascript/other drivers.
> >>
> >> Finally, I suppose we could do a trigger on the tables. It would be really
> >> nice if we could add to the cassandra toolbox the basics of a write
> >> mirroring trigger that could be activated "fairly easily"... now I know
> >> there are the complexities of inter-cluster access, and of whether we are
> >> even using cassandra as the target mirror system (for example there is an
> >> article on triggers write-mirroring to kafka:
> >> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
> >>
> >> And this starts to get into the complexities of hinted handoff as well.
> >> But fundamentally this seems something that would be a very nice feature
> >> (especially when you NEED it) to have in the core of cassandra.
> >>
> >> Finally, is the mutation hook in triggers sufficient to track all incoming
> >> mutations (outside of "shudder" other triggers generating data)?
> >>
> >>
> >>
> >>