
Re: Built in trigger: double-write for app migration


Also we have 2.1.x and 2.2 clusters, so we can't use CDC since apparently
that is a 3.8 feature.

Virtual tables are very exciting, since we could do some collating with
them. I'd LOVE to do that with our scheduling application: we could split
tasks into near-term/most frequent (hours to days), medium-term/less common
(days to weeks), and long-term (years) buckets, with the aim of avoiding
compaction entirely and just truncating buckets as they "expire" for a nice
O(1) compaction process.
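To make the bucketing idea concrete, here is a minimal sketch in plain Java. All names and the specific tier widths/thresholds are hypothetical, not from our actual schema: each tier gets fixed-width time buckets, the bucket id would form part of the partition key, and a whole bucket is truncated once its window has passed, so nothing ever needs per-row compaction.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of near/medium/long-term task bucketing. A bucket is
// dropped (truncated) as a unit once every task in it has expired, giving
// the O(1) "compaction" described above. Tier widths are illustrative.
public class TaskBuckets {

    enum Tier {
        NEAR(Duration.ofHours(6)),     // hours-to-days horizon
        MEDIUM(Duration.ofDays(3)),    // days-to-weeks horizon
        LONG(Duration.ofDays(180));    // months-to-years horizon

        final Duration bucketWidth;
        Tier(Duration w) { this.bucketWidth = w; }
    }

    // Pick a tier by how far in the future the task is due.
    static Tier tierFor(Instant now, Instant due) {
        Duration lead = Duration.between(now, due);
        if (lead.compareTo(Duration.ofDays(2)) < 0) return Tier.NEAR;
        if (lead.compareTo(Duration.ofDays(21)) < 0) return Tier.MEDIUM;
        return Tier.LONG;
    }

    // Bucket id = due time floored to the tier's bucket width. In a real
    // table this would be part of the partition key, so the bucket can be
    // truncated as a unit.
    static long bucketId(Tier tier, Instant due) {
        return due.getEpochSecond() / tier.bucketWidth.getSeconds();
    }

    // A bucket is safe to truncate once its entire window is in the past.
    static boolean expired(Tier tier, long bucketId, Instant now) {
        long bucketEnd = (bucketId + 1) * tier.bucketWidth.getSeconds();
        return bucketEnd <= now.getEpochSecond();
    }
}
```

Tasks near their due time land in narrow buckets that churn quickly, while far-future tasks sit in wide buckets that are rarely touched; a background job would just enumerate bucket ids older than "now" and truncate them.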

On Fri, Oct 19, 2018 at 9:57 AM Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx>
wrote:

> A new DC and then a split is one way, but you have to wait for it to
> stream, and then how do you know the DC coherence is good enough to switch
> the targeted DC for LOCAL_QUORUM? And then once we split it, we'd have
> downtime to "change the name" and do the other work that would distinguish
> it from the original cluster, from what I'm told by the people that do the
> DC/cluster setup and aws provisioning. It is a tool in the toolchest...
>
> We might be able to get stats of the queries and updates impacting the
> cluster in a centralized manner with a trigger too.
>
> We will probably do a stream-to-kafka trigger, based on what is on the
> intarweb and since we have kafka here already.
>
> I will look at CDC.
>
> Thank you everybody!
>
>
> On Fri, Oct 19, 2018 at 3:29 AM Antonis Papaioannou <papaioan@xxxxxxxxxxxx>
> wrote:
>
>> It reminds me of “shadow writes” as described in [1]: during data
>> migration, the coordinator forwards a copy of any write request for
>> tokens that are being transferred to the new node.
>>
>> [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17,
>> https://ieeexplore.ieee.org/document/8069080
>>
>>
>> > On 18 Oct 2018, at 18:53, Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx.INVALID>
>> wrote:
>> >
>> > tl;dr: a generic trigger on TABLES that will mirror all writes to
>> > facilitate data migrations between clusters or systems. What is
>> > necessary to ensure full write mirroring/coherency?
>> >
>> > When cassandra clusters host several "apps" aka keyspaces colocated on
>> > them, and one app/keyspace's bandwidth and size demands begin impacting
>> > the other keyspaces/apps, one strategy is to migrate that keyspace to
>> > its own dedicated cluster.
>> >
>> > With backups/sstableloading, this will entail a delay and therefore a
>> > "coherency" shortfall between the clusters. So typically one would
>> > employ a "double write, read once":
>> >
>> > - all updates are mirrored to both clusters
>> > - reads come from the currently most coherent cluster.
>> >
>> > Often two sstable loads are done:
>> >
>> > 1) first load
>> > 2) turn on double writes/write mirroring
>> > 3) a second load is done to finalize coherency
>> > 4) switch the app to point to the new cluster now that it is coherent
>> >
>> > The double writes and reads are the sticking point. We could do it at
>> > the app layer, but if the app wasn't written with that in mind, it is a
>> > lot of testing and customization specific to the framework.
>> >
>> > We could theoretically do some sort of proxying of the java-driver
>> > somehow, but all the async structures and complex interfaces/APIs would
>> > be difficult to proxy. Maybe there is a lower level in the java-driver
>> > where that is possible. This would also only apply to the java-driver,
>> > and not to the python/go/javascript/other drivers.
>> >
>> > Finally, I suppose we could do a trigger on the tables. It would be
>> > really nice if we could add to the cassandra toolbox the basics of a
>> > write-mirroring trigger that could be activated "fairly easily"... now
>> > I know there are the complexities of inter-cluster access, and of
>> > whether we are even using cassandra as the target mirror system (for
>> > example, there is an article on triggers write-mirroring to kafka:
>> > https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
>> >
>> > And this starts to get into the complexities of hinted handoff as well.
>> > But fundamentally this seems like something that would be a very nice
>> > feature (especially when you NEED it) to have in the core of cassandra.
>> >
>> > Finally, is the mutation hook in triggers sufficient to track all
>> > incoming mutations (outside of, shudder, other triggers generating
>> > data)?
>>
>>
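The "double write, read once" pattern discussed in this thread can be sketched independently of any particular driver or trigger mechanism. The sketch below is plain Java with hypothetical names (`Store`, `MirroringStore`): every write is applied to the currently coherent cluster and mirrored to the migration target, while reads never touch the target before cutover. A production version would mirror asynchronously and replay the failure queue, which is where the hinted-handoff-like complexity mentioned above comes in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "double write, read once" migration pattern:
// all writes go to both clusters, reads stay on the cluster that is
// currently coherent (the old one, until the second sstable load and
// cutover).
public class MirroringStore {

    // Stand-in for a cluster session; a real setup would wrap two driver
    // sessions pointed at the old and new clusters.
    interface Store {
        void write(String key, String value);
        String read(String key);
    }

    private final Store coherent;  // serves reads until cutover
    private final Store mirror;    // the migration target

    // Mirror writes that failed, kept for later replay. This queue is the
    // sketch's crude analogue of hinted handoff.
    private final List<String[]> failedMirrors = new ArrayList<>();

    MirroringStore(Store coherent, Store mirror) {
        this.coherent = coherent;
        this.mirror = mirror;
    }

    void write(String key, String value) {
        coherent.write(key, value);      // primary write must succeed
        try {
            mirror.write(key, value);    // best-effort mirror write
        } catch (RuntimeException e) {
            failedMirrors.add(new String[] {key, value}); // replay later
        }
    }

    String read(String key) {
        return coherent.read(key); // reads never hit the mirror pre-cutover
    }

    List<String[]> pendingReplays() {
        return failedMirrors;
    }
}
```

The mirror write here is synchronous only to keep the sketch deterministic; doing it asynchronously (so the target cluster's latency never blocks the app) is the obvious next step, and exactly where the coherency questions raised in the thread begin.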