Re: CommitLog Recovery replay stop on first timestamp after restore-point-in-time


Thanks for the response, Ben; this was exactly what I was looking for!
I think I will write a test to ensure this condition holds when we do
Cassandra upgrades, and the linked test in CASSANDRA-6905
<https://issues.apache.org/jira/browse/CASSANDRA-6905> seems to be a nice
starting point.

Best regards
Morten V. Nielsen

On Thu, Dec 20, 2018 at 12:36 PM Ben Slater <ben.slater@xxxxxxxxxxxxxxx>
wrote:

> I don’t have any personal knowledge of the fix, but out of interest I took a
> look in Jira and it sounds to me like the behaviour was fixed here
> (in 2.0.10): https://issues.apache.org/jira/browse/CASSANDRA-6905
>
> ---
>
>
> *Ben Slater*
> *Chief Product Officer*
>
>
> On Thu, 20 Dec 2018 at 21:07, Morten Vejen Nielsen <mvejen@xxxxxxxxx>
> wrote:
>
> > Hi,
> >
> > (Moved from user mailing list to here)
> >
> > I have found a statement in the DataStax documentation regarding CommitLog
> > recovery that concerns me, namely:
> >
> > "*Restore stops when the first client-supplied timestamp is greater than
> > the restore point timestamp. Because the order in which the database
> > receives mutations does not strictly follow the timestamp order, this can
> > leave some mutations unrecovered.*"
> >
> > From:
> > https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html
> >
> > To me this means that point-in-time restore does not really guarantee
> > replay of everything up to the configured time, which matters to us
> > because we expect to have out-of-order mutations in our setup.
> >
> > I ran a few experiments on this myself, forcing my Cassandra instance to
> > do a CommitLog replay that included changes timestamped ahead in time, but
> > I was not able to reproduce this behavior.
> > I used a fresh instance taken from the official Cassandra Docker image to
> > run the tests, so no configuration was changed other than setting
> > restore_point_in_time as specified below.
> > I did the experiment as follows:
> >
> > --Edit /etc/cassandra/commitlog_archiving.properties and set
> > *restore_point_in_time* to something in the near future (let's say 2
> > hours ahead of server time)
> >
> > ssh into instance
> >
> > cqlsh
> > create keyspace thezoo with replication =
> >   {'class':'SimpleStrategy','replication_factor':1};
> > use thezoo;
> > create table animal (id int primary key, name varchar);
> > insert into animal (id, name) values (1, 'Bear1');
> > insert into animal (id, name) values (2, 'Bear2');
> > insert into animal (id, name) values (3, 'Bear3');
> > insert into animal (id, name) values (4, 'Bear4');
> > insert into animal (id, name) values (5, 'Bear5');
> > insert into animal (id, name) values (6, 'Bear6');
> > insert into animal (id, name) values (7, 'Bear7');
> > insert into animal (id, name) values (8, 'Bear8');
> > insert into animal (id, name) values (9, 'Bear9');
> > insert into animal (id, name) values (10, 'Bear10');
> > select id,name,writetime(name) from animal;
> > --Add some to the timestamp and use this as future_timestamp; it must be
> > --ahead of what was defined in the commitlog config file
> > insert into animal (id, name) values (11, 'DuckFromFuture') using
> >   timestamp <future_timestamp>;
> > insert into animal (id, name) values (12, 'Bird1');
> > insert into animal (id, name) values (13, 'Bird2');
> > insert into animal (id, name) values (14, 'Bird3');
> > insert into animal (id, name) values (15, 'Bird4');
> > insert into animal (id, name) values (16, 'Bird5');
> > insert into animal (id, name) values (17, 'Bird6');
> > insert into animal (id, name) values (18, 'Bird7');
> > insert into animal (id, name) values (19, 'Bird8');
> > insert into animal (id, name) values (20, 'Bird9');
> > insert into animal (id, name) values (21, 'Bird10');
> >
> > --Now I simply forced the machine off by holding the power button down,
> > and restarted it
> >
> > --During startup, verify in the log that commitlog replay has run
> >
> > ssh into instance and enter cqlsh
> >
> > cqlsh:thezoo> select * from animal;
> >
> > --Which shows that all the bears and birds have been replayed, but not
> > the duck!
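
[Editorial annotation] The two values the experiment depends on, the restore point and the future write timestamp, can be derived as in the sketch below. It rests on two assumptions that should be checked against your own installation: that restore_point_in_time is written in GMT as yyyy:MM:dd HH:mm:ss (the format given in the comments of the shipped commitlog_archiving.properties), and that client-supplied write timestamps are microseconds since the epoch, which is what writetime() reports. The function names are hypothetical helpers, not part of any Cassandra tooling.

```python
from datetime import datetime, timedelta, timezone


def restore_point_string(now, hours_ahead=2):
    """Format a restore point `hours_ahead` hours after `now`.

    Assumes the yyyy:MM:dd HH:mm:ss layout in GMT; verify against the
    comments in your own commitlog_archiving.properties.
    """
    target = now.astimezone(timezone.utc) + timedelta(hours=hours_ahead)
    return target.strftime("%Y:%m:%d %H:%M:%S")


def future_write_timestamp(now, hours_ahead=3):
    """Return microseconds since the epoch for a point `hours_ahead` after `now`.

    Choose hours_ahead larger than the restore point offset so the write
    lands after restore_point_in_time.
    """
    target = now + timedelta(hours=hours_ahead)
    return int(target.timestamp()) * 1_000_000


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("restore_point_in_time=" + restore_point_string(now))
    print("using timestamp", future_write_timestamp(now))
```

The second printed value is what would replace <future_timestamp> in the DuckFromFuture insert above.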
> >
> > I also did some digging in the Cassandra source code, and made the
> > following findings:
> >
> > I think the code that skips mutations ahead of the restore point is in
> > the CommitLogReplayer class.
> > See lines 194-195 (at the time of writing):
> >
> >     if (commitLogReplayer.pointInTimeExceeded(mutation))
> >         return;
> >
> > This code is triggered from CommitLogReader, where readSection seems to
> > be responsible for reading the commit logs. The read is wrapped in a
> > while loop that just reads the file until EOF.
> > See:
> >
> >     while (statusTracker.shouldContinue() && reader.getFilePointer() < end
> >            && !reader.isEOF())
> >
> > This method is called file by file from CommitLog.recover to recover all
> > commitlog segment files.
> > As a side note, statusTracker.shouldContinue() will stop the loop if
> > statusTracker.requestTermination() is called, but I don't see that
> > happening in the pointInTimeExceeded case.
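
[Editorial annotation] The difference between the behavior the documentation describes (stop at the first mutation past the restore point) and the behavior observed in the experiment (skip that mutation and keep reading) can be illustrated with a small model. This is a simplified sketch, not Cassandra's code: Mutation, replay_skipping, and replay_stopping are hypothetical names, each mutation carries a single timestamp, and the timestamp values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    key: int
    name: str
    timestamp: int  # client-supplied write timestamp (arbitrary units here)


def point_in_time_exceeded(mutation, restore_point):
    """True when the mutation was written after the restore point."""
    return mutation.timestamp > restore_point


def replay_skipping(log, restore_point):
    """Drop each mutation past the restore point but keep reading the log."""
    return [m for m in log if not point_in_time_exceeded(m, restore_point)]


def replay_stopping(log, restore_point):
    """Stop reading at the first mutation past the restore point."""
    replayed = []
    for m in log:
        if point_in_time_exceeded(m, restore_point):
            break
        replayed.append(m)
    return replayed


RESTORE_POINT = 1_000

# Same shape as the experiment: ten bears, one future duck, ten birds.
log = (
    [Mutation(i, f"Bear{i}", 100 + i) for i in range(1, 11)]
    + [Mutation(11, "DuckFromFuture", 5_000)]
    + [Mutation(i, f"Bird{i - 11}", 200 + i) for i in range(12, 22)]
)

skipped = replay_skipping(log, RESTORE_POINT)
stopped = replay_stopping(log, RESTORE_POINT)
print(len(skipped), len(stopped))
```

In this model the skipping variant recovers all 20 bears and birds and leaves out only the duck, which matches the select output in the experiment, while the stopping variant recovers only the 10 bears written before the duck. An upgrade test could assert the first outcome against a real node.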
> >
> > I am a bit concerned that this might be some hidden behavior in
> > Cassandra, in which case we would have to revise our backup strategies.
> > However, as far as I can see, the DataStax documentation on this is
> > simply wrong. Unfortunately, the official documentation on this just
> > seems to be a work in progress.
> > The fact that replay does not stop early is a positive result for me, as
> > I would expect point-in-time restore to guarantee that all mutations up
> > to that point in time are in fact recovered.
> >
> > Can anyone confirm whether it is just the documentation that is wrong, or
> > whether I did something wrong in my experiments?
> >
> > (For reference, I also conducted some experiments with larger amounts of
> > data, where the recovery went through multiple commitlog files, but I got
> > the same result, namely that it recovered ALL records before
> > restore_point_in_time.)
> >
> > Best regards
> > Morten V. Nielsen
> >
>


-- 
//Morten Vejen Nielsen - mvejen