Subject: [jira] [Updated] (SOLR-11069) LASTPROCESSEDVERSION
for CDCR is flawed when buffering is enabled


Erick Erickson updated SOLR-11069:
Attachment: SOLR-11069.patch

Figuring out the LPV issue is hard because bootstrapping had a problem. At the
end of the process, the core is reloaded. However, that means that the code
that checks on the state of the replication returns a "notfound", which causes
another bootstrap command to be sent.

So this patch moves the relevant objects to (Default)SolrCoreState where
they're preserved around core reloads. With this patch (PoC) I can get
bootstrapping to occur, enable/disable buffering, bring the target up and down
etc. The fact that LPV is -1 when buffering is enabled doesn't seem to be a
problem.
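
To make the reload problem concrete, here is a rough sketch of the idea, not the
patch itself. The classes SharedCoreState and Core and the "completed" status
value are hypothetical stand-ins invented for illustration; only the "notfound"
result and the idea of keeping state somewhere that is preserved around core
reloads come from the description above.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ReloadStateSketch {

    /** Hypothetical stand-in for state that outlives a core reload. */
    static final class SharedCoreState {
        // Assumed status values: "notfound" until a bootstrap has been recorded.
        final AtomicReference<String> bootstrapStatus = new AtomicReference<>("notfound");
    }

    /** Hypothetical stand-in for a core; a reload builds a fresh instance. */
    static final class Core {
        final SharedCoreState shared;       // handed over to the reloaded core
        String perCoreStatus = "notfound";  // recreated from scratch on every reload

        Core(SharedCoreState shared) {
            this.shared = shared;
        }

        /** Simulates a reload: new Core object, same shared state. */
        Core reload() {
            return new Core(shared);
        }
    }

    public static void main(String[] args) {
        Core core = new Core(new SharedCoreState());

        // Bootstrap finishes and records its status in both places.
        core.perCoreStatus = "completed";
        core.shared.bootstrapStatus.set("completed");

        // The end of the bootstrap process reloads the core.
        Core reloaded = core.reload();

        // Per-core status is lost, so a status check would trigger another bootstrap.
        System.out.println("per-core status after reload: " + reloaded.perCoreStatus);
        // Shared status survives, so the status check sees the finished bootstrap.
        System.out.println("shared status after reload:   " + reloaded.shared.bootstrapStatus.get());
    }
}
```

In this sketch the per-core field reads "notfound" after the reload while the
shared field still reads "completed", which is the difference the patch is
relying on.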

So if others can give this a whirl and see if their testing is OK with it then
maybe the LPV issue is not an issue.

Mostly I'm throwing this out for others to consider. What do people think about
putting the additional objects in SolrCoreState? Putting the objects there was
quick; I'm interested in seeing whether my results hold for others. If so, we
can decide whether this is the right way to go.

Haven't run precommit, haven't run the full test suite. Did run
CdcrBootstrapTest. Also, the CDCR docs need to be updated.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
> Key: SOLR-11069
> URL:
> Project: So...


> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 7.0
> Reporter: Amrit Sarkar
> Assignee: Erick Erickson
> Attachments: SOLR-11069.patch
> The {{LASTPROCESSEDVERSION}} (abbreviated LPV) action for CDCR breaks down
> due to a poorly initialised and maintained buffer log for either source or
> target cluster core nodes.
> If buffering is enabled for cores of either the source or target cluster, it
> returns {{-1}}, *irrespective of the number of entries in the tlog read by the
> {{leader}}* node of each shard of the respective collection of the respective
> cluster. Once buffering is disabled, it starts reporting the correct LPV for
> each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work as
> expected, i.e. it provides an incorrect seek position for the {{non-leader}}
> nodes to advance to. I am not sure whether this is intended behavior for
> sync, but it surely doesn't feel right.

This message was sent by Atlassian JIRA
