Subject: [jira] [Commented] (SOLR-11069)
LASTPROCESSEDVERSION for CDCR is flawed when
buffering is enabled




[
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124804#comment-16124804
]

Erick Erickson commented on SOLR-11069:
---------------------------------------

I'm dithering back and forth about this, and I suspect that we're conflating a
couple of issues. There's definitely a problem with bootstrapping (I'll attach
a patch in a minute). It may well be that LASTPROCESSEDVERSION is not actually
a problem: at least in some testing with the attached patch, the fact that it
is -1 when buffering is enabled seems to be OK.

I propose we use the patch as a starting point to see if this
LASTPROCESSEDVERSION is a problem or not.

1> when buffering is enabled, tlogs will accrue forever according to the
original intent. From Renaud:

The original goal of the buffer on cdcr is to indeed keep the tlogs
indefinitely until the buffer is deactivated
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462#CrossDataCenterReplication(CDCR)-TheBufferElement).
This was useful for example during maintenance operations, to ensure that the
source cluster will keep all the tlogs until the target cluster is properly
initialised. In this scenario, one will activate the buffer on the source. The
source will start to store all the tlogs (and does not purge them). Once the
target cluster is initialised, and has registered a tlog pointer on the source,
one can deactivate the buffer on the source and the tlogs will start to be
purged once they are read by the target cluster.

But additionally he had this to say:
Regarding the issue about LPV = -1, I am a bit surprised as this sentinel value
should be used only when the source cluster does not have any log pointers,
i.e., no target cluster was configured and initialised with this source
cluster. In this case it indicates that there is no registered log reader, and
that we should not remove any tlogs if buffer is enabled (as we have to wait
for the target to register a log reader and log pointer).

And enabling buffering definitely causes LASTPROCESSEDVERSION to return -1.
However, with the patch LPV immediately goes back to a reasonable value as soon
as buffering is disabled, the tlogs get cleaned up etc. without bootstrapping.
So I do wonder if the -1 value is just overloaded in this case to also mean
"don't purge tlogs".

We need to disentangle a couple of things. I'll attach a patch in a few minutes
that might help.
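
To make the two readings of -1 discussed above concrete, here is a minimal toy
model in Java. It is a sketch under stated assumptions, not Solr's CDCR code:
the class TlogPurgeSketch and the methods lastProcessedVersion and canPurge
are hypothetical names invented for illustration, and the rule that LPV is the
lowest pointer among registered readers is an assumption. The only behaviour
taken from the discussion is that LPV is -1 when no log reader is registered or
when buffering is enabled, and that tlogs must not be purged in that state.

```java
import java.util.List;

/** Toy model only; names and logic are illustrative, not Solr's CDCR code. */
public class TlogPurgeSketch {

    /** Sentinel from the discussion: there is no usable log pointer. */
    static final long NO_POINTER = -1L;

    /**
     * Hypothetical LPV: the lowest version any registered reader has reached
     * (an assumption), or -1 when there are no readers or buffering is enabled.
     */
    static long lastProcessedVersion(boolean bufferEnabled, List<Long> readerPointers) {
        if (bufferEnabled || readerPointers.isEmpty()) {
            return NO_POINTER;
        }
        long min = Long.MAX_VALUE;
        for (long pointer : readerPointers) {
            min = Math.min(min, pointer);
        }
        return min;
    }

    /** A tlog may be purged only when every reader is past its newest entry. */
    static boolean canPurge(long newestVersionInTlog, long lpv) {
        return lpv != NO_POINTER && newestVersionInTlog <= lpv;
    }

    public static void main(String[] args) {
        List<Long> pointers = List.of(250L, 100L);
        long buffered = lastProcessedVersion(true, pointers);
        long unbuffered = lastProcessedVersion(false, pointers);
        System.out.println("buffer on:  LPV=" + buffered
                + " purge=" + canPurge(80L, buffered));
        System.out.println("buffer off: LPV=" + unbuffered
                + " purge=" + canPurge(80L, unbuffered));
    }
}
```

In this model the -1 carries both meanings at once ("no reader registered" and
"don't purge tlogs"), which is the overloading suspected above; a real fix
might instead report the true pointer and track the buffer state separately.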

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>
> Key: SOLR-11069
> URL: https://issues.apache.org/jira/browse/SOLR-11069
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 7.0
> Reporter: Amrit Sarkar
> Assignee: Erick Erickson
>
> The {{LASTPROCESSEDVERSION}} (abbreviated LPV) action for CDCR breaks down
> due to a poorly initialised and maintained buffer log on either source or
> target cluster core nodes.
> If the buffer is enabled for cores of either the source or target cluster, it
> returns {{-1}}, *irrespective of the number of entries in the tlog read by the {{leader}}*
> node of each shard of the respective collection of the respective cluster.
> Once disabled, it starts telling us the correct LPV for each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work
> as expected, i.e. it provides an incorrect seek position for the
> {{non-leader}} nodes to advance to. I am not sure whether this is intended
> behavior for sync but it surely doesn't feel right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
