Re: Improve the performance of CAS


@Jason, I pinged Sylvain on the JIRA.

@Jeremiah,
In the contention case, if we combine the prepare and quorum read, we
will retry the Prepare phase, which may trigger the read on different
replicas again; that is extra overhead. We can reduce it by not executing the
read if the replica has already promised a ballot greater than the prepared one.
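
To make the contention case concrete, here is a minimal Python sketch of a replica handling a combined prepare-and-read request. It is only an illustration under stated assumptions, not Cassandra code: the names `Replica`, `PrepareResponse`, and `prepare_and_read` are hypothetical, ballots are modelled as unique integers, and the "read" is a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrepareResponse:
    promised: bool                 # True if the replica promised this ballot
    highest_ballot: int            # highest ballot the replica has promised
    current_value: Optional[str]   # piggybacked read result, None if skipped


@dataclass
class Replica:
    promised_ballot: int = 0
    data: dict = field(default_factory=dict)
    reads_executed: int = 0        # counts how many local reads actually ran

    def prepare_and_read(self, ballot: int, key: str) -> PrepareResponse:
        # Assumption: ballots are unique, so equality never occurs in practice.
        if self.promised_ballot > ballot:
            # A greater ballot was already promised: reject and skip the read,
            # since the coordinator will have to retry Prepare anyway.
            return PrepareResponse(False, self.promised_ballot, None)
        self.promised_ballot = ballot
        self.reads_executed += 1
        return PrepareResponse(True, ballot, self.data.get(key))
```

With this shape, a losing proposer costs the replica only a ballot comparison, and the read runs once per successful promise.
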
In the commit failure case, each replica should already have the
PartitionUpdate stored in the system table after the Propose phase. A
subsequent readWithPaxos or CAS operation can then repair the in-progress
Paxos state and commit the data.
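
The repair path for a lost commit can be sketched roughly as below. This is a simplified Python model, not the actual implementation: `Replica`, `in_progress`, and `repair_in_progress` are hypothetical names, the `accepted` field stands in for the proposal stored in the system table, and ballot handling during repair (for example re-proposing under a fresh ballot) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Proposal = Tuple[int, str, str]  # (ballot, key, value)


@dataclass
class Replica:
    accepted: Optional[Proposal] = None  # stand-in for the stored proposal
    committed_ballot: int = 0
    data: dict = field(default_factory=dict)

    def propose(self, proposal: Proposal) -> None:
        # Propose phase: persist the update before any commit is sent.
        self.accepted = proposal

    def commit(self, proposal: Proposal) -> None:
        ballot, key, value = proposal
        if ballot > self.committed_ballot:
            self.data[key] = value
            self.committed_ballot = ballot

    def in_progress(self) -> Optional[Proposal]:
        # An accepted proposal newer than the last commit is still in progress.
        if self.accepted and self.accepted[0] > self.committed_ballot:
            return self.accepted
        return None


def repair_in_progress(replicas: List[Replica]) -> Optional[Proposal]:
    """Commit the newest accepted-but-uncommitted proposal, if there is one."""
    pending = [p for p in (r.in_progress() for r in replicas) if p is not None]
    if not pending:
        return None
    latest = max(pending, key=lambda p: p[0])
    for r in replicas:
        r.commit(latest)
    return latest
```

A later read or CAS would call something like `repair_in_progress` before doing its own work, so an asynchronous commit that never arrived is still applied.
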

Thanks
Dikang.

On Wed, May 16, 2018 at 3:17 PM, J. D. Jordan <jeremiah.jordan@xxxxxxxxx>
wrote:

> I have not reasoned through this completely, but something I would want to
> see before messing with this is how changing the number of rounds behaves
> under contention and failure scenarios. Also how ignoring commit success
> behaves in those scenarios especially under contention and with respect to
> obeying CL semantics.
>
> -Jeremiah
>
> > On May 16, 2018, at 6:05 PM, Jason Brown <jasedbrown@xxxxxxxxx> wrote:
> >
> > Hey all,
> >
> > Before we go bananas, let's see if Sylvain, the primary author of the
> > original patch, has the opportunity to chime in with some explanatory
> > notes or other guidance. There may be some subtle points or considerations
> > that are not obvious, and I'd hate to lose that context.
> >
> > Thanks,
> >
> > -Jason
> >
> >> On Wed, May 16, 2018 at 2:57 PM, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> I think you are looking at the right low hanging fruit.  Cassandra
> >> deserves a better consensus protocol, but it's a very big project.
> >>
> >> Regards,
> >> Ariel
> >>> On Wed, May 16, 2018, at 5:51 PM, Dikang Gu wrote:
> >>> Cool, I created a JIRA for it:
> >>> https://issues.apache.org/jira/browse/CASSANDRA-14448. I have a draft
> >>> patch working internally and will clean it up.
> >>>
> >>> EPaxos is more complicated and could be a long-term effort.
> >>>
> >>> Thanks
> >>> Dikang.
> >>>
> >>> On Wed, May 16, 2018 at 2:20 PM, sankalp kohli <kohlisankalp@xxxxxxxxx> wrote:
> >>>
> >>>> Hi,
> >>>>    The idea of combining read with prepare sounds good. Regarding
> >>>> reducing the commit round trip, I think it is possible today by giving
> >>>> a lower consistency level for commit.
> >>>>
> >>>> Regarding EPaxos, it is a large change and will take longer to land. I
> >>>> think we should do this as it will help lower the latencies a lot.
> >>>>
> >>>> Thanks,
> >>>> Sankalp
> >>>>
> >>>> On Wed, May 16, 2018 at 2:15 PM, Jeremy Hanna <jeremy.hanna1234@xxxxxxxxx> wrote:
> >>>>
> >>>>> Hi Dikang,
> >>>>>
> >>>>> Have you seen Blake’s work on implementing egalitarian Paxos, or
> >>>>> EPaxos*? That might be helpful for the discussion.
> >>>>>
> >>>>> Jeremy
> >>>>>
> >>>>> * https://issues.apache.org/jira/browse/CASSANDRA-6246
> >>>>>
> >>>>>> On May 16, 2018, at 3:37 PM, Dikang Gu <dikang85@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hello C* developers,
> >>>>>>
> >>>>>> I'm working on some performance improvements to lightweight
> >>>>>> transactions (compare and set), and I'd like to hear your thoughts.
> >>>>>>
> >>>>>> As you know, CAS currently requires 4 round trips to finish, which is
> >>>>>> not efficient, especially in the cross-DC case:
> >>>>>> 1) Prepare
> >>>>>> 2) Quorum read current value
> >>>>>> 3) Propose new value
> >>>>>> 4) Commit
> >>>>>>
> >>>>>> I'm proposing the following improvements to reduce it to 2 round trips:
> >>>>>> 1) Combine the prepare and quorum read, using only one round trip to
> >>>>>> decide the ballot and piggyback the current value in the response.
> >>>>>> 2) Propose the new value, then send out the commit request
> >>>>>> asynchronously, so the client does not wait for the ack of the commit.
> >>>>>> In case of commit failures, we should still have a chance to
> >>>>>> retry/repair it through hints or subsequent read/CAS events.
> >>>>>>
> >>>>>> After the improvement, we should be able to finish the CAS operation
> >>>>>> using 2 round trips. There can be further improvements as well, and
> >>>>>> this can be a starting point.
> >>>>>>
> >>>>>> What do you think? Did I miss anything?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Dikang
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Dikang
> >>
>


-- 
Dikang