Re: CASSANDRA-13241 lower default chunk_length_in_kb


IMO slightly bigger memory requirements in exchange for substantial
improvements is a good trade, especially for a 4.0 release of the database.
Optane and lots of other memory options are coming down the hardware
pipeline, and risk-wise almost all Cassandra people know to test major
versions on a testbed first, so major versions are a good time for
significant default changes (vnode count, this). I've read TLP blogs on this
before, and the memory impact seems to only get huge for nodes that are
already larger than the ideal size; if people want to run nodes that big,
then fine, they can run big memory too.
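
For a rough sense of scale, here is a back-of-the-envelope sketch of that
memory impact. It assumes one 8-byte offset is held in memory per compressed
chunk, and that the chunk count is simply the uncompressed data size divided
by the chunk length. Both are assumptions for illustration rather than
figures taken from this thread, and the function name is made up.

```python
# Rough estimate of memory used to hold compressed chunk offsets.
# ASSUMPTION: one 8-byte offset is kept in memory per compressed chunk.
BYTES_PER_OFFSET = 8


def chunk_offset_memory_bytes(data_bytes: int, chunk_length_kb: int) -> int:
    """Estimate offset memory for `data_bytes` of uncompressed data."""
    chunk_bytes = chunk_length_kb * 1024
    chunks = -(-data_bytes // chunk_bytes)  # ceiling division
    return chunks * BYTES_PER_OFFSET


if __name__ == "__main__":
    one_tib = 1024 ** 4
    # 64 is the default discussed in the thread; 4 and 8 are the values
    # Jon mentions; 16 is an illustrative point in between.
    for kb in (64, 16, 8, 4):
        mib = chunk_offset_memory_bytes(one_tib, kb) / 1024 ** 2
        print(f"{kb:>2} KB chunks: {mib:,.0f} MiB of offsets per TiB of data")
```

Under those assumptions the offset memory scales inversely with chunk length,
so it only becomes large on nodes holding many terabytes of data.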

But I don't actually write code for the project so I don't count :-)

On Mon, Oct 29, 2018 at 2:42 PM Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:

> Looks straightforward, I can review today.
>
> On Mon, Oct 29, 2018 at 12:25 PM Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > Seeing too many -'s for changing the representation and essentially no
> > +1s so I submitted a patch for just changing the default. I could use a
> > reviewer for https://issues.apache.org/jira/browse/CASSANDRA-13241
> >
> > I created https://issues.apache.org/jira/browse/CASSANDRA-14857 "Use a
> > more space efficient representation for compressed chunk offsets" for
> > post 4.0.
> >
> > Regards,
> > Ariel
> >
> > On Tue, Oct 23, 2018, at 11:46 AM, Ariel Weisberg wrote:
> > > Hi,
> > >
> > > To summarize who we have heard from so far:
> > >
> > > WRT changing just the default:
> > >
> > > +1:
> > > Jon Haddad
> > > Ben Bromhead
> > > Alain Rodriguez
> > > Sankalp Kohli (not explicit)
> > >
> > > -0:
> > > Sylvain Lebresne
> > > Jeff Jirsa
> > >
> > > Not sure:
> > > Kurt Greaves
> > > Joshua McKenzie
> > > Benedict Elliot Smith
> > >
> > > WRT changing the representation:
> > >
> > > +1:
> > > There are only conditional +1s at this point
> > >
> > > -0:
> > > Sylvain Lebresne
> > >
> > > -.5:
> > > Jeff Jirsa
> > >
> > > This
> > > (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
> > > is a rough cut of the change for the representation. It needs better
> > > naming, unit tests, javadoc etc. but it does implement the change.
> > >
> > > Ariel
> > > On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> > > > Sorry, to be clear - I'm +1 on changing the configuration default, but
> > > > I think changing the in-memory compression representation warrants
> > > > further discussion and investigation before making a case for or
> > > > against it. An optimization that reduces in-memory cost by over 50%
> > > > sounds pretty good, and we were never really explicit that those sorts
> > > > of optimizations would be excluded after our feature freeze. I don't
> > > > think they should necessarily be excluded at this time, but it depends
> > > > on the size and risk of the patch.
> > > >
> > > > On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > I think we should try to do the right thing for the most people that
> > > > > we can. The number of folks impacted by the 64KB default chunk length
> > > > > is huge. I've worked on a lot of clusters created by a lot of
> > > > > different teams, going from brand new to pretty damn knowledgeable. I
> > > > > can't think of a single time over the last 2 years that I've seen a
> > > > > cluster use non-default settings for compression. With only a handful
> > > > > of exceptions, I've lowered the chunk size considerably (usually to 4
> > > > > or 8K) and the impact has always been very noticeable, frequently
> > > > > resulting in hardware reduction and cost savings. Of all the poorly
> > > > > chosen defaults we have, this is one of the biggest offenders that I
> > > > > see. There's a good reason ScyllaDB claims they're so much faster
> > > > > than Cassandra - we ship a DB that performs poorly for 90+% of teams
> > > > > because we ship for a specific use case, not a general one (time
> > > > > series on memory constrained boxes being the specific use case).
> > > > >
> > > > > This doesn't impact existing tables, just new ones. More and more
> > > > > teams are using Cassandra as a general purpose database; we should
> > > > > acknowledge that by adjusting our defaults accordingly. Yes, we use a
> > > > > little bit more memory on new tables if we just change this setting,
> > > > > and what we get out of it is a massive performance win.
> > > > >
> > > > > I'm +1 on the change as well.
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli <kohlisankalp@xxxxxxxxx> wrote:
> > > > >
> > > > > >> (We should definitely harden the definition for freeze in a
> > > > > >> separate thread)
> > > > > >>
> > > > > >> My thinking is that this is the best time to do this change as we
> > > > > >> have not even cut alpha or beta. All the people involved in the
> > > > > >> test will definitely be testing it again when we have these
> > > > > >> releases.
> > > > >>
> > > > > >> > On Oct 19, 2018, at 8:00 AM, Michael Shuler <michael@xxxxxxxxxxxxxx> wrote:
> > > > >> >
> > > > >> >> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> > > > >> >>
> > > > > >> >> At the risk of hijacking this thread, when are we going to
> > > > > >> >> transition from "no new features, change whatever else you want
> > > > > >> >> including refactoring and changing years-old defaults" to "ok,
> > > > > >> >> we think we have something that's stable, time to start
> > > > > >> >> testing"?
> > > > >> >
> > > > > >> > Creating a cassandra-4.0 branch would allow trunk to, for
> > > > > >> > instance, get a default config value change commit and get more
> > > > > >> > testing. We might forget again, from what I understand of
> > > > > >> > Benedict's last comment :)
> > > > >> >
> > > > >> > --
> > > > >> > Michael
> > > > >> >
> > > > >> >
> > > > > >> > ---------------------------------------------------------------------
> > > > >> > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > > >> > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > > >> >
> > > > >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > > >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > > >>
> > > > >>
> > > > >
> > > > > --
> > > > > Jon Haddad
> > > > > http://www.rustyrazorblade.com
> > > > > twitter: rustyrazorblade
> > > > >
> > > >
> > > >
> > > > --
> > > > Jon Haddad
> > > > http://www.rustyrazorblade.com
> > > > twitter: rustyrazorblade
> > >
> >
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>