Re: CASSANDRA-13241 lower default chunk_length_in_kb


Hi,

The offset lookup is really not appreciably slower relative to the decompression that follows it, which takes several microseconds. Decompression should also get faster, because we will do less unnecessary decompression and because each decompression may itself be faster when the chunk fits better in a higher-level cache. I ran a microbenchmark comparing the representations.

https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988

Fetching a long from memory:       56 nanoseconds
Compact integer sequence   :       80 nanoseconds
Summing integer sequence   :      165 nanoseconds

Currently we have one +1 from Kurt to change the representation and possibly a -0 from Benedict. That's not really enough to make an exception to the code freeze. If you want it to happen (or not), you need to speak up; otherwise only the default will change.

Regards,
Ariel

On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
> I think if we're going to drop it to 16k, we should invest in the compact
> sequencing as well. Just lowering it to 16k will have potentially a painful
> impact on anyone running low memory nodes, but if we can do it without the
> memory impact I don't think there's any reason to wait another major
> version to implement it.
> 
> Having said that, we should probably benchmark the two representations
> Ariel has come up with.
> 
> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ <arodrime@xxxxxxxxx> wrote:
> 
> > +1
> >
> > I would guess a lot of C* clusters/tables have this option set to the
> > default value, and not many of them need to read such big chunks of
> > data.
> > I believe this will greatly limit disk overreads for a fair amount (a big
> > majority?) of new users. It seems fair enough to change this default
> > value, and I also think 4.0 is a nice place to do this.
> >
> > Thanks for taking care of this Ariel and for making sure there is a
> > consensus here as well,
> >
> > C*heers,
> > -----------------------
> > Alain Rodriguez - alain@xxxxxxxxxxxxxxxxx
> > France / Spain
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> > On Sat, Oct 13, 2018 at 08:52, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > This would only impact new tables; existing tables would get their
> > > chunk_length_in_kb from the existing schema. It's something we record
> > > in a system table.
> > >
> > > I have an implementation of a compact integer sequence that only requires
> > > 37% of the memory required today. So we could do this with only slightly
> > > more than doubling the memory used. I'll post that to the JIRA soon.
> > >
> > > Ariel
> > >
> > > On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> > > >
> > > >
> > > > I think 16k is a better default, but it should only affect new tables.
> > > > Whoever changes it, please make sure you think about the upgrade path.
> > > >
> > > >
> > > > > > On Oct 12, 2018, at 2:31 AM, Ben Bromhead <ben@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > This is something that's bugged me for ages, tbh the performance gain
> > > > > > for most use cases far outweighs the increase in memory usage and I
> > > > > > would even be in favor of changing the default now, optimizing the
> > > > > > storage cost later (if it's found to be worth it).
> > > > >
> > > > > > For some anecdotal evidence:
> > > > > > 4kb is usually what we end up setting it to. 16kb feels more
> > > > > > reasonable given the memory impact, but what would be the point if,
> > > > > > practically, most folks set it to 4kb anyway?
> > > > >
> > > > > > Note that chunk_length will largely be dependent on your read sizes,
> > > > > > but 4k is the floor for most physical devices in terms of block size.
> > > > >
> > > > > > +1 for making this change in 4.0 given the small size and the large
> > > > > > improvement to new users' experience (as long as we are explicit in
> > > > > > the documentation about memory consumption).
> > > > >
> > > > >
> > > > > >> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > > >> This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241
> > > > >>
> > > > > >> This ticket has languished for a while. IMO it's too late in 4.0 to
> > > > > >> implement a more memory efficient representation for compressed
> > > > > >> chunk offsets. However I don't think we should put out another
> > > > > >> release with the current 64k default as it's pretty unreasonable.
> > > > >>
> > > > > >> I propose that we lower the value to 16kb. 4k might never be the
> > > > > >> correct default anyways as there is a cost to compression and 16k
> > > > > >> will still be a large improvement.
> > > > >>
> > > > > >> Benedict and Jon Haddad are both +1 on making this change for 4.0.
> > > > > >> In the past there has been some consensus about reducing this value
> > > > > >> although maybe with more memory efficiency.
> > > > >>
> > > > >> The napkin math for what this costs is:
> > > > > >> "If you have 1TB of uncompressed data, with 64k chunks that's 16M
> > > > > >> chunks at 8 bytes each (128MB).
> > > > >> With 16k chunks, that's 512MB.
> > > > >> With 4k chunks, it's 2G.
> > > > >> Per terabyte of data (pre-compression)."
> > > > >>
> > > > >>
> > > > > >> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
> > > > >>
> > > > > >> By way of comparison, memory mapping the files has a similar cost of
> > > > > >> 8 bytes per 4k page. Multiple mappings make this more expensive. With
> > > > > >> a default of 16kb this would be 4x less expensive than memory mapping
> > > > > >> a file. I only mention this to give a sense of the costs we are
> > > > > >> already paying. I am not saying they are directly related.
> > > > >>
> > > > > >> I'll wait a week for discussion and if there is consensus make the
> > > > > >> change.
> > > > >>
> > > > >> Regards,
> > > > >> Ariel
> > > > >>
> > > > >>
> > > > > >> ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > > > >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> > > > >>
> > > > >> --
> > > > > Ben Bromhead
> > > > > CTO | Instaclustr <https://www.instaclustr.com/>
> > > > > +1 650 284 9692
> > > > > Reliability at Scale
> > > > > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> > > >
> > >
> >
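The napkin math quoted earlier in the thread can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: 1TB is taken as 2**40 bytes, each chunk costs one 8-byte offset as in the quoted estimate, and the function name `offset_memory_bytes` is made up for this example.

```python
TIB = 1 << 40        # assumption: "1TB" of uncompressed data means 2**40 bytes
MIB = 1 << 20
OFFSET_BYTES = 8     # one 8-byte offset per chunk, as in the quoted estimate


def offset_memory_bytes(chunk_length_kb, data_bytes=TIB):
    """Memory needed for chunk offsets, given the chunk length in KB."""
    chunks = data_bytes // (chunk_length_kb * 1024)
    return chunks * OFFSET_BYTES


for kb in (64, 16, 4):
    print(f"{kb:>2}k chunks: {offset_memory_bytes(kb) // MIB} MiB per TB")

# The memory-mapping comparison from the thread: 8 bytes per 4k page.
mmap_cost = offset_memory_bytes(4)
ratio_vs_16k = mmap_cost // offset_memory_bytes(16)
```

With these assumptions the results are 128 MiB, 512 MiB, and 2048 MiB per terabyte, matching the quoted figures, and the 8-bytes-per-4k-page mapping cost comes out at 4x the cost of 16k chunk offsets.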
