

Re: CASSANDRA-13241 lower default chunk_length_in_kb


Hi,

This would only impact new tables; existing tables would get their chunk_length_in_kb from the existing schema, which is recorded in a system table.

I have an implementation of a compact integer sequence that requires only 37% of the memory required today, so we could lower the default while only slightly more than doubling the memory used. I'll post it to the JIRA soon.
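For readers unfamiliar with the idea, one common way to shrink a table of chunk offsets is to exploit the fact that they only ever increase: store a full 64-bit base offset once per block of entries, and store each entry as a 32-bit distance from its block's base. The Python sketch below illustrates that general technique only; it is not the implementation mentioned above, whose details are not given in this thread. The class name `CompactOffsets` and the block size of 64 are hypothetical choices. It keeps O(1) random access and uses roughly 4 + 8/64 bytes per entry instead of 8.

```python
from array import array


class CompactOffsets:
    """Block-relative encoding of a sequence of chunk offsets (illustrative only).

    Every BLOCK-th offset is kept as a full 64-bit base; every offset is stored
    as a 32-bit distance from the base of its block. The scheme assumes the
    input is non-decreasing (as chunk offsets are); it only verifies that each
    distance fits in 32 bits.
    """

    BLOCK = 64  # hypothetical block size: larger blocks amortize the bases better

    def __init__(self, offsets):
        self.bases = array('q')  # signed 64-bit absolute base offsets
        self.rel = array('I')    # unsigned ints, 4 bytes on common platforms
        for i, off in enumerate(offsets):
            if i % self.BLOCK == 0:
                self.bases.append(off)
            delta = off - self.bases[-1]
            if not 0 <= delta < 2 ** 32:
                raise ValueError("offset must lie within [base, base + 2**32) of its block base")
            self.rel.append(delta)

    def __len__(self):
        return len(self.rel)

    def __getitem__(self, i):
        # O(1) random access for non-negative i: block base plus stored distance.
        return self.bases[i // self.BLOCK] + self.rel[i]

    def nbytes(self):
        # Payload size of both arrays, ignoring Python object overhead.
        return self.bases.itemsize * len(self.bases) + self.rel.itemsize * len(self.rel)


# Usage: offsets of 1,000 compressed chunks with made-up, varying sizes.
sizes = [5000 + (i * 37) % 3000 for i in range(1000)]
offsets = [0]
for s in sizes[:-1]:
    offsets.append(offsets[-1] + s)
co = CompactOffsets(offsets)
assert all(co[i] == offsets[i] for i in range(len(offsets)))
print(co.nbytes(), "bytes vs", 8 * len(offsets), "bytes as plain 64-bit offsets")
```

The saving here comes from shrinking each entry to 4 bytes plus an amortized share of one base; the 37% figure quoted above refers to a different, unspecified scheme and is not reproduced by this sketch.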

Ariel

On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> 
> 
> I think 16k is a better default, but it should only affect new tables. 
> Whoever changes it, please make sure you think about the upgrade path. 
> 
> 
> > On Oct 12, 2018, at 2:31 AM, Ben Bromhead <ben@xxxxxxxxxxxxxxx> wrote:
> > 
> > This is something that's bugged me for ages; tbh, the performance gain for
> > most use cases far outweighs the increase in memory usage, and I would even
> > be in favor of changing the default now and optimizing the storage cost later
> > (if it's found to be worth it).
> > 
> > For some anecdotal evidence:
> > 4kb is usually what we end up setting it to. 16kb feels more reasonable given
> > the memory impact, but what would be the point if, in practice, most folks
> > set it to 4kb anyway?
> > 
> > Note that the right chunk_length will largely depend on your read sizes, but 4k
> > is the floor for most physical devices in terms of block size.
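To make the dependence on read size concrete: a compressed chunk has to be decompressed as a whole, so a small read pays for every chunk it touches. The sketch below counts those bytes under simplifying assumptions (chunks start at multiples of the chunk length, every touched chunk is decompressed in full, nothing is cached). `bytes_decompressed` is a hypothetical helper, not Cassandra code.

```python
def bytes_decompressed(read_offset, read_len, chunk_kb):
    """Uncompressed bytes decompressed to serve one read of read_len >= 1 bytes.

    Simplifying assumptions: chunks start at multiples of the chunk length,
    every chunk the read touches is decompressed in full, nothing is cached.
    """
    if read_len < 1:
        raise ValueError("read_len must be at least 1 byte")
    chunk = chunk_kb * 1024
    first = read_offset // chunk                  # index of first chunk touched
    last = (read_offset + read_len - 1) // chunk  # index of last chunk touched
    return (last - first + 1) * chunk


# A 1 KiB read at the start of a chunk, for several chunk lengths.
for kb in (64, 16, 4):
    amplification = bytes_decompressed(0, 1024, kb) / 1024
    print(kb, amplification)
# 64 64.0
# 16 16.0
# 4 4.0

# The same 1 KiB read straddling a 4 KiB boundary touches two chunks.
print(bytes_decompressed(3584, 1024, 4))  # 8192
```

Under these assumptions a 1 KiB read decompresses 64x its size at the current 64k default, 16x at 16k and 4x at 4k.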
> > 
> > +1 for making this change in 4.0, given how small the change is and the large
> > improvement to the new user experience (as long as we are explicit in the
> > documentation about memory consumption).
> > 
> > 
> >> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >> 
> >> Hi,
> >> 
> >> This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241
> >> 
> >> This ticket has languished for a while. IMO it's too late in 4.0 to
> >> implement a more memory efficient representation for compressed chunk
> >> offsets. However I don't think we should put out another release with the
> >> current 64k default as it's pretty unreasonable.
> >> 
> >> I propose that we lower the value to 16kb. 4k might never be the correct
> >> default anyway, as there is a cost to compression, and 16k will still be a
> >> large improvement.
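One part of the cost of compression at small chunk sizes is that each chunk is compressed independently, so redundancy spanning chunk boundaries is lost. The sketch below demonstrates that effect using Python's `zlib` as a stand-in (Cassandra's own compressors, LZ4 by default, are not used here) on deliberately contrived input. It is not a benchmark; real ratios depend entirely on the data.

```python
import random
import zlib


def compressed_size(data, chunk_kb):
    """Total bytes when data is split into chunk_kb chunks compressed independently."""
    chunk = chunk_kb * 1024
    return sum(len(zlib.compress(data[i:i + chunk])) for i in range(0, len(data), chunk))


# Contrived input: an 8 KiB block of pseudo-random bytes repeated 16 times.
# The only redundancy is *between* 8 KiB blocks, so 4 KiB chunks cannot see it.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(8 * 1024))
data = block * 16  # 128 KiB

for kb in (64, 16, 4):
    print(f"{kb:>2} KiB chunks: {compressed_size(data, kb):>7} bytes")
```

With 64k or 16k chunks each chunk contains repeats of the block and compresses well; each 4k chunk is pure pseudo-random data and barely compresses at all.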
> >> 
> >> Benedict and Jon Haddad are both +1 on making this change for 4.0. In the
> >> past there has been some consensus about reducing this value, although
> >> perhaps alongside better memory efficiency.
> >> 
> >> The napkin math for what this costs is:
> >> "If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks
> >> at 8 bytes each (128MB).
> >> With 16k chunks, that's 512MB.
> >> With 4k chunks, it's 2GB.
> >> Per terabyte of data (pre-compression)."
> >> 
> >> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
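The napkin math above can be reproduced as follows, assuming binary units (1TB taken as 2^40 bytes, which is what the 16M-chunk figure implies) and one fixed-width 8-byte offset per chunk. `offset_index_bytes` is a hypothetical helper name.

```python
def offset_index_bytes(data_bytes, chunk_kb, bytes_per_offset=8):
    """Memory for one fixed-width offset per compressed chunk."""
    chunk = chunk_kb * 1024
    chunks = -(-data_bytes // chunk)  # ceiling division: a partial chunk still needs an offset
    return chunks * bytes_per_offset


TIB = 1024 ** 4  # "1TB" in the quote, read as 2**40 bytes
for kb in (64, 16, 4):
    mib = offset_index_bytes(TIB, kb) / 1024 ** 2
    print(f"{kb:>2} KiB chunks: {mib:,.0f} MiB per TiB")
# 64 KiB chunks: 128 MiB per TiB
# 16 KiB chunks: 512 MiB per TiB
#  4 KiB chunks: 2,048 MiB per TiB
```

Each 4x reduction in chunk length quadruples the number of chunks, and with fixed-width offsets the memory grows by the same factor.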
> >> 
> >> By way of comparison, memory mapping the files has a similar cost: 8 bytes
> >> per 4k page. Multiple mappings make this more expensive. With a default of
> >> 16kb, chunk offsets would be 4x less expensive than memory mapping a file.
> >> I only mention this to give a sense of the costs we are already paying; I
> >> am not saying they are directly related.
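The 4x figure follows from the same arithmetic: at 8 bytes per 4k page, mapping costs one entry per 4 KiB, while 16k chunks need one offset per 16 KiB. A rough sketch, assuming both costs apply to the same number of bytes (the files actually being mapped may be smaller once compressed, so this is only an order-of-magnitude comparison); the helper names are hypothetical.

```python
PAGE_BYTES = 4 * 1024  # 4 KiB page
ENTRY_BYTES = 8        # per-page and per-chunk cost, as stated in the thread


def ceil_div(a, b):
    return -(-a // b)


def mmap_overhead(data_bytes, mappings=1):
    """Bookkeeping bytes to map data_bytes: one 8-byte entry per page per mapping."""
    return ceil_div(data_bytes, PAGE_BYTES) * ENTRY_BYTES * mappings


def chunk_offset_overhead(data_bytes, chunk_kb):
    """Chunk offset bytes: one 8-byte offset per compressed chunk."""
    return ceil_div(data_bytes, chunk_kb * 1024) * ENTRY_BYTES


TIB = 1024 ** 4
print(mmap_overhead(TIB) // chunk_offset_overhead(TIB, 16))              # 4
print(mmap_overhead(TIB, mappings=2) // chunk_offset_overhead(TIB, 16))  # 8
```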
> >> 
> >> I'll wait a week for discussion and if there is consensus make the change.
> >> 
> >> Regards,
> >> Ariel
> >> 
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >> 
> >> --
> > Ben Bromhead
> > CTO | Instaclustr <https://www.instaclustr.com/>
> > +1 650 284 9692
> > Reliability at Scale
> > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> 
