
Re: Compression Tuning Tutorial


Thank you Jon, great article as usual!


One topic discussed in the article is the filesystem cache, which Cassandra traditionally relies on for data caching (row caching is disabled by default).

IIRC mmap() is used.
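
To make the mmap() point concrete, here is a minimal Python sketch of reading a byte range through a memory mapping. It is only an illustration of the general technique, not Cassandra's actual (Java) read path; the names read_slice_mmap and write_sample_file are made up for this example.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Return up to `length` bytes starting at `offset` via a memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies bytes out of the mapping. Pages that are not yet
            # resident are faulted in by the kernel, which also decides how
            # long they stay cached. The application has no say in eviction.
            return mapped[offset:offset + length]


def write_sample_file(payload):
    """Hypothetical helper: write `payload` to a temp file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


if __name__ == "__main__":
    path = write_sample_file(bytes(range(256)) * 64)
    try:
        print(read_slice_mmap(path, 256, 4))
    finally:
        os.remove(path)
```

The relevant property is that the process never issues an explicit read; caching and eviction are left entirely to the kernel's page cache.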

Some RDBMSs and NoSQL databases instead use direct I/O plus asynchronous I/O and maintain their own DB cache rather than a kernel-managed one, thereby improving overall performance.
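
As a rough sketch of what "maintaining your own DB cache" means, the Python below keeps a small LRU cache of fixed-size file blocks inside the process. The BlockCache class and its parameters are hypothetical and chosen for illustration. It deliberately leaves out the direct I/O and async I/O parts (a real implementation would open the file with something like O_DIRECT and use aligned buffers), so reads here still pass through the kernel page cache.

```python
import os
import tempfile
from collections import OrderedDict


class BlockCache:
    """Tiny application-managed LRU cache of fixed-size file blocks."""

    def __init__(self, path, block_size=4096, capacity=2):
        # buffering=0 gives an unbuffered raw file object; note this is NOT
        # direct I/O, the kernel page cache is still involved.
        self.file = open(path, "rb", buffering=0)
        self.block_size = block_size
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def _block(self, number):
        if number in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(number)  # mark as most recently used
            return self.blocks[number]
        self.misses += 1
        self.file.seek(number * self.block_size)
        data = self.file.read(self.block_size)
        self.blocks[number] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset, length):
        """Return up to `length` bytes at `offset`, served block by block."""
        out = bytearray()
        end = offset + length
        while offset < end:
            number, start = divmod(offset, self.block_size)
            chunk = self._block(number)[start:start + (end - offset)]
            if not chunk:
                break  # reached end of file
            out += chunk
            offset += len(chunk)
        return bytes(out)

    def close(self):
        self.file.close()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"aaaabbbbcccc")
    cache = BlockCache(path, block_size=4, capacity=2)
    print(cache.read(2, 4), cache.hits, cache.misses)
    print(cache.read(0, 4), cache.hits, cache.misses)
    cache.close()
    os.remove(path)
```

The difference from the mmap() approach is that the application, not the kernel, decides block size, capacity, and eviction order, which is the control a DB-managed cache is after.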

As Cassandra is designed to be a database with low response times, the DIO/AIO/DB cache approach seems like it would be a really useful feature.

Just out of curiosity, are there reasons why this more advanced I/O stack wasn't implemented, other than a lack of resources to do it?


Regards,

Kyrill


From: Eric Plowe <eric.plowe@xxxxxxxxx>
Sent: Wednesday, August 8, 2018 9:39:44 PM
To: user@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Compression Tuning Tutorial
 
Great post, Jonathan! Thank you very much. 

~Eric

On Wed, Aug 8, 2018 at 2:34 PM Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
Hey folks,

Over the years we've noticed that people usually create tables with the default compression parameters, and we've spent a lot of time helping teams figure out the right settings for their cluster based on their workload.  I finally managed to write some thoughts down, along with a high-level breakdown of how the internals work, which should help people pick better settings for their cluster.

This post focuses on a mixed 50:50 read:write workload, but the same conclusions hold for a read-heavy workload.  Hopefully this helps some folks get better performance and save some money on hardware!



--
Jon Haddad
Principal Consultant, The Last Pickle