Re: Compaction strategy for update heavy workload


I wouldn't use TWCS if there are updates; you risk having data that's
never deleted and really small SSTables sticking around forever.  If
you use really large buckets, what's the point of TWCS?
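
A rough sketch of why late updates are a problem, assuming TWCS groups data by the time window its write falls into. The `window_start` helper and the one-day window are illustrative assumptions, not Cassandra code:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(write_time, window):
    """Floor a write time to the start of the window that contains it."""
    elapsed = write_time - EPOCH
    return EPOCH + (elapsed // window) * window


# Hypothetical one-day window.
window = timedelta(days=1)

# A row is first written late on June 13 and updated two hours later.
first_write = datetime(2018, 6, 13, 23, 30, tzinfo=timezone.utc)
update = first_write + timedelta(hours=2)

# The two writes fall into different windows, so the row ends up
# spread across SSTables that belong to different buckets.
same_window = window_start(first_write, window) == window_start(update, window)
print(same_window)
```

A wider window would put both writes in the same bucket, but then the buckets stop being small.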

Honestly this is such a small workload you could easily use STCS or
LCS and you'd likely never, ever see a problem.
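
As a back-of-envelope check, here is a sketch of the per-node load. It assumes a replication factor of 3 (not stated in the thread), evenly distributed partitions, and treats every operation as a write, which is the worst case:

```python
def per_node_ops(cluster_ops_per_sec, nodes, replication_factor=3):
    """Estimate operations per second that each node handles.

    Every write lands on `replication_factor` replicas, so the
    cluster-wide rate is multiplied by it and spread over all nodes.
    The default of 3 is an assumption, not a figure from the thread.
    """
    return cluster_ops_per_sec * replication_factor / nodes


typical = per_node_ops(1000, 70)  # typical load from the original post
peak = per_node_ops(2500, 70)     # stated maximum

print(f"typical: {typical:.1f} ops/s per node")
print(f"peak:    {peak:.1f} ops/s per node")
```

Under those assumptions that is roughly 43 operations per second per node normally and about 107 at peak.
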
On Wed, Jun 13, 2018 at 3:34 PM kurt greaves <kurt@xxxxxxxxxxxxxxx> wrote:
>
> TWCS is probably still worth trying. If you mean updating old rows, in TWCS "out of order" updates will only really mean you'll hit more SSTables on read. This might add a bit of complexity in your client if you're bucketing partitions (not strictly necessary), but that's about it. As long as you're not specifying "USING TIMESTAMP" you still get the main benefit of efficient dropping of SSTables: C* only cares about the write timestamp of the data with regard to TTLs, not timestamps stored in your partition/clustering key.
> Also keep in mind that you can specify the window size in TWCS, so if you can increase it enough to cover the "out of order" updates, that will also solve the problem w.r.t. old buckets.
>
> With regard to LCS, the only way to really know if it'll be too much compaction overhead is to test it, but for the most part you should consider your read/write ratio rather than the total number of reads/writes (unless it's so small that it's irrelevant, which it may well be).
>
> On 13 June 2018 at 19:25, manuj singh <s.manuj545@xxxxxxxxx> wrote:
>>
>> Hi all,
>> I am trying to determine a compaction strategy for our use case.
>> In our use case a row will be updated a few times. We also have a TTL defined at the table level.
>> Our typical workload is less than 1000 writes + reads per second. At most it could go up to 2500 per second.
>> We use SSDs and have around 64 GB of RAM on each node. Our cluster size is around 70 nodes.
>>
>> I looked at time window compaction (TWCS), but we can't guarantee that the updates will happen within a given time window. And if we have out of order updates, it might affect when that data is removed from disk.
>>
>> So I was looking at leveled compaction (LCS), which is supposedly good when you have updates. However, it's I/O bound and will affect writes. Everywhere I read, it says it's not good for write heavy workloads.
>> But looking at our write velocity, is it really write heavy?
>>
>> I guess what I am trying to find out is whether leveled compaction will impact writes in our use case, or whether it will be fine given that our write rate is not that high.
>> Also, is there anything else I should keep in mind while deciding on the compaction strategy?
>>
>> Thanks!!
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
