Re: TWCS Compaction backed up


Hi Jeff, mostly lots of little files: there will be 4-5 that are 1-1.5 GB or so, and then many at 5-50 MB and many at 40-50 MB each.

Re incremental repair: yes, one of my engineers started an incremental repair on this column family that we had to abort.  In fact, the node that the repair was initiated on ran out of disk space and we ended up replacing that node like a dead node.

Oddly, the new node is experiencing this issue as well.

-B


On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:
You could toggle off the tombstone compaction to see if that helps, but that should be lower priority than normal compactions

Are the lots-of-little-files from memtable flushes or repair/anticompaction?

Do you do normal deletes? Did you try to run incremental repair?

-- 
Jeff Jirsa
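
A minimal sketch of what toggling off the tombstone compaction could look like, assuming it means setting 'unchecked_tombstone_compaction' back to 'false' (that is one reading of the suggestion, not something the thread confirms). CQL replaces the whole compaction map on ALTER TABLE, so every option has to be restated. The keyspace and table names (my_ks, my_cf) and the helper compaction_cql are hypothetical; the option values are the ones posted later in this thread.

```python
# Compaction options as posted in this thread.
CURRENT = {
    "timestamp_resolution": "MILLISECONDS",
    "unchecked_tombstone_compaction": "true",
    "compaction_window_size": "1",
    "compaction_window_unit": "DAYS",
    "tombstone_compaction_interval": "86400",
    "tombstone_threshold": "0.2",
    "class": "com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy",
}


def compaction_cql(keyspace, table, options):
    """Render an ALTER TABLE statement that sets the full compaction map."""
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


if __name__ == "__main__":
    # Assumption: "toggle off" means reverting this one option to 'false'.
    changed = dict(CURRENT, unchecked_tombstone_compaction="false")
    # my_ks / my_cf are placeholder names, not from the thread.
    print(compaction_cql("my_ks", "my_cf", changed))
```

The printed statement would be run through cqlsh; it is worth reading it over before applying, since a missing option silently falls back to its default.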


On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spindler@xxxxxxxxx> wrote:

Hi Jonathan, both I believe.  

The window size is 1 day, full settings: 
    AND compaction = {'timestamp_resolution': 'MILLISECONDS', 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'} 
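
For reference, a rough sketch of how a 1-day window buckets data, assuming TWCS floors each sstable's maximum timestamp to the start of its window (my understanding of the strategy, not copied from its source). The function names are hypothetical. With 'timestamp_resolution': 'MILLISECONDS', timestamps are taken as epoch milliseconds.

```python
from datetime import datetime, timezone

# Seconds per window unit.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_lower_bound(timestamp_ms, unit="DAYS", size=1):
    """Return the start (epoch ms) of the window that contains timestamp_ms."""
    window_s = UNIT_SECONDS[unit] * size
    ts_s = timestamp_ms // 1000
    return (ts_s - ts_s % window_s) * 1000


def same_window(a_ms, b_ms, unit="DAYS", size=1):
    """True if both timestamps land in the same compaction window."""
    return window_lower_bound(a_ms, unit, size) == window_lower_bound(b_ms, unit, size)


if __name__ == "__main__":
    ts = int(datetime(2018, 8, 7, 20, 4, tzinfo=timezone.utc).timestamp() * 1000)
    start = window_lower_bound(ts)
    print(datetime.fromtimestamp(start / 1000, tz=timezone.utc).isoformat())
```

Under that assumption, sstables whose newest data falls on the same UTC day are candidates to end up in one file per day, which matches the "about 1 file per day" behaviour described further down the thread.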


nodetool tpstats 

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0    68582241832         0                 0
ReadStage                         0         0      209566303         0                 0
RequestResponseStage              0         0    44680860850         0                 0
ReadRepairStage                   0         0       24562722         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
HintedHandoff                     1         1            203         0                 0
GossipStage                       0         0        8471784         0                 0
CacheCleanupExecutor              0         0            122         0                 0
InternalResponseStage             0         0         552125         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                8        42        1433715         0                 0
ValidationExecutor                0         0           2521         0                 0
MigrationStage                    0         0         527549         0                 0
AntiEntropyStage                  0         0           7697         0                 0
PendingRangeCalculator            0         0             17         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0         116966         0                 0
MemtablePostFlush                 0         0         209103         0                 0
MemtableReclaimMemory             0         0         116966         0                 0
Native-Transport-Requests         1         0     1715937778         0            176262

Message type           Dropped
READ                         2
RANGE_SLICE                  0
_TRACE                       0
MUTATION                  4390
COUNTER_MUTATION             0
BINARY                       0
REQUEST_RESPONSE          1882
PAGED_RANGE                  0
READ_REPAIR                  0
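
A small sketch for pulling the backlog out of output like the above, in case it helps to track it over time. It assumes the two-section layout shown here (a "Pool Name" table followed by a "Message type" table); parse_tpstats and backlog are hypothetical helper names, and SAMPLE is a few rows copied from the output above.

```python
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0    68582241832         0                 0
HintedHandoff                     1         1            203         0                 0
CompactionExecutor                8        42        1433715         0                 0

Message type           Dropped
READ                         2
MUTATION                  4390
"""


def parse_tpstats(text):
    """Parse `nodetool tpstats` text into (pools, dropped) dictionaries."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Pool":        # header of the thread pool table
            section = "pools"
        elif parts[0] == "Message":   # header of the dropped-message table
            section = "dropped"
        elif section == "pools" and len(parts) == 6:
            active, pending, completed, blocked, all_blocked = map(int, parts[1:])
            pools[parts[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_blocked,
            }
        elif section == "dropped" and len(parts) == 2:
            dropped[parts[0]] = int(parts[1])
    return pools, dropped


def backlog(pools):
    """Return (pool, pending) pairs with pending > 0, largest first."""
    pending = [(name, stats["pending"]) for name, stats in pools.items() if stats["pending"] > 0]
    return sorted(pending, key=lambda item: -item[1])


if __name__ == "__main__":
    pools, dropped = parse_tpstats(SAMPLE)
    print(backlog(pools))
    print({name: count for name, count in dropped.items() if count})
```

On the numbers above, CompactionExecutor is the only pool with a meaningful queue (8 active, 42 pending).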


On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
What's your window size?

When you say backed up, how are you measuring that?  Are there pending tasks or do you just see more files than you expect?

On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spindler@xxxxxxxxx> wrote:
Hey guys, quick question: 
 
I've got a Cassandra 2.1 cluster, 12 nodes on AWS i3.2xl, commit log on one drive, data on NVMe.  That was working very well; it's a time-series DB and has been accumulating data for about 4 weeks.

The nodes have increased in load and compaction seems to be falling behind.  I used to get about 1 file per day for this column family, a ~30 GB Data.db file per day.  I am now getting hundreds per day at 1-50 MB.

How do I recover from this?

I can scale out to give some breathing room, but will it go back and compact the old days into nicely packed files for the day?

I tried raising compaction throughput from 256 to 1000 and it seemed to make things worse for the CPU; it's configured with 8 compaction threads on i3.2xl.

-B

Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to get rid of old tombstones. However, running repairs in 2.1 on TWCS column families causes a very large spike in sstable counts due to anti-compaction, which causes a lot of disruption. Is there any other way?




--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade