replicated data in different sstables


It seems to me that if the replicated data a node manages were kept in
separate sstables from the "main" data it manages, then when a new node
comes online it would be easier to discard the data the existing node is no
longer responsible for, because that responsibility has shifted a slot
down the ring.

Generally speaking, I've been asking lots of questions about sstables, and
most of my suggestions would increase the number of them. It is my
impression that the total size of the bloom filters is linearly
proportional to the number of hash keys contained in the sstables of a
particular node. Is that true?
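
To make the question concrete, here is a rough sketch of the arithmetic
behind it. It assumes the textbook Bloom filter sizing formula,
m = -n * ln(p) / (ln 2)^2 bits for n keys at false-positive rate p, and an
even split of keys across sstables. The function names and the 1% rate are
illustrative assumptions only; the actual implementation may round or
allocate differently.

```python
import math


def bloom_bits(num_keys: int, fp_rate: float) -> int:
    """Bits needed by an optimally sized Bloom filter (textbook formula)."""
    if num_keys <= 0:
        return 0
    return math.ceil(-num_keys * math.log(fp_rate) / (math.log(2) ** 2))


def total_bits_when_split(num_keys: int, num_sstables: int, fp_rate: float) -> int:
    """Total filter bits if the same keys are spread evenly over several sstables."""
    base, extra = divmod(num_keys, num_sstables)
    # The first `extra` sstables get one more key so the counts sum to num_keys.
    sizes = [base + (1 if i < extra else 0) for i in range(num_sstables)]
    return sum(bloom_bits(n, fp_rate) for n in sizes)


if __name__ == "__main__":
    keys, fp = 1_000_000, 0.01  # hypothetical node: 1M keys, 1% false positives
    print("one sstable :", bloom_bits(keys, fp), "bits")
    print("ten sstables:", total_bits_when_split(keys, 10, fp), "bits")
```

Under those assumptions the bit count depends only on the number of keys
and the target false-positive rate (a little under 10 bits per key at 1%),
so splitting the same keys over more sstables changes the total only by
rounding. What it does change is the number of separate filters a read may
have to consult.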

Do we also want to avoid massive numbers of sstables, mostly because of
filesystem/inode problems? I ask because the end state of my suggestions,
that sstables be segmented by rack, by primary/replicated data, and possibly
by application-specific separations, would mean perhaps 5-10x more sstables,
even though the absolute amount of data and the number of partition keys
wouldn't change.