
Re: replicated data in different sstables


Oh duh, RACS does this already. But it would still be nice to get some
education on the bloom filter memory use vs. number of sstables question.
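On the bloom filter question, one rough way to reason about it is the textbook Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive chance p. The sketch below is an illustration under stated assumptions, not Cassandra's actual implementation: it treats each sstable's filter as a classic Bloom filter sized with that formula, assumes keys are split evenly with no partition appearing in more than one sstable, and uses p = 0.01 only as an example value (in Cassandra the target comes from the per-table `bloom_filter_fp_chance` setting). The helper names are hypothetical.

```python
import math


def bloom_bits(n_keys: int, fp_chance: float) -> int:
    """Bits a classic Bloom filter needs for n_keys at a target false-positive
    chance, using the textbook optimum m = -n * ln(p) / (ln 2)^2."""
    if n_keys <= 0:
        return 0
    return math.ceil(-n_keys * math.log(fp_chance) / math.log(2) ** 2)


def total_filter_bytes(n_keys: int, n_sstables: int, fp_chance: float) -> int:
    """Total filter size when n_keys distinct partitions are split evenly over
    n_sstables, each sstable carrying its own filter (assumption: no overlap)."""
    per_table = math.ceil(n_keys / n_sstables)
    return math.ceil(n_sstables * bloom_bits(per_table, fp_chance) / 8)


def chance_any_false_positive(fp_chance: float, tables_checked: int) -> float:
    """Chance that at least one of `tables_checked` filters reports a false
    positive for a key none of them contains (filters assumed independent)."""
    return 1 - (1 - fp_chance) ** tables_checked


if __name__ == "__main__":
    keys = 100_000_000  # hypothetical partition count on one node
    for tables in (10, 50, 100):
        mb = total_filter_bytes(keys, tables, 0.01) / 1e6
        fp = chance_any_false_positive(0.01, tables)
        print(f"{tables:>4} sstables: {mb:7.1f} MB of filters, "
              f"P(>=1 false positive if all checked) = {fp:.3f}")
```

Under those assumptions, total filter memory tracks the total key count and barely moves when the same keys are spread over more sstables; what grows is per-file overhead and, if a read has to consult many filters, the chance of at least one false positive. With range-based segmentation such as RACS, a read would presumably only consult the sstables covering the key's token range, so the all-checked figure likely overstates the cost.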

On Wed, Jul 25, 2018 at 10:41 AM Carl Mueller <carl.mueller@xxxxxxxxxxxxxxx>
wrote:

> It would seem to me that if the replicated data managed by a node were in
> separate sstables from the "main" data it manages, then when a new node
> came online it would be easier to discard the data the node is no longer
> responsible for, since responsibility for it shifted a slot down the ring.
>
> Generally speaking, I've been asking lots of questions about sstable
> layouts that would increase the number of sstables. My impression is that
> the size of the bloom filters is linearly proportional to the number of
> hash keys contained in a particular node's sstables. Is that true?
>
> Do we also want to avoid massive numbers of sstables mostly because of
> filesystem/inode problems? I ask because the end state of my suggestions,
> segmenting sstables by RACS, by primary vs. replicated data, and possibly
> by application-specific criteria, would mean something like 5-10x more
> sstables, even though the absolute amount of data and the number of
> partition keys wouldn't change.
>
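The ring-shift idea in the quoted message can be sketched roughly as follows. This assumes one token per node (no vnodes) and SimpleStrategy-style placement, where a range is stored on its owning node plus the next rf-1 nodes clockwise; the tokens, node names, and functions are all hypothetical. The sketch splits what a node stores into its primary arc and its replica-only arc, then checks which tokens a node stops storing after a new node joins.

```python
from typing import Dict, Tuple

Arc = Tuple[int, int]  # token range (start, end]; start > end means it wraps


def predecessor(ring: Dict[str, int], node: str, steps: int) -> str:
    """Node `steps` positions counter-clockwise from `node` (one token per node)."""
    ordered = sorted(ring, key=lambda n: ring[n])
    return ordered[(ordered.index(node) - steps) % len(ordered)]


def arcs(ring: Dict[str, int], node: str, rf: int) -> Tuple[Arc, Arc]:
    """Split what `node` stores into (primary arc, replica-only arc)."""
    if not 2 <= rf < len(ring):
        raise ValueError("sketch assumes 2 <= rf < number of nodes")
    p1 = ring[predecessor(ring, node, 1)]    # previous node's token
    prf = ring[predecessor(ring, node, rf)]  # token of the node rf positions back
    return (p1, ring[node]), (prf, p1)


def in_arc(token: int, arc: Arc) -> bool:
    start, end = arc
    if start < end:
        return start < token <= end
    return token > start or token <= end  # arc wraps around the ring


def stores(ring: Dict[str, int], node: str, rf: int, token: int) -> bool:
    primary, replica = arcs(ring, node, rf)
    return in_arc(token, primary) or in_arc(token, replica)


if __name__ == "__main__":
    old = {"A": 0, "B": 100, "C": 200, "D": 300}
    new = {**old, "E": 150}  # hypothetical node joining between B and C
    print(arcs(old, "C", 3))  # ((100, 200), (300, 100))
    print(arcs(new, "C", 3))  # ((150, 200), (0, 150))
    dropped = [t for t in range(-50, 400, 10)
               if stores(old, "C", 3, t) and not stores(new, "C", 3, t)]
    old_replica = arcs(old, "C", 3)[1]
    print("C can drop", len(dropped), "sample tokens; all in old replica arc:",
          all(in_arc(t, old_replica) for t in dropped))
```

In this toy ring, when E joins at token 150 with rf = 3, node C stops storing the arc (300, 0], which lay entirely inside its old replica-only arc, so under these assumptions that data could be dropped by deleting replica sstables rather than rewriting primary ones. C's primary arc also shrinks from (100, 200] to (150, 200], so data written as "primary" can later become replica data; today this kind of cleanup is what `nodetool cleanup` handles by rewriting sstables.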