You are correct that the cluster decides where data goes (based on the hash of the partition key). However, if you choose a “bad” partition key, you may not get good distribution of the data, because the hash is deterministic (it always goes to the same nodes/replicas). For example, if you have a partition key of a datetime, it is possible that there is more data written for a certain time period – thus a larger partition and an imbalance across the cluster. Choosing a “good” partition key is one of the most important decisions for a Cassandra table.
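The effect of a deterministic hash on a skewed key can be sketched in a few lines of Python. This is only an illustration, not Cassandra's actual placement logic: MD5 modulo the node count stands in for the Murmur3 token ring, and the node names, key formats, and write counts are all made up for the example.

```python
import hashlib
from collections import Counter

# Hypothetical four-node cluster; names are placeholders.
NODES = ["node1", "node2", "node3", "node4"]


def owner(partition_key: str) -> str:
    """Map a partition key to a node.

    Assumption: MD5 modulo node count is a simplified stand-in for
    Cassandra's Murmur3 tokens and ring ownership. The point is only
    that the mapping is deterministic.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def rows_per_node(writes):
    """Count how many written rows land on each node."""
    counts = Counter({node: 0 for node in NODES})
    for key in writes:
        counts[owner(key)] += 1
    return counts


# Case 1: many distinct keys, one row each (high-cardinality key).
even_writes = [f"user-{i}" for i in range(10_000)]

# Case 2: a date as the key, where one busy day receives most writes.
skewed_writes = ["2018-06-18"] * 9_000 + [
    f"2018-05-{day:02d}" for day in range(1, 11)
] * 100

print("distinct keys:", dict(rows_per_node(even_writes)))
print("date keys:    ", dict(rows_per_node(skewed_writes)))
```

In the second case every row for the busy day hashes to the same node, so that node holds at least 9,000 of the 10,000 rows no matter how evenly the hash spreads distinct keys.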
Also, I have seen the use of racks in the topology cause an imbalance in the “first” node of the rack.
To help you more, we would need the CREATE TABLE statement(s) for your keyspace and the topology of the cluster (for example, the output of nodetool status).
We do not choose the node where a partition will go. I thought it was the snitch's role to choose replica nodes. Even the partition size does not vary on our largest column family:
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         0.00      17.08          61.21         3311            1
75%         0.00      20.50          88.15         3973            1
95%         0.00      35.43          105.78        3973            1
98%         0.00      42.51          126.93        3973            1
99%         0.00      51.01          126.93        3973            1
Min         0.00      3.97           17.09         61
Max         0.00      73.46          126.93        11864           1
We are kind of stuck here trying to identify what could be causing this imbalance.
On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <jgalbraith@xxxxxxxxxxxx.INVALID> wrote:
>If it were a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.
On Mon, Jun 18, 2018 at 6:07 PM, learner dba <email@example.com> wrote: