
Re: Re: Data model storage optimization


Things to consider:
row size: large or not
update frequency: a lot or not (in Cassandra an update is actually an insert)
read heavy or not
overall read performance

If the row size is large, you may consider a separate user_detail table and add an id column to all tables. On the application side, merge/join by id. The price is paid on reads: a second query to user_detail.
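A minimal sketch of that application-side join, assuming a slim "users" table and a wide "user_detail" table both keyed by the same id (table and column names here are only illustrative; uses the Python cassandra-driver):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')

    def load_user(user_id):
        # first query: the small, frequently read row
        user = session.execute(
            "SELECT id, username, email FROM users WHERE id = %s",
            (user_id,)).one()
        if user is None:
            return None
        # second query: the large detail row -- this is the extra read price
        detail = session.execute(
            "SELECT payload FROM user_detail WHERE id = %s",
            (user_id,)).one()
        return user, detail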

Just my 2 cents, hope it helps.

Thanks,

James


On Sun, Jul 29, 2018 at 11:20 PM, onmstester onmstester <onmstester@xxxxxxxx> wrote:

How many rows on average per partition?
Around 10K.


Let me get this straight: you are bifurcating your partitions on either email or username, potentially doubling the data because you don't have a way to manage a central system of record of users?

We are just analyzing the output logs of a "perfectly" running application, so no one will let me change its data design. I thought this might be a more general problem for Cassandra users, where someone both
1. needed to access an identical set of columns by multiple keys (all of the keys are present in every row), and
2. had a storage limit (because TTL * input rate would amount to some TBs).
I know there is a strict rule in Cassandra data modeling, "never use foreign keys; sacrifice disk instead", but has anyone ever been forced to do such a thing, and how?
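For illustration only, one way such a lookup-table ("foreign key") layout can look; the table and column names are hypothetical, not from the thread. The wide log columns are stored once, keyed by an id, and the secondary keys (username, email) map to that id, so only the small id is duplicated instead of the whole row:

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('logs')

    # Assumed schema (created once elsewhere):
    #   CREATE TABLE data_by_id     (id uuid PRIMARY KEY, col1 text, col2 text);
    #   CREATE TABLE id_by_username (username text PRIMARY KEY, id uuid);
    #   CREATE TABLE id_by_email    (email text PRIMARY KEY, id uuid);

    def lookup_by_username(username):
        # first read: resolve the secondary key to the id
        row = session.execute(
            "SELECT id FROM id_by_username WHERE username = %s",
            (username,)).one()
        if row is None:
            return None
        # second read: the single stored copy of the wide row --
        # this extra round trip is the read price traded for disk space
        return session.execute(
            "SELECT * FROM data_by_id WHERE id = %s",
            (row.id,)).one()

Disk-wise this stores the big columns once per record; the trade-off is two partition reads per lookup instead of one.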