git.net

Saving distinct data in Cassandra results in many tombstones


Hi,

I need to save a distinct value for each key in each hour. The problem with saving everything and computing the distinct values in memory is that
there is too much repeated data.
Table schema:
CREATE TABLE distinct (
    hourNumber int,
    key text,
    distinctValue bigint,
    PRIMARY KEY (hourNumber, key)
);

I want to retrieve the distinct count of all keys in a specific hour; with this data model, that can be done by reading a single partition.
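
A minimal sketch of the read described above, assuming the table and column names from the schema; the hour value 42 is only a placeholder:

```cql
-- Read every row stored for one hour (a single-partition read).
SELECT key, distinctValue FROM distinct WHERE hourNumber = 42;

-- Or, if only the number of rows in that hour is needed:
SELECT COUNT(*) FROM distinct WHERE hourNumber = 42;
```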
The problem: I can't read from this table. system.log indicates that more than 100K tombstones were read, with no live data among them. The gc_grace time is
the default (10 days), so I thought of decreasing it to 1 hour and running compaction, but is this the right approach at all? I mean, does the whole idea of
replacing some millions of rows, each about 10 times, in a partition again and again, which creates a lot of tombstones, make sense just to achieve distinct behavior?
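
For reference, the change being considered would look roughly like this in CQL, assuming the table name from the schema (3600 seconds is 1 hour; the default of 864000 seconds is the 10 days mentioned above). It would then be followed by a compaction of the table, for example with nodetool compact:

```cql
-- Shorten the tombstone grace period from the 10-day default to 1 hour.
ALTER TABLE distinct WITH gc_grace_seconds = 3600;
```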

Thanks in advance





