Re: secondary index table - tombstones surviving compactions


Hi,

I apologise for the late response; I wanted to run some further tests so
that I could provide more information.

@Jeff, no, I don't set the "only_purge_repaired_tombstones" option, so it
should be at its default (false).
And no, I don't run repairs during the tests.

@Eric, I understand that rapid deletes/inserts are something of an
antipattern; nevertheless, I'm not experiencing any problems with them
(except for the secondary indexes).

Update: I ran a new test in which I delete the indexed columns explicitly
and then delete the whole row at the end.
Surprisingly, this test scenario works fine. Using nodetool flush +
compact (in order to expedite the test) seems to always purge the index
table.
So that's great, because I seem to have found a workaround. On the other
hand, could there be a bug in Cassandra that leaks index table entries?
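
A minimal sketch of the delete ordering used in that workaround, written as CQL statement strings for readability. The actual test goes through the Thrift interface, so the CQL form, the helper name delete_row_statements, and the table name ks.data are assumptions made only for illustration.

```python
def delete_row_statements(table, key_column, key, indexed_columns):
    """Build the deletes for one row: indexed cells first, then the row.

    Hypothetical helper; it only formats strings and does not talk to
    Cassandra. The key is assumed to be an integer literal.
    """
    statements = []
    if indexed_columns:
        cols = ", ".join(indexed_columns)
        # Explicit cell-level delete of every indexed column.
        statements.append(
            f"DELETE {cols} FROM {table} WHERE {key_column} = {key};"
        )
    # The usual whole-row delete comes last.
    statements.append(f"DELETE FROM {table} WHERE {key_column} = {key};")
    return statements


for stmt in delete_row_statements("ks.data", "key", 42, ["column1", "column2"]):
    print(stmt)
```

Whether the cell-level delete has to be issued separately from the row delete, or merely before it, is not something the tests so far establish.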

Test details:
Create table with LeveledCompactionStrategy;
'tombstone_compaction_interval': 60; gc_grace_seconds=60
There are two indexed columns for comparison: column1, column2
Insert keys {1..x} with random values in column1 & column2
Delete {key:column2}     (but not column1)
Delete {key}
Repeat n-times from the inserts
Wait 1 minute
nodetool flush
nodetool compact (sometimes compact <keyspace> <table.index>)
nodetool cfstats
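
The steps above can be sketched as a small script that prints the statements and commands for one run. This is an illustration under assumptions rather than the exact test harness: the keyspace and table names (test_ks, data), the index names, the text column types, and the use of CQL instead of Thrift are all hypothetical, and COMPACT STORAGE is left out for brevity.

```python
import random

KEYSPACE = "test_ks"  # hypothetical keyspace name
TABLE = "data"        # hypothetical table name


def schema_statements():
    """Table with LCS, short tombstone interval and gc_grace, two indexes."""
    return [
        f"CREATE TABLE {KEYSPACE}.{TABLE} "
        "(key int PRIMARY KEY, column1 text, column2 text) "
        "WITH compaction = {'class': 'LeveledCompactionStrategy', "
        "'tombstone_compaction_interval': '60'} "
        "AND gc_grace_seconds = 60;",
        f"CREATE INDEX {TABLE}_column1_idx ON {KEYSPACE}.{TABLE} (column1);",
        f"CREATE INDEX {TABLE}_column2_idx ON {KEYSPACE}.{TABLE} (column2);",
    ]


def cycle_statements(x, rng):
    """One insert/delete cycle over keys 1..x (repeated n times in the test)."""
    out = []
    for key in range(1, x + 1):
        v1 = f"{rng.getrandbits(64):016x}"  # random value for column1
        v2 = f"{rng.getrandbits(64):016x}"  # random value for column2
        out.append(
            f"INSERT INTO {KEYSPACE}.{TABLE} (key, column1, column2) "
            f"VALUES ({key}, '{v1}', '{v2}');"
        )
    for key in range(1, x + 1):
        # Only column2 is deleted explicitly; column1 is left to the row delete.
        out.append(f"DELETE column2 FROM {KEYSPACE}.{TABLE} WHERE key = {key};")
        out.append(f"DELETE FROM {KEYSPACE}.{TABLE} WHERE key = {key};")
    return out


def nodetool_commands():
    """Commands run after the last cycle, in order."""
    return [
        "sleep 60",  # wait out gc_grace_seconds
        "nodetool flush",
        f"nodetool compact {KEYSPACE} {TABLE}",
        # Index-table form, as written in the steps above.
        f"nodetool compact {KEYSPACE} {TABLE}.{TABLE}_column1_idx",
        "nodetool cfstats",
    ]


for line in schema_statements() + cycle_statements(3, random.Random(0)):
    print(line)
for cmd in nodetool_commands():
    print(cmd)
```

The script only prints text; feeding the statements to cqlsh and running the commands on a node is left to the reader.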

What I observe is that the data table is empty, the column2 index table is
also empty, and the column1 index table has non-zero (leaked) "space used"
and "estimated rows".

Roman

On 18 May 2018 at 16:13, Jeff Jirsa <jjirsa@xxxxxxxxx> wrote:

> This would matter for the base table, but would be less likely for the
> secondary index, where the partition key is the value of the base row
>
> Roman: there’s a config option related to only purging repaired tombstones
> - do you have that enabled ? If so, are you running repairs?
>
> --
> Jeff Jirsa
>
>
> > On May 18, 2018, at 6:41 AM, Eric Stevens <mightye@xxxxxxxxx> wrote:
> >
> > The answer to Question 3 is "yes."  One of the more subtle points about
> > tombstones is that Cassandra won't remove them during compaction if there
> > is a bloom filter on any SSTable on that replica indicating that it
> > contains the same partition (not primary) key.  Even if it is older than
> > gc_grace, and would otherwise be a candidate for cleanup.
> >
> > If you're recycling partition keys, your tombstones may never be able to be
> > cleaned up, because in this scenario there is a high probability that an
> > SSTable not involved in that compaction also contains the same partition
> > key, and so compaction cannot have confidence that it's safe to remove the
> > tombstone (it would have to fully materialize every record in the
> > compaction, which is too expensive).
> >
> > In general it is an antipattern in Cassandra to write to a given partition
> > indefinitely for this and other reasons.
> >
> > On Fri, May 18, 2018 at 2:37 AM Roman Bielik <
> > roman.bielik@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> I have a Cassandra 3.11 table (with compact storage) and am using secondary
> >> indices with rather unique data stored in the indexed columns. There are
> >> many inserts and deletes, so in order to avoid tombstones piling up I'm
> >> re-using primary keys from a pool (which works fine).
> >> I'm aware that this design pattern is not ideal, but for now I can not
> >> change it easily.
> >>
> >> The problem is, the size of 2nd index tables keeps growing (filled with
> >> tombstones) no matter what.
> >>
> >> I tried some aggressive configuration (just for testing) in order to
> >> expedite the tombstone removal but with little-to-zero effect:
> >> COMPACTION = { 'class':
> >> 'LeveledCompactionStrategy', 'unchecked_tombstone_compaction': 'true',
> >> 'tombstone_compaction_interval': 600 }
> >> gc_grace_seconds = 600
> >>
> >> I'm aware that perhaps Materialized views could provide a solution to this,
> >> but I'm bound to the Thrift interface, so cannot use them.
> >>
> >> Questions:
> >> 1. Is there something I'm missing? How come compaction does not remove the
> >> obsolete indices/tombstones from 2nd index tables? Can I trigger the
> >> cleanup manually somehow?
> >> I have tried nodetool flush, compact, rebuild_index on both data table and
> >> internal Index table, but with no result.
> >>
> >> 2. When deleting a record I'm deleting the whole row at once, which would
> >> create one tombstone for the whole record if I'm correct. Would it help to
> >> delete the indexed columns separately, creating an extra tombstone for each
> >> cell?
> >> As I understand the underlying mechanism, the indexed column value must be
> >> read in order for a proper tombstone to be created for the index.
> >>
> >> 3. Could the fact that I'm reusing the primary key of a deleted record
> >> shortly for a new insert interact with the secondary index tombstone
> >> removal?
> >>
> >> Will be grateful for any advice.
> >>
> >> Regards,
> >> Roman
> >>

-- 
<http://www.openmindnetworks.com>
<https://www.linkedin.com/company/openmind-networks>
<https://twitter.com/Openmind_Ntwks>