We use Cassandra in a non-conventional way: our data is short-lived (a life cycle of about 20-30 minutes), during which each record is updated ~5 times and then deleted. Our GC grace period is 15 minutes (gc_grace_seconds = 900).
We are seeing two problems:
1.) Some Cassandra nodes go down. We remove them from the cluster with the nodetool removenode command and replace the dead nodes with new ones. While the new nodes are joining, more nodes are reported as down (even though they are not actually down), and we see the following error in the logs:
“Gossip not settled after 321 polls. Gossip Stage active/pending/completed: 1/816/0”
To fix the issue, I restarted those nodes. They now appear to be up and the problem is resolved.
Could this problem be related to CASSANDRA-6590 (https://issues.apache.org/jira/browse/CASSANDRA-6590)?
2.) Meanwhile, after restarting the nodes mentioned above, we see some old, already-deleted data being resurrected (due to the short life cycle of our data). My current guess is that this data is resurrected by hinted handoff. Interestingly, the data keeps resurrecting at periodic intervals (roughly every hour) and then finally stops. Could this be caused by hinted handoff? If so, is there a setting we can use to specify “invalidate hinted handoff data after 5-10 minutes”?
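The suspicion in point 2 comes down to timing relative to GC grace. Below is a minimal sketch of that timing, assuming a toy single-key, last-write-wins replica in which compaction purges a tombstone as soon as it is older than the 15-minute GC grace (measured here from the delete's write timestamp, a simplification). The names ToyReplica and replay_late_update are hypothetical and are not Cassandra internals. A stale update that is re-applied (for example from a hint) before the purge is shadowed by the tombstone; the same update re-applied after the purge brings the record back.

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_S = 15 * 60  # 15-minute GC grace, as in the question


@dataclass
class Cell:
    value: Optional[str]  # None means this cell is a tombstone (a delete)
    write_ts: float       # client write timestamp, in seconds


class ToyReplica:
    """Hypothetical last-write-wins store for one key; not Cassandra code."""

    def __init__(self) -> None:
        self.cell: Optional[Cell] = None

    def apply(self, cell: Cell) -> None:
        # Last write wins: a mutation only takes effect if its timestamp is newer.
        if self.cell is None or cell.write_ts > self.cell.write_ts:
            self.cell = cell

    def compact(self, now: float) -> None:
        # Simplified compaction: drop a tombstone once it is older than GC grace.
        if self.cell is not None and self.cell.value is None:
            if now - self.cell.write_ts >= GC_GRACE_S:
                self.cell = None

    def read(self) -> Optional[str]:
        if self.cell is None or self.cell.value is None:
            return None
        return self.cell.value


def replay_late_update(delivery_delay_s: float) -> Optional[str]:
    """One record: an update written at t=0 missed this replica (assumed kept
    as a hint), the delete reaches it at t=20 min, and the stale update is
    replayed delivery_delay_s seconds after the delete."""
    replica = ToyReplica()
    delete_at = 20 * 60
    replica.apply(Cell(value=None, write_ts=delete_at))  # the delete arrives
    replay_at = delete_at + delivery_delay_s
    replica.compact(now=replay_at)  # compaction runs just before the replay
    replica.apply(Cell(value="v5", write_ts=0))  # stale update, old timestamp
    return replica.read()


print(replay_late_update(5 * 60))   # None: tombstone still shadows the update
print(replay_late_update(30 * 60))  # v5: tombstone purged, record resurrected
```

The model deliberately ignores hint expiry, read repair and multiple replicas. It only shows why any write re-applied more than GC grace after the delete can resurrect data, which is a narrow window with a 15-minute GC grace and a 20-30 minute record life cycle.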