git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Data Corruption due to multiple Cassandra 2.1 processes?


Hello,

 

with 2.1, in case a second Cassandra process/instance is started on a host (by accident), may this result in some sort of corruption, although Cassandra will exit at some point in time due to not being able to bind TCP ports already in use?

 

What we have seen in this scenario is something like that:

 

ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting local jmx server:

java.rmi.server.ExportException: Port already in use: 7199; nested exception is:

                java.net.BindException: Address already in use (Bind failed)

 

But then continuing with stuff like opening system and even user tables:

 

INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key cache with capacity of 100 MBs.

INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row cache with capacity of 0 MBs

INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing counter cache with capacity of 50 MBs

INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter cache save to every 7200 seconds (going to save all keys).

INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing system.sstable_activity

INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165 (2023 bytes)

INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167 (2336 bytes)

INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166 (2686 bytes)

INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing system.hints

INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377 (46210621 bytes)

INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing system.compaction_history

INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129 (91269 bytes)

 

Replaying commit logs:

 

INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log

INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log (CL version 4, messaging version 8)

 

Even writing memtables already (below just pasted system tables, but also user tables):

 

INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 ops, 2%/0% of on/off-heap limit)

INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)

 

Until it comes to a point where it can’t bind ports like the storage port 7000:

 

ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal configuration error

org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services

                at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:495) ~[apache-cassandra-2.1.18.jar:2.1.18]

 

Until Cassandra stops:

 

INFO  [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 Gossiper.java:1454 - Announcing shutdown

 

 

So, we have around 2 minutes where Cassandra is mangling with existing data, although it shouldn’t.

 

Sounds like a potential candidate for data corruption, right? E.g. later on we then see things like (still while being in progress to shutdown?):

 

WARN  [SharedPool-Worker-1] 2018-08-05 21:11:58,181 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}

java.lang.RuntimeException: java.io.FileNotFoundException: /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db (No such file or directory)

                at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52) ~[apache-cassandra-2.1.18.jar:2.1.18]

                at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.18.jar:2.1.18]

                at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.18.jar:2.1.18]

 

 

I found this one here: https://issues.apache.org/jira/browse/CASSANDRA-11540

 

So, if this all leads to corruption, might this be a candidate for a backport for a 2.1 bugfix release?

 

Thanks a lot!

 

Thomas

 

The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria, Freistädterstraße 313