Repairing OpsCenter rollups60 Results in Snapshot Errors

Question

I am running a 6 node cluster using Apache Cassandra 2.1.2 with DataStax OpsCenter 5.0.2 from the AWS EC2 AMI "DataStax Auto-Clustering AMI 2.5.1-hvm" (DataStax Community AMI). When I try to run a repair on the rollups60 column family in the OpsCenter keyspace, I get errors about failed snapshot creation in the Cassandra system log. The repair seems to continue, though it hasn't finished yet.

I am wondering whether this is making the repair ineffectual, or whether I can expect it to finish at all.

I am running the command

nodetool repair OpsCenter rollups60

on one of the nodes (10.63.74.70). From the command, I've gotten this output so far:

[2015-01-23 19:36:06,261] Starting repair command #9, repairing 511 ranges for keyspace OpsCenter (seq=true, full=true)

And here is an example of what I see in the log:

INFO  [AntiEntropyStage:1] 2015-01-23 19:38:28,235 RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Received merkle tree for rollups60 from /10.63.74.70
INFO  [AntiEntropySessions:9] 2015-01-23 19:38:28,236 RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
INFO  [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190 and /10.63.74.70 have 1 range(s) out of sync for rollups60
INFO  [AntiEntropyStage:1] 2015-01-23 19:38:28,237 ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%) on-heap, 0 (0%) off-heap
INFO  [MemtableFlushWriter:25] 2015-01-23 19:38:28,238 Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized bytes, 1395 ops, 0%/0% of on/off-heap limit)
INFO  [RepairJobTask:3] 2015-01-23 19:38:28,239 StreamingRepairTask.java:68 - [streaming task #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1 ranges with /10.13.157.190
INFO  [MemtableFlushWriter:25] 2015-01-23 19:38:28,262 Memtable.java:364 - Completed flushing /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db (29998 bytes) for commitlog position ReplayPosition(segmentId=1422038939094, position=31047766)
ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 - Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
        at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
INFO  [AntiEntropySessions:10] 2015-01-23 19:38:39,068 RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068 RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] session completed with the following error
java.io.IOException: Failed during snapshot creation.
        at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070 CassandraDaemon.java:153 - Exception in thread Thread[AntiEntropySessions:9,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.io.IOException: Failed during snapshot creation.
        at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        ... 3 common frames omitted

The errors are repeated many times. The IP Address 10.63.74.70 in the log is the node I'm running the repair from. I am able to repair the rest of the OpsCenter column families, and they complete pretty quickly without error.

I have tried creating my own snapshot, and it completes successfully with nothing logged.

nodetool snapshot OpsCenter

The disk has plenty of space left. Are these errors problematic? Should I just let the repair process continue for however long it takes? The cluster is currently not in use by any application, yet it has some load, so it's not sitting idle (it has no load when I'm not repairing).

Thanks for any help.

BTW, there is no datastax-community tag here, so I have to use the datastax-enterprise one.

The repair did finish, with a lot of messages like this: `[2015-01-23 21:08:16,242] Repair session 67772db0-a337-11e4-9e78-37e5027a626b for range (5848435723460298978,5868916338423419522] failed with error java.io.IOException: Failed during snapshot creation.` — pgn674, Jan 23 '15 at 21:33
I believe the next best step is to check the logs on the replica that failed to create the snapshot, where you should be able to find more information. At the risk of sounding like I'm trying to pass you off, the best place to get Cassandra specific questions answered from the community in a timely manner is on the Cassandra users mailing list, which you can find more information about at the bottom of this page: http://cassandra.apache.org/ — mbulman, Jan 27 '15 at 23:11
The snapshot failure is on 10.63.74.70, which is where I ran the repair command and is where I got the log from. So it appears it doesn't have more information. The Cassandra users mailing list looks good, I'll try over there. Thanks for the referral. — pgn674, Jan 29 '15 at 15:36
I've posted to that mailing list (at http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html), and I've been directed to an open issue (at https://issues.apache.org/jira/browse/CASSANDRA-8696). — pgn674, Jan 29 '15 at 17:32

Repairing OpsCenter rollups60 Results in Snapshot Errors

0 Answers0