We're trying to check the data integrity of our Cassandra 2.2.4 cluster with:
nodetool repair
After a few minutes (roughly 2-10 min), however, the repair fails with connection reset / broken pipe errors.
Stack trace on the first node:
ERROR [STREAM-OUT-/52.xx.xx.xx] 2016-01-14 17:16:38,022 StreamSession.java:524 - [Stream #2b784f90-bada-11e5-8356-d1461c60ce25] Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.compress.CompressedStreamWriter$1.apply(CompressedStreamWriter.java:79) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.compress.CompressedStreamWriter$1.apply(CompressedStreamWriter.java:76) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:297) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:75) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:90) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:47) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:363) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:335) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_65]
at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427) ~[na:1.8.0_65]
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492) ~[na:1.8.0_65]
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:607) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:140) ~[apache-cassandra-2.2.4.jar:2.2.4]
... 11 common frames omitted
And the log from the second node:
DEBUG [STREAM-OUT-/62.xx.xx.xx] 2016-01-14 17:16:27,265 ConnectionHandler.java:334 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Sending Received (61637210-9dca-11e5-a58f-2107f8f5c5d4, #5)
ERROR [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,270 StreamSession.java:524 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Streaming error occurred
java.nio.channels.ClosedChannelException: null
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:257) ~[na:1.8.0_65]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:300) ~[na:1.8.0_65]
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
DEBUG [STREAM-OUT-/62.xx.xx.xx] 2016-01-14 17:16:27,270 ConnectionHandler.java:334 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Sending Session Failed
ERROR [Thread-153] 2016-01-14 17:16:27,270 CassandraDaemon.java:185 - Exception in thread Thread[Thread-153,5,main]
java.lang.RuntimeException: java.lang.InterruptedException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_65]
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_65]
at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:350) ~[na:1.8.0_65]
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:176) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.4.jar:2.2.4]
... 1 common frames omitted
DEBUG [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,291 ConnectionHandler.java:110 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Closing stream connection handler on /62.xx.xx.xx
INFO [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,291 StreamResultFuture.java:182 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Session with /62.xx.xx.xx is complete
WARN [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,292 StreamResultFuture.java:209 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Stream failed
DEBUG [AntiEntropyStage:1] 2016-01-14 17:16:27,297 RepairSession.java:210 - [repair #bb979050-bad9-11e5-90e5-edb5e4de409d] Repair completed between /62.xx.xx.xx and /52.xx.xx.xx on table_one
WARN [RepairJobTask:5] 2016-01-14 17:16:27,299 RepairJob.java:162 - [repair #bb979050-bad9-11e5-90e5-edb5e4de409d] table_one sync failed
ERROR [RepairJobTask:5] 2016-01-14 17:16:27,306 RepairSession.java:290 - [repair #bb979050-bad9-11e5-90e5-edb5e4de409d] Session completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx
at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:211) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
ERROR [RepairJobTask:5] 2016-01-14 17:16:27,307 RepairRunnable.java:243 - Repair session bb979050-bad9-11e5-90e5-edb5e4de409d for range (-3542169710435610193,-2435099955373091584] failed with error [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx
org.apache.cassandra.exceptions.RepairException: [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx
at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:211) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
ERROR [Thread-151] 2016-01-14 17:16:27,915 CassandraDaemon.java:185 - Exception in thread Thread[Thread-151,5,main]
java.lang.RuntimeException: java.lang.InterruptedException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) ~[na:1.8.0_65]
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) ~[na:1.8.0_65]
at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) ~[na:1.8.0_65]
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:181) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.4.jar:2.2.4]
... 1 common frames omitted
DEBUG [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,914 ConnectionHandler.java:266 - [Stream #2b784f90-bada-11e5-8356-d1461c60ce25] Received File (Header (cfId: 61637210-9dca-11e5-a58f-2107f8f5c5d4, #2, version: la, format: BIG, estimated keys: 5912576, transfer size: 2243586007, compressed?: true, repairedAt: 1452788012282, level: 0), file: /var/lib/cassandra/data/our_keyspace/table_one-616372109dca11e5a58f2107f8f5c5d4/tmp-la-4-big-Data.db)
DEBUG [STREAM-OUT-/62.xx.xx.xx] 2016-01-14 17:16:27,916 ConnectionHandler.java:334 - [Stream #2b784f90-bada-11e5-8356-d1461c60ce25] Sending Received (61637210-9dca-11e5-a58f-2107f8f5c5d4, #2)
ERROR [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,917 StreamSession.java:524 - [Stream #2b784f90-bada-1..
[2016-01-14 17:16:27,308] Repair session bb979050-bad9-11e5-90e5-edb5e4de409d for range (-3542169710435610193,-2435099955373091584] failed with error [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx (progress: 36%)
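Two stream sessions show up in these logs (#2b784f90-... and #2b78c4c0-...), so it may help to line up each session's events from both nodes. Below is a minimal sketch of how that could be done, assuming log lines shaped like the ones above; the regular expression and the `group_by_stream` helper are hypothetical and not part of Cassandra or nodetool.

```python
import re
from collections import defaultdict

# Matches Cassandra log lines that carry a stream session ID, e.g.
#   ERROR [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,270 StreamSession.java:524 - [Stream #<uuid>] Streaming error occurred
# The layout is assumed from the excerpts in this post.
LINE_RE = re.compile(
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\S+\s+-\s+"
    r"\[Stream #(?P<stream>[0-9a-f-]+)\]\s+(?P<msg>.*)"
)


def group_by_stream(lines):
    """Group (timestamp, level, thread, message) tuples by stream session ID.

    Lines without a complete "[Stream #...]" tag (stack frames, repair
    messages, truncated lines) are skipped.
    """
    sessions = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            sessions[match.group("stream")].append(
                (match.group("ts"), match.group("level"),
                 match.group("thread"), match.group("msg"))
            )
    return dict(sessions)


if __name__ == "__main__":
    import sys

    # Usage: python group_streams.py node1.log node2.log
    all_lines = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            all_lines.extend(handle)

    for stream_id, events in group_by_stream(all_lines).items():
        print(f"Stream #{stream_id}")
        # Timestamps sort lexicographically in this format.
        for ts, level, thread, msg in sorted(events):
            print(f"  {ts} {level:5} [{thread}] {msg}")
```

Sorting each session's events by timestamp should make it easier to see which side logged the failure first, although it only compares clocks as recorded by each node.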
What could be the issue?