We're trying to check the data integrity of our Cassandra cluster with:

nodetool repair

but after several minutes (about 2-10 min) we get strange connection reset / broken pipe errors.

Stack trace on the first node:

ERROR [STREAM-OUT-/52.xx.xx.xx] 2016-01-14 17:16:38,022 StreamSession.java:524 - [Stream #2b784f90-bada-11e5-8356-d1461c60ce25] Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
        at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.compress.CompressedStreamWriter$1.apply(CompressedStreamWriter.java:79) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.compress.CompressedStreamWriter$1.apply(CompressedStreamWriter.java:76) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:297) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:75) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:90) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:47) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:363) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:335) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_65]
        at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427) ~[na:1.8.0_65]
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492) ~[na:1.8.0_65]
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:607) ~[na:1.8.0_65]
        at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:140) ~[apache-cassandra-2.2.4.jar:2.2.4]
        ... 11 common frames omitted

And the other one, on the second node:

DEBUG [STREAM-OUT-/62.xx.xx.xx] 2016-01-14 17:16:27,265 ConnectionHandler.java:334 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Sending Received (61637210-9dca-11e5-a58f-2107f8f5c5d4, #5)
ERROR [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,270 StreamSession.java:524 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Streaming error occurred
java.nio.channels.ClosedChannelException: null
        at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:257) ~[na:1.8.0_65]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:300) ~[na:1.8.0_65]
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
DEBUG [STREAM-OUT-/62.xx.xx.xx] 2016-01-14 17:16:27,270 ConnectionHandler.java:334 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Sending Session Failed
ERROR [Thread-153] 2016-01-14 17:16:27,270 CassandraDaemon.java:185 - Exception in thread Thread[Thread-153,5,main]
java.lang.RuntimeException: java.lang.InterruptedException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_65]
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_65]
        at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:350) ~[na:1.8.0_65]
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:176) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.4.jar:2.2.4]
        ... 1 common frames omitted
DEBUG [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,291 ConnectionHandler.java:110 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Closing stream connection handler on /62.xx.xx.xx
INFO  [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,291 StreamResultFuture.java:182 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Session with /62.xx.xx.xx is complete
WARN  [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,292 StreamResultFuture.java:209 - [Stream #2b78c4c0-bada-11e5-9838-f990aeb79d54] Stream failed
DEBUG [AntiEntropyStage:1] 2016-01-14 17:16:27,297 RepairSession.java:210 - [repair #bb979050-bad9-11e5-90e5-edb5e4de409d] Repair completed between /62.xx.xx.xx and /52.xx.xx.xx on table_one
WARN  [RepairJobTask:5] 2016-01-14 17:16:27,299 RepairJob.java:162 - [repair #bb979050-bad9-11e5-90e5-edb5e4de409d] table_one sync failed
ERROR [RepairJobTask:5] 2016-01-14 17:16:27,306 RepairSession.java:290 - [repair #bb979050-bad9-11e5-90e5-edb5e4de409d] Session completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx
        at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:211) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
ERROR [RepairJobTask:5] 2016-01-14 17:16:27,307 RepairRunnable.java:243 - Repair session bb979050-bad9-11e5-90e5-edb5e4de409d for range (-3542169710435610193,-2435099955373091584] failed with error [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx
org.apache.cassandra.exceptions.RepairException: [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx
        at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:211) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
ERROR [Thread-151] 2016-01-14 17:16:27,915 CassandraDaemon.java:185 - Exception in thread Thread[Thread-151,5,main]
java.lang.RuntimeException: java.lang.InterruptedException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) ~[na:1.8.0_65]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) ~[na:1.8.0_65]
        at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) ~[na:1.8.0_65]
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:181) ~[apache-cassandra-2.2.4.jar:2.2.4]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.4.jar:2.2.4]
        ... 1 common frames omitted
DEBUG [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,914 ConnectionHandler.java:266 - [Stream #2b784f90-bada-11e5-8356-d1461c60ce25] Received File (Header (cfId: 61637210-9dca-11e5-a58f-2107f8f5c5d4, #2, version: la, format: BIG, estimated keys: 5912576, transfer size: 2243586007, compressed?: true, repairedAt: 1452788012282, level: 0), file: /var/lib/cassandra/data/our_keyspace/table_one-616372109dca11e5a58f2107f8f5c5d4/tmp-la-4-big-Data.db)
DEBUG [STREAM-OUT-/62.xx.xx.xx] 2016-01-14 17:16:27,916 ConnectionHandler.java:334 - [Stream #2b784f90-bada-11e5-8356-d1461c60ce25] Sending Received (61637210-9dca-11e5-a58f-2107f8f5c5d4, #2)
ERROR [STREAM-IN-/62.xx.xx.xx] 2016-01-14 17:16:27,917 StreamSession.java:524 - [Stream #2b784f90-bada-1..
[2016-01-14 17:16:27,308] Repair session bb979050-bad9-11e5-90e5-edb5e4de409d for range (-3542169710435610193,-2435099955373091584] failed with error [repair #bb979050-bad9-11e5-90e5-edb5e4de409d on our_keyspace/table_one, (-3542169710435610193,-2435099955373091584]] Sync failed between /62.xx.xx.xx and /52.xx.xx.xx (progress: 36%)

What could be the issue?

Greg M.
  • Check this post to help troubleshoot repair and streaming http://www.sestevez.com/cassandra-repair-logs/ – phact Jan 14 '16 at 23:26
  • It was probably corrupted data, but we're unsure. – Greg M. Aug 05 '16 at 08:38
  • Instead of repairing all keyspaces at once, did you try repairing one column family at a time to see whether the error appears on a specific column family? – Baptiste Mille-Mathias Aug 08 '16 at 08:25
  • No, I didn't, but removing the faulty node seems to have "resolved" the problem. At least the cluster can now do a full repair without errors! I think the hardware was the cause, but it's hard to be sure because the machines are hosted. – Greg M. Aug 08 '16 at 19:48
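
The per-column-family repair suggested in the comments could be scripted roughly as below. This is only a sketch under assumptions: it relies on `nodetool repair <keyspace> <table>` being on the PATH, the helper names (`repair_command`, `repair_each_table`, `failed_tables`) are made up for illustration, and it assumes nodetool exits non-zero when a repair fails, which may not hold in every version, so the logs should still be checked.

```python
import subprocess
from typing import Callable, Dict, List


def repair_command(keyspace: str, table: str) -> List[str]:
    """Build the nodetool invocation that repairs a single table."""
    return ["nodetool", "repair", keyspace, table]


def repair_each_table(
    keyspace: str,
    tables: List[str],
    run: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Repair tables one at a time and record each exit code.

    `run` is injectable so the loop can be exercised without a cluster.
    """
    results: Dict[str, int] = {}
    for table in tables:
        results[table] = run(repair_command(keyspace, table))
    return results


def failed_tables(results: Dict[str, int]) -> List[str]:
    """Tables whose repair returned a non-zero exit code (assumption:
    a non-zero code means the repair failed)."""
    return [table for table, code in results.items() if code != 0]
```

For example, `failed_tables(repair_each_table("our_keyspace", ["table_one"]))` would list `table_one` if its repair fails in isolation, which would narrow the problem down to that column family.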

0 Answers