
Right now I have a 6-node Riak cluster that is experiencing very high latency and timeouts. When I check riak-admin transfers, I get the following:

ubuntu@ip-172-31-38-8:~$ riak-admin transfers
'riak@prod-riak-19' waiting to handoff 54 partitions
'riak@prod-riak-18' waiting to handoff 54 partitions
'riak@prod-riak-17' waiting to handoff 53 partitions
'riak@prod-riak-16' waiting to handoff 53 partitions
'riak@prod-riak-15' waiting to handoff 53 partitions
'riak@prod-riak-14' waiting to handoff 53 partitions

I've since turned off Active Anti-Entropy and am still experiencing high latency, but nothing else seems to be giving us a problem. When I check the error logs, there are no errors from the last 5 hours.

CPU usage looks like this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4016 riak      20   0 3775m 564m 6224 S    9  3.8   3:34.90 beam.smp

So the machine obviously isn't maxed out. Is this a sign of data corruption? What could be going on here? Thanks

crockpotveggies

1 Answer


When a Riak node is started it spawns a vnode for every partition in the ring, even those that it doesn't own. Each vnode that it doesn't own will attempt a handoff with the node that does own it, and after a successful handoff will shut down. These handoffs are subject to the transfer-limit.

Assuming you have a ring size of 64, there would be 10 or 11 vnodes owned by each node, which leaves 54 or 53 vnodes on each node for partitions it does not own. The transfers output you have shown is therefore what you would expect if no handoffs had completed since the last time the entire cluster was restarted.
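The arithmetic behind those numbers can be checked with a short sketch. It assumes a ring size of 64 (a guess, as noted above) and that partitions are spread as evenly as possible across the 6 nodes; the function name is made up for illustration and is not part of Riak.

```python
def foreign_vnode_counts(ring_size, node_count):
    """Map 'vnodes a node starts but does not own' -> number of such nodes.

    Assumes partitions are distributed as evenly as possible, so some
    nodes own one partition more than the others.
    """
    base, extra = divmod(ring_size, node_count)
    counts = {}
    # 'extra' nodes own base + 1 partitions; the remaining nodes own base.
    for owned, nodes in ((base + 1, extra), (base, node_count - extra)):
        if nodes:
            foreign = ring_size - owned  # vnodes started for partitions owned elsewhere
            counts[foreign] = counts.get(foreign, 0) + nodes
    return counts


if __name__ == "__main__":
    # Assumed values: ring size 64, 6 nodes (as in the question).
    for foreign, nodes in sorted(foreign_vnode_counts(64, 6).items()):
        print(f"{nodes} nodes waiting to hand off {foreign} partitions")
```

Under those assumptions this gives four nodes with 53 pending handoffs and two with 54, the same split as in the riak-admin transfers output in the question.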

Joe