
No idea what's going on here, but I added a new Riak node to the cluster and committed the changes. The new node has taken 0% of cluster membership, while the first node in the cluster has grown to over 37%. Here's the output of `riak-admin member-status`:

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      37.5%     25.0%    'riak@prod-riak-08'
valid      17.2%     25.0%    'riak@prod-riak-09'
valid      25.0%     25.0%    'riak@prod-riak-10'
valid      20.3%     25.0%    'riak@prod-riak-11'
valid       0.0%      0.0%    'riak@prod-riak-12'

In the meantime, it looks like some data is inaccessible. Any idea what's going on? We're running Riak 1.4.8.

Most recent log entries from riak-12:

2014-06-24 09:00:11.142 [info] <0.347.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 10 -> 0 msec/key, based on maximum vnode mailbox size 53 from 'riak@prod-riak-09'
2014-06-24 09:02:41.150 [info] <0.347.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 0 -> 10 msec/key, based on maximum vnode mailbox size 319 from 'riak@prod-riak-10'
2014-06-24 09:02:56.152 [info] <0.347.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 10 -> 0 msec/key, based on maximum vnode mailbox size 1 from 'riak@prod-riak-10'
  • if a node went down and never came back up, did you `riak-admin cluster force-remove` it? If so, that probably caused a lot of unnecessary reorganization. It is more efficient to use `cluster join` and `cluster force-replace` so the new node simply assumes all of the partitions from the downed node, without requiring all of the other nodes to trade data around. – Joe Jun 27 '14 at 23:28
  • @Joe yes, we did `force-remove` it. Interesting, so does Riak automatically move data from a downed node to a new one entering the cluster with `force-replace`? I thought that command was reserved for bringing a downed node back up after an IP change, but that's hugely helpful if it does. – crockpotveggies Jul 02 '14 at 17:45
  • 1
    `force-replace` moves all partition ownerships from one node to another without triggering any handoff or re-balancing. This will leave the new node with much less hard drive usage until AAE or read-repair restores consistency, but it avoids potentially multi-Tb exchanges between many nodes. – Joe Jul 02 '14 at 18:17
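
For reference, the join/force-replace workflow Joe describes would look roughly like this (a sketch only; `riak@prod-riak-07` is a hypothetical name standing in for the downed node, and `cluster join` is run from the new node):

# On the new node (riak-12): stage a join against any existing member
riak-admin cluster join riak@prod-riak-08

# Stage a force-replace so riak-12 assumes the downed node's partitions
# directly, instead of rebalancing them across the whole ring
# (riak@prod-riak-07 is a hypothetical name for the dead node)
riak-admin cluster force-replace riak@prod-riak-07 riak@prod-riak-12

# Review the staged changes, then commit them
riak-admin cluster plan
riak-admin cluster commit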

1 Answer


Okay, it turns out this was a data-balancing issue in Riak. Basically, riak-08 was still absorbing partitions from a node that had been force-removed from the cluster (it went down and never came back up).

After adjusting the transfer limit with `riak-admin transfer-limit 30`, everything began to normalize at a consistent pace. After about an hour, Riak rebalanced the data onto riak-12:

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      25.0%     18.8%    'riak@prod-riak-08'
valid      25.0%     18.8%    'riak@prod-riak-09'
valid      25.0%     18.8%    'riak@prod-riak-10'
valid      25.0%     25.0%    'riak@prod-riak-11'
valid       0.0%     18.8%    'riak@prod-riak-12'
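
For anyone hitting the same issue, the commands involved were roughly these (a sketch; run from any cluster node):

# Allow more concurrent partition handoffs cluster-wide
# (the default transfer limit in Riak 1.4 is 2)
riak-admin transfer-limit 30

# Watch pending and active partition transfers until the ring settles
riak-admin transfers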