This article says that you should use both fencing (aka STONITH) and redundant communication links. I'm trying to understand what benefit STONITH gives in a split-brain situation. Consider, for example, node A, node B, and a STONITH device, all connected through switch I. If switch I dies, the network is partitioned and node A cannot send a request to the STONITH device, so it's useless.

OK, we might have a dedicated switch II that connects the nodes with the STONITH device. If switch I fails, we're still able to send a signal to the STONITH device, and it can power off node B.

But the question is: why not just use switch II as a redundant communication path between node A and node B? If switch I fails, you can still use switch II, and there is no need to shut down node B.

kubanczyk
andrershov

1 Answer

The idea is that when your cluster decides it needs to fail over a node, there must be something wrong with that node.

Fencing, i.e. shooting the other node in the head (STONITH), is the best guarantee that the failed node will release all the resources it was holding. Odds are that if the issue was software related, a hard reset will also fix it, which is a nice bonus.

You typically build a cluster because the clustered service does not support running concurrently on more than one node, and things go horribly wrong when two instances do run at the same time. A lot of effort goes into preventing that. High availability often comes a distant second in your priorities as a cluster designer.
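
To make the ordering concrete, here is a minimal sketch of "fence first, take over second". It is not how any particular cluster manager implements it; `fail_over`, `fence` and `take_over` are hypothetical names, and the callbacks stand in for whatever your cluster stack actually does.

```python
from typing import Callable


def fail_over(
    peer: str,
    fence: Callable[[str], bool],      # hypothetical: returns True only if the peer is confirmed off
    take_over: Callable[[str], None],  # hypothetical: starts the peer's resources locally
) -> bool:
    """Take over the peer's resources only after the peer is confirmed fenced."""
    if not fence(peer):
        # Fencing failed or could not be confirmed. The peer may still hold
        # its resources, so starting them here risks two concurrent instances.
        return False
    take_over(peer)
    return True
```

The point of the design is that a missing heartbeat alone never triggers `take_over`; only a confirmed fence does, which is why availability comes second to safety.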

HBruijn
  • But as long as you have redundant communication links between node A and node B, and node A does not receive heartbeats from node B, it might consider node B as dead. What is the reason to have STONITH? – andrershov Sep 15 '14 at 16:28
  • Ok, I see your point. You say that STONITH is not able to help with network partitions, but it might help with a misbehaving node that does not send heartbeats but still holds some resources? – andrershov Sep 15 '14 at 16:32
  • Yes, additionally, in case you are heading towards a split brain scenario, both nodes thinking the other dead, only one will be the first to have successfully fenced the other :) *"the quickest draw wins"* to keep up the shooting analogy. The last one standing takes all the spoils, and takes over all services that were held by the "losing" node. – HBruijn Sep 15 '14 at 16:44
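
The "quickest draw wins" race from the last comment can be illustrated with a toy model. This is only a sketch under assumptions: `FenceDevice` is a hypothetical in-memory stand-in for a power switch that serializes requests, and a powered-off node is modeled as one whose later requests are rejected (a real dead node simply could not send them).

```python
import threading


class FenceDevice:
    """Hypothetical power switch that handles one fencing request at a time."""

    def __init__(self, nodes):
        self._lock = threading.Lock()
        self._powered = {name: True for name in nodes}

    def fence(self, requester, target):
        with self._lock:
            if not self._powered[requester]:
                return False  # requester was already shot: it lost the race
            self._powered[target] = False
            return True

    def survivors(self):
        with self._lock:
            return [name for name, up in self._powered.items() if up]


def race():
    """Both nodes believe the other is dead and try to fence it at once."""
    device = FenceDevice(["A", "B"])
    results = {}

    def shoot(me, other):
        results[me] = device.fence(me, other)

    threads = [
        threading.Thread(target=shoot, args=pair)
        for pair in (("A", "B"), ("B", "A"))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, device.survivors()
```

Whichever request the device processes first wins, so `race()` always ends with exactly one survivor, and that survivor is the node that may take over the services.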