
I would like to set a policy such that my Failover Cluster will always come into service, even if only one (of the two nodes) is available.

Background: the cluster has only two nodes, plus a file share witness on the DC. For this question, assume the DC stays in service. (Windows Server 2019)

If I shut down node1, node2 stays active. If I then shut down node2, the cluster stops (obviously). However, if I then start only node1, the cluster never recovers. Not only does it fail to come back without node2, I also don't see an easy way to bring it into service from Failover Cluster Manager. In this scenario the only way I can recover the cluster is to start node2, which does not seem (to me) like real high availability. IMO I should be able to set a policy, or have a reasonably easy way, to bring the cluster back online (perhaps after a waiting period) even if node2 never recovers.

Am I just thinking about this the wrong way or missing something obvious?

UPDATE: I do see an error:

Node 'SOM2' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.
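To see more than this one-line summary, the failover-clustering entries in the System event log on the failing node, and the detailed cluster log, usually say why the witness was considered inaccessible. A minimal sketch, assuming it is run on SOM2 (the node named in the error); `C:\Temp` is a hypothetical output folder:

```powershell
# Recent failover-clustering events from the System log on this node;
# reading the local event log works even when the node could not form the cluster.
Get-WinEvent -MaxEvents 20 -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
} | Format-List TimeCreated, Id, Message

# Detailed cluster log covering the last 30 minutes (C:\Temp is a hypothetical folder).
# Depending on the state of the cluster, this may only succeed once it is reachable.
Get-ClusterLog -Node SOM2 -TimeSpan 30 -UseLocalTime -Destination C:\Temp
```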

However, the witness was available at that time, which makes me suspect a permission issue: the witness share is reachable in general, but perhaps not by the accounts the cluster service uses on each node. Is that possible?

Is there some special permission setting on the witness share to ensure it can be accessed by the local service accounts on each node?
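One way to check which accounts can actually reach the witness is to list both the share permissions and the NTFS permissions on the server hosting it. A read-only sketch, run on the DC; the share name `Witness` and folder `C:\Witness` are hypothetical placeholders:

```powershell
# Share-level permissions on the (hypothetical) witness share.
Get-SmbShareAccess -Name 'Witness' |
    Format-Table AccountName, AccessControlType, AccessRight

# NTFS permissions on the folder behind the share (hypothetical path).
(Get-Acl -Path 'C:\Witness').Access |
    Format-Table IdentityReference, FileSystemRights, AccessControlType
```

Per Massimo's comment further down, the account to look for in both lists is the cluster's computer account (the Cluster Network Object, shown as `DOMAIN\CLUSTERNAME$`) rather than a per-node service account.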

Update:

To fix the permission error (which is separate from the central problem), I used the PowerShell cmdlet documented here:

https://docs.microsoft.com/en-us/powershell/module/failoverclusters/set-clusterquorum

Make sure the witness share grants Full Control to the correct domain account, for example a service account whose password never expires and cannot be changed. Then, on a cluster node, first remove the current witness configuration:

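```powershell
# Switch the cluster to "no witness" so the file share witness is no longer in use.
Set-ClusterQuorum -NoWitness
# List cluster resources to see whether the old witness resource is still present.
Get-ClusterResource
```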

If the old witness resource is still listed, remove it:

```powershell
Remove-ClusterResource -Name "File Share Witness"
```

or remove it in Failover Cluster Manager.

Then re-add the file share witness, supplying domain credentials that have access to the share:

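```powershell
# Re-create the file share witness; Get-Credential prompts for the domain account that can access the share.
Set-ClusterQuorum -NodeAndFileShareMajority \\server\path-to-witness -Credential $(Get-Credential)
```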
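After re-adding the witness, it may be worth confirming that the new resource is online and points at the intended share. A read-only sketch, assuming the resource kept the default name `File Share Witness` used above; `SharePath` is the resource's parameter that holds the UNC path:

```powershell
# Confirm the witness resource is online and see which node currently owns it.
Get-ClusterResource -Name 'File Share Witness' |
    Format-Table Name, State, OwnerNode

# Show the UNC path the witness resource is configured to use.
Get-ClusterResource -Name 'File Share Witness' |
    Get-ClusterParameter -Name SharePath
```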
Gregor

2 Answers


As @stuka noted, this is by design. The witness file was locked by a live node before the whole cluster went down. Node1 has no way to tell whether Node2 is actually offline or merely unreachable over the cluster network, so it has to trust the lock on the witness file as correct. It would be far worse for Node1 to come online by itself in that scenario: if only the cluster network had gone down, both nodes could end up claiming the cluster, because nothing would break the quorum voting tie.

If you actually encounter this scenario, you have to edit the quorum settings and force a node back online manually.
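A minimal sketch of forcing quorum on the surviving node; `node1` is a placeholder for that node's name. Only do this when you are sure the other node is really down, because forcing quorum while it is running but unreachable is exactly the tie described above:

```powershell
# Start the cluster service on node1 with quorum forced, so it forms the cluster
# even though it cannot reach a majority of votes on its own.
Start-ClusterNode -Name node1 -FixQuorum

# Equivalent from a command prompt on node1 itself:
#   net start clussvc /forcequorum
```

`Start-ClusterNode` also has a `-PreventQuorum` switch, intended for nodes that should join a forced cluster rather than try to form their own when they come back.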

In practice this shouldn't be of concern because it would be rare for the cluster to ever go entirely offline.

Two-node clusters always involve a compromise in terms of HA. The file share witness establishes quorum, but it cannot cover every scenario. A 3-node (or any odd-numbered) cluster would provide better fault tolerance.
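To see how votes are actually assigned, including the adjustments dynamic quorum makes at run time, a read-only sketch that can be run on any cluster node:

```powershell
# Configured vote (NodeWeight) versus the vote the cluster is currently counting (DynamicWeight).
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight

# Whether dynamic quorum is enabled, and whether the witness currently holds a vote.
Get-Cluster | Format-List Name, DynamicQuorum, WitnessDynamicWeight
```

With two nodes and a witness there are three votes, so a majority is two: both nodes, or one node plus the witness.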

Conure

If the quorum witness share is accessible to the online node, that node should be able to bring the cluster online; this is standard WSFC behavior. If your cluster does not start while the witness share is online, something else must be preventing it. Look for errors in the cluster event logs.

Also, how are the cluster quorum settings configured?

See here for reference: https://docs.microsoft.com/en-us/windows-server/failover-clustering/manage-cluster-quorum.
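The quorum configuration asked about above can be read with `Get-ClusterQuorum`. A read-only sketch, run on a node while the cluster is up:

```powershell
# Quorum type (for example NodeAndFileShareMajority) and the witness resource, if any.
Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType
```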

Massimo
  • Updated to add information on an error. – Gregor Nov 09 '21 at 01:05
  • The cluster is represented in Active Directory by a computer account with the same name as the cluster itself; that computer account, also known as the "Cluster Network Object", needs full control permissions on the witness share. – Massimo Nov 09 '21 at 10:07
  • I think the behavior the OP is seeing is expected. Node 2 owns the witness (it holds a lock on the file share), so Node 1 can't lock it. Until Node 2 is back online, the cluster is unavailable because there is no quorum. https://techcommunity.microsoft.com/t5/failover-clustering/understanding-quorum-in-a-failover-cluster/ba-p/371678 – Stuka Nov 12 '21 at 15:39
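Following Massimo's comment, a minimal sketch of granting the Cluster Network Object Full Control on both the share and the underlying folder, run on the server hosting the witness. `CONTOSO\CLUSTER1$`, `Witness`, and `C:\Witness` are hypothetical placeholders for the real domain, cluster name, share name, and path:

```powershell
# Hypothetical names: replace with the real domain\cluster computer account, share, and path.
$cno   = 'CONTOSO\CLUSTER1$'
$share = 'Witness'
$path  = 'C:\Witness'

# Share permission: Full access for the cluster computer account (no confirmation prompt).
Grant-SmbShareAccess -Name $share -AccountName $cno -AccessRight Full -Force

# NTFS permission: Full Control, inherited by subfolders (CI) and files (OI).
icacls $path /grant "${cno}:(OI)(CI)F"
```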