Ran into an interesting issue. We have a 3-node vSAN cluster (all three nodes contribute compute and storage). We can call these three nodes esxi01, esxi02, and esxi03. Users reported errors, and upon investigating, the following was noticed:
- vCenter was unavailable
- Host esxi01 was completely hung
- Was able to log in to esxi02/03 directly, and a chunk of the VMs, including our vCenter VM, were showing as invalid. After an attempt to unregister and re-register it, it now shows the name of the VM (vcenter-server).
- Shut down esxi01/02/03 and started the servers back up.
At this point esxi02 and esxi03 have formed a vSAN cluster and esxi01 is in its own cluster (per esxcli vsan cluster get). I attempted to leave the cluster on esxi01 (esxcli vsan cluster leave) and rejoin (esxcli vsan cluster join -u <Sub-Cluster UUID from the esxi02/03 cluster>). The commands do not fail, but when running esxcli vsan cluster get on esxi01, it shows itself as the sub-cluster master with only itself in the cluster.
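For anyone following along, the leave/rejoin sequence above can be sketched roughly as below. This is only a sketch: the helper name vsan_field is hypothetical, and it assumes the "Sub-Cluster UUID:" and "Sub-Cluster Member Count:" field labels that esxcli vsan cluster get prints. The esxcli commands themselves are left as comments because they have to be run on different hosts.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcli vsan cluster get` output on stdin and
# print the value of a single field, e.g. "Sub-Cluster UUID".
vsan_field() {
    awk -v key="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)          # strip the leading indentation
        n = index(line, ": ")             # split at the first "label: value"
        if (n && substr(line, 1, n - 1) == key) {
            print substr(line, n + 2)
            exit
        }
    }'
}

# On esxi02 or esxi03: note the UUID of the two-node cluster.
#   esxcli vsan cluster get | vsan_field "Sub-Cluster UUID"
#
# On esxi01: leave, rejoin with that UUID, then check membership.
#   esxcli vsan cluster leave
#   esxcli vsan cluster join -u "<uuid noted above>"
#   esxcli vsan cluster get | vsan_field "Sub-Cluster Member Count"
#
# A healthy three-node cluster would report 3 here; a host that is
# partitioned on its own reports 1.
```

Comparing the "Sub-Cluster UUID" and member count on each host is a quick way to see which hosts actually ended up in the same partition.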
I have verified that there is no firewall in between blocking traffic, all NICs are online, and the vmk for vSAN traffic can communicate between all three hosts. I also ran a tcpdump on esxi01 and can see port 12321 traffic.
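The connectivity checks above could be repeated with something like the sketch below. The interface name vmk1 and the peer addresses are placeholders (assumptions, not values from this environment), and check_peers is a hypothetical helper that just wraps vmkping.

```shell
#!/bin/sh
# Assumption: vmk1 is the VMkernel interface tagged for vSAN traffic.
# `esxcli vsan network list` shows which vmk is actually in use.
VSAN_VMK="${VSAN_VMK:-vmk1}"

# Ping command, overridable so the loop can be exercised off-host.
PING="${PING:-vmkping -I $VSAN_VMK -c 3}"

# Hypothetical helper: ping each peer's vSAN address over the vSAN vmk
# and return non-zero if any peer is unreachable.
check_peers() {
    failed=0
    for ip in "$@"; do
        if $PING "$ip" >/dev/null 2>&1; then
            echo "ok   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run on esxi01 with the vSAN vmk addresses of esxi02 and esxi03
# (placeholder addresses):
#   check_peers 192.0.2.12 192.0.2.13
#
# To watch the port 12321 traffic on the same interface:
#   tcpdump-uw -i vmk1 port 12321
```

Running the same loop from each of the three hosts covers every direction, since a path can work one way and not the other.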
Any thoughts on what could be causing this?