I have a Server 2008 R2 Hyper-V cluster with 2 nodes. They use a CSV on a SAN, and I use SCVMM to manage them. We recently had several crashes that caused a failover, making virtual machines die and start up on the other node. For the most part, this worked fine. At one point during a power failure, both nodes were unable to access the SAN for a moment, so the CSV went offline. Bringing it online in Failover Cluster Manager worked, and most of the virtual machines started just fine.

One virtual machine however will not start.

  • In SCVMM, it shows as missing.
  • In Failover Cluster Manager, it shows as Offline, with the "SCVMM hostname Configuration" resource failed.
  • Trying to start the failed Configuration resource, or to move the virtual machine to the other node, results in a 5-minute wait, followed by the error "Error Code: 0x80071714 The group is unable to accept the request since it is moving to another node".

Besides the error above, there don't seem to be any recent relevant entries in the failover cluster logs or the Windows event logs on either node. There are some critical events visible in Failover Cluster Manager from when the failures happened last week:

  • Event ID 21502: 'SCVMM hostname Configuration' failed to register the virtual machine with the virtual machine management service.
  • 25 minutes later, Event ID 1230: Cluster resource 'SCVMM hostname Configuration' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
  • That one was repeated 3 more times, 5 minutes apart.
  • No logs since then.

I've looked at files on the SAN. All of them appear to be intact. The XML configuration file seems to be valid (some research showed this could happen if the XML file got corrupted).
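For anyone who wants to repeat the XML check, here is a minimal sketch in Python. It only tests whether the file parses as well-formed XML; it does not know anything about the Hyper-V configuration schema, so a file can pass this and still be unusable. The function name `is_well_formed` is just an illustration, not part of any Hyper-V tooling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return (True, root_tag) if the bytes parse as XML, else (False, error text).

    This checks well-formedness only. It does not validate the content
    against whatever structure Hyper-V expects.
    """
    try:
        # Passing bytes lets the parser honour the encoding declared in the file.
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, root.tag
```

To use it, read a copy of the VM's configuration file in binary mode (`open(path, "rb").read()`) and pass the bytes in. A truncated or half-written file, which is what a storage outage might plausibly leave behind, should come back as `False` with the parser's error message.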

Edit: I have also run the cluster validation report. Besides the failed resource and some expected errors that it couldn't test the disks while they are online, everything looks fine.

How do I go about getting this virtual machine running again?

Grant
  • This may be the time to call Microsoft support. If the VMM database is in an inconsistent state, a MS support call may be the only *supported* method of resolving the error. This is why it's so important to have UPSes with shutdown software. – MDMarra Jan 03 '14 at 19:40
  • @mdmarra its not really vmm that's messed up its failover clustering. But youre right support may be my best option. Google has found nothing but my own post. Thank god its a test VM not production. – Grant Jan 03 '14 at 21:15

1 Answer


Despite not knowing exactly what caused the problem, it was pretty easy to get the VM running again:

  • Figure out which node the problem VM is on.
  • Put that node in maintenance mode in VMM (or just live migrate everything else off it). The problem VM will still be stuck on that node.
  • Stop the cluster service on that node, then start it again.

When I stopped the cluster service, the VM was immediately taken over by one of the remaining nodes and started up automatically.
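The last step can be sketched as a small Python script. This is only an illustration, with some assumptions: it is run from an elevated prompt on the affected node itself, after the other VMs have been moved off, and it uses `net stop` / `net start` against `clussvc`, the short name of the Windows Cluster service. The helper names `restart_plan` and `run_plan` are made up for this example. By default it only reports what it would run.

```python
import subprocess


def restart_plan(service="clussvc"):
    """Build the stop/start commands for the local cluster service."""
    return [["net", "stop", service], ["net", "start", service]]


def run_plan(plan, dry_run=True):
    """Run each command in order and return the command lines handled.

    With dry_run=True (the default) nothing is executed. With dry_run=False,
    check=True stops the sequence if a command fails.
    """
    handled = []
    for cmd in plan:
        if not dry_run:
            subprocess.run(cmd, check=True)
        handled.append(" ".join(cmd))
    return handled
```

Calling `run_plan(restart_plan())` just returns the two command lines for review. Pass `dry_run=False` to actually stop and start the service.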

Grant