I have a host that is part of a 4-host HA cluster.
Sometime yesterday I noticed the host had stopped responding. In the vSphere console it shows up greyed out as (not responding) and all VMs on it show up as (inaccessible). The VMs themselves are still running normally: I can remote desktop to them and everything is up. There are critical servers on this machine.
I tried right-clicking the host and choosing "Connect", but after a few hours it simply fails. I cannot move the VMs on it; all actions are greyed out. On the host, pressing F2 gives me the login prompt, but after I enter my credentials nothing happens. ALT+F1 doesn't let me do anything, as the local shell is not enabled. SSH is not enabled either. With ALT+F11 I can see that hostd has crashed, which is probably the problem. I called VMware, as I have full support, but after a very short call the technician said there is nothing to do but forcefully shut down the host.
I would rather not do that. I would like to restart hostd, but I can't seem to get any access. I tried PowerCLI, but the connection to the host times out. Connecting the vSphere Client directly to the host also times out. Pinging the host works, so there is network connectivity at least.
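Since ping only proves ICMP reachability, a quick TCP check of the management ports would show which services on the host still answer. Below is a minimal sketch in Python, assuming the usual ESXi management ports 22 (SSH), 443 (HTTPS) and 902; the function name `check_ports` and the address in the usage comment are placeholders for illustration, not part of any VMware tooling.

```python
import socket

# Assumed standard ESXi management ports; adjust if your environment differs.
PORTS = {
    22: "SSH",
    443: "HTTPS (vSphere Client, welcome page, MOB)",
    902: "vCenter agent / NFC traffic",
}

def check_ports(host, ports=PORTS, timeout=3.0):
    """Return {port: bool}: True if a TCP connection to the port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results

# Usage (placeholder address, replace with the host's management IP):
#   for port, is_open in check_ports("192.0.2.10").items():
#       print(port, PORTS[port], "open" if is_open else "closed/filtered")
```

An open port only shows that something accepts the TCP connection; it does not prove the service behind it is healthy, which matches a host whose welcome page loads while hostd is hung.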
Does anyone know any other way to get to the shell?
Thanks.
More info: running ESXi 5.5.0 build 1331820 on a Dell PowerEdge R720 with a Dell PERC H710.
I checked the DRAC and the local volume is healthy. It's only a RAID 1 anyway; all VMs are on a SAN. The VMware ESXi welcome page works, but if I click on "browse datastores in this host's inventory" the page never loads. The MOB also seems to be working properly: "hostip/mob/?moid=ServiceInstance&doPath=content"
On the ALT+F11 console: 2014-09-11T7:15:02.329Z cpu12:57750311)hostd detected to be non-responsive
The same line appears 11 times, each with a different timestamp and CPU.