I have a problem with an HP ProLiant ML350p Gen8 server. Most of the time it runs fine, but after several weeks of uptime it crashes out of the blue. This has happened about five times now. When it crashes, the OS (VMware ESXi 5.5) stops responding and the fans run at full speed. Pressing the power button has no effect at that point; I have to unplug the power cable and plug it back in to get the server to restart. I ran a memtest, which found no errors, and I didn't find anything in the logs either. Do you have any ideas how to solve this?
-
Do you have support on this server or through VMware? If you do, you should start there. Otherwise, you need to dig deeper in your logs. Have you checked the VMware diagnostic logs, or only the hardware logs? – tfrederick74656 Dec 10 '15 at 22:56
-
We're also experiencing similar random fan issues (started Dec '15) on two of our ProLiant DL380 Gen9 servers running ESXi 5.5. We contacted HP support and sent them the iLO Active Health System and VMware logs, and got the usual response to update drivers and firmware. The latest recommendation concerned the BIOS and NICs. This is the only post I can find anywhere about all fans running at 100% and the VMs and host eventually crashing. In one occurrence, though, we were able to quickly identify and reboot a hung VM without rebooting the host (the fans eventually went back to normal). In another, all fans were 100% but – Feb 26 '16 at 20:26
1 Answer
There are a couple of reasons this could be happening.
- Firmware.
- Updates.
- Possibly hardware.
Please see: http://meta.serverfault.com/q/6195/13325
If you're running Windows virtual machines configured with Intel e1000 virtual NICs, there is a chance that this is what is crashing your VMware host. That can be resolved with updates to ESXi and/or a change in your vNIC configuration.
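To see which VMs are affected by the vNIC point above, one option is to scan each VM's .vmx file for adapters whose type is set to e1000. The following is only a minimal sketch: it assumes the adapter type is recorded on a line of the form `ethernetN.virtualDev = "e1000"`, and the function name `find_e1000_vnics` is made up for this illustration. Adapters with no explicit `virtualDev` line are not reported, so treat an empty result as a hint, not as proof.

```python
import re

# Assumed .vmx line format: ethernet0.virtualDev = "e1000"
_VNIC_LINE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"\s*$',
    re.IGNORECASE,
)


def find_e1000_vnics(vmx_text):
    """Return adapter names (e.g. 'ethernet0') explicitly set to e1000.

    vmx_text is the full text of one .vmx file. Adapters without an
    explicit virtualDev line are skipped (hypothetical helper, not a
    VMware tool).
    """
    hits = []
    for line in vmx_text.splitlines():
        match = _VNIC_LINE.match(line)
        if match and match.group(2).lower() == "e1000":
            hits.append(match.group(1))
    return hits


if __name__ == "__main__":
    # Made-up sample content, only to show the call.
    sample = (
        'ethernet0.virtualDev = "e1000"\n'
        'ethernet1.virtualDev = "vmxnet3"\n'
    )
    print(find_e1000_vnics(sample))
```

Any adapter this reports is a candidate for switching to a different vNIC type once the guest has the matching driver installed.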
If you're running old HP firmware, please update it.
Since you have HP hardware, please look in the iLO and the IML (Integrated Management Log) to get a detailed reason for the crash. That will tell you if you're facing a hardware issue.
Memtest86+ is useless on server equipment like this.
-
Thanks for the detailed answer. I will check and update everything next week. Since the problem occurs only occasionally, it can take some time to report if this helped. – Stocher Kahn Dec 11 '15 at 12:55