
I have a server that seems to be cursed: every 2-5 days it crashes with a PSOD. (I'm still pretty new to this stuff.)

I've spent the last couple of weeks trying to find a solution, but I'm clearly not getting anywhere. Every reference to a PSOD I've found either involves much older versions, is directly caused by a VM running on the box or by HP iLO drivers, or describes some other situation that doesn't seem to apply to my issue.

I wasn't able to get anything useful out of the wall-o-text on the crash screen, other than something about the vCenter Server agent and possibly something about my 10G NIC (which I've since removed, just in case).

Here are a couple pics.

I should have more, but I apparently didn't save them?

Hardware Currently In Use


Hardware I've tried so far

  • 3 motherboards
    • Supermicro X9DRi-F-O
    • Intel Extreme DZ77GA-70K
    • Intel Desktop Board DX58SO
  • 2 sets of CPUs (a dual socket set and a single socket)
    • Xeon E5-2670
    • Xeon X5650 (this may not be the right CPU model, but it's currently buried)
  • 3 PSUs
    • 550W
    • 700W
    • 850W
  • 2 sets of tested RAM (one set ECC)
    • 32GB DDR3 ECC
    • 16GB DDR3
  • 2 install devices (one spinning rust, one USB)

Other Things I've Tried

  • Reinstalling
  • Re-downloading the install media and reinstalling from a different (and the same) USB drive
  • Having no VMs running

I kind of cheaped out on the chassis, so I'm not super confident in the backplanes. I've also tried swapping the bays around and connecting the drive straight to a SATA port on the motherboard.

And the server is behind a UPS, so power weirdness shouldn't be an issue.

I'm 99% sure there is something obvious that I'm missing, but after smashing your face into a wall for a while it starts to get hard to see. Luckily I needed to start posting questions to get rep anyway.

Dave M
Wayne
  • Well, none of these mainboards are on the [HCL for ESXi 6.5](https://www.vmware.com/resources/compatibility/search.php). Get supported hardware. – Gerald Schneider Mar 23 '18 at 07:41
  • Fair enough, thank you. Looks like switching back to Xen or Proxmox is what I'm going to have to do. – Wayne Mar 27 '18 at 05:07

1 Answer


Of course you should use supported hardware if possible. There are also vendor-specific driver sets, for example for HPE or Dell, so it could simply be a driver problem.

Another thing I don't understand: why are you using build 4564106 if you already reinstalled ESXi?

The current build is 7388607. I don't know the whole patch history, but I think it couldn't hurt to use the newest version with the newest bug fixes.
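If you want to check this after each reinstall instead of reading it off a crash screenshot, a small script can compare the build the host reports against the one you expect. This is only a rough sketch in Python: it assumes the version line looks like `VMware ESXi 6.5.0 build-4564106` (the format I'd expect from running `vmware -v` on the host shell), and `parse_build`, `is_outdated`, and `LATEST_KNOWN_BUILD` are names made up for illustration. It also assumes that a higher build number within the same release line means a newer patch level.

```python
import re

# Build number quoted in this answer as current; an assumption that
# goes stale as soon as a newer patch is released.
LATEST_KNOWN_BUILD = 7388607


def parse_build(version_line: str) -> int:
    """Extract the build number from a line such as
    'VMware ESXi 6.5.0 build-4564106' (assumed format)."""
    match = re.search(r"build-(\d+)", version_line)
    if match is None:
        raise ValueError(f"no build number found in: {version_line!r}")
    return int(match.group(1))


def is_outdated(version_line: str, latest_build: int = LATEST_KNOWN_BUILD) -> bool:
    """Return True if the reported build is lower than latest_build."""
    return parse_build(version_line) < latest_build


if __name__ == "__main__":
    # Paste the version line reported by the host here.
    reported = "VMware ESXi 6.5.0 build-4564106"
    if is_outdated(reported):
        print(f"Build {parse_build(reported)} is older than {LATEST_KNOWN_BUILD}")
    else:
        print(f"Build {parse_build(reported)} is at or above {LATEST_KNOWN_BUILD}")
```

Keeping the expected build as a parameter means you only have to change one value when a newer patch comes out.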

frupfrup
  • Because people [never update their VMware installations](https://meta.serverfault.com/q/6195/13325). I don't understand it. – ewwhite Mar 23 '18 at 13:31
  • Those may be pictures from the earlier crashes, but I'll double-check that I reinstalled with the newest version and will look into drivers. I'm gonna be a little peeved if I reinstalled using the same old version instead of the new one lol >.< Thanks for the help. – Wayne Mar 24 '18 at 20:41
  • Doesn't appear to be build related. Newest pic is with build 7967591: https://drive.google.com/open?id=1JmZtQ-nk2SrSP6sVOvB5qhEKPJ2NvAcz – Wayne Mar 26 '18 at 09:43
  • Note that the newest version of ESXi is not available for install. You have to download and install the newest patches manually after the installation. – Gerald Schneider Mar 27 '18 at 05:10