We have one ESXi server (HP-ESXi-6.0.0 build-2492585) that has multiple VMs on it that have exhibited the following behavior:

When a VM is rebooted, it will occasionally lose all network connectivity. Accessing the VM's command line from the host console in vSphere, you can see that the machine keeps its network cards, network settings, etc., but it cannot ping anything on the network, including the gateway. I can find no errors in the VM or host-side logs. Once the error occurs, subsequent reboots don't seem to change the behavior, though that aspect has only been tested lightly.

The most direct way we've found to address the problem is to kill the current vNIC and add a new one. Sometimes simply changing the vNIC driver from VMXNET3 to E1000 works but I've recently found that the more likely 'fix' is when I change the MAC address from Automatic to Manual. We've definitely had occurrences where deleting the vNIC and adding it back with just a new adapter type alone does not do the trick, but changing the MAC does.

We have 3 other ESXi hosts on the same hardware and ESXi version where the VMs don't exhibit this behavior.

This occurs on VMs with both Linux-based and Windows OSes.

This issue can occur when the entire VM Host is rebooted. Actually the original manifestation of the issue occurred after a VM Host reboot. Only more recently have we found this can also occur when a single VM is rebooted or otherwise power-cycled.

Any insight about where or what to look for in the log files or thoughts on how to combat this issue would be greatly appreciated!

Sam K

1 Answer

Since changing the MAC makes the network work again, I would check the host's uplink port and what it connects to.

Make sure the spanning tree and all the port settings are the same for all your hosts' ports. It seems to me like a switch problem, somewhat tied to MAC poisoning. If it is a Cisco switch, I would run show mac address-table to list your MACs per port, if I remember the command right.
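If the switch does produce a MAC table in the usual Cisco IOS layout (VLAN, MAC, type, port columns), a small script can help look up which port a VM's MAC was learned on. This is only a rough sketch: the sample table, the MAC addresses, and the helper names `to_cisco` and `find_ports` are made up for illustration, and the output format on non-Cisco switches will likely differ.

```python
import re

# Hypothetical sample of "show mac address-table" output; the VLANs,
# MACs, and port names are invented for illustration only.
SAMPLE_OUTPUT = """\
          Mac Address Table
-------------------------------------------

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
  10    0050.56aa.0001    DYNAMIC     Gi1/0/1
  10    0050.56aa.0002    DYNAMIC     Gi1/0/2
  20    0050.56aa.0003    DYNAMIC     Gi1/0/3
"""

# One table row: VLAN number, dotted MAC, entry type, port name.
ROW = re.compile(
    r"^\s*(\d+)\s+([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+(\S+)\s+(\S+)\s*$",
    re.IGNORECASE,
)


def to_cisco(mac):
    """Convert a MAC such as 00:50:56:AA:00:02 to dotted form 0050.56aa.0002."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def find_ports(table_text, mac):
    """Return (vlan, port) pairs where the given MAC appears in the table."""
    wanted = to_cisco(mac)
    hits = []
    for line in table_text.splitlines():
        match = ROW.match(line)
        if match and match.group(2).lower() == wanted:
            hits.append((int(match.group(1)), match.group(4)))
    return hits


if __name__ == "__main__":
    # The vNIC MAC as shown in vSphere (colon-separated); hypothetical value.
    vm_mac = "00:50:56:aa:00:02"
    for vlan, port in find_ports(SAMPLE_OUTPUT, vm_mac):
        print(f"{vm_mac} learned on {port} (VLAN {vlan})")
```

Running the lookup before and after a VM loses connectivity, and comparing the port against the one the host's uplink is actually cabled to, may show whether the switch is holding the MAC on the wrong port.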

yagmoth555
  • Somehow I hadn't really given the uplink switch much consideration. In my head I figured it was going wrong in the vSwitch or something in ESX. A lot of our switches are 'enterprise' bargain bin switches, with a couple of old Ciscos thrown into the mix. A server room overhaul and a re-ordering of switches was already on my to-do list, but I think it just jumped a few pegs. This server in particular is NIC teaming off the same bargain bin HP switch. Not sure how that happened; it's listed wrong in our port list and I never actually checked. The rest are at least on separate switches. – Sam K Jul 06 '18 at 20:03
  • Take a good close look at your bonding (teaming) config on both the physical and vSwitches. That's likely to be where the problem lies. – Brandon Xavier Jul 08 '18 at 07:05