
I have some servers in this configuration:

[Diagram: VMware ESXi network configuration (complete configuration)]

And I am not able, from VMGuest1, to ping either VMGuest3 or VMGuest4. I can, however, ping Host1 and Host2, which are attached to pSwitch1. The behavior is the same when VMGuest3 or 4 tries to ping VMGuest1 or 2.

I don't have promiscuous mode enabled on any of these switches, nor do I have a bridge set up inside ESXi between the virtual switches. I know that one of those options is usually necessary to get connectivity between two virtual switches. These switches are connected, however, through their respective physical switches, which are bridged together.

Ping just times out. The ARP requests look like this:

[root@vmguest1:~]# arp -a vmguest3
vmguest3.example.com (1.2.3.4) at <incomplete> on eth0
[root@vmguest1:~]# arp -a host1
host1.example.com (1.2.3.5) at 00:0C:64:97:1C:FF [ether] on eth0

VMGuest1 can reach hosts on pSwitch1, so why can't it get to hosts on vSwitch1 through pSwitch1 the same way?

ewwhite
Aaron R.
  • Just to make sure: This is one network/VLAN we're talking about, right? Something like everything's in VLAN x and network 1.2.3.0? – Mario Lenz Aug 22 '14 at 19:39
  • If you want the VMs to talk to each other, then why not put them on the same vSwitch? What you have up there is a lot of complexity that can easily be avoided. You can still achieve uplink redundancy even if everything is on the same vSwitch. – Reality Extractor Aug 22 '14 at 19:44
  • Btw: Your setup looks a bit weird. If pSwitch0 dies, both VMGuest1 and 2 lose their network connectivity. Why isn't vmnic1 connected to pSwitch1? – Mario Lenz Aug 22 '14 at 20:08
  • @Mario Yes, this is on the same VLAN. If it had to route out to the gateway, maybe I wouldn't have this problem. – Aaron R. Aug 22 '14 at 21:16
  • @RealityExtractor unfortunately, there are only two methods in VMware for network redundancy: Link Status and Beacon Probing. Link Status is not good enough for us, since if a switch has issues downstream (has happened twice now), then it doesn't switch over. Beacon Probing doesn't work since it requires 3 separate pSwitches, which we do not have. I actually simplified the setup so I could get an answer to my question; each of the VM hosts actually has two vNICs, one connected to vSwitch0 and the other to vSwitch1. I'm using arp_ip_target redundancy inside the VMs (a sketch of that kind of bonding setup follows these comments). – Aaron R. Aug 22 '14 at 21:19
  • 1
    The complete setup, for those interested: http://www.gliffy.com/go/publish/image/6082545/L.png – Aaron R. Aug 22 '14 at 21:41
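
For context, the arp_ip_target redundancy mentioned in the comments is a Linux bonding-driver feature. A minimal illustrative sketch of that kind of setup inside a guest follows; this is not the asker's actual config, and the monitored address is hypothetical:

# /etc/modprobe.d/bonding.conf -- illustrative only
# active-backup bond that probes a real host via ARP instead of trusting
# link state; 1.2.3.5 stands in for a host reachable on the physical network
options bonding mode=active-backup arp_interval=1000 arp_ip_target=1.2.3.5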

2 Answers


Bonding your NIC connections inside a virtual machine is akin to using software RAID inside a VMware guest. You can do it, but it's not a reasonable method of protection for a VMware system.

Are you using managed switches?

I'd recommend simplifying your solution:

  • Place your VMs on the same vSwitch if they need to communicate with each other.
  • The uplinks from the vSwitch can go to one or more physical switches (a minimal esxcli sketch follows this list).
  • Ideally you can set up a stack between the physical switches, with vSwitch uplinks to each, but even a resilient bond between the physical switches (2 x 1GbE) will do the job.
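
As a rough illustration of that consolidation, assuming ESXi's standard esxcli networking namespace and the vmnic/vSwitch names from the question (adjust to your environment):

# Give the existing vSwitch0 a second uplink so both physical switches are used
esxcli network vswitch standard uplink add --uplink-name=vmnic1 --vswitch-name=vSwitch0

# Make both uplinks active; failover then happens at the host, not in the guests
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --active-uplinks=vmnic0,vmnic1

With that in place, each guest needs only a single vNIC and no bonding driver.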

That's really it... Is there something wrong with a configuration like:

[Diagram of the suggested configuration]

or the more complex design described at: vSwitch configuration with 12 uplinks

ewwhite
  • At this point I am expecting to be wrong about this, but it _does_ look like there is something wrong with your configuration. It doesn't look like it relies on the link state being correct on the switch (meaning it can always pass traffic whenever the lights are green). Am I missing something here? Because we haven't been able to rely on link state so far for redundancy. Even when a switch reboots, there are some minutes where the link light shows up green when it isn't passing traffic. That case is permanent if it comes up having lost all its config, which happened to us recently. – Aaron R. Aug 25 '14 at 15:44
  • What type of switches are you using? It's rare that I'll modify link state tracking. – ewwhite Aug 25 '14 at 15:45
  • Or beacon-probing, for that matter. – ewwhite Aug 25 '14 at 15:52
  • I validated that beacon probing requires 3 separate switches to function correctly here http://serverfault.com/questions/510347/esxi-beacon-probing-limitation-three-switches-required. We're a small environment, and only have two switches for each VM host to plug in to. (We technically have two 4948's, and then two 2960's that are each connected to a 4948 as a layer-2 extension of those switches). The default for Link State Tracking is disabled, and I don't believe we have it enabled. – Aaron R. Aug 25 '14 at 16:15
  • I guess I'm not worried about my switches rebooting... Typically I'm using a chassis switch (HP 5400zl or Cisco 4500) *or* stacked switches with switch uplinks going to diverse switch blades or stack members. This is not really a vSphere problem. You should work on stabilizing the switch connections and possibly link-state tracking. But really, your switches shouldn't be rebooting, right? – ewwhite Aug 27 '14 at 12:38

So this is your setup:

[Diagram of the asker's complete setup]

A bit complex; I would advise you not to do NIC bonding inside your VMs.

Anyway: Both vmguest1 and 3 can ping host1 but not each other, right? To investigate this, start by issuing

[root@vmguest1:~]# arp -a vmguest3
vmguest3.example.com (1.2.3.4) at <incomplete> on eth0
[root@vmguest1:~]# 

and use tcpdump to see what's actually arriving and leaving your vNICs. Does the ARP request reach vmguest3? Does it answer on eth0, eth1 or both?
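
A minimal sketch of those captures, assuming a standard tcpdump and the interface names used above (filter on the guest's IP from the question to cut noise):

# On vmguest1: confirm the ARP request actually goes out
tcpdump -n -e -i eth0 arp host 1.2.3.4

# On vmguest3: see whether the request arrives, and which interface answers
tcpdump -n -e -i eth0 arp
tcpdump -n -e -i eth1 arp

If the request reaches vmguest3 but the reply never appears back on vmguest1, the fault is on the return path rather than the outbound one.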

Mario Lenz
  • Unfortunately I discovered the issue on our prod servers and haven't had time to replicate it in one of our other environments, but I can confirm your first thought: both vmguest1 and 3 can ping host1 and 2, but not each other. The ` on eth0` actually shows up as ` on bond0`. – Aaron R. Aug 25 '14 at 20:55
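
Given the bond0 detail in that last comment, a quick sanity check inside the guest is the bonding driver's status file (standard Linux bonding interface; adjust the bond name if yours differs):

# Shows the active slave, ARP monitoring settings, and per-slave link state
cat /proc/net/bonding/bond0

If the ARP monitor has marked a slave down, traffic can go missing even while the link lights are green, which matches the behavior discussed above.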