
I have 2 load-balanced Apache virtual servers that handle a couple thousand requests per minute, and I am trying to diagnose the bottleneck that is slowing them down.

My web servers each have one virtual NIC, and their VMware hosts each have 7 gigabit NICs. All of these physical NICs feed into 100 Mb/s switch ports.

At first I thought that the VMware hosts would aggregate all of the bandwidth available to them and dole it out to the virtual machines according to demand. However, I'm wondering now if I'm wrong about that.

The way my coworker explained it to me, if I only have one virtual NIC in a VM, it will bind to a single physical NIC rather than aggregate the bandwidth of all of them - so in my situation, that VM is limited by the switch port's bandwidth to 100 Mb/s. Is that correct?

Also, would two 100 Mb/s connections (one on each server) be a bottleneck for a site that is only handling 1,000-2,000 requests per minute?

Brent

1 Answer


As far as the NIC teaming is concerned, your coworker is more or less correct.

By default, NIC teaming in ESX maps each virtual NIC in your VMs to a single uplink (physical NIC) on the vSwitch it is connected to. The specific NIC load balancing policies are:

  • Port ID: all traffic from each virtual NIC is mapped to one uplink based on its vSwitch port number.
  • Source MAC hash: all traffic from each virtual NIC is mapped to one uplink based on a hash of the virtual NIC's MAC address.
  • IP hash: a hash of both the source and destination IP addresses for IP-based traffic is used to select the uplink.

Of these three, only IP hashing will give you any aggregation effect. ESX can only control outbound traffic path selection, so to get proper distribution of both inbound and outbound traffic your physical switches must also be configured appropriately for port aggregation (EtherChannel/LACP).
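To make the difference concrete, here is a minimal Python sketch of how each policy might select an uplink. It is an illustration only, not ESX's actual algorithm, and the uplink names are assumed.

    # Toy model of the three vSwitch load-balancing policies (illustrative only;
    # the hashing details and uplink names are assumptions, not ESX internals).
    UPLINKS = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]  # physical NICs on the vSwitch

    def by_port_id(vswitch_port: int) -> str:
        # Fixed per virtual NIC: the vSwitch port it uses does not change at runtime.
        return UPLINKS[vswitch_port % len(UPLINKS)]

    def by_source_mac(vnic_mac: str) -> str:
        # Also fixed per virtual NIC: its MAC address does not change at runtime.
        return UPLINKS[hash(vnic_mac) % len(UPLINKS)]

    def by_ip_hash(src_ip: str, dst_ip: str) -> str:
        # Varies per conversation: different source/destination pairs can land on
        # different uplinks, which is where the aggregation effect comes from.
        return UPLINKS[hash((src_ip, dst_ip)) % len(UPLINKS)]

The first two selectors return the same uplink for everything a given virtual NIC sends, which is why a single-vNIC VM ends up pinned to one 100 Mb/s port.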

There is a very useful VMware KB article here about how to configure various switches (Cisco/HP) so that both inbound traffic (where the switch has to decide the path selection) and outbound traffic (where ESX handles the path selection) are distributed.

Note that none of these policies will ever distribute traffic between a single source IP address and a single destination across more than one uplink; they only provide aggregation when a range of IP addresses is involved.
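Here is a small self-contained demo of that limitation, using the same kind of toy IP-hash selector as above (again an illustration, not ESX's real hash):

    UPLINKS = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]   # assumed uplink names

    def by_ip_hash(src_ip: str, dst_ip: str) -> str:
        return UPLINKS[hash((src_ip, dst_ip)) % len(UPLINKS)]

    # One server talking to one client: every packet selects the same uplink.
    print({by_ip_hash("10.0.0.10", "192.168.1.50") for _ in range(1000)})      # one uplink

    # Many distinct client addresses: the traffic spreads across several uplinks.
    print({by_ip_hash("10.0.0.10", f"192.168.1.{i}") for i in range(1, 255)})  # multiple uplinks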

To answer your second question: that depends on how much data each request involves. A single 100 Mb/s connection can push through about 8,000 full-sized (1,500-byte) packets per second, and more if the packets are smaller. Simplifying things massively and ignoring overheads, if a typical request involves 30 KB of data, each one needs about 20 packets, so the NIC can theoretically handle roughly 400 such requests per second. If your requests average 1 MB of traffic, you are down to about 12 requests per second at best, and under normal conditions I'd say you'd be doing well to see actual numbers above 50% of those rates.
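You can redo that arithmetic as a back-of-the-envelope check. The sketch below uses the same simplifications (full 1,500-byte packets, no protocol overhead), and the request sizes are just the example values from above:

    # Rough ceiling on requests/sec for a 100 Mb/s link, ignoring all overheads.
    LINK_BPS = 100_000_000       # 100 Mb/s switch port
    PACKET_BYTES = 1500          # assume full MTU-sized packets

    packets_per_sec = LINK_BPS / 8 / PACKET_BYTES        # roughly 8,300 packets/sec

    def request_ceiling(bytes_per_request: int) -> float:
        packets_per_request = bytes_per_request / PACKET_BYTES
        return packets_per_sec / packets_per_request

    for size in (30_000, 1_000_000):                     # the 30 KB and 1 MB examples
        print(f"{size:>9} bytes/request -> ~{request_ceiling(size):.0f} requests/sec ceiling")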

That is just to get a rough idea of whether the link can carry the raw data bandwidth. To get closer to what a specific link can actually handle, you also need to factor in all the handshaking involved, the end-to-end latency between the client and the server, how many concurrent connections your architecture can keep in flight, and a lot else.
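If you want to fold some of those effects in, a very crude model looks something like the sketch below. Every parameter value here is an assumption chosen for illustration, not a measurement of your site:

    # Crude model: requests/sec limited by per-request latency and by how many
    # requests can be in flight at once. All values below are assumptions.
    RTT_SEC = 0.05                # assumed 50 ms client<->server round trip
    SETUP_RTTS = 2                # TCP handshake plus request/response turnaround (simplified)
    RESPONSE_BYTES = 30_000       # assumed 30 KB response
    LINK_BPS = 100_000_000        # 100 Mb/s link
    IN_FLIGHT = 50                # assumed concurrent connections kept busy

    transfer_sec = RESPONSE_BYTES * 8 / LINK_BPS             # serialization time on the link
    time_per_request = SETUP_RTTS * RTT_SEC + transfer_sec   # latency dominates for small responses

    latency_limited = IN_FLIGHT / time_per_request
    bandwidth_limited = LINK_BPS / 8 / RESPONSE_BYTES
    print(f"~{min(latency_limited, bandwidth_limited):.0f} requests/sec under these assumptions")

Whichever of the two limits you hit first is the one that matters, which is why measuring actual link utilization tells you more than any of these estimates.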

Helvick
  • Does it make sense, then, that if I were to add a second virtual NIC to the VM, it would potentially double the bandwidth available to that virtual machine (assuming other VMs aren't already saturating the bandwidth)? – Brent Jan 31 '10 at 22:44
  • This will only have any effect if you have some mechanism for distributing traffic between the two virtual NICs within the guest OS. In general, adding additional VM NICs connected to the same port group on a vSwitch has no effect, but if you have an app that can distribute your connections across multiple IP addresses within the guest, then multiple NICs (with separate IP addresses) will give you more bandwidth and throughput, provided you are sure the uplinks are going to be different. – Helvick Jan 31 '10 at 23:19
  • 802.3ad Ethernet bonding (mode 4) could be used on the VM side; I just wasn't sure what would happen on the host side, or whether the virtual switch would support this. – Brent Feb 03 '10 at 13:15
  • @Brent ESX vSwitches use 802.3ad for their teaming, so I can't see that multiple NICs in a VM (using 802.3ad in the guest OS) connected to separate VM ports would achieve anything more than a single VM NIC feeding into a vSwitch that used 802.3ad on its uplinks. – Helvick Feb 03 '10 at 15:23
  • Is 802.3ad optional on the ESX vSwitches? Because I don't think it is enabled in our deployment. – Brent Feb 04 '10 at 00:05
  • You will need 802.3ad enabled on your physical switches for the "Route Based on IP Hash" load balancing policy to work. – Helvick Feb 04 '10 at 00:14