VMXNET3 receive buffer sizing and memory usage

Question

Background

We had an incident where a Windows failover cluster suffered an interruption. A post-mortem showed that the node was "removed" as described in this article.

We've only recently migrated this cluster fully into our VMware environment, and it appears that the event described above may have been the cause of the outage.

The associated VMware KB article about this talks about increasing the Small Rx Buffers and the Rx Ring #1 setting, but cautions that increasing these too much could drastically increase memory overhead on the host.

After an audit of the Network Interface\Packets Received Discarded performance counters for our ~150 Windows VMs, 22 vNICs across 16 guests had some discarded packets.

A small enough amount that I'm not worried about taxing the hosts with additional memory usage, but I want to understand how memory is used for these settings and where the memory comes from.

Questions

What is the relationship between number of buffers and ring size?
How does one calculate the amount of memory used for given values of these settings?
Because these settings are on the NIC itself within the guest OS, I assume they are driver settings. This makes me think that the RAM used might be paged or non-paged pool.
1. Is this correct?
2. If so, should I be worried about that?
Are there concerns I'm not taking into account here?

We're trying to determine whether there is a drawback to setting these to their maximums on affected VMs, other than VMware host memory usage. If we're increasing risk of pool memory being depleted in the guest for example, we're more inclined to start small.

Some (perhaps all) of these questions may not be specific to VMware or virtualization.

I've seen really flaky stuff when the TCP offload engine of the physical NIC was misbehaving and VMs were exhibiting odd behavior, might be a lead you can follow up on. — SpacemanSpiff, Aug 19 '15 at 04:07
@SpacemanSpiff it's worth checking, but only 16 VMs out of 150+ are exhibiting the behavior. Those 16 are spread across the 12 node cluster, and they all receive occasional high bursts of traffic which seems to be what triggers the symptoms described in the KB article. Some of these are Windows clusters so they don't move with DRS, otherwise I might look into whether all affected guests showed dropped packets while on a specific host before being vMotioned off. I'll check again and see if I can find any correlations. Thanks. — briantist, Aug 19 '15 at 04:16
@SpacemanSpiff IBM servers, a few different models and revisions, also not sure which NICs, I can check specifics tomorrow. — briantist, Aug 19 '15 at 05:02

score 6 · Accepted Answer · answered May 31 '17 at 23:19

What is the relationship between number of buffers and ring size?

They're related, but independent. The rx "ring" refers to a set of buffers in memory that are used as a queue to pass incoming network packets from the host (hypervisor) to the guest (Windows VM). The memory gets reserved in the guest by the network driver, and it gets mapped into host memory.

As new network packets come in on the host, they get put on the next available buffer in the ring. Then, the host triggers an IRQ in the guest, to which the guest driver responds by taking he packet off the ring, and dispatching it to the network stack of the guest OS, which presumably sends it to the guest application indending to receive it. Assuming the packets are coming in slow enough, and the guest driver is processing them fast enough, there should always be a free slot in the ring. However, if packets are coming in too fast, or the guest is processing them too slowly, the ring can become full, and packets may be dropped (as you've seen in your situation).

Increasing the ring size can help mitigate this issue. If you increase it, more slots will be available in the ring at a time. This segues into the second setting, "Small Rx Buffers", which is the total amount of buffers available that can be used to fill the slots in the ring. There needs to be at least as many buffers as slots in the ring. Typically you want more. When the guest takes a buffer off the ring to give to the guest network stack, it may not always be immediately returned back to the driver. If that happens, having spare buffers to fill the ring means you can go longer without dropping packets.

The Rx Ring #1 / Small Rx Buffers are used for non-jumbo frames. If you have a default NIC configuration, that's the only ring that will be used.

How does one calculate the amount of memory used for given values of these settings?

Assuming you're talking about non-jumbo frames, each buffer needs to be big enough to store an entire network packet, roughly 1.5kb. So if you have 8192 buffers available, that would use 12MB. A larger ring will also use more memory, but the descriptors are small (bytes), so it's really the buffers you have to worry about.

Because these settings are on the NIC itself within the guest OS, I assume they are driver settings. This makes me think that the RAM used might be paged or non-paged pool.

Yes, it's a non-paged pool. If the ring buffers were paged, it would likely result in dropped packets while the buffers were being paged back in.

Are there concerns I'm not taking into account here?

I'm not sure this is relevant to your situation, but it might be worth noting that a larger ring will increase the cache footprint of the network rx path. In microbenchmarks, you will see that a larger ring usually hurts performance. That said, in real life applications, if a packet gets dropped, that's usually a bigger deal than a small performance gain in speed bursts.

Source: I worked at VMware.

Thanks Roger, excellent first answer. I haven't been at this company for a while so this issue has been way off my radar, but for completeness, is there a memory usage concern to setting those to their maximums? The KB article makes it sound like you could use lots of memory that way but it seems like the amount would be pretty tiny. I ask this because it's also unclear how to size these values other than trial and error, so it may be easiest to set them to max if there's no/little downside. — briantist, Jun 01 '17 at 14:17
Re: memory usage, two things I would note: 1) If you aren't using jumbo frames, I agree, the amount of memory at the maximum setting is still pretty small. If you are using jumbo frames, the buffer size is around 9kb and so you are using more memory. 2) The amount of memory available in a non-paged pool is smaller than the total amount of memory on the host. I am not an expert here but this link has a pretty comprehensive rundown on how to calculate the available memory: https://blogs.technet.microsoft.com/markrussinovich/2009/03/10/pushing-the-limits-of-windows-paged-and-nonpaged-pool/ — Roger Jacobson, Jun 01 '17 at 19:01
Great thank you. I hope this answer helps someone in the future (maybe it will even be me if I run into this again!) — briantist, Jun 01 '17 at 19:09

score 0 · Answer 2 · answered Aug 14 '15 at 13:30

I don't have a reply for point 1-2-3 but you can check with your virtual enginner about Vmware host config . If he is VCP he will understand the stuff :)

You really have to check your host because windows problems could be on the host not in the guest .

There is many hardware feature that could explain your problems , directpath io , rss , vcpu , power management scheme ...

I can give you some link that help your virtual team , or you :)

This link is about tuning the host http://buildvirtual.net/tuning-esxi-host-networking-configuration/

And this fat pdf :

http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

And this one is about rss :

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2008925

Thanks for the reply, but I am a VCP. This isn't really about host config at all. The Microsoft article I linked to explains that the performance counter in question should not be higher than 0, and it is on several VMs. I'm trying to gain understanding about the vNIC settings beyond what's explained in the VMware KB article. — briantist, Aug 14 '15 at 16:35

score -1 · Answer 3 · answered Jan 29 '16 at 20:48

-1

I am not in a position to fully search and point you to the right pages: so I am asking you to look for the details yourself... ( sorry )

In Fail over Cluster there are 4 settings which can be tweeked; and they will not affect buffers or paged or non-paged... It changes the way Fail over Cluster makes the decision to consider a node "removed". These settings are:

SameSubnetDelay SameSubnetThreshold CrossSubnetDelay CrossSubnetThreshold

They may not solve your problem, but tweaking these may get you out of trouble at the moment...

When back on Monday, I will check back to this post if you have further questions

HTH, Edwin.

answered Jan 29 '16 at 20:48

Edwin van Mierlo

333
1
9

PS: can you let us know the version of Windows you are running? – Edwin van Mierlo Jan 29 '16 at 20:51
This was Windows 2008. I have gotten a reply from VMware (after all these months), but I'm not even at the company where I was when this happened. The answer is not straightforward and I've been meaning to read through their answer and post something, but I haven't had the time. I appreciate your tips about the cluster but I can't try them out at this time. – briantist Jan 29 '16 at 21:00
I only notice that the original post is a couple of months old, that was not very clear in the android-app... I will have a closer look next time... meanwhile my answer is still valid for other users who may search for similar experiences. – Edwin van Mierlo Feb 01 '16 at 09:51

VMXNET3 receive buffer sizing and memory usage

Background

Questions

3 Answers3