Background
We had an incident where a Windows failover cluster suffered an interruption. A post-mortem showed that the node was "removed" as described in this article.
We've only recently migrated this cluster fully into our VMware environment, and it appears that the event described above may have been the cause of the outage.
The associated VMware KB article talks about increasing the Small Rx Buffers and Rx Ring #1 settings, but cautions that increasing them too much could drastically increase memory overhead on the host.
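For reference, if we do end up raising them, the change would presumably be scripted per guest along these lines. This is only a sketch: the display names are the ones the KB shows in the adapter's Advanced tab (they may differ by driver version), the adapter name is a placeholder, and the values are the maximums the KB mentions, if I'm reading it right.

```python
import subprocess

# Sketch of applying the KB's change on one guest (we have NOT done this yet).
# ASSUMPTIONS: the display names below match the vmxnet3 driver's Advanced
# tab on our guests, and "Ethernet0" is a placeholder adapter name.
ADAPTER = "Ethernet0"
SETTINGS = {
    "Small Rx Buffers": "8192",  # maximum mentioned in the KB
    "Rx Ring #1": "4096",        # maximum mentioned in the KB
}

for display_name, value in SETTINGS.items():
    # Set-NetAdapterAdvancedProperty is the stock cmdlet for driver
    # advanced properties on Server 2012+ guests.
    subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"Set-NetAdapterAdvancedProperty -Name '{ADAPTER}' "
         f"-DisplayName '{display_name}' -DisplayValue '{value}'"],
        check=True,
    )
```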
After an audit of the Network Interface\Packets Received Discarded performance counter across our ~150 Windows VMs, 22 vNICs on 16 guests showed some discarded packets. That's a small enough amount that I'm not worried about taxing the hosts with additional memory usage, but I want to understand how memory is used for these settings and where that memory comes from.
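For what it's worth, the per-guest check behind that audit amounts to a one-shot sample of the counter; a minimal sketch (typeperf ships with Windows, and fanning this out to all ~150 VMs is left to whatever remoting you already use):

```python
import csv
import io
import subprocess

# One-shot sample of the discard counter for every interface in this guest.
COUNTER = r"\Network Interface(*)\Packets Received Discarded"

def discarded_packets():
    out = subprocess.run(
        ["typeperf", COUNTER, "-sc", "1"],  # -sc 1 = take a single sample
        capture_output=True, text=True, check=True,
    ).stdout
    # typeperf prints CSV (a header row of counter paths, then sample rows)
    # plus some status lines; keep only rows with multiple columns.
    rows = [r for r in csv.reader(io.StringIO(out)) if len(r) > 1]
    header, sample = rows[0], rows[1]
    # Column 0 is the timestamp; the rest pair counter path -> value.
    return {
        path: float(value)
        for path, value in zip(header[1:], sample[1:])
        if value.strip()
    }

if __name__ == "__main__":
    for path, count in discarded_packets().items():
        if count > 0:
            print(f"{path}: {count:.0f}")
```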
Questions
- What is the relationship between the number of buffers and the ring size?
- How does one calculate the amount of memory used for given values of these settings? (A sketch of the sort of arithmetic I mean follows this list.)
- Because these settings are on the NIC itself within the guest OS, I assume they are driver settings. This makes me think that the RAM used might come from the paged or non-paged pool.
  - Is this correct?
  - If so, should I be worried about that?
- Are there concerns I'm not taking into account here?
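To make the second question concrete, here is the sort of back-of-envelope arithmetic I'm after. The per-buffer and per-descriptor sizes are purely my assumptions (one roughly MTU-sized allocation per buffer, a small descriptor per ring entry); as far as I can tell, the KB doesn't spell out what one buffer or ring entry actually costs:

```python
# ASSUMPTIONS (not from the KB): one "small" Rx buffer is ~2 KiB, i.e. a
# single MTU-sized allocation, and a ring entry is a small descriptor.
BUFFER_SIZE = 2048          # bytes per small Rx buffer (assumed)
DESCRIPTOR_SIZE = 32        # bytes per ring entry (assumed)
SMALL_RX_BUFFERS = 8192     # maximum value, per the KB
RX_RING_1 = 4096            # maximum value, per the KB

buffers_bytes = SMALL_RX_BUFFERS * BUFFER_SIZE
ring_bytes = RX_RING_1 * DESCRIPTOR_SIZE
print(f"buffer pool: {buffers_bytes / 2**20:.1f} MiB per vNIC")  # 16.0 MiB
print(f"ring #1:     {ring_bytes / 2**10:.1f} KiB per vNIC")     # 128.0 KiB
```

If those guesses are in the right ballpark, even the maximums would cost on the order of 16 MB per vNIC, which is why I'm less worried about the amount than about where it comes from.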
We're trying to determine whether there is a drawback to setting these to their maximums on affected VMs, other than VMware host memory usage. If, for example, we're increasing the risk of depleting pool memory in the guest, we're more inclined to start small.
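If we do start small, the plan would be to baseline each guest's pool usage before and after the change, using the standard Windows Memory counters (the same typeperf approach as above):

```python
import subprocess

# Baseline paged/non-paged pool in the guest before and after the change,
# so any pool growth from larger Rx settings would show up directly.
POOL_COUNTERS = [
    r"\Memory\Pool Nonpaged Bytes",
    r"\Memory\Pool Paged Bytes",
]

# -sc 5 = five samples at the default one-second interval.
subprocess.run(["typeperf", *POOL_COUNTERS, "-sc", "5"], check=True)
```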
Some (perhaps all) of these questions may not be specific to VMware or virtualization.