
I've been experiencing random, unexpected slowdowns of a virtual SQL Server that I can't attribute to workload, storage, or CPU (in fact, the slowdowns continued after the host was evacuated of other VMs).

I suspect it might be related to the NUMA configuration - particularly how physical memory is mapped.

Running coreinfo shows the following cross-NUMA node access cost:

Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.0 1.3
01: 1.4 1.5

This seems odd - I'd have expected 01-01 to be closer to 1.0, with the penalty only appearing between nodes.

I think this suggests that memory is being allocated on the first pNUMA node in VMware, which might be causing a performance penalty for memory access from the second vNUMA node.

With SQL Server being NUMA-aware, could it be making assumptions about the cost of cross-NUMA memory access that hurt performance in this scenario (i.e. trying to keep memory access local to one node and avoiding cross-NUMA access)?
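For anyone answering, something like the DMV queries below is how I'd check what SQL Server itself sees per node (a rough sketch; I'm assuming the standard sys.dm_os_nodes / sys.dm_os_memory_nodes DMVs on SQL Server 2016, nothing specific to my instance):

-- NUMA layout as SQL Server sees it: schedulers and state per node
SELECT node_id, node_state_desc, memory_node_id,
       online_scheduler_count, active_worker_count
FROM sys.dm_os_nodes;

-- Committed memory per memory node - a big skew here would support the
-- theory that one vNUMA node is getting most of the allocations
SELECT memory_node_id,
       virtual_address_space_committed_kb / 1024 AS committed_mb,
       pages_kb / 1024 AS pages_mb
FROM sys.dm_os_memory_nodes;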

Are there any steps I can take to try to ensure that memory is allocated evenly across the pNUMA nodes?
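For context, the vSphere-side knobs I'm aware of are advanced VM options like the ones below. The option names are standard vSphere settings, but the values are only an illustration of how the 12 vCPUs could be forced into two vNUMA nodes of 6 - I haven't applied them:

numa.vcpu.maxPerVirtualNode = "6"
numa.autosize.once = "FALSE"

My understanding is that the second setting makes the vNUMA topology get re-evaluated at the next power-on instead of sticking with whatever was sized at first boot, but I'd welcome correction on that.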

Host is as follows:

  • vSphere 6.7.0
  • 2x Xeon Gold 5217 (8 cores each)
  • 768GB total memory

The VM is as follows:

  • 12x vCPU (3 cores per socket = 4 sockets)
  • 320GB RAM
  • Windows 2012 R2
  • SQL Server 2016 Enterprise

EDIT: x-mem is showing the following, which doesn't match up with coreinfo:

xmem-win-x64.exe -j6 -s -R -l -f test.csv -n5

         00      01
00  1.21124 1.18519
01  1.19831 1.18695
  • Have you applied any NUMA related configuration to the VM? What settings have you set? – Michael Hampton Aug 19 '20 at 00:17
  • No manual NUMA settings - just the vSphere defaults. – Peter Godwin Aug 19 '20 at 00:21
  • It's time to go experiment, then. Bearing in mind that it's production, though pretty much any change you make is likely to improve the situation. – Michael Hampton Aug 19 '20 at 14:10
  • Indeed. I had a quick play in the development environment. On one host NUMA 1-1 was at 1.0 and the other was showing similar results to the current production environment. Adjusting the cores/socket made no difference. However testing the memory performance with Intel MLC didn't show any performance impact - so perhaps `coreinfo` isn't a useful tool for this. Either way more testing is needed :) – Peter Godwin Aug 20 '20 at 01:29
  • xmem is showing something different (see edit), which doesn't make any sense. Maybe I'm on a single pNUMA node with some of the vCPUs being HT siblings from CPU0? – Peter Godwin Aug 20 '20 at 02:04
  • Do you have an option to use a more appropriate CPU and hardware setup for the workload? – ewwhite Aug 20 '20 at 02:12
  • What would be more appropriate? I think the hardware would be fixed - other hosts have a similar configuration. – Peter Godwin Aug 20 '20 at 02:37
