I've been experiencing random slow-downs of a virtual SQL Server that I can't attribute to workload, storage, or CPU (in fact, the slow-downs continued after the host was evacuated of all other VMs).
I suspect it might be related to the NUMA configuration - particularly how physical memory is mapped.
Running coreinfo shows the following cross-NUMA node access cost:
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00   01
00:  1.0  1.3
01:  1.4  1.5
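For reference, the matrix above is Coreinfo's NUMA access-cost dump; if I remember the Sysinternals switches correctly, it's produced by:

```
coreinfo -m
```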
This seems odd: I'd have expected 01→01 (local access) to be close to 1.0, with the penalty appearing only between nodes.
I think this suggests that VMware is backing the VM's memory mostly from the first pNUMA node, which would impose a remote-access penalty on every memory access from the second vNUMA node.
With SQL Server being NUMA-aware, could it be making assumptions about the cost of cross-NUMA memory access that hurt performance in this scenario (i.e. trying to keep each node's allocations local and avoiding cross-NUMA access, when node 01's "local" memory is actually the slowest)?
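One way to see how SQL Server itself has carved memory up across its NUMA nodes is to query the SQLOS DMVs. The sketch below uses only documented SQL Server 2016 DMVs (it needs VIEW SERVER STATE permission); foreign_committed_kb is particularly interesting here, since it counts memory a node had to commit from a remote node:

```sql
-- Per-node memory picture as SQLOS sees it.
-- sys.dm_os_nodes: one row per SQLOS node (the 'ONLINE DAC' row is the
-- dedicated admin connection node, excluded below).
SELECT  n.node_id,
        n.node_state_desc,
        n.online_scheduler_count,
        m.pages_kb,                            -- pages allocated from this node
        m.foreign_committed_kb,                -- memory committed from a remote node
        m.virtual_address_space_reserved_kb,
        m.virtual_address_space_committed_kb
FROM    sys.dm_os_nodes AS n
JOIN    sys.dm_os_memory_nodes AS m
        ON m.memory_node_id = n.memory_node_id
WHERE   n.node_state_desc <> N'ONLINE DAC';
```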
Are there any steps I can take to try to ensure that memory is allocated evenly across pNUMA nodes?
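For what it's worth, the levers I've found so far are the vSphere advanced vNUMA attributes, set as configuration parameters on the VM. A sketch (the option names are VMware's documented attributes, but the values are my assumptions for this 12-vCPU VM and untested; the # annotations are for readability, not .vmx syntax):

```
numa.vcpu.maxPerVirtualNode = "6"   # split 12 vCPUs into two vNUMA nodes of 6
numa.consolidate = "FALSE"          # spread the VM across nodes rather than consolidating it
numa.autosize.once = "FALSE"        # re-evaluate vNUMA sizing at every power-on
```

A full memory reservation is also commonly recommended for large SQL Server VMs, though as far as I can tell it doesn't by itself control which pNUMA node backs the pages.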
The host is as follows:
- vSphere 6.7.0
- 2x Xeon Gold 5217 (8 cores each)
- 768GB total memory
The VM is as follows:
- 12 vCPUs (3 cores per socket = 4 sockets)
- 320GB RAM
- Windows Server 2012 R2
- SQL Server 2016 Enterprise
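Since 12 vCPUs can't fit inside a single 8-core pNUMA node, ESXi should be presenting two vNUMA nodes. One way to confirm the layout it actually built is to check the NUMA lines the host writes to the VM's vmware.log at power-on (a sketch; the datastore and folder names are placeholders):

```
# On the ESXi host; substitute the real datastore and VM folder
grep -i numahost /vmfs/volumes/<datastore>/<vm-folder>/vmware.log
```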
EDIT:
x-mem shows the following, which doesn't match up with coreinfo:

xmem-win-x64.exe -j6 -s -R -l -f test.csv -n5

      00       01
00  1.21124  1.18519
01  1.19831  1.18695