8

After installing SQL Server Enterprise 2012 with the Server + Cal license model, on a computer with 2 processors each with 16 cores (and no hyperthreading involved) and putting the server under extremely heavy load the 16 cores on the first processor were very underutilized, the first 4 cores on the 2nd CPU were heavily utilized, and the last 12 cores were not used at all (because of the 20 core limit for this sql server version). Total CPU utilization was displaying as around 25%. Unfortunately, the server suffered from extremely poor performance even though if the tasks were evenly distributed across the 20 cores it wouldn't have been anywhere near as bad.

The Windows Server was running on a VMWare virtual image under ESX Server, but all of the CPU was allocated to the windows server.

We tried changing affinity settings (e.g., allocating most cores to CPU and the others to I/O), but that didn't help solve the performance problems.

Upgrading the product edition to SQL Server Enterprise Core 2012 not only allowed the SQL Server to utilize the 12 previously unused cores on the 2nd processor, but it also resulted in a much more even distribution of tasks across all of the processors. To get through the backlog of requests cpU utilization jumped to around 90%, and then came down to around 33% once it was caught up, but performance improved dramatically since we failed over to the newly updated version And the performance issues went away.

I was wondering if anyone knows what might cause SQL Server to unevenly distribute the load, relying almost exclusively on the first 4 cores of the 2nd processor that had 12 cores idle, and allocate only a few tasks to each of the 16 cores on the first processor. Also, is there any way we could have more evenly distributed the load across the 20 cores that were being used without the product edition upgrade?

The flip side of that question is what did the product upgrade do that caused SQL Server to start evenly distributing the load across all of the cores that it recognized?

Thanks to any insight to answer these questions and/or links that might help me better understand how to make sense of what was happenings.

cooplarsh
  • 91
  • 1
  • 2
  • Are you saying that the machine in question is a VM with 32 vCPUs? In both scenarios? – mfinni Nov 12 '12 at 17:08
  • Yes, it was the same machine which had 2 processors, each with 16 cores (and no hyperthreading was involved). – cooplarsh Nov 14 '12 at 19:50
  • 1
    Why in the name of THE LORD do you have 32 vCPUs? Have you tried reducing that? I know that ESXi has improved its gang-scheduling of CPUs, but you're just asking for trouble with that many. What version of ESXi are you on and what's the underlying hardware? – mfinni Nov 14 '12 at 21:23

1 Answers1

4

The uneven performance was likely a combination of the 20-core limit combined with the way that sql server schedules threads on NUMA machines. Unfortunately, SQL Server 2012 doesn't use any intelligence in deciding which 20 cores to utilize, resulting in an unbalanced number of cores per NUMA node. With 32 cores spread across 2 NUMA Nodes, you'll likely end up with a 16/4 split. This is problematic because SQL will try to balance activity equally across NUMA nodes in a round-robin fashion (assuming that you're not forcing affinity w/resource governor).

In your case, 1/2 the load is assigned to 4 cores, and 1/2 to 16 cores. The bottleneck on the 4-core node effectively acts as a throttle, limiting the capacity of the machine to 2x 4 cores = 8 cores = 25% CPU usage.

Once you upgraded to core edition, then sql utilized all 32 cores across 2 numa nodes (16/16 split). Performance improved, etc.

One option that could have improved your performance would have been to utilize sql server resource governor to affinitize the majority of your workload to one numa node. For example, you could create a resource pool WEB_APP, and affinitize it to only run on the 16 core numa node. The load assigned to the WEB_APP pool could utilize 50% of the server capacity, plus the remaining 12.5% capacity from the 4-core node.

The other option would be limit the cores available to sql server to only be 10 from each numa node.

StrayCatDBA
  • 168
  • 5