Questions tagged [numa]

NUMA stands for Non-Uniform Memory Access. On x86 architectures it is the method used to handle memory architectures in which each processor has local memory and accessing another processor's memory is appreciably more expensive.

Non-Uniform Memory Access describes a memory architecture in which RAM is partitioned into more than one locality. Localities are called nodes, and on most commodity hardware a node corresponds to a CPU socket. In such systems the access time to RAM depends on which CPU issues the fetch and which NUMA node the requested RAM resides in. RAM local to the fetching CPU's node is returned faster than RAM attached to another CPU's node.
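
As a minimal sketch of how a program can inspect this partitioning (assuming a Linux host with libnuma, the library behind numactl, installed; compile with -lnuma):

    #include <numa.h>   /* libnuma; link with -lnuma */
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        int nodes = numa_num_configured_nodes();
        int cpus  = numa_num_configured_cpus();
        printf("%d NUMA node(s), %d CPU(s)\n", nodes, cpus);

        /* Which node (typically which socket) does each CPU belong to? */
        for (int cpu = 0; cpu < cpus; cpu++)
            printf("cpu %d -> node %d\n", cpu, numa_node_of_cpu(cpu));

        /* How much RAM is attached to each node? */
        for (int node = 0; node < nodes; node++) {
            long long free_bytes;
            long long size_bytes = numa_node_size64(node, &free_bytes);
            printf("node %d: %lld MB total, %lld MB free\n",
                   node, size_bytes >> 20, free_bytes >> 20);
        }
        return 0;
    }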

NUMA-enabled systems provide hints to the OS in the form of certain BIOS structures. One such structure is the System Locality Information Table (SLIT), which describes the relative cost of communication between nodes. In a fully-connected system where each node can talk directly to every other node, this table is likely to hold the same value for every remote node pair. In a system where nodes do not have a direct connection, such as a ring topology, this table tells the OS how much longer it takes for the distant nodes to communicate.
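
These firmware-reported distances are visible from userspace (numactl --hardware prints them); as a small illustrative sketch under the same libnuma assumption as above, a program can read the table directly:

    #include <numa.h>   /* link with -lnuma */
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        int max = numa_max_node();

        /* numa_distance() reports the SLIT entry for a pair of nodes:
           10 means "local", larger values mean proportionally more
           expensive access. */
        printf("node ");
        for (int j = 0; j <= max; j++)
            printf("%5d", j);
        printf("\n");

        for (int i = 0; i <= max; i++) {
            printf("%5d", i);
            for (int j = 0; j <= max; j++)
                printf("%5d", numa_distance(i, j));
            printf("\n");
        }
        return 0;
    }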

NUMA gives NUMA-aware operating systems and programs an additional avenue for optimization. Such programs will keep process-local memory on the same NUMA node, which in turn allows for faster memory response times. On NUMA-aware operating systems the policy is usually to serve a process out of a specific NUMA node's memory for as long as possible, which also restricts its execution to the cores associated with that node.
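
A sketch of what such a NUMA-aware program does, again using libnuma (node 0 is chosen purely for illustration):

    #include <numa.h>    /* link with -lnuma */
    #include <stdio.h>
    #include <string.h>

    #define BUF_SIZE (64UL * 1024 * 1024)

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        int node = 0;

        /* Keep execution on the cores of one node ... */
        if (numa_run_on_node(node) != 0) {
            perror("numa_run_on_node");
            return 1;
        }

        /* ... and satisfy allocations from that same node, so fetches stay
           on local memory instead of crossing the inter-node link. */
        void *buf = numa_alloc_onnode(BUF_SIZE, node);
        if (buf == NULL)
            return 1;
        memset(buf, 0, BUF_SIZE);  /* touch the pages so they are actually placed */

        printf("64 MB allocated and touched on node %d\n", node);

        numa_free(buf, BUF_SIZE);
        return 0;
    }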

For systems that will not be running NUMA-aware programs, the differential memory access times can cause seemingly undiagnosable performance differences. How severe this disparity is depends heavily on the operating system being used. Because of this, most server manufacturers provide a BIOS option to interleave memory between NUMA nodes, which creates uniform access times.

Historically, older servers (before 2011) set this BIOS option to interleave by default. However, advances in OS support for NUMA and in CPU manufacturers' inter-node interconnect architectures have changed this, and such settings are increasingly set to let the OS handle memory interleaving.

On Linux operating systems the numactl command can be used to inspect and manage the memory policy of a NUMA-enabled system.
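
For example, numactl --hardware prints the node topology and distances, numactl --show prints the current policy, and numactl --cpunodebind=0 --membind=0 <command> runs a command on node 0's cores with its memory confined to node 0. A program can also apply the same policy to itself through libnuma (again only a sketch, with node 0 chosen for illustration):

    #include <numa.h>    /* link with -lnuma */
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        /* Roughly equivalent to launching the process under
           numactl --cpunodebind=0 --membind=0: numa_bind() restricts both
           execution and memory allocation to the nodes in the mask. */
        struct bitmask *mask = numa_parse_nodestring("0");
        if (mask == NULL) {
            fprintf(stderr, "bad node string\n");
            return 1;
        }
        numa_bind(mask);
        numa_free_nodemask(mask);

        /* From here on, this process runs on node 0 and allocates from node 0. */
        printf("bound to node 0\n");
        return 0;
    }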

65 questions
2 votes, 0 answers

VMware Cross-NUMA Performance penalty

I've been experiencing random unexpected slow-downs of a virtual SQL Server that I can't attribute to workload, storage or CPU (in fact it continued after the host was evacuated of other VMs). I suspect it might be related to the NUMA configuration…
2 votes, 0 answers

Consecutive CPU numbering on multi-socket NUMA Linux system

I've noticed that CPUs are numbered by Linux according to quite different schemes on different multi-socket NUMA systems. I mean the CPU numbers you use in kernel parameters such as isolcpus= or when setting the affinity of threads. You can check…
maxschlepzig (694)
1 vote, 1 answer

Virtualbox performance on NUMA host (AMD Epyc)

At a small software development house, we virtualize our build servers (using VirtualBox) so that they can be easily backed up, snapshotted and the like. We've recently bought a new server using an AMD Epyc 7351P, which has 16 cores (32 with…
Atomjack (21)
1 vote, 0 answers

OpenStack shared PCI between NUMA nodes with SR-IOV

I'm building an SR-IOV-supported compute node on HP 360g8 hardware and I have a QLogic interface card; my compute node has 32 cores & 32 GB memory. Problem: when I launch vm-1 (with a 16 vCPU core flavor) on OpenStack it launches successfully on the numa0 node and…
Satish (652)
1 vote, 0 answers

Memcpy bandwidth ~1.6x faster on 1 vs 2 socket Intel Scalable (Skylake)?

I'm in the process of porting a complex performance oriented application to run on a new dual socket machine. I encountered some performance anomalies while doing so and, after much experimentation, discovered that memory bandwidth on the new…
Dave (121)
1 vote, 1 answer

Opteron 6274 Cache Differs from Manufacturer's Specs in Windows Server 2016

I just upgraded a DL585 g7 server by replacing its Opteron 6172 CPUs with 4 Opteron 6274 CPUs. Every source I read says that the Opteron 6274s are supposed to have 8x2MB of L2 Cache and 16MB of L3 cache, but Windows Server 2016 says that all four…
C.P. (11)
1 vote, 1 answer

ZFS on Linux and KVM: NUMA nodes for host

I am interested in using KVM images on zvols under ZFS on Linux, on a multi-socket system. I am wondering how I should pin NUMA nodes so as to maximize the benefits of ZFS ARC cache for all KVM images on the system. Obviously, I should pin each VM…
Stonecraft (243)
1 vote, 0 answers

Writing a NUMA load balancer

I originally asked this question on StackOverflow, but as there came no answers, and this question is more about how to configure a server, this question might be more suited on ServerFault. I have some applications that I start with the Windows…
Patrick (217)
1 vote, 1 answer

NUMA_NODE for 10GbE device

My server reports numa_node=-1 for all ethernet devices. I am interested in high speed UDP capture (all jumbo packets). I am running Debian Wheezy (kernel 3.2.68-1+deb7u2). I am told that one needs to pin the data receiving process to the NUMA node…
RK1974 (11)
1 vote, 1 answer

Incorrect CPU count on HP XL230a Gen9

We have an issue with our HP XL230a Gen9 blades where some of the applications are only seeing half of the CPUs. In Task Manager we see all 56 cores, but applications are only seeing half of them. We also see half the CPUs with NUMBER_OF_PROCESSORS=28…
1 vote, 0 answers

NUMA - all CPUs bound to node 0

I am running Dell PE 815 servers with two 16-core Opterons and four memory modules on each, OS - RHEL6. When I started optimizing for NUMA operation, I found that all cores are shown as bound to node0: [root@node1 ~]# numactl --show policy:…
GioMac (4,444)
1 vote, 0 answers

NUMA enabled but can't detect node

I have a CentOS 6.4 (kernel 2.6.32) machine with 2x Intel X5670 (Westmere) in a SuperMicro X8DTG-D motherboard with BIOS version 2.0a. The BIOS settings for ACPI are: ACPI Aware O/S: Enabled, ACPI Version Features: ACPI v3.0, NUMA Support: Enabled, ACPI…
Obiphil (111)
1 vote, 1 answer

How to activate NUMA in HP Proliant DL580 G5 Server?

Currently I am working on a task regarding one server with NUMA. The OS running on the server is Ubuntu 14.04.1 LTS. The server has 4 nodes of 16 cpus: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz, i.e. each node should have 4 cpus. I installed NUMA API in…
ZYJ (13)
1 vote, 1 answer

Linux interrupt affinity

We have an HP DL980 running SUSE Linux Enterprise Server 11 SP2. The machine hosts a PCIe digital IO card which is used to send a clock signal to synchronise with other machines. If we run top, one of the processes shows with the 'command'…
JRT (213)
1 vote, 2 answers

VMWare NUMA Node Boundary Configuration

I have been trying to find out what the best VM configuration would be for our SQL Server 2012 on VMware 5.1. The VM host(s) has 2 sockets with 4 cores running hyper-threading (a total of 16 CPUs); we have a total of 48 GB memory on the…
Robert Brown (125)