Questions tagged [numa]

NUMA stands for Non-Uniform Memory Access. On x86 architectures it is the method used to handle memory architectures where each processor has local memory and accessing another processor's memory is appreciably more expensive.

Non-Uniform Memory Access describes a memory architecture in which RAM is partitioned into more than one locality. Localities are called nodes, and in most commodity hardware a node corresponds to a CPU socket. In such systems the access time to RAM depends on which CPU issues the fetch and which NUMA node the requested RAM resides in. RAM that is local to the CPU's own node is fetched faster than RAM local to another CPU's node.

NUMA-enabled systems provide hints to the OS in the form of certain firmware (ACPI) structures. One such structure is the System Locality Information Table (SLIT), which describes the relative cost of communication between nodes. In a fully connected system, where each node can talk directly to every other node, this table is likely to hold the same value for every pair of nodes. In a system where nodes do not have direct connections, such as a ring topology, this table tells the OS how much longer it takes for the distant nodes to communicate.
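
On Linux the resulting distance table can be inspected with numactl --hardware. As an illustrative sketch only (the node count and distance values below are invented for a hypothetical fully connected two-socket machine), the relevant portion of the output looks roughly like:

    $ numactl --hardware
    ...
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

A node's distance to itself is reported as 10; larger numbers indicate proportionally more expensive remote access.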

NUMA gives NUMA-aware operating systems and programs an additional dimension to optimize along. Such programs keep process-local memory on the same NUMA node, which in turn allows for faster memory response times. In NUMA-aware operating systems the policy is usually to serve a process out of a specific NUMA node's memory for as long as possible, which also restricts execution to the cores associated with that node.
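
To check whether a running process's pages have actually stayed on one node, Linux exposes per-mapping placement in /proc/<pid>/numa_maps. A minimal sketch (the PID 1234 is a placeholder):

    # list the per-node page counts (fields of the form N<node>=<pages>) for each mapping of PID 1234
    $ grep -o 'N[0-9]*=[0-9]*' /proc/1234/numa_maps

If most of the counts sit on a single node, the process's memory is being kept local; counts spread across several nodes indicate remote allocations.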

For systems that will not be running NUMA-aware programs, the differing memory access times can cause seemingly undiagnosable performance variations. How severe this disparity is depends heavily on the operating system being used. Because of this, most server manufacturers provide a BIOS option to interleave memory between NUMA nodes in order to create uniform access times.

Historically, older servers (before 2011) set this BIOS option to interleave by default. However, advances in operating-system NUMA support and in CPU manufacturers' inter-node interconnect architectures have changed this, and such settings increasingly default to letting the OS handle memory interleaving.

For Linux operating systems the command numactl can be used to manage the memory policy for a NUMA-enabled system.
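
As a sketch of typical usage (myapp and the node numbers are placeholders, not values taken from any question below):

    # run a program on node 0's cores and allocate its memory only from node 0
    $ numactl --cpunodebind=0 --membind=0 ./myapp

    # spread the program's allocations round-robin across all nodes
    $ numactl --interleave=all ./myapp

    # show the NUMA policy and allowed cpus/nodes of the current shell
    $ numactl --show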

65 questions
3
votes
0 answers

Move process across NUMA system

I am running many multi-threaded processes on a larger NUMA system with dozens of sockets. The memory access across different nodes is very slow, so I restrict each process to one socket and let it use the complete CPU. For this placement I use…
nfw
  • 56
  • 2
3
votes
1 answer

How can I tell if NUMA is enabled on a MongoDB server?

Our mongodb process is consistently using >100% of our CPU (this is on an Ubuntu 64-bit server on Linode) and we're casting about for performance improvements. One suggestion we found was that MongoDB and NUMA don't work well together:…
dreeves
  • 238
  • 1
  • 3
  • 9
3
votes
1 answer

How can I correct the NUMA setup of the memory on my server?

I am trying to install VMware ESXi onto a new dedicated server. However, when I boot from the VMware ESXi Installer CD, I am given the following error: The system has found a problem on your machine and cannot continue. The BIOS reports that NUMA…
Josh
  • 9,001
  • 27
  • 78
  • 124
3
votes
1 answer

Is there a limit to the amount of memory a single thread can access in a dual-processor system?

I'm looking to buy a workstation for data processing using MATLAB. I'm considering one of two workstations from DELL. The lower end workstation (3500) has a single processor and 24 GB of memory in 6 DIMMs. The higher end (7500) will only allow me…
Marc
  • 175
  • 3
2
votes
0 answers

Slurm - Does it maintain ccNUMA?

Does a SLURM cluster control, maintain or enforce Cache Coherence across the Nodes? Is it a configuration property, or does something like this not exist? I can't find anything inside the docs.
Semo
  • 271
  • 2
  • 9
2
votes
0 answers

Bad performance on better hardware

I have postgresql streaming replication on 2 hosts and I've run into a problem of different performance between the two servers. It looks like all sql queries on one host are 70-90% slower than on the other. At first I checked query…
2
votes
2 answers

NUMA placement failed, performance might be affected

I'm running SuperMicro 6048R-E1CR36H Storageserver on Ubuntu Xenial 16.04.03 LTS and Xen Kernel: 4.4.0-97-generic Xen: xen-hypervisor-4.6-amd64:amd64/xenial-security 4.6.5-0ubuntu1.2 Problem: when I run xl create or xl restore, I get this error…
2
votes
1 answer

Is a machine with a single NUMA node, actually a regular (non-NUMA) system?

First, let's check I got the fundamentals right: As I understand it, NUMA systems are an (asymmetric) network of NUMA nodes, where a NUMA node is usually (but not always) a physical CPU package. In a NUMA system, each node has its own local memory,…
Edd Barrett
  • 943
  • 3
  • 9
  • 19
2
votes
0 answers

find out NUMA locality of process RAM

I am doing an application benchmark with multiple instances of the same application. I found out that pinning their processes (with sched_setaffinity under Linux, with TaskManager under Windows) to specific CPUs increases performance. So I would…
2
votes
1 answer

NUMA node interleaving doesn't work for MariaDB

Please help activating NUMA node interleaving for MariaDB. Using MariaDB 10.1.21 on CentOS Linux release 7.3.1611. Per instructions on https://mariadb.com/kb/en/mariadb/systemd/ added…
Ivan
  • 21
  • 1
2
votes
2 answers

Is NUMA always completely NUMA or are there also hybrid systems?

I am working on a high-end server application where performance is critical. Given that servers often employ NUMA architectures, the server application also uses NUMA-aware memory allocation strategies to improve memory access…
Patrick
  • 217
  • 2
  • 8
2
votes
1 answer

Linux Opteron system appears to be UMA but should be NUMA

According to numactl, this dual CPU Opteron box is UMA rather than the expected NUMA: $ numactl --hardware available: 1 nodes (0) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 0 size: 65534 MB node 0 free: 381 MB node distances: node 0 …
Wayne Conrad
  • 635
  • 1
  • 7
  • 20
2
votes
2 answers

How to configure Linux for using only one CPU/core of a NUMA system

I'm currently working with an AMD Opteron-based NUMA system. For the needs of my current project, I'd like to make Linux and all of the system processes utilize only CPU0 (and preferably only one of its cores), leaving all other cores for my…
2
votes
1 answer

Recommended NUMA nodes per socket on dual AMD Epyc 7643 server with 1TB of RAM

What do you suggest to set the NUMA nodes per socket to? It is factory set to NPS1, but I'm not sure this is the optimal value. (We use this device for massively multithreaded bioinformatic tasks) The possible choices are: NPS0 (will attempt…
2
votes
0 answers

Current single system image solutions

I'm designing a cluster for a small research institute. Since our computations require a large amount of memory, I'm looking for a solution that will allow our applications access to the whole memory distributed across different nodes. The access…
Piotr M
  • 23
  • 3