Is NUMA always completely NUMA or are there also hybrid systems?

Question

I am working on a high-end server application where performance is critical. Given that servers often employ NUMA architectures, the server application also uses NUMA-aware memory allocation strategies to improve memory access performance.

Benchmarks show that accessing memory when the thread and the memory are on a different NUMA node is about 30% slower compared to when thread and memory are on the same NUMA node.

My question is: when a server uses a NUMA architecture, is all memory dedicated to a specific processor, or can servers take a hybrid approach where, besides the NUMA memory, there is also non-NUMA memory? And in this case, how do local NUMA memory, non-local NUMA memory and non-NUMA memory compare regarding performance?

EDIT: Server hardware is running Windows (Windows Server 2012 or 2012R2 is a reasonable minimum requirement).

What operating system are you using? That's a crucial detail. — ewwhite, May 31 '16 at 14:48
When programming for NUMA memory access, you should also be aware that there are systems out there, where only one of the NUMA nodes has any local resources. In this case the other node(s) always have to go remote to access any memory. In the past I have seen this especially on entry-level DP Opteron servers. — s1lv3r, May 31 '16 at 16:14

score 4 · Accepted Answer · answered May 31 '16 at 15:02

No, a NUMA system is a NUMA system. There are nodes (generally CPUs), and each node has its own memory. Most modern OSes will try to ensure that all memory allocated to a process is on the same node that the process is running on. If this can't happen then yes, you get the memory slowdown you've seen, where one node is essentially acting as a memory controller for another node, which is not ideal. But no, there's no hybrid that I know of available on mainstream systems right now. For a start, what would manage that memory?
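
To see how a particular machine is laid out, the node topology can be inspected directly. Below is a minimal sketch in Python, assuming a Linux host that exposes `/sys/devices/system/node`; on Windows the equivalent information would come from Win32 calls such as `GetNumaHighestNodeNumber` and `GetNumaAvailableMemoryNodeEx`, which are not shown here. The helper names `parse_mem_total_kb` and `numa_nodes` are made up for illustration.

```python
import glob
import os
import re

# Assumption: Linux sysfs layout. This path does not exist on Windows.
NODE_ROOT = "/sys/devices/system/node"


def parse_mem_total_kb(meminfo_text):
    """Return MemTotal in kB from a per-node meminfo file, or None if absent."""
    match = re.search(r"MemTotal:\s+(\d+)\s*kB", meminfo_text)
    return int(match.group(1)) if match else None


def numa_nodes(root=NODE_ROOT):
    """Map node id -> MemTotal in kB for every node directory under root."""
    nodes = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        try:
            with open(os.path.join(path, "meminfo")) as fh:
                nodes[node_id] = parse_mem_total_kb(fh.read())
        except OSError:
            # meminfo missing or unreadable: record the node without a size
            nodes[node_id] = None
    return nodes


if __name__ == "__main__":
    for node_id, mem_kb in sorted(numa_nodes().items()):
        print(f"node {node_id}: MemTotal = {mem_kb} kB")
```

A node that reports 0 kB has no local memory of its own, which is the situation s1lv3r describes in the comments: every access from that node is a remote access.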

score 1 · Answer 2 · answered Jul 29 '16 at 21:31

Since I cannot comment yet...

All memory is NUMA memory, but you can write your code to take advantage of memory locality. That is, you can pin an application to the CPUs whose local memory holds the data it needs, which can give better performance. This is especially important if you're using virtualization. Have a look at this link: http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/. Hope that helps.
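
As a rough illustration of pinning, here is a minimal sketch in Python. It assumes a Linux host (the linked article is about Linux/OpenStack), reads a node's CPU list from sysfs and restricts the process to those CPUs with `os.sched_setaffinity`. The helper names `parse_cpulist` and `pin_to_node` are hypothetical. On Windows the analogous calls would be `SetProcessAffinityMask` or `SetThreadAffinityMask`, plus `VirtualAllocExNuma` for node-specific allocation.

```python
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node_id, pid=0):
    """Restrict pid (0 = current process) to the CPUs of one NUMA node."""
    # Assumption: Linux sysfs layout; this file does not exist on Windows.
    path = f"/sys/devices/system/node/node{node_id}/cpulist"
    with open(path) as fh:
        cpus = parse_cpulist(fh.read())
    if not cpus:
        raise ValueError(f"node {node_id} has no CPUs")
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Calling `pin_to_node(0)` early in the process, before the large allocations are made, keeps the threads on node 0. This only pins the CPUs; it relies on the kernel's default local allocation policy to place memory on the same node, and explicitly binding memory would need something like `numactl` or libnuma.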
