I am working on a high-end server application where performance is critical. Since servers often employ NUMA architectures, the application uses NUMA-aware memory allocation strategies to improve memory access performance.
Benchmarks show that accessing memory is about 30% slower when the thread and the memory are on different NUMA nodes than when they are on the same NUMA node.
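To put that figure in perspective, here is a rough back-of-the-envelope sketch of how the penalty would scale with the share of remote accesses. It assumes that "30% slower" means a remote access takes 1.3 times as long as a local one, and that the average cost is a simple weighted mix of the two; the function name `relative_access_cost` and its parameters are only illustrative and are not taken from the actual benchmark.

```python
def relative_access_cost(remote_fraction, remote_penalty=0.30):
    """Average memory access cost relative to an all-local workload (1.0).

    remote_fraction: share of accesses that hit another NUMA node (0.0 to 1.0).
    remote_penalty: assumed extra cost of a remote access; 0.30 mirrors the
                    benchmark figure quoted above.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    local_cost = 1.0
    remote_cost = local_cost * (1.0 + remote_penalty)
    # Weighted mix of local and remote accesses (a simplifying assumption).
    return (1.0 - remote_fraction) * local_cost + remote_fraction * remote_cost


# Print the relative cost for a few hypothetical access mixes.
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(f"{fraction:.0%} remote -> {relative_access_cost(fraction):.3f}x")
```

This ignores caches, interconnect contention and multi-hop topologies, so it is only meant to show the order of magnitude at stake.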
My question is: when a server uses a NUMA architecture, is all memory dedicated to a specific processor, or can servers take a hybrid approach where, besides the NUMA memory, there is also non-NUMA memory? And in that case, how do local NUMA memory, non-local NUMA memory and non-NUMA memory compare in terms of performance?
EDIT: Server hardware is running Windows (Windows Server 2012 or 2012R2 is a reasonable minimum requirement).