
At a small software development house, we virtualize our build servers (using VirtualBox) so that they can be easily backed up, snapshotted and the like. We've recently bought a new server using an AMD Epyc 7351P, which has 16 cores (32 with hyperthreading). We've found that assigning any given VM more than 4 CPU cores absolutely cripples performance: my assumption is that this is because the Epyc is actually a NUMA 4x4core architecture under the hood, so if you use more than 4 physical cores the guest starts thrashing memory.

Is there a way to make VirtualBox expose the NUMA configuration to the guest? If not, which other virtualisation solutions might work in this case? Ideally, I'd want to be able to fire up a VM using all of the physical cores without incurring too much of a performance penalty. You expect to lose some performance when virtualising, but a 20-core VirtualBox VM on this server runs at about 10% of the speed of a 4-core one, which is ridiculous.

Atomjack
  • "Is there a way to make VirtualBox" - is there a reason you, as a software house, use the most backward hypervisor? Between Hyper-V, VMware and KVM there are plenty of enterprise-grade hypervisors available. Voting to close - using VirtualBox for servers is not best practice. It is more "you are fired" level, particularly for a software developer. – TomTom Aug 09 '19 at 11:55
  • What operating system for host and guest? Tools to quantify NUMA and CPU utilization vary by OS. – John Mahowald Aug 09 '19 at 12:02
  • You need software that can emulate a NUMA system for guests and map it onto the host's NUMA topology. – Anton Danilov Aug 09 '19 at 12:03
  • Also, please share workload details. Number of compile threads, memory consumption observed, what this thrashing looks like above 4 cores. – John Mahowald Aug 09 '19 at 12:27
  • The use of VirtualBox is historic, but (a) up until now it works, and (b) it's cheap. I have no objection to moving to a different hypervisor, but that's why part of my question was: which one will allow me to set up a 16-core VM on a 16-core NUMA host while maintaining reasonable performance? – Atomjack Aug 12 '19 at 13:57
  • Sorry, should have provided more details. Linux host (Centos 7.6), Linux guest (Centos 6.10). The workload is a C++ compile: the guest memory usage is well within the 16GB allocated to it, but a compile takes significantly longer with 16 cores allocated and make -j16 than it does with 4 cores allocated and make -j4. – Atomjack Aug 12 '19 at 14:00

1 Answer


Do not oversubscribe CPU. That is, by default, do not assign more total guest CPUs than the number of physical cores. Performance will degrade, sometimes by a little, but sometimes severely.

NUMA effects could be a problem. Check what the NUMA topology looks like in the guest, and compare across different hypervisors. On Linux, `numastat` shows per-node memory hits, and general documentation like the RHEL Performance Tuning Guide remains a good reference.
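As a rough way to compare topologies, the sketch below lists each NUMA node and its CPUs by reading the Linux sysfs files under `/sys/devices/system/node`. The function names `parse_cpulist` and `numa_nodes` are made up for this illustration, and the script assumes a Linux kernel that exposes those files. Run it on the host and in the guest: if the guest reports a single node while the host reports several, the guest scheduler has no way to keep threads close to their memory.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,16-19' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(base="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} for every nodeN directory under base."""
    nodes = {}
    for node_dir in Path(base).glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            # Directory names look like 'node0', 'node1', ...
            nodes[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node {node}: {len(cpus)} CPUs -> {cpus}")
```

The `base` parameter exists only so the function can be pointed at a copy of the sysfs tree; on a live system the default path is the one to use.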

Regarding hypervisor choice, I don't think VirtualBox supports tweaking guest NUMA. Try some of the other choices for x86 virtualization, including bare metal, KVM, VMware ESXi, and Hyper-V. Although all are good for a server, they have different feature sets and operational processes.

John Mahowald
  • The server has 32 cores (16 physical, 16 hyperthreaded). Allocating more than 8 cores to a VM causes a massive slowdown, so this is a NUMA effect, not due to CPU oversubscription. – Atomjack Aug 12 '19 at 14:26
  • With Linux as the host and the guest, you can prove your theory by viewing NUMA topology and memory stats. If `lscpu` shows one node, the Linux scheduler is likely to be very sub-optimal. – John Mahowald Aug 12 '19 at 15:50
  • Your question stated "20-core Virtualbox vm on this server". More than 16 cores is oversubscribed. That's acceptable for some mostly-idle workloads, but probably not a compiler hitting certain execution units very hard. – John Mahowald Aug 12 '19 at 15:55
  • Ah, sorry - that was a mistake: the test was actually run with 16 cores in the VM. Thanks for the pointer to check 'lscpu' - that shows only one NUMA node, which does indeed explain the poor performance. – Atomjack Aug 13 '19 at 17:07