
EDIT 2: My application benefits from hyper-threading

A. Yes, I know what the technology is and what it does.

B. Yes, I know the difference between a physical core and a logical one.

C. Yes, turning HT off made the render run slower; this is expected!

D. No, I am not over-provisioning when I assign all the logical (yes, logical) cores to one VM. If you read the white papers from VMware, you will know that the scheduler generates a topology map of the physical hardware and uses that map when allocating resources; assigning ALL the logical cores to one VM presents 16 logical processors in Windows, the same as if Windows were installed directly on the physical hardware. And lo and behold, after 5 tests this arrangement has produced the fastest (and most efficient) render times. (There's a quick guest-side check after this list.)

E. The application in question is 3ds Max 2014 using Backburner and the Mental Ray renderer.
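For the record, here's a minimal guest-side sanity check of what the VM actually sees. This isn't part of my original setup; it assumes Python 3 and the psutil package are available inside the Windows guest:

```python
# Quick sanity check inside the guest: how many physical cores vs.
# logical processors does the OS actually see?
# Assumes Python 3 with the psutil package installed in the VM.
import os
import psutil

physical = psutil.cpu_count(logical=False)  # physical cores presented to the guest
logical = psutil.cpu_count(logical=True)    # logical processors (HT threads)

print(f"Physical cores seen by guest:  {physical}")
print(f"Logical processors seen:       {logical}")
print(f"os.cpu_count() reports:        {os.cpu_count()}")
```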


TL;DR: I (sometimes) want to run one VM on vSphere with as much CPU efficiency as possible. How?

I'm hoping to use VMware's ESXi / vSphere hypervisor in a bit of a non-standard way.

Normally people use a hypervisor to run multiple VMs simultaneously on one system. I want to use the hypervisor to let me quickly switch between applications, but only ever really run one VM/app at a time.

It's actually a pet project: I have a 5-node render farm (each node has 2x Intel Xeon E5540) that for the most part stays off (when I'm not rendering I have no need to run these machines). That seems like a waste of valuable compute time, so I was hoping to use them for other things when not rendering (a general-purpose 40-core / 80-thread compute cluster).

I was hoping that vSphere could let me spin up render-node VMs when rendering and other VMs when not. The problem is, I really need high CPU efficiency while the render VM is running.

Using a render job as a benchmark, I'm getting about 88% of the speed in the VM that I can get on a non-virtualized setup. I was hoping for closer to 95%. Any ideas how I could get there?
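For reference, the comparison doesn't have to rest on the render job alone. A rough, repeatable CPU-bound harness like the sketch below (the workload and iteration count are arbitrary stand-ins, nothing to do with 3ds Max or Backburner) can be run with identical settings on both the VM and a bare-metal node; efficiency is then the bare-metal wall time divided by the VM wall time:

```python
# Minimal CPU-bound benchmark to compare a VM against bare metal.
# Runs one worker per logical processor and reports total wall time;
# the workload itself is an arbitrary stand-in, not a render job.
import multiprocessing as mp
import time


def burn(n: int) -> int:
    # Simple integer-heavy loop; keeps one core busy.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


if __name__ == "__main__":
    workers = mp.cpu_count()      # one process per logical processor
    iterations = 20_000_000       # arbitrary; keep identical on both systems

    start = time.perf_counter()
    with mp.Pool(processes=workers) as pool:
        pool.map(burn, [iterations] * workers)
    elapsed = time.perf_counter() - start

    print(f"{workers} workers, {elapsed:.2f}s wall time")
    # Efficiency = bare-metal wall time / VM wall time for the same settings.
```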

EDIT: Details:

Resources being used by the render VM; I don't fully understand why this bar is not full:

[screenshot]

Resource settings for that VM:

[screenshot]

Even though the VM doesn't show as using 100% of the resources, the host does:

[screenshot]

I don't entirely understand the % shares here; is this when all these VMs are on? Also, I didn't configure the other VMs to reserve 10%:

[screenshot]

Finally, the host does show as fully utilized, although (not shown here) the MHz utilization is lower (i.e. not 100%):

[screenshot]

VM Config:

[screenshot]

I understand this is an unusual case, but I nevertheless feel the question is valid and may help others in a similar situation down the line (although I admit this case is quite specific).

Cody Smith
  • I'm not sure I fully understand the problem or what you're trying to do here. – ewwhite Feb 17 '14 at 05:28
  • Did you give _both_ physical CPUs to a VM? What version of ESXi are you using? – Michael Hampton Feb 17 '14 at 05:31
  • Your render job is pretty much a benchmark of... that render job. It doesn't represent anything else. – ewwhite Feb 17 '14 at 05:33
  • Also, does it really matter much if an 11 hour job takes 12 hours, if you can use the machines for other stuff between those jobs? – Michael Hampton Feb 17 '14 at 05:35
  • @ewwhite, not true. The job is running on identical hardware, involves mostly CPU, and is highly repeatable. The job is accurately measuring the difference in computing efficiency of my VM setup vs. a real node. I feel the issue may be configuration related, so I have posted a lot of details about the setup. I apologize for not posting these details with the question originally. – Cody Smith Feb 17 '14 at 05:49
  • @CodySmith Please show the [*actual configuration of the virtual machine*](http://i.stack.imgur.com/0KLw1.png), version of VMware and build number of ESXi. – ewwhite Feb 17 '14 at 06:00
  • Sorry about that, I understand now. I added it, last image in the question. Thanks for the patience / help. -Cody – Cody Smith Feb 17 '14 at 06:07
  • @CodySmith have a look at this page that explains how HT works in VMware (http://wahlnetwork.com/2013/09/30/hyper-threading-gotcha-virtual-machine-vcpu-sizing/); the bottom line is that you do not gain anything if you give it more vCPUs than you have logical cores. – Reality Extractor Feb 17 '14 at 07:22
  • I am not over-provisioning, I have 2x quad-cores with HT (8 cores, 16 logical cores) which is exactly what I have provisioned. It's a 1:1 logical core relationship. – Cody Smith Feb 17 '14 at 07:28
  • @CodySmith I mistyped in my previous comment; it was meant to say that you shouldn't give it more vCPUs than you have physical cores (not logical cores). That's the whole point of the article I linked. As far as your shares go, you gave your renderer 32k shares, and the other two VMs have 4k shares each, for a total of 40k shares, so 4k shares is 10%. The shares are irrelevant in this case as they only matter if there is contention between VMs. Since the other two are powered down it's a non-issue. – Reality Extractor Feb 17 '14 at 09:45

2 Answers


You've misconfigured your virtual machine(s) and host.

Things to consider:

  • If you have a computationally-heavy process, you may want to disable HyperThreading.
  • HyperThreaded (logical) cores are not the same as physical cores!!
  • Intel E5540 CPUs date back to 2009. They are quad-core CPUs, so with two sockets you have 8 physical cores, presented as 16 logical processors when HyperThreading is enabled.
  • If you've configured a single VM with 16 vCPUs, scale back!!
  • ESXi requires some resources, too.
  • Try right-sizing your virtual machine (8 vCPU) if you're not willing to disable HyperThreading (see the sketch after this list).
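If you'd rather script the resize than click through the vSphere Client, a minimal pyVmomi sketch might look like the following. The host, credentials, and VM name are placeholders, and the VM must be powered off (or have CPU hot-add enabled) for the change to apply:

```python
# Minimal pyVmomi sketch: resize a VM to 8 vCPUs (2 sockets x 4 cores,
# matching the physical topology). Host, credentials and VM name are
# placeholders; power the VM off first unless CPU hot-add is enabled.
import ssl
import time
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="esxi-host.example.com", user="root",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "render-node-01")

    spec = vim.vm.ConfigSpec(numCPUs=8, numCoresPerSocket=4)
    task = vm.ReconfigVM_Task(spec=spec)

    # Crude polling; pyVim.task.WaitForTask is the usual helper.
    while task.info.state not in (vim.TaskInfo.State.success,
                                  vim.TaskInfo.State.error):
        time.sleep(1)
finally:
    Disconnect(si)
```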

Other things to do (in general)...

ewwhite
  • I'm willing to try it, but this task greatly benefits from hyper-threading so I don't want to turn off the feature. – Cody Smith Feb 17 '14 at 06:30
  • @CodySmith Just try this with 8 vCPU. Hyperthreading isn't benefitting you here. – ewwhite Feb 17 '14 at 06:42
  • Running 8 vCPUs made the render perform worse, from 88% of the real hardware down to 76%. As I expected, the hyper-threading argument is pretty much bullshit. However, updating did seem to help: getting 90-91% now, which is definitely a step in the right direction. But I still cannot get the host to give up more than 26 GHz (seen in picture 1), which is driving me crazy. – Cody Smith Feb 17 '14 at 09:26
  • @CodySmith let's think about this: your physical hardware only has 8 physical cores, so you have 20.264 GHz of real cycles available. Logical cores are just abstract constructs; they are essentially two "threads" per physical core. vSphere is essentially displaying the total available GHz wrong. You have two logical cores per physical core, but the logical cores share the physical core resources. Your physical core runs at 2.533 GHz, and the two logical cores run at a combined 2.533 GHz; they cannot exceed that combined maximum for their physical core. – Reality Extractor Feb 17 '14 at 09:51
  • @codysmith My argument is not incorrect. While VMware allows you to configure more CPUs than physical cores, you really shouldn't. – ewwhite Feb 17 '14 at 12:54
  • [VMware's own best practices recommendations](http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf) seem to disagree with you entirely with respect to hyperthreading. I still suspect this is a NUMA edge case. – Michael Hampton Feb 18 '14 at 00:43
  • @MichaelHampton You think he needs to keep everything on one NUMA node? So, 4 vCPU? – ewwhite Feb 18 '14 at 01:08
  • Maybe two VMs of 4 vCPUs with HT off, and two VMs of 8 vCPUs with HT on, would make an interesting benchmark. – Michael Hampton Feb 18 '14 at 02:40

I think you've reached about the maximum of what you're likely to get with those old Xeons, though unlike ewwhite I do not believe hyperthreading is causing you any sort of problem. Indeed, at least since ESXi 5.0, VMware has recommended using hyperthreading for most workloads, and your own testing seems to confirm that you are benefiting from HT. As ewwhite correctly notes, though, using HT will make some metrics in vSphere look odd.

I think you have one obvious issue and possibly one non-obvious issue here:

First is the obvious issue that virtualization itself incurs overhead that you can never fully eliminate. In the case of the CPU, certain instructions must be virtualized in order for the hypervisor to correctly isolate one virtual machine from another. Thus instead of executing the instruction directly, as on bare metal, the hypervisor will intercept the call and execute several instructions in its place. From prior experience, 87-90% is about what you should expect for CPU-bound work; getting much past that would require a significant advance in hardware. If you're now seeing 91% of native CPU performance, it's probably about as good as it's going to get.

Second is the non-obvious issue of NUMA. This is an issue with multiprocessor systems, where part of the memory is faster when accessed by the nearest CPU, and slower when accessed by other CPUs. Depending on how your rendering job handles memory, you might see some benefit by running two parallel renderers in two VMs, each of which is pinned to a specific CPU and always accesses the slightly faster memory. (If you run two VMs on a single host, each using half the available vCPUs, ESXi should sort this out automatically for you.) Though if you aren't seeing this issue on bare metal, you probably will gain little benefit by trying this.
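If you did want to pin each of the two VMs to a NUMA node explicitly rather than rely on the scheduler's automatic placement, the per-VM advanced setting numa.nodeAffinity can be used. Here is a hedged pyVmomi sketch; connection details and the VM name are placeholders, and the second VM would get "1" instead of "0":

```python
# Minimal pyVmomi sketch: pin a VM to NUMA node 0 via the advanced
# setting "numa.nodeAffinity". Connection details and the VM name are
# placeholders; the second VM would use value "1".
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="esxi-host.example.com", user="root",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "render-node-a")

    affinity = vim.option.OptionValue(key="numa.nodeAffinity", value="0")
    spec = vim.vm.ConfigSpec(extraConfig=[affinity])
    vm.ReconfigVM_Task(spec=spec)  # apply with the VM powered off
finally:
    Disconnect(si)
```

Keep in mind that explicit affinity overrides the NUMA scheduler, so it's only worth keeping if it measurably helps your renders.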

Michael Hampton
  • Thank you, this is an actual answer: informative, **correct**, to the point, and polite! I will try the 2 VM idea; it's worth a shot. – Cody Smith Feb 20 '14 at 06:22