
CPU steal time has become an increasingly relevant metric for assessing the accuracy of performance monitoring on virtualization platforms - see EC2 monitoring: the case of stolen CPU for an instructive summary in the context of Amazon EC2, and IBM's paper on CPU time accounting for a more in-depth technical explanation (including illustrations) of the concept:

Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.

Accordingly, it is exposed in most related Unix/Linux monitoring tools nowadays - see e.g. columns %steal or st in sar or top:

st -- Steal Time
The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine).
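
On a Linux guest, the st value shown by top comes from the steal field of the aggregate cpu line in /proc/stat. As a minimal sketch of that arithmetic (the function names `parse_cpu_line`, `read_cpu_counters` and `steal_percent` are illustrative and not part of any tool), the steal percentage over an interval can be computed from two snapshots:

```python
# Field order of the aggregate "cpu" line in /proc/stat (values are in clock ticks).
# The trailing guest/guest_nice fields are skipped because they are already
# included in user/nice.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels omit trailing fields; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def read_cpu_counters(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise RuntimeError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of elapsed CPU time reported as stolen between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return 100.0 * deltas["steal"] / total
```

Calling `read_cpu_counters()` twice, a second or so apart, and passing both results to `steal_percent` should give a figure comparable to the st column in top. There is no equivalent file on a Windows guest, which is what the question is about.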

I've been unable to figure out how to capture the same metric on Windows, though. Is this possible already? (Ideally for the Windows Server 2008 R2 AMIs on EC2 and via the respective Windows Performance Counters, of course.)

mgrandi
Steffen Opel

2 Answers


Edit: Updating on Oct. 1 2013 - Some of my original answer has since become obsolete.

I'm not sure if you're still active on this site or whether you'll see this, but I wanted you to know that I read this question today and it fascinated me. So I spent all day (when I should have been working) researching Hyper-V and Windows internals, and even digging into the very concepts of virtualization itself, in hopes that I might be ready to answer your question.

Let me preface by saying that I am coming from the point of view of Hyper-V as a virtualization platform because that is where I have the most experience. Even though there may be certain tenets of virtualization, as we know it, that cannot be deviated from, Microsoft and VMware and Xen all have different strategies for how they design their hypervisors.

That's the first thing that makes your question challenging. You pose your question as if it were hypervisor-agnostic, when in truth it is not. Amazon EC2, for example, uses the Xen hypervisor, and the "CPU Steal Time" metric that you see in the output of a top command issued from within a Linux VM running on that hypervisor is a result of the integration services installed on that guest OS (or virtualization-aware tools on the guest) in conjunction with data provided by that specific hypervisor.

First off, let me just answer your question straight up: there is no way to see from inside a virtual machine running Windows how much time the physical processors of the host spend doing other things, unless the virtualization-aware tools/services for your particular hypervisor are installed in the guest VM and that hypervisor exposes the data to the guest. Even a Windows guest running on a Hyper-V hypervisor will not have immediate access to information about the time the physical processors spent doing other things. (To quote voretaq7, something that "breaks the fourth wall.")

Windows client and server operating systems running as virtualized guests in Hyper-V with the correct integration services/tools installed do make use of "enlightenments" (which are literally kernel code alterations made especially for VMs) that significantly increase their performance in using the resources of a physical host. The bottom line, though, is that the hypervisor does not have to give any more information to the guest OS than it wants to. That means the hypervisor does not have to tell a guest VM what else it is doing besides servicing that VM... unless it wants to. And that information about what else the physical processors are doing is necessary for deriving a metric from the perspective of the VM such as "CPU Steal Time: the percentage of time the vCPU waits for a physical CPU."

How could the guest OS know that, if it didn't even realize that it was actually virtualized?

In other words, without the right integration tools installed on the guest, the guest OS won't even know that its CPU is actually a vCPU. It won't even know that there is another force outside of itself "stealing" CPU cycles from it, therefore that metric will not exist on the guest VM.

As of ESXi 5.0, VMware has begun to expose this data to Windows guests as well. The VMware integration tools on the guest also need to be up to date. Here is a reference; they refer to it as "CPU Stolen Time".

A hypervisor such as Hyper-V does not give guests direct access to physical resources such as physical processors or processor cores. Instead the hypervisor gives them vDevs - virtual devices - such as vCPUs.

A prime example of why: say a virtual machine guest OS makes the call to flush the TLB (translation lookaside buffer), which is a physical component of a physical CPU. If the guest OS were allowed to clear the entire TLB on a physical processor, that would have negative performance effects for all the other VMs that were also sharing that same physical TLB. In the case of Windows, that call in the guest OS is translated into a "hypercall" or "enlightened" call which is interpreted by the hypervisor so that only the section of the TLB that is relevant to that virtual machine is flushed.


(Interestingly, that hints to me that guest VMs that do not have the proper integration tools and/or services could have the ability to impact the performance of all the other VMs on the same host, but that is completely outside the scope of this topic.)


All that to say that you can still detect, on a Hyper-V host, the time that a virtual processor spent waiting for a real processor to become available so that it could be scheduled to run. But you can only see that data on the Windows Hyper-V hypervisor itself. If it is possible to see this in other hypervisors, I urge others to tell us how to see it there and also whether it is exposed to the guests. (Edit 10/1/2013: Thank you evilensky for doing just that!)

My test machine was Hyper-V Server 2012, which is the free edition of Server 2012 that only runs Core and the Hyper-V role. It's effectively the same as any Windows Server 2012 running Hyper-V.

Fire up Perfmon on your parent partition, aka physical host. Load this counter:

Hyper-V Hypervisor Virtual Processor\CPU Wait Time Per Dispatch\*

You will notice that there will be an instance of that counter for each virtual machine on that hypervisor, as well as _Total. The Microsoft definition of that Perfmon counter is:

The average time (in nanoseconds) spent waiting for a virtual processor to be dispatched onto a logical processor.

Obviously, you want that number to be as low as possible. For computers, waiting is almost never a good thing.
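
For anyone who would rather script this than watch it in Perfmon, here is a rough sketch in Python that shells out to the built-in typeperf tool on the Hyper-V host and parses its CSV output. Treat the details as assumptions: the counter path in `COUNTER` is my rendering of the counter above in typeperf's path syntax, and the function names `sample_counter` and `parse_typeperf` are made up for illustration. It only works on the parent partition, not inside a guest.

```python
import csv
import io
import re
import subprocess

# Assumed counter path for the counter named above; "(*)" selects every instance.
COUNTER = r"\Hyper-V Hypervisor Virtual Processor(*)\CPU Wait Time Per Dispatch"


def sample_counter(counter=COUNTER):
    """Run typeperf once on the Hyper-V host and return its raw CSV output."""
    result = subprocess.run(
        ["typeperf", counter, "-sc", "1"],  # -sc 1: collect a single sample
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_typeperf(output):
    """Map each counter instance to its sampled value (nanoseconds per dispatch)."""
    rows = list(csv.reader(io.StringIO(output)))
    for i, row in enumerate(rows):
        # The header row starts with a "(PDH-CSV ...)" cell; the sample follows it.
        if row and row[0].startswith("(PDH-CSV"):
            header = row
            data = rows[i + 1] if i + 1 < len(rows) else []
            break
    else:
        raise ValueError("no PDH-CSV header found in typeperf output")

    waits = {}
    for path, value in zip(header[1:], data[1:]):
        # The instance name sits in parentheses just before the counter name.
        match = re.search(r"\((.*)\)\\[^\\]*$", path)
        instance = match.group(1) if match else path
        try:
            waits[instance] = float(value)
        except ValueError:
            continue  # skip blank or non-numeric samples
    return waits
```

On the host, `parse_typeperf(sample_counter())` should return one entry per virtual processor instance plus _Total, which can then be logged or compared against a threshold of your choosing.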

Other performance counters on the hypervisor that you will want to investigate are Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time, % Hypervisor Run Time, and % Total Run Time. These counters provide you with the percentages that could be used to determine facts such as how much time the "real" processors spend doing things other than servicing a VM or all VMs.

So in conclusion, the metric that you are looking for in a guest virtual machine depends on the hypervisor that it is running on, on whether that hypervisor chooses to provide the data about how it spends its time other than servicing that VM, and on whether the guest OS has the right virtualization integration tools/services/drivers to realize that the hypervisor is making that data available.

I know of no way on a Windows guest, integration tools installed or not, to see how much time, in terms of seconds or percentage, that VM's host has spent servicing it or not servicing it respective to the total physical processor time. (Edit 10/1/2013: ESXi 5.0 or better exposes this data to the guest VM through the integration tools. Still nothing on Hyper-V though.)

Ryan Ries
  • +1 - epic answer is epic – Mark Henderson Dec 06 '12 at 07:53
  • +1 for the thorough exploration indeed, highly appreciated. So regarding EC2 this apparently boils down to Unix/Linux guests running in `Xen paravirtual` mode, which is a _paravirtualized domain_ (i.e. not full virtualization, the guest OS is modified to run on the host), where _steal time_ is available, vs. Windows/FreeBSD/... guests running in `Xen HVM`, which is a _hardware emulated domain_ (i.e. the guest OS is unmodified to run on the host), where it apparently isn't - so a definite negative answer, guess that's what counts. Do I read you correctly that it could get added eventually? – Steffen Opel Dec 06 '12 at 20:51
  • It could, in theory, but don't hold your breath. Like I said, it requires that the vendor-specific hypervisor and the vendor-specific guest OS make a coordinated effort to make that data about the physical host available and accessible from within the guest VM. http://wiki.xen.org/wiki/Xen_Kernel_Feature_Matrix – Ryan Ries Dec 07 '12 at 01:49
  • Tangential: VMware hypervisors will refer to steal time as [ready time](http://www.vmware.com/support/developer/vc-sdk/visdk41pubs/ApiReference/cpu_counters.html) -- the amount of time a virtual CPU is waiting to be serviced by a physical resource. – Yolo Perdiem Dec 12 '12 at 22:09
  • This is a well-known topic on the mainframe. The metric has been available for decades. It's called "wait on cpu". –  Apr 03 '13 at 06:13
  • Still fascinated by this topic, I looked up a Windows 2003 machine running on ESXi 5.1, and it looks like CPU information is available to the guest about its virtualized CPU state, specifically its stolen time (referred to as just that on the guest), CPU shares, CPU limit, and CPU reservation. [pic](https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/5fb69f67fe9976259b0209c63cb6d7fc/tumblr_mtzr01k5iF1qzcm9mo1_500.png) – Yolo Perdiem Oct 01 '13 at 13:15
  • @evilensky Indeed! It's newly added in ESXi 5.0 or later. Also the VMware tools must be up to date. Here's one reference: http://publib.boulder.ibm.com/infocenter/tivihelp/v61r1/index.jsp?topic=%2Fcom.ibm.itm.doc_6.3%2Foswin%2Fattr_vmprocssr.htm – Ryan Ries Oct 01 '13 at 13:22
  • @RyanRies, that's an awesome explanation. I am working on fetching the "cpu steal" metric from a Hyper-V VM, using WMI counters from a VBScript. Right now, I can't see the "Hyper-V Hypervisor Virtual Processor" class on the VM, but it's present on the physical host. I am able to connect to the host machine (by providing username and password) and get this data, but it's not practical for a VM to hold the credentials of the host. Is there any other way I can get these WMI counters without needing to connect to the host? – Venkat Teki Jul 08 '16 at 11:37

FWIW, I just looked through the Perfmon counters of a Windows Server 2008 R2 machine running under Hyper-V and did not see anything related to steal time (or to virtualization at all, for that matter).

uSlackr
  • Thanks for checking - apparently [Virtualization Counters](http://technet.microsoft.com/en-us/library/ff367892.aspx) for Hyper-V should actually be available, maybe they must be installed/activated somehow first? I'm still unable to identify/deduce a similar/related metric amongst these though. – Steffen Opel May 25 '12 at 10:02