CPU load measurement with hyperthreading on Linux

How can I get the true usage of a multicore, hyperthreading-enabled CPU?

For example, let's consider a 2-core CPU exposing 4 virtual cores.

A single-threaded workload would show up as 100% in top, as one of the virtual cores is completely used. The CPU and top behave as expected, as if there were 4 real cores.

With two threads, however, things get awkward: if all goes well, they are balanced across the two real cores, so we get 200% usage: two virtual cores at 100% and two idle virtual cores, and we are using all of the available CPU power. That seems fine to me.

However, if the two threads ran on a single real core, they would show up as two times 100%, i.e. 200% virtual core usage. But physically, that would be one core sharing its power between the two threads, which would then be using only half of the total CPU power.

So the usage numbers shown by top cannot be used to measure the total CPU workload.

I also wonder how hyperthreading balances two virtual cores on one real core. If two threads take a different number of cycles, would the virtual cores 'adapt' so that both show a 100% load even if the real load differs?

dronus

Posted 2013-06-28T08:52:42.297

Reputation: 1 482

@Ramhound, so if I have a physical 4-core processor with 8 logical cores, and my load averages say 4.00, am I at 100% utilization or 50%? – Buttle Butkus – 2015-10-08T21:26:42.703

You do understand that the operating system is not aware of the difference between a hyperthreading virtual core and a physical core, right? – Ramhound – 2013-06-28T10:15:27.367

It seems so, but does it have to be that way? The real vs. virtual core mapping is a simple one-to-two mapping. The problem is how to measure load on a virtual core whose available performance actually changes when it gets scheduled alongside another one on the same real core. But all the data is available, I think; the question is just where the tools are that get a proper result out of it. – dronus – 2013-07-19T00:14:59.210

I would just like to have a load measure where 100% means that every cycle of every real core is used. – dronus – 2013-07-19T00:17:15.373

It's not clear how you came to that conclusion. – Ramhound – 2013-07-19T22:47:33.983

This is clearly a reason why the OS should be aware of virtual cores: it has to detect them and compute the physical core usage from them. Otherwise the whole concept of measuring "usage" and "load" is of no use to the user. If I run top, I usually have a question like "Is the system running at its limit?" or "Would it help to divide the work into more processes?". These questions can't be reliably answered by the current top output. – dronus – 2013-07-25T12:45:51.390

It's still not clear what your actual question is. Even if the operating system is not aware of the difference between physical cores and virtual cores (at least with regard to threads), it can detect the number of physical cores, because that's something the CPU actually reports. – Ramhound – 2013-07-25T12:57:54.067

My question is how to get the total usage of the available CPU cycles. I think you're saying 'you can't get that value because the OS isn't capable of knowing it', but I don't see why. The OS could know whether hyperthreading is in use by probing the behaviour, or by having a CPU driver that defines whether the current CPU model uses it. In the worst case, if the OS truly is incapable of handling this, I would like a suggestion for how to compute the value myself, using my knowledge of my CPU's hyperthreading capabilities. – dronus – 2013-08-24T20:10:57.627

Simply put: how can I tell, at a given moment, whether my CPU is capable of doing further work without slowing down the work currently in progress? – dronus – 2013-08-24T20:13:12.370

Answers

Martin Tegtmeier at Oracle wrote an interesting blog post about this last year: https://blogs.oracle.com/solaris/cpu-utilization-of-multi-threaded-architectures-explained-v2

The short answer: hyperthreading really messes with top's ability to report overall CPU utilisation / CPU idle percentages.

In the worst case, a CPU with 2 cores and 4 virtual cores running 2 threads at 100% utilisation per core could nearly saturate the CPU. (This depends on execution port usage: only threads that use entirely different computing resources on the CPU could still run without affecting the performance of the current threads.) However, top will still report 50% idle in this case.
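As a rough way to get closer to what the question asks for, here is a minimal Python sketch (my own illustration, not taken from the blog post) that samples /proc/stat twice, computes each logical CPU's busy fraction, and groups hyperthread siblings using the sysfs thread_siblings_list files. For each physical core it reports a range: the core had at least one thread running for at least as long as its busiest sibling, and at most for the sum of both siblings (capped at 100%). This measures how often a core was occupied, not how much of its execution resources were used, so on its own it cannot tell a half-loaded core from a saturated one. It assumes Linux with /proc and /sys mounted; all function names are hypothetical.

```python
import glob
import os
import time


def parse_proc_stat(text):
    """Return {cpu_number: (busy_jiffies, total_jiffies)} parsed from /proc/stat text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines such as "cpu3 user nice system idle iowait irq softirq steal ...";
        # skip the aggregate "cpu" line and non-CPU lines like "intr" or "ctxt".
        if not fields or fields[0] == "cpu" or not fields[0].startswith("cpu"):
            continue
        # guest/guest_nice are already counted inside user/nice, so sum only the first 8 counters.
        counters = [int(v) for v in fields[1:9]]
        idle = sum(counters[3:5])  # idle + iowait
        total = sum(counters)
        result[int(fields[0][3:])] = (total - idle, total)
    return result


def busy_fractions(before, after):
    """Busy fraction (0..1) of each logical CPU between two parse_proc_stat() snapshots."""
    fractions = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        fractions[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return fractions


def parse_cpulist(text):
    """Expand a kernel CPU list such as "0,4" or "0-1" into a tuple of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        elif part:
            cpus.append(int(part))
    return tuple(cpus)


def read_sibling_groups():
    """Group logical CPUs into physical cores using sysfs thread_siblings_list."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(parse_cpulist(f.read()))
    return sorted(groups)


def per_core_bounds(fractions, groups):
    """For each sibling group, bound the fraction of time the physical core had work.

    Lower bound: the busiest sibling. Upper bound: the siblings' sum, capped at 1.
    This says how often the core was occupied, not how much of its
    execution resources were used.
    """
    bounds = {}
    for group in groups:
        values = [fractions.get(cpu, 0.0) for cpu in group]
        bounds[group] = (max(values), min(1.0, sum(values)))
    return bounds


def main(interval=1.0):
    def snapshot():
        with open("/proc/stat") as f:
            return parse_proc_stat(f.read())

    before = snapshot()
    time.sleep(interval)
    fractions = busy_fractions(before, snapshot())
    for group, (low, high) in per_core_bounds(fractions, read_sibling_groups()).items():
        print(f"core with CPUs {group}: occupied {low:.0%} to {high:.0%} of the time")


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    main()
```

Reporting a range rather than a single number is deliberate: the kernel's per-CPU counters do not say whether two siblings were busy at the same moment, so only bounds can be derived from them.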

TinkerTank

Reputation: 231

Current working link: https://blogs.oracle.com/partnertech/cpu-utilization-of-multi-threaded-architectures-explained

– Ján Lalinský – 2017-08-15T19:27:48.230

Core utilization is very different from the load on the system. Core utilization only shows how much of the time the core is busy calculating something rather than sitting idle. It can be 100%, which means that at any given time during the measurement the CPU was calculating something.

But load is a different thing: load is generally measured to determine whether any process has to wait for a resource. If processes are not waiting for any resources, you'll see a very performant system. But sometimes you will see a slow system with low CPU utilization. That generally means some processes are waiting for a resource (for example disk I/O) rather than running on the CPU. In this kind of scenario you will not see high CPU utilization, but the system may be well over its capacity.
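On Linux, the load average counts threads that are runnable (state R) as well as threads in uninterruptible sleep (state D, usually waiting on disk or other I/O), which is how load can be high while CPU utilization is low. The sketch below, assuming Linux procfs and using function names of my own, takes an instantaneous count of both states. The real load average is an exponentially damped average of such counts maintained by the kernel, so this is only a snapshot.

```python
import glob


def task_state(stat_text):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the last ')' before reading the state.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_runnable_and_blocked():
    """Count threads currently in state R (runnable) and D (uninterruptible sleep)."""
    counts = {"R": 0, "D": 0}
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                state = task_state(f.read())
        except (OSError, IndexError):
            continue  # the thread exited (or became unreadable) while we were scanning
        if state in counts:
            counts[state] += 1
    return counts


if __name__ == "__main__":
    print(count_runnable_and_blocked())
```

A large D count alongside low CPU utilization is the "slow system, idle CPU" situation described above.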

On Linux, the load average is a computed value that measures how busy the system is overall. It should be compared with the available parallel computing resources, specifically the number of cores. So if a system with 4 physical cores has a load average of 4 or more, we can safely say that some processes will wait for a resource.

Whether the CPU utilization is 100 or 10 percent does not matter here: the load average can go as high as 200 or 300, and in such cases the system will be barely responsive.

Under normal operating conditions, a server's load average should not exceed the number of cores for long periods. Short spikes are not important, in my opinion. The three numbers you see in the output of w are the load averages over the last 1, 5 and 15 minutes.
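To apply this rule of thumb you need both the load averages and a core count. A minimal Linux-only sketch (helper names are my own) reads /proc/loadavg, whose first three fields are the same 1/5/15-minute averages that w shows, and counts physical cores as distinct (physical id, core id) pairs in /proc/cpuinfo. Those two fields are reported on x86 but not on every architecture; when they are missing, the sketch reports the core count as unknown. os.cpu_count() returns the number of logical CPUs, i.e. it includes hyperthreads.

```python
import os


def parse_loadavg(text):
    """Parse /proc/loadavg text such as "0.52 0.58 0.59 2/1234 5678".

    Returns ((1-min, 5-min, 15-min), runnable_entities, total_entities).
    """
    fields = text.split()
    averages = tuple(float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return averages, runnable, total


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text.

    Hyperthread siblings share both values. Returns 0 if the fields are
    absent (they are reported on x86 but not on every architecture).
    """
    cores = set()
    physical = core = None
    # Processor blocks are separated by blank lines; the trailing "" flushes the last block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical is not None and core is not None:
                cores.add((physical, core))
            physical = core = None
        elif ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "physical id":
                physical = value
            elif key == "core id":
                core = value
    return len(cores)


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        (one, five, fifteen), runnable, total = parse_loadavg(f.read())
    with open("/proc/cpuinfo") as f:
        physical_cores = count_physical_cores(f.read())
    print(f"load average 1/5/15 min: {one} {five} {fifteen}")
    print(f"logical CPUs (incl. hyperthreads): {os.cpu_count()}")
    print(f"physical cores: {physical_cores or 'unknown'}")
    if physical_cores:
        print(f"1-minute load per physical core: {one / physical_cores:.2f}")
```

Printing both counts makes it explicit which denominator you are comparing the load against, which is exactly what matters on a hyperthreaded machine.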

Hkntn

Reputation: 131

In my opinion none of the above answers is satisfactory.

I think the article at the following link answers this question well: http://perfdynamics.blogspot.ch/2014/01/monitoring-cpu-utilization-under-hyper.html

QUOTE:

The idea behind HT is to allow a different application thread to run when the currently running app stalls; due to branch misprediction, bubbles in the pipeline, etc. To make that possible, there has to be another port or AS register. That register becomes visible to the OS when HT is enabled. However, the OS (and all the way up the food chain to whatever perf tools you are using) now thinks twice the processor capacity is available, i.e., 100% CPU at each AS port.

But under the hood, there is still only one execution unit: the single, physical, core you started with before HT was enabled. The difference is that it is being shared in some way between the 2 AS ports. How the single core gets switched between the two ports is very complicated but is most easily understood in terms of polled queues. I go into that level of detail in my GCaP classes.

The best-case test measurements I have, indicate that each HT port cannot become more than 75% busy, on average, or 150% of the total expected 200% capacity according to the OS. The "missing" 50% capacity, that I referred to earlier, is an illusion. Intel has claimed that something in the range of 120% to 130% can be expected for general applications.
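The quoted figures can be turned into a back-of-the-envelope estimate. The Python sketch below is a crude model of my own, not the article's: it assumes two co-running hyperthreads together deliver s times the work of one thread alone (s of about 1.2 to 1.3 per the Intel claim quoted above, 1.5 for the quoted best case), and that the two siblings are busy independently of each other. Under those assumptions, one fully busy sibling next to an idle one (100% and 0% in top, i.e. 50% of the pair) already uses 1/s of the core's throughput.

```python
def estimated_core_utilisation(u1, u2, ht_speedup):
    """Share of one physical core's throughput in use, under a crude model.

    Hypothetical assumptions, for illustration only:
    - u1 and u2 are the OS-reported busy fractions (0..1) of the two
      hyperthread siblings on the core;
    - the siblings are busy independently, so both run together a fraction
      u1*u2 of the time and exactly one runs u1 + u2 - 2*u1*u2 of the time;
    - one thread alone delivers 1 unit of work per unit time, two
      co-running threads deliver ht_speedup units in total.
    """
    if not (0.0 <= u1 <= 1.0 and 0.0 <= u2 <= 1.0) or ht_speedup < 1.0:
        raise ValueError("utilisations must be in [0, 1] and ht_speedup >= 1")
    both = u1 * u2
    exactly_one = u1 + u2 - 2 * both
    delivered = exactly_one * 1.0 + both * ht_speedup
    # The most the core can deliver is ht_speedup units (both siblings busy all the time).
    return delivered / ht_speedup


if __name__ == "__main__":
    for s in (1.2, 1.3, 1.5):
        print(f"s={s}: one busy sibling uses {estimated_core_utilisation(1.0, 0.0, s):.0%} of the core")
```

Run as a script, it reports 83%, 77% and 67% for s = 1.2, 1.3 and 1.5, compared with the 50% that the OS view of the two virtual cores suggests.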

In fact, I am pretty sure the operating system can reach 100% on each virtual core, no doubt about that. I have just run:

mvn clean install -DskipTests -T 5

And I can assure you that all 8 virtual cores on my 4 physical cores went to 100% CPU utilization. And I definitely do not have 8 physical cores on my machine.

Long story short, you can assume the following: if the total CPU load of the virtual cores sharing a physical core goes above 100%, you are using at most, and most likely quite accurately, exactly 100% of that physical core. That means if physical core 1 is split into operating-system CPU 1 and CPU 2, and CPU 1 shows a total usage of 50% while CPU 2 shows 50%, in real life you are most likely putting a total load of 100% on that core. You have maxed it out.
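The rule of thumb above can be written down directly: sum the reported utilization of the logical CPUs that share a physical core, and treat 100% or more as that core being maxed out. This is a sketch of the heuristic as stated in this answer, not a measurement; the function name and the sample sibling pairs (0, 4) and (1, 5) are hypothetical. On Linux, the real sibling groups can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
def sibling_pressure(per_cpu_util, groups):
    """Rule of thumb from this answer: sum the reported utilization (percent)
    of the logical CPUs sharing a physical core and treat 100% or more as
    that core being maxed out.

    per_cpu_util maps logical CPU number -> percent busy (0-100).
    groups is an iterable of tuples of sibling CPU numbers.
    Returns {group: (summed_percent, maxed_out)}.
    """
    report = {}
    for group in groups:
        summed = sum(per_cpu_util.get(cpu, 0.0) for cpu in group)
        report[group] = (summed, summed >= 100.0)
    return report


if __name__ == "__main__":
    # Hypothetical readings: CPUs 0 and 4 are siblings on one core, 1 and 5 on another.
    readings = {0: 50.0, 4: 50.0, 1: 30.0, 5: 10.0}
    for group, (summed, maxed) in sibling_pressure(readings, [(0, 4), (1, 5)]).items():
        print(group, f"{summed:.0f}%", "maxed out" if maxed else "headroom left")
```

With the sample readings this prints "(0, 4) 100% maxed out" and "(1, 5) 40% headroom left", matching the 50% + 50% example in the text.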

But of course the operating system and its monitoring tools have no idea that they are selling you an illusion. From the perspective of the operating system and how it manages resources, it will just believe each of those two virtual cores is still 50 percent idle, so if there are more tasks to run it will try to distribute them uniformly over those two cores. So when you go over 100% CPU utilization during a period of CPU usage, there is always queued work in that period that never had a chance to get a time slice on the CPU. Eventually it will get one, but there are always some threads that are not actually running even though they are scheduled to run.

Thanks

99Sono

Reputation: 101