CPU time count on Hyper-Threading systems on Linux

0

I am trying to estimate how much work a multithreaded application does. With real CPU cores that is simple: I can take the CPU time from /proc and use it as an estimate of how much of the CPU the application uses.

But what about HT-enabled processors? How is the time counted? If a thread just waits for the processor pipeline to free up because of HT contention, is that counted as time spent on the CPU? Or, if the thread gets only about 10% benefit from HT, is only 10% of the actual CPU run time counted?

grandrew

Posted 2016-07-11T18:40:36.023

Reputation: 148

Answers

1

Sorry, but the answer is far from definite.

If you have HT enabled you have two logical processors per core. If you have it disabled, you have just one. (This lets us talk about how the scheduler works without constantly qualifying what we mean by a "CPU".) Either way, a logical processor is seen by the OS as a processor, and except for some attempts at scheduling optimizations* the OS doesn't do anything else by, for, or because of hyperthreading.

From the time an LP context-switches to a thread, to the time it switches to some other thread, the LP is considered to be used 100% by that thread. The OS has no way to know whether a thread in an LP is using 10% of the core, or 90% of the core, or stalled completely because of something the thread in the other LP is doing. The OS just thinks it's running.
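As a rough illustration of what the OS records, here is a minimal Python sketch that reads the user and system CPU time of a process from /proc/<pid>/stat. The function names are hypothetical; the field positions (utime is field 14, stime is field 15, both in clock ticks) follow the proc(5) man page. The numbers only say how long the process's threads were scheduled on logical processors, not how much of a core they actually got.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return (user_seconds, system_seconds) from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' in the line.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    utime_ticks = int(rest[11])
    stime_ticks = int(rest[12])
    return utime_ticks / ticks_per_second, stime_ticks / ticks_per_second


def process_cpu_seconds(pid="self"):
    """Read the accounted CPU time of a process (summed over its threads)."""
    with open(f"/proc/{pid}/stat") as f:
        line = f.read()
    # SC_CLK_TCK is the number of clock ticks per second used in /proc.
    return cpu_seconds_from_stat(line, os.sysconf("SC_CLK_TCK"))
```

Calling process_cpu_seconds() on Linux returns the totals for the current process. Per-thread values are available in the same format under /proc/<pid>/task/<tid>/stat.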

Nor does HT implement anything like thread priorities. So if two threads are trying to run in the two LPs on one core, and one is set in the OS to a higher priority than the other, the core can't do anything about that - there's no way it can even know. The core will treat the two threads as having the same priority and will assign microarchitecture resources accordingly.

*Optimizations: Modern OSs are aware of the relationship of LPs to cores and will try, for example, to use just one LP out of each core until more than number_of_cores threads want to run; the two LPs of a core are considered equivalent as far as cache investment is concerned; etc.
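On Linux, the LP-to-core relationship mentioned above is exposed through sysfs, so you can check which LPs share a core. The sketch below is only an illustration with hypothetical function names; it assumes the thread_siblings_list files under /sys/devices/system/cpu/ are present, which may not hold in every container or kernel configuration.

```python
import glob


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,4" or "0-1" into a sorted tuple."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def sibling_groups(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Return a set of tuples; each tuple lists the LPs that share one core."""
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(parse_cpu_list(f.read()))
    return groups
```

With HT enabled, each tuple returned by sibling_groups() would typically hold two LP numbers; with HT disabled, one.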

Jamie Hanrahan

Posted 2016-07-11T18:40:36.023

Reputation: 19 777

Jamie, thanks for a thorough explanation. As far as I understand, you are saying that when NTHREADS == NCORES, logical core time will count as CPU time regardless of the actual amount of instructions it was able to share. So with HT enabled I will effectively see double the CPU time spent, and probably half the average FLOPS per (LP) core in the case of (completely) fair queuing? – grandrew – 2016-07-14T10:15:41.050

Q1: Yes, though I don't get your phrase re "instructions it was able to share": what would you be sharing in that situation? Q2: Yes, with HT enabled and NTHREADS==2xNCORES you'd see 2x the apparent CPU time used, but not 2x the work done. 3rd: If the core's FP unit was your bottleneck when HT was disabled, then with HT and two threads per core, the total FP work done would be about the same as with one thread per core, but each thread would only get half the FP throughput. However, FP performance also depends on things other than the FP unit (like memory access), so this isn't certain. – Jamie Hanrahan – 2016-07-14T10:32:56.960

As I see it, when two threads compete for the time of a single core in an HT CPU, the threads can "share" some of the core instead of waiting for the core to free up, provided the core finds it has something to share with another (waiting) thread while executing the first thread. – grandrew – 2016-07-18T12:23:33.553

Well... there is no concept of "the first thread", i.e. neither of the LPs has precedence. The core's hardware tries to schedule the core's resources to allow both LPs to make progress. – Jamie Hanrahan – 2016-07-19T13:21:22.427
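The accounting discussed in the comments above can be sketched as simple arithmetic. This is a toy model, not a measurement: the function name and the ht_throughput_gain parameter are assumptions made for illustration. The parameter stands for the total throughput of a core with both LPs busy relative to one busy LP, where 1.0 means no gain, as in the FP-bound case described above.

```python
def ht_accounting(cores, threads_per_core, wall_seconds, ht_throughput_gain):
    """Toy model: accounted CPU time vs. useful work on fully busy cores.

    ht_throughput_gain is a hypothetical factor: total throughput of a core
    running two threads, relative to the same core running one thread.
    """
    if threads_per_core not in (1, 2):
        raise ValueError("threads_per_core must be 1 or 2")
    # The OS charges every busy LP with 100% of the elapsed time.
    accounted_cpu_seconds = cores * threads_per_core * wall_seconds
    # Useful work, in units of "one core running one thread for one second".
    core_throughput = 1.0 if threads_per_core == 1 else ht_throughput_gain
    work = cores * core_throughput * wall_seconds
    # Work done per accounted CPU second.
    return accounted_cpu_seconds, work, work / accounted_cpu_seconds
```

For 4 cores over 10 seconds with no HT gain, one thread per core gives 40 accounted CPU seconds for 40 units of work, while two threads per core gives 80 accounted CPU seconds for the same 40 units.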