Hyperthreading: Calculating Load

I have a dual-core Core i5 with hyperthreading. Since hyperthreading allows unused CPU time to be used by other processes, why does it make the system appear to have 4 logical processors (2 per physical core)? How is this different from out-of-order execution?

Let's say core0/core1 are one physical core and core2/core3 are another. If core0 and core2 are at 100%, is my CPU at 100% load or only 50%? If all of them are at 100%, what does that entail? Also, why do processes tend to execute on core0 and core2 rather than core1 and core3?


agz

Posted 2013-06-26T19:41:22.917

Reputation: 6 820

Related: How does Windows processor affinity work with hyperthreaded CPUs? (Yes, the question is Windows-specific, but my answer covers a lot of generic information about hyperthreading.)

– Breakthrough – 2013-06-26T20:09:12.840

Answers

HT is like out-of-order execution, except that it can schedule a completely different thread instead of just reordering instructions. Sometimes the CPU sits idle because it has already reordered all of the instructions it can and is still waiting on data from memory or on some other resource. HT allows another thread to be ready to run while the first is waiting.
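To make the "sitting idle while waiting" idea concrete, here is a toy Python model. It is only an illustration under made-up assumptions: a hypothetical in-order core that issues one instruction per cycle, a `load` that stalls its thread for a fixed `miss_latency`, and instructions that each depend on the previous one, so reordering within a thread cannot hide the stall. `run_core` is a name invented for this sketch; real cores are far more complicated.

```python
def run_core(threads, miss_latency):
    """Simulate a toy single-issue core; return cycles until every thread finishes.

    threads: list of instruction lists, each instruction "alu" or "load".
    With one thread this is a plain core; with two it acts like a (very
    simplified) hyperthreaded core that issues from whichever thread is not stalled.
    """
    n = len(threads)
    pcs = [0] * n       # index of the next instruction in each thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    cycle, last = 0, -1  # last = thread that issued most recently (round-robin)
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for k in range(1, n + 1):
            i = (last + k) % n
            if pcs[i] < len(threads[i]) and ready_at[i] <= cycle:
                op = threads[i][pcs[i]]
                pcs[i] += 1
                # A load stalls this thread; an ALU op is ready next cycle.
                ready_at[i] = cycle + 1 + (miss_latency if op == "load" else 0)
                last = i
                break  # only one instruction issues per cycle
        cycle += 1
    # A thread is done once its last instruction has completed.
    return max([cycle] + ready_at)


work = ["alu", "load", "alu"]
one_at_a_time = 2 * run_core([work], miss_latency=3)  # 12 cycles
shared = run_core([work, work], miss_latency=3)      # 8 cycles
print(one_at_a_time, shared)
```

In this toy, `work` takes 6 cycles on its own, so running it twice back to back takes 12, while two copies sharing the core finish in 8 because each thread's stall cycles are filled by the other. Two stall-free threads, e.g. `run_core([["alu"] * 2] * 2, miss_latency=3)`, take 4 cycles either way, which fits the point that the gain depends on the workload.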

There are still only two physical cores in your system. If core0 and core2 are at 100%, then your processor is at 100%. If all four logical cores (core0, core1, core2, and core3) are at 100%, then your CPU is doing roughly 115% of the work it could do without hyper-threading, though the actual gain varies a lot with the workload.
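The two readings of "100%" can be sketched in a few lines of Python. This is an illustrative model, not how any OS computes its figures: `load_views` and `estimated_throughput` are hypothetical helpers, the sibling pairing is the one assumed in the question, and `smt_gain=0.15` simply plugs in the ~15% figure quoted above.

```python
def load_views(logical_busy, siblings):
    """Return (logical %, physical %) from per-logical-CPU busy percentages.

    logical_busy: dict of logical CPU -> busy % (0-100).
    siblings: tuples of logical CPUs that share one physical core.
    A physical core is counted as busy for as long as its busiest sibling is,
    which assumes the siblings' busy periods overlap, so it is a lower bound.
    """
    logical = sum(logical_busy.values()) / len(logical_busy)
    physical = sum(max(logical_busy[c] for c in core) for core in siblings) / len(siblings)
    return logical, physical


def estimated_throughput(threads_per_core, smt_gain=0.15):
    """Work done relative to the same cores without HT (assumed model).

    smt_gain is an assumption; real gains vary widely and can even be negative.
    """
    total = 0.0
    for n in threads_per_core:
        if n == 1:
            total += 1.0  # one thread: the core's normal throughput
        elif n >= 2:
            total += 1.0 + smt_gain  # two threads: assumed HT bonus
        # n == 0: an idle core contributes nothing
    return total / len(threads_per_core)


SIBLINGS = [(0, 1), (2, 3)]  # pairing assumed in the question
print(load_views({0: 100, 1: 0, 2: 100, 3: 0}, SIBLINGS))  # (50.0, 100.0)
print(load_views({0: 100, 1: 100, 2: 100, 3: 100}, SIBLINGS))  # (100.0, 100.0)
print(estimated_throughput([1, 1]), estimated_throughput([2, 2]))  # 1.0 and about 1.15
```

With core0 and core2 busy, half the logical CPUs are busy but both physical cores are, which is why the same situation can be reported as 50% or 100%. With all four busy, both views say 100%, and only the assumed throughput model shows the extra ~15%.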

You see the alternating core usage because core0 and core1 share one physical core, and core2 and core3 share the other. If the OS scheduled work on core0 and core1 instead of core0 and core2, one physical core would do everything while the other sat idle, so half the processor would be unused most of the time.
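The "spread across physical cores first" behavior can be illustrated with a toy placement function. This is not the Windows or Linux scheduler, just a sketch under the question's assumed pairing; `SIBLING` and `pick_cpu` are hypothetical names.

```python
# Hypothetical topology from the question: logical CPU -> its HT sibling.
SIBLING = {0: 1, 1: 0, 2: 3, 3: 2}


def pick_cpu(busy):
    """Choose a logical CPU for a new thread, preferring a fully idle physical core.

    busy: set of logical CPUs already running something.
    Returns None when every logical CPU is busy.
    """
    idle = [cpu for cpu in sorted(SIBLING) if cpu not in busy]
    for cpu in idle:
        if SIBLING[cpu] not in busy:  # the whole physical core is free
            return cpu
    return idle[0] if idle else None  # otherwise share a core with a sibling


busy, order = set(), []
for _ in range(4):
    cpu = pick_cpu(busy)
    busy.add(cpu)
    order.append(cpu)
print(order)  # [0, 2, 1, 3]
```

Because this toy scans logical CPUs in ascending order, the first two threads land on core0 and core2, and only later ones use core1 and core3. A real scheduler's tie-breaking may differ.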

Darth Android

Posted 2013-06-26T19:41:22.917

Reputation: 35 133

How is hyperthreading "like out-of-order execution"? – Breakthrough – 2013-06-26T19:54:59.707

@Breakthrough Out-of-order execution allows various parts of the pipeline to be used by other instructions while other parts are busy (e.g., incrementing a register while the previous instruction waits on a memory fetch or write). Similarly, HT allows parts of the CPU that would otherwise be idle to be used by another thread (as opposed to another instruction), keeping the CPU closer to 100% utilization. – Darth Android – 2013-06-26T19:59:34.687

How can a CPU run at 115%? Is it because, during idle periods in one process, another process can run? – agz – 2013-06-26T20:00:23.327

@agz See how Linux schedules processes: you can have a load greater than the number of cores in your system. – Breakthrough – 2013-06-26T20:00:48.730

I thought a system load average greater than the number of cores just meant processes were waiting to be executed. – agz – 2013-06-26T20:01:38.833
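For reference, the Linux load average is an exponentially damped moving average of the number of tasks that are running, waiting for a CPU, or in uninterruptible sleep (typically waiting on disk I/O), sampled roughly every 5 seconds. The floating-point sketch below assumes a constant task count; the kernel itself uses fixed-point arithmetic, so its exact values differ slightly, and `update_load` is an illustrative name. It shows why a load above the number of logical CPUs means some tasks are waiting.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (the kernel uses roughly 5 s)


def update_load(load, active_tasks, window=60.0):
    """Apply one sample to a load average with the given window in seconds.

    active_tasks: tasks running, waiting for a CPU, or in uninterruptible sleep.
    window=60 gives the 1-minute figure; 300 and 900 give the 5- and 15-minute ones.
    """
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + active_tasks * (1.0 - decay)


# 6 CPU-bound tasks on a 4-logical-CPU machine: 4 run, 2 wait, and all 6 count.
load = 0.0
for _ in range(120):  # 120 samples = 10 minutes
    load = update_load(load, active_tasks=6)
print(round(load, 2))  # settles at 6.0
```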

@DarthAndroid HT still requires the operating system to schedule the processes so they use the hardware efficiently, whereas out-of-order execution is implicit from a software perspective. Adding execution units is not the same thing as out-of-order execution, nor is it even close to comparable. HT doesn't magically make parts of the CPU less idle; it actually adds more hardware with that as the end goal. From a software perspective, HT doesn't "allow" anything: it literally doubles the number of available logical cores that can execute instructions simultaneously. – Breakthrough – 2013-06-26T20:03:40.023

"If the OS scheduled core0 and core1 instead of core0 and core2, then half the processor would sit idle most of the time." Why does the OS not schedule on core1 and core3? – agz – 2013-06-26T20:07:44.800

@DarthAndroid or rather, to put my point more concisely: out of order execution deals with what instruction to run at what point in time. Hyperthreading adds a physical execution unit in hardware to allow you to run two different programs in parallel on one physical core. If anything, the concept of Hyperthreading is closer to multi-threaded or multi-core programming itself rather than OOE. – Breakthrough – 2013-06-26T20:11:43.790

@Breakthrough Yeah, I see what you're saying. I think I was looking at it more from the perspective of both add cheap complexity to better utilize relatively expensive hardware. – Darth Android – 2013-06-26T20:15:17.900

@agz Probably because 0 comes before 1 and 2 comes before 3. Computers do a surprising number of things for reasons as simple as that, but in this case it's mostly speculation on my part. I'm not aware of performance differences between core 0 and core 1, but I wouldn't be surprised if there were some. The link at the bottom of Breakthrough's answer further suggests that there's no difference between core 0 and core 1. – Darth Android – 2013-06-26T20:15:55.840

Since hyperthreading allows unused CPU time to be used by other processes, why does it make the system appear to have 4 logical processors (2 per physical core)? How is this different from out-of-order execution?

This isn't wholly correct. Hyper-threading duplicates only part of each core in hardware (the architectural state, such as the register set, needed to hold a second thread), while the two threads share the rest of the core, including its execution units and caches. This sharing is why, in certain cases, hyperthreading can actually reduce performance. Furthermore, hyperthreading does not explicitly "allow unused CPU time to be used by other processes"; which process runs where is decided by your operating system's process scheduler.

Hyperthreading does, however, expose another "core" (a logical processor) to the operating system, allowing it to schedule more programs to run simultaneously. This helps except where the two threads exhaust the hardware they share in a hyperthreaded core, e.g. if two threads running on the same hyperthreaded core both make heavy use of the cache, which is shared by both threads. For technical details about hyperthreading, see the Intel Core Processor Datasheet, Volume 1.
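On Linux, you can see which logical processors the OS treats as siblings on one physical core by reading `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The sketch below assumes that sysfs layout is present (it returns an empty list elsewhere); `parse_cpulist` and `physical_cores` are illustrative names, not a library API.

```python
import glob
import os


def parse_cpulist(text):
    """Parse a Linux CPU list such as "0-1" or "0,2" into a frozenset of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def physical_cores(sysfs="/sys/devices/system/cpu"):
    """Group logical CPUs by physical core using each CPU's thread_siblings_list."""
    pattern = os.path.join(sysfs, "cpu[0-9]*", "topology", "thread_siblings_list")
    groups = set()  # each sibling group is read once per member, so dedupe
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(parse_cpulist(f.read()))
    return sorted(sorted(g) for g in groups)


if __name__ == "__main__":
    print(physical_cores())
```

On a dual-core CPU with hyperthreading this should print two groups of two logical CPUs. The numbering is not guaranteed to match the question's core0/core1 pairing; Linux may, for example, report 0 and 2 as siblings.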

Let's say core0/core1 are one physical core and core2/core3 are another. If core0 and core2 are at 100%, is my CPU at 100% load or only 50%? Also, why do processes tend to execute on core0 and core2 rather than core1 and core3?

It depends on which parts of your processor are being stressed (the performance gain from HT is very application-specific, as noted above). Technically, you might be using only some of the execution resources of the first physical core, while other hardware in that core is loaded at 100% (e.g. the cache, if the program makes heavy use of it).

For details, see the question "How does Windows processor affinity work with hyperthreaded CPUs?"

Breakthrough

Posted 2013-06-26T19:41:22.917

Reputation: 32 927