Are both threads on a physical CPU treated equally?
Yes. There is no preference in allocation of the core's execution resources to one thread or the other. (What's an "execution resource"? See the article I linked below. But examples are things like the architectural registers (IP, SP, EAX, etc.), the "execution units" that implement specific operations like arithmetic, etc.)
As we know, hyper threading on Intel CPUs is a system in which each physical core is presented as 2 virtual cores to the OS.
It's actually just presented as two things-that-seem-to-the-OS-to-be-cores - or two "logical processors" as Windows calls them. (Despite what Ramhound claimed.) If you have HT turned off, then each core just has one LP, so the same terminology gets used.
If you have a not-HT-aware OS then your core with HT enabled will just look like two cores, and will be enumerated and used that way by the OS. In fact this was the case in Windows 2000, which had no awareness of HT whatsoever.
These 2 virtual cores give the processor the ability to context switch between 2 execution units (threads @ virtual cores) on events that are unknown to the OS (page faults, other cpu-internal events) that would normally make the CPU waste cycles waiting on other IO events.
That isn't actually how it works (and page faults, I would add, are most definitely known to the OS! Perhaps you're thinking of memory access latency). The HT-enabled processor does not do anything like an OS-level context switch between threads. Remember, an HT core actually enumerates as two different (logical) processors, each with its own set of architectural registers, etc. In a thread context switch the contents of those registers are copied into memory (in the thread object) and the registers are loaded from the saved context of some other thread. That doesn't happen in HT, because those registers (and many other resources) are duplicated, one set per LP. So the state of each LP is maintained continuously within the CPU.
But there are other resources that aren't duplicated. HT allows the core to take execution resources that would sit idle if just one LP were running and use them to support the activities of a second LP. Here is a really good description: http://arstechnica.com/features/2002/10/hyperthreading/
By default, an OS does not need to take hyper-threading into account. All cores will eventually do the work; the only difference is that now not all visible/virtual cores may process at the same speed. Work scheduled for 2 threads on the same physical core (VCPU0+1 -> CPU 0) will not be as fast as work scheduled on 2 different cores (VCPU0+2 -> CPU 0 + 1).
All correct. Windows 2000 had no awareness at all of HT, but when it was run on a single CPU core with HT enabled, it "saw" two processors and used them. (Due to the enumeration order with early firmware for such platforms, if you were running an edition of Win2K that only supported two CPUs, it would unfortunately use two LPs within one of the CPU packages! This was later fixed with a firmware ("BIOS") update that changed the order of the LPs' appearance in the ACPI tables.)
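To see why the enumeration order mattered, here is a minimal Python sketch. The tuples, the two orderings, and the helper names are all made up for illustration; they do not reproduce real ACPI table contents or any Windows code. The only assumption is that an OS limited to two processors simply takes the first two it sees.

```python
# Each logical processor (LP) is modelled as (package, lp_within_package).
# Both orderings below are hypothetical examples, not real ACPI data.
early_order = [(0, 0), (0, 1), (1, 0), (1, 1)]   # siblings listed together
fixed_order = [(0, 0), (1, 0), (0, 1), (1, 1)]   # one LP per package first


def first_n_processors(enumeration, limit):
    """Return the LPs that an OS limited to `limit` processors would use,
    assuming it simply takes them in enumeration order."""
    return enumeration[:limit]


def packages_used(lps):
    """Return the set of distinct packages covered by the given LPs."""
    return {package for package, _ in lps}


for name, order in (("early", early_order), ("fixed", fixed_order)):
    chosen = first_n_processors(order, limit=2)
    print(name, chosen, "packages:", sorted(packages_used(chosen)))
```

With the first ordering both chosen LPs land in the same package; with the second, each package contributes one LP.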
From what I've researched, 'hyper-threading' aware OSs will go as far as to try and schedule work per physical cores before doubling up on scheduling on 'virtual cores'.
Windows certainly does. It tries to use only one LP per core. For example, when looking for an idle LP to run a newly-Ready thread on, the Windows scheduler first tries to find an idle LP that's in a core where both LPs are idle.
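The selection rule just described can be sketched roughly as follows. This is not the Windows scheduler's code, only a toy model of the "prefer a fully idle core" idea; the function name, the LP-to-core dictionary, and the fallback rule (lowest-numbered idle LP) are assumptions made for the example.

```python
from collections import defaultdict


def pick_idle_lp(lp_to_core, idle_lps):
    """Pick an idle LP, preferring one whose whole core is idle.

    lp_to_core: dict mapping LP number -> core number (hypothetical layout).
    idle_lps:   set of LP numbers that are currently idle.
    Returns an LP number, or None if nothing is idle.
    """
    lps_by_core = defaultdict(list)
    for lp, core in lp_to_core.items():
        lps_by_core[core].append(lp)

    # First pass: idle LPs in cores where every LP is idle.
    for lp in sorted(idle_lps):
        if all(sib in idle_lps for sib in lps_by_core[lp_to_core[lp]]):
            return lp

    # Fallback (assumed): any idle LP, even if its sibling is busy.
    return min(idle_lps) if idle_lps else None


# Two cores, two LPs each: LPs 0 and 1 share core 0, LPs 2 and 3 share core 1.
layout = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_idle_lp(layout, {1, 2, 3}))  # LP 0 is busy, so core 1 is preferred
print(pick_idle_lp(layout, {1, 3}))     # no fully idle core; falls back
```

In the first call LP 1 is idle but its sibling is busy, so the sketch skips it in favour of an LP on the fully idle core.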
I usually see this as the 'even' VCPU getting scheduled first (fill VCPU 0+2 before 1+3). Are both the 'even' and 'odd' thread equal?
Well, they're equal in that there is no built-in bias within the core for one LP over the other. It may be that one happens to get work done faster than the other because the thread running on one needs fewer execution resources than the other.
It is actually something of a deficiency of HT, by the way, that the HT-enabled CPU does not implement any concept of priority between the two LPs. Let's say you have two compute-bound threads which to the OS are of different priority. By the OS's rules, if you had just one logical processor, the higher-priority thread would get practically all of the CPU time. But, suppose instead that we have two free LPs that happen to be in the same core. The core will try to run them approximately equally even though that's not really what the OS wants. (At least, the last time I looked at these details, this was the case.)
(There's not actually a 'hyper thread' virtual CPU). In other words, is there not a primary/secondary 'thread' for a physical CPU?
Correct. There is not. By the way, CPUs and CPU cores don't "have threads". A logical processor can run a thread. The more LPs you have, the more threads can be "computing" at the same time.
If I schedule work on just VCPU 1, will it perform the same as if I just scheduled on VCPU 0?
Yes.
Assuming that, if equal work is scheduled on both, will it take roughly twice as long for both threads to complete?
Well, no. The whole point of HT is that a core typically has more execution resources available than any one thread can use. By presenting the core as two LPs, two threads can run "at the same time" and have fewer execution resources sitting idle. With most threads you can expect your two threads to complete in somewhere between 1.4x and 1.7x the time that just one of them would have taken.
An extreme case would be if one of the threads was doing almost all integer arithmetic, and the other almost all floating-point work. Still, you would be unlikely to get the same performance as if the two threads were running on two different cores, due to shared L3 cache and memory bandwidth issues. But if both threads were not doing much memory-intensive work, you might come quite close.
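As a rough worked example of the 1.4x to 1.7x estimate: if one thread alone takes time T, running the two threads one after the other on a single LP takes 2T, while running them on the two LPs of one core takes about 1.4T to 1.7T. The helper name below is made up, and the factors are simply the figures quoted above plus the two limiting cases, not measurements.

```python
def ht_speedup(slowdown_factor):
    """Throughput gain over running the two threads back to back on one LP.

    slowdown_factor: time for both threads to finish on two sibling LPs,
    as a multiple of the time one thread would take alone.
    """
    serial_time = 2.0  # two threads, one after the other, in units of T
    return serial_time / slowdown_factor


# 2.0 means HT gave no benefit; 1.0 means it behaved like two separate cores.
for factor in (1.4, 1.7, 2.0, 1.0):
    print(f"{factor:.1f}x time -> {ht_speedup(factor):.2f}x throughput")
```

So the quoted range corresponds to roughly 1.18x to 1.43x the throughput of a single LP, well short of the 2x that two separate cores could approach.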
Software can't tell the difference between a physical core and a virtual core. The OS can, but that is because of the mode it runs in. As we know, hyper threading on Intel CPUs is a system in which each physical core is presented as 2 virtual cores to the OS. - This is false. It actually is reported as 1 physical core and 1 virtual core; only the OS is aware of this distinction. You don't seem to be all that familiar with Hyperthreading. – Ramhound – 2016-04-11T18:05:33.897
Ramhound, the CPU package per se does enumerate as two "processors" (Windows calls them "logical processors", LPs) per core if HT is enabled in the firmware. There is no such concept as "physical core" vs. "virtual core"; the OS need do nothing special to use both. Otherwise, OSs that are completely unaware of HT, like Windows 2000, would not have been able to use the two LPs per core. But, of course, they did. It is only by querying ACPI tables that more recent OSs can learn the relations of LPs to cores (and cores to CPU sockets) and optimize their scheduling decisions accordingly. – Jamie Hanrahan – 2016-04-12T19:49:33.110
yea what i wanted to confirm here was that the OS was truly presented with: 2 equal 'cores' (on 1 physical core), instead of 1 'core' and 1 'virtual core'. plus confirm that they all are real cores as far as any OS should be concerned, with the exception of the special scheduling concerns discussed here – Colin Godsey – 2016-04-13T02:26:16.867
Again... since Win2K is able to use the two "logical processors" even though it has no awareness of SMT/HT whatsoever, that is pretty good indication that as far as any non-HT-aware OS is concerned they're indistinguishable from "real cores" that don't have HT enabled. But from the ACPI tables the OS can learn the relationships of LPs to cores, and cores to physical CPU packages (the things that go in sockets). Also it is worth mentioning that anything running on, say, two HT LPs has to be just as careful about serialization issues ("race conditions") as it would on two non-HT cores. – Jamie Hanrahan – 2016-04-13T04:13:46.300