Is there a BIOS setting that controls cpu load sharing?

13

1

I've noticed that for any long running single threaded task, my home PC allocates full usage to a single logical core for the entire process. However, for the exact same process, my work PC shares the load among all cores (each core takes a turn at running the single threaded process).

Both PCs run Windows 10. My home PC has a different CPU and a different motherboard (ASUS ROG 11th edition).

This seems to be the case for any process, but the example I have just tested it on is an R script I wrote. Both PCs running exactly the same R script, same version of R, have different approaches to cpu load sharing. What's worse is that my home PC seems always to use CPU0 for this sort of thing.

I am hoping that there is a BIOS setting I can apply on my home PC to get it to share the load evenly. Is there?

Chechy Levas

Posted 2019-12-08T08:48:01.220

Reputation: 249

What make and model are the CPUs in question? One might have turbo boost while the other does not. – Daniel B – 2019-12-08T10:48:06.823

BTW, bouncing a thread between CPUs is not load sharing, nor is it advantageous: in most cases processing is more efficient when sticking to a single core (warm cache, less context switching). – eckes – 2019-12-08T23:32:01.857

"I am hoping that there is a BIOS setting I can apply on my home PC to get it to share the load evenly" - why would you need this? For aesthetic reasons? If you're concerned that the CPU will burn out prematurely, I'd say that in my experience most PCs get replaced for other reasons. – Ivan Milyakov – 2019-12-09T00:48:53.533

Answers

11

I believe the most likely culprit is that your home machine is using a feature of the Windows 10 scheduler, widely known as "favored core" support, that prioritizes high-performance cores over low-performance ones. Prior to 2018, a desktop CPU could generally be trusted to run a thread at the same speed no matter which core you put it on. Even if one core were theoretically capable of running at a higher frequency at a given voltage than another, the CPU was not designed to allow it.

It was only with the advent of AMD's Zen+ Ryzen CPUs in 2018 that this changed on a wide scale. With these models, AMD started allowing CPUs with cores of mixed quality to boost to different clock frequencies depending on which cores are under load. That boost was rendered largely ineffective when the scheduler swapped a thread around to every core regardless of performance profile, and the penalty was compounded by AMD's architecture of splitting the cores into groups called "CCXs": moving a thread between cores within a CCX is cheaper than moving it between CCXs.

Intel's "Extreme Edition" CPUs have this sort of explicit, mixed-performance support as well. They refer to it as Intel Turbo Boost Max Technology 3.0. Intel states that the earliest version of Windows 10 supporting this is "RS5", which appears to be 1809.

With Intel® Turbo Boost Max Technology 3.0, lightly-threaded performance is optimized by identifying your processor's fastest cores and directing your most critical workloads to them.

Until 2019, all versions of Windows were ignorant of these facts and scheduled threads equally across all physical cores for AMD CPUs. Windows 10 version 1903 included an updated scheduler that is aware of AMD's CCX units and tries to keep threads within the same unit. link

The improvements are intended to especially benefit tasks that use only a few cores: threads now switch back and forth between individual CCXs less often.

Windows 10 version 1909 brought further improvements to the scheduler, now making it aware of the mixed-performance core situation, in a feature that is called "Favored CPU Core Optimization". link

In a recent Windows Insider blog post, Microsoft has stated that Windows 10 19H2 will include optimizations to how instructions are distributed to these favored cores.

I admit my understanding of this timeline is not 100% certain, and favored cores may have been used in earlier versions, but it has been surprisingly difficult to find concrete information on this. Most news posts seem to agree that "favored core" support is entirely new to 1909, despite Microsoft's language implying it was present earlier.

The ARM architecture has actually had explicit support for this kind of mixed-performance configuration, called "big.LITTLE", since 2011. A Windows 10 build that runs on ARM was released in 2017, and support for big.LITTLE was included either from the start or at least by 2018. This seems to have dovetailed nicely with adding support for the modern Intel and AMD situation.

As an aside, logical cores are excluded until needed only because they are parked, not because the scheduler itself understands anything about them. link

Core Parking is supported only on Windows Server 2008 R2. However, the Core Parking algorithm and infrastructure is also used to balance processor performance between logical processors on Windows 7 client systems with processors that include Intel Hyper-Threading Technology.

Corrodias

Posted 2019-12-08T08:48:01.220

Reputation: 399

I am changing the accepted answer to this one. All the answers so far have been very useful and informative, but this one hits the nail on the head. I have confirmed that my work CPU does not support ITBMT3.0, whilst my home PC does. – Chechy Levas – 2019-12-10T04:29:03.687

Somewhat nitpicky admittedly: "Previous to 2018, a CPU could generally be trusted to run a thread at the same speed no matter which core you put it on". True for your run-of-the-mill desktop CPU, but if you look a bit outside there are definitely earlier widespread examples. It's quite likely you had a phone with e.g. ARMs big.LITTLE implementation way before 2018 (not sure if the NT Kernel scheduler ever considered these, given that Windows didn't really catch much phone market share). – Voo – 2019-12-10T16:07:28.273

@Voo Oh, very interesting! I'll alter the wording slightly to say "desktop CPU". – Corrodias – 2019-12-11T10:18:25.990

26

Scheduling of threads to cores is an art, and a very difficult one. This has to do with the way modern multicore CPUs manage their thermal profile. Depending on the exact model the CPU might do more or less of one or more of the following:

  • Boost the clock frequency of one or more cores if there is thermal wiggle room
  • Shut down unused components (including cores) to make more room in the thermal profile
  • Throttle some cores to allow boosting others inside the thermal envelope
  • Much more

This implies that for a single-threaded workload (like an R script) the best strategy varies wildly:

  • If the CPU can boost one core at the expense of others, it is a good idea to "pin" that thread to the one boosted core
  • If the thermal envelope is small, it makes sense to move the task from time to time to the coolest core

Whatever the scheduler chooses, you should trust it to do a better job than any human could ever do.
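If you nonetheless want to experiment with overriding the scheduler, the knob lives in the operating system, not the BIOS. Below is a minimal sketch, assuming a Linux host, since Python's `os.sched_setaffinity` is Linux-only; on Windows the equivalents are Task Manager's "Set affinity" option or the `SetProcessAffinityMask` Win32 API:

```python
import os

# Ask the OS which logical CPUs this process may currently run on.
# Pid 0 means "the calling process"; these calls are Linux-only.
allowed = os.sched_getaffinity(0)
print("May run on CPUs:", sorted(allowed))

# Pin the whole process to a single core (the lowest-numbered allowed CPU).
# Every thread of this process will now stay on that core.
os.sched_setaffinity(0, {min(allowed)})
print("Now pinned to:", sorted(os.sched_getaffinity(0)))

# Hand placement decisions back to the scheduler by restoring the old mask.
os.sched_setaffinity(0, allowed)
```

Note that the affinity mask is a restriction, not a hint: once pinned, the scheduler has no freedom to move the thread to a cooler or faster core, which is exactly why leaving it alone is usually the better choice.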

Eugen Rieck

Posted 2019-12-08T08:48:01.220

Reputation: 15 128

The scheduler is a program written by a human, and "pinning" an R thread to a core would also be programmed by a human. While I understand the sentiment (the scheduler was written by people presumably competent in the area, and the OP, since they have to ask, is less competent in it), the statement "scheduler .. do a better job than any human could ever do" sounds off to me. – Andrew Savinykh – 2019-12-08T19:29:22.117

@AndrewSavinykh I think the point is that tinkering with processor affinity is a bit like making manual calls to a garbage collector or memory manager. Most of the time, the system will do better on its own. The need for intervention in such systems is rare, and such interventions require a deep understanding of what the system is doing in order to properly evaluate whether an intervention will result in a more efficient workflow or not. There isn't really a 'rule of thumb' that works in such cases. Each circumstance really needs to be evaluated on its own merits, and this isn't an easy job. – J... – 2019-12-08T20:19:48.237

@AndrewSavinykh While the scheduler was written by a human being, it can do something that no human being can ever achieve: read a lot of state information every millisecond and act accordingly. While a human can of course design such behaviour, a human can never actually act that way. – Eugen Rieck – 2019-12-08T21:07:24.857

3

Scheduling (like any other piece of system software) makes trade-offs (throughput, latency, fairness, etc.) and makes certain choices. A user may be able to make better choices because they might have better knowledge of their use case and the trade-offs they prefer.

As an example, consider the paper "FreeBSD ULE vs. Linux CFS" (https://www.usenix.org/system/files/conference/atc18/atc18-bouron.pdf), which demonstrates how these schedulers make different trade-offs and achieve better or worse performance in different scenarios.

– hojusaram – 2019-12-09T07:15:02.463

@hojusaram Indeed, different schedulers make different trade-offs, and all of them work a lot better than manual scheduling: a human is just not fast enough. To be clear, this is what I meant by "a better job than any human could ever do". – Eugen Rieck – 2019-12-09T10:59:44.420

13

No. A computer's BIOS might be able to control which CPU cores are enabled or disabled and their speed, but it has no control over what is executed on them. The execution of a program and its threads is controlled by the operating system.

Now, as to why your two computers behave differently, that is a completely different question. It could be your OS setup or R configuration. It would need to be asked as a separate question and would require more details on your hardware and software configurations.

I also want to note that there is nothing wrong with the load running solely on one core; running programs is what it is designed to do. Perhaps your work computer is running more simultaneous tasks and has to juggle CPU usage, or your home computer has faster cores and has no need to swap threads around.
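To see for yourself which logical core is doing the work while your script runs, you can sample per-CPU utilisation from the OS. A rough sketch, assuming a Linux host where /proc/stat is available (on Windows, Task Manager's per-logical-processor view or the `Get-Counter` PowerShell cmdlet exposes the same counters):

```python
import time

def per_cpu_busy(interval=0.5):
    """Return the busy fraction of each logical CPU over `interval` seconds.

    Linux-only sketch: samples /proc/stat twice and compares the tick counters.
    """
    def snapshot():
        stats = {}
        with open("/proc/stat") as f:
            for line in f:
                name = line.split()[0]
                # Per-CPU lines are "cpu0", "cpu1", ...; skip the aggregate "cpu" line.
                if name.startswith("cpu") and name != "cpu":
                    fields = [int(x) for x in line.split()[1:]]
                    idle = fields[3] + fields[4]  # idle + iowait ticks
                    stats[name] = (sum(fields), idle)
        return stats

    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    busy = {}
    for cpu, (total_a, idle_a) in after.items():
        total_b, idle_b = before[cpu]
        delta = total_a - total_b
        busy[cpu] = (1.0 - (idle_a - idle_b) / delta) if delta else 0.0
    return busy

if __name__ == "__main__":
    # A single-threaded task that stays on one core shows up as one entry
    # near 1.0 while the others stay low.
    print(per_cpu_busy())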

Keltari

Posted 2019-12-08T08:48:01.220

Reputation: 57 019

Sounds logical. I'm pretty sure R is exactly the same as I set it on both recently. I completely reinstalled Windows 10 on my home PC recently and didn't do anything funky. Can't comment on work PC setup. Home PC definitely has faster cores. I guess my implicit concern is about excessive wear on one core at home. Is that even a valid concern? – Chechy Levas – 2019-12-08T09:38:03.640

@ChechyLevas No. Again, executing code is what they are made for. As long as you aren't doing some crazy overclocking and your cooling is adequate, there is nothing to be concerned about. – Keltari – 2019-12-08T11:41:01.357