14

I have a server that's primarily running a Ruby script. Because Ruby (2.7) has a GIL, it is single threaded.

My computer (server) has an Intel i3 dual-core processor, but due to hyperthreading I see 4 cores. Ruby only utilizes 25% CPU under heavy load. I wanted to see if disabling hyperthreading benefits a programming language that runs on a single thread.

Also, my server is running a very minimal desktop environment and it doesn't use more than 2% CPU. So I wanted to make most of the resources available to Ruby. I did a benchmark to see if I really get any performance boost by disabling hyperthreading.


Benchmark:

I wrote a simple Ruby script that runs a while loop and adds the value of the loop counter to another variable. This program should use 100% of a CPU core:

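#!/usr/bin/env ruby
$-v = true # enable verbose warnings

# Number of iterations: N from the environment, defaulting to 100 million.
# One is added so that `while (i += 1) < LOOPS` runs exactly N times.
LOOPS = ENV['N'].to_i.then { |x| x < 1 ? 100_000_000 : x } + 1
i, j, t = 0, 0, Time.now

puts "Counting till #{LOOPS - 1} and adding values to V..."
while (i += 1) < LOOPS
    # Refresh the progress line every 10,000 iterations.
    if i % 10000 == 0
        e = Time.now - t                  # seconds elapsed so far
        r = (LOOPS * e / i - e).round(2)  # estimated seconds remaining
        print "\e[2KN: #{i} | Done: #{i * 100 / LOOPS}% | Elapsed: #{e.round(2)}s | Estimated Rem: #{r}s\r"
    end

    j += i
end

puts "\nV = #{j}\nTime: #{(Time.now - t).round(2)}s"

  • With Hyperthreading: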
⮚ ruby p.rb
Counting till 100000000 and adding values to V...
N: 100000000 | Done: 99% | Elapsed: 4.55s | Estimated Rem: 0.0s
V = 5000000050000000
Time: 4.55s

⮚ ruby p.rb
Counting till 100000000 and adding values to V...
N: 100000000 | Done: 99% | Elapsed: 4.54s | Estimated Rem: 0.0s
V = 5000000050000000
Time: 4.54s

⮚ ruby p.rb
Counting till 100000000 and adding values to V...
N: 100000000 | Done: 99% | Elapsed: 4.67s | Estimated Rem: 0.0s
V = 5000000050000000
Time: 4.67s

gnome-system-monitor reported 25% CPU usage by Ruby while the test was running.

  • Without Hyperthreading:

[ `# echo 0 | tee /sys/devices/system/cpu/cpu{2,3}/online` used to disable hyperthreads ]
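
Before taking cpu2 and cpu3 offline, it may be worth confirming that they really are the hyperthread siblings of cpu0 and cpu1. The following is a minimal sketch, not a definitive tool: the method name `cpus_by_core` is hypothetical, and it assumes a single-socket Linux machine where the `core_id` files under sysfs identify the physical core. A CPU that is already offline may not be listed, so it is best run while all CPUs are online.

```ruby
# Group logical CPUs by the physical core they belong to, using the
# core_id files under sysfs. `base` is a parameter so the method can be
# pointed at another directory; the default is the usual Linux path.
# Assumption: one CPU socket, so core_id alone identifies a physical core.
def cpus_by_core(base = '/sys/devices/system/cpu')
  groups = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(base, 'cpu[0-9]*', 'topology', 'core_id')).each do |path|
    # Extract the logical CPU number from ".../cpuN/topology/core_id".
    cpu = path[%r{cpu(\d+)/topology/core_id\z}, 1].to_i
    groups[File.read(path).to_i] << cpu
  end
  groups.transform_values(&:sort)
end

if __FILE__ == $PROGRAM_NAME
  cpus_by_core.sort.each do |core, cpus|
    puts "core #{core}: logical CPUs #{cpus.join(', ')}"
  end
end
```

If a core lists two logical CPUs, those two are siblings, and taking one of each pair offline leaves one thread per physical core.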

⮚ ruby p.rb
Counting till 100000000 and adding values to V...
N: 100000000 | Done: 99% | Elapsed: 4.72s | Estimated Rem: 0.0s
V = 5000000050000000
Time: 4.72s

⮚ ruby p.rb
Counting till 100000000 and adding values to V...
N: 100000000 | Done: 99% | Elapsed: 4.54s | Estimated Rem: 0.0s
V = 5000000050000000
Time: 4.54s

⮚ ruby p.rb
Counting till 100000000 and adding values to V...
N: 100000000 | Done: 99% | Elapsed: 4.56s | Estimated Rem: 0.0s
V = 5000000050000000
Time: 4.56s

gnome-system-monitor reported 50% CPU usage by Ruby while the test was running.


I have even run the test on my laptop, which takes around twice the time it took on my computer. But the result is identical: disabling hyperthreading doesn't help the process to do better. And even worse, my laptop gets a bit slower when multitasking.

So in the non-hyperthreading mode, Ruby used 2x the CPU power compared to the hyperthreaded mode. But why did it still take the same amount of time to complete the same task?

S.Goswami
  • 12
    _"`echo 0 | tee /sys/devices/system/cpu/cpu{2,3}/online` used to disable hyperthreads"_ - Are you sure that correctly disables hyperthreading? You might have reduced yourself to a single core with HT, instead of two cores without HT. – marcelm Mar 05 '20 at 10:54
  • As I have read for a dual core processor, if `cat /sys/devices/system/cpu/cpuN/topology/core_id` prints 0 or 1 that means they are using core 0 and 1 internally. For example, on my desktop and laptop both with dual core i3, `cat /sys/devices/system/cpu/cpu{2,3}/topology/core_id` prints 0 and 1 respectively. On my raspberry pi 3B, it prints 3 and 4 because the pi doesn't have ht support (and not present in /proc/cpuinfo). So I think for my system with i3 2 core processor, disabling 3 and 4 actually disables HT, doesn't it? – S.Goswami Mar 05 '20 at 11:28
  • 4
    In that case, yes, I would expect that to effectively disable HT. Although if you're benchmarking and the results are very important, I would opt for rebooting and disabling HT in the BIOS/EFI, just to be sure :) – marcelm Mar 05 '20 at 11:31
  • OK, so I disabled hyperthreading support in the motherboard's UEFI setup, but it didn't perform any better, and it got worse at multitasking... I see the same effect when I disable it after boot using `echo 0 | tee ...` – S.Goswami Mar 05 '20 at 11:48
  • 15
    I don't see any difference between the two tests. You have values in the range ~4.5-4.7 seconds in both cases – Giacomo Alzetta Mar 05 '20 at 12:43
  • 21
    Do you know what HyperThreading is? Modern CPUs spend a LOT of their time waiting for data from RAM. Like, really a LOT. So what hyperthreading does is that when one thread is waiting for RAM, the core just puts it aside and starts to execute another thread. Until that too gets to wait for RAM. Then they both wait until one of them gets the data and can continue. This makes the use of a single core much more efficient, so I would be really surprised if disabling HT would somehow make something faster. – Vilx- Mar 05 '20 at 14:27
  • 8
    @Vilx- Even simpler: If disabling HT made things faster then Intel wouldn't put HT on their chips. – user253751 Mar 05 '20 at 16:52
  • Well actually phoronix has a benchmark about that. https://www.phoronix.com/scan.php?page=article&item=intel-ht-2018&num=1 ("In most multi-threaded workloads carried out, Hyper Threading is still very much relevant in 2019."). My question actually is about improving perf in a server system that doesn't need to run multiple apps and also, it runs one instance of a programming language that has a GIL. Intel takes into account that home users need to run multiple apps and most of them are multithreaded, like a DE or a web browser... – S.Goswami Mar 05 '20 at 17:16
  • 1
    @Vilx-, the difference is surprisingly little -- even for compiling, which is pretty much the worst case for memory dependencies, I see that once the number of threads scheduled exceeds the number of cores, wall clock time does not go down as quickly as CPU time goes up. My last experiment compiling KiCad saw 63 minutes of CPU time without HT, and 100 minutes with, wall clock time went from 15.3 to 12 minutes. – Simon Richter Mar 06 '20 at 11:04
  • @Vilx- But that's not how HT works. At least not the one on Intel's CPUs. HT is Intel's cheap attempt at a low form of ILP where the front-end alternates (each cycle and on stalls) the fetching between the two threads. The backend is aware of HT only due to the two architectural states. So the goal of HT is to fully use all the micro-arch resources; it has nothing to do with memory (btw, an L1 hit is 4-5 cycles) but with OoO. E.g. a CPU can do an int add in 2 EUs, but a program may have data dependencies that force the adds to be serial. Disabling HT can easily improve performance *if* ... – Margaret Bloom Mar 06 '20 at 14:31
  • @Vilx- ... you finely tuned your (most likely assembly) code to already exploit all the micro-arch resources. In that case, the other thread is stealing EUs. – Margaret Bloom Mar 06 '20 at 14:32
  • 1
    "*Because Ruby (2.7) has a GIL, it is single threaded.*" It is multi-threaded, but only one can be executing at a time. Ruby threads are useful to do multiple long-running tasks effectively simultaneously. And they can prevent the whole application from blocking if the code must wait for input or I/O or network. – Schwern Mar 07 '20 at 00:30
  • Yes, I am aware of the Thread and Fiber classes in Ruby, but in the lower level, they can't use more than one core. Ruby 3.0 may not have the GIL though... – S.Goswami Mar 07 '20 at 06:07
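
The distinction drawn in the last two comments can be illustrated with a small sketch. It assumes the standard Ruby interpreter (MRI), where a thread that is sleeping or blocked on I/O does not hold the GIL; `sleep` stands in for blocking I/O here, and the helper name `timed` is hypothetical.

```ruby
# Measure how long a block takes using a monotonic clock.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

WAIT = 0.2 # seconds each simulated I/O wait lasts

# Four waits one after another, then four waits in separate threads.
sequential = timed { 4.times { sleep WAIT } }
threaded   = timed { 4.times.map { Thread.new { sleep WAIT } }.each(&:join) }

puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
```

The threaded version should finish in roughly the time of a single wait because the waits overlap. Replacing `sleep` with a CPU-bound loop would not be expected to show the same gain, since only one thread executes Ruby code at a time.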

2 Answers

46

Your Ruby program did not use 2x the CPU time when running with HT disabled. Rather, because it maxes out one core out of two total cores, gnome-system-monitor reports the utilization as 50%. If, due to HT, the system reports four total cores, one core out of four is 25%.
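
The arithmetic can be written out as a short sketch. The method name `percent_of_all_cpus` is hypothetical; it only models a monitor that expresses usage relative to all logical CPUs.

```ruby
# Percentage shown when utilization is expressed relative to all logical
# CPUs: the number of fully busy CPUs divided by the number the OS reports.
def percent_of_all_cpus(busy_cpus, logical_cpus)
  100.0 * busy_cpus / logical_cpus
end

puts percent_of_all_cpus(1, 4) # HT on:  4 logical CPUs => 25.0
puts percent_of_all_cpus(1, 2) # HT off: 2 logical CPUs => 50.0
```

In both cases the numerator is the same single busy CPU; only the denominator changes.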

Disabling HT did cause more variation in your results because fewer resources were available: recent Intel (or AMD) cores are quite wide, so additional threads are often useful to extract 10-20% more aggregate performance. If some background process happened to run during the test runs, the system without HT is prone to more variance and lower total throughput.
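
As a rough check, the six timings posted in the question can be summarized in a few lines. This is only a sketch: the constant and method names are made up for illustration, and three runs per configuration are too few to draw firm statistical conclusions.

```ruby
# Wall-clock times in seconds, copied from the runs in the question.
WITH_HT    = [4.55, 4.54, 4.67].freeze
WITHOUT_HT = [4.72, 4.54, 4.56].freeze

# Arithmetic mean of the run times.
def mean(times)
  times.sum / times.size
end

# Difference between the slowest and the fastest run.
def spread(times)
  times.max - times.min
end

{ 'with HT' => WITH_HT, 'without HT' => WITHOUT_HT }.each do |label, times|
  puts format('%-10s mean %.3fs, spread %.2fs', label, mean(times), spread(times))
end
# with HT    mean 4.587s, spread 0.13s
# without HT mean 4.607s, spread 0.18s
```

The means differ by about 0.02 s, which is smaller than the run-to-run spread within either configuration.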

shodanshok
  • 3
    It is worth noting that many CPU monitors read relative to a single core; in those you will often see a process taking >100% CPU because it is threaded and taking advantage of more than one full core's worth of CPU time. It is important to know what your monitor is reading when interpreting the results. – David Spillett Mar 06 '20 at 15:20
  • 1
    Yep, this is how `top` and the likes work. – shodanshok Mar 06 '20 at 17:07
  • @DavidSpillett That can get a bit confusing with Windows, especially since the algorithm was changed with newer versions of Windows. Windows 8/Server 2012's default view works exactly as you describe (relative to a single core), while Windows 7/Server 2008 R2's view (as well as later versions' Details view) reads relative to the entire CPU. – gparyani Mar 07 '20 at 22:27
3

I wanted to see if disabling hyperthreading benefits a programming language that runs on a single thread.

I don't know how cutting the number of cores would improve performance, even for a single-threaded app. When hyperthreading is enabled, your CPU is running with 4 virtual cores. A single-threaded app using all the CPU it can would use 25% of the available CPU. When you disabled hyperthreading, you took the number of virtual cores down to 2. Now that single-threaded app can use 50% of the available CPU.

Ruby isn't using 2x the CPU, it's that you have 1/2 the CPU available when you disable hyperthreading. If you have a large cup 1/4 full of water and pour it into a smaller cup that becomes 1/2 full of water, you still have the same amount of water.

I have even run the test on my laptop, which takes around twice the time it took on my computer. But the result is identical: disabling hyperthreading doesn't help the process to do better. And even worse, my laptop gets a bit slower when multitasking.

Yes, you are taking away about 1/2 the power of your CPU. That can also make the Ruby thread run slower. Say you have 3 threads that want to be running at the same time in addition to your Ruby thread. If you cut the virtual cores down to 2, it's more likely that your Ruby thread will be paused at least a little to let another thread have some time.
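
The scenario above can be put into numbers with an idealized sketch. The method name `share_per_thread` is hypothetical, and the model makes strong assumptions: every thread is CPU-bound and equally weighted, the scheduler divides time evenly, and every virtual core is treated as equally fast, which is not true of hyperthread siblings that share one physical core.

```ruby
# Fraction of a virtual core each runnable thread gets when the scheduler
# divides time evenly. A thread can never get more than one whole core.
def share_per_thread(runnable_threads, virtual_cores)
  [1.0, virtual_cores.to_f / runnable_threads].min
end

# The Ruby thread plus 3 other threads, as in the example above.
puts share_per_thread(4, 4) # 4 virtual cores => 1.0
puts share_per_thread(4, 2) # 2 virtual cores => 0.5
```

With 4 virtual cores every thread can stay scheduled; with 2, the Ruby thread has to give up time slices to the others.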

Jason Goemaat
  • 2
    Disabling HT will improve the performance of high-priority tasks that effectively get exclusive access to a CPU, because there are no instructions from a lower-priority task interjected, which can reduce the performance of that single task by 40%. Wall clock time until "all tasks completed" is going to be lower with HT, wall clock time until "high priority tasks completed" is going to be lower *without* HT unless you have more high priority tasks than cores (because if everyone has high priority, then no one has). – Simon Richter Mar 06 '20 at 10:45
  • 1
    @SimonRichter I've even [actually seen](https://stackoverflow.com/q/20640193/673852) 1..4-job parallel compilations speed up when I disabled HT on a 4-core i7 CPU. – Ruslan Mar 06 '20 at 13:15
  • 1
    @Ruslan To be fair, that's seven years old and on Linux. Linux doesn't have the best track record on keeping up with driver quality - it's very possible the kernel you were using simply didn't have any idea about the HT "virtual" cores and treated them as homogeneous, thus leading to overscheduling the already partially utilized cores. It'd be interesting to see if anything changed in the meantime, but then again, Intel no longer recommends using HT (though it's for security rather than performance reasons). – Luaan Mar 06 '20 at 14:14
  • 1
    @Luaan it's unlikely to have had no idea about HT: I remember enabling `CONFIG_SCHED_SMT` when configuring the kernel. – Ruslan Mar 06 '20 at 14:26
  • I have also run the game Xonotic without HT. Xonotic uses multiple cores at once, but I still got an FPS increase and was able to enable some of the effects even without a GPU! – S.Goswami Mar 07 '20 at 06:11