Why does a CPU with lower frequency score better than one with more?


I've just seen this comparison of two different CPUs.

The first one runs at 4.4 GHz and the other at 2.5 to 3.5 GHz. However, the one with the lower frequency scored better in the single-thread rating than the one with 4.4 GHz. What leads to this interesting result?

MinecraftShamrock

Posted 2014-10-31T16:47:56.800


There are many more factors that determine performance than just clock frequency. – Matthew – 2014-10-31T16:49:56.123

@Matthew, and which factors are those for these two CPUs? – None – 2014-10-31T16:52:09.740

architecture, number of instructions that can run at the same time, cache size, memory bandwidth, instruction set... – phuclv – 2014-10-31T16:54:42.863

The only thing I can find that's "worse" about the lower rated one is that it doesn't have AVX2. So probably the benchmark uses that if available. – None – 2014-10-31T16:58:17.477

The slower one has twice as many cores. So the fast one can do 4.4GHz cycles on 4 cores for a total of 17.6 GHz, whereas the slower one can do 2.5GHz cycles on 8 cores for a total of 20GHz, i.e. more computations per second. – Mark Setchell – 2014-10-31T17:16:34.280

@MarkSetchell which explains the large difference in the multi-threaded score, but not the difference in the single-threaded score – None – 2014-10-31T17:21:45.210

@MarkSetchell you can't add up multicore CPUs' clock speeds like that; the clock frequency stays the same. And with 4 cores the performance isn't exactly 4 times higher, even if the application can use all the available cores – phuclv – 2014-10-31T18:23:56.380
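One common way to model why 4 cores rarely give 4x is Amdahl's law (a model swapped in here for illustration; the commenters don't name it): if only a fraction p of the work can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n), which stays below n unless p = 1. The parallel fractions below are assumed values, not measurements of either CPU or of the benchmark.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the
    work can be parallelized (Amdahl's law). Ignores memory contention,
    turbo-frequency changes, and scheduling overhead."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions only (assumed, not measured).
    for p in (1.0, 0.95, 0.8, 0.5):
        print(f"p={p:.2f}: 4 cores -> {amdahl_speedup(p, 4):.2f}x")
```

Even with 95% of the work parallel, four cores give about 3.48x rather than 4x; at 50% they give only 1.6x.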

From the PassMark site: "To ensure that the full CPU power of a PC system is realized, PerformanceTest runs each CPU test on all available CPUs. Specifically, PerformanceTest runs one simultaneous CPU test for every logical CPU (Hyper-threaded); physical CPU core (dual core) or physical CPU package (multiple CPU chips). So hypothetically if you have a PC that has two CPUs, each with dual cores that use hyper-threading then PerformanceTest will run eight simultaneous tests." So core count matters.

– Paul A. Clayton – 2014-10-31T18:46:48.027
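The quoted rule amounts to one test instance per logical CPU, i.e. packages × cores per package × hardware threads per core. A minimal sketch of that arithmetic (the function name is hypothetical; PassMark's own detection code is not shown), plus Python's `os.cpu_count()`, which reports the number of logical CPUs on the machine running it:

```python
import os


def simultaneous_tests(packages: int, cores_per_package: int,
                       threads_per_core: int) -> int:
    """One test instance per logical CPU, following the PassMark rule quoted above."""
    return packages * cores_per_package * threads_per_core


if __name__ == "__main__":
    # The example from the quote: two CPUs, each dual-core with Hyper-Threading.
    print(simultaneous_tests(packages=2, cores_per_package=2, threads_per_core=2))  # 8
    # Logical CPUs on this machine (None if it cannot be determined).
    print(os.cpu_count())
```

So a chip with twice as many logical CPUs runs twice as many copies of each test in the multithreaded run, which is why core and thread count weigh so heavily in that score but not in the single-thread rating.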

In single-thread operation the E3 is likely to Turboboost to 3.5 GHz, removing a large portion of the frequency difference. (Without more details about the sub-benchmarks, one could only speculate about why the more recent design at lower frequency performs better.) – Paul A. Clayton – 2014-10-31T18:55:09.247

it's all about the architecture, not clock speed. For example - http://wccftech.com/amd-invest-cpu-ipc-visc-soft-machines/

– vsync – 2014-10-31T22:48:04.300

@PaulA.Clayton yes, core count matters, but the performance is not exactly 4 times higher with 4 cores. Benchmark scores with 4 threads are never 4 times the single-threaded score. – phuclv – 2014-11-01T05:47:45.140

@LưuVĩnhPhúc I agree. Yet I get the impression that the multithreaded benchmark is more like SPEC Int Rate (each thread runs the same program), which scales relatively well, though the Physics and String Sorting tests (working set 30 MiB and 25 MiB, respectively) might place a higher burden on the memory system. No sub-test values are given, so reasoning about results is difficult. (BTW, in this case 2x cores/threads → 2.07x performance [without turboboost increasing frequency by 40%!], so obviously there are other factors [cf. 17% faster in single thread].) – Paul A. Clayton – 2014-11-01T11:42:07.837
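The "other factors" can be made a little more concrete with the usual rule of thumb: single-thread performance ≈ (work done per clock) × frequency. As a back-of-the-envelope sketch, assuming (as Paul A. Clayton suggests) that the E3 runs single-threaded at its 3.5 GHz turbo while the other chip runs at 4.4 GHz, the 17% single-thread advantage implies roughly 1.17 × 4.4 / 3.5 ≈ 1.47x as much work per clock. The function names are hypothetical, and since the benchmark's sub-tests are unknown, this lumps caches, branch prediction, ISA extensions and so on into a single number.

```python
def implied_per_clock_ratio(score_ratio: float, freq_a_ghz: float,
                            freq_b_ghz: float) -> float:
    """Work-per-clock of CPU A relative to CPU B implied by a score ratio,
    assuming score is proportional to work-per-clock x frequency."""
    return score_ratio * freq_b_ghz / freq_a_ghz


def per_thread_scaling(score_ratio: float, thread_ratio: float) -> float:
    """Score ratio divided by thread ratio: 1.0 means the score grew exactly
    in proportion to the thread count."""
    return score_ratio / thread_ratio


if __name__ == "__main__":
    # A = E3 at an assumed 3.5 GHz turbo, B = the 4.4 GHz Xeon;
    # A scores 17% higher single-threaded (figure from the comment above).
    ipc = implied_per_clock_ratio(1.17, freq_a_ghz=3.5, freq_b_ghz=4.4)
    print(f"implied work per clock, E3 vs. Xeon: {ipc:.2f}x")
    # 2x the threads gave 2.07x the multithreaded score.
    print(f"per-thread scaling: {per_thread_scaling(2.07, 2.0):.3f}")
```

A per-thread scaling slightly above 1.0 fits Paul A. Clayton's point: doubling the threads alone would not explain the result, so per-core differences must contribute too.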

@PaulA.Clayton but in real life pure mathematical calculations are rare, and in benchmarks like antutu... the multithreaded tests don't scale like that – phuclv – 2014-11-01T11:57:06.943

@LưuVĩnhPhúc Again, I agree (I can't say I really like the chosen benchmark, especially with the lack of sub-test results). Single figure of merit is not especially useful (unless the benchmark happens to match the user's targeted workload or at least allow easy fudge factoring based on known traits like mem. bw, LLC size, etc.). However, the OP was asking about results from the specific benchmark. If I wasn't lazy, I would check the SPEC CPU database for results from similar processors. (Sadly, SPEC allows autopar results in base, which makes single-thread results less available.) – Paul A. Clayton – 2014-11-01T14:29:00.850

Answers


You are comparing a Xeon 5698 processor (4.4 GHz) with an E3-1265 (2.5-3.5 GHz). Here are some reasons why the E3 could have measured faster in the single-thread test:

  • L1 caches are twice as big on the E3.
  • L2 cache is twice as big on the E3.
  • More register read ports on the E3.
  • Better branch predictor on the E3.
  • Instructions are cached after decoding on the E3.

The Xeon processor you are comparing is two families older than the E3-1265. See this for details on how the Ivy Bridge family is faster than the Nehalem family.

(Note that it is hard to tell what exactly is causing the difference without knowing exactly what the benchmark code does.)

Craig S. Anderson

Posted 2014-10-31T16:47:56.800

Reputation: 174