Why do higher-end server CPUs typically have slower single-thread performance?


Something I have seen many times, and which is confirmed by multiple benchmarks: Xeon CPUs, and more generally Intel CPUs targeting the server market, have slower per-thread performance than Core-series CPUs.
Even a $117 22nm Core i3 Ivy Bridge CPU will typically run Python workloads faster than a $2000 10nm Xeon Cannon Lake CPU, and that’s not even with Turbo Boost enabled!

Except in the case of Python (where the language lacks proper multithreading support), server workloads are more multithreaded and more multiprocess-oriented than the games and applications an individual runs, which explains why server CPUs sacrifice single-thread performance in order to fit more cores.
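To make the Python limitation concrete, here is a minimal sketch (the prime-counting workload and thread count are just illustrative): a pure-Python, CPU-bound function run on several threads still executes bytecode one thread at a time because of the GIL, so the threaded run takes roughly as long as doing the same work sequentially.

```python
import threading

def count_primes(n):
    """CPU-bound, pure-Python work that holds the GIL while running."""
    count = 0
    for i in range(2, n):
        if all(i % d for d in range(2, int(i ** 0.5) + 1)):
            count += 1
    return count

def run_threaded(n_threads, n):
    """Run count_primes on n_threads threads.

    Because of the GIL, only one thread executes Python bytecode at a
    time, so wall-clock time is roughly the same as running the calls
    sequentially; multiprocessing sidesteps this, at the cost of
    process-creation and IPC overhead.
    """
    results = [0] * n_threads

    def worker(idx):
        results[idx] = count_primes(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```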

While it’s already known that Intel and other hardware manufacturers can no longer increase performance through single-core designs, what (in detail) does decreasing per-thread performance for the same microarchitecture bring? Why not just put fewer but faster cores per chip for the same price?

user2284570

Posted 2019-10-27T04:23:43.797


Or, rephrased: why is it better to run a huge Python workload on a $117 22nm Core i3 Ivy Bridge CPU than on a $2000 10nm Xeon Cannon Lake CPU? (Since CPU-bound Python programs can run only one thread at a time.) – user2284570 – 2019-10-27T04:27:04.743

The answer is simple: because server CPUs are optimized for multi-core workloads. That is the typical scenario of a server (web/database server, computing cluster, ...). – Robert – 2019-10-27T10:05:52.917

@Robert that’s what I said in my question. But why isn’t putting in fewer but faster cores equivalent to putting in more but slower cores? – user2284570 – 2019-10-27T14:33:52.093

This is a question of the software you run. Typical server software can make use of multiple cores. A Python program cannot. Some typical desktop programs can make use of a few cores. – Robert – 2019-10-27T17:11:15.917

@Robert do you know that Python is powering the backend of several Google websites (though not the most used ones)? – user2284570 – 2019-10-27T17:36:37.330

Python web systems use multiple processes instead of multi-threading. Therefore Python can be used without problems on a server in this case. – Robert – 2019-10-28T09:05:04.553

@Robert except if, as in my case, you have a possibility graph with 2000 billion paths, where walking one path takes 5 ms, so that creating a process each time or relying on IPC is too much overhead. – user2284570 – 2019-10-28T12:43:10.183

Answers


(This post is asking for speculation and I'm happy to oblige.)

Why not just put fewer but faster cores per chip for the same price?

The problem is that the current technology has hit its limits, so only minor performance improvements are now possible. Improvements of 10-20% just don’t sound very convincing.

On the other hand, manufacturers do not wish to fall behind Moore’s law, popularly understood as saying that computer chip performance roughly doubles every 18 months (with no increase in power consumption). That requires an improvement factor of 100%, and such single-core technology just does not exist.

Solution: double the number of cores and sum up their total capacity, as proof that performance is still doubling.

In real life this theoretical increase of the number of cores is not guaranteed to increase the total performance, since some computer resources are shared and may become bottlenecks, for example the RAM, bus and disk.
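This ceiling can be quantified with Amdahl's law: if only a fraction p of the work parallelizes, the speedup on n cores is bounded by 1 / ((1 - p) + p/n). A small sketch (the 90%-parallel figure is just an example, not from the discussion above):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# A workload that is 90% parallel tops out at 10x no matter how many
# cores are added; 16 cores already deliver only a 6.4x speedup.
```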

What does decreasing per-thread performance for the same microarchitecture bring?

Increasing the number of cores cannot be done indefinitely, especially in view of power consumption. For a core to work faster, it needs more power. This means that the more cores you have, the smaller each core’s share of the total power budget, and so each must run slower.
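As a rough sketch of that trade-off (assuming dynamic power per core scales roughly as f³, i.e. f·V² with voltage tracking frequency; a simplification that ignores static power, voltage floors and per-chip turbo limits), a fixed power budget split over more cores buys lower per-core frequency but higher aggregate throughput:

```python
def per_core_frequency(power_budget, n_cores, k=1.0):
    # Assumes power per core ~ k * f**3 (dynamic power f*V^2 with V
    # proportional to f). Real chips also have static power and
    # voltage floors, so this is only an illustrative model.
    return (power_budget / (n_cores * k)) ** (1.0 / 3.0)

def aggregate_throughput(power_budget, n_cores, k=1.0):
    # Total work rate, assuming per-core throughput scales with f.
    return n_cores * per_core_frequency(power_budget, n_cores, k)
```

Under this model, doubling the core count within the same budget cuts each core's frequency by about 21% (a factor of 2^(-1/3)) but raises total throughput by about 59% (a factor of 2^(2/3)), which is why vendors keep trading clock speed for core count.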

The solution here is turbo mode, whereby one core gets most of the available power. So you have one fast core and the others either turned off or slowed down. But as a single core cannot sustain that mode indefinitely, the solution is to switch turbo mode on for multiple cores in rotation.

In general, for comparable technology, a CPU with fewer cores may prove faster than a multi-core CPU, for a core-to-core comparison. Other factors may influence the speed, but choosing between the number of cores and single-core performance is often the question. The applicability of turbo mode to the work-load is another question.

harrymc

Posted 2019-10-27T04:23:43.797


Consumer CPUs have a smaller die size than Xeon processors. For a larger die size, why not just put in more consumer cores that are fast per thread (albeit with additions like ECC memory support)? – user2284570 – 2019-10-27T14:36:18.983

Also, please see the last edit to my question. – user2284570 – 2019-10-27T14:38:31.560

There is a practical limit to how much electricity a CPU can draw without burning up. Faster needs more power. And more cores does not mean that your program can work faster. – harrymc – 2019-10-27T17:05:36.390

What I mean is: for the same amount of power (which can go up to 900 W for a Xeon die), why not put in fewer but faster cores? (So this isn’t power related.) – user2284570 – 2019-10-27T17:38:34.773

Some CPUs are like that. The fastest CPUs on a per-core basis usually have fewer cores. The ones with the most cores are only faster in total aggregate throughput. The difference is there, but it is not dramatic (if the technology is comparable). – harrymc – 2019-10-27T17:45:30.400

Down to 40% slower per thread, anyway. The largest Core X CPUs consume about 120 W and the largest Xeon CPUs consume 900 W. Why don’t we have $2000 900 W Xeon chips consisting of 30 fast cores instead of 50 slower ones? – user2284570 – 2019-10-27T17:50:17.133

Some computer resources are shared and will cause waiting. Most supercomputers are more networks of independent micro-computers than collections of cores. There are practical limits to everything, and I have tried to list some of them. – harrymc – 2019-10-27T17:56:28.727