Why have CPU manufacturers stopped increasing the clock speeds of their processors?

63

16

I have read that manufacturers stopped concentrating on higher clock speeds and are now working on other things to improve performance.

With

  • an old desktop machine with an Intel® Xeon® Processor E3110 (clock speed 3.0GHz)
  • and a new server with an AMD Opteron(TM) Processor 6272 (clock speed 2.1GHz)

when I performed a simple (single-threaded) encryption comparison using

 openssl aes256c

the desktop performed far better than the server.

So even with the latest optimizations, why does the processor with the higher clock speed still perform better?

learner

Posted 2013-07-08T08:00:31.780

Reputation: 521

Question was closed 2013-07-09T02:32:36.240

57The desktop chip is a dual-core; the server is a 16-core CPU. Using a single-threaded benchmark is NOT appropriate at all. – MSalters – 2013-07-08T09:58:01.277

@learner - Because processor speeds cannot increase without increasing the voltage the chips require, which increases the heat they produce. By concentrating on power consumption they will, in theory, be able to increase processor speeds in the future. – Ramhound – 2013-07-08T11:04:31.147

1Please cite actual cases of 'manufacturers' 'even reducing' clock speeds (without comparing apples and oranges) or limit your question title to 'not increasing'. – Jan Doggen – 2013-07-08T15:06:00.480

8AMD vs Intel clock speeds haven't been a fair comparison since the K6/Pentium days. AMD marketed Athlon processors as 2500+ or 3000+ when their core clocks might be 1.8 or 2.1 GHz respectively, but they typically benchmarked quite respectably against Intel chips that did clock a true 2.5 or 3GHz. There are simply way too many differences between architectures now to make a comparison simply on clock rates. – KeithS – 2013-07-08T15:35:57.557

2Related: Has CPU speed already broken Moore's law? – Rich Homolka – 2013-07-08T17:40:12.333

Execute 16 of them at the same time and you should start to see a difference... ;-) – Marcello Romani – 2013-07-08T22:59:01.487
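For what it's worth, openssl can do this itself: its built-in speed test has a -multi option that forks several parallel worker processes. A minimal sketch (assuming a reasonably recent OpenSSL build; aes-256-cbc is used purely as an illustrative cipher, since the exact invocation in the question is unclear):

 # single-threaded run, roughly what the question describes
 openssl speed aes-256-cbc

 # run 16 workers in parallel, one per Opteron core
 openssl speed -multi 16 aes-256-cbc

With all 16 cores busy, the aggregate throughput should tell a very different story than the single-threaded numbers.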

Answers

72

The reason manufacturers have stopped concentrating on increasing clock speed is because we can no longer cool the processors fast enough for this to be viable. The higher the clock speed, the more heat is generated, and we've now hit a stage where it is no longer efficient to increase processor speed due to the amount of energy that goes into cooling it.

Other answers go into detail on how a higher clock speed doesn't mean better performance in all areas.

Paul Hay

Posted 2013-07-08T08:00:31.780

Reputation: 551

In the past, process shrinks have allowed for faster clocks with less heat generated. Try overclocking a 386 to a mere 200MHz and it will melt, but now we can go to 2GHz+ and emit less heat. So isn't the limit actually something else? It would help to explain why process shrinks haven't been able to sustain this pattern recently. – thomasrutter – 2016-01-21T08:58:26.220

1+1 I seem to have purchased my machine right when this wall was hit; my 8-year-old 3.4 GHz P4 is probably still among the fastest in terms of clock speed when looking at the vast majority of the market (non-overclocked). – Karthik T – 2013-07-08T10:12:33.193

2Note that power consumption would be a problem too. If you had a 16-core CPU at 3.0GHz, it would probably consume 200+ watts, which the most common power supplies can barely support in combination with the rest of the system. – Mixxiphoid – 2013-07-08T10:15:15.513

9@Mixxiphoid You would also need to get those 200+ W into the CPU somehow, at a voltage the circuitry can handle. That is a non-trivial task in itself. – a CVn – 2013-07-08T11:11:37.570

@KarthikT - Intel has made significant performance improvements since their Pentium 4 product line was released. You cannot compare the performance of a Pentium 4 even to a Core 2 Duo for these reasons. With the previous two generations (Sandy Bridge and Ivy Bridge) they concentrated on increasing the performance of an individual core. With Haswell they concentrated on power consumption. Because of those changes a Pentium 4 is significantly slower than the newer generations, even if the newer chips' clock speed is "slower". – Ramhound – 2013-07-08T13:29:33.043

1Not just that: to increase the CPU clock they need to lengthen the pipeline, but every time you need to fork the code, change the context, jump or clear the memory, you flush the entire pipeline and have to fill it again before that instruction produces a result. So it's better to shorten the pipeline and lower the CPU frequency, so that every time you need to fork, change context or jump you don't have to wait a long time for the pipeline to fill again. – Lefsler – 2013-07-08T14:15:15.323

@demonofnight - In other words, even though the clock frequency has not increased, processors can still become faster. They basically just increase the number of instructions per clock cycle. – Ramhound – 2013-07-08T15:25:51.900

1@demonofnight: It would be too much to say "every time you need to (...) jump". Unconditional direct branches pose no control hazard, so they don't count, and of conditional branches roughly 95%-99% are predicted by various techniques (data based on a paper using SPEC). Indirect branches do pose a problem if the target is not yet in a register and is mispredicted. I am not sure what you mean by 'fork the code' or 'clear the memory', but context switches should not happen that often (interrupts and possibly I/O). – Maciej Piechotka – 2013-07-08T16:26:28.370

Ohh, that is true, i completely forgot about that. Thanks – Lefsler – 2013-07-08T17:22:29.280

@Ramhound No doubt, hence my disclaimers. Moore's law is still alive, no doubt, especially since my new laptop's i7 is 20 times as good as my old P4 according to PassMark's benchmarks. BTW, is something wrong with the last sentence in your comment? – Karthik T – 2013-07-09T02:23:26.307

37

There is a lot more to processing speed than the clock rate.

  • Different CPUs can do different amounts of work in the same number of clock cycles, due to differences in pipeline arrangement and in how many component units (adders and so forth) each core has. While that was not the case in your test, you will often find a "slower" chip (measured by clock rate alone) can do more than a faster one, because it does more per tick.

  • The test you performed may be very sensitive to differences in CPU architecture: it could be optimised for a specific architecture, and you might find it performs differently not just between Intel and AMD chips but between Intel (or AMD) chips of different families. It is also likely using a single thread, so it is not taking advantage of the CPUs' multiple cores.

  • There is a move to lower clock rates for power and heat management reasons: ramping up the clock rate does not have a linear effect on power use and heat output.

  • Because of this non-linear relationship (sketched just below), it is far more efficient for today's requirements to have multiple processing units than it is to push the speed of one unit ever higher. This also allows for clever tricks to conserve power, like turning off individual cores when not in use and revving them back up as demand increases. Of course, multiple cores don't help a single-threaded algorithm, though they would if you ran two or more instances of it at the same time.
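A first-order sketch of why this relationship is non-linear (dynamic switching power only, ignoring leakage): $P_{dyn} \approx C V^2 f$, and because the supply voltage $V$ generally has to rise roughly in step with the frequency $f$ for the transistors to keep switching reliably, $P_{dyn}$ ends up scaling roughly as $f^3$. Under that approximation, a 20% clock increase costs on the order of 70% more power and heat, not 20%.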

David Spillett

Posted 2013-07-08T08:00:31.780

Reputation: 22 424

So what is the relationship between clock rate and power use? – user84207 – 2014-08-09T06:09:23.787

$P = CV^2f$. You may also want to read this. – zakkak – 2015-01-14T07:17:24.213

19

Why do you conclude that manufacturers are actually lowering clock speeds from a comparison of only two processors?

  1. The 6272 has a turbo speed of 3 GHz. The lower base speed is just there to reduce average wattage and keep an acceptable TDP when all cores are under load.
  2. AMD's next high-performance desktop chip, the FX-9590, will hit 5 GHz.

Also, clock speed isn't the same as performance per clock cycle. You can compare a 3.8 GHz P4 with a 3.2 GHz core from an i7-3930K, but that doesn't mean the P4 core is faster.

Everything said here about power consumption is also perfectly valid for a 16-core design, where you naturally have to be more concerned about TDP.

Also, your benchmark method of just testing openssl is a bit too simple to give real-world numbers. Maybe you should try a proper crypto benchmark suite.
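For example (a rough sketch only; these are just two commonly available tools, not a specific recommendation, and sysbench option names vary between versions):

 # quick built-in cipher benchmark shipped with cryptsetup
 cryptsetup benchmark

 # general multi-threaded CPU benchmark, one worker per Opteron core
 sysbench --test=cpu --cpu-max-prime=20000 --num-threads=16 run

A multi-threaded run like the second one loads all 16 cores, which is much closer to how a server of this class is actually used.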

s1lv3r

Posted 2013-07-08T08:00:31.780

Reputation: 293

x86 CPUs used to require multiple clock cycles per instruction, but these days they can run multiple instructions per clock cycle. – Oskar Skog – 2017-11-04T12:49:29.703

3Just to add an analogy for the clock-speed = performance fallacy: imagine one person taking very small steps very quickly (high clock speed) vs. another person taking very big steps at a slightly slower rate (lower clock frequency). The person taking big steps can cover ground much more quickly. – Martin Konecny – 2013-07-08T15:25:12.267

@MartinKonecny: Great visualization! – Zach Latta – 2013-07-08T17:24:44.613

2@MartinKonecny My understanding is that most assembler instructions (ADD, MOV, IMUL, etc) are performed in a single cycle. So with these new processors, are multiple instructions being performed in a single cycle? – nialsh – 2013-07-08T21:09:40.727

4@nialsh That is not true at all for CISC computers (in fact, one of the defining traits of CISC is that instructions take multiple cycles); if all instructions took one cycle, then the slowest instruction would take the same amount of time to execute as the fastest one. – Scott Chamberlain – 2013-07-08T21:47:15.193

13

Your test case (aes-256 encryption) is very sensitive to processor-specific optimizations.

Various CPUs have special instructions intended to speed up encryption/decryption operations. These special instructions might be present only on your desktop, or the AMD CPU might have a different set of them; openssl might also support these special instructions only for the Intel CPU. Did you check whether that was the case?

To find out which system is faster, try using a "proper" benchmark suite - or better, just use your typical workload.
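One quick way to check whether hardware AES support is in play (a rough sketch, assuming a Linux system; on other platforms the checks differ):

 # does the CPU advertise the AES instruction-set extension?
 grep -m1 -ow aes /proc/cpuinfo

 # benchmark OpenSSL's generic software implementation...
 openssl speed aes-256-cbc

 # ...and the EVP code path, which uses the hardware instructions when available
 openssl speed -evp aes-256-cbc

If the two openssl runs differ by a large factor on one machine and not on the other, processor-specific acceleration is almost certainly what the original comparison was really measuring.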

jakob

Posted 2013-07-08T08:00:31.780

Reputation: 241

Where does the translation to that special instruction happen? I am not sure if there are different compilers for different instruction sets. – Shubham – 2013-07-08T16:38:39.280

Compilers do have options to target different instruction sets, and/or special "intrinsic functions" that map closely to CPU-specific instructions. It's possible for a single executable to check what family of CPU it's running on, and select a different code path based on that. – Russell Borogove – 2013-07-08T17:20:06.720

10

Simple: the AMD chip is far, far faster in aggregate because it is a 16-core chip. At 115 watts, each core produces only ~7 watts of heat. That would not be achievable if each core ran at 3 GHz. To achieve that 7-watt figure, AMD lowered the clock frequency. Lowering the clock frequency by 10% reduces power consumption by roughly 20%, which in turn allows you to put 25% more cores on a chip.
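To spell out the arithmetic in that last step (a rough power-budget sketch using the answer's own figures): power scales faster than linearly with clock speed, because lowering the frequency also allows a lower core voltage, so a 10% frequency drop can plausibly buy a ~20% power saving per core. Within a fixed package budget $P_{total} = N \cdot P_{core}$, cutting $P_{core}$ to $0.8\,P_{core}$ lets $N$ grow to $N/0.8 = 1.25\,N$, i.e. 25% more cores.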

MSalters

Posted 2013-07-08T08:00:31.780

Reputation: 7 587

10

As others have said, we can no longer effectively cool CPUs at the voltages that would be needed to keep delivering the kind of relative clock rate increases we saw in the past. There was a time (the P4 era and earlier) when you could purchase a new CPU and see an "immediate" gain in speed because the clock rate was significantly higher than in the previous generation. Now we have hit a thermal wall, of sorts.

Each new generation of processors increases only very slightly in clock rate, and even that is relative to the ability to cool them appropriately. Chip makers such as Intel are continually focusing on shrinking the die size of the CPU, both to make chips more power efficient and to produce less heat at the same clocks. As a side note, this shrinking die size makes modern processors more prone to die from over-volting than from overheating, which also limits the ceiling clock rate of any current-generation CPU unless the chip maker makes other optimizations.

Another area that chip makers are focusing on heavily is increasing the number of cores on the chip. This gives a significant increase in computational power, but only when using software that takes advantage of multiple cores. Note the difference between computational power and speed here. Simply put, speed refers to how quickly a computer can execute a single instruction, whereas computational power refers to how many computations a computer can make in a given amount of time. Modern operating systems and much modern software do take advantage of multiple cores. The problem is that concurrent/parallel programming is harder than the standard, linear programming paradigm, which increased the time it took for many programs on the market to take full advantage of these newer processors' power, because many developers were not used to writing programs that way. There are still some programs on the market today (either modern or legacy) that do not take advantage of multiple cores or multi-threading. The encryption program that you cited is one such example.
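As an aside, on Linux it is easy to check whether a given program actually uses more than one thread (a rough sketch; <pid> stands for the process ID of the running program):

 # number of threads in the process
 grep Threads /proc/<pid>/status

 # or equivalently, via ps
 ps -o nlwp= -p <pid>

A value of 1 for a long-running, CPU-bound job means the extra cores will sit idle, exactly as with the single-threaded encryption test in the question.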

These two areas of focus by chip makers are intrinsically connected. By reducing both the die size and power consumption of a chip, they are then able to increase the number of cores on said chip. Eventually though, this too will hit a wall, causing another, more drastic, paradigm shift.

The reason for this coming paradigm shift is that we are approaching the limits of silicon as a base material for chip production. This is something that Intel and others have been working on solving for some time. Intel has stated that it has an alternative to silicon in the works, and we will likely start seeing it sometime after 2017. In addition to this new material, Intel is also looking into 3D transistors that could "effectively triple the processing power". Here is an article mentioning both of these ideas: http://apcmag.com/intel-looks-beyond-silicon-for-processors-past-2017.htm

PseudoPsyche

Posted 2013-07-08T08:00:31.780

Reputation: 402

2

  • The heat losses H grow roughly as the 4th power of the frequency f:

    H ~ f^4

    So even a minor increase in frequency leads to much higher heat losses.

  • Further miniaturization

    Higher frequencies require further miniaturization of the die, and at the moment we have no technologies for working effectively with materials at the nanometre scale; nanometres are the limit.

Warlock

Posted 2013-07-08T08:00:31.780

Reputation: 129

2-1 The fourth power part is not right. Power (heat generated per second) in CPUs is (roughly) linearly proportional to clock frequency, like P ~ f C V^2 + P0 (http://en.wikipedia.org/wiki/CPU_power_dissipation). Granted, voltage depends on clock speed (though not necessarily linearly). See: http://physics.stackexchange.com/questions/34766 Bottom line: power generated by a CPU is roughly linear to quadratic in clock speed in the range of 1.6 GHz - 5 GHz, not proportional to f^4. – dr jimbob – 2013-07-08T20:22:49.473

2

As stated in a few other answers, CPU manufacturers want to keep clock speeds down to control power consumption and heat dissipation. In order to do more work at the same clock speed, several strategies are used.

Large on-chip memory caches can keep more data "close to" the CPU, available to be processed with minimal delay, as opposed to main memory, which is much slower to deliver data to the CPU.

Different CPU instructions take differing numbers of clock cycles to complete. In many cases, you can use a simple circuit to implement an operation over several clock cycles, or a more complex circuit to do so in fewer.

The most dramatic example of this in Intel's evolution is the Pentium 4, which was a big outlier in clock speed but didn't perform proportionally well. The bit-shifting instructions, which in previous chips could shift by up to 32 bit positions in a single cycle, used a much simpler circuit in the Pentium 4 that needed a cycle for each bit position shifted. The expectation was that the Pentium 4 architecture would be scalable to much higher clock speeds because of its simplicity, but that didn't work out, and the fast, complex shift circuit returned in the Core and later architectures.

Russell Borogove

Posted 2013-07-08T08:00:31.780

Reputation: 502

2

From IEEE:

So why not push the clock faster? Because it's no longer worth the cost in terms of power consumed and heat dissipated. Intel calls the speed/power tradeoff a "fundamental theorem of multicore processors" - and that's the reason it makes sense to use two or more processing areas, or cores, on a single chip.

http://spectrum.ieee.org/computing/hardware/why-cpu-frequency-stalled

Azevedo

Posted 2013-07-08T08:00:31.780

Reputation: 511