How do fabrication process and compiler quality affect CPU performance?


I was reading through my lecture notes, and one of the slides listed the factors affecting CPU performance. I can't understand how fabrication process and compiler quality affect CPU performance.


Compiler quality is the easier one...

Good compilers know how to translate code into CPU instructions efficiently.

Imagine you have a piece of software that evaluates a simple expression, say 1+1. A well-compiled application will tell the CPU to add the numbers and store the answer, and the job is done. This can be represented as:

  • Set memory 0 to 1
  • Set memory 1 to 1
  • Add memory 0 to memory 1
  • Store the result in memory 0 (simple!)

Bad compilers (and I've seen a few!) will produce the same result, but will issue loads of additional instructions along the way, which slows the application down. The same example:

  • Set memory 0
  • Set memory 1
  • Set memory 0 to 0
  • Set memory 1 to 0
  • Set memory 0 to 1
  • Set memory 1 to 1
  • Recall values from memory 0 and 1
  • Add them together
  • Store result in memory 0

Now bear in mind that a complex application such as a video editor, graphics application, game, or even a word processor may need to perform hundreds of thousands (if not tens of millions) of operations just to launch! That's the impact of a good compiler!
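To make the 1+1 example concrete, here is a minimal C sketch. The function name `one_plus_one` is made up for illustration and does not come from any real program. With optimization enabled (for example `-O2` in GCC or Clang), a compiler will typically fold the whole body into "return the constant 2". With optimization disabled (`-O0`), it will typically store both values to memory, load them back, and only then add them, which is closer to the longer list above. The exact instructions depend on the compiler and target CPU.

```c
/* Hypothetical example; the name is illustrative only. */
int one_plus_one(void)
{
    int a = 1;      /* "set memory 0 to 1" */
    int b = 1;      /* "set memory 1 to 1" */
    return a + b;   /* add the two values and hand back the result */
}
```

Compiling this with `-S` at both optimization levels and comparing the generated assembly is an easy way to see the difference for yourself.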

Fabrication process is an extension of this in that fabrication is the "gluing" of multiple applications together through shared functions. If these are done well, less computing power is needed to accomplish the same end result.


The quality (optimization ability) of the compiler determines how well the machine code maps to the hardware resources. Compiler optimizations can reduce the amount of work performed (e.g., unrolling a loop can reduce the number of branches, register allocation can reduce the number of memory accesses, inlining can remove call overhead and remove code that is unused by the specific caller), schedule the work to avoid waiting (e.g., scheduling loads earlier so that dependent instructions do not have to wait), exploit specialized instructions that do the work more efficiently (e.g., vectorization can use SIMD instructions), and organize memory accesses to exploit cache behavior (e.g., transforming an array of structures into a structure of arrays when inner loops only touch a few members of the structure).
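Two of the transformations mentioned above can be written out by hand in C to show what the compiler is aiming for. This is only a sketch: all names (`sum_simple`, `sum_unrolled`, `struct particle`, `struct particles`, `PARTICLE_CAPACITY`) are invented for illustration, and a real compiler decides for itself whether and how far to unroll. Automatic array-of-structures to structure-of-arrays conversion in particular is something many compilers cannot do and programmers often apply manually.

```c
#include <stddef.h>

/* Straightforward loop: one loop-condition branch per element. */
long sum_simple(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same loop unrolled by four: roughly one branch per four elements. */
long sum_unrolled(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        total += v[i];
    return total;
}

#define PARTICLE_CAPACITY 1024  /* arbitrary size chosen for this sketch */

/* Array of structures: each mass is separated by three unused floats. */
struct particle { float x, y, z, mass; };

/* Structure of arrays: all masses are contiguous in memory. */
struct particles {
    float x[PARTICLE_CAPACITY], y[PARTICLE_CAPACITY];
    float z[PARTICLE_CAPACITY], mass[PARTICLE_CAPACITY];
};

/* Reads only .mass, but every cache line fetched also carries x, y and z. */
float total_mass_aos(const struct particle *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Reads only the mass array, so every byte fetched into the cache is used. */
float total_mass_soa(const struct particles *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p->mass[i];
    return total;
}
```

Both versions of each pair compute the same result; the difference is in how many branches are executed and how much of each fetched cache line is actually useful.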

(Some compiler optimizations apply to all or most hardware; others are more specific to particular hardware implementations. Also, even though hardware support for out-of-order execution improves the execution of less well scheduled code, good instruction scheduling can still provide a measurable, if small, benefit.)

Fabrication process determines the energy use, switching speed, and area used by transistors (and similar characteristics of other components). Obviously transistors that switch faster allow for higher performance. Reducing the area per transistor allows more transistors to be used in an economically manufacturable chip (which can be translated into more performance) and can reduce communication time between components (e.g., latency of cache access is constrained by distance and not just transistor switching speed). Energy use constrains performance (to some degree the more power that must be delivered, the more "pins" [solder balls] must be used to deliver that power, reducing the number potentially available for communication off chip to memory, I/O, or other processors; extracting the waste heat also presents an economic limit). Lower switching energy means that more work can be done within a given power budget; lower idle ("leakage") power means that more transistors can be kept powered and ready to do work (this is perhaps particularly important for SRAM which must be always powered to retain state).
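The point about switching energy and power budget comes down to simple arithmetic: if each operation costs a fixed amount of energy, the sustainable rate of operations is the power budget divided by the energy per operation. The sketch below uses a made-up helper name (`ops_per_second`) and deliberately ignores leakage power, voltage and frequency scaling, and everything else a real chip has to deal with.

```c
/*
 * Rough upper bound on sustainable throughput under a power cap.
 * Hypothetical helper for illustration: watts are joules per second,
 * so dividing by joules per operation gives operations per second.
 */
double ops_per_second(double power_budget_watts, double energy_per_op_joules)
{
    return power_budget_watts / energy_per_op_joules;
}
```

Under this simplified model, a process that halves the energy per operation doubles the amount of work that fits in the same power budget, which is the effect described above.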

Paul A. Clayton

