Running programs in cache and registers

0

In my operating systems class we were shown a picture depicting a hierarchy of memory starting from most expensive and fastest at the top and least expensive and slowest at the bottom. At the very top was registers and underneath it was cache. The professor said that the best place to run programs is in cache. I was wondering why programs can't be run in registers? Also, how can a program load itself into cache? Isn't the cache something that's controlled by the CPU and works automatically without software control?

tony_sid

Posted 2010-09-03T18:34:28.877

Reputation: 11 651

Answers

3

This is a very complicated question, so expect a few answers as people improve on each other's responses :)

The professor said that the best place to run programs is in cache.

Remember that cache is MANY times more expensive than normal RAM. Back when a 'big' computer had 8MB (not gigs, megabytes), you could find machines that were all 'cache' (it's technically a special type of RAM called SRAM), but they were more expensive. Now that home machines come with 4GB of memory, 4GB of SRAM wired to the chip would be VERY expensive. Besides, you have many smart folks working on programs and compilers to make the best use of cache. With the right caching algorithm, you get 95% of the benefit of cache at a small percentage of the cost. Of course, the guesses aren't always right; Google 'branch prediction' for more info.
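To see why those caching heuristics usually pay off, here is a sketch in C (the 512x512 matrix size is a made-up illustration) of the same sum written two ways. Both produce the same total, but the first walks memory in the order it is laid out, so each cache line fetched for one element already contains its neighbors; the second jumps a whole row's worth of bytes per step and misses far more often.

```c
#include <stddef.h>

#define N 512   /* assumed size, large enough to overflow a small cache */

/* Cache-friendly: the inner loop walks sequential addresses, so the
 * line loaded for a[i][0] already holds a[i][1], a[i][2], ... */
long sum_row_major(int a[N][N]) {
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += a[i][j];
    return total;
}

/* Cache-hostile: the inner loop jumps N*sizeof(int) bytes per step,
 * touching a different cache line on nearly every access. */
long sum_col_major(int a[N][N]) {
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += a[i][j];
    return total;
}
```

On a typical machine the row-major version runs noticeably faster even though the arithmetic is identical; the only difference is how well the access pattern matches the cache's "recently used and nearby" guesses.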

I was wondering why programs can't be run in registers?

Registers are what's actually used to load and store data and addresses. Think of them as taxis: they deliver things back and forth, and what they deliver is your program's data and addresses. Every part of your program that 'runs' goes through a register.
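As a small illustration of "everything goes through a register", here is a one-line C function with the x86 instructions a 32-bit compiler typically emits shown as comments (the exact output varies by compiler and settings; this is a sketch, not guaranteed assembly):

```c
/* Even trivial arithmetic moves its data through a register.
 * Typical 32-bit x86 (cdecl) output is sketched in the comments. */
int add_one(int x) {
    return x + 1;   /* mov eax, [esp+4]  ; load the argument into register EAX
                       add eax, 1        ; the addition happens in the register
                       ret               ; the result is returned in EAX */
}
```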

I'm assuming you're asking why you can't just run completely from registers. One reason: there are so few of them. Classic Intel x86 register storage is counted in bytes, but programs are megabytes or gigabytes. You'd have to be quite a rich person to own a chip that could run MS Word out of registers.

Also, how can a program load itself into cache?

The program doesn't. The OS runs the program, and uses the memory management unit (MMU) to map the program's pages in normal RAM. As the CPU executes it, the cache hardware automatically keeps copies of recently used memory in cache, on the theory of "I just used it, I may need to use it again soon."

Isn't the cache something that's controlled by the CPU and works automatically without software control?

Yes; technically it's the memory-management and cache hardware rather than the CPU core itself. This used to be a separate chip, but it is now part of the CPU package to make communication faster.

Rich Homolka

Posted 2010-09-03T18:34:28.877

Reputation: 27 121

2

Your programs right now are taking turns using the registers and the caches, probably under the direction of your OS kernel.

If all you want your program to do is take a number and add one to it over and over again, you could probably do all that in the registers. The registers are very small, each storing a single number, and the common x86 processor has 16 of them (8 integer and 8 floating-point).
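The "add one over and over" case can be sketched in C. With optimization enabled, a compiler typically keeps both loop variables in registers for the entire loop, with no memory traffic until the result is returned (this is usual compiler behavior, not something the language guarantees):

```c
/* A loop small enough to live entirely in registers: with -O2, `i`
 * and `count` are typically each assigned a register, and every
 * iteration is a register-to-register add plus a compare. */
long count_to(long n) {
    long count = 0;
    for (long i = 0; i < n; i++)
        count += 1;
    return count;
}
```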

Similarly, if you have a small program that will fit in the cache (and the OS doesn't need to intermittently swap it out to do other things), it will be run from the cache.

Most software programs these days are much bigger than the cache. And you are asking your computer to work on many things at once, like updating the clock, keeping your drive indexed, or drawing this webpage. That means that many times a second it needs to swap the next thing to work on into the cache so it can work on it a bit (known as a context switch).

You can read some more about caches and registers.

yhw42

Posted 2010-09-03T18:34:28.877

Reputation: 2 161

1

Your programs are run from registers! They are also running from cache. All of these things help make your computer run faster. The biggest limiting factor is size. There are very few CPU registers: a typical 32-bit x86 machine has only 8 32-bit general-purpose registers, which the CPU uses to store the data it is working on. As you know, access to registers is very fast, but because of their limited size, very little data can be stored in them.

Cache is similar, in that it is limited in size. The smallest cache (L1, for example) is checked first by the CPU for data; if the data is not found in that cache, it then checks subsequent caches (L2, L3, etc.). Each level of cache is progressively bigger and slower to access. If, after checking all of the caches, the data is still not found, the CPU must pull the data from RAM.
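The lookup each cache level performs can be sketched as a toy simulation in C. This models a hypothetical direct-mapped cache with 64-byte lines and 64 sets (all sizes made up for illustration); real hardware does this check in parallel and invisibly to software, but the tag/index logic is the same idea:

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_BITS 6                 /* assumed 64-byte lines */
#define SET_BITS  6                 /* assumed 64 sets       */
#define NSETS     (1u << SET_BITS)

typedef struct {
    bool     valid;
    uint64_t tag;
} cache_line;

/* Returns true on a hit.  On a miss, the line is filled, as if the
 * data had been fetched from the next level down (L2, L3, or RAM). */
bool cache_access(cache_line cache[NSETS], uint64_t addr) {
    uint64_t set = (addr >> LINE_BITS) & (NSETS - 1); /* index bits */
    uint64_t tag = addr >> (LINE_BITS + SET_BITS);    /* tag bits   */
    if (cache[set].valid && cache[set].tag == tag)
        return true;                /* hit: data already cached */
    cache[set].valid = true;        /* miss: fetch and remember the line */
    cache[set].tag   = tag;
    return false;
}
```

Note how a second access to a nearby address hits (it falls in the same 64-byte line), while an address that maps to the same set with a different tag evicts the old line, which then misses again.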

Software applications typically don't have explicit control over what gets put in the registers or in cache unless the application is a low-level driver or similar application.

heavyd

Posted 2010-09-03T18:34:28.877

Reputation: 54 755

1

I was wondering why programs can't be run in registers?

Most instruction set architectures (ISAs) do not support indirect addressing of registers; that is, register addresses are encoded as constants in the instruction. (This limitation significantly simplifies pipelining.)

Caches also have the advantage of being microarchitectural features--i.e., the size and other characteristics are only visible to software in terms of performance. This allows different implementations of an ISA to use different sizes et al. without loss of binary compatibility, e.g., for different performance or application targets or to adjust for changes in the balance of tradeoffs from changes in manufacturing technology.

In addition, as the number of software visible registers grows, the benefit of compiler management tends to decrease relative to the cost in compiler complexity and compilation time, especially for programs with complex control flow. If multiple levels of registers are used (as in the Cray-1) to allow a fast small group of registers, compiler register allocation complexity is increased.

Furthermore, the size of a general purpose register is typically set by the size of the address space, whereas the size of a cache line (the comparable item for caches) involves several tradeoffs: tag overhead; expected spatial locality of reference (larger cache lines can effectively prefetch nearby data, but the bandwidth and storage are wasted if the data is not used while it is in the cache); bandwidth considerations (longer memory bursts are more bandwidth efficient, making larger cache lines more attractive, and cache coherence traffic is influenced by line size); false sharing (where one processor writes to a location near a location another processor will read; large cache lines introduce false communication dependencies); etc.

Also, how can a program load itself into cache? Isn't the cache something that's controlled by the CPU and works automatically without software control?

The cache is controlled by the CPU, but software can explicitly prefetch items into the cache (and the Itanium ISA supports non-temporal-at-cache-level-N hints to help hardware better manage cache allocation and replacement). Ordinary memory accesses load the requested item, along with the rest of its naturally aligned cache line, into the L1 cache, and that cache line will usually remain present (barring coherence invalidations and the like) until the CPU chooses to replace it with another cache line of data requested by another, possibly speculative, memory access or hardware prefetch; so software does have some ability to manage cache contents.
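Explicit software prefetching looks like this in C with GCC or Clang's `__builtin_prefetch` intrinsic (a real compiler builtin; the prefetch distance of 16 elements ahead is a made-up tuning choice, and the hint is advisory, so the compiler and hardware are free to ignore it):

```c
#include <stddef.h>

/* Sum an array while hinting the hardware to start pulling a future
 * element into cache as the current one is processed.  The prefetch
 * never faults, even for an out-of-range address, but the bounds
 * check avoids issuing pointless hints near the end. */
long sum_with_prefetch(const long *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]);  /* advisory hint only */
        total += a[i];
    }
    return total;
}
```

Whether this helps depends on the access pattern; for a simple sequential scan like this, the hardware prefetcher usually does the job on its own, and explicit hints pay off mainly for irregular patterns the hardware cannot predict.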

Software can also exploit knowledge of specific characteristics of a cache system (cache-conscious algorithms and data structures) or the general characteristics shared by most cache systems (cache-oblivious algorithms) to reduce the frequency of cache misses (where items need to be retrieved from memory or a level of cache nearer memory).
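A standard example of a cache-conscious rewrite is loop tiling (blocking). A naive matrix transpose strides through one of the two arrays and misses on nearly every access; processing the matrix in small tiles keeps both the rows being read and the columns being written resident in cache. This is a sketch; the tile size of 32 is an assumed tuning value, not a universal constant:

```c
#include <stddef.h>

#define B 32   /* assumed tile size, tuned to fit two tiles in cache */

/* Transpose an n-by-n matrix of doubles, one B-by-B tile at a time.
 * The bounds checks handle n values that are not multiples of B. */
void transpose_blocked(size_t n, const double *src, double *dst) {
    for (size_t ii = 0; ii < n; ii += B)
        for (size_t jj = 0; jj < n; jj += B)
            /* work on one cache-resident tile at a time */
            for (size_t i = ii; i < ii + B && i < n; i++)
                for (size_t j = jj; j < jj + B && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Cache-oblivious variants achieve the same effect by recursively splitting the matrix until the pieces fit in whatever cache happens to exist, without hard-coding a tile size.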

Paul A. Clayton

Posted 2010-09-03T18:34:28.877

Reputation: 1 153