The Incredible
Ambassador of Buzz
Does anyone know what L1, L2 & L3 cache are?
Wikipedia said: The second issue is the fundamental tradeoff between cache latency and hit rate. Larger caches are both slower and have better hit rates. To ameliorate this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger slower caches. As the latency difference between main memory and the fastest cache has become larger, some processors have begun to utilize as many as three levels of on-chip cache. For example, in 2003, Itanium 2 began shipping with a 6MB unified level 3 cache on-chip. The IBM Power 4 series has a 256MB level 3 cache off chip, shared among several processors.
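To put rough numbers on that tradeoff, here is a minimal sketch of an average access time calculation. The function name, the latencies, and the hit rates are all made up for illustration; they are not measurements of any real processor.

```python
def average_access_time(levels, memory_latency):
    """Estimate the average access time in cycles.

    levels: list of (hit_latency_cycles, hit_rate) pairs, fastest cache first.
    memory_latency: cycles paid by accesses that miss in every cache.
    All figures passed in are hypothetical.
    """
    total = 0.0
    reach_probability = 1.0  # chance an access gets this far down the hierarchy
    for hit_latency, hit_rate in levels:
        # Every access that reaches this level pays its lookup latency.
        total += reach_probability * hit_latency
        # Only the misses continue to the next level.
        reach_probability *= 1.0 - hit_rate
    return total + reach_probability * memory_latency
```

With one cache of 4 cycles and a 90% hit rate in front of 200-cycle memory, this gives about 24 cycles on average. Adding a second, slower cache of 12 cycles with an 80% hit rate brings it down to about 9.2 cycles, which is the point of backing a small fast cache with a larger slower one.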
Multi-level caches generally operate by checking the smallest Level 1 cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache is checked, and so on, before main memory is checked.
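The lookup order described above can be sketched as follows. This is only a toy model: the caches are plain dictionaries, and the `lookup` function and its arguments are hypothetical names, not how real hardware is built.

```python
def lookup(address, levels, memory):
    """Check each cache level in order, smallest first, then main memory.

    levels: list of (name, cache_dict) pairs, e.g. [("L1", {...}), ("L2", {...})].
    memory: dict standing in for main memory.
    Returns (value, name_of_level_that_answered).
    """
    for name, cache in levels:
        if address in cache:
            # Hit: the search stops at the first level holding the address.
            return cache[address], name
    # Every cache level missed, so main memory answers.
    return memory[address], "memory"
```

For example, with `l1 = {0x10: "a"}` and `l2 = {0x10: "a", 0x20: "b"}`, a lookup of `0x10` is answered by "L1", `0x20` by "L2", and any other address present in `memory` by "memory".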
Multi-level caches introduce new design decisions. For instance, in some processors (like the Intel Pentium II, III, and 4, as well as most RISCs), the data in the L1 cache may also be in the L2 cache. These caches are called inclusive. Other processors (like the AMD Athlon) have exclusive caches: data is guaranteed to be in at most one of the L1 and L2 caches.
The advantage of exclusive caches is that they store more data. This advantage is larger with larger caches. When the L1 misses and the L2 hits on an access, the hitting cache line in the L2 is exchanged with a line in the L1. This exchange is quite a bit more work than just copying a line from L2 to L1, which is what an inclusive cache does.
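The capacity difference and the exchange step can be illustrated with a toy model. This is a sketch under stated assumptions: the class `TwoLevelCache`, its LRU replacement, and the strict inclusion enforced in the non-exclusive mode are choices made for illustration, not a description of any particular CPU.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy two-level cache with LRU replacement (hypothetical model)."""

    def __init__(self, l1_lines, l2_lines, exclusive):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.exclusive = exclusive

    @staticmethod
    def _insert(cache, capacity, address):
        """Make address most recently used; return the evicted address or None."""
        cache[address] = True
        cache.move_to_end(address)
        if len(cache) > capacity:
            victim, _ = cache.popitem(last=False)  # drop least recently used
            return victim
        return None

    def access(self, address):
        if address in self.l1:
            self.l1.move_to_end(address)
            return "L1 hit"
        in_l2 = address in self.l2
        if self.exclusive:
            # Exchange: the line leaves L2, and the L1 victim takes a place in L2.
            self.l2.pop(address, None)
            victim = self._insert(self.l1, self.l1_lines, address)
            if victim is not None:
                self._insert(self.l2, self.l2_lines, victim)
        else:
            # Inclusive: the line stays in (or is filled into) L2 and is copied to L1.
            l2_victim = self._insert(self.l2, self.l2_lines, address)
            if l2_victim is not None:
                # Assumption: strict inclusion, so an L2 eviction also clears L1.
                self.l1.pop(l2_victim, None)
            self._insert(self.l1, self.l1_lines, address)
        return "L2 hit" if in_l2 else "miss"

    def unique_lines(self):
        """Count distinct addresses held across both levels."""
        return len(set(self.l1) | set(self.l2))
```

With a 2-line L1 and a 4-line L2, touching addresses 0 through 5 once each leaves 6 distinct lines in the exclusive configuration but only 4 in the inclusive one, because the inclusive L1 only duplicates what the L2 already holds.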
Some implementations of inclusive caches guarantee that all data in the L1 cache is also in the L2 cache (Intel x86 implementations do not). One advantage of strictly inclusive caches is that when external devices or other processors in a multiprocessor system wish to remove a cache line from the processor, they need only have the processor check the L2 cache. In cache hierarchies which do not enforce inclusion, the L1 cache must be checked as well. As a drawback, there is a correlation between the associativities of L1 and L2 caches: if the L2 cache does not have at least as many ways as all L1 caches together, the effective associativity of the L1 caches is restricted.
Another advantage of inclusive caches is that the larger cache can use larger cache lines, which reduces the size of the secondary cache tags. If the secondary cache is an order of magnitude larger than the primary, and the cache data is an order of magnitude larger than the cache tags, the tag area saved can be comparable to the incremental area needed to store the L1 cache data in the L2.
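The effect of line size on tag storage can be checked with some quick arithmetic. The sketch below assumes a set-associative cache with power-of-two sizes and 32-bit addresses, and it ignores valid, dirty, and coherence state bits; the function name and the example sizes are hypothetical.

```python
def tag_storage_bits(cache_bytes, line_bytes, ways, address_bits=32):
    """Total bits spent on address tags, ignoring valid/dirty/state bits.

    Assumes cache_bytes, line_bytes, and ways are powers of two.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    # One tag per line, so fewer (larger) lines means fewer tags.
    return lines * tag_bits
```

For a 512 KiB, 8-way cache, `tag_storage_bits(512 * 1024, 64, 8)` gives 131072 bits, while the same cache with 128-byte lines gives 65536 bits. Each tag stays 16 bits wide, but there are half as many of them.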
As mentioned above, larger computers sometimes have another cache between the L2 cache and main memory called an L3 cache. This cache is generally implemented on a separate chip from the CPU, and, as of 2004, may range in size from 2 to 256 megabytes. This cache will generally cost well in excess of $1000 to implement, and its benefits are seen mostly on large data sets not typically found on PCs. The cost is generally why processors for PCs do not have this cache.
Finally, at the other end of the memory hierarchy, the CPU register file itself can be considered the smallest, fastest cache in the system, with the special characteristic that it is scheduled in software -- typically by a compiler, as it allocates registers to hold values retrieved from main memory.