|
|
Typically the fastest piece of memory on a CPU is internal flip-flops and buffers. These are not generally accessible to the programmer. Next fastest is the architecture registers. These are the registers that are accessible to the programmer (if he is programming in assembler). Next there is usually at least one cache between those registers and main memory (RAM); often there are two caches, one on-chip (on the same piece of silicon as the rest of the CPU) and one off-chip. When there is more than one cache they are usually called level-1 (for the one nearest the CPU), level-2, level-3 and so on.
The level-1 cache might be between 32 kibibits and 512 kibibits, say. In a 32 bit architecture that would be between 1024 and 16384 words (each word being 32 bits).
For example, an Intel 80486 has an 8-KB on-chip cache, and most Pentiums have a 16-KB on-chip level one cache that consists of an 8-KB instruction cache and an 8-KB data cache.
In general, a given instruction will require the CPU to fetch between one and two words of information, including the instruction itself and any data it may read from or write to main memory. If the required word is not in L1 cache, it is a "cache miss", and the word must be fetched from lower memory before the instruction can continue. Because of the difference in speed between the CPU and main memory, a cache miss can result in a relatively long delay. Much of the effort in cache design, then, is to first reduce the chance of a cache miss, and, second, to reduce the penalty incurred by a miss.
A cache will usually arrange data in "lines" (sometimes also called blocks), a line is a contiguous sequence of words starting at an address that is a multiple of the line size. A line might be 8 words say. So we might have 1024 lines in our cache. A "line" in main memory when accessed will end up as a line in the cache. Usually a cache will transfer data to and from memory in chunks of a whole line, this is more efficient than transferring each word individually. This is both a source of efficiency and inefficiency: if we end up accessing all of the data in a line then bring the line into cache all at once will have been more efficient than bring each word into the cache; on the other hand, if we only access one word from the line then we paid the cost of having to bring the whole line in which will be slower. This inefficiency can be reduced if the requested or "critical" word is the first to be retrieved, though this approach requires more difficult design and hardware.