FIGURE 88.1 The memory interface.

From Eq. (88.1) we see that a decrease in the latency will result in an increase in bandwidth, and vice versa, if R is unchanged. We can also see that the bandwidth can be increased by increasing R, if L does not increase proportionately. For example, we can build a memory system that takes 20 ns to service the access of a single 32-bit word. Its latency is 20 ns per 32-bit word, and its bandwidth is

32 bits / (20 × 10⁻⁹ s)

or 200 Mbytes/s. If the memory system is modified to accept a new (still 20 ns) request for a 32-bit word every 5 ns by overlapping results, then its bandwidth is

32 bits / (5 × 10⁻⁹ s)

or 800 Mbytes/s. This memory system must be able to handle four requests at a given time. Building an ideal memory system (infinite capacity, zero latency, and infinite bandwidth, with affordable cost) is not feasible. The challenge is, given a set of cost and technology constraints, to engineer a memory system whose abilities match the abilities that the processor demands of it. That is, engineering a memory system that performs as close to an ideal memory system (for the given processing unit) as is possible. For a processor that stalls when it makes a memory request (some current microprocessors are in this category), it is important to engineer a memory system with the lowest possible latency. For those processors that can handle multiple outstanding memory requests (vector processors and high-end CPUs), it is important not only to reduce latency, but also to increase bandwidth (over what is possible by latency reduction alone) by designing a memory system that is capable of servicing multiple requests simultaneously. Memory hierarchies provide decreased average latency and reduced bandwidth requirements, whereas parallel or interleaved memories provide higher bandwidth.

88.2 Memory Hierarchies

Technology does not permit memories that are cheap, large, and fast.
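The arithmetic behind these figures can be sketched in a few lines. This is our own illustration, not part of the chapter; the helper name `bandwidth_mbytes` and its parameters are hypothetical, chosen only to mirror the 20 ns and 5 ns examples above.

```python
# Sketch of the latency/bandwidth arithmetic from the 20 ns example.
# Names here (bandwidth_mbytes, word_bits, interval_s) are our own, not from Eq. (88.1).

def bandwidth_mbytes(word_bits: int, interval_s: float) -> float:
    """Mbytes/s delivered when one word completes every interval_s seconds."""
    return word_bits / 8 / interval_s / 1e6

# Non-overlapped: one 32-bit word is serviced every 20 ns.
print(bandwidth_mbytes(32, 20e-9))   # ≈ 200 Mbytes/s

# Overlapped: a new request is accepted every 5 ns (each still takes 20 ns).
print(bandwidth_mbytes(32, 5e-9))    # ≈ 800 Mbytes/s

# Requests that must be in flight simultaneously = latency / issue interval.
print(20e-9 / 5e-9)                  # 4 outstanding requests
```

The ratio of latency to issue interval is what forces the system to track four requests at once, matching the text.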
By recognizing the nonrandom nature of memory requests, and emphasizing the average rather than worst-case latency, it is possible to implement a hierarchical memory system that performs well. A small amount of very fast memory, placed in front of a large, slow memory, can be designed to satisfy most requests at the speed of the small memory. This, in fact, is the primary motivation for the use of registers in the CPU: in this case, the programmer or compiler makes sure that the most commonly accessed variables are allocated to registers. A variety of techniques, employing either hardware, software, or a combination of the two, can be employed to assure that most memory references are satisfied by the faster memory. The foremost of these techniques is the exploitation of the locality of reference principle. This principle captures the fact that some memory locations are referenced much more frequently than others. Spatial locality is the property that an access to a given memory location greatly increases the probability that neighboring locations will soon be accessed. This is largely, but not exclusively, a result of the tendency to access memory locations sequentially. Temporal locality is the property that an access to a given memory location greatly increases the probability that the same location will soon be accessed again.

© 2000 by CRC Press LLC
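A toy simulation makes the payoff of locality concrete. The sketch below is our own illustration, not from the chapter: it models a hypothetical direct-mapped cache (the block size and set count are assumptions) and compares hit rates for a sequential access stream, which has strong spatial locality, against a scattered random stream, which has almost none.

```python
# Toy direct-mapped cache simulator (illustrative only; parameters are assumed).
# Sequential accesses reuse the cached block and mostly hit; random ones mostly miss.
import random

BLOCK_WORDS = 8   # words fetched per cache block (assumption)
NUM_SETS = 64     # the cache holds 64 blocks (assumption)

def hit_rate(addresses):
    tags = [None] * NUM_SETS           # one resident block per set (direct-mapped)
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_WORDS    # which memory block this word lives in
        index = block % NUM_SETS       # which cache set that block maps to
        if tags[index] == block:
            hits += 1                  # reuse of a block already in the fast memory
        else:
            tags[index] = block        # miss: fetch the whole block
    return hits / len(addresses)

random.seed(0)
sequential = list(range(10_000))                                  # strong spatial locality
scattered = [random.randrange(1_000_000) for _ in range(10_000)]  # little locality

print(f"sequential hit rate: {hit_rate(sequential):.3f}")  # 7 of every 8 accesses hit
print(f"scattered hit rate:  {hit_rate(scattered):.3f}")
```

With sequential addresses, only the first word of each 8-word block misses, so the small memory services 87.5% of requests; the scattered stream hits almost never, which is exactly the behavior a hierarchy depends on avoiding.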