Cache Memory
Outline
• General concepts
• 3 ways to organize cache memory
• Issues with writes
• Writing cache-friendly code
• Cache mountain
• Suggested Reading: 6.4, 6.5, 6.6
6.4 Cache Memories
Cache Memory
• History
  – At the very beginning, 3 levels
    • Registers, main memory, disk storage
  – 10 years later, 4 levels
    • Registers, SRAM cache, main DRAM memory, disk storage
  – Modern processors, 4~5 levels
    • Registers, SRAM L1, L2 (, L3) caches, main DRAM memory, disk storage
  – Cache memories
    • are small, fast SRAM-based memories
    • are managed automatically by hardware
    • can be on-chip, on-die, or off-chip
Cache Memory
[Figure 6.24, P488: CPU chip (register file, ALU, L1 cache, bus interface), L2 cache, I/O bridge, and main memory, connected by the cache bus, system bus, and memory bus.]
Cache Memory
• L1 cache is on-chip
• L2 cache was off-chip several years ago
• L3 cache can be off-chip or on-chip
• CPU looks first for data in L1, then in L2, then in main memory
  – Caches hold frequently accessed blocks of main memory
Inserting an L1 cache between the CPU and main memory
• The tiny, very fast CPU register file has room for four 4-byte words.
• The transfer unit between the CPU register file and the cache is a 4-byte block.
• The small, fast L1 cache has room for two 4-word blocks (line 0 and line 1).
• The transfer unit between the cache and main memory is a 4-word block (16 bytes).
• The big, slow main memory has room for many 4-word blocks, for example block 10 (a b c d), block 21 (p q r s), and block 30 (w x y z).
6.4.1 Generic Cache Memory Organization
Figure 6.25 P488
• Cache is an array of sets: S = 2^s sets (set 0, set 1, ..., set S-1)
• Each set contains one or more lines: E lines per set
• Each line holds a block of data: B = 2^b bytes per cache block (bytes 0, 1, ..., B-1)
• Each line also has 1 valid bit and t tag bits
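The organization above can be sketched as a C data structure. This is only an illustrative software model, not how the hardware is built: the names `cache`, `cache_line`, `cache_init`, `cache_line_at`, `cache_block_at`, and `cache_free` are hypothetical, and the tag is stored in a full 64-bit field rather than in exactly t bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-line bookkeeping: 1 valid bit and the tag (t bits, stored in 64). */
typedef struct {
    bool     valid;
    uint64_t tag;
} cache_line;

/* Hypothetical model: S sets of E lines, each line owning a B-byte block. */
typedef struct {
    size_t      S, E, B;
    cache_line *lines;  /* S*E entries; set i holds lines[i*E] .. lines[i*E + E-1] */
    uint8_t    *data;   /* S*E*B bytes; one B-byte block per line */
} cache;

/* Allocate an empty cache (every line invalid). Returns false on failure. */
bool cache_init(cache *c, size_t S, size_t E, size_t B) {
    assert(S > 0 && E > 0 && B > 0);
    c->S = S; c->E = E; c->B = B;
    c->lines = calloc(S * E, sizeof *c->lines);
    c->data  = calloc(S * E * B, 1);
    if (c->lines == NULL || c->data == NULL) {
        free(c->lines);
        free(c->data);
        c->lines = NULL;
        c->data  = NULL;
        return false;
    }
    return true;
}

/* Line number `way` (0 .. E-1) inside set number `set` (0 .. S-1). */
cache_line *cache_line_at(cache *c, size_t set, size_t way) {
    assert(set < c->S && way < c->E);
    return &c->lines[set * c->E + way];
}

/* First byte of the B-byte block held by that line. */
uint8_t *cache_block_at(cache *c, size_t set, size_t way) {
    assert(set < c->S && way < c->E);
    return c->data + (set * c->E + way) * c->B;
}

/* Release the storage obtained by cache_init. */
void cache_free(cache *c) {
    free(c->lines);
    free(c->data);
    c->lines = NULL;
    c->data  = NULL;
}
```

For example, `cache_init(&c, 4, 2, 16)` models a cache with S = 4 sets, E = 2 lines per set, and B = 16 bytes per block; `cache_line_at(&c, 3, 1)` is then the last line of the last set. Keeping the lines in one flat array is a design choice made for brevity; an array of per-set arrays would model the same structure.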
Addressing caches
Figure 6.25 P488
• Address A (bits m-1 down to 0): <tag> (t bits) | <set index> (s bits) | <block offset> (b bits)
• The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>.
• The word contents begin at offset <block offset> bytes from the beginning of the block.
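The address split and the hit test can be sketched in C with shifts and masks. This is a minimal sketch under stated assumptions: addresses fit in 64 bits, and the names `addr_fields`, `split_address`, `line_meta`, and `find_line` are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three fields of an address, from high bits to low bits. */
typedef struct {
    uint64_t tag;     /* top t bits */
    uint64_t set;     /* middle s bits: set index */
    uint64_t offset;  /* low b bits: block offset */
} addr_fields;

/* Split an address given s set-index bits and b block-offset bits. */
addr_fields split_address(uint64_t addr, unsigned s, unsigned b) {
    assert(s + b < 64);  /* keeps every shift below well defined */
    addr_fields f;
    f.offset = addr & ((UINT64_C(1) << b) - 1);
    f.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);
    f.tag    = addr >> (s + b);
    return f;
}

/* Valid bit and tag of one line (the data block is omitted here). */
typedef struct {
    bool     valid;
    uint64_t tag;
} line_meta;

/* Search the E lines of one set for a valid line whose tag matches.
 * Returns the index of the matching line on a hit, or -1 on a miss. */
int find_line(const line_meta *set, int E, uint64_t tag) {
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            return i;
        }
    }
    return -1;
}
```

With s = 2 and b = 4, address 0x1234 splits into block offset 0x4, set index 3, and tag 0x48. A hit requires both conditions: a line whose tag matches but whose valid bit is clear is still a miss.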
Cache Memory
Fundamental parameters
Parameter       Description
S = 2^s         Number of sets
E               Number of lines per set
B = 2^b         Block size (bytes)
m = log2(M)     Number of physical (main memory) address bits
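From these fundamental parameters the remaining quantities follow: s = log2(S), b = log2(B), the tag width is t = m - (s + b), and the data capacity is C = S × E × B bytes (valid and tag bits are not counted). The C sketch below works through that arithmetic; the names `log2_exact`, `derived_params`, and `derive` are hypothetical, and S and B are assumed to be powers of two.

```c
#include <assert.h>

/* log2 of x, where x must be a power of two. */
unsigned log2_exact(unsigned long x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Quantities derived from the fundamental parameters (S, E, B, m). */
typedef struct {
    unsigned      s;  /* set-index bits:    log2(S) */
    unsigned      b;  /* block-offset bits: log2(B) */
    unsigned      t;  /* tag bits:          m - (s + b) */
    unsigned long C;  /* data capacity in bytes: S * E * B */
} derived_params;

derived_params derive(unsigned long S, unsigned long E,
                      unsigned long B, unsigned m) {
    derived_params d;
    d.s = log2_exact(S);
    d.b = log2_exact(B);
    assert(d.s + d.b <= m);  /* the index and offset must fit in the address */
    d.t = m - (d.s + d.b);
    d.C = S * E * B;
    return d;
}
```

For instance, S = 64, E = 4, B = 32, and m = 32 give s = 6, b = 5, t = 21, and C = 8192 bytes.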