COMPUTER ORGANIZATION AND DESIGN
THE HARDWARE/SOFTWARE INTERFACE
FOURTH EDITION, REVISED PRINTING
DAVID A. PATTERSON
JOHN L. HENNESSY
MIPS Reference Data Card (“Green Card”)
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.

CORE INSTRUCTION SET

NAME                         MNEMONIC  FORMAT  OPCODE/FUNCT  OPERATION (in Verilog)
Add                          add       R       0 / 20hex     R[rd] = R[rs] + R[rt] (1)
Add Immediate                addi      I       8hex          R[rt] = R[rs] + SignExtImm (1,2)
Add Imm. Unsigned            addiu     I       9hex          R[rt] = R[rs] + SignExtImm (2)
Add Unsigned                 addu      R       0 / 21hex     R[rd] = R[rs] + R[rt]
And                          and       R       0 / 24hex     R[rd] = R[rs] & R[rt]
And Immediate                andi      I       chex          R[rt] = R[rs] & ZeroExtImm (3)
Branch On Equal              beq       I       4hex          if(R[rs]==R[rt]) PC=PC+4+BranchAddr (4)
Branch On Not Equal          bne       I       5hex          if(R[rs]!=R[rt]) PC=PC+4+BranchAddr (4)
Jump                         j         J       2hex          PC=JumpAddr (5)
Jump And Link                jal       J       3hex          R[31]=PC+8; PC=JumpAddr (5)
Jump Register                jr        R       0 / 08hex     PC=R[rs]
Load Byte Unsigned           lbu       I       24hex         R[rt]={24'b0, M[R[rs]+SignExtImm](7:0)} (2)
Load Halfword Unsigned       lhu       I       25hex         R[rt]={16'b0, M[R[rs]+SignExtImm](15:0)} (2)
Load Linked                  ll        I       30hex         R[rt] = M[R[rs]+SignExtImm] (2,7)
Load Upper Imm.              lui       I       fhex          R[rt] = {imm, 16'b0}
Load Word                    lw        I       23hex         R[rt] = M[R[rs]+SignExtImm] (2)
Nor                          nor       R       0 / 27hex     R[rd] = ~(R[rs] | R[rt])
Or                           or        R       0 / 25hex     R[rd] = R[rs] | R[rt]
Or Immediate                 ori       I       dhex          R[rt] = R[rs] | ZeroExtImm (3)
Set Less Than                slt       R       0 / 2ahex     R[rd] = (R[rs] < R[rt]) ? 1 : 0
Set Less Than Imm.           slti      I       ahex          R[rt] = (R[rs] < SignExtImm) ? 1 : 0 (2)
Set Less Than Imm. Unsigned  sltiu     I       bhex          R[rt] = (R[rs] < SignExtImm) ? 1 : 0 (2,6)
Set Less Than Unsig.         sltu      R       0 / 2bhex     R[rd] = (R[rs] < R[rt]) ? 1 : 0 (6)
Shift Left Logical           sll       R       0 / 00hex     R[rd] = R[rt] << shamt
Shift Right Logical          srl       R       0 / 02hex     R[rd] = R[rt] >> shamt
Store Byte                   sb        I       28hex         M[R[rs]+SignExtImm](7:0) = R[rt](7:0) (2)
Store Conditional            sc        I       38hex         M[R[rs]+SignExtImm] = R[rt]; R[rt] = (atomic) ? 1 : 0 (2,7)
Store Halfword               sh        I       29hex         M[R[rs]+SignExtImm](15:0) = R[rt](15:0) (2)
Store Word                   sw        I       2bhex         M[R[rs]+SignExtImm] = R[rt] (2)
Subtract                     sub       R       0 / 22hex     R[rd] = R[rs] - R[rt] (1)
Subtract Unsigned            subu      R       0 / 23hex     R[rd] = R[rs] - R[rt]

(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate }
(3) ZeroExtImm = { 16{1'b0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2'b0 }
(5) JumpAddr = { PC+4[31:28], address, 2'b0 }
(6) Operands considered unsigned numbers (vs. 2's comp.)
(7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic

BASIC INSTRUCTION FORMATS

R   opcode (31:26)  rs (25:21)  rt (20:16)  rd (15:11)  shamt (10:6)  funct (5:0)
I   opcode (31:26)  rs (25:21)  rt (20:16)  immediate (15:0)
J   opcode (31:26)  address (25:0)

ARITHMETIC CORE INSTRUCTION SET

NAME                         MNEMONIC  FORMAT  OPCODE/FMT/FT/FUNCT (Hex)  OPERATION
Branch On FP True            bc1t      FI      11/8/1/--                  if(FPcond) PC=PC+4+BranchAddr (4)
Branch On FP False           bc1f      FI      11/8/0/--                  if(!FPcond) PC=PC+4+BranchAddr (4)
Divide                       div       R       0/--/--/1a                 Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt]
Divide Unsigned              divu      R       0/--/--/1b                 Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] (6)
FP Add Single                add.s     FR      11/10/--/0                 F[fd] = F[fs] + F[ft]
FP Add Double                add.d     FR      11/11/--/0                 {F[fd],F[fd+1]} = {F[fs],F[fs+1]} + {F[ft],F[ft+1]}
FP Compare Single            c.x.s*    FR      11/10/--/y                 FPcond = (F[fs] op F[ft]) ? 1 : 0
FP Compare Double            c.x.d*    FR      11/11/--/y                 FPcond = ({F[fs],F[fs+1]} op {F[ft],F[ft+1]}) ? 1 : 0
FP Divide Single             div.s     FR      11/10/--/3                 F[fd] = F[fs] / F[ft]
FP Divide Double             div.d     FR      11/11/--/3                 {F[fd],F[fd+1]} = {F[fs],F[fs+1]} / {F[ft],F[ft+1]}
FP Multiply Single           mul.s     FR      11/10/--/2                 F[fd] = F[fs] * F[ft]
FP Multiply Double           mul.d     FR      11/11/--/2                 {F[fd],F[fd+1]} = {F[fs],F[fs+1]} * {F[ft],F[ft+1]}
FP Subtract Single           sub.s     FR      11/10/--/1                 F[fd] = F[fs] - F[ft]
FP Subtract Double           sub.d     FR      11/11/--/1                 {F[fd],F[fd+1]} = {F[fs],F[fs+1]} - {F[ft],F[ft+1]}
Load FP Single               lwc1      I       31/--/--/--                F[rt] = M[R[rs]+SignExtImm] (2)
Load FP Double               ldc1      I       35/--/--/--                F[rt] = M[R[rs]+SignExtImm]; F[rt+1] = M[R[rs]+SignExtImm+4] (2)
Move From Hi                 mfhi      R       0/--/--/10                 R[rd] = Hi
Move From Lo                 mflo      R       0/--/--/12                 R[rd] = Lo
Move From Control            mfc0      R       10/0/--/0                  R[rd] = CR[rs]
Multiply                     mult      R       0/--/--/18                 {Hi,Lo} = R[rs] * R[rt]
Multiply Unsigned            multu     R       0/--/--/19                 {Hi,Lo} = R[rs] * R[rt] (6)
Shift Right Arith.           sra       R       0/--/--/3                  R[rd] = R[rt] >> shamt
Store FP Single              swc1      I       39/--/--/--                M[R[rs]+SignExtImm] = F[rt] (2)
Store FP Double              sdc1      I       3d/--/--/--                M[R[rs]+SignExtImm] = F[rt]; M[R[rs]+SignExtImm+4] = F[rt+1] (2)

* (x is eq, lt, or le) (op is ==, <, or <=) (y is 32, 3c, or 3e)

FLOATING-POINT INSTRUCTION FORMATS

FR  opcode (31:26)  fmt (25:21)  ft (20:16)  fs (15:11)  fd (10:6)  funct (5:0)
FI  opcode (31:26)  fmt (25:21)  ft (20:16)  immediate (15:0)

PSEUDOINSTRUCTION SET

NAME                           MNEMONIC  OPERATION
Branch Less Than               blt       if(R[rs]<R[rt]) PC = Label
Branch Greater Than            bgt       if(R[rs]>R[rt]) PC = Label
Branch Less Than or Equal      ble       if(R[rs]<=R[rt]) PC = Label
Branch Greater Than or Equal   bge       if(R[rs]>=R[rt]) PC = Label
Load Immediate                 li        R[rd] = immediate
Move                           move      R[rd] = R[rs]

REGISTER NAME, NUMBER, USE, CALL CONVENTION

NAME      NUMBER  PRESERVED ACROSS A CALL?  USE
$zero     0       N.A.                      The Constant Value 0
$at       1       No                        Assembler Temporary
$v0-$v1   2-3     No                        Values for Function Results and Expression Evaluation
$a0-$a3   4-7     No                        Arguments
$t0-$t7   8-15    No                        Temporaries
$s0-$s7   16-23   Yes                       Saved Temporaries
$t8-$t9   24-25   No                        Temporaries
$k0-$k1   26-27   No                        Reserved for OS Kernel
$gp       28      Yes                       Global Pointer
$sp       29      Yes                       Stack Pointer
$fp       30      Yes                       Frame Pointer
$ra       31      Yes                       Return Address
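The basic instruction formats and footnotes (2) through (5) on the card can be tried out with a few lines of Python. This is a minimal sketch, not part of the card: the function names (`decode_fields`, `sign_ext_imm`, `zero_ext_imm`, `branch_addr`, `jump_addr`) are hypothetical, and all values are treated as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def decode_fields(word):
    """Split a 32-bit instruction word into the fields of the R, I, and J formats."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31:26 (all formats)
        "rs": (word >> 21) & 0x1F,       # bits 25:21 (R, I)
        "rt": (word >> 16) & 0x1F,       # bits 20:16 (R, I)
        "rd": (word >> 11) & 0x1F,       # bits 15:11 (R)
        "shamt": (word >> 6) & 0x1F,     # bits 10:6  (R)
        "funct": word & 0x3F,            # bits 5:0   (R)
        "immediate": word & 0xFFFF,      # bits 15:0  (I)
        "address": word & 0x03FFFFFF,    # bits 25:0  (J)
    }


def sign_ext_imm(immediate):
    """Footnote (2): { 16{immediate[15]}, immediate }."""
    imm = immediate & 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def zero_ext_imm(immediate):
    """Footnote (3): sixteen zero bits followed by the immediate."""
    return immediate & 0xFFFF


def branch_addr(immediate):
    """Footnote (4): { 14{immediate[15]}, immediate, 2'b0 }."""
    return (sign_ext_imm(immediate) << 2) & MASK32


def jump_addr(pc, address):
    """Footnote (5): { PC+4[31:28], address, 2'b0 }."""
    return ((pc + 4) & 0xF0000000) | ((address & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    # 0x02324020 has opcode 0 and funct 20hex (add), with rs=17, rt=18, rd=8.
    print(decode_fields(0x02324020))
    print(hex(branch_addr(0xFFFF)))
```

Shifting the sign-extended immediate left by two and masking to 32 bits yields the same bit pattern as the card's concatenation, because the two dropped high bits are copies of the sign bit.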
OPCODES, BASE CONVERSION, ASCII SYMBOLS

MIPS    (1) MIPS  (2) MIPS   Binary   Dec. Hex. ASCII  Dec. Hex. ASCII
opcode  funct     funct                         char.            char.
(31:26) (5:0)     (5:0)
(1)     sll       add.f      00 0000  0    0    NUL    64   40   @
-       -         sub.f      00 0001  1    1    SOH    65   41   A
j       srl       mul.f      00 0010  2    2    STX    66   42   B
jal     sra       div.f      00 0011  3    3    ETX    67   43   C
beq     sllv      sqrt.f     00 0100  4    4    EOT    68   44   D
bne     -         abs.f      00 0101  5    5    ENQ    69   45   E
blez    srlv      mov.f      00 0110  6    6    ACK    70   46   F
bgtz    srav      neg.f      00 0111  7    7    BEL    71   47   G
addi    jr        -          00 1000  8    8    BS     72   48   H
addiu   jalr      -          00 1001  9    9    HT     73   49   I
slti    movz      -          00 1010  10   a    LF     74   4a   J
sltiu   movn      -          00 1011  11   b    VT     75   4b   K
andi    syscall   round.w.f  00 1100  12   c    FF     76   4c   L
ori     break     trunc.w.f  00 1101  13   d    CR     77   4d   M
xori    -         ceil.w.f   00 1110  14   e    SO     78   4e   N
lui     sync      floor.w.f  00 1111  15   f    SI     79   4f   O
-       mfhi      -          01 0000  16   10   DLE    80   50   P
(2)     mthi      -          01 0001  17   11   DC1    81   51   Q
-       mflo      movz.f     01 0010  18   12   DC2    82   52   R
-       mtlo      movn.f     01 0011  19   13   DC3    83   53   S
-       -         -          01 0100  20   14   DC4    84   54   T
-       -         -          01 0101  21   15   NAK    85   55   U
-       -         -          01 0110  22   16   SYN    86   56   V
-       -         -          01 0111  23   17   ETB    87   57   W
-       mult      -          01 1000  24   18   CAN    88   58   X
-       multu     -          01 1001  25   19   EM     89   59   Y
-       div       -          01 1010  26   1a   SUB    90   5a   Z
-       divu      -          01 1011  27   1b   ESC    91   5b   [
-       -         -          01 1100  28   1c   FS     92   5c   \
-       -         -          01 1101  29   1d   GS     93   5d   ]
-       -         -          01 1110  30   1e   RS     94   5e   ^
-       -         -          01 1111  31   1f   US     95   5f   _
lb      add       cvt.s.f    10 0000  32   20   Space  96   60   `
lh      addu      cvt.d.f    10 0001  33   21   !      97   61   a
lwl     sub       -          10 0010  34   22   "      98   62   b
lw      subu      -          10 0011  35   23   #      99   63   c
lbu     and       cvt.w.f    10 0100  36   24   $      100  64   d
lhu     or        -          10 0101  37   25   %      101  65   e
lwr     xor       -          10 0110  38   26   &      102  66   f
-       nor       -          10 0111  39   27   '      103  67   g
sb      -         -          10 1000  40   28   (      104  68   h
sh      -         -          10 1001  41   29   )      105  69   i
swl     slt       -          10 1010  42   2a   *      106  6a   j
sw      sltu      -          10 1011  43   2b   +      107  6b   k
-       -         -          10 1100  44   2c   ,      108  6c   l
-       -         -          10 1101  45   2d   -      109  6d   m
swr     -         -          10 1110  46   2e   .      110  6e   n
cache   -         -          10 1111  47   2f   /      111  6f   o
ll      tge       c.f.f      11 0000  48   30   0      112  70   p
lwc1    tgeu      c.un.f     11 0001  49   31   1      113  71   q
lwc2    tlt       c.eq.f     11 0010  50   32   2      114  72   r
pref    tltu      c.ueq.f    11 0011  51   33   3      115  73   s
-       teq       c.olt.f    11 0100  52   34   4      116  74   t
ldc1    -         c.ult.f    11 0101  53   35   5      117  75   u
ldc2    tne       c.ole.f    11 0110  54   36   6      118  76   v
-       -         c.ule.f    11 0111  55   37   7      119  77   w
sc      -         c.sf.f     11 1000  56   38   8      120  78   x
swc1    -         c.ngle.f   11 1001  57   39   9      121  79   y
swc2    -         c.seq.f    11 1010  58   3a   :      122  7a   z
-       -         c.ngl.f    11 1011  59   3b   ;      123  7b   {
-       -         c.lt.f     11 1100  60   3c   <      124  7c   |
sdc1    -         c.nge.f    11 1101  61   3d   =      125  7d   }
sdc2    -         c.le.f     11 1110  62   3e   >      126  7e   ~
-       -         c.ngt.f    11 1111  63   3f   ?      127  7f   DEL

(1) opcode(31:26) == 0
(2) opcode(31:26) == 17ten (11hex); if fmt(25:21)==16ten (10hex) f = s (single); if fmt(25:21)==17ten (11hex) f = d (double)

IEEE 754 FLOATING-POINT STANDARD

(-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
where Single Precision Bias = 127, Double Precision Bias = 1023.

IEEE Single Precision and Double Precision Formats:

Single  S (31)  Exponent (30:23)  Fraction (22:0)
Double  S (63)  Exponent (62:52)  Fraction (51:0)

IEEE 754 Symbols

Exponent      Fraction  Object
0             0         ± 0
0             ≠0        ± Denorm
1 to MAX - 1  anything  ± Fl. Pt. Num.
MAX           0         ±∞
MAX           ≠0        NaN

S.P. MAX = 255, D.P. MAX = 2047

MEMORY ALLOCATION (highest addresses first)

Stack          $sp → 7fff fffchex
Dynamic Data
Static Data    $gp → 1000 8000hex; segment begins at 1000 0000hex
Text           pc → 0040 0000hex
Reserved       begins at 0hex

STACK FRAME (Higher Memory Addresses first; the stack grows toward Lower Memory Addresses)

...
Argument 6
Argument 5
Saved Registers    ($fp points to the top of this frame)
Local Variables    ($sp points to the bottom of this frame)

DATA ALIGNMENT

Double Word
Word                                Word
Halfword          Halfword          Halfword          Halfword
Byte     Byte     Byte     Byte     Byte     Byte     Byte     Byte
0        1        2        3        4        5        6        7
Value of three least significant bits of byte address (Big Endian)

EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS

BD (31)    Interrupt Mask (15:8)    Exception Code (6:2)
Pending Interrupt (15:8)    UM (4)    EL (1)    IE (0)

BD = Branch Delay, UM = User Mode, EL = Exception Level, IE = Interrupt Enable

EXCEPTION CODES

Number  Name  Cause of Exception
0       Int   Interrupt (hardware)
4       AdEL  Address Error Exception (load or instruction fetch)
5       AdES  Address Error Exception (store)
6       IBE   Bus Error on Instruction Fetch
7       DBE   Bus Error on Load or Store
8       Sys   Syscall Exception
9       Bp    Breakpoint Exception
10      RI    Reserved Instruction Exception
11      CpU   Coprocessor Unimplemented
12      Ov    Arithmetic Overflow Exception
13      Tr    Trap
15      FPE   Floating Point Exception

SIZE PREFIXES (10^x for Disk, Communication; 2^x for Memory)

SIZE         PREFIX    SIZE         PREFIX    SIZE     PREFIX    SIZE     PREFIX
10^3, 2^10   Kilo-     10^15, 2^50  Peta-     10^-3    milli-    10^-15   femto-
10^6, 2^20   Mega-     10^18, 2^60  Exa-      10^-6    micro-    10^-18   atto-
10^9, 2^30   Giga-     10^21, 2^70  Zetta-    10^-9    nano-     10^-21   zepto-
10^12, 2^40  Tera-     10^24, 2^80  Yotta-    10^-12   pico-     10^-24   yocto-

The symbol for each prefix is just its first letter, except μ is used for micro.
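The IEEE 754 single precision layout and symbol table on the card can be checked with a short Python sketch. It is an illustration under stated assumptions rather than anything taken from the card: the function name `decode_single` is hypothetical, the input is assumed to be a 32-bit pattern given as an integer, and a numeric value is computed only for the "Fl. Pt. Num." row, since that is the case the card's formula covers.

```python
SINGLE_BIAS = 127   # Single Precision Bias from the card
SINGLE_MAX = 255    # S.P. MAX from the card


def decode_single(bits):
    """Classify a 32-bit pattern using the card's IEEE 754 symbol table.

    Returns (object_name, value); value is None unless the pattern is a
    normalized floating-point number.
    """
    sign = (bits >> 31) & 0x1          # S, bit 31
    exponent = (bits >> 23) & 0xFF     # Exponent, bits 30:23
    fraction = bits & 0x7FFFFF         # Fraction, bits 22:0

    if exponent == 0:
        return ("zero" if fraction == 0 else "denorm", None)
    if exponent == SINGLE_MAX:
        return ("infinity" if fraction == 0 else "NaN", None)

    # (-1)^S x (1 + Fraction) x 2^(Exponent - Bias), where the 23 stored
    # fraction bits are interpreted as a binary fraction in [0, 1).
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - SINGLE_BIAS)
    return ("Fl. Pt. Num.", value)


if __name__ == "__main__":
    print(decode_single(0xC0A00000))   # sign 1, exponent 129, fraction 0.25
    print(decode_single(0x7F800000))   # exponent MAX, fraction 0
```

The same structure would carry over to double precision by widening the exponent to 11 bits, the fraction to 52 bits, and using the bias of 1023 and MAX of 2047 listed on the card.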
In Praise of Computer Organization and Design: The Hardware/Software Interface, Revised Fourth Edition

“Patterson and Hennessy not only improve the pedagogy of the traditional material on pipelined processors and memory hierarchies, but also greatly expand the multiprocessor coverage to include emerging multicore processors and GPUs. The fourth edition of Computer Organization and Design sets a new benchmark against which all other architecture books must be compared.”
- David A. Wood, University of Wisconsin-Madison

“Patterson and Hennessy have greatly improved what was already the gold standard of textbooks. In the rapidly evolving field of computer architecture, they have woven an impressive number of recent case studies and contemporary issues into a framework of time-tested fundamentals.”
- Fred Chong, University of California at Santa Barbara

“Since the publication of the first edition in 1994, Computer Organization and Design has introduced a generation of computer science and engineering students to computer architecture. Now, many of those students have become leaders in the field. In academia, the tradition continues as faculty use the latest edition of the book that inspired them to engage the next generation. With the fourth edition, readers are prepared for the next era of computing.”
- David I. August, Princeton University

“The new coverage of multiprocessors and parallelism lives up to the standards of this well-written classic. It provides well-motivated, gentle introductions to the new topics, as well as many details and examples drawn from current hardware.”
- John Greiner, Rice University

“As computer hardware architecture moves from uniprocessor to multicores, the parallel programming environments used to take advantage of these cores will be a defining challenge to the success of these new systems. In the multicore systems, the interface between the hardware and software is of particular importance. This new edition of Computer Organization and Design is mandatory for any student who wishes to understand multicore architecture including the interface between programming it and its architecture.”
- Jesse Fang, Director of Programming System Lab at Intel

“The fourth edition of Computer Organization and Design continues to improve the high standards set by the previous editions. The new content, on trends that are reshaping computer systems including multicores, Flash memory, GPUs, etc., makes this edition a must read, even for all of those who grew up on previous editions of the book.”
- Parthasarathy Ranganathan, Principal Research Scientist, HP Labs
ACKNOWLEDGMENTS

Figures 1.7, 1.8 Courtesy of Other World Computing (www.macsales.com).
Figures 1.9, 1.19, 5.37 Courtesy of AMD.
Figure 1.10 Courtesy of Storage Technology Corp.
Figures 1.10.1, 1.10.2, 4.15.2 Courtesy of the Charles Babbage Institute, University of Minnesota Libraries, Minneapolis.
Figures 1.10.3, 4.15.1, 4.15.3, 5.12.3, 6.14.2 Courtesy of IBM.
Figure 1.10.4 Courtesy of Cray Inc.
Figure 1.10.5 Courtesy of Apple Computer, Inc.
Figure 1.10.6 Courtesy of the Computer History Museum.
Figures 5.12.1, 5.12.2 Courtesy of Museum of Science, Boston.
Figure 5.12.4 Courtesy of MIPS Technologies, Inc.
Figures 6.15, 6.16, 6.17 Courtesy of Sun Microsystems, Inc.
Figure 6.4 © Peg Skorpinski.
Figure 6.14.1 Courtesy of the Computer Museum of America.
Figure 6.14.3 Courtesy of the Commercial Computing Museum.
Figure 7.13.1 Courtesy of NASA Ames Research Center.
REVISED FOURTH EDITION

Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE

David A. Patterson
University of California, Berkeley

John L. Hennessy
Stanford University

With contributions by

Perry Alexander, The University of Kansas
Peter J. Ashenden, Ashenden Designs Pty Ltd
Javier Bruguera, Universidade de Santiago de Compostela
Jichuan Chang, Hewlett-Packard
Matthew Farrens, University of California, Davis
David Kaeli, Northeastern University
Nicole Kaiyan, University of Adelaide
David Kirk, NVIDIA
James R. Larus, Microsoft Research
Jacob Leverich, Hewlett-Packard
Kevin Lim, Hewlett-Packard
John Nickolls, NVIDIA
John Oliver, Cal Poly, San Luis Obispo
Milos Prvulovic, Georgia Tech
Partha Ranganathan, Hewlett-Packard

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier
Acquiring Editor: Todd Green
Development Editor: Nate McFadden
Project Manager: Jessica Vaughan
Designer: Eric DeCicco

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Elsevier, Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

Patterson, David A.
Computer organization and design: the hardware/software interface / David A. Patterson, John L. Hennessy. - 4th ed.
p. cm. - (The Morgan Kaufmann series in computer architecture and design)
Rev. ed. of: Computer organization and design / John L. Hennessy, David A. Patterson. 1998.
Summary: “Presents the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies and I/O” - Provided by publisher.
ISBN 978-0-12-374750-1 (pbk.)
1. Computer organization. 2. Computer engineering. 3. Computer interfaces. I. Hennessy, John L. II. Hennessy, John L. Computer organization and design. III. Title.
QA76.9.C643H46 2011
004.2'2-dc23
2011029199

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-374750-1

Printed in the United States of America
12 13 14 15 16   10 9 8 7 6 5 4 3 2

For information on all MK publications visit our website at www.mkp.com
Contents

Preface

CHAPTERS

1 Computer Abstractions and Technology
    1.1 Introduction
    1.2 Below Your Program
    1.3 Under the Covers
    1.4 Performance
    1.5 The Power Wall
    1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors
    1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4
    1.8 Fallacies and Pitfalls
    1.9 Concluding Remarks
    1.10 Historical Perspective and Further Reading
    1.11 Exercises

2 Instructions: Language of the Computer
    2.1 Introduction
    2.2 Operations of the Computer Hardware
    2.3 Operands of the Computer Hardware
    2.4 Signed and Unsigned Numbers
    2.5 Representing Instructions in the Computer
    2.6 Logical Operations
    2.7 Instructions for Making Decisions
    2.8 Supporting Procedures in Computer Hardware
    2.9 Communicating with People
    2.10 MIPS Addressing for 32-Bit Immediates and Addresses
    2.11 Parallelism and Instructions: Synchronization
    2.12 Translating and Starting a Program
    2.13 A C Sort Example to Put It All Together
2.14 Arrays versus Pointers 157
2.15 Advanced Material: Compiling C and Interpreting Java 161
2.16 Real Stuff: ARM Instructions 161
2.17 Real Stuff: x86 Instructions 165
2.18 Fallacies and Pitfalls 174
2.19 Concluding Remarks 176
2.20 Historical Perspective and Further Reading 179
2.21 Exercises 179

3 Arithmetic for Computers 222
3.1 Introduction 224
3.2 Addition and Subtraction 224
3.3 Multiplication 230
3.4 Division 236
3.5 Floating Point 242
3.6 Parallelism and Computer Arithmetic: Associativity 270
3.7 Real Stuff: Floating Point in the x86 272
3.8 Fallacies and Pitfalls 275
3.9 Concluding Remarks 280
3.10 Historical Perspective and Further Reading 283
3.11 Exercises 283

4 The Processor 298
4.1 Introduction 300
4.2 Logic Design Conventions 303
4.3 Building a Datapath 307
4.4 A Simple Implementation Scheme 316
4.5 An Overview of Pipelining 330
4.6 Pipelined Datapath and Control 344
4.7 Data Hazards: Forwarding versus Stalling 363
4.8 Control Hazards 375
4.9 Exceptions 384
4.10 Parallelism and Advanced Instruction-Level Parallelism 391
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline 404
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 406
4.13 Fallacies and Pitfalls 407
4.14 Concluding Remarks 408
4.15 Historical Perspective and Further Reading 409
4.16 Exercises 409

5 Large and Fast: Exploiting Memory Hierarchy 450
5.1 Introduction 452
5.2 The Basics of Caches 457
5.3 Measuring and Improving Cache Performance 475
5.4 Virtual Memory 492
5.5 A Common Framework for Memory Hierarchies 518
5.6 Virtual Machines 525
5.7 Using a Finite-State Machine to Control a Simple Cache 529
5.8 Parallelism and Memory Hierarchies: Cache Coherence 534
5.9 Advanced Material: Implementing Cache Controllers 538
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies 539
5.11 Fallacies and Pitfalls 543
5.12 Concluding Remarks 547
5.13 Historical Perspective and Further Reading 548
5.14 Exercises 548

6 Storage and Other I/O Topics 568
6.1 Introduction 570
6.2 Dependability, Reliability, and Availability 573
6.3 Disk Storage 575
6.4 Flash Storage 580
6.5 Connecting Processors, Memory, and I/O Devices 582
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System 586
6.7 I/O Performance Measures: Examples from Disk and File Systems 596
6.8 Designing an I/O System 598
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks 599
6.10 Real Stuff: Sun Fire x4150 Server 606
6.11 Advanced Topics: Networks 612
6.12 Fallacies and Pitfalls 613
6.13 Concluding Remarks 617
6.14 Historical Perspective and Further Reading 618
6.15 Exercises 619

7 Multicores, Multiprocessors, and Clusters 630
7.1 Introduction 632
7.2 The Difficulty of Creating Parallel Processing Programs 634
7.3 Shared Memory Multiprocessors 638