Outline
• Pipelining principles: 4.5
• Five-stage RV pipeline implementation: 4.6
• Hazards: 4.5.2
  – Structural hazards: Harvard architecture
  – Data dependences: 4.7
    • Compiler techniques: nop insertion, instruction reordering, register renaming
    • Forwarding: RAW
    • Interlock: stall
  – Control dependences: 4.8
    • Compiler techniques: delayed branch
    • Hardware optimizations: early resolution, speculation, prediction
• Multiple issue: 4.10
• Hardware multithreading: 6.4
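When does beq complete?
• Which instruction's address is the base for the branch target?
• Can it complete in EXE?
Multicycle execution steps (from the slide's table):
  IF:  IR = Mem[PC]; PC = PC + 4
  ID:  A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
  EX:  R-type: ALUOut = A op B; lw/sw: ALUOut = A + sign-extend(IR[15-0]); beq: if (A == B) PC = ALUOut; j: PC = PC[31-28] || (IR[25-0] << 2)
  MEM: R-type: Reg[IR[15-11]] = ALUOut; lw: MDR = Mem[ALUOut]; sw: Mem[ALUOut] = B
  WB:  lw: Reg[IR[20-16]] = MDR
[Figure 4-39: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]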
beq branch hazard: stall?
[Figure 4-59: pipeline diagram over clock cycles CC 1 to CC 9; program execution order (in instructions):
  40 beq $1,$3,72
  44 and $12,$2,$5
  48 or $13,$6,$2
  52 add $14,$2,$2
  72 lw $4,50($7)]
• taken / not taken
• Delay slots: 3
• Flush (drain the pipeline)
• "Compared with data hazards, branches are infrequent, and there is no good solution"!
llxx@ustc.edu.cn
beq: not taken vs. taken
• taken: flush = clear up, bubble down!
[Figure 4-49: pipelined datapath with control signals, plus an instruction-flow diagram (IF, ID, EX, MEM, WB) marking instructions as "Flushed" or "Continued"; the instructions shown are add $r1,$r2,$r3; sw $r1,0($r2); sub $r1,$r2,$r3; beq $r2,$r5,0x08; slt $r1,$r2,$r3]
Reducing the beq penalty: single-cycle branch
• Only one clock cycle of stall is needed!
[Figure 4-29: pipeline timing diagram (time axis 200 to 1400 ps) for add x4,x5,x6; beq x1,x0,40; one bubble; or x7,x8,x9. Accompanying instruction-flow diagrams contrast "Flushed" and "Continued" instructions.]
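
To put the one-cycle claim in numbers, the sketch below compares the effective CPI of three policies. The branch frequency and taken rate are hypothetical values picked only for illustration, and the function name `effective_cpi` is not from the slides.

```python
def effective_cpi(branch_freq, penalized_fraction, penalty_cycles, base_cpi=1.0):
    """CPI when a fraction of branches each cost `penalty_cycles` extra cycles."""
    return base_cpi + branch_freq * penalized_fraction * penalty_cycles


# Hypothetical workload numbers (assumptions, not measurements).
BRANCH_FREQ = 0.25   # fraction of instructions that are branches
TAKEN_RATE = 0.5     # fraction of branches that are taken

# Branch resolved in MEM, assume not taken: each taken branch flushes 3 instructions.
cpi_resolve_in_mem = effective_cpi(BRANCH_FREQ, TAKEN_RATE, penalty_cycles=3)

# Branch resolved in ID, assume not taken: each taken branch flushes 1 instruction.
cpi_resolve_in_id = effective_cpi(BRANCH_FREQ, TAKEN_RATE, penalty_cycles=1)

# Branch resolved in ID, but stall after every branch (taken or not).
cpi_always_stall = effective_cpi(BRANCH_FREQ, 1.0, penalty_cycles=1)

print(cpi_resolve_in_mem, cpi_resolve_in_id, cpi_always_stall)
```

With these assumed numbers the three policies give 1.375, 1.125, and 1.25 cycles per instruction respectively.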
Single-cycle branch implementation
• Move beq resolution from MEM up to ID
  – Move the adder into the ID stage to compute nPC
  – Add a comparator in the ID stage to evaluate the branch condition
    • Suited to "simple conditions": equality (bitwise XOR, then OR)
    • Not suited to "complex conditions": these need the ALU
• Assume the branch is not taken!!!
• If the branch is taken: only the PC update and a flush of the IF stage are needed
  – clear up: add IF.Flush to clear IF/ID (Figure 4-62)
    • "IF.Flush is driven by the controller": shouldn't it be driven jointly with the comparator?
  – bubble down: send "0" to ID/EX
• Cost: one cycle
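
The ID-stage logic described above can be sketched as follows. This is a behavioral model under stated assumptions, not the textbook's circuit: the function names and parameters are hypothetical, and the base address is passed in explicitly because it depends on the ISA (MIPS adds the offset to PC+4, RISC-V to the address of the branch itself).

```python
MASK32 = 0xFFFFFFFF


def equal_by_xor_or(a, b):
    """Equality test the way the comparator does it: bitwise XOR, then OR-reduce."""
    diff = (a ^ b) & MASK32
    any_bit_set = 0
    for i in range(32):
        any_bit_set |= (diff >> i) & 1   # OR together all 32 difference bits
    return any_bit_set == 0


def resolve_beq_in_id(base_pc, fallthrough_pc, rs_val, rt_val, offset_bytes):
    """Return (next_pc, if_flush) for a beq resolved in the ID stage.

    base_pc        -- address the offset is added to (ISA dependent, see lead-in)
    fallthrough_pc -- address of the instruction after the branch
    offset_bytes   -- sign-extended, already shifted branch offset in bytes
    """
    taken = equal_by_xor_or(rs_val, rt_val)
    target = (base_pc + offset_bytes) & MASK32   # the adder moved into ID
    next_pc = target if taken else fallthrough_pc
    if_flush = taken   # assumed: IF.Flush = (instruction is beq) AND (operands equal)
    return next_pc, if_flush


print(resolve_beq_in_id(40, 44, 7, 7, 32))
print(resolve_beq_in_id(40, 44, 7, 8, 32))
```

Driving `if_flush` from the comparator result, rather than from the opcode alone, reflects the question raised in the last sub-bullet: the controller knows the instruction is a branch, but only the comparator knows whether it is taken.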
Single-cycle beq implementation
• Delay slot: one cycle!
[Figure 4-62: pipelined datapath with IF.Flush, the hazard detection unit, the forwarding unit, and Imm Gen]
Flush & Taken
• Clear "and", fetch "ld"
[Figure 4-60: datapath snapshots at Clock 3 and Clock 4 for the sequence
  36 sub x10,x4,x8
  40 beq x1,x3,16
  44 and x12,x2,x5
  48 or x13,x2,x6
  52 add x14,x4,x2
  56 sub x15,x6,x7
  72 ld x4,50(x7)
 Clock 3: and x12,x2,x5 in IF, beq x1,x3,16 in ID, sub x10,x4,x8 in EX.
 Clock 4: ld x4,50(x7) in IF, bubble (nop) in ID, beq x1,x3,16 in EX.]
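
The two snapshots can be reproduced with a small front-end model that tracks only the IF and ID stages. The instruction addresses and the target 72 are read from the figure's listing; the `simulate` function and its structure are a hypothetical sketch, not the textbook's datapath.

```python
PROGRAM = {
    36: "sub x10,x4,x8",
    40: "beq x1,x3,16",
    44: "and x12,x2,x5",
    48: "or x13,x2,x6",
    52: "add x14,x4,x2",
    56: "sub x15,x6,x7",
    72: "ld x4,50(x7)",
}
BRANCH_PC = 40
BRANCH_TARGET = 72   # target address as listed in the figure


def simulate(cycles, taken, start_pc=36):
    """Return one (cycle, instruction in IF, instruction in ID) row per clock."""
    pc = start_pc
    if_id, if_id_pc = "nop", None   # contents of the IF/ID pipeline register
    rows = []
    for cycle in range(1, cycles + 1):
        fetched = PROGRAM.get(pc, "nop")
        rows.append((cycle, fetched, if_id))
        if if_id_pc == BRANCH_PC and taken:
            # beq resolves in ID: IF.Flush turns the fetched instruction into a nop
            if_id, if_id_pc = "nop", None
            pc = BRANCH_TARGET
        else:
            if_id, if_id_pc = fetched, pc
            pc += 4
    return rows


for row in simulate(4, taken=True):
    print(row)
```

With `taken=True`, cycle 3 shows `and` in IF with `beq` in ID, and cycle 4 shows `ld` in IF with a nop in ID, so exactly one instruction is lost. With `taken=False`, `and` simply moves on to ID and `or` is fetched.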
RAW hazards with single-cycle beq, §4.8.2
• Example 1:
    add $4,$5,$6
    beq $4,$2,40
    ...
  – Is a stall needed?
  – Is forwarding needed?
• Example 2:
    lw $4,20($1)
    beq $4,$2,40
    ...
  – Are new interlock and forwarding rules needed?
• Pipeline timing diagram?
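
One way to reason about the two examples is to count cycles. The sketch below assumes a classic five-stage pipeline where the producer is fetched in cycle 1, an ALU result can be forwarded to the ID-stage comparator one cycle after EX (cycle 4), and a loaded value is usable one cycle after MEM (cycle 5). These timing assumptions and the function names are illustrative, not taken from the slides.

```python
# First cycle in which the producer's value can be used by a comparator in ID
# (assumption: producer is in IF in cycle 1, so EX is cycle 3 and MEM is cycle 4).
READY_CYCLE = {"alu": 4, "load": 5}

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_stalls(producer_kind, distance=1):
    """Stall cycles for a branch `distance` instructions after its producer."""
    unstalled_id_cycle = 2 + distance   # branch would be in ID in this cycle
    return max(0, READY_CYCLE[producer_kind] - unstalled_id_cycle)


def timeline(producer_kind, distance=1):
    """Return per-cycle stage lists for the producer and the dependent branch."""
    stalls = branch_stalls(producer_kind, distance)
    branch = ["--"] * distance + ["IF"] + ["ID"] * (stalls + 1) + ["EX", "MEM", "WB"]
    return STAGES, branch


for kind in ("alu", "load"):
    producer, branch = timeline(kind)
    print(kind, "producer:", " ".join(producer))
    print(kind, "branch:  ", " ".join(branch))
```

Under these assumptions, Example 1 (add then beq) needs one stall plus forwarding into ID, and Example 2 (lw then beq) needs two stalls, which suggests that both the interlock and the forwarding rules have to be extended for branches resolved in ID.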
Digital Design and Computer Architecture: with full hazard handling
[Figure 7.58: pipelined datapath with the control unit and hazard unit. The Decode stage produces EqualD and PCBranchD; hazard unit signals include BranchD, ForwardAD, ForwardBD, ForwardAE, ForwardBE, MemtoRegE, RegWriteE, and MemtoRegM.]
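
The hazard unit's behavior can be sketched in software using the signal names visible in the figure. The Boolean equations below follow the usual textbook formulation for a MIPS pipeline with the branch resolved in Decode, written from memory, so treat them as an assumption to check against the book. `hazard_unit` and `NO_HAZARD` are hypothetical names.

```python
def hazard_unit(*, rsD, rtD, rsE, rtE, BranchD,
                WriteRegE, RegWriteE, MemtoRegE,
                WriteRegM, RegWriteM, MemtoRegM,
                WriteRegW, RegWriteW):
    """Return forwarding selects and stall/flush signals as a dict."""

    def forward_execute(src):
        # Prefer the newer value (Memory stage) over the older one (Writeback).
        if src != 0 and src == WriteRegM and RegWriteM:
            return 0b10
        if src != 0 and src == WriteRegW and RegWriteW:
            return 0b01
        return 0b00

    # Forwarding into the Decode-stage comparator from the Memory stage.
    ForwardAD = bool(rsD != 0 and rsD == WriteRegM and RegWriteM)
    ForwardBD = bool(rtD != 0 and rtD == WriteRegM and RegWriteM)

    # Load-use interlock: consumer in Decode, lw in Execute.
    lwstall = bool((rsD == rtE or rtD == rtE) and MemtoRegE)

    # Branch interlock: operand still being computed (Execute) or loaded (Memory).
    branchstall = bool(
        (BranchD and RegWriteE and WriteRegE in (rsD, rtD))
        or (BranchD and MemtoRegM and WriteRegM in (rsD, rtD))
    )

    stall = lwstall or branchstall
    return {
        "ForwardAE": forward_execute(rsE), "ForwardBE": forward_execute(rtE),
        "ForwardAD": ForwardAD, "ForwardBD": ForwardBD,
        "StallF": stall, "StallD": stall, "FlushE": stall,
    }


# Baseline with nothing in flight, used to build test cases.
NO_HAZARD = dict(rsD=0, rtD=0, rsE=0, rtE=0, BranchD=False,
                 WriteRegE=0, RegWriteE=False, MemtoRegE=False,
                 WriteRegM=0, RegWriteM=False, MemtoRegM=False,
                 WriteRegW=0, RegWriteW=False)

# add $4,$5,$6 is in Execute while beq $4,$2,... is in Decode: stall one cycle.
print(hazard_unit(**dict(NO_HAZARD, rsD=4, rtD=2, BranchD=True,
                         WriteRegE=4, RegWriteE=True)))
```

One cycle later the add is in the Memory stage, so `branchstall` drops and `ForwardAD` selects the forwarded value. A preceding lw keeps the stall asserted for a second cycle through `MemtoRegM`.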