Pipelined Implementation Part II
Overview

Make the pipelined processor work!

Data Hazards
◼ Instruction having register R as source follows shortly after instruction having register R as destination
◼ Common condition, don’t want to slow down pipeline

Control Hazards
◼ Mispredict conditional branch
  ⚫ Our design predicts all branches as being taken
  ⚫ Naïve pipeline executes two extra instructions
◼ Getting return address for ret instruction
  ⚫ PIPE- executes three extra instructions

Making Sure It Really Works
◼ What if multiple special cases happen simultaneously?
Suggested Reading

◼ Chap 4.5
Branch Misprediction Example

    # demo-j.ys
    0x000:    xorl %eax,%eax
    0x002:    jne t              # Not taken
    0x007:    irmovl $1, %eax    # Fall through
    0x00d:    nop
    0x00e:    nop
    0x00f:    nop
    0x010:    halt
    0x011: t: irmovl $3, %edx    # Target (Should not execute)
    0x017:    irmovl $4, %ecx    # Should not execute
    0x01d:    irmovl $5, %edx    # Should not execute

◼ Should only execute first 7 instructions
Branch Misprediction Trace

    # demo-j
                                                1 2 3 4 5 6 7 8 9
    0x000: xorl %eax,%eax                       F D E M W
    0x002: jne t                # Not taken       F D E M W
    0x011: t: irmovl $3, %edx   # Target            F D E M W
    0x017: irmovl $4, %ecx      # Target+1            F D E M W
    0x007: irmovl $1, %eax      # Fall Through          F D E M W

Cycle 5
    M   M_Bch = 0, M_valA = 0x007
    E   valE ← 3, dstE = %edx
    D   valC = 4, dstE = %ecx
    F   valC ← 1, rB ← %eax

◼ Incorrectly execute two instructions at branch target
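The stage contents listed for cycle 5 follow directly from the fetch order. The sketch below is a minimal illustration, assuming an ideal five-stage pipeline with no stalls or bubbles in which the instruction fetched in cycle n advances one stage per cycle. The names `stage_occupancy` and `FETCH_ORDER` are made up for this example and are not part of the course simulator.

```python
STAGES = ["F", "D", "E", "M", "W"]

# Fetch order from the trace: the branch is predicted taken, so the two
# target instructions are fetched before the fall-through instruction.
FETCH_ORDER = [
    "xorl %eax,%eax",
    "jne t",
    "irmovl $3,%edx",  # target
    "irmovl $4,%ecx",  # target+1
    "irmovl $1,%eax",  # fall through
]


def stage_occupancy(fetch_order, cycle):
    """Return {stage: instruction} for the given cycle (cycles start at 1).

    Assumes fetch_order[i] is fetched in cycle i + 1 and that nothing is
    ever stalled or cancelled, as in the naive pipeline.
    """
    occupancy = {}
    for index, instruction in enumerate(fetch_order):
        stage_index = cycle - (index + 1)  # 0 = F, 1 = D, ..., 4 = W
        if 0 <= stage_index < len(STAGES):
            occupancy[STAGES[stage_index]] = instruction
    return occupancy


if __name__ == "__main__":
    for stage, instruction in stage_occupancy(FETCH_ORDER, 5).items():
        print(stage, instruction)
```

In cycle 5 the jne has only reached the memory stage, so both target instructions are already in decode and execute and continue down the pipeline unless something cancels them.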
Return Example

    # demo-ret.ys
    0x000:    irmovl Stack,%esp  # Initialize stack pointer
    0x006:    nop                # Avoid hazard on %esp
    0x007:    nop
    0x008:    nop
    0x009:    call p             # Procedure call
    0x00e:    irmovl $5,%esi     # Return point
    0x014:    halt
    0x020: .pos 0x20
    0x020: p: nop                # procedure
    0x021:    nop
    0x022:    nop
    0x023:    ret
    0x024:    irmovl $1,%eax     # Should not be executed
    0x02a:    irmovl $2,%ecx     # Should not be executed
    0x030:    irmovl $3,%edx     # Should not be executed
    0x036:    irmovl $4,%ebx     # Should not be executed
    0x100: .pos 0x100
    0x100: Stack:                # Stack: Stack pointer

◼ Requires lots of nops to avoid data hazards
Incorrect Return Example

    # demo-ret
                                      1 2 3 4 5 6 7 8 9
    0x023: ret                        F D E M W
    0x024: irmovl $1,%eax   # Oops!     F D E M W
    0x02a: irmovl $2,%ecx   # Oops!       F D E M W
    0x030: irmovl $3,%edx   # Oops!         F D E M W
    0x00e: irmovl $5,%esi   # Return          F D E M W

Stage contents (ret in write-back)
    W   valM = 0x0e
    M   valE = 1, dstE = %eax
    E   valE ← 2, dstE = %ecx
    D   valC = 3, dstE = %edx
    F   valC ← 5, rB ← %esi

◼ Incorrectly execute 3 instructions following ret
Handling Misprediction

    # demo-j.ys
                                              1 2 3 4 5 6 7 8 9 10
    0x000: xorl %eax,%eax                     F D E M W
    0x002: jne target         # Not taken       F D E M W
    0x011: t: irmovl $2,%edx  # Target            F D
           bubble                                     E M W
    0x017: irmovl $3,%ebx     # Target+1            F
           bubble                                     D E M W
    0x007: irmovl $1,%eax     # Fall through          F D E M W
    0x00d: nop                                          F D E M W

Figure 4.63 P346

Predict branch as taken
◼ Fetch 2 instructions at target

Cancel when mispredicted
◼ Detect branch not-taken in execute stage
◼ On following cycle, replace instructions in execute and decode by bubbles
◼ No side effects have occurred yet
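The cancellation steps above can be sketched as a cycle-by-cycle simulation. This is only an illustrative model, not the course simulator: each stage holds just an instruction address, the program table is truncated to the instructions shown in Figure 4.63, and the names `PROGRAM`, `JUMP_TARGET`, `TAKEN`, and `simulate` are assumptions made for this example.

```python
# address -> (text, address of the next sequential instruction)
PROGRAM = {
    0x000: ("xorl %eax,%eax", 0x002),
    0x002: ("jne target", 0x007),
    0x007: ("irmovl $1,%eax", 0x00d),
    0x00d: ("nop", 0x00e),
    0x011: ("irmovl $2,%edx", 0x017),
    0x017: ("irmovl $3,%ebx", 0x01d),
}
JUMP_TARGET = {0x002: 0x011}  # branches are predicted taken
TAKEN = {0x002: False}        # actual outcome, known in the execute stage


def simulate(cycles):
    """Return a list with the stage contents of every cycle."""
    stages = {"F": None, "D": None, "E": None, "M": None, "W": None}
    pc = 0x000
    history = []
    for _ in range(cycles):
        in_execute = stages["E"]
        # Detect a not-taken branch that was in execute during the last cycle.
        mispredicted = in_execute in JUMP_TARGET and not TAKEN[in_execute]
        new = {"W": stages["M"], "M": stages["E"]}
        if mispredicted:
            # Cancel the two wrongly fetched instructions and redirect fetch
            # to the fall-through address of the branch.
            new["E"] = "bubble"
            new["D"] = "bubble"
            pc = PROGRAM[in_execute][1]
        else:
            new["E"] = stages["D"]
            new["D"] = stages["F"]
        new["F"] = pc if pc in PROGRAM else None
        if pc in PROGRAM:
            pc = JUMP_TARGET.get(pc, PROGRAM[pc][1])
        stages = new
        history.append(dict(stages))
    return history


if __name__ == "__main__":
    for cycle, state in enumerate(simulate(10), start=1):
        print(cycle, state)
```

Because the two target instructions are replaced before they leave decode and execute, neither ever reaches the memory or write-back stage, which is why no side effects need to be undone.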
Detecting Mispredicted Branch

[Figure 4.64 P347: pipeline hardware diagram (Fetch, Decode, Execute, Memory, Write back) with signals including E_icode, e_Bch, M_Bch, and M_valA]

    Condition              Trigger
    Mispredicted Branch    E_icode = IJXX & !e_Bch
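The trigger condition can be written as a one-line predicate. The sketch below mirrors the signal names from the table; the numeric value chosen for `IJXX` and the function name `mispredicted_branch` are assumptions for illustration only.

```python
IJXX = 0x7  # assumed instruction code for the jXX (conditional jump) class


def mispredicted_branch(E_icode, e_Bch):
    """True when a jump is in the execute stage and its branch is not taken.

    Since every branch is predicted taken, a not-taken result (e_Bch false)
    means the prediction was wrong.
    """
    return E_icode == IJXX and not e_Bch


if __name__ == "__main__":
    print(mispredicted_branch(IJXX, False))  # jump in execute, not taken
    print(mispredicted_branch(IJXX, True))   # jump in execute, taken
```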
Control for Misprediction

    # demo-j.ys
                                              1 2 3 4 5 6 7 8 9 10
    0x000: xorl %eax,%eax                     F D E M W
    0x002: jne t              # Not taken       F D E M W
    0x011: t: irmovl $2,%edx  # Target            F D
           bubble                                     E M W
    0x017: irmovl $3,%ebx     # Target+1            F
           bubble                                     D E M W
    0x007: irmovl $1,%eax     # Fall through          F D E M W
    0x00d: nop                                          F D E M W

Figure 4.63 P346

    Condition              F        D        E        M        W
    Mispredicted Branch    normal   bubble   bubble   normal   normal

Figure 4.66 P348
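One way to read the control table is as a per-register action applied at the clock edge: "normal" loads the value arriving from the previous stage, while "bubble" loads a nop instead. The sketch below assumes that reading; the dictionary layout, the `BUBBLE` encoding, and the function name `clock_registers` are hypothetical and chosen only for illustration.

```python
# Per-register actions from the control table (Figure 4.66).
CONTROL_ACTIONS = {
    "mispredicted_branch": {
        "F": "normal", "D": "bubble", "E": "bubble", "M": "normal", "W": "normal",
    },
}

# Hypothetical encoding of a bubble: a register whose instruction is a nop.
BUBBLE = {"icode": "INOP"}


def clock_registers(inputs, condition=None):
    """Return the new contents of each pipeline register after a clock edge.

    inputs maps a register name to the value about to be loaded into it.
    Registers without a listed action behave normally.
    """
    actions = CONTROL_ACTIONS.get(condition, {})
    new_state = {}
    for register, incoming in inputs.items():
        if actions.get(register, "normal") == "bubble":
            new_state[register] = dict(BUBBLE)
        else:
            new_state[register] = incoming
    return new_state


if __name__ == "__main__":
    # Values about to be loaded at the end of cycle 4 in the trace.
    inputs = {
        "F": {"predPC": 0x01D},
        "D": {"icode": "IIRMOVL", "valC": 3},  # target+1, coming from fetch
        "E": {"icode": "IIRMOVL", "valC": 2},  # target, coming from decode
        "M": {"icode": "IJXX", "valA": 0x007},
        "W": {"icode": "IOPL"},
    }
    print(clock_registers(inputs, "mispredicted_branch"))
```

Bubbling D and E cancels exactly the two instructions fetched at the branch target, while M and W keep the jne and the xorl that preceded it.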