
Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 8: Instruction Fetch and Branch Prediction

Fetch Rate is an ILP Upper Bound
• Instruction fetch limits performance
  – To sustain an IPC of N, must sustain a fetch rate of N per cycle
    • If you consume 1500 calories per day, but burn 2000 calories per day, then you will eventually starve.
  – Need to fetch N on average, not on every cycle
• An N-wide superscalar ideally fetches N insns. per cycle
• This doesn't happen in practice due to:
  – Instruction cache organization
  – Branches
  – ... and the interaction between the two
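
The point that fetch must deliver N on average, rather than N on every cycle, can be illustrated with a small calculation. This is only a sketch: the function name `ipc_upper_bound` and the per-cycle fetch trace are hypothetical and do not come from the lecture.

```python
def ipc_upper_bound(fetched_per_cycle, width):
    """Bound sustained IPC by the average fetch rate, capped at the machine width.

    fetched_per_cycle: hypothetical trace of how many insns. were fetched each cycle.
    width: N, the superscalar width of the machine.
    """
    if not fetched_per_cycle:
        return 0.0
    avg_fetch = sum(fetched_per_cycle) / len(fetched_per_cycle)
    # The back end cannot retire more than it is fed, nor more than N per cycle.
    return min(float(width), avg_fetch)


# Hypothetical 4-wide machine whose fetch is sometimes cut short.
trace = [4, 3, 1, 4, 2, 4]
print(ipc_upper_bound(trace, width=4))  # 3.0
```

Even though half of the cycles in this made-up trace fetch a full group of four, the average of 3 per cycle caps the sustainable IPC below the machine width.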

Instruction Cache Organization
• To fetch N instructions per cycle...
  – L1-I line must be wide enough for N instructions
• PC register selects the L1-I line
• A fetch group is the set of insns. starting at PC
  – For an N-wide machine, [PC, PC+N-1]
[Figure: the PC drives a decoder that selects one cache line; each cache line holds a tag and four instructions.]
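
The fetch group definition and the way the PC selects a line can be sketched as follows. The sketch assumes the PC counts instructions rather than bytes (as the binary PCs on the misalignment slides suggest) and ignores tags and set associativity; `fetch_group` and `split_pc` are hypothetical helper names.

```python
def fetch_group(pc, n):
    """Addresses of the ideal fetch group [PC, PC+N-1] for an N-wide machine."""
    return list(range(pc, pc + n))


def split_pc(pc, insns_per_line):
    """Split an instruction-granular PC into (line index, offset within the line).

    Assumption: lines hold a power-of-two number of instructions, so the low
    bits of the PC are the offset and the remaining bits select the line.
    """
    return pc // insns_per_line, pc % insns_per_line


pc = 0b01001
print(fetch_group(pc, 4))   # [9, 10, 11, 12]
print(split_pc(pc, 4))      # (2, 1)
```

With four instructions per line, PC 01001 falls in line 010 at offset 01, so its ideal fetch group does not start at the beginning of a line.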

Fetch Misalignment (1/2)
• If PC = xxx01001, N = 4:
  – Ideal fetch group is xxx01001 through xxx01100 (inclusive)
• Misalignment reduces fetch width
[Figure: cache lines of four instructions, rows 000 through 111 and columns 00 through 11; the fetch group starting at PC xxx01001 extends past the line width.]

Fetch Misalignment (2/2)
• Now takes two cycles to fetch N instructions
  – ½ fetch bandwidth!
• Might not be ½ by combining with the next fetch
[Figure: cycle 1 fetches from the line at PC xxx01001; cycle 2 fetches from the line at PC xxx01100.]
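
The misalignment example can be replayed with a small model. It assumes one cache line is read per cycle, the line width equals N, the PC counts instructions, and there are no branches; `fetch_cycles` is a hypothetical name, not something from the lecture.

```python
def fetch_cycles(pc, n, insns_per_line, total):
    """List the addresses fetched in each cycle when one line is read per cycle.

    A fetch cannot cross a line boundary, so a misaligned PC yields a short group.
    """
    cycles = []
    remaining = total
    while remaining > 0:
        offset = pc % insns_per_line
        # Limited by machine width, by the end of the line, and by what is left.
        count = min(n, insns_per_line - offset, remaining)
        cycles.append(list(range(pc, pc + count)))
        pc += count
        remaining -= count
    return cycles


# PC = xxx01001, N = 4, four instructions per line.
print(fetch_cycles(0b01001, 4, 4, total=4))  # [[9, 10, 11], [12]]
print(len(fetch_cycles(0b01001, 4, 4, total=7)))  # 2
```

Fetching the ideal group of four takes two cycles, which is the ½ bandwidth case. In this model the second cycle starts on a line boundary and can deliver a full line, so seven instructions arrive in the same two cycles, which is one reading of why the loss might not be a full ½.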

Reducing Fetch Fragmentation (1/2)
• Make |Fetch Group| < |L1-I line|
  – Can deliver N insns. when the PC is at least N insns. from the end of the line

Reducing Fetch Fragmentation (2/2)
• Needs a “rotator” to decode insns. in correct order
[Figure: the PC drives the decoder; each cache line holds a tag and eight instructions; a rotator reorders the selected instructions into an aligned fetch group.]
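
One way to see why a rotator is needed is to model the cache as N banks, where each bank returns the single instruction of the fetch group that maps to it. The banking scheme here is an assumption made for illustration (the slide only shows the rotator itself), and `banked_read` and `rotate` are hypothetical names.

```python
def banked_read(pc, n):
    """Return the fetch group in bank order.

    Assumption: address a lives in bank a % n, so the n consecutive addresses
    of a fetch group hit n different banks, but not starting at bank 0.
    """
    out = [None] * n
    for addr in range(pc, pc + n):
        out[addr % n] = addr
    return out


def rotate(bank_outputs, pc):
    """Rotate left by the PC's bank offset so the insn. at PC comes first."""
    shift = pc % len(bank_outputs)
    return bank_outputs[shift:] + bank_outputs[:shift]


raw = banked_read(0b01001, 4)
print(raw)                   # [12, 9, 10, 11]
print(rotate(raw, 0b01001))  # [9, 10, 11, 12]
```

The banks hand back the right four instructions, but in bank order; rotating by the low bits of the PC restores program order before decode.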

Fragmentation due to Branches
• Fetch group is aligned, cache line size > fetch group
  – Taken branches still limit fetch width
[Figure: a cache line holding Inst, Branch, Inst, Inst; the two instructions after the taken branch are marked X.]
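
The effect of a taken branch on an otherwise perfectly aligned fetch group can be sketched as below. The helper `useful_insns` is hypothetical, and the sketch assumes no branch delay slots, so everything after a taken branch in the group is discarded.

```python
def useful_insns(group, taken_branch_pc=None):
    """Return the part of a fetch group that lies on the correct path.

    group: list of instruction addresses fetched this cycle, in program order.
    taken_branch_pc: address of a taken branch inside the group, if any.
    """
    if taken_branch_pc is None or taken_branch_pc not in group:
        return list(group)
    # Keep everything up to and including the branch; the rest is wasted.
    return list(group[: group.index(taken_branch_pc) + 1])


group = [8, 9, 10, 11]
print(useful_insns(group))                     # [8, 9, 10, 11]
print(useful_insns(group, taken_branch_pc=9))  # [8, 9]
```

With the branch in the second slot, as in the figure, only two of the four fetched instructions are useful.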

Types of Branches
• Direction:
  – Conditional vs. Unconditional
• Target:
  – PC-encoded
    • PC-relative
    • Absolute offset
  – Computed (target derived from register)
• Need direction and target to find next fetch group
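
How direction and target combine to give the next fetch address can be sketched as follows. The encoding is a deliberate simplification: the PC counts instructions, the fall-through address is PC+1, and a PC-relative target is PC plus the encoded offset (real ISAs differ on the base). The names `branch_target` and `next_fetch_pc` and the `kind` strings are hypothetical.

```python
def branch_target(pc, kind, operand):
    """Compute a branch target for the three target types on the slide.

    kind "pc_relative": operand is an offset encoded in the instruction.
    kind "absolute":    operand is a target encoded in the instruction.
    kind "computed":    operand is a register value, known only at execute.
    """
    if kind == "pc_relative":
        return pc + operand
    if kind in ("absolute", "computed"):
        return operand
    raise ValueError(f"unknown branch kind: {kind}")


def next_fetch_pc(pc, taken, target):
    """Next fetch address: the target if taken, otherwise the fall-through."""
    return target if taken else pc + 1


t = branch_target(pc=20, kind="pc_relative", operand=-12)
print(next_fetch_pc(20, taken=True, target=t))   # 8
print(next_fetch_pc(20, taken=False, target=t))  # 21
```

An unconditional branch is the case where `taken` is always true; a computed branch is the case where `operand` is unavailable at fetch time, which is why the target as well as the direction may need to be guessed.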

Branch Prediction Overview
• Use two hardware predictors
  – Direction predictor guesses if branch is taken or not-taken
  – Target predictor guesses the destination PC
• Predictions are based on history
  – Use previous behavior as indication of future behavior
  – Use historical context to disambiguate predictions
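
The two-predictor structure can be sketched with the simplest possible history: each branch is predicted to do whatever it did last time. This is an illustrative assumption, not a scheme named on the slide; the class names are hypothetical, and the unbounded dictionaries stand in for finite hardware tables.

```python
class LastOutcomePredictor:
    """Direction predictor: guess that a branch repeats its previous outcome."""

    def __init__(self, default_taken=False):
        self.last = {}  # branch PC -> last observed direction
        self.default_taken = default_taken

    def predict(self, pc):
        return self.last.get(pc, self.default_taken)

    def update(self, pc, taken):
        self.last[pc] = taken


class LastTargetPredictor:
    """Target predictor: guess that a branch goes where it went last time."""

    def __init__(self):
        self.targets = {}  # branch PC -> last observed target

    def predict(self, pc):
        return self.targets.get(pc)  # None if this branch has not been seen

    def update(self, pc, target):
        self.targets[pc] = target


def predict_next_pc(pc, direction, target):
    """Combine both guesses; fall through when not-taken or target unknown."""
    guess = target.predict(pc)
    if direction.predict(pc) and guess is not None:
        return guess
    return pc + 1


d, t = LastOutcomePredictor(), LastTargetPredictor()
print(predict_next_pc(20, d, t))  # 21
d.update(20, True)
t.update(20, 8)
print(predict_next_pc(20, d, t))  # 8
```

Before the branch at PC 20 has been seen, the sketch falls through; after one taken execution to address 8, both predictors use that previous behavior as their guess.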