
Computer Architecture
Pipelining

Can We Do Better?

• What limitations do you see with the multi-cycle design?
• Limited concurrency
  – Some hardware resources are idle during different phases of the instruction processing cycle
  – “Fetch” logic is idle when an instruction is being “decoded” or “executed”
  – Most of the datapath is idle when a memory access is happening

Can We Use the Idle Hardware to Improve Concurrency?

• Goal: Concurrency → throughput (more “work” completed in one cycle)
• Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction
  – E.g., when an instruction is being decoded, fetch the next instruction
  – E.g., when an instruction is being executed, decode another instruction
  – E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  – E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

Pipelining: Basic Idea

• More systematically:
  – Pipeline the execution of multiple instructions
  – Analogy: “Assembly line processing” of instructions
• Idea:
  – Divide the instruction processing cycle into distinct “stages” of processing
  – Ensure there are enough hardware resources to process one instruction in each stage
  – Process a different instruction in each stage
    • Instructions consecutive in program order are processed in consecutive stages
• Benefit: Increases instruction processing throughput (1/CPI)
• Downside: Start thinking about this.
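The throughput claim above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes an idealized 4-stage (F/D/E/W) pipeline with no stalls; the instruction count is an arbitrary illustrative choice, not from the slides.

```python
# Illustrative sketch: CPI for a multi-cycle machine vs. an ideal
# 4-stage pipeline, ignoring all hazards and stalls.

stages = 4             # F, D, E, W
n_instructions = 1000  # arbitrary illustrative count

# Multi-cycle: each instruction takes all 4 cycles before the next starts.
multi_cycle_total = n_instructions * stages      # 4000 cycles

# Ideal pipeline: after (stages - 1) fill cycles, one instruction
# completes every cycle.
pipelined_total = (stages - 1) + n_instructions  # 1003 cycles

print(multi_cycle_total / n_instructions)  # CPI = 4.0
print(pipelined_total / n_instructions)    # CPI = 1.003, -> 1.0 for large n
```

Since throughput is 1/CPI, the pipelined machine approaches 4x the multi-cycle throughput as the instruction stream grows.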

Example: Execution of Four Independent ADDs

• Multi-cycle: 4 cycles per instruction
• Pipelined: 4 cycles per 4 instructions (steady state)

[Timing diagrams: F/D/E/W stages of the four ADDs; the multi-cycle design runs them back-to-back, the pipelined design overlaps them one cycle apart]
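The pipelined diagram on this slide can be reproduced with a small script. This is a sketch assuming an ideal 4-stage pipeline where instruction i simply enters fetch at cycle i; the function name and layout are mine, not from the slides.

```python
# Sketch: print which stage each of four independent ADDs occupies in
# each cycle, assuming an ideal 4-stage F/D/E/W pipeline with no stalls.

STAGES = ["F", "D", "E", "W"]

def pipeline_diagram(n_instructions, n_cycles):
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            stage = cycle - i  # instruction i enters the pipeline at cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(" ".join(row))
    return rows

for line in pipeline_diagram(4, 7):
    print(line)
# F D E W . . .
# . F D E W . .
# . . F D E W .
# . . . F D E W
```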

The Laundry Analogy

• “place one dirty load of clothes in the washer”
• “when the washer is finished, place the wet load in the dryer”
• “when the dryer is finished, take out the dry load and fold”
• “when folding is finished, ask your roommate (??) to put the clothes away”

– steps to do a load are sequentially dependent
– no dependence between different loads
– different steps do not share resources

[Timing diagram: loads A–D processed one after another from 6 PM to 2 AM]

Pipelining Multiple Loads of Laundry

[Timing diagram: loads A–D overlapped, each starting one step after the previous]

– 4 loads of laundry in parallel
– no additional resources
– throughput increased by 4
– latency per load is the same

Pipelining Multiple Loads of Laundry: In Practice

[Timing diagram: overlapped loads with a dryer step slower than the other steps]

the slowest step decides throughput
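The "slowest step decides throughput" observation can be made concrete with numbers; the stage latencies below are illustrative assumptions, not values from the slides.

```python
# Sketch: the slowest stage sets the pipeline clock, and therefore the
# steady-state throughput. Latencies (minutes) are made up for illustration.

stage_minutes = {"wash": 30, "dry": 60, "fold": 15, "put_away": 15}

cycle = max(stage_minutes.values())  # 60 min: the dryer dominates
loads_per_hour = 60 / cycle          # 1.0 load per hour

# If every stage were balanced at 30 minutes, the same pipeline would
# finish 60 / 30 = 2 loads per hour.
print(loads_per_hour)
```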

Pipelining Multiple Loads of Laundry: In Practice

[Timing diagram: loads A and B alternate between two dryers]

Throughput restored (2 loads per hour) using 2 dryers
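Replicating the bottleneck stage, as the two-dryer example shows, can be sketched the same way; the latencies and replication counts here are illustrative assumptions.

```python
# Sketch: two dryers alternate, halving the effective dry latency, so
# the bottleneck moves back to the 30-minute washer.

stage_minutes = {"wash": 30, "dry": 60, "fold": 15, "put_away": 15}
copies        = {"wash": 1,  "dry": 2,  "fold": 1,  "put_away": 1}

# Effective latency of a replicated stage is its latency / copy count.
effective = {s: t / copies[s] for s, t in stage_minutes.items()}

cycle = max(effective.values())  # back to 30.0 minutes
print(60 / cycle)                # 2.0 loads per hour, as on the slide
```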

An Ideal Pipeline

• Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
• Repetition of identical operations
  – The same operation is repeated on a large number of different inputs
• Repetition of independent operations
  – No dependencies between repeated operations
• Uniformly partitionable suboperations
  – Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
• Fitting examples: automobile assembly line, doing laundry
  – What about the instruction processing “cycle”?
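For a pipeline that meets these ideal conditions, the speedup over an unpipelined design has a simple closed form; the sketch below assumes k uniform-latency stages and no hazards.

```python
# Sketch: n operations through k uniform stages take (k + n - 1) cycles
# pipelined vs. n * k cycles unpipelined, so speedup -> k as n grows.

def speedup(n, k):
    return (n * k) / (k + n - 1)

print(speedup(4, 4))       # ~2.29 for the four-ADD example
print(speedup(10_000, 4))  # ~4.0: approaches the stage count
```

Note that a short burst of instructions sees much less than the full factor-of-k benefit, because the pipeline fill time is not amortized.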