
ComputerArchitectureOut of Order ExecutionComputerArchitecture
Computer Architecture Computer Architecture Out of Order Execu3on

Tomasulo AlgorithmHistory:In today's perspective:1966: scoreboarding inthe implementationCDC6600,implementingbelongsto reservation-limited dynamicbased dynamic schedulingschedulingThree years later:The merit is the use of tagTomasuloinIBM360/91to identify instructions inintroducing registerthe fly: register renamingrenaming and reservationstationreservation station, tagbroadcastNowappearingintodaysDec Alpha, SGI MIPS, SUNUltraSparc,Intel Pentium,It is an“algorithm"firstIBM PowerPC, and othersComputerArchitecture
Computer Architecture Tomasulo Algorithm History: • 1966: scoreboarding in CDC6600, implemen?ng limited dynamic scheduling • Three years later: Tomasulo in IBM 360/91, introducing register renaming and reserva?on sta?on • Now appearing in todays Dec Alpha, SGI MIPS, SUN UltraSparc, Intel Pen?um, IBM PowerPC, and others In today’s perspec?ve: • the implementa?on belongs to reserva?onbased dynamic scheduling • The merit is the use of tag to iden?fy instruc?ons in the fly: register renaming, reserva?on sta?on, tag broadcast It is an “algorithm” first

TomasuloOrganizationFrom instruction unitInstructionFPregistersqueueLoad-storeoperationsOperandFloating-pointAddressunitbusesoperationsStorebuffersLoadbuffersVOperation bus32121ReservationstationsAddressData?FPmultipliersFP addersMemoryunltCommondata bus (CDB)ComputerArchitecture
Computer Architecture Tomasulo Organiza?on

ReservationStationComponentsOp-Operation to perform in the unit (e.g., + or -)Vi,Vk-ValueofSourceoperands-StorebuffershasVfield,resulttobestoredQi,Qk-Reservationstationsproducingsourceregisters(value to be written)- Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready- Store buffers onlyhave Qifor RS producingresultBusy-lndicates reservation station or FU is busyComputerArchitecture
Computer Architecture Reserva?on Sta?on Components Op—Opera?on to perform in the unit (e.g., + or –) Vj, Vk—Value of Source operands – Store buffers has V field, result to be stored Qj, Qk—Reserva?on sta?ons producing source registers (value to be wri_en) – Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready – Store buffers only have Qi for RS producing result Busy—Indicates reserva?on sta?on or FU is busy

RenamingTableIndicateswhich RSwill write each register,if oneexists. Blank when no pending instructions that willwrite that register.-Called“Registerresult status"in thetext book.ComputerArchitecture
Computer Architecture Renaming Table • Indicates which RS will write each register, if one exists. Blank when no pending instruc?ons that will write that register. – Called “Register result status” in the text book

ThreeStagesof Tomasulo Algorithm1. Issue-get instruction from FP Op QueueIf reservationstationfree (no structural hazard)issues instr & sends operands (renames registers)2. Execution-operate on operands (Ex)Whenbothoperandsreadythenexecute;if not ready,watch Common Data Bus for result3. Write result-finish execution (WB)Write on Common Data Bus to all awaiting units;markreservationstationavailableIssue:builddependencefornewinstWriteback:Wakeupdependent instructionsComputerArchitecture
Computer Architecture Three Stages of Tomasulo Algorithm 1. Issue—get instruc?on from FP Op Queue If reserva?on sta?on free (no structural hazard), issues instr & sends operands (renames registers). 2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch Common Data Bus for result 3. Write result—finish execu?on (WB) Write on Common Data Bus to all awai?ng units; mark reserva?on sta?on available Issue: build dependence for new inst Writeback: Wakeup dependent instruc?ons

ILP: Tomasulo Algorithm - An ExampleExplanationsofeachstages,AbigexampleComputerArchitecture
Computer Architecture ILP: Tomasulo Algorithm – An Example Explana?ons of each stages, A big example

ThreeStagesofTomasuloAlgorithm1.Issue-get instructionfromFPOpQueueIf reservation stationfree (no structural hazard)issues instr & sends operands (renames registers).2. Execution-operate on operands (Ex)Whenbothoperands readythenexecute;if not ready, watch Common Data Bus for result3. Write result-finish execution (WB)Write on Common Data Bus to all awaiting units;markreservationstationavailableIssue:builddependencefornewinstWriteback:WakeupdependentinstructionsComputerArchitecture
Computer Architecture Three Stages of Tomasulo Algorithm 1. Issue—get instruc?on from FP Op Queue If reserva?on sta?on free (no structural hazard), issues instr & sends operands (renames registers). 2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch Common Data Bus for result 3. Write result—finish execu?on (WB) Write on Common Data Bus to all awai?ng units; mark reserva?on sta?on available Issue: build dependence for new inst Writeback: Wakeup dependent instruc?ons

IssueStageandRenamingTableRenames its two sourceregisters (source renaming)AssignsittoafreeRsUpdates Renaming table (dest renaming). Also decodes the inst and read register values in parallelComputerArchitecture
Computer Architecture Issue Stage and Renaming Table • Renames its two source registers (source renaming) • Assigns it to a free RS • Updates Renaming table (dest renaming) • Also decodes the inst and read register values in parallel

ExecuteStage· Only"ready"instructions can join the competition: There is a select logic to select instructions for FUexecution- Some policy may be used, e.g. age basedNon-ready instructions can be“waken up" duringwriteback of its parent instComputerArchitecture
Computer Architecture Execute Stage • Only “ready” instruc?ons can join the compe??on • There is a select logic to select instruc?ons for FU execu?on – Some policy may be used, e.g. age based • Non-ready instruc?ons can be “waken up” during writeback of its parent inst