Post on 18-Aug-2020
Out-of-Order Execution: Tomasulo’s Algorithm
CS5513 Computer Architecture
Wei Wang
Computer Science
University of Texas at San Antonio
Text Book Chapters
* “Computer Architecture”, Hennessy and Patterson, Chapters 3.4 and 3.5, on “Dynamic Scheduling”.
Fall 2019 CS5513 Computer Architecture 2
Road Map
Overview of Tomasulo’s Algorithm
An Example of Tomasulo’s Algorithm
Another Example of Tomasulo’s Algorithm
Tomasulo Wrapup
Acknowledgment
Scoreboarding is Limited By
* Number and type of functional units.
* Number of instruction buffer entries (scoreboard size).
* Amount of application ILP (RAW hazards).
* Presence of anti-dependencies (WAR) and output dependencies (WAW).
  - In-order issue for WAW/structural hazards limits the scheduler.
  - WAR stalls are critical for loops (WAR prevents hardware unrolling in Scoreboarding).
The WAR Limitation of Scoreboarding
* WAR stalls are critical for loops (WAR prevents hardware unrolling in Scoreboarding).
  - For example, in the following two iterations of a loop, instruction 5 cannot finish because of the WAR hazard between instruction 3 and instruction 5, which prevents the instructions that follow it from executing:

    1 mov R3, [R12+R14]
    2 mul R3, R3, 11
    3 add R5, R3, 10
    4 add R14, R14, 4
    5 mov R3, [R12+R14]
    6 mul R3, R3, 11
    7 add R5, R3, 10
    8 add R14, R14, 4

* One way to address this WAR limitation is to rename registers.
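The hazards in the loop above can be listed mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` tuple and `find_hazards` helper are hypothetical names, memory operands are reduced to the registers they read, and the check is a conservative pairwise comparison that ignores intervening redefinitions.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def find_hazards(program):
    """Return (kind, earlier, later, register) tuples with 1-based instruction numbers."""
    hazards = []
    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            if earlier.dest in later.srcs:      # later reads what earlier writes
                hazards.append(("RAW", i + 1, j + 1, earlier.dest))
            if later.dest in earlier.srcs:      # later overwrites what earlier reads
                hazards.append(("WAR", i + 1, j + 1, later.dest))
            if later.dest == earlier.dest:      # both write the same register
                hazards.append(("WAW", i + 1, j + 1, later.dest))
    return hazards


# The two loop iterations from the slide (immediates omitted).
loop = [
    Instr("mov", "R3", ("R12", "R14")),
    Instr("mul", "R3", ("R3",)),
    Instr("add", "R5", ("R3",)),
    Instr("add", "R14", ("R14",)),
    Instr("mov", "R3", ("R12", "R14")),
    Instr("mul", "R3", ("R3",)),
    Instr("add", "R5", ("R3",)),
    Instr("add", "R14", ("R14",)),
]

hazards = find_hazards(loop)
for kind, earlier, later, reg in hazards:
    if kind == "WAR" and reg == "R3":
        print(f"WAR on {reg}: instruction {later} must not write before instruction {earlier} reads")
```

The WAR pair (3, 5) on R3 that the slide discusses appears in the output, alongside the other name dependencies on R3.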
Register Renaming
* Dynamically change register names to eliminate “false dependencies” (WAR/WAW hazards).
* Architectural registers are names, not locations.
  - Many more locations (e.g., “reservation stations” or “physical registers”) than names (“logical or architectural registers”).
  - Dynamically map names to locations (e.g., mapping EAX to an inter-stage buffer).
Register Renaming Example
* For example, the previous code can be changed by renaming R3 to R6 in the second iteration, which allows the two iterations of the loop to execute in parallel (assuming there are enough functional units).

Before renaming:

    1 mov R3, [R12+R14]
    2 mul R3, R3, 11
    3 add R5, R3, 10
    4 add R14, R14, 4
    5 mov R3, [R12+R14]
    6 mul R3, R3, 11
    7 add R5, R3, 10
    8 add R14, R14, 4

After renaming:

    1 mov R3, [R12+R14]
    2 mul R3, R3, 11
    3 add R5, R3, 10
    4 add R14, R14, 4
    5 mov R6, [R12+R14]
    6 mul R6, R6, 11
    7 add R5, R6, 10
    8 add R14, R14, 4
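Renaming can also be done automatically with a map table. The sketch below is an illustration under assumptions, not the method used on the slide: every destination gets a fresh location from a hypothetical free list (`P0`, `P1`, ...), and each source reads whatever location its architectural name currently maps to.

```python
def rename(program, free_locations):
    """Rename (op, dest, srcs) tuples so that every write gets a fresh location."""
    mapping = {}                 # architectural name -> current location
    free = list(free_locations)
    renamed = []
    for op, dest, srcs in program:
        # Sources use the latest mapping; unmapped names keep their own name.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = free.pop(0)   # fresh location removes WAR/WAW on dest
        mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


loop = [
    ("mov", "R3", ("R12", "R14")),
    ("mul", "R3", ("R3",)),
    ("add", "R5", ("R3",)),
    ("add", "R14", ("R14",)),
    ("mov", "R3", ("R12", "R14")),
    ("mul", "R3", ("R3",)),
    ("add", "R5", ("R3",)),
    ("add", "R14", ("R14",)),
]

renamed = rename(loop, [f"P{i}" for i in range(8)])
for number, (op, dest, srcs) in enumerate(renamed, start=1):
    print(number, op, dest, srcs)
```

After renaming, no two instructions write the same location, so only the true (RAW) dependencies remain. A real design would also recycle locations once they are no longer needed, which this sketch leaves out.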
Tomasulo’s Approach
* Used in IBM 360/91 machines (late 1960s).
* Similar to scoreboarding, but adds register renaming.
* Key concept: Reservation Stations.
* Very important in computer architecture, since most modern out-of-order processors use a variant of this algorithm.
Reservation Stations (RS)
* Distributed (rather than centralized) control scheme.
  - Bypassing is allowed via the Common Data Bus (CDB) to RS.
  - Register renaming eliminates WAR/WAW hazards.
* Scoreboard/Instruction Buffer => Reservation Stations
  - Fetch and buffer operands as soon as available.
* Eliminates the need to always get values from registers.
  - Pending instructions designate reservation stations that will provide their inputs.
  - Successive writes to a register cause only the last one to update the register.
Register Renaming with Tomasulo
* At instruction issue:
  - Registers for source operands are renamed to the names of the registers in reservation stations.
  - Values can exist in the registers of the reservation stations or in the register file.
* To eliminate WARs, register values are copied to reservation stations at issue.
* Other methods, for example, use pointer-based renaming (map table).
Reservation Station Components
* Op: Operation to perform in the unit.
* Qj, Qk: Ids of the reservation station entries that produce source values.
  - No ready flags needed as in Scoreboard.
  - If the index is 0, then the values (Vj or Vk) are ready.
  - Store buffers only have Qi for the RS producing the result, since stores only need one register.
* Vj, Vk: Values of source operands.
* Busy: Indicates that the reservation station or FU is occupied.
* RegisterResultStatus: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
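The fields above map naturally onto a small record type. This is a minimal sketch with hypothetical names (`ReservationStation`, `ready`, `register_result_status`); it uses `None` where the slide uses index 0 to mean "the value is already available".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of source operand 1
    vk: Optional[float] = None   # value of source operand 2
    qj: Optional[str] = None     # RS that will produce source 1 (None: Vj is ready)
    qk: Optional[str] = None     # RS that will produce source 2 (None: Vk is ready)

    def ready(self) -> bool:
        """An occupied station can execute once it waits on no producer."""
        return self.busy and self.qj is None and self.qk is None


# RegisterResultStatus: register name -> RS that will write it.
# A register with no pending writer is simply absent from the dictionary.
register_result_status: Dict[str, str] = {"R2": "Load2", "R0": "Mul1"}

# Mul1 holding MUL R0, R2, R4 while R2 is still being loaded by Load2.
# The numeric values are placeholders chosen for illustration.
mul1 = ReservationStation("Mul1", busy=True, op="MUL", vk=2.5, qj="Load2")
waiting = mul1.ready()

# Once Load2 delivers its value, Qj is cleared and Vj is filled.
mul1.vj, mul1.qj = 7.0, None
runnable = mul1.ready()
print(waiting, runnable)
```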
Three Stages of Tomasulo Algorithm
1. Issue (IS): get instruction from instruction queue.
  - Issue if there are free slots in the reservation station (no structural hazard).
  - Control unit issues the instruction and sends operands (rename registers) to reservation stations.
2. Execution (EX): operate on operands.
3. Write back (WB): finish execution and write back to registers.
  - Send result value to the Common Data Bus (CDB).
  - Registers in the register file and reservation stations pick up values from the CDB as needed.
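The issue stage can be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: stations are plain dictionaries, the caller passes only the stations that belong to the right functional unit, and the `issue` helper and the register values are hypothetical.

```python
def issue(instr, stations, regs, reg_status):
    """Try to issue instr = (op, dest, src1, src2); return the RS name, or None on a stall."""
    op, dest, src1, src2 = instr
    free = next((rs for rs in stations if not rs["busy"]), None)
    if free is None:
        return None                          # structural hazard: no free RS
    free.update(busy=True, op=op, vj=None, vk=None, qj=None, qk=None)
    for src, v_field, q_field in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        if src in reg_status:
            free[q_field] = reg_status[src]  # rename: wait for the producing RS
        else:
            free[v_field] = regs[src]        # copy the value now (removes WAR)
    reg_status[dest] = free["name"]          # dest will be written by this RS
    return free["name"]


# State just before MUL R0, R2, R4 issues: both loads are still pending.
regs = {"R0": 0.0, "R2": 0.0, "R4": 2.5, "R6": 0.0}
reg_status = {"R6": "Load1", "R2": "Load2"}
mul_stations = [{"name": "Mul1", "busy": False}, {"name": "Mul2", "busy": False}]

chosen = issue(("MUL", "R0", "R2", "R4"), mul_stations, regs, reg_status)
print(chosen, mul_stations[0], reg_status)
```

After the call, Mul1 holds the value of R4 in Vk and the tag Load2 in Qj, and R0 is marked as pending on Mul1.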
Common Data Bus
* Normal data bus: data + destination (“go to” bus).
* Common data bus: data + source (“come from” bus).
  - 64 bits of data + a 4-bit id of the source reservation station entry that generates the result/data.
  - Registers in RS and the register file monitor the bus and pick up the value based on the source id.
  - A broadcasting bus.
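The “come from” behaviour amounts to a tag match at every listener. The sketch below is a simplified illustration: the `broadcast` helper and the dictionary layout are assumptions, and the value 7.0 stands in for a loaded result.

```python
def broadcast(source_id, value, stations, regs, reg_status):
    """Deliver (source_id, value) to every RS field and register waiting on source_id."""
    for rs in stations:
        if rs.get("qj") == source_id:
            rs["vj"], rs["qj"] = value, None
        if rs.get("qk") == source_id:
            rs["vk"], rs["qk"] = value, None
    for reg, producer in list(reg_status.items()):
        if producer == source_id:
            regs[reg] = value
            del reg_status[reg]              # no pending writer any more


# Mul1 waits on Load2 for src1; Add1 waits on Load2 for src2.
stations = [
    {"name": "Mul1", "vj": None, "vk": 2.5, "qj": "Load2", "qk": None},
    {"name": "Add1", "vj": 3.0, "vk": None, "qj": None, "qk": "Load2"},
]
regs = {"R2": 0.0}
reg_status = {"R2": "Load2", "R0": "Mul1", "R8": "Add1"}

broadcast("Load2", 7.0, stations, regs, reg_status)
print(stations, regs, reg_status)
```

A single broadcast updates both waiting stations and register R2 at once, which is why the bus carries the source id rather than a destination.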
Tomasulo Organization
(Figure: block diagram of the Tomasulo organization. The slide callouts are listed below.)
* Load buffers act as RS for load instructions.
* RS for the adder.
* RS for the multiplier.
* Store buffers act as RS for store instructions.
* Register file.
* Load buffers, the adder, and the multiplier send values and sources to the CDB. The CDB broadcasts the values and sources to the RS, the register file, and the store buffers.
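Tomasulo Example

Clock: cycle 0

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | - | - | - | - | - | - | - | - | - | - | - | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | No | - | - | - | - | -
- | Mul2 | No | - | - | - | - | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | - | - | -
LD | R2 | 45+ | R13 | - | - | -
MUL | R0 | R2 | R4 | - | - | -
SUB | R8 | R6 | R2 | - | - | -
DIV | R10 | R0 | R6 | - | - | -
ADD | R6 | R8 | R2 | - | - | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -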
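Tomasulo Example: Cycle 1

Clock: cycle 1

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | - | - | - | - | - | - | Load1 | - | - | - | - | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | No | - | - | - | - | -
- | Mul2 | No | - | - | - | - | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | - | -
LD | R2 | 45+ | R13 | - | - | -
MUL | R0 | R2 | R4 | - | - | -
SUB | R8 | R6 | R2 | - | - | -
DIV | R10 | R0 | R6 | - | - | -
ADD | R6 | R8 | R2 | - | - | -

Load Buffers:
Name | Busy | Address
Load1 | Yes | 34+R11
Load2 | No | -
Load3 | No | -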
Issue the first instruction at cycle 1.
Instruction 1 occupies the first load buffer entry.
Register R6 will be filled with the result from Load1.
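Tomasulo Example: Cycle 2

Clock: cycle 2

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | - | - | Load2 | - | - | - | Load1 | - | - | - | - | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | No | - | - | - | - | -
- | Mul2 | No | - | - | - | - | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2 | -
LD | R2 | 45+ | R13 | 2 | - | -
MUL | R0 | R2 | R4 | - | - | -
SUB | R8 | R6 | R2 | - | - | -
DIV | R10 | R0 | R6 | - | - | -
ADD | R6 | R8 | R2 | - | - | -

Load Buffers:
Name | Busy | Address
Load1 | Yes | 34+R11
Load2 | Yes | 45+R13
Load3 | No | -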
Issue the 2nd instruction at cycle 2, since the load buffers have empty entries. Unlike Scoreboarding, load instructions can be issued as long as there are free entries in the load buffers.
Instruction 2 occupies the 2nd load buffer entry.
Register R2 will be filled with the result from Load2.
Instruction 1 starts execution. It takes 2 cycles to complete.
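Tomasulo Example: Cycle 3

Clock: cycle 3

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | Load2 | - | - | - | Load1 | - | - | - | - | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | Yes | MUL | - | Value(R4) | Load2 | -
- | Mul2 | No | - | - | - | - | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | -
LD | R2 | 45+ | R13 | 2 | 3 | -
MUL | R0 | R2 | R4 | 3 | - | -
SUB | R8 | R6 | R2 | - | - | -
DIV | R10 | R0 | R6 | - | - | -
ADD | R6 | R8 | R2 | - | - | -

Load Buffers:
Name | Busy | Address
Load1 | Yes | 34+R11
Load2 | Yes | 45+R13
Load3 | No | -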
Issue the 3rd instruction at cycle 3.
Register R0 will be filled with the result from Mul1.
Occupy Mul1 with the 3rd instruction. Copy the value of register R4 into Vk (src2 value). Set Qj to “Load2”, indicating that src1 comes from “Load2”.
Instruction 1 finishes execution at the end of cycle 3.
Instruction 2 starts execution at cycle 3, assuming multiple memory loads can happen at the same time. It takes 2 cycles to execute.
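Tomasulo Example: Cycle 4

Clock: cycle 4

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | Load2 | - | - | - | M1 | - | Add1 | - | - | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | Yes | SUB | M1 | - | - | Load2
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | Yes | MUL | - | Value(R4) | Load2 | -
- | Mul2 | No | - | - | - | - | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | -
MUL | R0 | R2 | R4 | 3 | - | -
SUB | R8 | R6 | R2 | 4 | - | -
DIV | R10 | R0 | R6 | - | - | -
ADD | R6 | R8 | R2 | - | - | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | Yes | 45+R13
Load3 | No | -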
Issue the 4th instruction at cycle 4.
Register R8 will be filled with the result from Add1.
Occupy Add1 with the 4th instruction. Copy the value (M1) produced by “Load1” from the CDB into Vj (src1 value). Set Qk to “Load2”, indicating that src2 comes from “Load2”.
Instruction 1 writes back. The load buffer sends the result value M1 and id “Load1” onto the CDB.
Load1 is released.
Register R6 sees that id “Load1” on the CDB matches its source and picks up the value M1.
Instruction 2 finishes execution at cycle 4.
Instruction 3 is stalled because src1 is not yet available.
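Tomasulo Example: Cycle 5

Clock: cycle 5

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | M1 | - | Add1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | Yes | SUB | M1 | M2 | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | - | -
SUB | R8 | R6 | R2 | 4 | - | -
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | - | - | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -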
Instruction 2 writes back. The load buffer sends the result value M2 and id “Load2” onto the CDB.
Load2 is released.
Register R2 sees that id “Load2” on the CDB matches its source and picks up the value M2.
Issue the 5th instruction at cycle 5.
Register R10 will be filled with the result from Mul2.
Occupy Mul2 with the 5th instruction. Copy the value (M1) from R6 into Vk (src2 value). Set Qj to “Mul1”, indicating that src1 comes from “Mul1”.
Mul1 sees that id “Load2” on the CDB matches its source, picks up the value M2, and stores it in Vj.
Add1 sees that id “Load2” on the CDB matches its source, picks up the value M2, and stores it in Vk.
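Tomasulo Example: Cycle 6

Clock: cycle 6

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Add2 | - | Add1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
1 | Add1 | Yes | SUB | M1 | M2 | - | -
- | Add2 | Yes | ADD | - | M2 | Add1 | -
- | Add3 | No | - | - | - | - | -
9 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6 | -
SUB | R8 | R6 | R2 | 4 | 6 | -
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | - | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -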
Issue the 6th instruction at cycle 6, since there are empty slots in the RS for the adder.
Register R6 will be filled with the result from Add2.
Occupy Add2 with the 6th instruction. Copy the value (M2) from R2 into Vk (src2 value). Set Qj to “Add1”, indicating that src1 comes from “Add1”.
All source operands are ready, so instruction 3 starts execution.
All source operands are ready, so instruction 4 starts execution.
Instructions 3 and 4 still need 9 cycles and 1 cycle, respectively, to finish.
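Tomasulo Example: Cycle 7

Clock: cycle 7

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Add2 | - | Add1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
0 | Add1 | Yes | SUB | M1 | M2 | - | -
- | Add2 | Yes | ADD | - | M2 | Add1 | -
- | Add3 | No | - | - | - | - | -
8 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6 | -
SUB | R8 | R6 | R2 | 4 | 6-7 | -
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | - | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -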
Instruction 4 finishes execution.
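Tomasulo Example: Cycle 8

Clock: cycle 8

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Add2 | - | Val1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | Yes | ADD | Val1 | M2 | - | -
- | Add3 | No | - | - | - | - | -
7 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6 | -
SUB | R8 | R6 | R2 | 4 | 6-7 | 8
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | - | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -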
Instruction 4 writes back.
Add1 sends the result value Val1 and id “Add1” onto the CDB. Instruction 4 releases Add1.
Add2 sees id “Add1” on the CDB, which matches its Qj. Therefore, Add2 reads the value Val1 from the CDB into Vj.
Register R8 sees id “Add1” on the CDB, which matches its source, so it picks up the value Val1.
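Tomasulo Example: Cycle 9

Clock: cycle 9

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Add2 | - | Val1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
1 | Add2 | Yes | ADD | Val1 | M2 | - | -
- | Add3 | No | - | - | - | - | -
6 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6 | -
SUB | R8 | R6 | R2 | 4 | 6-7 | 8
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | 9 | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -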
Instruction 6 starts execution.
ADD takes 2 cycles; 1 cycle remains.
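Tomasulo Example: Cycle 10

Clock: cycle 10

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Add2 | - | Val1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
0 | Add2 | Yes | ADD | Val1 | M2 | - | -
- | Add3 | No | - | - | - | - | -
5 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6 | -
SUB | R8 | R6 | R2 | 4 | 6-7 | 8
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | 9-10 | -

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -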
Instruction 6 finishes execution.
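Tomasulo Example: Cycle 11

Clock: cycle 11

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Val2 | - | Val1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
4 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6 | -
SUB | R8 | R6 | R2 | 4 | 6-7 | 8
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | 9-10 | 11

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -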
Instruction 6 writes back.
Add2 sends the result value Val2 and id “Add2” onto the CDB. Instruction 6 releases Add2.
Register R6 sees id “Add2” on the CDB, which matches its source, so it picks up the value Val2. Unlike Scoreboarding, the WAR hazard on R6 does not delay this write back, because the registers are already renamed: instruction 5 (DIV) copied the old value of R6 (M1) into its reservation station at issue and now waits only on “Mul1”, so it no longer reads R6 from the register file.
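Tomasulo Example: Cycle 15

Clock: cycle 15

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Mul1 | - | M2 | - | - | - | Val2 | - | Val1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
0 | Mul1 | Yes | MUL | M2 | Value(R4) | - | -
- | Mul2 | Yes | DIV | - | M1 | Mul1 | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6-15 | -
SUB | R8 | R6 | R2 | 4 | 6-7 | 8
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | 9-10 | 11

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -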
Instruction 3 finishes execution.
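Tomasulo Example: Cycle 16

Clock: cycle 16

Register Result Status:
Reg | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | ...
FU | Val3 | - | M2 | - | - | - | Val2 | - | Val1 | - | Mul2 | ...

Reservation Stations for the Adder and Multiplier:
Time | Name | Busy | Op | Vj (Src1 Value) | Vk (Src2 Value) | Qj (Src1 RS Id) | Qk (Src2 RS Id)
- | Add1 | No | - | - | - | - | -
- | Add2 | No | - | - | - | - | -
- | Add3 | No | - | - | - | - | -
- | Mul1 | No | - | - | - | - | -
- | Mul2 | Yes | DIV | Val3 | M1 | - | -

Instruction Status:
Op | Dest i | Src1 j | Src2 k | IS | EX | WB
LD | R6 | 34+ | R11 | 1 | 2-3 | 4
LD | R2 | 45+ | R13 | 2 | 3-4 | 5
MUL | R0 | R2 | R4 | 3 | 6-15 | 16
SUB | R8 | R6 | R2 | 4 | 6-7 | 8
DIV | R10 | R0 | R6 | 5 | - | -
ADD | R6 | R8 | R2 | 6 | 9-10 | 11

Load Buffers:
Name | Busy | Address
Load1 | No | -
Load2 | No | -
Load3 | No | -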
Instruction 3 writes back.
Mul1 sends the result value Val3 and the id “Mul1” onto the CDB. Instruction 3 releases Mul1.
Mul2 sees id “Mul1” on the CDB, which matches its Qj. Therefore, Mul2 reads the value Val3 from the CDB into Vj.
Register R0 sees id “Mul1” on the CDB, which matches its source, so it picks up the value Val3.
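The write-back step at cycle 16 can be sketched in Python as follows. This is a simplified model, not the lecture's implementation: the dictionary layout and the function name `broadcast` are assumptions, and values such as "Val3" and "M1" are kept as symbolic strings, as in the slides.

```python
def broadcast(tag, value, stations, reg_status, regs):
    """Deliver (tag, value) from the CDB to every waiting RS operand and register."""
    for rs in stations.values():
        if rs.get("Qj") == tag:              # src1 was waiting on this producer
            rs["Vj"], rs["Qj"] = value, None
        if rs.get("Qk") == tag:              # src2 was waiting on this producer
            rs["Vk"], rs["Qk"] = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:                  # register still names this producer
            regs[reg] = value
            del reg_status[reg]

# State just before Mul1 writes back at cycle 16.
stations = {"Mul2": {"op": "DIV", "Vj": None, "Vk": "M1", "Qj": "Mul1", "Qk": None}}
reg_status = {"R0": "Mul1", "R10": "Mul2"}
regs = {}
broadcast("Mul1", "Val3", stations, reg_status, regs)
```

After the call, Mul2 holds both operands and R0 holds Val3, while R10 keeps waiting on Mul2.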
Tomasulo Example: Cycle 17
Clock: cycle 17
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    Val3  -     M2    -     -     -     Val2  -     Val1  -     Mul2
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  No
39    Mul2  Yes   DIV  Val3             M1               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R6        34+       R11       1   2-3    4
LD   R2        45+       R13       2   3-4    5
MUL  R0        R2        R4        3   6-15   16
SUB  R8        R6        R2        4   6-7    8
DIV  R10       R0        R6        5   17
ADD  R6        R8        R2        6   9-10   11
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 5 starts execution.
Tomasulo Example: Cycle 56
Clock: cycle 56
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    Val3  -     M2    -     -     -     Val2  -     Val1  -     Mul2
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  No
0     Mul2  Yes   DIV  Val3             M1               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R6        34+       R11       1   2-3    4
LD   R2        45+       R13       2   3-4    5
MUL  R0        R2        R4        3   6-15   16
SUB  R8        R6        R2        4   6-7    8
DIV  R10       R0        R6        5   17-56
ADD  R6        R8        R2        6   9-10   11
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 5 finishes execution.
Tomasulo Example: Cycle 57
Clock: cycle 57
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    Val3  -     M2    -     -     -     Val2  -     Val1  -     Val4
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  No
0     Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R6        34+       R11       1   2-3    4
LD   R2        45+       R13       2   3-4    5
MUL  R0        R2        R4        3   6-15   16
SUB  R8        R6        R2        4   6-7    8
DIV  R10       R0        R6        5   17-56  57
ADD  R6        R8        R2        6   9-10   11
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 5 writes back.
The Instruction Status table shows in-order issue (IS), out-of-order execution (EX), and out-of-order completion (WB).
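These three properties can be checked directly against the final Instruction Status table. The sketch below copies the issue, execution-start, and write-back cycles from that table; the helper name `in_program_order` is hypothetical.

```python
# (issue, execution start, write back) cycles from the final Instruction Status table.
timeline = {
    "LD R6":   (1, 2, 4),
    "LD R2":   (2, 3, 5),
    "MUL R0":  (3, 6, 16),
    "SUB R8":  (4, 6, 8),
    "DIV R10": (5, 17, 57),
    "ADD R6":  (6, 9, 11),
}

def in_program_order(column):
    """True if the chosen stage's cycles never decrease in program order."""
    cycles = [stages[column] for stages in timeline.values()]
    return all(a <= b for a, b in zip(cycles, cycles[1:]))

issue_in_order = in_program_order(0)       # IS column
exec_in_order = in_program_order(1)        # start of EX
writeback_in_order = in_program_order(2)   # WB column
```

Only the issue column is monotonic: ADD starts executing (cycle 9) before DIV (cycle 17), and SUB writes back (cycle 8) before MUL (cycle 16).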
Comparing to Scoreboard
- Compared to Scoreboard’s 62 cycles, Tomasulo’s Algorithm takes only 57 cycles to execute the same code.
- Why is Tomasulo’s Algorithm faster?
  - Fewer structural hazards (Scoreboarding cannot issue if functional units are busy).
  - The CDB acts as a data forwarding mechanism: write back also forwards the value to the RS, eliminating a separate register read.
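As a quick arithmetic check of the comparison above, the cycle counts from the slide give the following savings; the variable names are illustrative.

```python
scoreboard_cycles = 62   # from the slide
tomasulo_cycles = 57     # from the slide

cycles_saved = scoreboard_cycles - tomasulo_cycles
speedup = scoreboard_cycles / tomasulo_cycles   # about 1.09x on this example
```

The gain on this short sequence is modest because the 40-cycle DIV dominates the total in both schemes.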
Tomasulo Example 2
Clock: cycle 0
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     -     -     -     -     -     -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  No
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11
MUL  R1        16        R1
ADD  R5        R5        R1
LD   R1        4+        R11
MUL  R1        16        R1
ADD  R5        R5        R1
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Tomasulo Example 2: Cycle 1
Clock: cycle 1
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Load1 -     -     -     -     -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  No
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1
MUL  R1        16        R1
ADD  R5        R5        R1
LD   R1        4+        R11
MUL  R1        16        R1
ADD  R5        R5        R1
Load Buffers:
Name   Busy  Address
Load1  Yes   0 + R11
Load2  No
Load3  No
Issue the first instruction at cycle 1.
Occupy the first load buffer entry with instruction 1.
Register R1 will be filled with the result from Load1.
Tomasulo Example 2: Cycle 2
Clock: cycle 2
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul1  -     -     -     -     -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  Yes   MUL  16               -                -                Load1
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2
MUL  R1        16        R1        2
ADD  R5        R5        R1
LD   R1        4+        R11
MUL  R1        16        R1
ADD  R5        R5        R1
Load Buffers:
Name   Busy  Address
Load1  Yes   0 + R11
Load2  No
Load3  No
Issue the 2nd instruction at cycle 2.
Occupy Mul1 with the 2nd instruction. Copy the immediate value 16 into Vj (src1 value). Set Qk to “Load1”, indicating that src2 comes from “Load1”.
Register R1 will be filled with the result from Mul1. Note that the value from Load1 will no longer be copied into R1. Instead, the result of Load1 will be forwarded to Mul1 directly.
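The renaming performed at issue can be sketched in Python. This is a minimal model, assuming the register result status is a plain dictionary from register name to producer tag; the function `issue` and its signature are hypothetical. The key detail is that the sources are looked up before the destination is renamed, so `MUL R1, 16, R1` reads the old producer of R1.

```python
reg_status = {}   # register -> tag of the station that will write it

def issue(station, dest, sources):
    """Return the tags (Q fields) this instruction waits on, then rename dest."""
    waits = [reg_status.get(src) for src in sources]  # None means the value is ready
    reg_status[dest] = station                        # later readers wait on this tag
    return waits

q_ld = issue("Load1", "R1", ["R11"])   # cycle 1: LD R1, 0+R11
q_mul = issue("Mul1", "R1", ["R1"])    # cycle 2: MUL R1, 16, R1 (16 is an immediate)
```

After the second call, R1 names Mul1 as its producer, so the result of Load1 reaches only Mul1 and is never written to R1.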
Instruction 1 starts executing. It takes 2 cycles to complete.
Tomasulo Example 2: Cycle 3
Clock: cycle 3
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul1  -     -     -     Add1  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  No
      Add3  No
      Mul1  Yes   MUL  16               -                -                Load1
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3
MUL  R1        16        R1        2
ADD  R5        R5        R1        3
LD   R1        4+        R11
MUL  R1        16        R1
ADD  R5        R5        R1
Load Buffers:
Name   Busy  Address
Load1  Yes   0 + R11
Load2  No
Load3  No
Issue the 3rd instruction at cycle 3.
Register R5 will be filled with the result from Add1.
Occupy Add1 with the 3rd instruction. Copy the value of register R5 into Vj (src1 value). Set Qk to “Mul1”, indicating that src2 comes from “Mul1”.
Instruction 1 finishes execution at the end of cycle 3.
Tomasulo Example 2: Cycle 4
Clock: cycle 4
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Load2 -     -     -     Add1  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  No
      Add3  No
      Mul1  Yes   MUL  16               M1               -                -
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2
ADD  R5        R5        R1        3
LD   R1        4+        R11       4
MUL  R1        16        R1
ADD  R5        R5        R1
Load Buffers:
Name   Busy  Address
Load1  No
Load2  Yes   4 + R11
Load3  No
Issue the 4th instruction at cycle 4.
Occupy the 2nd load buffer entry with instruction 4.
Register R1 will be filled with the result from Load2. Note that the value from Mul1 will no longer be copied into R1. Instead, the result of Mul1 will be forwarded to Add1 directly.
Instruction 1 writes back. The load buffer sends the result value M1 and the id “Load1” onto the CDB.
Load1 is released.
Mul1 sees that the id “Load1” on the CDB matches its source, picks up the value M1, and sets it to Vk.
Tomasulo Example 2: Cycle 5
Clock: cycle 5
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add1  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  No
      Add3  No
9     Mul1  Yes   MUL  16               M1               -                -
      Mul2  Yes   MUL  16               -                -                Load2
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5
ADD  R5        R5        R1        3
LD   R1        4+        R11       4   5
MUL  R1        16        R1        5
ADD  R5        R5        R1
Load Buffers:
Name   Busy  Address
Load1  No
Load2  Yes   4 + R11
Load3  No
Issue the 5th instruction at cycle 5.
Register R1 will be filled with the result from Mul2. Note that the value from Load2 will no longer be copied into R1. Instead, the result of Load2 will be forwarded to Mul2 directly.
Occupy Mul2 with the 5th instruction. Copy the immediate value 16 into Vj (src1 value). Set Qk to “Load2”, indicating that src2 comes from “Load2”.
Instruction 2 starts executing at Mul1.
Instruction 2 still needs 9 cycles to finish.
Instruction 4 starts executing at Load2.
Tomasulo Example 2: Cycle 6
Clock: cycle 6
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
8     Mul1  Yes   MUL  16               M1               -                -
      Mul2  Yes   MUL  16               -                -                Load2
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5
ADD  R5        R5        R1        3
LD   R1        4+        R11       4   5-6
MUL  R1        16        R1        5
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  Yes   4 + R11
Load3  No
Issue the 6th instruction at cycle 6.
Register R5 will be filled with the result from Add2. Note that the value from Add1 will no longer be copied into R5. Instead, the result of Add1 will be forwarded to Add2 directly.
Occupy Add2 with the 6th instruction. Neither operand is ready; they come from “Add1” and “Mul2”, respectively.
Instruction 4 finishes execution at the end of cycle 6.
Tomasulo Example 2: Cycle 7
Clock: cycle 7
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
7     Mul1  Yes   MUL  16               M1               -                -
      Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5
ADD  R5        R5        R1        3
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 4 writes back. The load buffer sends the result value M2 and the id “Load2” onto the CDB.
Load2 is released.
Mul2 sees that the id “Load2” on the CDB matches its source, picks up the value M2, and sets it to Vk.
Tomasulo Example 2: Cycle 8
Clock: cycle 8
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
6     Mul1  Yes   MUL  16               M1               -                -
9     Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5
ADD  R5        R5        R1        3
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 5 starts executing at Mul2, assuming we have two multipliers. We now have two iterations of the loop running at the same time.
Instructions 2 and 5 still need 6 and 9 cycles to finish, respectively.
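The Time column of the reservation stations can be reproduced with a short Python sketch. The latencies below are an assumption inferred from the EX ranges in the Instruction Status tables (for example, MUL occupies cycles 5-14 and DIV occupies cycles 17-56); the names `LATENCY`, `finish_cycle`, and `remaining` are hypothetical.

```python
# Execution latencies in cycles, inferred from the EX ranges shown in the tables.
LATENCY = {"LD": 2, "ADD": 2, "SUB": 2, "MUL": 10, "DIV": 40}

def finish_cycle(op, start):
    """Last cycle of execution for an operation that starts executing at `start`."""
    return start + LATENCY[op] - 1

def remaining(op, start, now):
    """Cycles still needed after cycle `now` (the Time column)."""
    return finish_cycle(op, start) - now

first_mul_left = remaining("MUL", start=5, now=8)    # instruction 2 at cycle 8
second_mul_left = remaining("MUL", start=8, now=8)   # instruction 5 at cycle 8
```

With these assumed latencies, the two multiplies overlap for six cycles, which is the hardware loop unrolling that Scoreboarding could not achieve.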
Tomasulo Example 2: Cycle 14
Clock: cycle 14
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          -                -                Mul1
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
0     Mul1  Yes   MUL  16               M1               -                -
3     Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14
ADD  R5        R5        R1        3
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 2 finishes executing.
Tomasulo Example 2: Cycle 15
Clock: cycle 15
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  Yes   ADD  Val(R5)          Val1             -                -
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
      Mul1  No
2     Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 2 writes back.
Mul1 sends the result value Val1 and the id “Mul1” onto the CDB. Instruction 2 releases Mul1.
Add1 sees that the id “Mul1” on the CDB matches its source, picks up the value Val1, and sets it to Vk.
Tomasulo Example 2: Cycle 16
Clock: cycle 16
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
1     Add1  Yes   ADD  Val(R5)          Val1             -                -
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
      Mul1  No
1     Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 3 starts executing at Add1.
Instructions 3 and 5 each still need 1 cycle to finish.
Tomasulo Example 2: Cycle 17
Clock: cycle 17
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
0     Add1  Yes   ADD  Val(R5)          Val1             -                -
      Add2  Yes   ADD  -                -                Add1             Mul2
      Add3  No
      Mul1  No
0     Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16-17
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8-17
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 3 finishes executing.
Instruction 5 finishes executing.
Tomasulo Example 2: Cycle 18
Clock: cycle 18
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Mul2  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  Yes   ADD  Val2             -                -                Mul2
      Add3  No
      Mul1  No
0     Mul2  Yes   MUL  16               M2               -                -
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16-17  18
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8-17
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 3 writes back.
Add1 sends the result value Val2 and the id “Add1” onto the CDB. Instruction 3 releases Add1.
Add2 sees that the id “Add1” on the CDB matches its source, picks up the value Val2, and sets it to Vj.
Instruction 5 cannot write back because the CDB is busy (structural hazard).
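The CDB conflict at cycle 18 can be modelled with a small Python sketch. It assumes a single CDB that carries one result per cycle and an arbitration policy of "earliest finished first, ties broken by program order"; the slides do not state the policy, so this is an assumption, and the function name `schedule_writebacks` is hypothetical.

```python
def schedule_writebacks(finished):
    """Assign write-back cycles on a single CDB.

    finished: list of (instruction number, cycle in which execution finished).
    Assumed policy: earliest finished first, ties broken by program order.
    """
    pending = sorted(finished, key=lambda f: (f[1], f[0]))
    writeback = {}
    cdb_free_at = 0                            # first cycle in which the CDB is free
    for instr, done in pending:
        cycle = max(done + 1, cdb_free_at)     # cannot write back before finishing
        writeback[instr] = cycle
        cdb_free_at = cycle + 1                # the CDB is occupied for this cycle
    return writeback

# Instructions 3 (ADD) and 5 (MUL) both finish executing in cycle 17.
wb = schedule_writebacks([(3, 17), (5, 17)])
```

Under this policy instruction 3 writes back in cycle 18 and instruction 5 is delayed to cycle 19, matching the Instruction Status table.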
Tomasulo Example 2: Cycle 19
Clock: cycle 19
Register Result Status:
      R0    R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   ...
FU    -     Val3  -     -     -     Add2  -     -     -     -     -
Reservation Stations for the Adder and Multiplier
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  Yes   ADD  Val2             Val3             -                -
      Add3  No
      Mul1  No
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16-17  18
LD   R1        4+        R11       4   5-6    7
MUL  R1        16        R1        5   8-17   19
ADD  R5        R5        R1        6
Load Buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 5 writes back.
Mul2 sends the result value Val3 and its id "Mul2" onto the CDB. Instruction 5 releases Mul2.
Add2 sees that the id "Mul2" on the CDB matches its source, picks up the value Val3, and stores it in Vk.
Register R1 sees the id "Mul2" on the CDB, which matches its source, so it picks up the value Val3.
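The tag match described above can be sketched in a few lines. This is a minimal model, not the hardware design: the class `ReservationStation`, the function `broadcast`, and the dictionary-based register status are names invented for the illustration, and the state before the broadcast (R1 tagged with Mul2, R5 tagged with Add2) is assumed from the annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # Src1 value
    vk: Optional[str] = None   # Src2 value
    qj: Optional[str] = None   # id of the station producing Src1
    qk: Optional[str] = None   # id of the station producing Src2

    def ready(self):
        """An instruction can start executing once both operands are values."""
        return self.busy and self.qj is None and self.qk is None


def broadcast(tag, value, stations, reg_status, reg_file):
    """Put (tag, value) on the CDB; every waiting consumer compares tags."""
    for rs in stations:
        if rs.name == tag:
            rs.busy = False                  # the producer releases its station
            continue
        if rs.busy and rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.busy and rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            reg_file[reg] = value            # the register picks up the value
            reg_status[reg] = None


# State just before the cycle 19 write-back (assumed from the annotations).
add2 = ReservationStation("Add2", busy=True, op="ADD", vj="Val2", qk="Mul2")
mul2 = ReservationStation("Mul2", busy=True, op="MUL", vj="16", vk="M2")
reg_status = {"R1": "Mul2", "R5": "Add2"}
reg_file = {}

broadcast("Mul2", "Val3", [add2, mul2], reg_status, reg_file)
print(add2, reg_file)
```

After the broadcast Add2 holds both operands and is ready, which is why instruction 6 can start executing in cycle 20.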
Tomasulo Example 2: Cycle 20
Clock: cycle 20
Register Result Status:
      R0  R1    R2  R3  R4  R5    R6  R7  R8  R9  R10  ...
FU        Val3              Add2
Reservation Stations for the Adder and Multiplier:
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
1     Add2  Yes   ADD  Val2             Val3
      Add3  No
      Mul1  No
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16-17  18
LD   R1        4         R11       4   5-6    7
MUL  R1        16        R1        5   8-17   19
ADD  R5        R5        R1        6   20
Load Buffers:
       Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 6 starts executing.
Tomasulo Example 2: Cycle 21
Clock: cycle 21
Register Result Status:
      R0  R1    R2  R3  R4  R5    R6  R7  R8  R9  R10  ...
FU        Val3              Add2
Reservation Stations for the Adder and Multiplier:
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
0     Add2  Yes   ADD  Val2             Val3
      Add3  No
      Mul1  No
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16-17  18
LD   R1        4         R11       4   5-6    7
MUL  R1        16        R1        5   8-17   19
ADD  R5        R5        R1        6   20-21
Load Buffers:
       Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 6 finishes executing.
Tomasulo Example 2: Cycle 22
Clock: cycle 22
Register Result Status:
      R0  R1    R2  R3  R4  R5    R6  R7  R8  R9  R10  ...
FU        Val3              Val4
Reservation Stations for the Adder and Multiplier:
Time  Name  Busy  Op   Vj (Src1 value)  Vk (Src2 value)  Qj (Src1 RS id)  Qk (Src2 RS id)
      Add1  No
      Add2  No
      Add3  No
      Mul1  No
      Mul2  No
Instruction Status:
Op   Dest (i)  Src1 (j)  Src2 (k)  IS  EX     WB
LD   R1        0+        R11       1   2-3    4
MUL  R1        16        R1        2   5-14   15
ADD  R5        R5        R1        3   16-17  18
LD   R1        4         R11       4   5-6    7
MUL  R1        16        R1        5   8-17   19
ADD  R5        R5        R1        6   20-21  22
Load Buffers:
       Busy  Address
Load1  No
Load2  No
Load3  No
Instruction 6 writes back.
Add2 sends the result value Val4 and its id "Add2" onto the CDB. Instruction 6 releases Add2.
Register R5 sees the id "Add2" on the CDB, which matches its source, so it picks up the value Val4.
Tomasulo Review
• Reservation Stations
  – RS provide many more registers than those nameable in the assembly program.
  – Distribute RAW hazard detection.
  – Renaming eliminates WAW hazards.
  – Buffering values in reservation stations removes WAR hazards.
  – Tag matching on the CDB requires many associative compares.
• Common Data Bus
  – Broadcasts results to the RS and the register file.
  – Can be a bottleneck: supporting multiple writebacks per cycle requires multiple CDBs, which is expensive.
Load/Store Reordering
• If a store and a load to the same memory address are reordered, the store buffer has to forward the value to the load buffer.
  – That is, if "store R1" and "load R1" are reordered into "load R1" followed by "store R1", the load cannot simply issue to memory; it must wait for the store buffer to forward the value.
• To support this forwarding, every load has to check the memory addresses in the store buffers before executing.
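The address check above can be sketched as follows. This is a simplified model under stated assumptions: the function name `resolve_load`, the list-of-pairs store buffer, and the rule that every older store address is already known are all choices made for the illustration, not details from the slides.

```python
def resolve_load(load_addr, store_buffer, memory):
    """Decide where a load gets its value.

    store_buffer holds the older, not yet committed stores in program order
    as (address, value) pairs; value is None while the store data is pending.
    Assumption: every older store address is already known.
    """
    # Scan from the youngest older store to the oldest, so the load sees
    # the most recent write to its address.
    for st_addr, st_value in reversed(store_buffer):
        if st_addr == load_addr:
            if st_value is None:
                return ("wait", None)        # matching store has no data yet
            return ("forward", st_value)     # store-to-load forwarding
    return ("memory", memory[load_addr])     # no conflict: read memory


memory = {0x100: 7, 0x104: 9}
store_buffer = [(0x100, 42), (0x108, None)]

print(resolve_load(0x100, store_buffer, memory))
print(resolve_load(0x104, store_buffer, memory))
print(resolve_load(0x108, store_buffer, memory))
```

The comparison against every store buffer entry is another associative search, similar in cost to the tag match on the CDB.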
Tomasulo vs. Scoreboarding
1. No explicit checking for WAW or WAR hazards.
2. The CDB broadcasts results rather than waiting on registers.
3. Loads/stores are treated like basic FUs.
4. Distributed (Tomasulo) vs. centralized (scoreboarding) control.
Register Renaming: Pointer-Based
• Registers are renamed (mapped) to an internal register.
• A map table records this renaming/mapping.
• Mapper/Map Table: hardware to hold these mappings
  – Register writes: allocate a new location and note the mapping in the table.
  – Register reads: look in the map table to find the location of the most recent write.
• Deallocate mappings when done.
• Used in MIPS R10K, Alpha 21264, Pentium 4 and POWER4.
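The map table steps above can be sketched as a small model. Everything here is hypothetical: the class `RenameMap`, the integer physical register numbers, and the free list policy are assumptions for illustration and do not describe any of the processors listed. The instruction sequence is the one from Example 2, with the immediate operands left out.

```python
class RenameMap:
    """Maps architectural register names to physical register numbers."""

    def __init__(self, arch_regs, num_physical):
        # Initially each architectural register owns one physical register.
        self.table = {r: i for i, r in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_physical))

    def read(self, reg):
        """Register read: find the location of the most recent write."""
        return self.table[reg]

    def write(self, reg):
        """Register write: allocate a new location and note the mapping."""
        if not self.free:
            raise RuntimeError("no free physical register; rename must stall")
        old = self.table[reg]
        new = self.free.pop(0)
        self.table[reg] = new
        return new, old

    def release(self, phys):
        """Deallocate a mapping once no instruction can still read it."""
        self.free.append(phys)


def rename(instrs, rmap):
    renamed = []
    for op, dest, srcs in instrs:
        # Sources are looked up before the destination is remapped, so
        # "MUL R1, 16, R1" reads the old R1 and writes a new one.
        p_srcs = [rmap.read(s) for s in srcs]
        p_dest, _ = rmap.write(dest)
        renamed.append((op, p_dest, p_srcs))
    return renamed


program = [
    ("LD", "R1", ["R11"]),
    ("MUL", "R1", ["R1"]),
    ("ADD", "R5", ["R5", "R1"]),
    ("LD", "R1", ["R11"]),
    ("MUL", "R1", ["R1"]),
    ("ADD", "R5", ["R5", "R1"]),
]
rmap = RenameMap(["R1", "R5", "R11"], num_physical=10)
renamed = rename(program, rmap)
for instr in renamed:
    print(instr)
```

After renaming, the second LD writes a different physical register than the first, so it no longer has a WAW or WAR conflict with the first iteration's MUL and ADD.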
Acknowledgment
This lecture is partially based on the slides from Dr. David Brooks. The first Tomasulo example was based on Dr. Broderson's example for CS152 at UCB (Copyright (C) 2001 UCB).