comp25212 advanced pipelining out of order processors
TRANSCRIPT
From Monday…
Out-of-Order Execution with Scoreboard
• Centralized data structure which tracks the status of registers, FUs and instructions and creates, dynamically in hardware, the dependency graph
– The centralized nature limits scalability:
– Small number of FUs and small window of instructions
• Dependencies
– RAW – stall conflicted instruction
– WAW – stall the pipeline
– WAR – stall WB
Tomasulo’s Algorithm
• Control logic for out-of-order execution is decentralized
– Reservation Stations (RS) in the functional units keep instruction information
– In addition RS seamlessly rename registers
• A Common Data Bus (CDB) broadcasts data and results to the different devices
– Only one instruction can finish each cycle
• Distributed control allows for a larger window of instructions – Dynamic scheduling
Tomasulo’s Algorithm
• Structural hazards stall the pipeline
• RS track when operands become available and buffer them as soon as they are
– No need to access the register bank (to store values or read sources)
• Impact of RAW dependencies is limited
– Execute an instruction when its operands are available
• WAW and WAR dependencies are avoided
– Register renaming
DIV.D F0, F2, F4
ADD.D F6, F0, F8
ST.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
Antidependence
Output dependence
Register Renaming (Example)
• Eliminates WAR and WAW hazards by renaming all destination registers.
• Can be done by compiler
True dependences
Temporary registers introduced by renaming: S, T
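The renaming idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the hardware mechanism: the temporary names T0, T1, ... and the tuple encoding of instructions are assumptions made for the example. Every destination gets a fresh name and later readers follow the newest name, so only true (RAW) dependences remain.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh temporary; sources follow the newest name."""
    fresh = (f"T{i}" for i in count())             # hypothetical temporary names
    current = {}                                   # architectural reg -> latest name
    renamed = []
    for op, dest, *srcs in program:
        srcs = [current.get(s, s) for s in srcs]   # read the most recent producer
        if dest is not None:
            current[dest] = next(fresh)            # every write gets a new name
            dest = current[dest]
        renamed.append((op, dest, *srcs))
    return renamed

# (op, destination, sources...); the store has no destination register.
program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("ST.D",  None, "F6", "R1"),
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]
renamed = rename(program)
for instr in renamed:
    print(instr)
```

After renaming, no two instructions write the same register (no WAW) and SUB.D no longer writes the F8 that ADD.D reads (no WAR).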
Tomasulo Organization
[Figure: Tomasulo organization. Components shown: FP Op Queue; FP Registers; Load Buffers (Load1 to Load6, from memory); Store Buffers (to memory); Reservation Stations Add1 to Add3 (FP adders) and Mult1 to Mult2 (FP multipliers); Common Data Bus (CDB).]
Normal data bus: data + destination
Common data bus: data + source
Stages of a Tomasulo Pipeline
[Figure: an Issue stage feeding parallel Execute and WriteBack paths: Integer, FP Multiplication, FP Add, FP Division, FP Multiplication.]
Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both source operands are ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source address
– Write if matches expected Functional Unit (produces result)
– Does the broadcast
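A minimal Python sketch of the “come from” behaviour, assuming a hypothetical `Station` record and using `None` (instead of 0) to mean "no producer pending". The result is tagged with its source, and every station waiting on that tag captures the value in the same broadcast.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Station:
    name: str
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # tag of the unit that will produce the operand
    qk: Optional[str] = None     # None plays the role of "Q = 0 => ready"

    def ready(self) -> bool:
        return self.qj is None and self.qk is None

def broadcast(stations, source_tag, value):
    """Put (value, source_tag) on the CDB; every waiting station snoops it."""
    for rs in stations:
        if rs.qj == source_tag:
            rs.vj, rs.qj = value, None
        if rs.qk == source_tag:
            rs.vk, rs.qk = value, None

# Two stations wait on the same load; the values are made up for the example.
stations = [
    Station("Add1", vj=1.5, qk="Load2"),
    Station("Mult1", vk=2.0, qj="Load2"),
]
broadcast(stations, "Load2", 4.0)
print([(rs.name, rs.vj, rs.vk, rs.ready()) for rs in stations])
```

One broadcast wakes both stations, which is why no register-file read is needed for forwarded operands.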
Tomasulo Example
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2                                    Load1   No
LD     F2   45+  R3                                    Load2   No
MULTD  F0   F2   F4                                    Load3   No
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
0      FU
Instruction stream: Tomasulo does not need the instruction status information. We will show the times for each stage, for convenience.
Reservation Station Components
No information about instructions needed
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Values of the source operands
– Store buffers have a V field: the result to be stored
Qj, Qk: Reservation stations producing the source registers (value to be written)
– Note: Qj, Qk = 0 => ready
– Store buffers only have Qi, for the RS producing the result
Busy: Indicates reservation station or FU is busy
Tomasulo Example: reading the tables
• Instruction status, j and k: source registers
• Load buffers: 3 load buffers
• Reservation stations: 3 adder, 2 multiplication
– Time: FU countdown
– Vj, Vk (S1, S2): source registers
– Qj, Qk (RS): which FU will produce operands
Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
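How the issue step could use these fields is sketched below in Python. The dictionaries, the `issue` function and the register values are assumptions for illustration, not a description of real hardware. A source either gets its value from the register bank (V field) or the tag of the station that will produce it (Q field), and the destination's register-status entry is pointed at the issuing station.

```python
def issue(op, dest, src1, src2, station, rs_table, reg_status, reg_file):
    """Fill reservation station `station` for one instruction (illustrative)."""
    entry = {"Busy": True, "Op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
    for src, v, q in ((src1, "Vj", "Qj"), (src2, "Vk", "Qk")):
        if src in reg_status:              # a pending instruction will write src
            entry[q] = reg_status[src]     # wait for that station's broadcast
        else:
            entry[v] = reg_file[src]       # value is already in the register bank
    rs_table[station] = entry
    reg_status[dest] = station             # renaming: dest now refers to this station

reg_file = {"F2": 2.0, "F4": 4.0, "F6": 6.0}   # made-up register contents
reg_status = {"F2": "Load2"}                    # Load2 has not written F2 yet
rs_table = {}
issue("MULTD", "F0", "F2", "F4", "Mult1", rs_table, reg_status, reg_file)
issue("SUBD", "F8", "F6", "F2", "Add1", rs_table, reg_status, reg_file)
print(rs_table["Mult1"])
print(rs_table["Add1"])
print(reg_status)
```

Both stations end up waiting on the tag Load2 rather than on register F2 itself.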
Tomasulo Example
• Clock: clock cycle counter
• Register result status (FU row): which RS will write each register
A Tomasulo Example
The following code is run on a Tomasulo pipeline with:
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Functional Unit (FU)       # of FUs   EX cycles
FP Multiply/Division       2          10/40
FP Addition/Subtraction    3          2
Mem Load                   3          2
Functional units not pipelined
Dependency Graph For Example Code
1  L.D   F6, 34(R2)
2  L.D   F2, 45(R3)
3  MUL.D F0, F2, F4
4  SUB.D F8, F6, F2
5  DIV.D F10, F0, F6
6  ADD.D F6, F8, F2
[Figure: dependency graph over instructions 1 to 6, with edges marked as real data dependence (RAW), anti-dependence (WAR) or output dependence (WAW).]
Data dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
Output dependence (WAW): (1, 6)
Anti-dependence (WAR): (4, 6) (5, 6)
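The dependence pairs can be checked mechanically. The Python sketch below is a simple pairwise comparison under stated assumptions: the (destination, sources) encoding is hypothetical, and it does not account for intervening writes to the same register, which does not matter for this code. Note that a strict check also reports (4, 6) as an anti-dependence, because SUB.D reads F6 before ADD.D writes it.

```python
def dependences(program):
    """Return sorted RAW, WAR and WAW pairs (i, j), 1-based, with i before j."""
    raw, war, waw = [], [], []
    for j, (dest_j, srcs_j) in enumerate(program, start=1):
        for i, (dest_i, srcs_i) in enumerate(program[:j - 1], start=1):
            if dest_i in srcs_j:          # j reads what i wrote
                raw.append((i, j))
            if dest_j in srcs_i:          # j overwrites what i reads
                war.append((i, j))
            if dest_i == dest_j:          # both write the same register
                waw.append((i, j))
    return sorted(raw), sorted(war), sorted(waw)

# (destination, sources) for the six instructions of the example
program = [
    ("F6",  ("R2",)),          # 1: L.D   F6, 34(R2)
    ("F2",  ("R3",)),          # 2: L.D   F2, 45(R3)
    ("F0",  ("F2", "F4")),     # 3: MUL.D F0, F2, F4
    ("F8",  ("F6", "F2")),     # 4: SUB.D F8, F6, F2
    ("F10", ("F0", "F6")),     # 5: DIV.D F10, F0, F6
    ("F6",  ("F8", "F2")),     # 6: ADD.D F6, F8, F2
]
raw, war, waw = dependences(program)
print("RAW:", raw)
print("WAR:", war)
print("WAW:", waw)
```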
Tomasulo Example Cycle 1
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1                                Load1   Yes   34+R2
LD     F2   45+  R3                                    Load2   No
MULTD  F0   F2   F4                                    Load3   No
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
1      FU                       Load1
LD#1 issued
Tomasulo Example Cycle 2
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1                                Load1   Yes   34+R2
LD     F2   45+  R3   2                                Load2   Yes   45+R3
MULTD  F0   F2   F4                                    Load3   No
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
2      FU         Load2         Load1
LD#2 issued
Tomasulo Example Cycle 3
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3                         Load1   Yes   34+R2
LD     F2   45+  R3   2                                Load2   Yes   45+R3
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD           R(F4)    Load2
      Mult2  No
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
3      FU  Mult1  Load2         Load1
• MULTD is issued
• LD#1 completes and broadcasts its result
Tomasulo Example Cycle 4
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4                         Load2   Yes   45+R3
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4
DIVD   F10  F0   F6
ADDD   F6   F8   F2
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   Yes   SUBD   M(A1)                     Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD           R(F4)    Load2
      Mult2  No
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
4      FU  Mult1  Load2         M(A1)    Add1
• SUBD is issued
• LD#1 result updates the register bank
• LD#2 completes, broadcasting its result
Tomasulo Example Cycle 5
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
2     Add1   Yes   SUBD   M(A1)    M(A2)
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
5      FU  Mult1  M(A2)         M(A1)    Add1   Mult2
• DIVD is issued
• LD#2 result updates the register bank
• Add1, Mult1 start execution
Tomasulo Example Cycle 6
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
1     Add1   Yes   SUBD   M(A1)    M(A2)
      Add2   Yes   ADDD            M(A2)    Add1
      Add3   No
9     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
6      FU  Mult1  M(A2)         Add2     Add1   Mult2
• ADDD issued
Tomasulo Example Cycle 7
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
0     Add1   Yes   SUBD   M(A1)    M(A2)
      Add2   Yes   ADDD            M(A2)    Add1
      Add3   No
8     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
7      FU  Mult1  M(A2)         Add2     Add1   Mult2
• Add1 (SUBD) completes and broadcasts result
Tomasulo Example Cycle 8
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
2     Add2   Yes   ADDD   (M-M)    M(A2)
      Add3   No
7     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
8      FU  Mult1  M(A2)         Add2     (M-M)  Mult2
• Add1 (SUBD) result updates the register bank
• Add2 (ADDD) starts execution
Tomasulo Example Cycle 9
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
1     Add2   Yes   ADDD   (M-M)    M(A2)
      Add3   No
6     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
9      FU  Mult1  M(A2)         Add2     (M-M)  Mult2
• ADDD and MULTD continue execution
Tomasulo Example Cycle 10
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
0     Add2   Yes   ADDD   (M-M)    M(A2)
      Add3   No
5     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
10     FU  Mult1  M(A2)         Add2     (M-M)  Mult2
• Add2 (ADDD) completes
Tomasulo Example Cycle 11
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
11     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• ADDD result updates the register bank
Tomasulo Example Cycle 12
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
3     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
12     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD continues execution
Tomasulo Example Cycle 13
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
2     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
13     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD continues execution
Tomasulo Example Cycle 14
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3                                Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
14     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD continues execution
Tomasulo Example Cycle 15
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3      15                        Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
15     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD completes and broadcasts result
Tomasulo Example Cycle 16
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3      15         16             Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
40    Mult2  Yes   DIVD   M*F4     M(A1)
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
16     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD result updates the register bank
• DIVD starts execution
Tomasulo Example Cycle 55
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3      15         16             Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
1     Mult2  Yes   DIVD   M*F4     M(A1)
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
55     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
• DIVD is about to complete
Tomasulo Example Cycle 56
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3      15         16             Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5      56
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   DIVD   M*F4     M(A1)
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
56     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
• DIVD completes
Tomasulo Example Cycle 57
Instruction status:
Instruction j    k    Issue  Exec Comp  Write Result   Buffer  Busy  Address
LD     F6   34+  R2   1      3          4              Load1   No
LD     F2   45+  R3   2      4          5              Load2   No
MULTD  F0   F2   F4   3      15         16             Load3   No
SUBD   F8   F6   F2   4      7          8
DIVD   F10  F0   F6   5      56         57
ADDD   F6   F8   F2   6      10         11
Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
57     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Result
• DIVD result updates the register bank
In-order issue, out-of-order execution, out-of-order completion
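The Issue / Exec Comp / Write Result columns of the walkthrough can be reproduced with a very small timing model. This Python sketch is a simplification under stated assumptions: a reservation station is always free, one instruction issues per cycle, execution starts the cycle after the last operand is broadcast, the result is written one cycle after execution completes, and no two results compete for the CDB (true for this code). The function and variable names are made up for the example.

```python
# EX cycles from the functional-unit table of the example
LATENCY = {"LD": 2, "MULTD": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}

# (op, destination, FP source registers); load base registers are assumed ready
program = [
    ("LD",    "F6",  []),
    ("LD",    "F2",  []),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]

def timeline(program, latency):
    """Return (issue, exec_complete, write_result) cycles per instruction."""
    last_write = {}      # register -> write-result cycle of its latest producer
    rows = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        issue = n                                   # in-order, one per cycle
        ready = max([issue] + [last_write.get(s, 0) for s in srcs])
        complete = ready + latency[op]              # execution starts at ready + 1
        write = complete + 1                        # broadcast on the CDB
        last_write[dest] = write                    # later readers wait for this
        rows.append((issue, complete, write))
    return rows

rows = timeline(program, LATENCY)
for (op, dest, _), row in zip(program, rows):
    print(op, dest, row)
```

Because DIVD records the producer of F6 when it issues (the first load), the later ADDD writing F6 in cycle 11 does not disturb it, which is the WAR case the renaming removes.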
Tomasulo’s advantages
(1) Distributed hazard detection logic
– distributed reservation stations and the CDB
– If multiple instructions are waiting on a single result, and each already has its other operand, they can all be dispatched simultaneously when that result is broadcast on the CDB
– If a centralized register file were used, the units would have to read their results from the registers when register buses are available.
(2) Avoids stalling due to WAW or WAR hazards
Tomasulo Drawbacks
• Complexity of hardware
• Performance limited by Common Data Bus
– Each CDB must go to all functional units: high capacitance, high wiring density
– Number of functional units that can complete per cycle limited to one!
» Multiple CDBs: more FU logic for parallel stores
Summary
• Reservation stations: implicit register renaming to a larger set of registers + buffering of source operands
– Prevents registers from being bottleneck
– Avoids the WAR and WAW hazards of Scoreboard
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
Out of Order Processors
BENEFITS:
• Accelerates the execution of programs
• More efficient design
– Increases the utilisation of processor resources
LIMITATIONS:
• More complex design
• Very expensive in terms of area and power
• Non-precise interrupts
– Interrupting exactly after an instruction might not be possible
Scoreboard vs Tomasulo
                      Scoreboard              Tomasulo
Window size:          ≤ 5 instructions        ≤ 14 instructions
Structural hazard:    do not issue            do not issue
WAR dependency:       stall completion        renaming avoids
WAW dependency:       stall issue             renaming avoids
Results forwarding:   write/read registers    broadcast from FU
Control structure:    central scoreboard      distributed reservation stations
Example
LD – 4 cycles
Add/Sub – 2 cycles
Mul/Div – 4 cycles
Assuming no structural hazards
In-order
Cycle          1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
LD R1 X        IF    ID    LD1   LD2   LD3   LD4   WB
LD R2 Y        -     IF    ID    LD1   LD2   LD3   LD4   WB
ADD R3 R1 R2   -     -     IF    ID    Stall Stall Stall Stall Add1  Add2  WB
SUB R3 R5 R6   -     -     -     IF    Stall Stall Stall Stall ID    Sub1  Sub2  WB
MUL R4 R1 R1   -     -     -     -     -     -     -     -     IF    ID    Mul1  Mul2  Mul3  Mul4  WB
DIV R7 R5 R6   -     -     -     -     -     -     -     -     -     IF    ID    Div1  Div2  Div3  Div4  WB
Out-of-order with Scoreboard
Cycle          1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
LD R1 X        I     RO    LD1   LD2   LD3   LD4   WB
LD R2 Y        -     I     RO    LD1   LD2   LD3   LD4   WB
ADD R3 R1 R2   -     -     I     RO    RO    RO    RO    RO    Add1  Add2  WB
SUB R3 R5 R6   -     -     -     I     I     I     I     RO    Sub1  Sub2  WB
MUL R4 R1 R1   -     -     -     -     -     -     -     I     RO    Mul1  Mul2  Mul3  Mul4  WB
DIV R7 R5 R6   -     -     -     -     -     -     -     -     I     RO    Div1  Div2  Div3  Div4  WB
Out-of-order with Tomasulo
Cycle          1     2     3     4     5     6     7     8     9     10    11    12
LD R1 X        I     LD1   LD2   LD3   LD4   CDB
LD R2 Y        -     I     LD1   LD2   LD3   LD4   CDB
ADD R3 R1 R2   -     -     I     RS    RS    RS    RS    Add1  Add2  CDB
SUB R3 R5 R6   -     -     -     I     Sub1  Sub2  CDB   CDB
MUL R4 R1 R1   -     -     -     -     I     RS    Mul1  Mul2  Mul3  Mul4  CDB
DIV R7 R5 R6   -     -     -     -     -     I     Div1  Div2  Div3  Div4  CDB   CDB
• In-order: RAW – stall the pipeline
• Scoreboard: RAW – ADD stalled, SUB could be issued
• Tomasulo: RAW – ADD stalled, SUB can be issued
• Tomasulo: WAW – allowed by register renaming in RS
• Scoreboard: WAW – SUB cannot be issued, stall the pipeline
• Scoreboard: 2 instrs. can finish at the same time
• Tomasulo: CDB limits finishing instrs. to one/cycle
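The one-result-per-cycle limit of the CDB can be illustrated with a small Python sketch for the Tomasulo case of this example. Everything here is an assumption made for illustration: latencies are read off the stage labels of the diagram (LD 4, Add/Sub 2, Mul/Div 4), one instruction issues per cycle with no structural hazards, and when two results want the bus in the same cycle the older instruction wins. Real arbitration policies may differ.

```python
LATENCY = {"LD": 4, "ADD": 2, "SUB": 2, "MUL": 4, "DIV": 4}

# (op, destination, register sources)
program = [
    ("LD",  "R1", []),
    ("LD",  "R2", []),
    ("ADD", "R3", ["R1", "R2"]),
    ("SUB", "R3", ["R5", "R6"]),
    ("MUL", "R4", ["R1", "R1"]),
    ("DIV", "R7", ["R5", "R6"]),
]

def cdb_schedule(program, latency):
    """Return the cycle in which each instruction broadcasts on the single CDB."""
    produced_at = {}     # register -> CDB cycle of its latest producer
    cdb_busy = set()     # cycles in which the CDB is already taken
    slots = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        ready = max([n] + [produced_at.get(s, 0) for s in srcs])  # issued in cycle n
        slot = ready + latency[op] + 1      # first cycle after execution ends
        while slot in cdb_busy:             # an older instruction owns the bus
            slot += 1
        cdb_busy.add(slot)
        produced_at[dest] = slot
        slots.append(slot)
    return slots

slots = cdb_schedule(program, LATENCY)
print(slots)
```

SUB finishes executing in cycle 6 but broadcasts in cycle 8 because the second load holds the bus in cycle 7, and DIV waits one cycle behind MUL in the same way.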