COMP25212 Advanced Pipelining Out of Order Processors

Upload: clinton-solis

Post on 14-Dec-2015


COMP25212

Advanced Pipelining

Out of Order Processors

From Monday…

Out-of-Order Execution with Scoreboard

• Centralized data structure which tracks the status of registers, FUs and instructions and creates, dynamically in hardware, the dependency graph

– The centralized nature limits scalability:

– Small number of FUs and small window of instructions

• Dependencies

– RAW – stall conflicted instruction

– WAW – stall the pipeline

– WAR – stall WB

Out of Order Execution with Tomasulo

Tomasulo’s Algorithm

• Control logic for out-of-order execution is decentralized

– Reservation Stations (RS) in the functional units keep instruction information

– In addition RS seamlessly rename registers

• A Common Data Bus (CDB) broadcasts data and results to the different devices

– A single instruction can finish each cycle

• Distributed control allows for a larger window of instructions

– Dynamic scheduling

Tomasulo’s Algorithm

• Structural hazards stall the pipeline

• RS tracks when operands are available and buffers them as soon as they are

– No need to access the register bank (the RS stores either the operand values or their sources)

• Impact of RAW dependencies is limited

– Execute an instruction when its operands are available

• WAW and WAR dependencies are avoided

– Register renaming

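Register Renaming (Example)

• Eliminates WAR and WAW hazards by renaming all destination registers.

• Can be done by compiler

DIV.D F0, F2, F4
ADD.D F6, F0, F8
ST.D  F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8

True dependences: F0 (DIV.D to ADD.D), F6 (ADD.D to ST.D), F8 (SUB.D to MUL.D)

Antidependence: F8 (read by ADD.D, written by SUB.D)

Output dependence: F6 (written by ADD.D and MUL.D)

Temporary registers used for renaming in the figure: S and T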
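The renaming idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the temporaries `T0`, `T1`, ... and the `(op, dest, sources)` tuple layout are hypothetical, and a real compiler or reservation station would use its own naming scheme.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name and redirect later reads to it."""
    fresh = (f"T{i}" for i in count())   # hypothetical temporaries T0, T1, ...
    latest = {}                          # architectural register -> current name
    renamed = []
    for op, dest, srcs in program:
        # Sources read the newest version of each register (or the original name).
        new_srcs = [latest.get(r, r) for r in srcs]
        new_dest = None
        if dest is not None:             # stores have no destination register
            new_dest = next(fresh)
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

program = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("ST.D",  None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
renamed = rename(program)
for op, dest, srcs in renamed:
    print(op, dest, srcs)
```

After renaming, every destination is unique, so no WAW or WAR pair remains; only the true dependences (through `T0`, `T1` and `T2`) survive.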

Tomasulo Organization

[Figure: block diagram of the Tomasulo organization. Labelled components: FP Op Queue, FP Registers, Load Buffers (Load1 to Load6, from memory), Store Buffers (to memory), Reservation Stations (Add1, Add2, Add3 for the FP adders; Mult1, Mult2 for the FP multipliers), Common Data Bus (CDB).]

• Normal data bus: data + destination

• Common data bus: data + source

Stages of a Tomasulo Pipeline

[Figure: an Issue stage followed by parallel Execute units (Integer, FP Multiplication, FP Add, FP Division, FP Multiplication), each with its own WriteBack stage.]

Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue. If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).

2. Execute: operate on operands (EX). When both source operands are ready, execute; if not ready, watch the Common Data Bus for the result.

3. Write result: finish execution (WB). Write on the Common Data Bus to all awaiting units; mark the reservation station available.

• Normal data bus: data + destination (“go to” bus)

• Common data bus: data + source (“come from” bus)

– 64 bits of data + 4 bits of Functional Unit source address

– Write if matches expected Functional Unit (produces result)

– Does the broadcast
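A minimal sketch of the "come from" behaviour of the CDB, assuming each reservation station simply compares the broadcast source against its pending tags. The class and function names, and the numeric operand values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    op: str
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # unit that will produce the operand, if pending
    qk: Optional[str] = None

    def ready(self) -> bool:
        # Both operands are present once no producer tag is outstanding.
        return self.qj is None and self.qk is None

    def snoop(self, source: str, value: float) -> None:
        """Capture a CDB broadcast if it comes from a unit we are waiting on."""
        if self.qj == source:
            self.vj, self.qj = value, None
        if self.qk == source:
            self.vk, self.qk = value, None

def broadcast(stations, source, value):
    """One CDB transfer: data plus the name of the unit it comes from."""
    for rs in stations:
        rs.snoop(source, value)

# Two stations waiting on the same load (placeholder values 1.5 and 3.0).
add1 = ReservationStation("Add1", "SUBD", vj=1.5, qk="Load2")
mult1 = ReservationStation("Mult1", "MULTD", vk=3.0, qj="Load2")
broadcast([add1, mult1], "Load2", 2.0)
print(add1.ready(), mult1.ready())
```

A single broadcast wakes up both waiting stations, which is why the bus carries the source rather than a destination.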

Reservation Station Components

No information about instructions needed

Op: Operation to perform in the unit (e.g., + or –)

Vj, Vk: Value of source operands

– Store buffers have a V field: the result to be stored

Qj, Qk: Reservation stations producing source registers (value to be written)

– Note: Qj, Qk = 0 => ready

– Store buffers only have Qi for the RS producing the result

Busy: Indicates reservation station or FU is busy

Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

The Tomasulo Example tables show:

• Instruction status: the instruction stream and its source registers. Tomasulo does not need this info; the times for each stage are shown for convenience.

• Load buffers: 3 load buffers

• Reservation Stations: 3 adder, 2 multiplication

– Time: FU countdown

– Vj, Vk: source registers

– Qj, Qk: which FU will produce the operands

• Register result status: which RS will write in each register?

• Clock: clock cycle counter
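The way these fields are filled at issue time can be sketched as follows. It is an illustration under assumptions: the `issue` helper and the dictionary layout are hypothetical, and operand values are kept as symbolic strings such as `R(F4)`, as in the example tables.

```python
def issue(op, dest, src_j, src_k, station, reg_status, reg_file):
    """Fill one reservation-station entry for an instruction being issued."""
    entry = {"Busy": True, "Op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
    for src, v, q in ((src_j, "Vj", "Qj"), (src_k, "Vk", "Qk")):
        if src in reg_status:
            # A unit will still write this register: remember who produces it.
            entry[q] = reg_status[src]
        else:
            # The value is already in the register bank: copy it now.
            entry[v] = reg_file[src]
    # From now on this station is the producer of dest (implicit renaming).
    reg_status[dest] = station
    return entry

# State just before MULTD F0, F2, F4 is issued: both loads are still pending.
reg_status = {"F2": "Load2", "F6": "Load1"}
reg_file = {"F4": "R(F4)"}
entry = issue("MULTD", "F0", "F2", "F4", "Mult1", reg_status, reg_file)
print(entry)
print(reg_status)
```

The sources are looked up before the destination is recorded, so an instruction that reads and writes the same register still sees the older producer.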

A Tomasulo Example

The following code is run on a Tomasulo pipeline with the functional units listed after it:

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional Unit (FU)       # of FUs   EX cycles
FP Multiply/Division       2          10/40
FP Addition/Subtraction    3          2
Mem Load                   3          2

Functional units not pipelined

Dependency Graph For Example Code

Example Code

1  L.D   F6, 34(R2)
2  L.D   F2, 45(R3)
3  MUL.D F0, F2, F4
4  SUB.D F8, F6, F2
5  DIV.D F10, F0, F6
6  ADD.D F6, F8, F2

Real Data Dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)

Output Dependence (WAW): (1, 6)

Anti-dependence (WAR): (5, 6)
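The dependence pairs for the example code can be recomputed mechanically. The sketch below assumes each instruction is described by a hypothetical `(dest, sources)` tuple. Note that it also reports the WAR pair (4, 6), since SUB.D reads F6 before ADD.D overwrites it; that pair is already ordered by the true dependence (4, 6) through F8, which is presumably why only (5, 6) is listed.

```python
def dependences(program):
    """Return (RAW, WAR, WAW) sets of (earlier, later) instruction numbers."""
    raw, war, waw = set(), set(), set()
    last_writer = {}   # register -> number of the latest instruction writing it
    readers = {}       # register -> instructions that read it since that write
    for j, (dest, srcs) in enumerate(program, start=1):
        for r in srcs:
            if r in last_writer:
                raw.add((last_writer[r], j))       # read after write
            readers.setdefault(r, []).append(j)
        if dest in last_writer:
            waw.add((last_writer[dest], j))        # write after write
        for i in readers.get(dest, []):
            if i != j:
                war.add((i, j))                    # write after read
        last_writer[dest] = j
        readers[dest] = []
    return raw, war, waw

program = [
    ("F6",  ["R2"]),         # 1  L.D   F6, 34(R2)
    ("F2",  ["R3"]),         # 2  L.D   F2, 45(R3)
    ("F0",  ["F2", "F4"]),   # 3  MUL.D F0, F2, F4
    ("F8",  ["F6", "F2"]),   # 4  SUB.D F8, F6, F2
    ("F10", ["F0", "F6"]),   # 5  DIV.D F10, F0, F6
    ("F6",  ["F8", "F2"]),   # 6  ADD.D F6, F8, F2
]
raw, war, waw = dependences(program)
print(sorted(raw))
print(sorted(waw))
print(sorted(war - raw))   # WAR pairs not already ordered by a true dependence
```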

Tomasulo Example

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2
LD    F2     45+  R3
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
0      FU

Tomasulo Example Cycle 1

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1
LD    F2     45+  R3
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
1      FU                       Load1

• LD#1 issued

Tomasulo Example Cycle 2

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1
LD    F2     45+  R3    2
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
2      FU         Load2         Load1

• LD#2 issued

Tomasulo Example Cycle 3

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3
LD    F2     45+  R3    2
MULTD F0     F2   F4    3
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD           R(F4)    Load2
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
3      FU  Mult1  Load2         Load1

• MULTD is issued

• LD#1 completes and broadcasts its result

Tomasulo Example Cycle 4

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   Yes   SUBD   M(A1)                      Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD           R(F4)    Load2
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
4      FU  Mult1  Load2         M(A1)    Add1

• SUBD is issued

• LD#1 result updates the register bank

• LD#2 completes, broadcasting its result

Tomasulo Example Cycle 5

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
2     Add1   Yes   SUBD   M(A1)    M(A2)
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
5      FU  Mult1  M(A2)         M(A1)    Add1   Mult2

• DIVD is issued

• LD#2 result updates the register bank

• Add1, Mult1 start execution

Tomasulo Example Cycle 6

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
1     Add1   Yes   SUBD   M(A1)    M(A2)
      Add2   Yes   ADDD            M(A2)    Add1
      Add3   No
9     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
6      FU  Mult1  M(A2)         Add2     Add1   Mult2

• ADDD issued

Tomasulo Example Cycle 7

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
0     Add1   Yes   SUBD   M(A1)    M(A2)
      Add2   Yes   ADDD            M(A2)    Add1
      Add3   No
8     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
7      FU  Mult1  M(A2)         Add2     Add1   Mult2

• Add1 (SUBD) completes and broadcasts result

Tomasulo Example Cycle 8

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
2     Add2   Yes   ADDD   (M-M)    M(A2)
      Add3   No
7     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
8      FU  Mult1  M(A2)         Add2     (M-M)  Mult2

• Add1 (SUBD) result updates the register bank

• Add2 (ADDD) starts execution

Tomasulo Example Cycle 9

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
1     Add2   Yes   ADDD   (M-M)    M(A2)
      Add3   No
6     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
9      FU  Mult1  M(A2)         Add2     (M-M)  Mult2

• ADDD and MULTD continue execution

Tomasulo Example Cycle 10

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
0     Add2   Yes   ADDD   (M-M)    M(A2)
      Add3   No
5     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
10     FU  Mult1  M(A2)         Add2     (M-M)  Mult2

• Add2 (ADDD) completes

Tomasulo Example Cycle 11

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
11     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

• ADDD result updates the register bank

Tomasulo Example Cycle 12

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
3     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
12     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

• MULTD continues execution

Tomasulo Example Cycle 13

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
2     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
13     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

• MULTD continues execution

Tomasulo Example Cycle 14

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
14     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

• MULTD continues execution

Tomasulo Example Cycle 15

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3      15
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
15     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

• MULTD completes and broadcasts result

Tomasulo Example Cycle 16

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3      15         16
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
40    Mult2  Yes   DIVD   M*F4     M(A1)

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
16     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2

• MULTD result updates the register bank

• DIVD starts execution

39 cycles later…

Tomasulo Example Cycle 55

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3      15         16
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
1     Mult2  Yes   DIVD   M*F4     M(A1)

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
55     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2

• DIVD is about to complete

Tomasulo Example Cycle 56

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3      15         16
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5      56
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   DIVD   M*F4     M(A1)

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
56     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2

• DIVD completes

Tomasulo Example Cycle 57

Instruction status:
Instruction  j    k     Issue  Exec Comp  Write Result
LD    F6     34+  R2    1      3          4
LD    F2     45+  R3    2      4          5
MULTD F0     F2   F4    3      15         16
SUBD  F8     F6   F2    4      7          8
DIVD  F10    F0   F6    5      56         57
ADDD  F6     F8   F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12  ...  F30
57     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Result

• DIVD result updates the register bank

• In-order issue

• Out-of-order execution

• Out-of-order completion
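The Issue / Exec Comp / Write Result columns of the walkthrough can be reproduced with a short calculation. This is a rough sketch under assumptions taken from the example only: one instruction issues per cycle, a reservation station is always free, execution starts the cycle after the last operand is broadcast, the result is written one cycle after execution completes, and no two results compete for the CDB. The `timings` helper and tuple layout are hypothetical.

```python
# EX cycles from the functional-unit table of the example.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

def timings(program):
    """Return (op, issue, exec_complete, write_result) for each instruction."""
    broadcast_at = {}   # register -> cycle its newest producer writes the CDB
    rows = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        issue = n                                     # in-order, one per cycle
        # Wait for issue and for every pending source operand to be broadcast.
        operands_at = max([issue] + [broadcast_at.get(r, 0) for r in srcs])
        complete = operands_at + LATENCY[op]
        write = complete + 1
        broadcast_at[dest] = write                    # later readers wait for this
        rows.append((op, issue, complete, write))
    return rows

program = [
    ("LD",    "F6",  ["R2"]),
    ("LD",    "F2",  ["R3"]),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]
rows = timings(program)
for row in rows:
    print(row)
```

Because sources are resolved in program order, DIVD picks up F6 from the first load rather than from the later ADDD, which mirrors the renaming done by the reservation stations.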

Tomasulo’s advantages

(1) Distributed hazard detection logic

– distributed reservation stations and the CDB

– If multiple instructions are waiting on a single result, and each already has its other operand, they can all be dispatched simultaneously when the result is broadcast on the CDB

– If a centralized register file were used, the units would have to read their results from the registers when register buses are available.

(2) Avoids stalling due to WAW or WAR hazards

Tomasulo Drawbacks

• Complexity of hardware

• Performance limited by Common Data Bus

– Each CDB must go to all functional units: high capacitance, high wiring density

– Number of functional units that can complete per cycle limited to one!

» Multiple CDBs: more FU logic for parallel stores

Summary

• Reservation stations: implicit register renaming to a larger set of registers + buffering of source operands

– Prevents registers from being bottleneck

– Avoids the WAR and WAW hazards of Scoreboard

• Lasting Contributions

– Dynamic scheduling

– Register renaming

– Load/store disambiguation

Summary of Out-of-Order Processors

Out of Order Processors

BENEFITS:

• Accelerates the execution of programs

• More efficient design

– Increases the utilisation of processor resources

LIMITATIONS:

• More complex design

• Very expensive in terms of area and power

• Non-precise interrupts

– Interrupting exactly after an instruction might not be possible

Scoreboard vs Tomasulo

                     Scoreboard             Tomasulo
Window size:         ≤ 5 instructions       ≤ 14 instructions
Structural hazard:   no issue (stall)       no issue (stall)
WAR dependency:      stall completion       renaming avoids
WAW dependency:      stall issue            renaming avoids
Results forwarding:  write/read registers   broadcast from FU
Control structure:   central scoreboard     distributed reservation stations

Example

LD – 4 cycles
Add/Sub – 2 cycles
Mul/Div – 4 cycles

Assuming no structural hazards

In-order

Instruction    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
LD  R1 X       IF    ID    LD1   LD2   LD3   LD4   WB
LD  R2 Y             IF    ID    LD1   LD2   LD3   LD4   WB
ADD R3 R1 R2               IF    ID    Stall Stall Stall Stall Add1  Add2  WB
SUB R3 R5 R6                     IF    Stall Stall Stall Stall ID    Sub1  Sub2  WB
MUL R4 R1 R1                                                   IF    ID    Mul1  Mul2  Mul3  Mul4  WB
DIV R7 R5 R6                                                         IF    ID    Div1  Div2  Div3  Div4  WB

Out-of-order with Scoreboard

Instruction    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
LD  R1 X       I     RO    LD1   LD2   LD3   LD4   WB
LD  R2 Y             I     RO    LD1   LD2   LD3   LD4   WB
ADD R3 R1 R2               I     RO    RO    RO    RO    RO    Add1  Add2  WB
SUB R3 R5 R6                     I     I     I     I     RO    Sub1  Sub2  WB
MUL R4 R1 R1                                             I     RO    Mul1  Mul2  Mul3  Mul4  WB
DIV R7 R5 R6                                                   I     RO    Div1  Div2  Div3  Div4  WB

Out-of-order with Tomasulo

Instruction    1     2     3     4     5     6     7     8     9     10    11    12
LD  R1 X       I     LD1   LD2   LD3   LD4   CDB
LD  R2 Y             I     LD1   LD2   LD3   LD4   CDB
ADD R3 R1 R2               I     RS    RS    RS    RS    Add1  Add2  CDB
SUB R3 R5 R6                     I     Sub1  Sub2  CDB   CDB
MUL R4 R1 R1                           I     RS    Mul1  Mul2  Mul3  Mul4  CDB
DIV R7 R5 R6                                 I     Div1  Div2  Div3  Div4  CDB   CDB

RAW (ADD needs R1 and R2 from the loads)

– In-order: stall the pipeline

– Scoreboard: ADD stalled, SUB could be issued

– Tomasulo: ADD stalled, SUB can be issued

WAW (ADD and SUB both write R3)

– Scoreboard: SUB cannot be issued, stall the pipeline

– Tomasulo: allowed by register renaming in RS

Completion

– 2 instrs. can finish at the same time

– CDB limits finishing instrs. to one/cycle
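The one-result-per-cycle limit of the CDB can be sketched as a simple arbiter. The sketch assumes that the oldest instruction in program order wins when several results are waiting; that priority rule and the `cdb_slots` helper are assumptions for illustration, chosen because they reproduce the Tomasulo timeline of the example (SUB and DIV each wait one extra cycle for the bus).

```python
def cdb_slots(finish):
    """finish[i] is the last execute cycle of instruction i (program order).

    Returns the cycle in which each instruction is granted the CDB,
    with at most one grant per cycle.
    """
    slot = [None] * len(finish)
    cycle = min(finish) + 1
    while None in slot:
        # Instructions that have finished executing but not yet broadcast.
        waiting = [i for i, f in enumerate(finish) if slot[i] is None and f < cycle]
        if waiting:
            slot[waiting[0]] = cycle   # oldest instruction wins (assumed policy)
        cycle += 1
    return slot

# Last execute cycles in the example: LD R1, LD R2, ADD, SUB, MUL, DIV.
finish = [5, 6, 9, 6, 10, 10]
slots = cdb_slots(finish)
print(slots)
```

With a second CDB both waiting results could be written in the same cycle, at the cost of the extra wiring and matching logic listed under the drawbacks.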