EECS 470 Lecture 8: Speculation & Precise Interrupts II


Page 1: EECS 470 Lecture 8: Speculation & Precise Interrupts II

© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

EECS 470, Fall 2007
Prof. Thomas Wenisch
http://www.eecs.umich.edu/courses/eecs470
Source: web.eecs.umich.edu/~twenisch/470_F07/lectures/8.pdf · 2007. 9. 30.

[Figure: pipeline diagram with stages Fetch, Instruction/Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Execute, Finish, Reorder/Completion Buffer, Complete, Store Buffer, Retire; regions labeled In Order, Out of Order, In Order]

Many thanks to Prof. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.

Page 2: Announcements

HW #3 is posted, due 10/10

Programming assignment #3 (due 10/8)

Project handout is posted
• Form groups of 3-5 ASAP
• Bigger group == higher expectations for grading

Page 3: Readings

For Today:
• Smith & Pleszkun, “Implementing Precise Interrupts”
• H & P Chapter 2.4-2.6, 2.8

Have you read yet?
• D. Sima, “Design Space of Register Renaming Techniques”

Some of the homework questions cover the papers!

Page 4: The Problem with Precise State

[Figure: pipeline with I$, BP, insn buffer, regfile, and D$; stages labeled D and S]

Problem: writeback combines two separate functions
• Forwards values to younger insns: OK for this to be out-of-order
• Write values to registers: would like this to be in-order

Similar problem (decode) for OoO execution: solution?
• Split decode (D) → in-order dispatch (D) + out-of-order issue (S)
• Separate using insn buffer: scoreboard or reservation station

Page 5: Re-Order Buffer (ROB)

[Figure: pipeline with I$, BP, regfile, D$, and reorder buffer (ROB); writeback stages labeled W1 and W2]

Insn buffer → re-order buffer (ROB)
• Buffers completed results en route to register file
• May be combined with RS or separate
• Combined in picture: register-update unit RUU (Sohi’s method)
• Separate (more common today): P6-style

Split writeback (W) into two stages
• Why is there no latch between W1 and W2?

Page 6: Complete and Retire

[Figure: pipeline with I$, BP, regfile, D$, and reorder buffer (ROB); writeback stages labeled C and R]

Complete (C): first part of writeback (W)
• Completed insns write results into ROB
+ Out-of-order: wait doesn’t back-propagate to younger insns

Retire (R): aka commit, graduate
• ROB writes results to register file
• In order: stall back-propagates to younger insns
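
The split between complete and retire can be illustrated with a small sketch. It is not from the lecture: the class and method names (`ReorderBuffer`, `dispatch`, `complete`, `retire`) are hypothetical, the register values are made up, and one insn retires per call to mimic one retirement per cycle.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: allocate in program order, complete in any order,
    retire strictly from the head. Hypothetical structure for illustration."""

    def __init__(self):
        self.entries = deque()   # oldest insn at the left (head)
        self.regfile = {}        # architectural register state

    def dispatch(self, dest_reg):
        entry = {"reg": dest_reg, "value": None, "done": False}
        self.entries.append(entry)   # tail allocation keeps program order
        return entry

    def complete(self, entry, value):
        # C: the result is parked in the ROB, not in the register file.
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        # R: only the head may retire; an incomplete head stalls
        # every younger insn behind it.
        if not self.entries or not self.entries[0]["done"]:
            return None
        head = self.entries.popleft()
        self.regfile[head["reg"]] = head["value"]
        return head["reg"]


rob = ReorderBuffer()
old = rob.dispatch("f1")
young = rob.dispatch("r1")
rob.complete(young, 8)       # younger insn finishes first
stalled = rob.retire()       # head not done, so nothing retires
rob.complete(old, 2.5)
first = rob.retire()         # oldest insn retires first
second = rob.retire()
```

The younger insn completes early without touching the register file, which is what keeps architectural state precise.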

Page 7: Load/Store Queue (LSQ)

ROB makes register writes in-order, but what about stores?

As usual, i.e., to D$ in X stage?
• Not even close, imprecise memory worse than imprecise registers

Load/store queue (LSQ)
• Completed stores write to LSQ
• When store retires, head of LSQ written to D$
• When loads execute, access LSQ and D$ in parallel
• Forward from LSQ if older store with matching address
• More modern design: loads and stores in separate queues
• More on this later
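
The LSQ bullets above can be sketched as follows. This is a simplified illustration, not the lecture's design: `LoadStoreQueue` and its methods are hypothetical names, the queue tracks stores only, every queued store is assumed to be older than the executing load, addresses must match exactly, and an address missing from the toy D$ reads as 0.

```python
class LoadStoreQueue:
    """Toy LSQ holding only stores, oldest first (illustrative assumption)."""

    def __init__(self, dcache):
        self.dcache = dcache   # dict address -> value, stands in for D$
        self.stores = []       # completed but not yet retired stores

    def complete_store(self, addr, value):
        # Completed stores write to the LSQ; D$ is not touched yet.
        self.stores.append((addr, value))

    def retire_store(self):
        # When a store retires, the head of the LSQ is written to D$.
        addr, value = self.stores.pop(0)
        self.dcache[addr] = value

    def execute_load(self, addr):
        # Search youngest-to-oldest so the load sees the most recent
        # older store to the same address.
        for st_addr, st_value in reversed(self.stores):
            if st_addr == addr:
                return st_value           # forwarded from the LSQ
        return self.dcache.get(addr, 0)   # no match: value comes from D$


dcache = {0x100: 1.0}
lsq = LoadStoreQueue(dcache)
lsq.complete_store(0x100, 7.0)
forwarded = lsq.execute_load(0x100)   # sees the in-flight store
before_retire = dcache[0x100]         # D$ still holds the old value
lsq.retire_store()
after_retire = dcache[0x100]
```

Hardware probes the LSQ and D$ in parallel; the sequential search here only models which value wins.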

Page 8: ROB + LSQ

[Figure: pipeline with I$, BP, regfile, ROB, LSQ, and D$; stages labeled C and R; LSQ paths labeled load/store, addr, store data, and load data]

Modulo gross simplifications, this picture is almost realistic!

Page 9: P6

P6: Start with Tomasulo’s algorithm… add ROB
• Separate ROB and RS

Simple-P6
• Our old RS organization: 1 ALU, 1 load, 1 store, 2 3-cycle FP

Page 10: P6 Data Structures

Reservation Stations are same as before

ROB
• head, tail: pointers maintain sequential order
• R: insn output register, V: insn output value

Tags are different
• Tomasulo: RS# → P6: ROB#

Map Table is different
• T+: tag + “ready-in-ROB” bit
• T==0 → Value is ready in regfile
• T!=0 → Value is not ready
• T!=0+ → Value is ready in the ROB
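
As a reading aid for the T+ notation, the helper below decodes a Map Table cell written the way the cycle-by-cycle tables print it (blank, `ROB#4`, or `ROB#4+`). The function names `parse_tplus` and `value_location` are hypothetical and not part of the lecture.

```python
def parse_tplus(cell):
    """Decode a printed Map Table cell into (tag, ready_in_rob)."""
    cell = cell.strip()
    if not cell:
        return (0, False)                  # blank cell means T == 0
    ready = cell.endswith("+")             # trailing "+" is the ready-in-ROB bit
    tag = int(cell.rstrip("+").replace("ROB#", ""))
    return (tag, ready)


def value_location(cell):
    """Say where the register's current value lives."""
    tag, ready = parse_tplus(cell)
    if tag == 0:
        return "regfile"                   # T==0: ready in regfile
    return "ROB" if ready else "not ready" # T!=0+ versus T!=0


locations = [value_location(c) for c in ["", "ROB#4", "ROB#4+"]]
```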

Page 11: P6 Data Structures

[Figure: P6 datapath. Map Table (T+), Regfile (value), ROB (R, value) with Head (Retire) and Tail (Dispatch) pointers, RS (op, T, T1, T2, V1, V2) with tag comparators (==), FU, and CDB (CDB.T, CDB.V)]

• Insn fields and status bits
• Tags
• Values

Page 12: P6 Data Structures

ROB (ht: h = head, t = tail; S, X, C: cycle of issue, execute, complete)
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  |    |      |     |     |
   | 2 | mulf f0,f1,f2 |    |      |     |     |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  |    |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  |
f2  |
r1  |

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | no   |      |       |       |       |      |
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Page 13: P6 Pipeline

New pipeline structure: F, D, S, X, C, R

• D (dispatch)
  • Structural hazard (ROB/LSQ/RS)? Stall
  • Allocate ROB/LSQ/RS
  • Set RS tag to ROB#
  • Set Map Table entry to ROB# and clear “ready-in-ROB” bit
  • Read ready registers into RS (from either ROB or Regfile)

• X (execute)
  • Free RS entry
  • Used to be at W, can be earlier because RS# are not tags

Page 14: P6 Pipeline

• C (complete)
  • Structural hazard (CDB)? Wait
  • Write value into ROB entry indicated by RS tag
  • Mark ROB entry as complete
  • If not overwritten, mark Map Table entry “ready-in-ROB” bit (+)

• R (retire)
  • Insn at ROB head not complete? Stall
  • Handle any exceptions
  • Write ROB head value to register file
  • If store, write LSQ head to D$
  • Free ROB/LSQ entries

Page 15: P6 Dispatch (D): Part I

[Figure: P6 datapath (Map Table, Regfile, ROB with Head/Tail pointers, RS, FU, CDB)]

• RS/ROB full? Stall
• Allocate RS/ROB entries, assign ROB# to RS output tag
• Set output register Map Table entry to ROB#, clear “ready-in-ROB”
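
A minimal sketch of this first half of dispatch, under stated assumptions: the function name `dispatch`, the list/dict layouts, and the 7-entry capacity (chosen to match the 7-row ROB in the worked example) are all hypothetical. The LSQ is not modeled.

```python
ROB_SIZE = 7   # assumed capacity, matching the 7-row ROB in the example


def dispatch(dest_reg, fu, rob, rs, map_table):
    """Allocate ROB and RS entries and rename the output register.

    rob: list of in-flight entries, oldest first (hypothetical layout)
    rs: dict FU name -> RS entry dict, or None when the station is free
    map_table: dict register -> (tag, ready_in_rob)
    Returns the allocated ROB#, or None on a structural-hazard stall.
    """
    if len(rob) >= ROB_SIZE or rs[fu] is not None:
        return None                               # RS/ROB full: stall
    rob_num = (rob[-1]["num"] % ROB_SIZE) + 1 if rob else 1
    rob.append({"num": rob_num, "reg": dest_reg, "done": False})
    rs[fu] = {"T": rob_num}                       # RS output tag is the ROB#
    if dest_reg is not None:                      # stores have no output register
        map_table[dest_reg] = (rob_num, False)    # rename, clear the "+" bit
    return rob_num


rob, rs, map_table = [], {"LD": None, "FP1": None}, {}
a = dispatch("f1", "LD", rob, rs, map_table)    # ldf  -> ROB#1
b = dispatch("f2", "FP1", rob, rs, map_table)   # mulf -> ROB#2
c = dispatch("f1", "LD", rob, rs, map_table)    # LD station busy: stall
```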

Page 16: P6 Dispatch (D): Part II

[Figure: P6 datapath (Map Table, Regfile, ROB with Head/Tail pointers, RS, FU, CDB)]

• Read tags for register inputs from Map Table
  • Tag==0 → copy value from Regfile (not shown)
  • Tag!=0 → copy Map Table tag to RS
  • Tag!=0+ → copy value from ROB
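
The three cases can be written as one lookup. This is a sketch, not the lecture's code: `read_operand` and its argument layout are hypothetical, and the numeric register values are invented. The Map Table contents mirror the cycle 9 state of the worked example, where the second stf dispatches.

```python
def read_operand(reg, map_table, regfile, rob_values):
    """Return (tag, value) for one source register.

    A zero tag means the value is available now; a nonzero tag means the
    RS slot must wait for that ROB# on the CDB.
    """
    tag, ready_in_rob = map_table.get(reg, (0, False))
    if tag == 0:
        return (0, regfile[reg])       # Tag==0: copy value from Regfile
    if ready_in_rob:
        return (0, rob_values[tag])    # Tag!=0+: copy value from ROB
    return (tag, None)                 # Tag!=0: copy Map Table tag to RS


# Map Table as in cycle 9 of the example; values are made up.
map_table = {"f1": (5, True), "f2": (6, False), "r1": (4, True)}
regfile = {"f0": 1.5, "f1": 0.0, "f2": 0.0, "r1": 100}
rob_values = {4: 104, 5: 3.0}

f0_src = read_operand("f0", map_table, regfile, rob_values)
f2_src = read_operand("f2", map_table, regfile, rob_values)   # stf data: wait on ROB#6
r1_src = read_operand("r1", map_table, regfile, rob_values)   # stf address: from ROB#4
```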

Page 17: P6 Complete (C)

[Figure: P6 datapath (Map Table, Regfile, ROB with Head/Tail pointers, RS, FU, CDB)]

• Structural hazard (CDB)? Stall : broadcast <value,tag> on CDB
• Write result into ROB, if still valid set Map Table “ready-in-ROB” bit
• Match tags, write CDB.V into RS slots of dependent insns
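
The complete step can be sketched as below. The function `complete` and the dict layouts are hypothetical, the CDB structural hazard is not modeled (one completion per call is assumed), and the value 6.0 is invented. The test state follows cycle 8 of the worked example, where the first mulf (ROB#2) completes after f2 has already been renamed to ROB#6.

```python
def complete(tag, value, rob, map_table, stations):
    """Broadcast <value, tag>: update the ROB, the Map Table, and waiting RS slots.

    rob: dict ROB# -> entry; map_table: reg -> (tag, ready_in_rob)
    stations: list of RS entry dicts with T1/T2/V1/V2 fields
    """
    entry = rob[tag]
    entry["value"] = value
    entry["done"] = True
    reg = entry["reg"]
    # "If still valid": set the "+" bit only when the Map Table still names
    # this ROB# as the latest producer of the register.
    if reg is not None and map_table.get(reg, (0, False))[0] == tag:
        map_table[reg] = (tag, True)
    # Each waiting RS slot compares its source tags against CDB.T.
    for rs in stations:
        for t_field, v_field in (("T1", "V1"), ("T2", "V2")):
            if rs.get(t_field) == tag:
                rs[v_field] = value    # grab CDB.V
                rs[t_field] = 0        # operand is now ready


rob = {2: {"reg": "f2", "value": None, "done": False}}
map_table = {"f1": (5, False), "f2": (6, False), "r1": (4, True)}
stf_rs = {"T": 3, "T1": 2, "T2": 0, "V1": None, "V2": 100}
complete(2, 6.0, rob, map_table, [stf_rs])
```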

Page 18: P6 Retire (R)

[Figure: P6 datapath (Map Table, Regfile, ROB with Head/Tail pointers, RS, FU, CDB)]

• ROB head not complete? Stall : free ROB entry
• Write ROB head result to Regfile
• If still valid, clear Map Table entry
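
A sketch of retire that includes the Map Table check. The function `retire` and the data layout are hypothetical, values are invented, and exceptions and stores are left out.

```python
def retire(rob, map_table, regfile):
    """Retire the ROB head if it is complete; return its ROB# or None on stall.

    rob: list of entries, oldest first, each with num/reg/value/done
    map_table: reg -> (tag, ready_in_rob)
    """
    if not rob or not rob[0]["done"]:
        return None                       # head not complete: stall
    head = rob.pop(0)                     # free ROB entry
    reg = head["reg"]
    if reg is not None:
        regfile[reg] = head["value"]      # write head result to Regfile
        # "If still valid": clear the mapping only when no younger insn
        # has renamed the register in the meantime.
        if map_table.get(reg, (0, False))[0] == head["num"]:
            map_table[reg] = (0, False)
    return head["num"]


rob = [
    {"num": 1, "reg": "f1", "value": 2.0, "done": True},
    {"num": 2, "reg": "f2", "value": None, "done": False},
]
map_table = {"f1": (1, True), "f2": (2, False)}
regfile = {}
first = retire(rob, map_table, regfile)    # ldf retires
second = retire(rob, map_table, regfile)   # mulf not complete: stall
```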

Page 19: P6: Cycle 1

ROB
ht | # | Insn          | R  | V    | S   | X   | C
ht | 1 | ldf X(r1),f1  | f1 |      |     |     |
   | 2 | mulf f0,f1,f2 |    |      |     |     |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  |    |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#1
f2  |
r1  |

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | yes  | ldf  | ROB#1 |       |       |      | [r1]
3 | ST  | no   |      |       |       |       |      |
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• Map Table f1: set ROB# tag
• RS 2 (LD): allocate

Page 20: P6: Cycle 2

ROB
ht | # | Insn          | R  | V    | S   | X   | C
h  | 1 | ldf X(r1),f1  | f1 |      | c2  |     |
t  | 2 | mulf f0,f1,f2 | f2 |      |     |     |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  |    |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#1
f2  | ROB#2
r1  |

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | yes  | ldf  | ROB#1 |       |       |      | [r1]
3 | ST  | no   |      |       |       |       |      |
4 | FP1 | yes  | mulf | ROB#2 |       | ROB#1 | [f0] |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• Map Table f2: set ROB# tag
• RS 4 (FP1): allocate

Page 21: P6: Cycle 3

ROB
ht | # | Insn          | R  | V    | S   | X   | C
h  | 1 | ldf X(r1),f1  | f1 |      | c2  | c3  |
   | 2 | mulf f0,f1,f2 | f2 |      |     |     |
t  | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  |    |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#1
f2  | ROB#2
r1  |

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#3 | ROB#2 |       |      | [r1]
4 | FP1 | yes  | mulf | ROB#2 |       | ROB#1 | [f0] |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• RS 2 (LD): free
• RS 3 (ST): allocate

Page 22: P6: Cycle 4

ROB
ht | # | Insn          | R  | V    | S   | X   | C
h  | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 |      | c4  |     |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
t  | 4 | addi r1,4,r1  | r1 |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#1+
f2  | ROB#2
r1  | ROB#4

CDB: T = ROB#1, V = [f1]

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | yes  | addi | ROB#4 |       |       | [r1] |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#3 | ROB#2 |       |      | [r1]
4 | FP1 | yes  | mulf | ROB#2 |       | ROB#1 | [f0] | CDB.V
5 | FP2 | no   |      |       |       |       |      |

Notes:
• ldf finished: 1. set “ready-in-ROB” bit, 2. write result to ROB, 3. CDB broadcast
• RS 1 (ALU): allocate
• RS 4 (FP1): ROB#1 ready, grab CDB.V

Page 23: P6: Cycle 5

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
h  | 2 | mulf f0,f1,f2 | f2 |      | c4  | c5  |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  | r1 |      | c5  |     |
t  | 5 | ldf X(r1),f1  | f1 |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5
f2  | ROB#2
r1  | ROB#4

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | yes  | addi | ROB#4 |       |       | [r1] |
2 | LD  | yes  | ldf  | ROB#5 |       | ROB#4 |      |
3 | ST  | yes  | stf  | ROB#3 | ROB#2 |       |      | [r1]
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• ldf retires: 1. write ROB result to regfile
• RS 2 (LD): allocate
• RS 4 (FP1): free

Page 24: P6: Cycle 6

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
h  | 2 | mulf f0,f1,f2 | f2 |      | c4  | c5+ |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  | r1 |      | c5  | c6  |
   | 5 | ldf X(r1),f1  | f1 |      |     |     |
t  | 6 | mulf f0,f1,f2 | f2 |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5
f2  | ROB#6
r1  | ROB#4

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | yes  | ldf  | ROB#5 |       | ROB#4 |      |
3 | ST  | yes  | stf  | ROB#3 | ROB#2 |       |      | [r1]
4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• RS 1 (ALU): free
• RS 4 (FP1): allocate

Page 25: P6: Cycle 7

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
h  | 2 | mulf f0,f1,f2 | f2 |      | c4  | c5+ |
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  | r1 | [r1] | c5  | c6  | c7
   | 5 | ldf X(r1),f1  | f1 |      | c7  |     |
t  | 6 | mulf f0,f1,f2 | f2 |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5
f2  | ROB#6
r1  | ROB#4+

CDB: T = ROB#4, V = [r1]

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | yes  | ldf  | ROB#5 |       | ROB#4 |      | CDB.V
3 | ST  | yes  | stf  | ROB#3 | ROB#2 |       |      | [r1]
4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• Insn 7 (stf): stall D (no free ST RS)
• RS 2 (LD): ROB#4 ready, grab CDB.V

Page 26: P6: Cycle 8

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
h  | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
   | 3 | stf f2,Z(r1)  |    |      | c8  |     |
   | 4 | addi r1,4,r1  | r1 | [r1] | c5  | c6  | c7
   | 5 | ldf X(r1),f1  | f1 |      | c7  | c8  |
t  | 6 | mulf f0,f1,f2 | f2 |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5
f2  | ROB#6
r1  | ROB#4+

CDB: T = ROB#2, V = [f2]

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#3 | ROB#2 |       | [f2] | [r1]
4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• stall R for addi (in-order)
• ROB#2 invalid in Map Table: don’t set “ready-in-ROB”
• RS 3 (ST): ROB#2 ready, grab CDB.V

Page 27: P6: Cycle 9

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
h  | 3 | stf f2,Z(r1)  |    |      | c8  | c9  |
   | 4 | addi r1,4,r1  | r1 | [r1] | c5  | c6  | c7
   | 5 | ldf X(r1),f1  | f1 | [f1] | c7  | c8  | c9
   | 6 | mulf f0,f1,f2 | f2 |      | c9  |     |
t  | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5+
f2  | ROB#6
r1  | ROB#4+

CDB: T = ROB#5, V = [f1]

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#7 | ROB#6 |       |      | ROB#4.V
4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] | CDB.V
5 | FP2 | no   |      |       |       |       |      |

Notes:
• retire mulf
• all pipe stages active at once!
• RS 3 (ST): free, re-allocate
• RS 4 (FP1): ROB#5 ready, grab CDB.V

Page 28: P6: Cycle 10

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
h  | 3 | stf f2,Z(r1)  |    |      | c8  | c9  | c10
   | 4 | addi r1,4,r1  | r1 | [r1] | c5  | c6  | c7
   | 5 | ldf X(r1),f1  | f1 | [f1] | c7  | c8  | c9
   | 6 | mulf f0,f1,f2 | f2 |      | c9  | c10 |
t  | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5+
f2  | ROB#6
r1  | ROB#4+

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#7 | ROB#6 |       |      | ROB#4.V
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• RS 4 (FP1): free

Page 29: P6: Cycle 11

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
   | 3 | stf f2,Z(r1)  |    |      | c8  | c9  | c10
h  | 4 | addi r1,4,r1  | r1 | [r1] | c5  | c6  | c7
   | 5 | ldf X(r1),f1  | f1 | [f1] | c7  | c8  | c9
   | 6 | mulf f0,f1,f2 | f2 |      | c9  | c10 |
t  | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5+
f2  | ROB#6
r1  | ROB#4+

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#7 | ROB#6 |       |      | ROB#4.V
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• retire stf

Page 30: Precise State in P6

Point of ROB is maintaining precise state
• How does that work?
• Easy as 1,2,3
  1. Wait until last good insn retires, first bad insn at ROB head
  2. Clear contents of ROB, RS, and Map Table
  3. Start over
• Works because zero (0) means the right thing…
  • 0 in ROB/RS → entry is empty
  • Tag == 0 in Map Table → register is in regfile
• …and because regfile and D$ writes take place at R
• Example: page fault in first stf
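
Step 2 of the recipe can be sketched as a single clearing routine. The function name `recover` and the data layout are hypothetical; the state loosely mirrors the example with the first stf faulting at the ROB head. Nothing in the register file needs undoing because only retired insns ever wrote it.

```python
def recover(rob, stations, map_table):
    """Flush all speculative state once the faulting insn is at the ROB head.

    rob: list of entries, oldest first; stations: dict FU -> entry or None
    map_table: reg -> (tag, ready_in_rob)
    Returns the faulting insn, where fetch restarts after the handler runs.
    """
    faulting = rob[0]["insn"]
    rob.clear()                        # empty ROB: no in-flight insns
    for fu in stations:
        stations[fu] = None            # empty RS entries
    for reg in map_table:
        map_table[reg] = (0, False)    # Tag == 0: register is in regfile
    return faulting


rob = [
    {"num": 3, "insn": "stf f2,Z(r1)"},    # faulting store at the head
    {"num": 4, "insn": "addi r1,4,r1"},
    {"num": 5, "insn": "ldf X(r1),f1"},
]
stations = {"ALU": None, "LD": None, "ST": {"T": 7}, "FP1": {"T": 6}}
map_table = {"f1": (5, True), "f2": (6, False), "r1": (4, True)}
restart_at = recover(rob, stations, map_table)
```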

Page 31: P6: Cycle 9 (with precise state)

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
h  | 3 | stf f2,Z(r1)  |    |      | c8  | c9  |
   | 4 | addi r1,4,r1  | r1 | [r1] | c5  | c6  | c7
   | 5 | ldf X(r1),f1  | f1 | [f1] | c7  | c8  | c9
   | 6 | mulf f0,f1,f2 | f2 |      | c9  |     |
t  | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  | ROB#5+
f2  | ROB#6
r1  | ROB#4+

CDB: T = ROB#5, V = [f1]

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#7 | ROB#6 |       |      | ROB#4.V
4 | FP1 | yes  | mulf | ROB#6 |       | ROB#5 | [f0] | CDB.V
5 | FP2 | no   |      |       |       |       |      |

Notes:
• PAGE FAULT in the first stf (ROB#3), now at the ROB head

Page 32: P6: Cycle 10 (with precise state)

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
   | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  |    |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  |
f2  |
r1  |

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | no   |      |       |       |       |      |
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• faulting insn at ROB head? CLEAR EVERYTHING

Page 33: P6: Cycle 11 (with precise state)

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
ht | 3 | stf f2,Z(r1)  |    |      |     |     |
   | 4 | addi r1,4,r1  |    |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  |
f2  |
r1  |

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | no   |      |       |       |       |      |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#3 |       |       | [f2] | [r1]
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Notes:
• START OVER (after OS fixes page fault)

Page 34: P6: Cycle 12 (with precise state)

ROB
ht | # | Insn          | R  | V    | S   | X   | C
   | 1 | ldf X(r1),f1  | f1 | [f1] | c2  | c3  | c4
   | 2 | mulf f0,f1,f2 | f2 | [f2] | c4  | c5+ | c8
h  | 3 | stf f2,Z(r1)  |    |      | c12 |     |
t  | 4 | addi r1,4,r1  | r1 |      |     |     |
   | 5 | ldf X(r1),f1  |    |      |     |     |
   | 6 | mulf f0,f1,f2 |    |      |     |     |
   | 7 | stf f2,Z(r1)  |    |      |     |     |

Map Table
Reg | T+
f0  |
f1  |
f2  |
r1  | ROB#4

CDB (T, V): empty

Reservation Stations
# | FU  | busy | op   | T     | T1    | T2    | V1   | V2
1 | ALU | yes  | addi | ROB#4 |       |       | [r1] |
2 | LD  | no   |      |       |       |       |      |
3 | ST  | yes  | stf  | ROB#3 |       |       | [f2] | [r1]
4 | FP1 | no   |      |       |       |       |      |
5 | FP2 | no   |      |       |       |       |      |

Page 35: P6 Performance

In other words: what is the cost of precise state?
+ In general: same performance as “plain” Tomasulo
  • ROB is not a performance device
  • Maybe a little better (RS freed earlier → fewer struct hazards)
– Unless ROB is too small
  • In which case ROB struct hazards become a problem

• Rules of thumb for ROB size
  • At least N (width) * number of pipe stages between D and R
  • At least N * t_hit-L2 (L2 hit latency)
  • Can add a factor of 2 to both if you want
  • What is the rationale behind these?
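
The two rules of thumb are simple arithmetic, sketched below. The function `min_rob_entries` is a hypothetical helper, and the machine parameters in the example (4-wide, 5 stages between D and R, 12-cycle L2 hit) are illustrative numbers rather than figures from the lecture.

```python
def min_rob_entries(width, stages_d_to_r, l2_hit_latency, safety_factor=1):
    """Lower bound on ROB size that satisfies both rules of thumb."""
    by_depth = width * stages_d_to_r       # N * pipe stages between D and R
    by_latency = width * l2_hit_latency    # N * t_hit-L2
    return safety_factor * max(by_depth, by_latency)


baseline = min_rob_entries(width=4, stages_d_to_r=5, l2_hit_latency=12)
doubled = min_rob_entries(width=4, stages_d_to_r=5, l2_hit_latency=12,
                          safety_factor=2)
```

With these illustrative numbers the L2-latency rule dominates (48 entries versus 20), and the optional factor of 2 raises the bound to 96.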

Page 36: P6 (Tomasulo+ROB) Redux

Popular design for a while
• (Relatively) easy to implement correctly
• Anything goes wrong (mispredicted branch, fault, interrupt)?
  • Just clear everything and start again
• Examples: Intel PentiumPro, IBM/Motorola PowerPC, AMD K6

Actually making a comeback…
• Examples: Intel PentiumM

But went away for a while, why?

Page 37: The Problem with P6

[Figure: P6 datapath (Map Table, Regfile, ROB with Head/Tail pointers, RS, FU, CDB)]

Problem for high performance implementations
– Too much value movement (regfile/ROB→RS→ROB→regfile)
– Multi-input muxes, long buses complicate routing and slow clock