recap: summary of pipelining basics

52
CS 152 Lec 13.1 CS 152: Computer Architecture and Engineering Lecture 13 Advanced Pipelining Randy H. Katz, Instructor Satrajit Chatterjee, Teaching Assistant George Porter, Teaching Assistant

Upload: galya

Post on 11-Jan-2016

32 views

Category:

Documents


2 download

DESCRIPTION

CS 152: Computer Architecture and Engineering Lecture 13 Advanced Pipelining Randy H. Katz, Instructor Satrajit Chatterjee, Teaching Assistant George Porter, Teaching Assistant. Recap: Summary of Pipelining Basics. Five Stages: Fetch: Fetch instruction from memory - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Recap: Summary of Pipelining Basics

CS 152Lec 13.1

CS 152: Computer Architecture

and Engineering

Lecture 13

Advanced Pipelining

Randy H. Katz, InstructorSatrajit Chatterjee, Teaching Assistant

George Porter, Teaching Assistant

Page 2: Recap: Summary of Pipelining Basics

CS 152Lec 13.2

Recap: Summary of Pipelining Basics Five Stages:

• Fetch: Fetch instruction from memory

• Decode: get register values and decode control information

• Execute: Execute arithmetic operations/calculate addresses

• Memory: Do memory ops (load or store)

• Writeback: Write results back to registers (I.e. COMMIT)

Pipelines pass control information down the pipe just as data moves down pipe

Forwarding/Stalls handled by local control

Balancing length of instructions makes pipelining much smoother

Increasing length of pipe increases impact of hazards; pipelining helps instruction bandwidth, not latency

Page 3: Recap: Summary of Pipelining Basics

CS 152Lec 13.3

Recap: Can Pipelining Get Us into Trouble? Yes: Pipeline Hazards

• Structural hazards: attempt to use the same resource two different ways at the same time

- E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)

• Data hazards: attempt to use item before it is ready

- E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer

- Instruction depends on result of prior instruction still in the pipeline

• Control hazards: attempt to make a decision before condition is evaulated

- E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in

- Branch instructions

Can always resolve hazards by waiting• Pipeline control must detect the hazard

• Take action (or delay action) to resolve hazards

Page 4: Recap: Summary of Pipelining Basics

CS 152Lec 13.4

Pipelining the Load Instruction

The five independent functional units in the pipeline datapath are:

• Instruction Memory for the Ifetch stage

• Register File’s Read ports (bus A and busB) for the Reg/Dec stage

• ALU for the Exec stage

• Data Memory for the Mem stage

• Register File’s Write port (bus W) for the Wr stage

Clock

Cycle 1Cycle 2 Cycle 3Cycle 4 Cycle 5 Cycle 6 Cycle 7

Ifetch Reg/Dec Exec Mem Wr1st lw

Ifetch Reg/Dec Exec Mem Wr2nd lw

Ifetch Reg/Dec Exec Mem Wr3rd lw

Page 5: Recap: Summary of Pipelining Basics

CS 152Lec 13.5

The Four Stages of R-type

Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

Reg/Dec: Registers Fetch and Instruction Decode

Exec: • ALU operates on the two register operands

• Update PC

Wr: Write the ALU output back to the register file

Cycle 1Cycle 2 Cycle 3Cycle 4

Ifetch Reg/Dec Exec WrR-type

Page 6: Recap: Summary of Pipelining Basics

CS 152Lec 13.6

Pipelining the R-type and Load Instruction

We have pipeline conflict or structural hazard:• Two instructions try to write to the register file at the same

time!

• Only one write port

Clock

Cycle 1Cycle 2 Cycle 3Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ops! We have a problem!

Page 7: Recap: Summary of Pipelining Basics

CS 152Lec 13.7

Important Observation Each functional unit can only be used once per

instruction

Each functional unit must be used at the same stage for all instructions:

• Load uses Register File’s Write Port during its 5th stage

• R-type uses Register File’s Write Port during its 4th stage

Ifetch Reg/Dec Exec Mem WrLoad

1 2 3 4 5

Ifetch Reg/Dec Exec WrR-type

1 2 3 4

2 ways to solve this pipeline hazard.

Page 8: Recap: Summary of Pipelining Basics

CS 152Lec 13.8

Solution 1: Insert “Bubble” into the Pipeline

Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle

• The control logic can be complex.

• Lose instruction fetch and issue opportunity.

No instruction is started in Cycle 6!

Clock

Cycle 1Cycle 2 Cycle 3Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type Pipeline

Bubble

Ifetch Reg/Dec Exec Wr

Page 9: Recap: Summary of Pipelining Basics

CS 152Lec 13.9

Solution 2: Delay R-type’s Write by One Cycle Delay R-type’s register write by one cycle:

• Now R-type instructions also use Reg File’s write port at Stage 5

• Mem stage is a NOOP stage: nothing is being done.

Clock

Cycle 1Cycle 2 Cycle 3Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec WrR-type Mem

Exec

Exec

Exec

Exec

1 2 3 4 5

Page 10: Recap: Summary of Pipelining Basics

CS 152Lec 13.10

Modified Control & Datapath

IR <- Mem[PC]; PC <– PC+4;

A <- R[rs]; B<– R[rt]

S <– A + B;

R[rd] <– M;

S <– A + SX;

M <– Mem[S]

R[rd] <– M;

S <– A or ZX;

R[rt] <– M;

S <– A + SX;

Mem[S] <- B

if Cond PC < PC+SX;

M <– S

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

A

B

S

Reg

File

Equ

al

PC

Next

PC

IR

Inst

. M

em

D

M

M <– S

Page 11: Recap: Summary of Pipelining Basics

CS 152Lec 13.11

The Four Stages of Store

Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

Reg/Dec: Registers Fetch and Instruction Decode

Exec: Calculate the memory address

Mem: Write the data into the Data Memory

Cycle 1Cycle 2 Cycle 3Cycle 4

Ifetch Reg/Dec Exec MemStore Wr

Page 12: Recap: Summary of Pipelining Basics

CS 152Lec 13.12

The Three Stages of Beq

Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

Reg/Dec: • Registers Fetch and Instruction Decode

Exec: • compares the two register operand,

• select correct branch target address

• latch into PC

Cycle 1Cycle 2 Cycle 3Cycle 4

Ifetch Reg/Dec Exec MemBeq Wr

Page 13: Recap: Summary of Pipelining Basics

CS 152Lec 13.13

Control Diagram

IR <- Mem[PC]; PC < PC+4;

A <- R[rs]; B<– R[rt]

S <– A + B;

R[rd] <– S;

S <– A + SX;

M <– Mem[S]

R[rd] <– M;

S <– A or ZX;

R[rt] <– S;

S <– A + SX;

Mem[S] <- B

If Cond PC < PC+SX;

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

A

B

S

Reg

File

Equ

al

PC

Next

PC

IR

Inst

. M

em

D

M <– S M <– S

M

Page 14: Recap: Summary of Pipelining Basics

CS 152Lec 13.14

Administrivia Get started on Lab 5: Pipelining is difficult to get

right! Be sure that we will test “gotcha” cases in our mystery programs…

Next Lecture: “state-of-the-art” pipelining• Out-of-order execution/register renaming

• Reorder buffers

Page 15: Recap: Summary of Pipelining Basics

CS 152Lec 13.15

The Five Classic Components of a Computer

Today’s Topics: • Recap last lecture

• Review MIPS R3000 pipeline

• Administrivia

• Advanced Pipelining

• SuperScalar, VLIW/EPIC

The Big Picture: Where are We Now?

Control

Datapath

Memory

Processor

Input

Output

Page 16: Recap: Summary of Pipelining Basics

CS 152Lec 13.16

Recall: Single Cycle Control!

DataOut

Clk

5

Rw Ra Rb

32 32-bitRegisters

Rd

ALU

Clk

Data In

DataAddress

IdealData

Memory

Instruction

InstructionAddress

IdealInstructionMemory

Clk

PC

5Rs

5Rt

32

323232

A

B

Next

Addre

ss

Control

Datapath

Control Signals Conditions

Page 17: Recap: Summary of Pipelining Basics

CS 152Lec 13.17

Data Stationary Control The Main Control generates the control signals

during Reg/Dec• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle

later

• Control signals for Mem (MemWr Branch) are used 2 cycles later

• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

IF/ID

Reg

iste

r

ID/E

x R

eg

iste

r

Ex/M

em

Reg

iste

r

Mem

/Wr R

eg

iste

r

Reg/Dec Exec Mem

ExtOp

ALUOp

RegDst

ALUSrc

BranchMemWr

MemtoReg

RegWr

MainControl

ExtOp

ALUOp

RegDst

ALUSrc

MemtoReg

RegWr

MemtoReg

RegWr

MemtoReg

RegWr

BranchMemWr Branch

MemWr

Wr

Page 18: Recap: Summary of Pipelining Basics

CS 152Lec 13.18

Datapath + Data Stationary Control

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

A

B

SR

eg

File

PC

Next

PC

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

M

rs rt

oprsrt

fun

im

exmewbrwv

mewbrwv

wbrwv

Page 19: Recap: Summary of Pipelining Basics

CS 152Lec 13.19

Let’s Try it Out

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

these addresses are octal

Page 20: Recap: Summary of Pipelining Basics

CS 152Lec 13.20

Start: Fetch 10

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

IR

Inst

. M

em

D

Dec

ode

MemCtrl

WB Ctrl

M

rs rt im

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

IF

PC

Nex

t P

C

10

=

n n n n

Page 21: Recap: Summary of Pipelining Basics

CS 152Lec 13.21

Fetch 14, Decode 10

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

A

B

S

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

M

2 rt im

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

lw r

1, r2

(35

)

ID

IF

PC

Next

PC

14

=

n n n

Page 22: Recap: Summary of Pipelining Basics

CS 152Lec 13.22

Fetch 20, Decode 14, Exec 10

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

r2

B

S

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

M

2 rt 35

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

lw r

1

addI r2

, r2

, 3

ID

IF

EX

PC

Next

PC

20

=

n n

Page 23: Recap: Summary of Pipelining Basics

CS 152Lec 13.23

Fetch 24, Decode 20, Exec 14, Mem 10

Exec

Reg.

File

Mem

Acc

ess

Data

Mem

r2

B

r2+

35

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

M

4 5 3

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

lw r

1

sub r

3, r4

, r5

addI r2

, r2

, 3

ID

IF

EX

M

PC

Next

PC

24

=

n

Page 24: Recap: Summary of Pipelining Basics

CS 152Lec 13.24

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

r4

r5

r2+

3

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

M[r

2+

35]6 7

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

lw r

1

beq r

6, r7

10

0

addI r2

sub r

3

ID

IF

EX

M WB

PC

Next

PC

30

=

Note Delayed Branch: always execute ori after beq

Page 25: Recap: Summary of Pipelining Basics

CS 152Lec 13.25

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

r6

r7

r2+

3

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

r1=

M[r

2+

35]

9 xx

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

beq

addI r2

sub r

3r4

-r5

10

0

ori

r8

, r9

17

ID

IF

EX

M WB

PC

Next

PC

100

=

Page 26: Recap: Summary of Pipelining Basics

CS 152Lec 13.26

Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

ID

EX

M WB

PC

Next

PC

___

=

Fill it in yourself!

?

Page 27: Recap: Summary of Pipelining Basics

CS 152Lec 13.27

Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24

Exec

Reg.

File

Mem

Acc

es

sD

ata

Mem

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

EX

M WB

PC

Next

PC

___

=

Fill it in yourself!

? ?

?

?

?

Page 28: Recap: Summary of Pipelining Basics

CS 152Lec 13.28

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30

Exec

Reg.

File

Mem

Acc

ess

Data

Mem

Reg

File

IR

Inst

. M

em

D

Deco

de

MemCtrl

WB Ctrl

10 lw r1, r2(35)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

M

WB

PC

Next

PC

___

=

Fill it in yourself!

? ?

?

?

?

?

Page 29: Recap: Summary of Pipelining Basics

CS 152Lec 13.29

Pipelined Processor

Separate control at each stage

Stalls propagate backwards to freeze previous stages

Bubbles in pipeline introduced by placing “Noops” into local stage, stall previous stages.

Exec

Reg.

File

Mem

Acc

ess

Data

Mem

A

B

S

M

Reg

File

Equ

al

PC

Next

PC

IR

Inst

. M

em

Valid

IRex

Dcd

Ctr

l

IRm

em

Ex C

trl

IRw

b

Mem

Ctr

l

WB

Ctr

l

D

Stalls

Bubbles

Page 30: Recap: Summary of Pipelining Basics

CS 152Lec 13.30

Recap: Data Hazards

Avoid some “by design”• Eliminate WAR by always fetching operands early (DCD) in

pipe

• Eliminate WAW by doing all WBs in order (last stage, static)

Detect and resolve remaining ones• Stall or forward (if possible)

IF DCD EX Mem WB

IF DCD OF Ex Mem

RAW Data Hazard

WAW Data Hazard

IF DCD OF Ex RS WAR Data Hazard

IF DCD EX Mem WB

IF DCD EX Mem WB

Page 31: Recap: Summary of Pipelining Basics

CS 152Lec 13.31

Hazard Detection

Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.

A RAW hazard exists on register if Rregs( i ) Wregs( j )• Keep a record of pending writes (for inst's in the pipe) and compare

with operand regs of current instruction.

• When instruction issues, reserve its result register.

• When on operation completes, remove its write reservation.

A WAW hazard exists on register if Wregs( i ) Wregs( j )

A WAR hazard exists on register if Wregs( i ) Rregs( j )

Window on execution:Only pending instructions cancause exceptionsInst J

Inst INew Inst

InstructionMovement:

Page 32: Recap: Summary of Pipelining Basics

CS 152Lec 13.32

Record of Pending Writes In Pipeline Registers

Current operand registers

Pending writes

hazard <=((rs == rwex) & regWex) OR

((rs == rwmem) & regWme) OR

((rs == rwwb) & regWwb) OR

((rt == rwex) & regWex) OR

((rt == rwmem) & regWme) OR

((rt == rwwb) & regWwb)

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PC

Regs

A im op rwn

op rwn

op rwn

op rw rs rt

Page 33: Recap: Summary of Pipelining Basics

CS 152Lec 13.33

Resolve RAW by “forwarding” (or bypassing)

Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe

•Increase muxes to add paths from pipeline registers

•Data Forwarding = Data Bypassing

npc

I memRegs

B

alu

S

D mem

m

IAU

PC

Regs

A im op rwn

op rwn

op rwn

op rw rs rtForward

mux

Page 34: Recap: Summary of Pipelining Basics

CS 152Lec 13.34

Forwarding

PCInstruction

memory

Registers

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Forwardingunit

IF/ID

Inst

ruct

ion

Mux

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

Page 35: Recap: Summary of Pipelining Basics

CS 152Lec 13.35

What about Memory Operations?

A B

op Rd Ra Rb

op Rd Ra Rb

Rd

to regfile

R

Rd

If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!

What does delaying WB on arithmetic operations cost? – cycles ? – hardware ?

What about data dependence on loads? R1 <- R4 + R5 R2 <- Mem[ R2 + I ] R3 <- R2 + R1 “Delayed Loads”

Can recognize this in decode stage and introduce bubble while stalling fetch stage (hint for lab 5!)

Tricky situation: R1 <- Mem[ R2 + I ] Mem[R3+34] <- R1 Handle with bypass in memory stage!

D

Mem

T

Page 36: Recap: Summary of Pipelining Basics

CS 152Lec 13.36

Compiler Avoiding Load Stalls:

% loads stalling pipeline

0% 20% 40% 60% 80%

tex

spice

gcc

25%

14%

31%

65%

42%

54%

scheduled unscheduled

Page 37: Recap: Summary of Pipelining Basics

CS 152Lec 13.37

What about Interrupts, Traps, Faults? External Interrupts:

• Allow pipeline to drain, Fill with NOPs

• Load PC with interrupt address

Faults (within instruction, restartable)• Force trap instruction into IF

• disable writes till trap hits WB

• must save multiple PCs or PC + state

Recall: Precise Exceptions State of the machine is preserved as if program executed up to the offending instruction

• All previous instructions completed

• Offending instruction and all following instructions act as if they have not even started

• Same system code will work on different implementations

Page 38: Recap: Summary of Pipelining Basics

CS 152Lec 13.38

Exception/Interrupts: Implementation Questions5 instructions, executing in 5 different pipeline

stages! Who caused the interrupt?

Stage Problem interrupts occurring

IF Page fault on instruction fetch; misaligned memory access; memory-protection violation

ID Undefined or illegal opcode

EX Arithmetic exception

MEM Page fault on data fetch; misaligned memory access; memory-protection

violation; memory error

How do we stop the pipeline? How do we restart it? Do we interrupt immediately or wait? How do we sort all of this out to maintain

preciseness?

Page 39: Recap: Summary of Pipelining Basics

CS 152Lec 13.39

Exception Handling

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PClw $2,20($5)

Regs

A im op rwn

detect bad instruction address

detect bad instruction

detect overflow

detect bad data address

Allow exception to take effect

Excp

Excp

Excp

Excp

Page 40: Recap: Summary of Pipelining Basics

CS 152Lec 13.40

Another Look at the Exception Problem

Use pipeline to sort this out!• Pass exception status along with instruction.

• Keep track of PCs for every instruction in pipeline.

• Don’t act on exception until it reache WB stage

Handle interrupts through “faulting noop” in IF stage

When instruction reaches end of MEM stage:• Save PC EPC, Interrupt vector addr PC

• Turn all instructions in earlier stages into noops!

Pro

gra

m F

low

Time

IFetchDcd Exec Mem WB

IFetchDcd Exec Mem WB

IFetchDcd Exec Mem WB

IFetchDcd Exec Mem WB

Data TLB

Bad Inst

Inst TLB fault

Overflow

Page 41: Recap: Summary of Pipelining Basics

CS 152Lec 13.41

Resolution: Freeze Above & Bubble Below

Flush accomplished by setting “invalid” bit in pipeline

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PC

Regs

A im op rwn

op rwn

op rwn

op rw rs rt

bubble

freeze

Page 42: Recap: Summary of Pipelining Basics

CS 152Lec 13.42

FYI: MIPS R3000 clocking discipline

2-phase non-overlapping clocks

Pipeline stage is two (level sensitive) latches

phi1

phi2

phi1 phi1phi2Edge-triggered

Page 43: Recap: Summary of Pipelining Basics

CS 152Lec 13.43

MIPS R3000 Instruction Pipeline

Inst Fetch DecodeReg. Read

ALU / E.A Memory Write Reg

TLB I-Cache RF Operation WB

E.A. TLB D-Cache

TLB

I-cache

RF

ALUALU

TLB

D-Cache

WB

Resource Usage

Write in phase 1, read in phase 2 => eliminates bypass from WB

Page 44: Recap: Summary of Pipelining Basics

CS 152Lec 13.44

Recall: Data Hazard on r1

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF ID/RF EX MEM WBALUIm Reg Dm Reg

ALUIm Reg Dm Reg

ALUIm Reg Dm Reg

Im

ALUReg Dm Reg

ALUIm Reg Dm Reg

With MIPS R3000 pipeline, no need to forward from WB stage

Page 45: Recap: Summary of Pipelining Basics

CS 152Lec 13.45

MIPS R3000 Multicycle Operations

Ex: Multiply, Divide, Cache Miss

Use control word of local stage to step through multicycle operation

Stall all stages above multicycle operation in the pipeline

Drain (bubble) stages below it

Alternatively, launch multiply/divide to autonomous unit, only stall pipe if attempt to get result before ready - This means stall mflo/mfhi in

decode stage if multiply/divide still executing - Extra credit in Lab 5 does this

A B

op Rd Ra Rb

mul Rd Ra Rb

Rd

to regfile

R

T Rd

Page 46: Recap: Summary of Pipelining Basics

CS 152Lec 13.46

Is CPI = 1 for Our Pipeline?

Remember that CPI is an “Average # cycles/inst

CPI here is 1, since the average throughput is 1 instruction every cycle.

What if there are stalls or multi-cycle execution?

Usually CPI > 1. How close can we get to 1??

IFetchDcd Exec Mem WB

IFetchDcd Exec Mem WB

IFetchDcd Exec Mem WB

IFetchDcd Exec Mem WB

Page 47: Recap: Summary of Pipelining Basics

CS 152Lec 13.47

Case Study: MIPS R4000 (200 MHz)

8 Stage Pipeline:• IF–first half of fetching of instruction; PC selection happens here as

well as initiation of instruction cache access.

• IS–second half of access to instruction cache.

• RF–instruction decode and register fetch, hazard checking and also instruction cache hit detection.

• EX–execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.

• DF–data fetch, first half of access to data cache.

• DS–second half of access to data cache.

• TC–tag check, determine whether the data cache access hit.

• WB–write back for loads and register-register operations.

8 Stages: What is impact on Load delay? Branch delay? Why?

Page 48: Recap: Summary of Pipelining Basics

CS 152Lec 13.48

Case Study: MIPS R4000

IF ISIF

RFISIF

EXRFISIF

DFEXRFISIF

DSDFEXRFISIF

TCDSDFEXRFISIF

WBTCDSDFEXRFISIF

TWO CycleLoad Latency

IF ISIF

RFISIF

EXRFISIF

DFEXRFISIF

DSDFEXRFISIF

TCDSDFEXRFISIF

WBTCDSDFEXRFISIF

THREE CycleBranch Latency(conditions evaluated during EX phase)

Delay slot plus two stallsBranch likely cancels delay slot if not taken

Page 49: Recap: Summary of Pipelining Basics

CS 152Lec 13.49

MIPS R4000 Floating Point

FP Adder, FP Multiplier, FP Divider

Last step of FP Multiplier/Divider uses FP Adder HW

8 kinds of stages in FP units:Stage Functional unit Description

A FP adder Mantissa ADD stage

D FP divider Divide pipeline stage

E FP multiplier Exception test stage

M FP multiplier First stage of multiplier

N FP multiplier Second stage of multiplier

R FP adder Rounding stage

S FP adder Operand shift stage

U Unpack FP numbers

Page 50: Recap: Summary of Pipelining Basics

CS 152Lec 13.50

MIPS FP Pipe Stages

FP Instr 1 2 3 4 5 6 7 8 …

Add, Subtract U S+A A+R R+S

Multiply U E+M M M M N N+A R

Divide U A R D28 … D+A D+R, D+R, D+A, D+R, A, R

Square root U E (A+R)108 … A R

Negate U S

Absolute value U S

FP compare U A R

Stages:M First stage of multiplierN Second stage of multiplierR Rounding stageS Operand shift stageU Unpack FP numbers

A Mantissa ADD stage

D Divide pipeline stage

E Exception test stage

Page 51: Recap: Summary of Pipelining Basics

CS 152Lec 13.51

R4000 Performance

Not ideal CPI of 1:• Load stalls (1 or 2 clock cycles)

• Branch stalls (2 cycles + unfilled slots)

• FP result stalls: RAW data hazard (latency)

• FP structural stalls: Not enough FP hardware (parallelism)

00.5

11.5

22.5

33.5

44.5

eqnto

tt

esp

ress

o

gcc li

doduc

nasa

7

ora

spic

e2g6

su2co

r

tom

catv

Base Load stalls Branch stalls FP result stalls FP structural

stalls

Page 52: Recap: Summary of Pipelining Basics

CS 152Lec 13.52

Summary Hazards limit performance

• Structural: need more HW resources

• Data: need forwarding, compiler scheduling

• Control: early evaluation & PC, delayed branch, prediction

Data hazards must be handled carefully:• RAW data hazards handled by forwarding

• WAW and WAR hazards don’t exist in 5-stage pipeline

MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)

Exceptions in 5-stage pipeline recorded when theyoccur, but acted on only at WB (end of MEM) stage

• Must flush all previous instructions

More performance from deeper pipelines, parallelism