CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding

Upload: vince

Post on 22-Feb-2016


Page 1: CMPE 421 Parallel  Computer Architecture

CMPE 421 Parallel Computer Architecture

Part 2: Hardware Solution: Forwarding

Page 2: CMPE 421 Parallel  Computer Architecture

Hardware Solution: Forwarding

Idea: use intermediate data; do not wait for the result to be finally written to the destination register.

Two steps:

1. Detect the data hazard
2. Forward intermediate data to resolve the hazard

Page 3: CMPE 421 Parallel  Computer Architecture

Review: MIPS Pipeline Data and Control Paths

[Figure: five-stage MIPS pipelined datapath with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers and the control signals RegWrite, MemWrite, MemRead, MemtoReg, RegDst, ALUOp, ALUSrc, Branch and PCSrc]

How many bits wide is each pipeline register?

PC: 32 bits
IF/ID: 64 bits
ID/EX: 9 + 32x4 + 10 = 147 bits
EX/MEM: 5 + 1 + 32x3 + 5 = 107 bits
MEM/WB: 2 + 32x2 + 5 = 71 bits
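The sums above can be re-checked with a short sketch. The field breakdown in the comments (which control bits and which 32-bit values each register carries) is an assumed reading of the slide's arithmetic based on the standard five-stage MIPS datapath; the dictionary and function names are illustrative only.

```python
# Each entry: (control bits, number of 32-bit values, other bits).
# The breakdown is an assumption chosen to match the slide's sums.
PIPELINE_REGISTERS = {
    "IF/ID":  (0, 2, 0),      # PC+4 and the fetched instruction
    "ID/EX":  (9, 4, 10),     # 9 control bits, 4 data values, Rt and Rd fields (5 bits each)
    "EX/MEM": (5, 3, 1 + 5),  # 5 control bits, 3 data values, Zero flag, destination register
    "MEM/WB": (2, 2, 5),      # 2 control bits, 2 data values, destination register
}

def width(control_bits, words, other_bits):
    """Total width in bits of one pipeline register."""
    return control_bits + 32 * words + other_bits

for name, fields in PIPELINE_REGISTERS.items():
    print(f"{name}: {width(*fields)} bits")
```

The printed totals agree with the figures on the slide (64, 147, 107 and 71 bits).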

Page 4: CMPE 421 Parallel  Computer Architecture

Pipelined Datapath with Control II (as before)

[Figure: pipelined datapath with the Control unit; the EX, M and WB control fields are carried through the ID/EX, EX/MEM and MEM/WB pipeline registers]

Control signals emanate from the control portions of the pipeline registers

Page 5: CMPE 421 Parallel  Computer Architecture

Data Forwarding

Plan:
• allow inputs to the ALU not just from ID/EX, but also from later pipeline registers, and
• use multiplexors and control signals to choose the appropriate inputs to the ALU

[Figure: multi-cycle pipeline diagram, clock cycles CC 1 to CC 9, program execution order (in instructions):]

sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

                       CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:    10    10    10    10  10/-20   -20   -20   -20   -20
Value of EX/MEM:          X     X     X   -20     X       X     X     X     X
Value of MEM/WB:          X     X     X     X   -20       X     X     X     X

Fig 6.29 Dependencies between the pipeline registers move forward in time

Page 6: CMPE 421 Parallel  Computer Architecture

Possible hazard conditions can be detected with the following notation when forwarding is used

Hazard conditions:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

• E.g., in the earlier example (Fig. 6.29), the first hazard, between sub $2, $1, $3 and and $12, $2, $5, is detected when the and is in the EX stage and the sub is in the MEM stage, because EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a)
• Similarly, the dependency between sub and or is detected as MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 (2b)
• The two dependencies between sub and add are not hazards: the value passes through the register file, which is another form of forwarding that occurs within the register file
• There is no hazard between sub and sw
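As a minimal sketch, the classification of the Fig. 6.29 sequence can be reproduced in a few lines. The Instr record, the classify helper and the use of the distance between instructions to decide which pipeline register holds the producer are illustrative assumptions (they presume no stalls between the instructions), not part of the original slides.

```python
from collections import namedtuple

# Simplified instruction record: rd is the register written (None if none),
# rs and rt are the registers read. Register numbers only, no encoding.
Instr = namedtuple("Instr", "text rd rs rt")

program = [
    Instr("sub $2, $1, $3", 2, 1, 3),
    Instr("and $12, $2, $5", 12, 2, 5),
    Instr("or $13, $6, $2", 13, 6, 2),
    Instr("add $14, $2, $2", 14, 2, 2),
    Instr("sw $15, 100($2)", None, 2, 15),
]

def classify(producer_index, consumer_index, program):
    """Return the hazard labels (1a, 1b, 2a, 2b) for one producer/consumer pair."""
    producer, consumer = program[producer_index], program[consumer_index]
    distance = consumer_index - producer_index
    if producer.rd is None or distance not in (1, 2):
        return []  # three or more instructions apart: the register file supplies the value
    # distance 1: the producer is in EX/MEM while the consumer is in EX
    # distance 2: the producer is in MEM/WB while the consumer is in EX
    prefix = "1" if distance == 1 else "2"
    labels = []
    if producer.rd == consumer.rs:
        labels.append(prefix + "a")
    if producer.rd == consumer.rt:
        labels.append(prefix + "b")
    return labels

for i in range(1, len(program)):
    print(program[i].text, "->", classify(0, i, program) or "no hazard")
```

Under these assumptions the and is reported as 1a, the or as 2b, and the add and sw as hazard-free, matching the discussion above.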

Page 7: CMPE 421 Parallel  Computer Architecture

Remarks:

- There is no WB hazard. Why? We assume that the register file supplies the correct result when the instruction in the ID stage reads a register that the instruction in the WB stage writes in the same clock cycle

- Whether to forward also depends on:
  • whether the instruction in the later pipeline stage (the earlier instruction in program order) is going to write a register; if not, there is no need to forward, even if there is a register number match as in the conditions above
  • whether the destination register of that instruction is $0; in that case there is no need to forward the value ($0 is always 0 and never overwritten)

Page 8: CMPE 421 Parallel  Computer Architecture

Forwarding Hardware

• To detect and resolve the hazards, a forwarding unit is added, with muxes inserted at the ALU inputs (see Fig 6.30)
• Initially, forwarding is handled only for R-type instructions (sub, add, and, ...)
• The ForwardA and ForwardB control lines select the mux inputs that go into the ALU
• The forwarding unit is placed in the EX stage, because the ALU forwarding muxes are in that stage

Page 9: CMPE 421 Parallel  Computer Architecture

Forwarding Hardware

FIGURE 6.31 The control values for the forwarding multiplexors in Figure 6.30. The signed immediate that is another input to the ALU is described in the Elaboration at the end of this section.

Page 10: CMPE 421 Parallel  Computer Architecture

Data Forwarding (Bypassing)

• Take the result from the earliest point at which it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle
• For the ALU functional unit: the inputs can come from any pipeline register rather than just from ID/EX, by
  - adding multiplexors to the inputs of the ALU
  - connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage's Rs and Rt ALU mux inputs
  - adding the proper control hardware to control the new muxes
• Other functional units may need similar forwarding logic (e.g., the DM)
• With forwarding, a CPI of 1 can be achieved even in the presence of data dependencies

Page 11: CMPE 421 Parallel  Computer Architecture

Forwarding Hardware

[Figure 6.30, two panels. a. No forwarding: datapath before adding forwarding hardware (Registers, ID/EX, ALU, EX/MEM, Data memory, MEM/WB). b. With forwarding: datapath after adding forwarding hardware; the forwarding unit takes Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd as inputs and drives ForwardA and ForwardB]

FIGURE 6.30 On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths, and we show the forwarding unit. The new hardware is shown in color. This figure is a stylized drawing, however, leaving out details from the full datapath such as the sign extension hardware. Note that the ID/EX.RegisterRt field is shown twice, once to connect to the mux and once to the forwarding unit, but it is a single signal. As in the earlier discussion, this ignores forwarding of a store value to a store instruction. Also note that this mechanism works for slt instructions as well.

Page 12: CMPE 421 Parallel  Computer Architecture

Elaboration on Forwarding Hardware

FIGURE 6.33 The datapath modified to resolve hazards via forwarding. Compared with the datapath in Figure 6.30, the additions are the multiplexors to the inputs to the ALU. This figure is a more stylized drawing, however, leaving out details from the full datapath, such as the branch hardware and the sign extension hardware.

Page 13: CMPE 421 Parallel  Computer Architecture

Review: One Way to “Fix” a Data Hazard

[Figure: pipeline diagram in instruction order, with two stall cycles inserted after the first instruction]

add $1,
stall
stall
sub $4,$1,$5
and $6,$7,$1

Fix the data hazard by waiting (stall), but this impacts CPI

Page 14: CMPE 421 Parallel  Computer Architecture

Review: Another Way to “Fix” a Data Hazard

[Figure: pipeline diagram in instruction order, with forwarding paths from the first instruction to the instructions that use $1]

add $1,
sub $4,$1,$5
and $6,$7,$1
sw $4,4($1)
or $8,$1,$1

Fix data hazards by forwarding results as soon as they are available to where they are needed

Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely we will see that it really is supplied by the pipeline register EX/MEM and will depict it as such.

Page 15: CMPE 421 Parallel  Computer Architecture

Data Forwarding Control Conditions

1. EX/MEM hazard:
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 10
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 10

   Forwards the result from the previous instr. to either input of the ALU

2. MEM/WB hazard:
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 01
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 01

   Forwards the result from the second previous instr. to either input of the ALU
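The conditions above, together with the ALU input muxes they drive, can be sketched as follows. This is an illustration under stated assumptions: the dictionary-based pipeline registers, the function names and the "00" default (operand taken from the register file) are not from the slides, and the two checks are simply evaluated in the order listed.

```python
def forward_controls(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB) using the two hazard checks in the order listed."""
    forward_a = forward_b = "00"  # assumed default: operand comes from the register file
    # 1. EX/MEM hazard: result of the previous instruction
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0:
        if ex_mem["Rd"] == id_ex["Rs"]:
            forward_a = "10"
        if ex_mem["Rd"] == id_ex["Rt"]:
            forward_b = "10"
    # 2. MEM/WB hazard: result of the second previous instruction.
    # If both checks match the same source, the last assignment wins here;
    # these basic conditions do not resolve that case.
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0:
        if mem_wb["Rd"] == id_ex["Rs"]:
            forward_a = "01"
        if mem_wb["Rd"] == id_ex["Rt"]:
            forward_b = "01"
    return forward_a, forward_b

def alu_operand(select, from_register_file, from_ex_mem, from_mem_wb):
    """Model of one forwarding mux in front of the ALU."""
    return {"00": from_register_file, "10": from_ex_mem, "01": from_mem_wb}[select]

# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX.
ex_mem = {"RegWrite": True, "Rd": 2}
mem_wb = {"RegWrite": False, "Rd": 0}
id_ex = {"Rs": 2, "Rt": 5}
forward_a, forward_b = forward_controls(ex_mem, mem_wb, id_ex)
# Register $2 still holds the stale value 10; EX/MEM holds the new value -20.
operand_a = alu_operand(forward_a, 10, -20, 0)
print(forward_a, forward_b, operand_a)
```

With the values from the earlier sub/and example, the upper mux selects the EX/MEM value (-20) instead of the stale register value (10), while the lower mux keeps the register file operand.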

Page 16: CMPE 421 Parallel  Computer Architecture

Forwarding Illustration

[Figure: pipeline diagram in instruction order, showing EX/MEM hazard forwarding (add to sub) and MEM/WB hazard forwarding (add to and)]

add $1,
sub $4,$1,$5
and $6,$7,$1


Page 18: CMPE 421 Parallel  Computer Architecture

Yet Another Complication!

[Figure: pipeline diagram in instruction order for the sequence below]

add $1,$1,$2
add $1,$1,$3
add $1,$1,$4

Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction. Which should be forwarded?

Page 19: CMPE 421 Parallel  Computer Architecture

Yet Another Complication!


Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.


Page 20: CMPE 421 Parallel  Computer Architecture

Corrected Data Forwarding Control Conditions

2. MEM/WB hazard:
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 01

   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 01
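A minimal sketch of a forwarding unit with the corrected MEM/WB conditions, applied to the add $1 sequence from the previous slides. The dictionary layout, the function name and the "00" default are illustrative assumptions. The extra test is implemented literally as written on the slide (EX/MEM.RegisterRd != source); a stricter variant would also take EX/MEM.RegWrite into account.

```python
def forwarding_unit(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB) with the corrected MEM/WB hazard test."""
    def select(source):
        # EX/MEM hazard: the most recent result has priority.
        if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == source:
            return "10"
        # MEM/WB hazard: only when EX/MEM does not target the same register.
        if (mem_wb["RegWrite"] and mem_wb["Rd"] != 0
                and ex_mem["Rd"] != source
                and mem_wb["Rd"] == source):
            return "01"
        return "00"  # assumed default: operand comes from the register file
    return select(id_ex["Rs"]), select(id_ex["Rt"])

# add $1,$1,$2 is in WB, add $1,$1,$3 is in MEM, add $1,$1,$4 is in EX.
ex_mem = {"RegWrite": True, "Rd": 1}
mem_wb = {"RegWrite": True, "Rd": 1}
id_ex = {"Rs": 1, "Rt": 4}
print(forwarding_unit(ex_mem, mem_wb, id_ex))
```

Both earlier adds write $1, but only the EX/MEM value (the result of the second add) is selected for the Rs operand, so ForwardA is 10 and ForwardB stays 00.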

Page 21: CMPE 421 Parallel  Computer Architecture

Datapath with Forwarding Hardware

[Figure: pipelined datapath with a Forward Unit added in the EX stage. Forward Unit inputs: ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd, EX/MEM.RegWrite, MEM/WB.RegWrite]

Page 22: CMPE 421 Parallel  Computer Architecture

Forwarding Hardware with Control

[Figure: pipelined datapath with Control and the forwarding unit. IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd are carried into ID/EX; the forwarding unit takes Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd as inputs]

Datapath with forwarding hardware and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing

Note: so far we have only handled forwarding to R-type instructions!

Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!

Page 23: CMPE 421 Parallel  Computer Architecture

Example

Consider the following code sequence, in which the dependencies have been highlighted:

sub $2, $1, $3
and $4, $2, $5
or  $4, $4, $2
add $9, $4, $2

We’ll try to keep the example short. We’ll skip the first two cycles, since they’re the same as before.

Page 24: CMPE 421 Parallel  Computer Architecture

Example: Forwarding

Execution example:

sub $2, $1, $3
and $4, $2, $5
or  $4, $4, $2
add $9, $4, $2

[Figure: datapath with forwarding unit, shown in clock cycle 3 and clock cycle 4]

Stage   Clock cycle 3      Clock cycle 4
IF      or $4, $4, $2      add $9, $4, $2
ID      and $4, $2, $5     or $4, $4, $2
EX      sub $2, $1, $3     and $4, $2, $5
MEM     before<1>          sub $2, . . .
WB      before<2>          before<1>

Page 25: CMPE 421 Parallel  Computer Architecture

Example: Forwarding

Execution example (cont.):

sub $2, $1, $3
and $4, $2, $5
or  $4, $4, $2
add $9, $4, $2

[Figure: datapath with forwarding unit, shown in clock cycle 5 and clock cycle 6]

Stage   Clock cycle 5      Clock cycle 6
IF      after<1>           after<2>
ID      add $9, $4, $2     after<1>
EX      or $4, $4, $2      add $9, $4, $2
MEM     and $4, . . .      or $4, . . .
WB      sub $2, . . .      and $4, . . .

Page 26: CMPE 421 Parallel  Computer Architecture

Memory-to-Memory Copies

[Figure: pipeline diagram in instruction order for the two instructions below]

lw $1,4($2)
sw $1,4($3)

• For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input
  - This would need a Forward Unit and a mux added to the memory access stage
• What if lw was replaced with add $1, ? Is forwarding still needed? From where, to where?
• What if $1 was used to compute the effective address? (It would be a load-use data hazard and would require a stall inserted between the lw and sw.)


Load-use Hazard Detection Unit

• Forwarding is not the solution for all data hazard conditions. For example, a problem occurs when an instruction tries to read a register immediately after a lw that writes the same register
• In clock cycle 4 the correct value of register $2 is not available at the beginning of the cycle
• Therefore, in addition to the forwarding unit (FU), we need a hazard detection unit (HDU) that operates during the ID stage
• The HDU inserts a stall between the load and its use

Page 32: CMPE 421 Parallel  Computer Architecture

Data Hazards and Stalls

• Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register
• Therefore, we need a hazard detection unit to stall the pipeline after the load instruction

[Figure: multi-cycle pipeline diagram, clock cycles CC 1 to CC 9, program execution order (in instructions):]

lw  $2, 20($1)
and $4, $2, $5
or  $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

Because a pipeline dependency goes backward in time, forwarding will not solve the hazard

Page 33: CMPE 421 Parallel  Computer Architecture

Stalling Resolves a Hazard

Same instruction sequence as before, for which forwarding by itself could not resolve the hazard:

lw  $2, 20($1)
and $4, $2, $5
or  $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

[Figure: multi-cycle pipeline diagram, clock cycles CC 1 to CC 10, with a bubble inserted after the lw]

• The hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline register dependencies go forward, so the forwarding unit can handle them and there are no more hazards
• The and instruction is turned into a nop, and all instructions beginning with the and instruction are delayed one cycle. The hazard forces the and and or instructions to repeat in clock cycle 4 what they did in clock cycle 3
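The cycle counts in the two diagrams (CC 9 without the stall, CC 10 with the bubble) follow from a simple count. The sketch below assumes an otherwise ideal five-stage pipeline that starts one instruction per cycle; the function name and parameters are illustrative.

```python
def total_cycles(num_instructions, num_stages=5, stall_cycles=0):
    """Clock cycles needed to finish all instructions in an otherwise ideal pipeline."""
    # The first instruction needs num_stages cycles; each later one finishes
    # one cycle after its predecessor, plus one cycle per inserted bubble.
    return num_stages + (num_instructions - 1) + stall_cycles

print(total_cycles(5))                  # five instructions, no stall
print(total_cycles(5, stall_cycles=1))  # same sequence with the load-use bubble
```

For the five-instruction sequence this gives 9 cycles without the bubble and 10 cycles with it.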

Page 34: CMPE 421 Parallel  Computer Architecture

Example: Forwarding with Load-use Data Hazards

[Figure: pipeline diagram in instruction order for the sequence below]

lw  $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or  $8,$1,$9

Page 35: CMPE 421 Parallel  Computer Architecture

Forwarding with Load-use Data Hazards

[Figure: the same pipeline diagram with a stall inserted after the lw; the instructions that follow are delayed one cycle]

lw  $1,4($2)
stall
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or  $8,$1,$9

The one case where forwarding cannot help is when an instruction tries to read a register immediately following a load instruction that writes the same register.

Page 36: CMPE 421 Parallel  Computer Architecture

Load-use Hazard Detection Unit

• Need a hazard detection unit in the ID stage that inserts a stall between the load and its use
• The hazard detection unit implements the following check to decide whether to stall:

  if ( ID/EX.MemRead                                      // if the instruction in the EX stage is a load...
       and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs )      // and the destination register
          or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) )  // matches either source register
                                                          // of the instruction in the ID stage, then...
    stall the pipeline

• The first line tests whether the instruction now in the EX stage is a lw; the next two lines check whether the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)
• After this one-cycle stall, the forwarding logic can handle the remaining data hazards
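The stall check can be written as a one-line predicate. This is a sketch only: the dictionaries standing in for the ID/EX and IF/ID pipeline registers and the function name must_stall are assumptions made for illustration.

```python
def must_stall(id_ex, if_id):
    """Load-use check: True when the pipeline has to stall for one cycle."""
    return bool(
        id_ex["MemRead"]                      # the instruction in EX is a load...
        and (id_ex["Rt"] == if_id["Rs"]       # ...whose destination register matches
             or id_ex["Rt"] == if_id["Rt"])   # either source of the instruction in ID
    )

# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID: stall.
print(must_stall({"MemRead": True, "Rt": 2}, {"Rs": 2, "Rt": 5}))
# An instruction that is not a load is in EX: no stall, forwarding handles any dependency.
print(must_stall({"MemRead": False, "Rt": 3}, {"Rs": 2, "Rt": 5}))
```

The first call reports a stall and the second does not; only a load in EX can trigger the check.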

Page 37: CMPE 421 Parallel  Computer Architecture

Stall Hardware

• Along with the Hazard Unit, we have to implement the stall
• Prevent the instructions in the IF and ID stages from progressing down the pipeline; this is done by preventing the PC register and the IF/ID pipeline register from changing
  - The hazard detection unit controls the writing of the PC (PC.write) and IF/ID (IF/ID.write) registers
• Insert a “bubble” between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage), i.e., insert a noop in the execution stream
  - Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (noop). The Hazard Unit controls the mux that chooses between the real control values and the 0’s
• Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline

Page 38: CMPE 421 Parallel  Computer Architecture

Adding the Hazard Hardware

[Figure: pipelined datapath with the Forward Unit and a Hazard Unit added. The Hazard Unit takes ID/EX.MemRead and ID/EX.RegisterRt as inputs and controls a mux that selects between the real control values and 0]

Page 39: CMPE 421 Parallel  Computer Architecture

Mechanics of Stalling

• If the stall check succeeds, the pipeline needs to stall only 1 clock cycle after the load, because after that the forwarding unit can resolve the dependency
• What the hardware does to stall the pipeline for 1 cycle:
  - it does not let the IF/ID register change (disable write!); this causes the instruction in the ID stage to repeat, i.e., stall
  - therefore, the instruction just behind it, in the IF stage, must be stalled as well, so the hardware does not let the PC change (disable write!); this causes the instruction in the IF stage to repeat, i.e., stall
  - it changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop; a bubble is said to have been inserted into the pipeline
  - note that we cannot turn that instruction into a nop by zeroing all the bits in the instruction itself (recall nop = 00…0, 32 bits), because it has already been decoded and its control signals generated
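The three actions above can be traced with a toy model of the PC, IF/ID and ID/EX registers. This is a simplified sketch, not the real datapath: instructions are plain dictionaries, the PC is an index into a list, and the names step, NOP and trace are illustrative assumptions.

```python
# Simplified instruction records: only the fields the stall check needs.
program = [
    {"text": "lw $2, 20($1)", "mem_read": True, "rs": 1, "rt": 2},
    {"text": "and $4, $2, $5", "mem_read": False, "rs": 2, "rt": 5},
    {"text": "or $4, $4, $2", "mem_read": False, "rs": 4, "rt": 2},
    {"text": "add $9, $4, $2", "mem_read": False, "rs": 4, "rt": 2},
]
NOP = {"text": "nop", "mem_read": False, "rs": 0, "rt": 0}  # all control fields zero

def step(pc, if_id, id_ex, program):
    """One clock edge for the PC, IF/ID and ID/EX registers."""
    stall = (id_ex["mem_read"] and if_id is not None
             and id_ex["rt"] in (if_id["rs"], if_id["rt"]))
    if stall:
        # PC write and IF/ID write are disabled, so both keep their values;
        # zeroed control fields enter ID/EX, which inserts the bubble.
        return pc, if_id, NOP
    next_if_id = program[pc] if pc < len(program) else None
    next_id_ex = if_id if if_id is not None else NOP
    return pc + 1, next_if_id, next_id_ex

pc, if_id, id_ex = 0, None, NOP
trace = []
for cycle in range(1, 7):
    # Record which instruction is in ID and which is in EX during this cycle.
    trace.append((cycle, if_id["text"] if if_id else "-", id_ex["text"]))
    pc, if_id, id_ex = step(pc, if_id, id_ex, program)

for cycle, in_id, in_ex in trace:
    print(f"CC {cycle}: ID = {in_id:16} EX = {in_ex}")
```

In this trace the and instruction sits in ID during both clock cycle 3 and clock cycle 4, while a nop occupies EX in clock cycle 4, which is the bubble described above.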

Page 40: CMPE 421 Parallel  Computer Architecture

Pipelined Datapath with Control II (as before)

[Figure: pipelined datapath with the Control unit; the EX, M and WB control fields are carried through the ID/EX, EX/MEM and MEM/WB pipeline registers]

Control signals emanate from the control portions of the pipeline registers

Page 41: CMPE 421 Parallel  Computer Architecture

Hazard Detection Unit + Forwarding Unit

[Figure: pipelined datapath with both the hazard detection unit and the forwarding unit. Hazard detection unit signals: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and the select of the mux that zeroes the control values. Forwarding unit inputs: Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd]

Datapath with forwarding hardware, the hazard detection unit and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing

Page 42: CMPE 421 Parallel  Computer Architecture

Stalling Execution example:

lw  $2, 20($1)
and $4, $2, $5
or  $4, $4, $2
add $9, $4, $2
