lec21-quite good but same thing


Upload: srinivas-somasundaram

Post on 08-Apr-2018


Erik Jonsson School of Engineering and Computer Science

The University of Texas at Dallas

The Pipelined MIPS Processor

• We complete our study of computer architecture by investigating an approach providing even higher performance for the MIPS CPU.

• We first saw how the MIPS CPU performance could be improved by converting the so-called single-cycle CPU to a multi-cycle design.

– In the multi-cycle approach, instead of using a single clock cycle for the whole instruction, the clock is accelerated, and instructions execute in phases over several clock cycles.

– Each instruction phase takes one clock cycle.

– This means that as each instruction executes, only one section of the CPU will be active per clock cycle -- the one executing that phase of the instruction.

• This suggests that perhaps we might redesign the CPU slightly so that every section can operate independently on an instruction at the same time.

© N. B. Dodge 09/09. Lecture #21: The Pipeline MIPS Processor

The “Laundry Example”

• As an introduction to the concept of pipelining, Patterson and Hennessy use the example of doing one’s laundry.

• Most people have – or have access to – a washer and dryer.

• Assume that you need to wash several washer loads of clothing.

• Would anyone divide the clothing into washer loads and then wash, dry, fold, and put away the first load before starting the second?

• No, if you were washing clothes, you would finish washing the first load, move it to the dryer, and start washing the second.

• If there were more loads to wash, you would begin to fold and put away finished clothing while the later loads were washing and drying.

• We can see this schematically on the next slide.

Graphical Example of the Laundry Cycle

[Figure: graphical example of the laundry cycle.]

The “Pipeline” Processor

• Patterson and Hennessy applied this “simultaneous wash-dry-fold-put away concept” to the single-cycle computer model.

• Several instructions could then be processed simultaneously, so that the number of clock cycles per instruction could be dramatically decreased (that is, instruction throughput increased).

• In the single-cycle implementation, every instruction takes one clock cycle, but the clock must be as slow as the slowest instruction.

• In the multi-cycle implementation, the clock runs faster, but instructions take several clock cycles each.

• What if, each time the clock ticked, we could process an instruction in each section of the multicycle processor? Then we could process several instructions simultaneously, completing an instruction every clock cycle.

Pipeline Architecture

• A pipelined computer executes instructions concurrently.

• Hardware units are organized into stages:

– Execution in each stage takes exactly 1 clock period.

– Each stage passes its partial results to the next stage.

• Unfortunately, as noted earlier, speed = complexity + cost.

• The pipeline approach brings additional expense plus its own set of problems and complications, called hazards, which we will also study.

Sequential Versus Pipelined Execution

[Figure: two timelines over clock cycles 0-10 for the sequence lw $t0, 16($a3); lw $t1, 32($a3); lw $t2, 48($a3); etc. Each instruction passes through five stages: Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, Reg. Write. In the sequential timeline each instruction begins only after the previous one has finished; in the pipelined timeline a new instruction begins every clock cycle, so the stages of successive instructions overlap.]

Speed Advantage of the Pipeline

• The multicycle, serial processor that we studied last lecture can execute n instructions in ns clock periods, or $ET_S = ns$, where s is the number of clock cycles (stages) per instruction.

• A pipelined processor with s stages can execute n instructions in $ET_P = s + n - 1$ clock periods.

• The ideal pipeline speedup depends on the number of stages, and can be greater for more stages (hence Intel’s choice of a 20-stage pipeline for the current P-IV).

• Thus the speed advantage of pipeline over multicycle can be defined as:

$$\text{Speedup} = \frac{ET_S}{ET_P} = \frac{ns}{s + n - 1}$$

which approaches s when n is much larger than s.
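To make the formula concrete, the short Python sketch below evaluates ET_S = ns and ET_P = s + n - 1 for a five-stage pipeline. The function names are illustrative only, and the model assumes, as the slide does, that every instruction uses all s clock periods on the multicycle machine and that the pipeline never stalls.

```python
def multicycle_cycles(n, s):
    """Clock periods for n instructions on the serial multicycle CPU: ET_S = n*s."""
    return n * s


def pipelined_cycles(n, s):
    """Clock periods for n instructions on an s-stage pipeline: ET_P = s + n - 1."""
    return s + n - 1


def speedup(n, s):
    """Ratio ET_S / ET_P; it approaches s as n grows."""
    return multicycle_cycles(n, s) / pipelined_cycles(n, s)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(f"n = {n:>6}: ET_S = {multicycle_cycles(n, 5):>6}, "
              f"ET_P = {pipelined_cycles(n, 5):>6}, speedup = {speedup(n, 5):.3f}")
```

For a single instruction there is no gain at all (both machines need s periods); the advantage appears only once the pipeline is kept full.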

Pipeline Stages

[Figure: the pipeline stages of one instruction laid out over clock cycles 0-5.]

• The MIPS R2000 pipeline processor is divided into five processing stages:

1. Instruction fetch (IF)

2. Instruction decode (ID) and register fetch (RF)

3. ALU instruction execution (ALU): ALU processing, branch condition evaluation, memory address computation, etc. This is also referred to as execution (EX).

4. Memory access (MEM)

5. Write back (WB) to register file

Overlapped Pipeline Execution

[Figure: clock cycles 0-7 on the horizontal axis, instruction execution order on the vertical axis. Instruction 1, Instruction 2, and the following instruction each pass through IF, ID/RF, ALU, MEM, WB, each starting one clock cycle after the previous one.]

Single-Cycle Datapath

[Figure: single-cycle datapath with PC, instruction memory, register block, sign extend, ALU, data memory, and the control lines Reg. Dest., Branch, Mem. Read, Mem. To Reg., ALU Op., Mem. Write, ALU Srce., and Reg. Write. Lines indicate need for storage between stages if processor is converted to pipeline.]

Single-Cycle Datapath with Pipeline Registers

• Inter-stage registers are master-slave D flip-flops; the master can be receiving new data from the previous stage of the instruction while the slave flip-flop is providing data to the next stage.

[Figure: the single-cycle datapath divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, with the master side and slave side of a register marked. Note: control lines and logic not shown for clarity. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

Instruction Process Through Pipeline (1)

Stage 1: Instruction loaded into IF/ID.

Instruction Process Through Pipeline (2)

Stage 2: Instruction decoded, register data accessed, immediates sign-extended.

Instruction Process Through Pipeline (3)

Stage 3: Instruction executed / branch address computed.

Instruction Process Through Pipeline (4)

Stage 4: Memory load or store, branch taken/not taken; ALU results bypass memory and are taken to the MEM/WB register.

Instruction Process Through Pipeline (5)

Stage 5: Result write-back to dest. register.

[Figures: each of the five slides highlights the active stage on the datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

Adding Control

• Control information must be carried along as a part of the instruction, since this information is required at different stages of the pipeline.

• This can be done by adding more inter-stage storage.

• The result is very large inter-stage registers. For example, the storage capacity required between the instruction decode and ALU execution stages (ID/EX register) is more than 120 bits.

• The full pipeline design with control is shown on the next slide.
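As a rough check on the "more than 120 bits" figure, the sketch below totals the fields the ID/EX register would hold in the basic Patterson and Hennessy pipeline without forwarding. The field list and names are assumptions drawn from that textbook design rather than from this slide, so the exact total for the diagram used in class may differ.

```python
# Field widths in bits for the ID/EX pipeline register.
# ASSUMPTION: basic textbook pipeline, no forwarding; names are illustrative.
ID_EX_FIELDS = {
    "pc_plus_4": 32,                # incremented PC, needed for the branch target
    "read_data_1": 32,              # contents of register Rs
    "read_data_2": 32,              # contents of register Rt
    "sign_extended_immediate": 32,  # instruction bits 0-15 after sign extension
    "rt_field": 5,                  # instruction bits 16-20
    "rd_field": 5,                  # instruction bits 11-15
    "ex_control": 4,                # Reg. Dst., ALU Op (2 bits), ALU Srce
    "mem_control": 3,               # Branch, Memory Read, Memory Write
    "wb_control": 2,                # Register Write, Memory/ALU Result select
}


def register_width(fields):
    """Total storage, in bits, needed for one inter-stage register."""
    return sum(fields.values())


if __name__ == "__main__":
    print("ID/EX width:", register_width(ID_EX_FIELDS), "bits")
```

Under these assumptions the data fields alone account for 128 bits, which is why this register is the widest of the four.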

Full Pipeline Design with Control

[Figure: the pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, a Control Decode block, and the control lines Register Write, ALU Srce, Branch, Memory Write, Memory Read, Memory/ALU Result, ALU Op, and Reg. Dst. carried through the inter-stage registers. Instruction bits 0-15 feed the sign extender; bits 16-20 and 11-15 select the destination register. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

The Pipeline in Action

• The following instruction sequence from the P&H text illustrates the pipeline in action.

    lw  $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $5
    or  $13, $6, $7
    add $14, $8, $9

• Note that registers are identified by number rather than the letter id’s, since that is the way they appear in the processor. As a reminder, $1=$at, $8-14=$t0-t6, $2-3=$v0-v1, $4-7=$a0-a3, etc.

[Figures: eleven successive views of the full pipeline datapath, one per clock step, after David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition. The stage contents shown at each step are:]

Step   IF                ID/RF             EX                MEM               WB
0      Idle              Idle              Idle              Idle              Idle
1      lw $10, 20($1)    Idle              Idle              Idle              Idle
2      sub $11, $2, $3   lw $10, 20($1)    Idle              Idle              Idle
3      and $12, $4, $5   sub $11, $2, $3   lw $10, 20($1)    Idle              Idle
4      or $13, $6, $7    and $12, $4, $5   sub $11, $2, $3   lw $10, 20($1)    Idle
5      add $14, $8, $9   or $13, $6, $7    and $12, $4, $5   sub $11, $2, $3   lw $10, 20($1)
6      Idle              add $14, $8, $9   or $13, $6, $7    and $12, $4, $5   sub $11, $2, $3
7      Idle              Idle              add $14, $8, $9   or $13, $6, $7    and $12, $4, $5
8      Idle              Idle              Idle              add $14, $8, $9   or $13, $6, $7
9      Idle              Idle              Idle              Idle              add $14, $8, $9
10     Idle              Idle              Idle              Idle              Idle
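The stage-by-stage walk-through above can be reproduced with a few lines of Python. This is only a sketch of an ideal pipeline with no hazards or stalls; the function name `occupancy` and the convention that instruction i enters IF in cycle i (counting from 0 at the first fetch) are choices made here for illustration.

```python
STAGES = ["IF", "ID/RF", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program):
    """Return one row per clock cycle; row[k] is the instruction in stage k."""
    total_cycles = len(program) + len(STAGES) - 1  # s + n - 1
    rows = []
    for cycle in range(total_cycles):
        row = []
        for k in range(len(STAGES)):
            i = cycle - k  # instruction i reaches stage k in cycle i + k
            row.append(program[i] if 0 <= i < len(program) else "Idle")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("cycle  " + "  ".join(f"{name:<16}" for name in STAGES))
    for cycle, row in enumerate(occupancy(PROGRAM)):
        print(f"{cycle:<5}  " + "  ".join(f"{entry:<16}" for entry in row))
```

The pipeline is full only in the cycle where the fifth instruction is fetched; before and after that, some stages sit idle while the pipeline fills and drains.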

Pipeline Processor Operation Summary

• Pipelining replaces the “single-cycle” processor with a “multi-stage” processor, each stage completing one part of each instruction.

• A new instruction is started every clock cycle.

• Inter-process registers store instruction information (data, write register, branch conditions) between cycles, carrying it between the pipeline stages.

• When the pipeline is filled with instructions, an instruction completes every clock cycle.

Exercise 1

• On the diagram on the next page, identify the following:

1. Highlight all the control lines that must be active during a load word instruction.

2. As in our exercise in Lecture 20, identify the decoder locations.

3. The ID/EX Register interface stores the most bits of any of the pipeline section interfaces. Approximately how many bits is that, according to the diagram?

Print out a copy of this diagram and bring to class.

[Figure: full pipeline datapath for Exercise 1, with pipeline registers ID/EX, EX/MEM, and MEM/WB, the Control Decode block, and the control lines Register Write, ALU Srce, Branch, Memory Write, Memory Read, Memory/ALU Result, ALU Op, and Reg. Dst.]

Hazards

• Hazards occur because data required for executing the current instruction may not yet be available.

• An instruction in the “register fetch” cycle may need data from a register whose value will be changed by an instruction “downstream” but still in process in the pipeline (in the ALU, memory/memory bypass or writeback cycle).

• Thus an “upstream” instruction could access a register and get incorrect data because the register data has not yet been updated by a downstream instruction.

Hazards (2)

• There are two types of hazards, data hazards and control hazards.

• Both occur because an instruction in the ID/RF stage of the MIPS pipeline needs register data that will be changed by an instruction still in the ALU, MEM/Bypass, or WB stage.

• Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory instruction.

• Control hazards occur when a branch instruction is in the pipeline and the register data needed to decide the branch is not yet available, in the same sort of scenario.

Data Hazard in the Pipeline

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

[Figure: pipelined timeline over clock cycles 0-10. Each instruction takes 5 clock cycles (Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, Reg. Write) and starts one clock cycle after the previous instruction.]

• In the instruction sequence above, the last four instructions require data from $2, which is changed in the first instruction.

• The $2 data will not be rewritten until cycle 4, so the AND and OR instructions will fetch incorrect data from $2.

• Even the add may not get the correct information (sw is okay).
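The timing argument can be checked mechanically. The sketch below assumes an ideal five-stage pipeline with no forwarding, in which instruction i is fetched in cycle i, reads its registers one cycle later, and writes its result in its fifth cycle. The tuple layout and the name `read_status` are illustrative only.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3",   2,    (1, 3)),
    ("and $12, $2, $5",  12,   (2, 5)),
    ("or $13, $6, $2",   13,   (6, 2)),
    ("add $14, $2, $2",  14,   (2, 2)),
    ("sw $15, 100($2)",  None, (15, 2)),
]

REG_FETCH_OFFSET = 1  # register fetch is the 2nd stage
REG_WRITE_OFFSET = 4  # register write is the 5th stage


def read_status(program, producer=0):
    """Classify each later reader of the producer's destination register."""
    _, dest, _ = program[producer]
    write_cycle = producer + REG_WRITE_OFFSET
    status = {}
    for i in range(producer + 1, len(program)):
        text, _, sources = program[i]
        if dest not in sources:
            continue
        read_cycle = i + REG_FETCH_OFFSET
        if read_cycle < write_cycle:
            status[text] = "stale"                # reads before the write
        elif read_cycle == write_cycle:
            status[text] = "same cycle as write"  # depends on register file timing
        else:
            status[text] = "ok"
    return status


if __name__ == "__main__":
    for text, result in read_status(PROGRAM).items():
        print(f"{text:<18} {result}")
```

The add lands in the same cycle as the write, which is why the slide says it "may not" get the correct value: the outcome depends on whether the register file writes before it reads within a cycle.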

Control Hazards in the Pipeline

    sub $2, $1, $3
    blt $2, $8, wait
    bgt $2, $7, go
    add $14, $2, $2
    sw  $15, 100($2)

[Figure: pipelined timeline over clock cycles 0-10. Each instruction takes 5 clock cycles (Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, Reg. Write) and starts one clock cycle after the previous instruction.]

• Here the problem is changed, with two branch instructions added.

• Neither branch instruction may be executed correctly, once again because $2 has not yet been updated when the branches read it.

• This wrong data could cause an incorrect branch.

Forwarding as a Solution to Data Hazards

[Figure: overlapped pipeline stages of successive instructions over clock cycles 0-5.]

• One solution to the problem of data hazards is forwarding.

• Forwarding uses the fact that although instruction 2 needs register data two clock cycles before instruction 1 enters the WB stage, that data is already available as the output of the ALU.

• If a mechanism were available, instruction 1 could forward required register data after its ALU cycle to the ID/RF cycle of instruction 2.

Forwarding Unit in the Pipeline

[Figure: the ID/EX, EX/MEM, and MEM/WB portion of the pipeline with a Forwarding Unit added. The unit receives Rs and Rt from ID/EX and the Rd fields of the EX/MEM and MEM/WB registers, and drives the Forward A and Forward B multiplexers at the ALU inputs. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

Forwarding Unit Operation

[Figure: Reg. Block, ALU, and Memory with the Forwarding Unit.]

• The forwarding unit samples register id’s in the EX/MEM and MEM/WB registers to determine if source registers in the ID/RF cycle are the same.

• If so, source register data is replaced by pipeline (as yet unwritten) data by the forwarding unit.

• The correct information is thus processed and the instruction can proceed to correct execution.
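The comparison the forwarding unit performs can be sketched as a small selection function. The conditions below follow the usual Patterson and Hennessy formulation (the newer EX/MEM result takes priority, and register $0 is never forwarded). The function and parameter names are hypothetical, and the real unit is combinational logic rather than software.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Choose the source for one ALU input: 'EX/MEM', 'MEM/WB', or 'REG'.

    src_reg is the Rs or Rt number of the instruction about to use the ALU;
    the other arguments are the destination register numbers and
    register-write control bits held in the two downstream pipeline registers.
    """
    # The newest result (one instruction ahead) takes priority.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    # Otherwise use the result from two instructions ahead, if it matches.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    # No match: the value read from the register block is already correct.
    return "REG"


if __name__ == "__main__":
    # sub $2, $1, $3 followed by and $12, $2, $5: Rs = 2 matches EX/MEM Rd = 2.
    print(forward_select(2, ex_mem_rd=2, ex_mem_reg_write=True,
                         mem_wb_rd=0, mem_wb_reg_write=False))
```

The same function would be evaluated twice per instruction, once for each ALU input (Forward A for Rs and Forward B for Rt).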

8/6/2019 Lec21-Quite Good but Same Thing

http://slidepdf.com/reader/full/lec21-quite-good-but-same-thing 41/49

Erik Jonsson School of En ineerin andComputer Sciencee n vers y o exas a a as

StallsStalls

• Forwarding will not always solve the problems of data hazards.

• For example, suppose an add instruction follows a load word (lw) and the add involves the register that receives the memory data.

• In this case, forwarding will not work.

• The data loaded by the lw will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction, if it proceeds, will process the wrong data.

• A solution to this problem is the stall.

• A stall halts the instruction awaiting data, while the key instruction (the lw) proceeds through its MEM cycle, after which the desired data is available to the add.
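The condition that triggers a stall can be written as a single test. The sketch below is a minimal illustration, assuming the usual textbook arrangement in which the lw is one stage ahead of the dependent instruction; the parameter names are hypothetical and do not come from the slides.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """Return True when the instruction being decoded needs a register
    that a lw one stage ahead has not yet read from memory.

    id_ex_mem_read -- True if the instruction ahead is a load (assumed flag)
    id_ex_rt       -- register the load will write
    if_id_rs/rt    -- source registers of the instruction being decoded
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 32($3) followed immediately by add $14, $6, $2:
# the add reads $2, which the lw is still loading, so a stall is needed.
needs_stall = must_stall(True, 2, 6, 2)
```

When the instruction ahead is not a load, or the load's destination is not one of the sources, the function returns False and forwarding alone is sufficient.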

Result of Stall Approach

[Figure: Timeline in clock cycles (0 to 10) for the three instructions below. Each instruction takes 5 clock cycles, passing through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write.]

lw $2, 32($3)

add $14, $6, $2

sw $15, 80($2)

• Consider the 3 instructions above, the last two depending on the lw.

• $2 contents will be available at the beginning of the WB stage in the first instruction, but not before.

• Therefore, the add and sw instructions hold place for one cycle.

Result of Stall Approach (2)

[Figure: The same timeline in clock cycles (0 to 10), with the add and sw instructions each delayed by one clock cycle. Each instruction still takes 5 clock cycles: Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write.]

lw $2, 32($3)

add $14, $6, $2 (delayed 1 count)

sw $15, 80($2) (delayed 1 count)

• With the delay, the lw result feeds the ALU input stage of the add instruction, and the register fetch stage of the sw.

• Note that forwarding is still required (this time from the MEM/WB interface, not the ALU output).

• [...] following a lw must also be delayed for one clock cycle.
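The delayed timeline can be reproduced with a small scheduling sketch. It is a simplified model, not a description of the actual hardware: each instruction normally starts one clock cycle after its predecessor, and one extra cycle is added when an instruction reads the register loaded by the lw immediately before it. The tuple layout and the function name are assumptions made for the illustration.

```python
STAGES = ["Instruc. Fetch", "Reg. Fetch", "ALU Process", "Mem. R/W or ALU Out", "Reg. Write"]


def schedule(program):
    """Return a list of (instruction text, start cycle).

    program is a list of (text, destination register, source registers, is_load).
    """
    timeline = []
    start = 0
    previous = None  # (destination, is_load) of the preceding instruction
    for text, dest, sources, is_load in program:
        if previous is not None:
            start += 1  # normal pipelining: one cycle behind the predecessor
            prev_dest, prev_is_load = previous
            if prev_is_load and prev_dest in sources:
                start += 1  # load-use dependency: hold place for one cycle
        timeline.append((text, start))
        previous = (dest, is_load)
    return timeline


program = [
    ("lw $2, 32($3)", 2, (3,), True),
    ("add $14, $6, $2", 14, (6, 2), False),
    ("sw $15, 80($2)", None, (15, 2), False),
]

for text, start in schedule(program):
    # Each instruction occupies len(STAGES) consecutive cycles from its start.
    print(f"{text:<18} cycles {start} to {start + len(STAGES) - 1}")
```

In this model the lw starts in cycle 0, the add in cycle 2 instead of 1, and the sw in cycle 3 instead of 2, which matches the "delayed 1 count" labels on the slide.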

Other Problems With Branches

• A remaining problem is what to do about instructions following a branch. Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage. This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made. What if:

– The following instructions were for the case of “branch taken” and the branch was not taken.

– The following instructions were for “branch not taken” and it was taken.

• In either case, the wrong instructions are in the pipe and they must be eliminated (“flushed”). How can this problem be prevented?

• A few approaches to the problem are shown in the following slides.

Control Hazard Approaches (1)

[Figure: MIPS R-2000 Pipeline Processor. Stages IF, ID/RF, ALU/EX, MEM, and WB, with the direction of pipeline flow indicated.]

• One approach is to always assume the branch is or is not taken:

– Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the “not taken” program line (branch determination is made in ALU/EX).

– If this assumption is correct, the pipeline will continue to flow without delay.

– When the branch is taken, instructions in IF and ID/RF must be “flushed,” usually by changing the “op” code of those instructions to a “nop” and letting them proceed to the end of the pipe.

– Clearly, a 2-clock time delay is involved here, and it would be worse for longer pipelines (P-IV pipeline ~ 20 stages).
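The flush described above can be sketched as a simple operation on the contents of the pipeline stages. The dictionary representation, the function name, and the sample instructions are hypothetical; the sketch only shows that a taken branch in ALU/EX turns the two younger instructions into nops.

```python
NOP = "nop"


def flush_on_taken_branch(pipe: dict, branch_taken: bool) -> dict:
    """Return the pipeline contents after the branch decision in ALU/EX.

    pipe maps a stage name to the instruction text currently in that stage.
    Under the "never taken" assumption, IF and ID/RF hold instructions from
    the "not taken" program line, so they are discarded when the branch is taken.
    """
    result = dict(pipe)  # leave the caller's dictionary untouched
    if branch_taken:
        result["IF"] = NOP
        result["ID/RF"] = NOP
    return result


# Illustrative pipeline contents (the instructions are made up for the example).
pipe = {
    "IF": "sub $5, $6, $7",
    "ID/RF": "add $1, $2, $3",
    "ALU/EX": "beq $8, $9, target",
    "MEM": "lw $2, 32($3)",
    "WB": "or $4, $4, $10",
}
after_taken = flush_on_taken_branch(pipe, branch_taken=True)
```

The two nops then travel to the end of the pipe without changing any state, which is the 2-clock delay mentioned on the slide.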

Control Hazard Approaches (2)

[Figure: MIPS R-2000 Pipeline Processor. Stages IF, ID/RF, ALU/EX, MEM, and WB, with a branch comparator added at the ID/RF stage.]

• Reducing the cost of taking the branch:

– In this case, a branch assumption is still made (taken or not taken).

– Since a branch instruction is identified in the ID/RF stage, a comparator can be added there to do the branch/no-branch determination.

– With the branch determination made in this early stage, only one instruction must be flushed, in the IF stage (only a 1-instruction delay).
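The saving from moving the branch decision earlier can be estimated with a short calculation. This is a rough sketch: the number of flushed instructions equals the number of stages before the one where the decision is made, and the branch and misprediction fractions used below are made-up values chosen only for illustration.

```python
STAGES = ["IF", "ID/RF", "ALU/EX", "MEM", "WB"]


def flush_penalty(decision_stage: str) -> int:
    """Instructions fetched behind a branch before its outcome is known."""
    return STAGES.index(decision_stage)


def average_stall_cycles(decision_stage: str, branch_fraction: float,
                         wrong_fraction: float) -> float:
    """Average extra cycles per instruction caused by flushes.

    branch_fraction -- fraction of instructions that are branches (assumed)
    wrong_fraction  -- fraction of branches whose assumption is wrong (assumed)
    """
    return branch_fraction * wrong_fraction * flush_penalty(decision_stage)


# Hypothetical workload: 20% branches, assumption wrong half the time.
late = average_stall_cycles("ALU/EX", 0.2, 0.5)   # decision in the third stage
early = average_stall_cycles("ID/RF", 0.2, 0.5)   # decision with the added comparator
```

With these illustrative numbers, deciding in ID/RF halves the average flush cost compared with deciding in ALU/EX, since one instruction is discarded instead of two.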

Control Hazard Approaches (3)

[Figure: MIPS R-2000 Pipeline Processor. Stages IF, ID/RF, ALU/EX, MEM, and WB, with a branch history unit providing branch feedback based on history.]

• Dynamic branch prediction based on recent branch history:

– In this approach, an indicator bit (0/1) gives the last branch condition.

– The next branch prediction can be made according to the bit setting.

– [...] time until a substantial number of calculations are complete.

– Some schemes use 2 bits and do not change the prediction until the predictor is wrong twice, after which the alternate behavior is chosen.

– In either case, incorrect predictions will still be made, but hopefully not as often.
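The difference between the 1-bit and 2-bit schemes can be seen by counting wrong predictions on a loop branch. The sketch below is an assumption-laden illustration: the 2-bit scheme is modeled as a saturating counter, which is one common realization and not necessarily the exact scheme the slide has in mind, and the branch pattern (taken 99 times, then not taken once, repeated for 10 calls) is invented for the example.

```python
def one_bit_mispredictions(outcomes, last_outcome=False):
    """Predict that each branch repeats the previous outcome; count the misses."""
    misses = 0
    for taken in outcomes:
        if last_outcome != taken:
            misses += 1
        last_outcome = taken  # the indicator bit records the last branch condition
    return misses


def two_bit_mispredictions(outcomes, counter=0):
    """Saturating counter in 0..3; predict taken when the counter is 2 or 3.

    From a saturated state (0 or 3), two wrong predictions in a row are
    needed before the prediction changes.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch taken 99 times and then not taken once, for 10 separate calls.
outcomes = ([True] * 99 + [False]) * 10

one_bit = one_bit_mispredictions(outcomes)
two_bit = two_bit_mispredictions(outcomes, counter=3)  # start already "strongly taken"
```

In this model the 1-bit predictor misses twice per call (once on loop exit and once on re-entry), while the 2-bit counter misses only on the loop exit once it has warmed up.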

Exercise 2

1. Explain forwarding in your own words.

2. [...] problem be solved?

3. Why could 2-bit dynamic branch prediction work to ensure about a 1% error rate in branch prediction in a subroutine that loops about 100 times before exiting? Assume that the subroutine is called frequently, and that it always executes 100 or more loop traversals before returning to the calling program.

Homework

• As usual, write down the two or three most important things you learned today and add to your list.

• Also, write down two or three things you did not clearly understand. If you still have questions, see me during office hours.

• Readings, per syllabus. Note: Some of the PH material is hard slogging, but for those of you interested in computer engineering, it is well worth reading once.