lec21-quite good but same thing

Post on 08-Apr-2018


8/6/2019 Lec21-Quite Good but Same Thing

http://slidepdf.com/reader/full/lec21-quite-good-but-same-thing 1/49

Erik Jonsson School of Engineering and Computer Science

The University of Texas at Dallas

The Pipelined MIPS Processor

• We complete our study of computer architecture by investigating an approach providing even higher performance for the MIPS CPU.

• We first saw how the MIPS CPU performance could be improved by converting the so-called single-cycle CPU to a multi-cycle design.

– In the multi-cycle approach, instead of using a single clock cycle for the whole instruction, the clock is accelerated, and instructions execute in phases over several clock cycles.

– Each instruction phase takes one clock cycle.

– This means that as each instruction executes, only one section of the CPU will be active per clock cycle: the one executing that phase of the instruction.

• This suggests that perhaps we might redesign the CPU slightly so that every section can operate independently on an instruction at the same time.

© N. B. Dodge 09/09, Lecture #21: The Pipeline MIPS Processor

The “Laundry Example”

• As an introduction to the concept of pipelining, Patterson and Hennessy use the example of doing one’s laundry.

• Most people have, or have access to, a washer and dryer.

• Assume that you need to wash several washer loads of clothing.

• Would anyone divide the clothing into washer loads and then wash, dry, fold, and put away the first load before starting the second?

• No, if you were washing clothes, you would finish washing the first load, move it to the dryer, and start washing the second load.

• If there were more loads to wash, you would begin to fold and put away finished clothing while the later loads were washing and drying.

• We can see this schematically on the next slide.

Graphical Example of the Laundry Cycle

[Figure: schematic of the laundry cycle.]

The “Pipeline” Processor

• Patterson and Hennessy applied this “simultaneous wash-dry-fold-put away” concept to the single-cycle computer model.

• The idea is to process several instructions simultaneously, so that the number of clock cycles per instruction could be dramatically decreased (equivalently, so that instruction throughput is increased).

• In the single-cycle implementation, each instruction completes in one clock cycle, but the clock must be as slow as the slowest instruction.

• In the multi-cycle implementation, the clock runs faster, but each instruction takes several clock cycles.

• What if, each time the clock ticked, we could process an instruction in each section of the multicycle processor? Then we could process several instructions at once, completing an instruction every clock cycle.

Pipeline Architecture

• A pipelined computer executes instructions concurrently.

• Hardware units are organized into stages:

– Execution in each stage takes exactly 1 clock period.

– Each stage passes its partial results to the next stage.

• Unfortunately, as noted earlier, speed = complexity + cost.

• The pipeline approach brings additional expense plus its own set of problems and complications, called hazards, which we will also study.

Sequential Versus Pipelined Execution

[Figure: two timelines over clock cycles 0-10 for the sequence lw $t0, 16($a3); lw $t1, 32($a3); lw $t2, 48($a3); etc. Each instruction passes through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write. In the sequential timeline, an instruction starts only after the previous one has finished; in the pipelined timeline, a new instruction starts every clock cycle.]

Speed Advantage of the Pipeline

• The multicycle, serial processor that we studied last lecture can execute n instructions in ns clock periods, or $ET_S = ns$, where s is the number of clock cycles per instruction.

• A pipelined processor with s stages can execute n instructions in $ET_P = s + (n - 1)$ clock periods.

• The ideal pipeline speedup depends on the number of stages, and can be greater for more stages (hence Intel’s choice of a 20-stage pipeline for the current P-IV).

• Thus the speed advantage of pipeline over multicycle can be defined as:

$S = \dfrac{ET_S}{ET_P} = \dfrac{ns}{s + n - 1}$
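As a quick numerical check of the speed-advantage expression, here is a minimal Python sketch. The function names are illustrative and not part of the lecture; the sketch assumes every instruction takes exactly s clock periods on the multicycle machine and that the pipeline never stalls.

```python
def multicycle_periods(n: int, s: int) -> int:
    """Clock periods for n instructions on a serial, s-phase multicycle CPU."""
    return n * s


def pipeline_periods(n: int, s: int) -> int:
    """Clock periods for n instructions on an ideal s-stage pipeline."""
    # The first instruction needs s periods to drain through the pipe;
    # each later instruction completes one period after its predecessor.
    return s + n - 1


def speedup(n: int, s: int) -> float:
    """Ideal speed advantage of the pipeline over the multicycle design."""
    return multicycle_periods(n, s) / pipeline_periods(n, s)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, round(speedup(n, 5), 3))
```

For a single instruction the two designs tie (speedup 1.0); as n grows with s = 5, the ratio climbs toward, but never reaches, 5.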

Pipeline Stages

[Figure: pipeline stage timeline over clock cycles 0-5.]

• The MIPS R2000 pipeline processor is divided into five processing stages:

1. Instruction fetch (IF)

2. Instruction decode (ID) and register fetch (RF)

3. ALU instruction execution (ALU): ALU processing, branch condition evaluation, memory address computation, etc. This is also referred to as execution (EX).

4. Memory access (MEM)

5. Write back (WB) to register file

Overlapped Pipeline Execution

[Figure: clock cycles 0-7 on the horizontal axis, instruction execution order on the vertical axis. Instruction 1 passes through IF, ID/RF, ALU, MEM, WB; Instruction 2 follows through the same stages, overlapping Instruction 1.]

Single-Cycle Datapath

[Figure: single-cycle MIPS datapath showing the PC, instruction memory, register block, sign extend, ALU, data memory, and control with its control lines (Reg. Dest., Branch, Mem. Read, Mem. To Reg., ALU Op., Mem. Write, ALU Srce., Reg. Write). Lines indicate the need for storage between stages if the processor is converted to a pipeline.]

Single-Cycle Datapath with Pipeline Registers

[Figure: the single-cycle datapath divided by four inter-stage registers: IF/ID, ID/EX, EX/MEM, and MEM/WB. Each register is labeled with a master side and a slave side.]

• Inter-stage registers are master-slave D flip-flops; the master can be receiving new data from the previous stage of the instruction while the slave flip-flop is providing data to the next stage.

Note: Control lines and logic not shown for clarity.

After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.

Instruction Process Through Pipeline (1)

Stage 1: Instruction loaded into the IF/ID register.

Instruction Process Through Pipeline (2)

Stage 2: Instruction decoded, register data accessed, immediates sign-extended.

Instruction Process Through Pipeline (3)

Stage 3: Instruction executed / branch address computed.

Instruction Process Through Pipeline (4)

Stage 4: Memory load/store, branch taken/not taken; ALU results bypass memory and are taken to the MEM/WB register.

Instruction Process Through Pipeline (5)

Stage 5: Result write-back to the destination register.

[Figures: each of the five slides shows the pipelined datapath with its IF/ID, ID/EX, EX/MEM, and MEM/WB registers.]

Adding Control

• Control information must be carried along as a part of the instruction, since this information is required at different stages of the pipeline.

• This can be done by adding more inter-stage storage.

• The result is very large inter-stage registers. For example, the storage capacity required between the instruction decode and ALU execution stages (ID/EX register) is more than 120 bits.

• The full pipeline design with control is shown on the next slide.

Full Pipeline Design with Control

[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers and a control decode block. Control lines shown: Reg. Dst., ALU Op, ALU Srce, Branch, Memory Write, Memory Read, Memory/ALU Result, Register Write. Instruction bits 0-15 feed the sign extend; bits 16-20 and bits 11-15 feed the destination-register MUX.]

The Pipeline in Action

• The following instruction sequence from the P&H text illustrates the pipeline in action.

lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9

• Note that registers are identified by number rather than the letter id’s, since that is the way they appear in the processor. As a reminder, $1 = $at, $8-14 = $t0-t6, $2-3 = $v0-v1, $4-7 = $a0-a3, etc.
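The way this sequence moves through the five stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: one instruction enters IF per clock cycle, every instruction advances one stage per cycle, and no hazards or stalls occur. The names `STAGES`, `PROGRAM`, `occupancy`, and `trace` are made up for this sketch.

```python
STAGES = ["IF", "ID/RF", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Return {stage: instruction or 'Idle'} for a 1-based clock cycle.

    Instruction i (0-based) enters IF in cycle i + 1 and then advances
    one stage per cycle; hazards and stalls are not modelled.
    """
    state = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - 1 - depth  # index of the instruction sitting in this stage
        state[stage] = program[i] if 0 <= i < len(program) else "Idle"
    return state


def trace(program):
    """Yield (cycle, occupancy) for every cycle until the pipeline drains."""
    total_cycles = len(STAGES) + len(program) - 1
    for cycle in range(1, total_cycles + 1):
        yield cycle, occupancy(program, cycle)


if __name__ == "__main__":
    for cycle, state in trace(PROGRAM):
        print(cycle, "   ".join(f"{s}: {state[s]}" for s in STAGES))
```

In cycle 5 all five stages are busy, with the lw in WB and the add in IF; the pipeline then drains over the next four cycles.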

[Figures: the full pipeline diagram repeated on eleven successive slides, one per clock cycle, annotated with the instruction occupying each stage. The stage occupancy on each slide is listed below.]

IF: Idle   ID/RF: Idle   EX: Idle   MEM: Idle   WB: Idle

IF: lw $10, 20($1)   ID/RF: Idle   EX: Idle   MEM: Idle   WB: Idle

IF: sub $11, $2, $3   ID/RF: lw $10, 20($1)   EX: Idle   MEM: Idle   WB: Idle

IF: and $12, $4, $5   ID/RF: sub $11, $2, $3   EX: lw $10, 20($1)   MEM: Idle   WB: Idle

IF: or $13, $6, $7   ID/RF: and $12, $4, $5   EX: sub $11, $2, $3   MEM: lw $10, 20($1)   WB: Idle

IF: add $14, $8, $9   ID/RF: or $13, $6, $7   EX: and $12, $4, $5   MEM: sub $11, $2, $3   WB: lw $10, 20($1)

IF: Idle   ID/RF: add $14, $8, $9   EX: or $13, $6, $7   MEM: and $12, $4, $5   WB: sub $11, $2, $3

IF: Idle   ID/RF: Idle   EX: add $14, $8, $9   MEM: or $13, $6, $7   WB: and $12, $4, $5

IF: Idle   ID/RF: Idle   EX: Idle   MEM: add $14, $8, $9   WB: or $13, $6, $7

IF: Idle   ID/RF: Idle   EX: Idle   MEM: Idle   WB: add $14, $8, $9

IF: Idle   ID/RF: Idle   EX: Idle   MEM: Idle   WB: Idle

Pipeline Processor Operation Summary

• Pipelining replaces the “single-cycle” processor with a “multi-stage” processor, each stage completing one part of each instruction.

• A new instruction is started every clock cycle.

• Inter-process registers store instruction information (data, write register, branch conditions) between cycles; these registers sit between the pipeline stages.

• When the pipeline is filled with instructions, an instruction completes every clock cycle.

Exercise 1

• On the diagram on the next page, identify the following:

1. Highlight all the control lines that must be active during a load word instruction.

2. As in our exercise in Lecture 20, identify the decoder locations.

3. The ID/EX Register interface stores the most bits of any of the pipeline section interfaces. Approximately how many bits is that, according to the diagram?

Print out a copy of this diagram and bring to class.

[Figure: full pipeline design with control decode, the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, and the control lines Reg. Dst., ALU Op, ALU Srce, Branch, Memory Write, Memory Read, Memory/ALU Result, and Register Write.]

Hazards

• Hazards occur because data required for executing the current instruction may not yet be available.

• An instruction in the “register fetch” cycle may need data from a register whose value will be changed by an instruction “downstream” but still in process in the pipeline (in the ALU, memory/memory bypass, or writeback cycle).

• Thus an “upstream” instruction could access a register and get incorrect data because the register data has not yet been updated by a downstream instruction.

Hazards (2)

• There are two types of hazards, data hazards and control hazards.

• Both occur because an instruction in the ID/RF stage of the MIPS pipeline needs register data that will be changed by an earlier instruction still in the ALU, MEM/Bypass, or WB stage.

• Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory instruction.

• Control hazards occur when a branch instruction is pending and the register data needed to decide the branch is not yet available in the same sort of scenario.

Data Hazard in the Pipeline

[Figure: pipelined timeline over clock cycles 0-10. One instruction enters per clock cycle, and each passes through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write (5 clock cycles).]

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

• In the instruction sequence above, the last four instructions require data from $2, which is changed in the first instruction.

• The $2 data will not be rewritten until cycle 4, so the and and or instructions will fetch incorrect data from $2.

• Even the add may not get the correct information (sw is okay).
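The timing argument for this sequence can be checked mechanically. The sketch below is an illustration, not the lecture's method: it assumes instruction i is fetched in cycle i (counting from 0), reads its registers one cycle later, and writes its result four cycles after its fetch, with no forwarding. The `Instr` tuple and `classify` function are hypothetical names introduced here.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[int]        # register written, or None (e.g. sw)
    sources: Tuple[int, ...]   # registers read


PROGRAM = [
    Instr("sub $2, $1, $3", 2, (1, 3)),
    Instr("and $12, $2, $5", 12, (2, 5)),
    Instr("or $13, $6, $2", 13, (6, 2)),
    Instr("add $14, $2, $2", 14, (2, 2)),
    Instr("sw $15, 100($2)", None, (15, 2)),
]

REG_FETCH, REG_WRITE = 1, 4  # stage offsets from the cycle of the fetch


def classify(program):
    """Label each dependent register read as 'stale', 'same-cycle' or 'ok'."""
    report = []
    for i, consumer in enumerate(program):
        read_cycle = i + REG_FETCH
        for j in range(i):
            producer = program[j]
            if producer.dest is None or producer.dest not in consumer.sources:
                continue  # no dependence on this earlier instruction
            write_cycle = j + REG_WRITE
            if read_cycle < write_cycle:
                status = "stale"        # register read before it is rewritten
            elif read_cycle == write_cycle:
                status = "same-cycle"   # read and write collide in one cycle
            else:
                status = "ok"
            report.append((consumer.text, producer.dest, status))
    return report


if __name__ == "__main__":
    for text, reg, status in classify(PROGRAM):
        print(f"{text:18s} reads ${reg}: {status}")
```

Under these assumptions the and and or reads are stale, the add reads $2 in the same cycle it is written (hence "may not get the correct information"), and the sw read is safe.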

Control Hazards in the Pipeline

[Figure: pipelined timeline over clock cycles 0-10. One instruction enters per clock cycle, and each passes through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write (5 clock cycles).]

sub $2, $1, $3
blt $2, $8, wait
bgt $2, $7, go
add $14, $2, $2
sw $15, 100($2)

• Here the problem is changed, with two branch instructions added.

• Neither branch instruction may be executed correctly, once again because the updated $2 data is not yet available.

• This wrong data could cause an incorrect branch.

Forwarding as a Solution to Data Hazards

[Figure: two overlapped instructions over clock cycles 0-5, with the ALU output of instruction 1 forwarded to instruction 2.]

• One solution to the problem of data hazards is forwarding.

• Forwarding uses the fact that although instruction 2 needs register data two clock cycles before instruction 1 enters the WB stage, that data is already available as the output of the ALU.

• If a mechanism were available, instruction 1 could forward required register data after its ALU cycle directly to instruction 2, replacing the stale value read in the ID/RF cycle of instruction 2.

Forwarding Unit in the Pipeline

[Figure: the ID/EX, EX/MEM, and MEM/WB registers with the ALU and data memory. MUXes on the two ALU inputs are controlled by the Forward A and Forward B signals from the forwarding unit, whose inputs are Rs, Rt, the EX/MEM Register Rd, and the MEM/WB Register Rd.]

After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.

Forwarding Unit Operation

[Figure: register block, ALU, and memory, with the forwarding unit.]

• The forwarding unit samples register id’s in the EX/MEM and MEM/WB registers to determine if source registers in the ID/RF cycle are the same.

• If so, source register data is replaced by pipeline (as yet unwritten) data by the forwarding unit.

• The correct information is thus processed and the instruction can proceed to correct execution.
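The comparison performed by the forwarding unit can be sketched as follows. This is a simplified model in the style of the textbook design, not a transcription of the lecture's circuit: the `StageReg` class, the `reg_write` flag, and the returned labels are assumptions made for illustration. It also assumes that register $0 is never forwarded, since $0 is hardwired to zero in MIPS.

```python
from dataclasses import dataclass


@dataclass
class StageReg:
    """Subset of an EX/MEM or MEM/WB register relevant to forwarding."""
    reg_write: bool  # will the instruction held here write a register?
    rd: int          # destination register number


def forward_select(src: int, ex_mem: StageReg, mem_wb: StageReg) -> str:
    """Pick where one ALU operand should come from, given its register number."""
    if src != 0 and ex_mem.reg_write and ex_mem.rd == src:
        return "EX/MEM"       # newest value: result of the preceding instruction
    if src != 0 and mem_wb.reg_write and mem_wb.rd == src:
        return "MEM/WB"       # value produced two instructions back
    return "REGISTER_FILE"    # no hazard: use the value read in ID/RF


def forwarding_unit(rs: int, rt: int, ex_mem: StageReg, mem_wb: StageReg):
    """Return the (Forward A, Forward B) selections for the two ALU inputs."""
    return (forward_select(rs, ex_mem, mem_wb),
            forward_select(rt, ex_mem, mem_wb))


if __name__ == "__main__":
    # and $12, $2, $5 at the ALU while sub $2, $1, $3 sits in EX/MEM
    print(forwarding_unit(2, 5, StageReg(True, 2), StageReg(False, 0)))
    # or $13, $6, $2 at the ALU: the and is in EX/MEM, the sub in MEM/WB
    print(forwarding_unit(6, 2, StageReg(True, 12), StageReg(True, 2)))
```

Checking EX/MEM before MEM/WB matters when both hold the same destination register: the EX/MEM copy belongs to the more recent instruction, so it carries the value the consumer should see.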

Stalls

• Forwarding will not always solve the problems of data hazards.

• For example, suppose an add instruction follows a load word (lw) and the add involves the register that receives the memory data.

• In this case, forwarding will not work.

• The data being loaded will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction, if it proceeds, will process the wrong data.

• A solution to this problem is the stall.

• A stall halts the instruction awaiting data, while the key instruction (here the lw) proceeds through its MEM cycle, after which the desired data is available to the add.

Result of Stall Approach

[Figure: pipelined timeline over clock cycles 0-10. Each instruction passes through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write (5 clock cycles).]

lw $2, 32($3)
add $14, $6, $2
sw $15, 80($2)

• Consider the 3 instructions above, the last two depending on the lw.

• $2 contents will be available at the beginning of the WB stage in the first instruction, but not before.

• With a stall, the add and sw instructions hold place for one cycle.


Result of Stall Approach (2)

[Figure: pipeline timeline, clock cycles 0 to 10; each instruction takes 5 clock cycles through the stages Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, Reg. Write]

lw $2, 32($3)

add $14, $6, $2 (delayed 1 count)

sw $15, 80($2) (delayed 1 count)

• With the delay, the lw result feeds the ALU input stage of the add instruction, and the register fetch stage of the sw.

• Note that forwarding is still required (this time from the MEM/WB interface, not the ALU output).

• When a stall occurs, the instructions following a lw must also be delayed for one clock cycle.
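The timing can be checked with a small schedule calculation. This is a simplified sketch: the stage abbreviations and the function `schedule` are assumptions for illustration, and the stall is modeled by shifting each delayed instruction one cycle to the right, as the "(delayed 1 count)" labels suggest, rather than by modeling how hardware holds an instruction in place.

```python
STAGES = ["IF", "RF", "ALU", "MEM", "WB"]  # the five stages, abbreviated


def schedule(instructions, stall=0):
    """Map each instruction to a {clock cycle: stage} dictionary.

    The first instruction starts at cycle 0. Every later instruction
    normally starts one cycle after its predecessor, plus `stall`
    extra cycles to model the delay behind the first instruction.
    """
    table = {}
    for i, name in enumerate(instructions):
        start = i + (stall if i > 0 else 0)
        table[name] = {start + k: stage for k, stage in enumerate(STAGES)}
    return table


program = ["lw $2, 32($3)", "add $14, $6, $2", "sw $15, 80($2)"]
timeline = schedule(program, stall=1)
for name in program:
    print(name, timeline[name])
```

With `stall=1`, cycle 4 holds the WB stage of the lw, the ALU stage of the add, and the register fetch stage of the sw, which is the alignment described in the bullets above. With `stall=0`, the add would reach its ALU stage in cycle 3, while the lw is still in MEM.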


Other Problems With Branches

• A remaining problem is what to do about instructions following a branch. Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage. This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made. What if:

– The following instructions were for the case of “branch taken” and the branch was not taken.

– The following instructions were for “branch not taken” and it was taken.

• In either case, the wrong instructions are in the pipe and they must be eliminated (“flushed”). How can this problem be prevented?


• A few approaches to the problem are shown in the following slides.


Control Hazard Approaches (1)

[Figure: MIPS R-2000 Pipeline Processor, showing the stages IF, ID/RF, ALU/EX, MEM, and WB, with an arrow for the direction of pipeline flow]

• One approach is to always assume the branch is or is not taken:

– Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the “not taken” program line (branch determination is made in ALU/EX).

– If this assumption is correct, the pipeline will continue to flow without delay.

– When the branch is taken, instructions in IF and ID/RF must be “flushed,” usually by changing the “op” code of those instructions to a “nop” and letting them proceed to the end of the pipe.


– Clearly, a 2-clock time delay is involved here, and it would be worse for longer pipelines (P-IV pipeline ~ 20 stages).
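The cost of the flushes can be estimated with simple arithmetic. The sketch below assumes an otherwise ideal pipeline (no data stalls) under the "never taken" assumption; the function name `total_cycles` and the instruction counts in the example are made up for illustration.

```python
def total_cycles(num_instructions, taken_branches, flush_slots=2, pipeline_depth=5):
    """Estimate clock cycles for a program under "assume never taken".

    An ideal pipeline needs (pipeline_depth - 1) cycles to fill, then
    completes one instruction per cycle. Each taken branch flushes
    `flush_slots` wrongly fetched instructions, wasting that many cycles.
    """
    fill = pipeline_depth - 1
    return fill + num_instructions + flush_slots * taken_branches


# Hypothetical program: 100 instructions, 10 of which are taken branches.
print(total_cycles(100, 0))   # 104 cycles if no branch is ever taken
print(total_cycles(100, 10))  # 124 cycles with a 2-cycle flush per taken branch
```

Raising `flush_slots` shows why a longer pipeline that resolves branches later pays a larger price for each wrong assumption.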


Control Hazard Approaches (2)

[Figure: MIPS R-2000 Pipeline Processor, showing the stages IF, ID/RF, ALU/EX, MEM, and WB, with a branch comparator added at the ID/RF stage]

• Reducing the cost of taking the branch:

– In this case, a branch assumption is still made (taken or not taken).

– Since a branch is identified in the ID/RF stage, a comparator can be added there to do the branch/no-branch determination.

– With the branch determination made in this early stage, only one instruction must be flushed, in the IF stage (only a 1-instruction delay).


Control Hazard Approaches (3)

[Figure: MIPS R-2000 Pipeline Processor, showing the stages IF, ID/RF, ALU/EX, MEM, and WB, with a branch history unit providing branch feedback based on history]

• Dynamic branch prediction based on recent branch history:

– In this approach, an indicator bit (0/1) gives the last branch condition.

– The next branch can be made according to the bit setting.

– [...] time until a substantial number of calculations are complete.

– Some schemes use 2 bits and do not change the prediction until the predictor is wrong twice, after which the alternate behavior is chosen.

– In either case, incorrect predictions will still be made, but hopefully not as often.
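The two schemes can be compared with a short simulation. This is a sketch under stated assumptions: the 2-bit scheme is modeled as a saturating counter (one common way to realize "wrong twice before changing"), both predictors start out predicting "taken", and the loop sizes are made-up numbers. The function names are hypothetical.

```python
def loop_outcomes(iterations, calls):
    """Outcomes (True = taken) of a loop-closing branch over several calls.

    In each call the branch is taken on every pass except the last,
    where the loop exits.
    """
    one_call = [True] * (iterations - 1) + [False]
    return one_call * calls


def mispredictions_1bit(outcomes, last_taken=True):
    """Count misses when the prediction is simply the last outcome."""
    misses = 0
    for taken in outcomes:
        if taken != last_taken:
            misses += 1
        last_taken = taken  # the single bit remembers the latest outcome
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """Count misses for a 2-bit saturating counter (0..3, predict taken if >= 2)."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the actual outcome, staying within 0..3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


outcomes = loop_outcomes(iterations=10, calls=5)
print(mispredictions_1bit(outcomes))  # 9
print(mispredictions_2bit(outcomes))  # 5
```

The 1-bit scheme misses twice per call after the first (once at the loop exit and again on re-entry), while the 2-bit scheme misses only at each loop exit, because a single wrong guess does not flip its prediction.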


Exercise 2

1. Explain forwarding in your own words.

2. [...] problem be solved?

3. Why could 2-bit dynamic branch prediction work to ensure about a 1% error rate in branch prediction in a subroutine that loops about 100 times before exiting? Assume that the subroutine is called frequently, and that it always executes 100 or more loop traversals before returning to the calling program.


Homework

• As usual, write down the two or three most important things you learned today and add to your list.

• Also, write down two or three things you did not clearly understand. If you still have questions, see me during office hours.

• Readings, per syllabus. Note: Some of the PH material is hard slogging, but for those of you interested in computer engineering, it is well worth reading once.
