l11 pipelined datapath and

31
1 A pipeline diagram A pipeline diagram shows the execution of a series of instructions. The instruction sequence is shown vertically, from top to bottom. Clock cycles are shown horizontally, from left to right. Each instruction is divided into its component stages. (We show five stages for every instruction, which will make the control unit easier.) This clearly indicates the overlapping of instructions. For example, there are three instructions active in the third cycle above. The “lw” instruction is in its Execute stage. Simultaneously, the “sub” is in its Instruction Decode stage. Also, the “and” instruction is just being fetched. Clock cycle 1 2 3 4 5 6 7 8 9 lw $t0, 4($sp) IF ID EX MEM WB sub $v0, $a0, $a1 IF ID EX MEM WB and $t1, $t2, $t3 IF ID EX MEM WB or $s0, $s1, $s2 IF ID EX MEM WB add $sp, $sp, - 4 IF ID EX MEM WB

Upload: atkawa7

Post on 22-Jul-2016

37 views

Category:

Documents


2 download

DESCRIPTION

Mips, CSE

TRANSCRIPT

Page 1: L11 Pipelined Datapath And

1

A pipeline diagram

A pipeline diagram shows the execution of a series of instructions.—The instruction sequence is shown vertically, from top to bottom.—Clock cycles are shown horizontally, from left to right.—Each instruction is divided into its component stages. (We show

five stages for every instruction, which will make the control unit easier.)

This clearly indicates the overlapping of instructions. For example, there are three instructions active in the third cycle above.—The “lw” instruction is in its Execute stage.—Simultaneously, the “sub” is in its Instruction Decode stage.—Also, the “and” instruction is just being fetched.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $sp, $sp, -4 IF ID EX MEM WB

Page 2: L11 Pipelined Datapath And

2

Pipeline terminology

The pipeline depth is the number of stages—in this case, five. In the first four cycles here, the pipeline is filling, since there are

unused functional units. In cycle 5, the pipeline is full. Five instructions are being executed

simultaneously, so all hardware units are in use. In cycles 6-9, the pipeline is emptying.

filling full emptying

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $sp, $sp, -4 IF ID EX MEM WB

Page 3: L11 Pipelined Datapath And

3

Pipelined datapath and control Now we’ll see a basic implementation of a pipelined processor.

— The datapath and control unit share similarities with both the single-cycle and multicycle implementations that we already saw.

— An example execution highlights important pipelining concepts.

In future lectures, we’ll discuss several complications of pipelining that we’re hiding from you for now.

Page 4: L11 Pipelined Datapath And

4

Pipelining concepts A pipelined processor allows multiple instructions to execute at

once, and each instruction uses a different functional unit in the datapath.

This increases throughput, so programs can run faster.— One instruction can finish executing on every clock cycle, and

simpler stages also lead to shorter cycle times.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $t5, $t6, $0 IF ID EX MEM WB

Page 5: L11 Pipelined Datapath And

5

Pipelined Datapath The whole point of pipelining is to allow multiple instructions to

execute at the same time. We may need to perform several operations in the same cycle.

— Increment the PC and add registers at the same time. — Fetch one instruction while another one reads or writes data.

Thus, like the single-cycle datapath, a pipelined processor will need to duplicate hardware elements that are needed several times in the same clock cycle.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $t5, $t6, $0 IF ID EX MEM WB

Page 6: L11 Pipelined Datapath And

6

We need only one register file to support both the ID and WB stages.

Reads and writes go to separate ports on the register file. Writes occur in the first half of the cycle, reads occur in the

second half.

One register file is enough

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

Page 7: L11 Pipelined Datapath And

7

Single-cycle datapath, slightly rearranged

MemToReg

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

4

Shiftleft 2

PC

Add

1

0

PCSrc

Signextend

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0] RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

Page 8: L11 Pipelined Datapath And

10

Registers added to the multi-cycle

Memorydata

register

Result

ZeroALU

ALUOp

0Mux1

ALUSrcA

0123

ALUSrcB

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Address

Memory

MemData

Writedata

Signextend

Shiftleft 2

0Mux1

PCSource

PC

A

BALUOut

4[31-26][25-21][20-16][15-11]

[15-0]

Instructionregister

IRWrite0Mux1

RegDst

0Mux1

MemToReg

0Mux1

IorD

MemRead

MemWrite

PCWrite

Page 9: L11 Pipelined Datapath And

11

Pipeline registers We’ll add intermediate registers to our pipelined datapath too. There’s a lot of information to save, however. We’ll simplify our

diagrams by drawing just one big pipeline register between each stage.

The registers are named for the stages they connect.

IF/ID ID/EX EX/MEM MEM/WB No register is needed after the WB stage, because after WB the

instruction is done.

Page 10: L11 Pipelined Datapath And

12

Pipelined datapath

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

Signextend

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0] RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID ID/EX EX/MEM MEM/WB

1

0

PCSrc

PC

Page 11: L11 Pipelined Datapath And

13

Propagating values forward Any data values required in later stages must be propagated

through the pipeline registers. The most extreme example is the destination register.

— The rd field of the instruction word, retrieved in the first stage (IF), determines the destination register. But that register isn’t updated until the fifth stage (WB).

— Thus, the rd field must be passed through all of the pipeline stages, as shown in red on the next slide.

Why can’t we keep a single instruction register like we did in the multi-cycle data-path?

Page 12: L11 Pipelined Datapath And

14

The destination register

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0] RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID ID/EX EX/MEM MEM/WB

1

0

PCSrc

PC

Signextend

Page 13: L11 Pipelined Datapath And

15

What about control signals? The control signals are generated in the same way as in the

single-cycle processor—after an instruction is fetched, the processor decodes it and produces the appropriate control values.

But just like before, some of the control signals will not be needed until some later stage and clock cycle.

These signals must be propagated through the pipeline until they reach the appropriate stage. We can just pass them in the pipeline registers, along with the other data.

Control signals can be categorized by the pipeline stage that uses them.

Page 14: L11 Pipelined Datapath And

16

Pipelined datapath and control

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0] RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

PC

1

0

PCSrc

Signextend

EX

M

WB

Page 15: L11 Pipelined Datapath And

19

Notes about the diagram The control signals are grouped together in the pipeline registers,

just to make the diagram a little clearer. Not all of the registers have a write enable signal.

— Because the datapath fetches one instruction per cycle, the PC must also be updated on each clock cycle. Including a write enable for the PC would be redundant.

— Similarly, the pipeline registers are also written on every cycle, so no explicit write signals are needed.

Page 16: L11 Pipelined Datapath And

20

Here’s a sample sequence of instructions to execute.1000: lw $8, 4($29)1004: sub $2, $4, $51008: and $9, $10, $111012: or $16, $17, $181016: add $13, $14, $0

We’ll make some assumptions, just so we can show actual data values.— Each register contains its number plus 100. For instance,

register $8 contains 108, register $29 contains 129, and so forth.

— Every data memory location contains 99. Our pipeline diagrams will follow some conventions.

— An X indicates values that aren’t important, like the constant field of an R-type instruction.

— Question marks ??? indicate values we don’t know, usually resulting from instructions coming before and after the ones in our example.

An example execution sequence

addresses in decimal

Page 17: L11 Pipelined Datapath And

21

Cycle 1 (filling)IF: lw $8, 4($29) MEM: ??? WB: ???EX: ???ID: ???

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (?)

MemRead (?)

1

0

MemToReg(?)

Shiftleft 2

Add

1

0

PCSrc

ALUSrc (?)

Result

ZeroALU

ALUOp (???)

RegDst (?)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (?)

Add

0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

1000

1004

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

??????

???

4

PC

Signextend

EX

M

WB

Page 18: L11 Pipelined Datapath And

22

Cycle 2ID: lw $8, 4($29)IF: sub $2, $4, $5 MEM: ??? WB: ???EX: ???

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata 1

0

4

Shiftleft 2

Add

PCSrc

Result

ZeroALU

4

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

Add

X

80

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

100429

X

1008

129

X

MemToReg(?)

???

???

???

???

???

???

RegWrite (?)

MemWrite (?)

MemRead (?)

???

???

???

ALUSrc (?)

ALUOp (???)

RegDst (?)

???

???

???

???

??????

???

PC

Signextend

EX

M

WB

1

0

Page 19: L11 Pipelined Datapath And

23

Cycle 3ID: sub $2, $4, $5IF: and $9, $10, $11 EX: lw $8, 4($29) MEM: ??? WB: ???

MemToReg(?)

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (?)

MemRead (?)

1

0

4

Shiftleft 2

Add

PCSrc

ALUSrc (1)

Result

ZeroALU

ALUOp (add)

X RegDst (0)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

Add

2

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

10084

5

1012

104

105

129

4

X

8

X8

133 4

???

???

???

???

???

???

RegWrite (?)

???

???

???

PC

Signextend

1

0

EX

M

WB

Page 20: L11 Pipelined Datapath And

24

Cycle 4ID: and $9, $10, $11IF: or $16, $17, $18 EX: sub $2, $4, $5 MEM: lw $8, 4($29) WB: ???

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (1)

1

0

MemToReg(?)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (sub)

X RegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (?)

Add

9

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

101210

11

1016

110

111

104

X

105

X

22

–1

133

X 99

8

???

???

???

???

???

???

PC

Signextend

EX

M

WB

1

0

Page 21: L11 Pipelined Datapath And

25

Cycle 5 (full)ID: or $16, $17, $18IF: add $13, $14, $0 EX: and $9, $10, $11 MEM: sub $2, $4, $5 WB:

lw $8, 4($29)

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(1)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (and)

X RegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

16

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

101617

18

1020

117

118

110

X

111

X

99

110

-1

105 X

2

99

133

99

8

99

8

PC

Signextend

EX

M

WB

1

0

Page 22: L11 Pipelined Datapath And

26

Cycle 6 (emptying)ID: add $13, $14, $0IF: ??? EX: or $16, $17, $18 MEM: and $9, $10, $11 WB: sub

$2, $4, $5

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (or)

X RegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

13

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

102014

0

???

114

0

117

X

118

X

1616

119

110

111 X

9

-1

2

PC

Signextend

1

0

EX

M

WB

Page 23: L11 Pipelined Datapath And

27

Cycle 7ID: ???IF: ??? EX: add $13, $14, $0 MEM: or $16, $17, $18 WB: and

$9, $10, $11

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (add)

??? RegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

???

???0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

???

???

???

???

114

X

0

X

1313

114

119

118 X

16

X

110

110

9

110

9

PC

Signextend

???

???

EX

M

WB

1

0

Page 24: L11 Pipelined Datapath And

28

Cycle 8ID: ???IF: ??? EX: ??? MEM: add $13, $14, $0 WB: or $16,

$17, $18

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (?)

Result

ZeroALU

ALUOp (???)

??? RegDst (?)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

???

???0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

???

???

???

??????

???

114

0 X

13

X

119

119

16

119

16

PC

Signextend

???

??? ???

???

???

???

???

1

0

EX

M

WB

Page 25: L11 Pipelined Datapath And

29

Cycle 9ID: ???IF: ??? EX: ??? MEM: ??? WB: add

$13, $14, $0

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (?)

MemRead (?)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (?)

Result

ZeroALU

ALUOp (???)

??? RegDst (?)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

???

???0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB ControlM

WB

WB

???

???

???

???

??????

???

???

? X

???

X

114

114

13

114

13

PC

Signextend

???

???

???

???

???

???

1

0

EX

M

WB

Page 26: L11 Pipelined Datapath And

30

That’s a lot of diagrams there

Compare the last nine slides with the pipeline diagram above.— You can see how instruction executions are overlapped.— Each functional unit is used by a different instruction in each

cycle.— The pipeline registers save control and data values generated

in previous clock cycles for later use.— When the pipeline is full in clock cycle 5, all of the hardware

units are utilized. This is the ideal situation, and what makes pipelined processors so fast.

Try to understand this example or the similar one in the book at the end of Section 6.3.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $t5, $t6, $0 IF ID EX MEM WB

Page 27: L11 Pipelined Datapath And

31

Performance Revisited

Assuming the following functional unit latencies:

What is the cycle time of a single-cycle implementation?—What is its throughput?

What is the cycle time of a ideal pipelined implementation?—What is its steady-state throughput?

How much faster is pipelining?

ALU

Instmem

RegRead

Data Mem

RegWrite

3ns 2ns 2ns 3ns 2ns

Page 28: L11 Pipelined Datapath And

32

Ideal speedup

In our pipeline, we can execute up to five instructions simultaneously.—This implies that the maximum speedup is 5 times.—In general, the ideal speedup equals the pipeline depth.

Why was our speedup on the previous slide “only” 4 times?—The pipeline stages are imbalanced: a register file and ALU

operations can be done in 2ns, but we must stretch that out to 3ns to keep the ID, EX, and WB stages synchronized with IF and MEM.

—Balancing the stages is one of the many hard parts in designing a pipelined processor.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $sp, $sp, -4 IF ID EX MEM WB

Page 29: L11 Pipelined Datapath And

33

The pipelining paradox

Pipelining does not improve the execution time of any single instruction. Each instruction here actually takes longer to execute than in a single-cycle datapath (15ns vs. 12ns)!

Instead, pipelining increases the throughput, or the amount of work done per unit time. Here, several instructions are executed together in each clock cycle.

The result is improved execution time for a sequence of instructions, such as an entire program.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WBadd $sp, $sp, -4 IF ID EX MEM WB

Page 30: L11 Pipelined Datapath And

34

Instruction set architectures and pipelining The MIPS instruction set was designed especially for easy

pipelining.— All instructions are 32-bits long, so the instruction fetch stage

just needs to read one word on every clock cycle.— Fields are in the same position in different instruction formats

—the opcode is always the first six bits, rs is the next five bits, etc. This makes things easy for the ID stage.

— MIPS is a register-to-register architecture, so arithmetic operations cannot contain memory references. This keeps the pipeline shorter and simpler.

Pipelining is harder for older, more complex instruction sets.— If different instructions had different lengths or formats, the

fetch and decode stages would need extra time to determine the actual length of each instruction and the position of the fields.

— With memory-to-memory instructions, additional pipeline stages may be needed to compute effective addresses and read memory before the EX stage.

Page 31: L11 Pipelined Datapath And

35

Summary The pipelined datapath combines ideas from the single and

multicycle processors that we saw earlier.— It uses multiple memories and ALUs.— Instruction execution is split into several stages.

Pipeline registers propagate data and control values to later stages.

The MIPS instruction set architecture supports pipelining with uniform instruction formats and simple addressing modes.

Next lecture, we’ll start talking about Hazards.