COEN-4710 Computer Hardware Lecture 4 (source: dejazzer.com/coen4710/lectures/lec04_ch4b_pipelining.pdf)


COEN-4710 Computer Hardware

Lecture 4

Processor Part 2: Pipelining (Ch.4)

Cristinel Ababei

Marquette University

Department of Electrical and Computer Engineering


Outline

❖ Pipelining Introduction

❖ Pipelined Datapath

❖ Pipeline Control

❖ Data Hazards

◆Address data hazards – forwarding

◆Double data hazards - revised forwarding

◆Load-use data hazards - hazard detection unit to “Stall”

❖ Branch Hazards

◆ “Flush”

❖ Exceptions

◆Exception Handling Routine

❖ Improving Performance


Pipelining is Natural

Laundry Example

❖ Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold

◆ Washer takes 30 minutes

◆ Dryer takes 30 minutes

◆ “Folder” takes 30 minutes

◆ “Stasher” takes 30 minutes to put clothes into drawers


Sequential Laundry

❖ Sequential laundry takes 8 hours for 4 loads

[Figure: task-order timeline from 6 PM to 2 AM. Loads A, B, C, and D run one after another, each taking four 30-minute steps.]

If we pipelined, how long would laundry take?


Pipelined Laundry: Start work ASAP

❖Pipelined laundry takes 3.5 hours for 4 loads!

[Figure: task-order timeline starting at 6 PM. Loads A, B, C, and D overlap, each starting 30 minutes after the previous one, for seven 30-minute slots in total.]

What’s the theoretical speedup?
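The laundry numbers above can be checked with a short calculation. This is only an illustrative sketch: the function names `sequential_minutes` and `pipelined_minutes` are made up for this example, and it assumes four equal 30-minute stages as in the slides.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; each later load adds one slot."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(4, 4, 30)    # 480 minutes = 8 hours
pipe = pipelined_minutes(4, 4, 30)    # 210 minutes = 3.5 hours
speedup = seq / pipe                  # 16/7, a bit under 2.3

# With many loads the fill and drain time matters less,
# and the speedup approaches the number of stages (4).
many_loads_speedup = sequential_minutes(1000, 4, 30) / pipelined_minutes(1000, 4, 30)

print(seq / 60, pipe / 60, round(speedup, 2), round(many_loads_speedup, 2))
```

For only four loads the speedup is well below 4 because the time to fill and drain the pipeline is a large share of the total.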


Pipelining Lessons

❖ Pipelining doesn’t help latency of a single task, it helps throughput of entire workload

❖ Max potential speedup is Number of pipe stages

❖ Performance limited by slowest pipeline stage (Unbalanced lengths of pipe)

❖ Times to “fill” pipeline and to “drain” it reduce speedup

❖ Might need to stall for dependencies


The Five Stages of “Load”

1. IF: Get instruction from Instruction Memory

2. ID: Register Fetch and Instruction Decode

3. EX: Calculate the memory address

4. MEM: Read the data from data Memory

5. WB: Write the data back to the register file

        Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load    Ifetch    Reg/Dec   Exec      Mem       Wr


Pipeline Performance

❖ Assume time for stages is

◆ 100 ps for register read or write

◆ 200 ps for other stages

❖ Compare pipelined datapath with single-cycle datapath

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
ld         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
sd         200 ps        100 ps          200 ps   200 ps          -                700 ps
R-format   200 ps        100 ps          200 ps   -               100 ps           600 ps
beq        200 ps        100 ps          200 ps   -               -                500 ps


Pipeline Performance

Single-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)


Pipeline Speedup

❖ If all stages are balanced

◆ i.e., all take the same time

◆ $\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$

❖ If not balanced, speedup is less

❖ Speedup due to increased throughput

◆ Latency (time for each instruction) does not decrease
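As a rough check of the speedup claim, the sketch below applies the stage times from the Pipeline Performance table (100 ps for register read or write, 200 ps for the other stages). The helper names are hypothetical, and the model assumes no hazards or stalls.

```python
# Stage latencies in picoseconds, taken from the Pipeline Performance table.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_clock(stage_ps: dict) -> int:
    """Single-cycle clock must fit the longest instruction (ld uses every stage)."""
    return sum(stage_ps.values())


def pipelined_clock(stage_ps: dict) -> int:
    """Pipelined clock is set by the slowest stage."""
    return max(stage_ps.values())


def total_time_single(n: int, stage_ps: dict) -> int:
    return n * single_cycle_clock(stage_ps)


def total_time_pipelined(n: int, stage_ps: dict) -> int:
    # Fill the pipeline once, then complete one instruction per cycle.
    return (len(stage_ps) + n - 1) * pipelined_clock(stage_ps)


print(single_cycle_clock(STAGE_PS), pipelined_clock(STAGE_PS))   # 800 200
print(total_time_single(3, STAGE_PS), total_time_pipelined(3, STAGE_PS))
```

Because the stages are not balanced, the time between instructions drops from 800 ps to 200 ps, a factor of 4 rather than the 5 that five balanced stages would give.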


Pipelining and ISA Design

❖ RISC-V ISA designed for pipelining

◆ All instructions are 32 bits

➢ Easier to fetch and decode in one cycle

➢ cf. x86: 1- to 15-byte instructions

◆ Few and regular instruction formats

➢ Can decode and read registers in one step

◆ Load/store addressing

➢ Can calculate address in 3rd stage, access memory in 4th stage


Implementing Pipelining

❖What makes it easy

◆Instructions are all the same length

◆Small number of instruction formats

◆Memory operands appear only in loads and stores

❖What makes it hard

◆Structural hazards

➢Multiple instructions needing the same datapath components

◆Data hazards

➢ Instructions depending on previous instruction’s output

◆Control hazards

➢Changes to instruction sequence - branching, jumping

◆Exception handling: overflows, interrupts


RISC-V Pipelined Datapath

[Figure: pipelined datapath with the MEM and WB stages labeled. Annotation: right-to-left flow leads to hazards.]


Pipeline registers

❖Need registers between stages

◆To hold information produced in previous cycle


Pipeline Operation

❖ Cycle-by-cycle flow of instructions through the pipelined datapath

◆ “Single-clock-cycle” pipeline diagram

➢ Shows pipeline usage in a single cycle

➢ Highlight resources used

◆ cf. “multi-clock-cycle” diagram

➢ Graph of operation over time

❖ We’ll look at “single-clock-cycle” diagrams for Load & Store


IF for Load, Store, …


ID for Load, Store, …


EX for Load


MEM for Load


WB for Load

[Figure annotation: Wrong register number]


Corrected Datapath for Load


EX for Store


MEM for Store


WB for Store


Graphically Representing Pipelines

❖ Can help with answering questions like:

◆ How many cycles does it take to execute this code?

◆ What is the ALU doing during cycle 4?

❖ Grayed portions: what’s being used for this instruction.

[Figure: pipeline diagram over clock cycles CC1 to CC6 (time in clock cycles across, program execution order in instructions down). Each instruction passes through IM, Reg, ALU, DM, Reg.]

lw $10, 20($1)

sub $11, $2, $3


Multi-Cycle Pipeline Diagram

❖Form showing resource usage


Multi-Cycle Pipeline Diagram

❖Traditional form


Single-Cycle Pipeline Diagram

❖State of pipeline in a given cycle


Pipeline Control (Simplified)


We have 5 separate stages

1. IF: Instruction Fetch and PC Increment

2. ID: Instruction Decode / Register Fetch

3. EX: Execution

4. MEM: Memory Access

5. WB: Write Back

❖ How should control be handled?

◆ A fancy control center telling everyone what to do?

◆ Should we use a state machine?

Pipeline control

What lines need to be controlled in each stage?


Pipeline Control

❖ Control signals derived from instruction

◆ As in single-cycle implementation

❖ Pass control signals along just like the data

              Execution/Address Calculation     Memory access stage          Write-back stage
              stage control lines               control lines                control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite    RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0           1         0
Load          0       0       0       1         0       1        0           1         1
Store         X       0       0       1         0       0        1           0         X
BEQ           X       0       1       0         1       0        0           0         X
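One way to picture "pass control signals along just like the data" is sketched below. It is an illustration under assumptions, not the lecture's implementation: the dictionary layout and the function name `control_in_pipeline_register` are invented here, and `None` stands for a don't-care (X) entry. The signal values are copied from the control table above.

```python
X = None  # don't-care entry in the control table

# Control signals grouped by the stage that consumes them.
CONTROL_TABLE = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "Load": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "Store": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
    "BEQ": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
}

# Each pipeline register carries only the groups still needed downstream.
GROUPS_HELD = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def control_in_pipeline_register(instr_class: str, register: str) -> dict:
    """Return the control groups an instruction still carries in `register`."""
    signals = CONTROL_TABLE[instr_class]
    return {stage: dict(signals[stage]) for stage in GROUPS_HELD[register]}


print(control_in_pipeline_register("Load", "MEM/WB"))
```

The point of the grouping is that each stage uses its own signals and forwards the rest, so no central state machine is needed.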


Datapath with Control


Pipeline Summary

❖ Pipelining improves performance by increasing instruction throughput

◆ Executes multiple instructions in parallel

◆ Each instruction has the same latency

❖ Subject to hazards

◆ Structure, data, control

❖ Instruction set design affects complexity of pipeline implementation

The BIG Picture


Hazards

❖ Situations that prevent starting the next instruction in the next cycle

❖ Structure hazards

◆ A required resource is busy

❖ Data hazard

◆ Need to wait for previous instruction to complete its data read/write

❖ Control hazard

◆ Deciding on control action depends on previous instruction


1) Structure Hazards

❖Conflict for use of a resource

❖In RISC-V pipeline with a single memory

◆Load/store requires data access

◆Instruction fetch would have to stall for that cycle

➢Would cause a pipeline “bubble”

❖ Hence, pipelined datapaths require separate instruction/data memories

◆Or separate instruction/data caches


2 - A) Data Hazards

❖ An instruction depends on completion of data access by a previous instruction

add x19, x0, x1
sub x2, x19, x3


Data Hazards in ALU Instructions

❖Consider this sequence:

sub x2, x1, x3
and x12, x2, x5
or x13, x6, x2
add x14, x2, x2
sd x15, 100(x2)

❖We can resolve hazards with forwarding

◆How do we detect when to forward?


Forwarding (aka Bypassing)

❖Use result when it is computed

◆Don’t wait for it to be stored in a register

◆Requires extra connections in the datapath


Dependencies & Forwarding


Detecting the Need to Forward

❖ Pass register numbers along pipeline

◆ e.g., ID/EX.RegisterRs1 = register number for Rs1 sitting in ID/EX pipeline register

❖ ALU operand register numbers in EX stage are given by

◆ ID/EX.RegisterRs1, ID/EX.RegisterRs2

❖ Data hazards when

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1 (fwd from EX/MEM pipeline reg)

1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2 (fwd from EX/MEM pipeline reg)

2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1 (fwd from MEM/WB pipeline reg)

2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2 (fwd from MEM/WB pipeline reg)


Detecting the Need to Forward

❖ But only if forwarding instruction will write to a register!

◆ EX/MEM.RegWrite, MEM/WB.RegWrite

❖ And only if Rd for that instruction is not x0

◆ EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0


Forwarding Paths


Forwarding Conditions

Mux control     Source   Explanation
ForwardA = 00   ID/EX    The first ALU operand comes from the register file.
ForwardA = 10   EX/MEM   The first ALU operand is forwarded from the prior ALU result.
ForwardA = 01   MEM/WB   The first ALU operand is forwarded from data memory or an earlier ALU result.
ForwardB = 00   ID/EX    The second ALU operand comes from the register file.
ForwardB = 10   EX/MEM   The second ALU operand is forwarded from the prior ALU result.
ForwardB = 01   MEM/WB   The second ALU operand is forwarded from data memory or an earlier ALU result.


Double Data Hazard

❖Consider the sequence:

add x1, x1, x2
add x1, x1, x3
add x1, x1, x4

❖Both hazards occur

◆Want to use the most recent

❖Revise MEM hazard condition

◆Only fwd if EX hazard condition isn’t true


Revised Forwarding Condition

❖ MEM hazard

◆ if (MEM/WB.RegWrite
     and (MEM/WB.RegisterRd ≠ 0)
     and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
              and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
     and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01

◆ if (MEM/WB.RegWrite
     and (MEM/WB.RegisterRd ≠ 0)
     and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
              and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
     and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01
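The forwarding rules can be sketched as a small function. This is a minimal model under assumptions, not the hardware itself: the `PipeRegs` dataclass and its field names are invented for this example and mirror the pipeline-register fields named in the slides. It applies the rule from the Double Data Hazard slide, forwarding from MEM/WB only when there is no EX hazard on the same operand.

```python
from dataclasses import dataclass


@dataclass
class PipeRegs:
    """Hypothetical snapshot of the fields the forwarding unit reads."""
    id_ex_rs1: int
    id_ex_rs2: int
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forward_select(rs: int, r: PipeRegs) -> int:
    """Return the 2-bit mux control for one ALU operand."""
    ex_hazard = r.ex_mem_reg_write and r.ex_mem_rd != 0 and r.ex_mem_rd == rs
    mem_hazard = r.mem_wb_reg_write and r.mem_wb_rd != 0 and r.mem_wb_rd == rs
    if ex_hazard:
        return 0b10   # most recent result: forward from EX/MEM
    if mem_hazard:
        return 0b01   # older result: forward from MEM/WB
    return 0b00       # no hazard: use the register file value


def forwarding_unit(r: PipeRegs) -> tuple:
    """Return (ForwardA, ForwardB)."""
    return forward_select(r.id_ex_rs1, r), forward_select(r.id_ex_rs2, r)


# Third instruction of: add x1,x1,x2 / add x1,x1,x3 / add x1,x1,x4
# Both earlier adds write x1, so both hazards match on Rs1.
double_hazard = PipeRegs(id_ex_rs1=1, id_ex_rs2=4,
                         ex_mem_reg_write=True, ex_mem_rd=1,
                         mem_wb_reg_write=True, mem_wb_rd=1)
print(forwarding_unit(double_hazard))   # ForwardA = 0b10, ForwardB = 0b00
```

Checking the EX hazard first is what makes the third `add` pick up the most recent value of x1.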


Datapath with Forwarding


2 - B) Load-Use Data Hazard

❖Can’t always avoid stalls by forwarding

◆If value not computed when needed

◆Can’t forward backward in time!


Load-Use Data Hazard

[Figure annotations: Need STALL inserted here. Cannot forward back in time!]


Load-Use Hazard Detection

❖ Check when using instruction is decoded in ID stage

❖ ALU operand register numbers in ID stage are given by

◆ IF/ID.RegisterRs1, IF/ID.RegisterRs2

❖ Load-use hazard when

◆ ID/EX.MemRead and
  ((ID/EX.RegisterRd = IF/ID.RegisterRs1) or (ID/EX.RegisterRd = IF/ID.RegisterRs2))

❖ If detected, stall and insert bubble


How to Stall the Pipeline

❖Force control values in ID/EX register to 0

◆EX, MEM and WB do nop (no-operation)

❖Prevent update of PC and IF/ID register

◆Using instruction is decoded again

◆Following instruction is fetched again

◆1-cycle stall allows MEM to read data for ld

➢Can subsequently forward to EX stage
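The detection and stall steps can be sketched together. This is an illustrative model only: the function names and the `PCWrite`, `IF/IDWrite`, and `ZeroIDEXControl` keys are assumptions made for this example, not signal names taken from the slides. The check compares the load's destination register against both source registers of the instruction being decoded.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rd: int,
                    if_id_rs1: int, if_id_rs2: int) -> bool:
    """True when the instruction in EX is a load whose result ID needs next cycle."""
    return id_ex_mem_read and id_ex_rd in (if_id_rs1, if_id_rs2)


def stall_controls(hazard: bool) -> dict:
    """Hypothetical outputs of the hazard detection unit for one cycle."""
    return {
        "PCWrite": not hazard,         # hold the PC so the next fetch repeats
        "IF/IDWrite": not hazard,      # hold IF/ID so the decode repeats
        "ZeroIDEXControl": hazard,     # zeroed controls make EX, MEM, WB do a nop
    }


# ld x2, 8(x0) is in EX while add x3, x1, x2 is in ID.
hazard = load_use_hazard(True, 2, 1, 2)
print(hazard, stall_controls(hazard))
```

After the one-cycle bubble the loaded value is available from the MEM/WB register, so ordinary forwarding covers the rest.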


Datapath with Hazard Detection


Code Scheduling to Avoid Stalls

❖ Reorder code to avoid use of load result in the next instruction

❖ C code for a = b + e; c = b + f;

Original order (13 cycles):

ld x1, 0(x0)
ld x2, 8(x0)
(stall)
add x3, x1, x2
sd x3, 24(x0)
ld x4, 16(x0)
(stall)
add x5, x1, x4
sd x5, 32(x0)

Reordered (11 cycles):

ld x1, 0(x0)
ld x2, 8(x0)
ld x4, 16(x0)
add x3, x1, x2
sd x3, 24(x0)
add x5, x1, x4
sd x5, 32(x0)
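The 13-cycle and 11-cycle totals can be reproduced with a small counter. This sketch assumes a five-stage pipeline with full forwarding, where the only stall is one cycle when an instruction uses the result of the load immediately before it. The `Instr` tuple and `count_cycles` are names invented for this example.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]      # registers read


def count_cycles(program, stages: int = 5) -> int:
    """Cycles = instructions + pipeline fill + load-use stalls."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "ld" and prev.dest in cur.srcs:
            stalls += 1
    return len(program) + (stages - 1) + stalls


original = [
    Instr("ld", "x1", ("x0",)),
    Instr("ld", "x2", ("x0",)),
    Instr("add", "x3", ("x1", "x2")),   # uses x2 right after its load
    Instr("sd", None, ("x3", "x0")),
    Instr("ld", "x4", ("x0",)),
    Instr("add", "x5", ("x1", "x4")),   # uses x4 right after its load
    Instr("sd", None, ("x5", "x0")),
]

# Hoist the third load so that no add directly follows the load it depends on.
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(count_cycles(original), count_cycles(reordered))   # 13 11
```

Both orders compute the same values; only the distance between each load and its first use changes.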


Stalls and Performance

❖Stalls reduce performance

◆But are required to get correct results

❖ Compiler can arrange code to avoid hazards and stalls

◆Requires knowledge of the pipeline structure

The BIG Picture


3) Branch (Control) Hazards

❖ Branch determines flow of control

◆ Fetching next instruction depends on branch outcome

◆ Pipeline can’t always fetch correct instruction

➢ Still working on ID stage of branch

❖ In RISC-V pipeline

◆ Need to compare registers and compute target early in the pipeline

◆ Add hardware to do it in ID stage


Branch Hazards

❖ If branch outcome determined at the end of EX stage

❖ By the time we know to branch, new instructions are in the pipe!

[Figure annotation: Flush these instructions (set control values to 0)]


Predict!

❖ We are basically predicting “branch not taken”

❖ What if the branch is taken?

◆ Toss the instructions that entered the pipe after the branch

◆ Keep going (the branch instruction will update the PC correctly and we’ll get to the right place)

◆ Undo any memory/register changes (this won’t happen, since those writes occur at the end of the pipe)

So, how do we tell the datapath to quit executing an instruction in the middle of the pipeline?


Flush

❖ To flush the pipe (when branch is taken)

◆ Set IF.Flush (new control line)

◆ Zero all control lines

➢ Similar to stall, except don’t disable the PC and IF/ID write controls; this effectively writes over what the previous instruction was doing.

◆ No memory or register writes will have yet happened, so everything else is OK

Could we make the decision quicker, to lose fewer clock cycles?


Reducing Branch Delay

❖ Move hardware to determine outcome to ID stage

◆ Target address adder

◆ Register comparator

❖ This means the branch decision can be made during the ID stage instead of the EX stage.

◆ This is why we need an IF.Flush, but not an ID.Flush!

But problem: this messes up forwarding for branching
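The benefit of deciding branches earlier can be estimated with a rough model. The sketch below assumes that a taken branch flushes one instruction for every stage that sits ahead of the stage where the outcome is decided, and that not-taken branches cost nothing. The branch and taken fractions are made-up inputs for illustration, not measurements from the lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def flushed_on_taken_branch(resolve_stage: str) -> int:
    """Instructions fetched behind the branch before its outcome is known."""
    return STAGES.index(resolve_stage)


def effective_cpi(base_cpi: float, branch_fraction: float,
                  taken_fraction: float, resolve_stage: str) -> float:
    """Base CPI plus the average flush penalty per instruction."""
    penalty = flushed_on_taken_branch(resolve_stage)
    return base_cpi + branch_fraction * taken_fraction * penalty


# Hypothetical workload: 20% branches, half of them taken.
print(effective_cpi(1.0, 0.2, 0.5, "EX"))   # decision in EX: two flushed per taken branch
print(effective_cpi(1.0, 0.2, 0.5, "ID"))   # decision in ID: one flushed per taken branch
```

Under these assumptions, moving the decision from EX to ID halves the branch penalty, which is why only IF.Flush is needed.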


Example: Branch Taken


Example: Branch Taken


Data Hazards for Branches

❖ If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

◼ Can resolve using forwarding

add $1, $2, $3        IF  ID  EX  MEM WB
add $4, $5, $6            IF  ID  EX  MEM WB
…                             IF  ID  EX  MEM WB
beq $1, $4, target                IF  ID  EX  MEM WB


Data Hazards for Branches

❖ If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction

◆ Need 1 stall cycle

ld $1, addr           IF  ID  EX  MEM WB
add $4, $5, $6            IF  ID  EX  MEM WB
beq stalled                   IF  ID
beq $1, $4, target                ID  EX  MEM WB


Data Hazards for Branches

❖ If a comparison register is a destination of immediately preceding load instruction

◆ Need 2 stall cycles

ld $1, addr           IF  ID  EX  MEM WB
beq stalled               IF  ID
beq stalled                   ID
beq $1, $0, target                ID  EX  MEM WB


Other branch hazard methods

❖ Dynamic branch prediction

◆ Keep a table of bits associated with the current chunk of memory, telling whether a branch was taken last time that instruction was executed. Use that to guess at the next instruction to execute

❖ Delayed branching

◆ Make the instruction set operate such that at a branch the following instruction is always executed, then the branch actually occurs. Puts the work on the programmer / assembler. MIPS does this!
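The dynamic branch prediction idea above, one bit per table entry remembering whether the branch was taken last time, can be sketched as follows. The class name, table size, and indexing by low PC bits are assumptions made for this example; real predictors vary.

```python
class OneBitPredictor:
    """One bit per entry: was the branch at this slot taken last time?"""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.table = [False] * entries   # start by predicting not taken

    def _index(self, pc: int) -> int:
        # Drop the two low bits (4-byte instructions) and wrap to the table size.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[self._index(pc)] = taken


def count_mispredictions(predictor: OneBitPredictor, pc: int, outcomes) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch taken 9 times then not taken, with the loop run twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(OneBitPredictor(), 0x400, outcomes))   # 4
```

Each run of the loop costs two mispredictions: one on entry and one on exit.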


Exceptions and Interrupts

❖ “Unexpected” events requiring change in flow of control

◆ Different ISAs use the terms differently

❖ Exception

◆ Arises within the CPU

➢ e.g., undefined opcode, syscall, …

❖ Interrupt

◆ From an external I/O controller

❖ Dealing with them without sacrificing performance is hard


Handling Exceptions

❖ Save PC of offending (or interrupted) instruction

◆ In RISC-V: Supervisor Exception Program Counter (SEPC)

❖ Save indication of the problem

◆ In RISC-V: Supervisor Exception Cause Register (SCAUSE)

◆ 64 bits, but most bits unused

➢ Exception code field: 2 for undefined opcode, 12 for hardware malfunction, …

❖ Jump to handler

◆ Assume at 0000 0000 1C09 0000 (hex)


An Alternate Mechanism

❖ Vectored Interrupts

◆ Handler address determined by the cause

❖ Exception vector address to be added to a vector table base register:

◆ Undefined opcode: 00 0100 0000 (binary)

◆ Hardware malfunction: 01 1000 0000 (binary)

◆ …: …

❖ Instructions either

◆ Deal with the interrupt, or

◆ Jump to real handler
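The vectored scheme amounts to adding a cause-specific offset to the vector table base register. The sketch below uses the two offsets listed on the slide. The base address `0x8000_0000` and the names `VECTOR_OFFSETS` and `handler_address` are hypothetical, chosen only for illustration.

```python
# Offsets from the slide, written as binary literals.
VECTOR_OFFSETS = {
    "undefined_opcode": 0b00_0100_0000,       # 0x040
    "hardware_malfunction": 0b01_1000_0000,   # 0x180
}


def handler_address(vector_base: int, cause: str) -> int:
    """Handler entry point = vector table base + offset for the cause."""
    return vector_base + VECTOR_OFFSETS[cause]


base = 0x8000_0000   # hypothetical vector table base register value
print(hex(handler_address(base, "undefined_opcode")))
print(hex(handler_address(base, "hardware_malfunction")))
```

The code at each entry point either deals with the event directly or jumps to the real handler.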


Handler Actions

❖ Read cause, and transfer to relevant handler

❖ Determine action required

❖ If restartable

◆ Take corrective action

◆ Use SEPC to return to program

❖ Otherwise

◆ Terminate program

◆ Report error using SEPC, SCAUSE, …


Exceptions in a Pipeline

❖ Another form of control hazard

❖ Consider malfunction on add in EX stage

add x1, x2, x1

◆ Prevent x1 from being clobbered

◆ Complete previous instructions

◆ Flush add and subsequent instructions

◆ Set SEPC and SCAUSE register values

◆ Transfer control to handler

❖ Similar to mispredicted branch

◆ Use much of the same hardware


Pipeline with Exceptions


Improving Performance

❖ Try to avoid stalls! e.g., reorder these instructions:

lw $t0, 0($t1)

lw $t2, 4($t1)

sw $t2, 0($t1)

sw $t0, 4($t1)


Instruction-Level Parallelism (ILP)

❖ Pipelining: executing multiple instructions in parallel

❖ To increase ILP

◆ Deeper pipeline

➢ Less work per stage ⇒ shorter clock cycle

◆ Multiple issue

➢ Replicate pipeline stages ⇒ multiple pipelines

➢ Start multiple instructions per clock cycle

➢ CPI < 1, so use Instructions Per Cycle (IPC)

➢ E.g., 4 GHz 4-way multiple-issue

▪ 16 BIPS, peak CPI = 0.25, peak IPC = 4

➢ But dependencies reduce this in practice
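The peak figures in the multiple-issue example follow directly from the clock rate and the issue width. A minimal check, using a hypothetical helper name `peak_rates`:

```python
def peak_rates(clock_hz: int, issue_width: int):
    """Return (peak IPC, peak CPI, peak instructions per second)."""
    peak_ipc = issue_width
    peak_cpi = 1 / issue_width
    peak_ips = clock_hz * issue_width
    return peak_ipc, peak_cpi, peak_ips


ipc, cpi, ips = peak_rates(4_000_000_000, 4)
print(ipc, cpi, ips / 1e9)   # 4, 0.25, and 16 billion instructions per second
```

These are upper bounds; dependencies and hazards keep real programs below them.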


Super-Pipelining

❖ This is just creating longer pipelines. Remember that the maximum potential speedup is the number of stages in the pipe.

❖ Big issue:

◆ How do you break up a datapath into smaller pieces, each with smaller and roughly equal latencies?


Superscalar Processing

❖ Could you do laundry faster if you had 3 washers and 3 dryers?

◆ Yes, if you could keep them moving smoothly

❖ Basic method:

◆ Put multiple copies of datapath into hardware

◆ Launch multiple instructions every clock cycle

◆ Get ready to have serious control and hazard issues!

❖ Biggest issue: need to schedule instructions quickly and efficiently and get them into the pipes, in such a way that hazards are minimized.


Dynamic Pipeline Scheduling

❖ This is like regular or super pipelining, except with very advanced techniques to make sure there are always instructions ready to fill in if a stall happens (don’t want to waste a clock cycle).

❖ Basically keeps a series of instruction buffers and schedules the instructions as needed to keep the pipe full and the program running without hazards.

◆ Branch prediction / speculative execution

❖ Can create quite a problem for accurate exception handling.


Current processors

❖ Nearly all have pipelining/superscalar design

❖ Compiler technology important

◆ Can’t create efficient code for a pipelined or superscalar processor without the compiler thoroughly understanding how the pipelining, scheduling, forwarding and hazard detection all work for that specific processor.


Concluding Remarks

❖ISA influences design of datapath and control

❖Datapath and control influence design of ISA

❖ Pipelining improves instruction throughput using parallelism

◆More instructions completed per second

◆Latency for each instruction is NOT reduced!

❖Hazards: structural, data, control

❖Multiple issue and dynamic scheduling (ILP)

◆Dependencies limit achievable parallelism

◆Complexity leads to the power wall
