2.2/1 © V. De Florio KULeuven 2003 Basic Concepts Computer Design Computer Architectures for AI Computer Architectures In Practice Course contents Course contents Basic Concepts Computer Design Computer Architectures for AI Computer Architectures in Practice


2.2/1

Course contents
• Basic Concepts
• Computer Design
• Computer Architectures for AI
• Computer Architectures in Practice

2.2/2

Computer Design: IS
• IS Classification
• Role of the compilers
• DLX

2.2/3

IS: DLX Architecture
• An example RISC architecture designed by Patterson and Hennessy
• Simple register-register (load-store) instruction set
• Designed for efficiency
  • From the HW viewpoint
  • From the compiler viewpoint
• Useful as an example of good IS design

2.2/4

IS: DLX Architecture
• Registers:
  • 32 registers called R0 = 0, R1, …, R31 (R0 always reads as 0)
  • 32 single-precision floating point registers or 16 double-precision floating point registers F0, F2, …, F30
• Data types
  • Like in C: 1-byte, 2-byte, 4-byte integers and 4-byte and 8-byte floats
• Addressing modes: just 2
  • Immediate (example: Add R4, #3)
  • Displacement (example: Add R4, 100(R1)), 16-bit fields
    • Register deferred: Add R4, 0(R1)
    • Absolute: Add R4, 100(R0)
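
To make the two addressing modes concrete, here is a minimal Python sketch of how the operand could be obtained in each case. The function names and the register and memory dictionaries (with made-up contents) are illustrative assumptions, not part of DLX or of P&H.

```python
# Hypothetical model: registers and memory are plain dicts; R0 always holds 0.
regs = {"R0": 0, "R1": 400}
mem = {100: 11, 400: 22, 500: 33}  # made-up memory contents

def immediate(value):
    """Immediate mode: the operand is the constant itself (e.g. #3)."""
    return value

def displacement(offset, base, regs, mem):
    """Displacement mode: the operand is Mem[offset + Regs[base]]."""
    return mem[offset + regs[base]]

print(immediate(3))                        # operand of: Add R4, #3
print(displacement(100, "R1", regs, mem))  # operand of: Add R4, 100(R1)
print(displacement(0, "R1", regs, mem))    # register deferred: 0(R1)
print(displacement(100, "R0", regs, mem))  # absolute: 100(R0)
```

Register deferred and absolute addressing need no extra machinery here: they are the displacement mode with a zero offset and with base register R0, respectively.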

2.2/5

IS: DLX Architecture
• Big endian
• DLX instruction format
  • Just two addressing modes, easy to encode in the opcode
  • All instructions have the same length and start with a 6-bit opcode
    • easier decoding algorithm, faster processing, a shorter cycle is possible

• Layout: P&H p.99
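
As a rough illustration of why a fixed-length format with a leading 6-bit opcode is easy to decode, the sketch below extracts the opcode with one shift and one mask. It assumes 32-bit instruction words and a 16-bit field in the low bits; the function names are hypothetical, and the actual field layout is the one given in P&H.

```python
OPCODE_BITS = 6
WORD_BITS = 32  # assumption: instruction words are 32 bits wide

def opcode(word):
    """Return the leading 6 bits of an instruction word."""
    return (word >> (WORD_BITS - OPCODE_BITS)) & 0x3F

def low16(word):
    """Return the low 16 bits, where a 16-bit field is assumed to sit."""
    return word & 0xFFFF

word = 0x04000003            # made-up instruction word
print(opcode(word), low16(word))
```

Because every instruction has the same length, the same two operations work for every word; no scan is needed to find where an instruction begins.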

2.2/6

IS: DLX Architecture
• Mnemonics: L = load, S = store, followed by B = byte, H = half word, W = word, F = float, D = double
  Examples:
  LB R1, 50(R9)
  SF 50(R0), F2
• ADD… (arithmetic ops)
• SL… (shift left, logical ops)
• J…, B… (jump and branch ops)

2.2/7

IS: DLX Architecture
• How good is the DLX architecture?

• DLX is a RISC architecture

• What’s a RISC architecture, and what’s the difference between a RISC and a non-RISC architecture?

2.2/8

IS: DLX Architecture
• CISC = complex IS architecture
  • Architecture of the ’70s
  • Axioms:
    (1) the IS must be easy to program with
    (2) the IS must be easy to compile for
  • IS not too far away from a HLL
  • IS includes high-level constructs
    • Loop instructions vs. gotos
    • Complex CALL instructions preserving the register file
    • Case/switch instructions
  • Large set of addressing modes
  • All addressing modes are available with all the instructions
  • Key requirement of the ’70s: minimize code size

2.2/9

IS: DLX Architecture
• Why?
• Because, in the ’70s, RAM memories were 1000 times smaller than today
• Code space was a key factor

2.2/10

IS: DLX Architecture
• RISC = reduced IS architecture
  • Key architecture today
  • Axioms:
    (1) the IS must be simple,
    (2) easy to implement in HW,
    (3) should match well with clever design solutions (e.g., pipelining)
    (4) should be a good target for today’s optimising compilers

2.2/11

IS: DLX Architecture
• RISC = reduced IS architecture
  • Simple instructions
  • A few simple addressing modes
  • Fixed-length instructions
  • “Many” general purpose registers
  • Key goal: help the machine go fast
• In general, RISCs increase the number of instructions executed (IC)…
  Recall: CPUTIME(p) = IC(p) × CPI(p) / clock rate
• …but at the same time they decrease CPI
  • The decrease rate of CPI is higher than the increase rate of IC, hence a shorter CPUTIME

2.2/12

IS: DLX Architecture

2.2/13

IS: DLX Architecture
• Clock cycles: assumed to be the same
• Results:
  • IC_MIPS ≈ 2 × IC_VAX
  • CPI_MIPS ≈ CPI_VAX / 6
• The performance of the MIPS M2000 is about 3 times the performance of the VAX 8700
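
The factor of 3 follows directly from the CPU time formula. The short Python sketch below checks it with a normalised, made-up baseline for the VAX; only the two ratios (twice the instructions, one sixth of the CPI) come from the slide, and the function name is an illustrative assumption.

```python
def cpu_time(ic, cpi, clock_rate):
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return ic * cpi / clock_rate

# Normalised, made-up baseline; the clock rate is assumed equal on both machines.
ic_vax, cpi_vax, clock = 1.0, 6.0, 1.0
ic_mips, cpi_mips = 2 * ic_vax, cpi_vax / 6

speedup = cpu_time(ic_vax, cpi_vax, clock) / cpu_time(ic_mips, cpi_mips, clock)
print(speedup)  # 3.0
```

Doubling IC costs a factor of 2, dividing CPI by 6 gains a factor of 6, and 6 / 2 = 3.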

2.2/14

Computer Design
• Quantitative assessments
• Instruction sets
• Pipelining
• Parallelism

2.2/15

Pipelining
• Pipelining = “an implementation technique whereby multiple instructions are overlapped in execution” (P&H)
• An assembly line: different steps (pipe stages) are completing different parts of different instructions in parallel

2.2/16

Pipelining

° Four persons (A, B, C, and D) have to perform a certain job on 4 sets of items. The job consists of 4 phases.

° Phase 1 (washing) takes 30’

° Phase 2 (drying), another 30’

° Phase 3 (packaging), another 30’

° Phase 4 (delivering) also takes 30’

2.2/17

Pipelining

Doing the job sequentially takes 8 hours

[Figure: timeline from 6 PM to 2 AM; jobs A, B, C and D run one after another, each made of four 30’ phases]

2.2/18

Pipelining

The whole job is now finished in just 3.5 hours

[Figure: timeline from 6 PM to 2 AM; jobs A, B, C and D overlap, each made of four 30’ phases]

Key idea: one starts a new phase as soon as possible
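
The 8 hours and 3.5 hours quoted in this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming equal-length phases as in the example; the function names are illustrative.

```python
def sequential_minutes(jobs, phases, phase_len):
    """Every job runs all its phases before the next job starts."""
    return jobs * phases * phase_len

def pipelined_minutes(jobs, phases, phase_len):
    """The first job needs all phases; each further job finishes one phase later."""
    return (phases + jobs - 1) * phase_len

print(sequential_minutes(4, 4, 30) / 60)  # 8.0 hours
print(pipelined_minutes(4, 4, 30) / 60)   # 3.5 hours
```

With many more jobs than phases, the pipelined time approaches one phase length per job, which is where the throughput gain comes from.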

2.2/19

Pipelining

What if they had more jobs to do?

[Figure: the same overlapped timeline for A, B, C and D, starting at 6 PM]

Between 7:30 and 8 PM, each person is busy

2.2/20

Pipelining

Between 7:30 and 9:30 PM, a whole job is completed every 30’

[Figure: the same overlapped timeline for A, B, C and D, starting at 6 PM]

During that period, each worker is permanently at work…

…but a new input must arrive every 30’

2.2/21

Pipelining
• Important issues in this example
  • Each phase has the same complexity
  • Each phase takes the same amount of time!
• In the sequential processing example, the requirement was: a new input must be ready for processing every four phases
• Now, a new input must be available every phase time!
  • The means that brings the input needs to be four times as fast
  • One gets more from the system, though one also asks more of it

2.2/22

Pipelining
• Also in the execution of, e.g., DLX instructions, we distinguish a number of distinct phases; we call them cycles, because each one takes one clock cycle
• In DLX, an instruction is completed in at most five cycles
• A number of special purpose registers are used for this:
  • PC (program counter) = address of the instruction to be executed
  • IR (instruction register) = instruction to be executed = *(PC)
  • NPC (next program counter), etc.

2.2/23

Memory and special purpose registers in DLX

[Figure: memory words at addresses 100, 104, 108, 10C, 110, 114, 118, holding a code fragment that includes BEQ R1, R3, eq3; BEQ R1, R5, eq5; BGT R1, #0, positive. The word at address 100 is 52 71 73 10. Special purpose registers shown: PC, IR, NPC, IMM, ALUOUT, COND, LMD, TMP1, TMP2]

2.2/24

Executing DLX Instructions: Phase 1: Instruction Fetch (IF)

[Figure: same memory and register layout; PC = 00 00 01 00, and the word at that address is loaded into IR, so IR = 52 71 73 10]

2.2/25

Executing DLX Instructions: Phase 1: Instruction Fetch (IF)

[Figure: same layout; PC = 00 00 01 00, IR = 52 71 73 10, and NPC = PC + 4 = 00 00 01 04]

2.2/26

Executing DLX Instructions: Phase 2: Instruction Decode and Register Fetch (ID)

[Figure: same layout; PC = 00 00 01 00, IR = 52 71 73 10, NPC = 00 00 01 04; the contents of R1 and R3, labelled (R1) and (R3), are fetched, and IMM = 00 00 00 10]

2.2/27

Executing DLX Instructions: Phase 3: Execution (EX, branch)

[Figure: same layout; NPC = 00 00 01 04 and IMM = 00 00 00 10 are added, giving 00 00 01 14 in ALUOUT]

2.2/28

Executing DLX Instructions: Phase 3: Execution (EX, branch)

[Figure: same layout; ALUOUT = 00 00 01 14, and (R1) and (R3) are compared, so COND = ((R1) == (R3))]

2.2/29

Executing DLX Instructions: Phase 3: Execution

• An instruction only enters an active phase when it reaches state EX

• At that point, the instruction is said to have issued or to have committed

• The machine state is only changed when an instruction has committed

2.2/30

Executing DLX Instructions: Phase 4: Memory access/branch completion (MEM, branch)

[Figure: same layout; COND = ((R1) == (R3)), ALUOUT = 00 00 01 14, NPC = 00 00 01 04; PC now holds 00 00 01 14, i.e., it points at address 114]

2.2/31

Executing DLX Instructions
• DLX branch instructions have only 4 phases
• The fifth phase is the write-back (WB), in which registers are loaded with an output from the ALU (ALUOUT) or from LMD (see P&H Chapter 3)
• For instance, when the instruction is LW R1, 100(R0), phases 3 to 5 are as follows:
  3. ALUOUT ← TMP1 + IMM /* i.e., R0 + 100 */
  4. LMD ← Mem[ALUOUT]
  5. R1 ← LMD
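
The three register transfers of the LW example can be traced with a small Python sketch. The register names follow the slide; the function name, the dictionary model of registers and memory, and the memory content are assumptions made only for illustration.

```python
def execute_lw(regs, mem, dest, offset, base):
    """Phases 3 to 5 of 'LW dest, offset(base)', one assignment per phase."""
    tmp1, imm = regs[base], offset   # values assumed to be left by phase 2 (ID)
    aluout = tmp1 + imm              # phase 3 (EX): effective address
    lmd = mem[aluout]                # phase 4 (MEM): load memory data
    regs[dest] = lmd                 # phase 5 (WB): write back into the register
    return aluout, lmd

regs = {"R0": 0, "R1": 0}
mem = {100: 42}                      # made-up memory content
print(execute_lw(regs, mem, "R1", 100, "R0"))  # (100, 42)
print(regs["R1"])                               # 42
```

A branch stops after phase 4 because it has nothing to write back, which is why only loads and ALU operations use the fifth phase.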

2.2/32

Pipelined

[Figure: cache/memory, fetch unit, decode unit, execute unit and register file; instructions 1 to 6 advance through the Fetch, Decode, Execute and Writeback stages, one stage per cycle]

         T1  T2  T3  T4  T5  T6
Instr 1  F1  D1  E1  W1
Instr 2      F2  D2  E2  W2
Instr 3          F3  D3  E3  W3
Instr 4              F4  D4  E4
Instr 5                  F5  D5
Instr 6                      F6
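
The staircase shape of this timing table follows from one rule: instruction i enters the pipeline in cycle i and then moves one stage per cycle. The Python sketch below regenerates the table from that rule; the function name and the four-letter stage string are illustrative assumptions.

```python
STAGES = "FDEW"  # Fetch, Decode, Execute, Writeback

def pipeline_table(n_instr, n_cycles):
    """rows[i][t] is the stage label of instruction i+1 in cycle t+1, or ''."""
    rows = []
    for i in range(n_instr):
        row = []
        for t in range(n_cycles):
            s = t - i  # instruction i+1 enters the pipeline in cycle i+1
            row.append(f"{STAGES[s]}{i + 1}" if 0 <= s < len(STAGES) else "")
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_table(6, 6), start=1):
    print(f"Instr {i}", " ".join(f"{cell:>2}" for cell in row))
```

Reading a column of the result shows which instructions are in flight during one cycle; from T4 onwards every one of the four stages is occupied.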

2.2/33

Pipelining
• With respect to a non-pipelined machine, the memory system must deliver n times the bandwidth (n being the number of pipeline stages)
• In pipelined operation, n instructions are concurrently being processed: on average n memory accesses per clock cycle
  • This worsens the memory bottleneck: even apart from technological advances, this architectural modification increases the number of memory accesses per clock cycle

2.2/34

Pipelining
• In DLX, each instruction takes 5 clock cycles to complete…
• …but during each clock cycle, the HW initiates a new instruction and is executing some part of 5 different instructions

2.2/35

Pipelining
• Clearly pipelining increases the complexity of the HW
  • Each stage involves a set of HW resources; we need to guarantee that the same HW resource is scheduled for execution in at most one pipeline stage
  • When the pipeline is in steady state, in each cycle the register file is accessed twice: in ID (for reading), in WB (for writing)
  • Each clock cycle, we need to perform two reads and one write
  • We need to guarantee consistent operation even when we read from and write to, e.g., the same register

2.2/36

Pipelining
• In order to realize the pipeline, values and control information must “move through” the pipeline from one stage to the next
• Special registers, called pipeline registers or pipeline latches, convey that information
• This is because, instead of having, e.g., a single NPC register, we need to have NPC’, NPC’’, NPC’’’… representing the values of NPC during the different stages of different instructions
• For instance, bwIDandEX.NPC ← bwIFandID.NPC

2.2/37

Pipelining

Stage  Actions and pipeline registers
IF     bwIFandID.IR ← *PC
       if (bwEXandMEM.COND == TRUE) bwIFandID.NPC ← bwEXandMEM.NPC
       else bwIFandID.NPC ← PC + 4
ID     bwIDandEX.TMP1 ← R[bwIFandID.IR[1]]
       bwIDandEX.TMP2 ← R[bwIFandID.IR[2]]

Example: 52 71 73 10 = BEQ R1, R3, eq3

2.2/38

Pipelining

Stage  Actions and pipeline registers
IF     bwIFandID.IR ← *PC
       if (bwEXandMEM.COND == TRUE) bwIFandID.NPC ← bwEXandMEM.NPC
       else bwIFandID.NPC ← PC + 4
ID     bwIDandEX.TMP1 ← R[bwIFandID.IR[1]]
       bwIDandEX.TMP2 ← R[bwIFandID.IR[2]]
       (new reg ← old reg:)
       bwIDandEX.NPC ← bwIFandID.NPC
       bwIDandEX.IR ← bwIFandID.IR

2.2/39

Pipelining

Stage  Actions and pipeline registers
IF     bwIFandID.IR ← *PC
       if (bwEXandMEM.COND == TRUE) bwIFandID.NPC ← bwEXandMEM.NPC
       else bwIFandID.NPC ← PC + 4
ID     bwIDandEX.TMP1 ← R[bwIFandID.IR[1]]
       bwIDandEX.TMP2 ← R[bwIFandID.IR[2]]
       bwIDandEX.NPC ← bwIFandID.NPC
       bwIDandEX.IR ← bwIFandID.IR
       bwIDandEX.IMM ← bwIFandID.IR[3]

Example: 52 71 73 10 = BEQ R1, R3, eq3

2.2/40

Pipelining

Stage  Actions and pipeline registers
EX     bwEXandMEM.ALUOUT ← bwIDandEX.NPC + bwIDandEX.IMM
       bwEXandMEM.COND ← bwIDandEX.TMP1 rel bwIDandEX.TMP2

…and so forth (see P&H, p.136)
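
How values travel through the pipeline registers can be sketched in Python by modelling each latch as a dictionary produced by one stage and consumed by the next. This is a simplified illustration under stated assumptions: the instruction is stored already decoded as a tuple rather than as the bytes 52 71 73 10, the branch-taken selection of NPC in IF is left out, and the function names are hypothetical.

```python
def stage_if(pc, mem):
    """IF: produce the bwIFandID latch (sequential case only)."""
    return {"IR": mem[pc], "NPC": pc + 4}

def stage_id(if_id, regs):
    """ID: read the two source registers and copy NPC, IR forward."""
    _, r1, r2, imm = if_id["IR"]   # assumption: IR holds a decoded tuple
    return {"TMP1": regs[r1], "TMP2": regs[r2], "IMM": imm,
            "NPC": if_id["NPC"], "IR": if_id["IR"]}

def stage_ex(id_ex):
    """EX for BEQ: branch target and condition ('rel' is == here)."""
    return {"ALUOUT": id_ex["NPC"] + id_ex["IMM"],
            "COND": id_ex["TMP1"] == id_ex["TMP2"]}

mem = {0x100: ("BEQ", "R1", "R3", 0x10)}
regs = {"R1": 7, "R3": 7}          # made-up register contents
ex_mem = stage_ex(stage_id(stage_if(0x100, mem), regs))
print(hex(ex_mem["ALUOUT"]), ex_mem["COND"])  # 0x114 True
```

Each stage reads only the latch written by its predecessor, so several instructions can be in flight at once without overwriting each other's NPC, IR or operands.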

2.2/41

Pipelining
• More registers are required, so a more complex design is to be carried out
• A more complex algorithm takes more time to complete

2.2/42

Pipelining
• Indeed, implementing an instruction pipeline increases the instruction throughput (average number of instructions completed in one time unit)…
• …though it slightly increases the execution time of each instruction
  • Overhead for controlling the pipeline
  • Overhead for avoiding “hazards” (to be discussed later on)

2.2/43

Pipelining: Quantitative measurements
• Let U be an unpipelined machine
• Clock cycle of U = ccU = 10 ns
• Cycle distribution of U is as follows:
  • ALU instructions (40%) take 4 cycles
  • Branches (20%) take 4 cycles
  • Memory operations (40%) take 5 cycles
• P = pipelined version of U
• Clock cycle of P = ccP = 11 ns (overhead: 1 ns per cycle)
• How fast is P w.r.t. U?

(Assumption: continuous flow is available, no pipeline stalls...)

2.2/44

Pipelining: Quantitative measurements
• Average Instruction Execution Time = T
• TU = ccU × average CPI
     = 10 ns × ( (40% + 20%) × 4 + 40% × 5 )   (ALU and BRANCH take 4 cycles; MEM takes 5 cycles)
     = 10 ns × 4.4 = 44 ns
• TP = ccP × average CPI = ccP × 1 = 11 ns
• Speedup = TU / TP = 44 ns / 11 ns = 4
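
The same calculation can be replayed in Python. The sketch keeps the instruction mix as integer percentages so that the arithmetic stays exact; the variable names are illustrative, and the ideal CPI of 1 for P rests on the no-stall assumption stated for this example.

```python
# Instruction mix of U: class -> (percentage of instructions, cycles taken)
mix = {"ALU": (40, 4), "BRANCH": (20, 4), "MEM": (40, 5)}
cc_u, cc_p = 10, 11  # clock cycle times in ns

weighted_cycles = sum(pct * cycles for pct, cycles in mix.values())  # 440
avg_cpi_u = weighted_cycles / 100    # 4.4 cycles per instruction
t_u = cc_u * weighted_cycles / 100   # 44.0 ns per instruction on U
t_p = cc_p * 1                       # 11 ns: one instruction completes per cycle
speedup = t_u / t_p

print(avg_cpi_u, t_u, speedup)  # 4.4 44.0 4.0
```

The speedup is below the number of pipeline stages because the pipelined clock is 1 ns slower and because the unpipelined machine needs fewer than 5 cycles for most instructions.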

2.2/45

Pipelining: Hazards
• Ideally, pipelines should continuously “crunch” instructions without being interrupted
• This way, the speedup is maximum
• In reality there exist three classes of impediments that prevent the next instruction from entering the pipeline:
  • Structural hazards
  • Data hazards
  • Control hazards
  to be described in what follows

2.2/46

Pipelining: Hazards
• Hazards are a problem because they require stalling the pipeline (see later)
• Later on we will show some techniques for hazard prevention

2.2/47

Pipelining: Structural Hazards
• Structural hazards are resource conflicts
• Not every combination of instructions is allowed
  • because not every functional unit is fully pipelined
  • or because of other resource conflicts
  • A problem of cost-effectiveness
• Consequence: a stall (“bubble”) floats through the pipeline

2.2/48

Pipelining: Structural Hazards

[Figure: pipeline diagram over cycles 1 to 8 for LOAD, Instr2, Instr3 and Instr4; two memory (Mem) accesses are highlighted, one of them in the row of Instr4]

If the machine has just one memory port, this is a structural hazard

2.2/49

Pipelining: Structural Hazards

[Figure: the same pipeline diagram over cycles 1 to 8 for LOAD, Instr2, Instr3 and Instr4, now with a bubble in the row of Instr4]
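
The bubble can be reproduced with a toy scheduler. The Python sketch below assumes a single memory port shared by instruction fetch and by the memory stage of loads, with that stage placed three cycles after the fetch; the function name and this simplified model are assumptions for illustration, not the DLX control logic.

```python
def fetch_cycles(instrs):
    """Cycle in which each instruction is fetched, given one memory port
    shared by IF and by the MEM stage (fourth stage) of every LOAD."""
    busy = set()          # cycles in which a load occupies the memory port
    cycles, cycle = [], 1
    for name in instrs:
        while cycle in busy:
            cycle += 1    # port taken by a load: insert a bubble
        cycles.append(cycle)
        if name == "LOAD":
            busy.add(cycle + 3)   # MEM comes three cycles after IF
        cycle += 1
    return cycles

print(fetch_cycles(["LOAD", "Instr2", "Instr3", "Instr4"]))  # [1, 2, 3, 5]
```

Instr4 would be fetched in cycle 4, exactly when the LOAD reads memory, so it slips to cycle 5; a second memory port would remove the conflict at the price of extra hardware.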

2.2/50

Pipelining: Structural Hazards
• One of the key principles of computer design: make the common case fast, and the rare case correct
• If a particular structural hazard does not occur very frequently, it may not be worth the cost to avoid it

2.2/51

Pipelining: Structural Hazards
• Avoiding a conflict has a cost due to the extra redundancy, but also a cost due to extra control
• Compare for instance fig. 3.1 and fig. 3.4 of P&H
• One must be careful so that this overhead does not trigger a need for a longer clock cycle, i.e., a lower clock rate

Recall: CPUTIME(p) = IC(p) × CPI(p) / clock rate

2.2/52

Pipelining: Data Hazards
• Pipelining overlaps the execution of a set of instructions
• Data hazards are hazards due to data dependencies between these overlapped executions

ADD R1, R2, R3

SUB R4, R5, R1

AND R6, R1, R7

OR R8, R1, R9

XOR R10, R1, R11

ADD requires 5 cycles to complete!

SUB may use the wrong value!

Page 53:

Pipelining: Data Hazards
• Pipelining overlaps the execution of a set of instructions
• Data hazards are hazards due to data dependencies between these overlapped executions

ADD R1, R2, R3

SUB R4, R5, R1

AND R6, R1, R7

OR R8, R1, R9

XOR R10, R1, R11

ADD requires 5 cycles to complete!
SUB, AND, and OR require R1 sooner
XOR is “far” enough

Page 54:

Pipelining: Data Hazards
[Figure: pipeline timing diagram, cycles 1 to 8, for the following instructions]
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
DATA HAZARDS: SUB, AND, OR
NOT A DATA HAZARD: XOR

Page 55:

Pipelining: Minimizing or Avoiding Data Hazards
• Let us consider again ADD R1, R2, R3
• “ADD requires 5 cycles to complete” means “the sum of R2 and R3 will be stored into R1 only at the 5th cycle”
  - Why should we wait for this to happen?
  - Forwarding: using a pipeline register that holds the right value
SUB R4, R1, R5
becomes
SUB R4, EX/MEM.ALUOUT, R5
(EX/MEM.ALUOUT is the ALU output held in the pipeline register between EX and MEM)

Page 56:

Pipelining: Minimizing or Avoiding Data Hazards
• How is forwarding realized?
• By propagating the result of the ALU directly to an input latch of the ALU
• A custom circuit selects the right value to be input to the ALU: the named register or the propagated value
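The selection made by that circuit can be sketched in software. This is a minimal Python model, not the DLX hardware: the function name, the dictionary used as register file, and the register contents are all assumptions made for illustration.

```python
def select_alu_input(src_reg, regfile, ex_mem_dest, ex_mem_aluout):
    """Return the value the ALU should receive for source register src_reg.

    If the instruction that has just left EX is going to write src_reg,
    its result is taken from the pipeline register (forwarded value);
    otherwise the named register is read from the register file.
    """
    if ex_mem_dest is not None and ex_mem_dest == src_reg:
        return ex_mem_aluout      # propagated value
    return regfile[src_reg]       # named register

# ADD R1, R2, R3 has just left EX; SUB R4, R1, R5 is entering EX.
regfile = {"R1": 0, "R2": 7, "R3": 5, "R5": 2}   # R1 still holds a stale value
ex_mem_aluout = regfile["R2"] + regfile["R3"]    # result of the ADD

a = select_alu_input("R1", regfile, "R1", ex_mem_aluout)  # forwarded
b = select_alu_input("R5", regfile, "R1", ex_mem_aluout)  # from register file
print(a - b)
```

The SUB computes with the fresh sum rather than the stale R1, without waiting for the ADD to reach WB.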

Page 57:

Pipelining: Minimizing or Avoiding Data Hazards
• Sometimes forwarding can be avoided by very simple techniques
• For instance, let us assume that, during each cycle, writes into the register file occur in the first half of the cycle, while reads occur in the second half
[Figure: a clock cycle split into a write half (W) followed by a read half (R)]

Page 58:

Pipelining: Minimizing or Avoiding Data Hazards
[Figure: pipeline timing diagram, cycles 1 to 8, for the following instructions]
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
3, 4: forwarding; 5: forwarding avoidance

Page 59:

Pipelining: Classification of Data Hazards
• Let $(I_k)_{1 \le k \le IC(p)}$ be the ordered series of instructions executed during the run of program p
• Let i and j be two integers with $1 \le i < j \le IC(p)$
• So Ii occurs before Ij
• Let us represent the predicate “instruction i writes in memory location v” as $I_i \rightarrow v$
• Let us represent the predicate “instruction i reads from location v” as $I_i \leftarrow v$

Page 60:

Pipelining: Classification of Data Hazards
1. RAW HAZARD (Read-After-Write hazard)
[Timeline along t: Ii writes v, then Ij reads v]
• RAW: data dependency on an operand that needs first to be written by Ii, and then read by Ij
• If, due to pipelining, Ij reads v before Ii writes it, a RAW hazard occurs: Ij erroneously gets a stale value

Page 61:

Pipelining: Classification of Data Hazards
2. WAW HAZARD (Write-After-Write hazard)
[Timeline along t: Ii writes v, then Ij writes v]
• WAW: data dependency on an operand that must be written in a certain order but is written in the wrong one
• If, due to pipelining, Ij writes v before Ii writes it, a WAW hazard occurs: the wrong value gets stored in v

Page 62:

Pipelining: Classification of Data Hazards
• WAW hazards may happen in pipelines where the write-back stage occurs at different positions
LW R1, 0(R2)      IF  ID  EX  MEM1 MEM2 WB
ADD R1, R2, R3        IF  ID  EX   WB
• This cannot happen with instruction sets such as, e.g., DLX, where each instruction takes the same number of cycles
• Less tricky design → less complexity to handle → fewer pitfalls

Page 63:

Pipelining: Classification of Data Hazards
3. WAR HAZARD (Write-After-Read hazard)
[Timeline along t: Ii reads v, then Ij writes v]
• WAR: data dependency on an operand that needs first to be read by Ii, and then written by Ij
• If, due to pipelining, Ij writes v before Ii reads it, a WAR hazard occurs: the wrong value is read from v
• Ii erroneously gets the NEW value of v, the one produced by Ij

Page 64:

Pipelining: Classification of Data Hazards
• This cannot happen with instruction sets such as, e.g., DLX, where all reads are early (ID stage) and all writes are late (WB stage)
• WAR hazards occur when there are instructions that write results early in the instruction pipeline, as well as instructions that read a source late in the pipeline
• For instance, this may happen with the autoincrement addressing mode
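The three definitions can be summarized as a small classifier. The Python sketch below only detects the dependencies between two instructions; whether a dependency turns into an actual hazard depends on the pipeline, as the slides point out. The representation of an instruction as a (writes, reads) pair of sets is an assumption made for this example.

```python
def classify(first, second):
    """Potential data hazards between two instructions, `first` preceding `second`.

    Each instruction is a (writes, reads) pair of sets of locations.
    """
    w1, r1 = first
    w2, r2 = second
    hazards = set()
    if w1 & r2:
        hazards.add("RAW")   # second reads what first writes
    if w1 & w2:
        hazards.add("WAW")   # both write the same location
    if r1 & w2:
        hazards.add("WAR")   # second writes what first reads
    return hazards

add = ({"R1"}, {"R2", "R3"})   # ADD R1, R2, R3
sub = ({"R4"}, {"R1", "R5"})   # SUB R4, R1, R5
lw = ({"R1"}, {"R2"})          # LW R1, 0(R2)

print(classify(add, sub))
print(classify(lw, add))
```

The first pair is the RAW case used throughout the slides; the second is the LW/ADD pair from the WAW slide.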

Page 65:

Pipelining: Hazards
[Figure: pipeline timing diagram, cycles 1 to 8, for the following instructions]
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
• In some cases, forwarding and subcycling can prevent a stall

Page 66:

Pipelining: Hazards
[Figure: pipeline timing diagram, cycles 1 to 8, for the following instructions, annotated “IMPOSSIBLE!”]
LW R1, 0(R2)
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
• In some cases, forwarding and subcycling cannot prevent a stall

Page 67:

Pipelining: Hazards
[Figure: pipeline timing diagram, cycles 1 to 8, for the following instructions, with three bubbles inserted]
LW R1, 0(R2)
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
• A special HW, called the pipeline interlock, detects the hazard and stalls the pipeline until the hazard is cleared

Page 68:

Pipelining: Hazards
• Pipeline interlock penalty: one or more clock cycles
• Consequences: the CPI for the stalled instruction increases by the length of the stall
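To make the consequence concrete, here is a small Python sketch of how per-instruction stalls raise the average CPI. The function name and the four-instruction run are hypothetical.

```python
def average_cpi(ideal_cpi, stall_cycles):
    """Average CPI when instruction k is stalled for stall_cycles[k] cycles."""
    return ideal_cpi + sum(stall_cycles) / len(stall_cycles)

# Hypothetical run of 4 instructions; only the second one stalls, for 1 cycle.
cpi = average_cpi(1.0, [0, 1, 0, 0])
print(cpi)
```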

Page 69:

Pipelining: Pipeline Scheduling
• Classical solution: pipeline scheduling
• The compiler re-arranges the instructions in order to (try to) avoid stalls
• Example: the compiler tries to avoid generating code like
LW x, …
INSTR …, x
that is, a load followed by the immediate use of the load destination register

Page 70:

Pipelining: Pipeline Scheduling
1. Generate DLX code for the expressions
a = b + c
d = e - f
Basic block:
LW R1, b
LW R2, c
ADD R3, R1, R2
SW a, R3
LW R4, e
LW R5, f
SUB R6, R4, R5
SW d, R6

Page 71:

Pipelining: Pipeline Scheduling
2. We make a graph of the dependences among the instructions and we order the instructions so as to minimize the stalls
Before:
LW R1, b
LW R2, c
ADD R3, R1, R2
SW a, R3
LW R4, e
LW R5, f
SUB R6, R4, R5
SW d, R6
After:
LW R1, b
LW R2, c
LW R4, e
ADD R3, R1, R2
LW R5, f
SW a, R3
SUB R6, R4, R5
SW d, R6
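The benefit of the reordering can be checked with a small Python sketch that counts load-use stalls in both sequences. It rests on a simplifying assumption that is not spelled out on the slide: only a LW immediately followed by an instruction reading its destination register stalls, and for exactly one cycle (all other dependences are taken to be covered by forwarding). The helper names are hypothetical.

```python
def parse(line):
    """Split 'OP x, y, z' into (opcode, [operands])."""
    opcode, rest = line.split(None, 1)
    return opcode, [operand.strip() for operand in rest.split(",")]

def load_use_stalls(program):
    """Count one-cycle stalls caused by a load followed by an immediate use."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_args = parse(previous)
        _, cur_args = parse(current)
        # In the slide's syntax the first operand is the destination
        # (or, for SW, the memory address); the others are sources.
        if prev_op == "LW" and prev_args[0] in cur_args[1:]:
            stalls += 1
    return stalls

original = ["LW R1, b", "LW R2, c", "ADD R3, R1, R2", "SW a, R3",
            "LW R4, e", "LW R5, f", "SUB R6, R4, R5", "SW d, R6"]
scheduled = ["LW R1, b", "LW R2, c", "LW R4, e", "ADD R3, R1, R2",
             "LW R5, f", "SW a, R3", "SUB R6, R4, R5", "SW d, R6"]

print(load_use_stalls(original), load_use_stalls(scheduled))
```

Under this model the original order stalls twice (before ADD and before SUB) and the scheduled order does not stall at all.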

Page 72:

Pipelining: Control Hazards
• Control hazards are hazards due to the execution of branches
• Let us call TAKEN BRANCH a branch that sets the PC to its target address
• Let us call UNTAKEN BRANCH a branch that does not force the PC to be set; as far as the PC is concerned, it behaves like a NOP

Page 73:

Pipelining: Control Hazards
• The problem with branches is that their nature is only known at run-time
• Simplest method to deal with branches: as soon as we detect a branch, we stall the pipeline
• What exactly does “as soon as” mean?

Page 74:

DLX Branch 1: IF (1/2)
[Figure: DLX datapath during IF. Memory at addresses 100 to 118 holds the code BEQ R1, R3, eq3; BEQ R1, R5, eq5; BGT R1, #0, positive; … With PC = 00 00 01 00, the word 52 71 73 10 is fetched into IR. Other registers shown: NPC, IMM, ALUOUT, COND, LMD, TMP1, TMP2]

Page 75:

DLX Branch 1: IF (2/2)
[Figure: same DLX datapath. IR = 52 71 73 10; PC = 00 00 01 00 is incremented by 4 and the result, 00 00 01 04, is stored into NPC]
At this point, we’ve just fetched an instruction; but we don’t know yet WHICH ONE!

Page 76:

DLX Branch 2: ID
[Figure: same DLX datapath. IR = 52 71 73 10, NPC = 00 00 01 04; the contents of R1 and R3, (R1) and (R3), are read, and the immediate 00 00 00 10 is available]
At this point, we’ve decoded the instruction and found that it’s indeed a branch

Page 77:

DLX Branch 3: EX (1/2)
[Figure: same DLX datapath. NPC = 00 00 01 04 is added to the immediate 00 00 00 10, giving 00 00 01 14]
Here we get the next PC of the taken branch

Page 78:

DLX Branch 3: EX (2/2)
[Figure: same DLX datapath. The taken-branch address 00 00 01 14 has been computed; the comparison (R1) == (R3) produces the condition]
Only at this point do we know the nature of the branch:
brnch = (cond) ? Taken : Untaken;
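The walk through IF, ID and EX can be replayed as a few register transfers. The Python sketch below is a simplified model, not the DLX datapath: the function name is hypothetical, and the register contents passed as `a` and `b` are assumed values, since the slides do not give them. The PC and immediate are the ones shown in the figures.

```python
def branch_stages(pc, imm, a, b):
    """Replay the stages of a BEQ; a and b stand for (R1) and (R3)."""
    # IF: fetch the instruction and compute the sequential next PC.
    npc = pc + 4
    # ID: the instruction is decoded; operands and immediate become available.
    # EX (1/2): compute the address of the taken branch.
    aluout = npc + imm
    # EX (2/2): evaluate the condition, which reveals the nature of the branch.
    cond = (a == b)
    next_pc = aluout if cond else npc
    return npc, aluout, cond, next_pc

npc, aluout, cond, next_pc = branch_stages(0x100, 0x10, a=5, b=5)
print(hex(npc), hex(aluout), cond, hex(next_pc))
```

With equal operands the branch is taken and the next PC is 0x114; with different operands it would fall through to 0x104.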

Page 79:

Pipelining: Control Hazards
• The problem with branches is that their nature is only known at run-time
• Simplest method to deal with branches: as soon as we detect a branch, we stall the pipeline
1. “As soon as” means after the IF stage, during stage ID → IF: first stall
2. Then we need to reach the EX stage to know the address where to branch to → ID: second stall
3. The nature of a branch is revealed at the end of EX, in MEM → EX: third stall
• At this point, the pipeline restarts

Page 80:

Pipelining: Control Hazards
• With a 30% branch frequency and an ideal CPI of 1, three clock cycles of penalty means that the machine only achieves about HALF the ideal speedup from pipelining
• What can we do to reduce the three-cycle penalty?
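The “about half” figure follows from a one-line calculation, sketched here in Python with the numbers quoted on the slide (the function name is an assumption).

```python
def cpi_with_branch_stalls(ideal_cpi, branch_freq, penalty_cycles):
    """Average CPI when every branch costs penalty_cycles extra cycles."""
    return ideal_cpi + branch_freq * penalty_cycles

cpi = cpi_with_branch_stalls(ideal_cpi=1.0, branch_freq=0.30, penalty_cycles=3)
fraction_of_ideal_speedup = 1.0 / cpi

print(cpi, fraction_of_ideal_speedup)
```

The CPI grows from 1 to about 1.9, so the pipeline delivers roughly 1/1.9, a little more than half, of its ideal speedup.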

Page 81:

Pipelining: Control Hazards
1. Uncover the nature of the branch earlier in the pipeline: in DLX, this means adding a test to the ID stage
2. Compute the taken PC earlier: at the cost of an additional adder, we can anticipate the addition that gives the taken PC
3. (For untaken branches): do not repeat the IF stage
• These strategies can reduce the branch penalty to one clock cycle

Page 82:

Pipelining: Control Hazards
• How to deal with branch penalties
• Four simple compile-time schemes
  - Static, fixed, per-branch predictions
  - Compile-time guesses
• Simplest: freezing or flushing the pipeline
  - Penalty: one clock cycle
• Predict not taken:
  - The HW continues as if the branch was not taken (next IR = *(PC + 4))
  - If the branch is taken, the fetched instruction is invalidated (turned into a NOP)
  - Penalty: no penalty if untaken, one cycle if taken

Page 83:

Pipelining: Control Hazards: Predict not taken

Untaken branch      IF  ID  EX  MEM WB
i + 1                   IF  ID  EX  MEM WB
i + 2                       IF  ID  EX  MEM WB
i + 3                           IF  ID  EX  MEM WB
i + 4                               IF  ID  EX  MEM WB

Taken branch        IF  ID  EX  MEM WB
i + 1                   IF  idle idle idle idle
Branch target               IF  ID  EX  MEM WB
Branch target + 1               IF  ID  EX  MEM WB
Branch target + 2                   IF  ID  EX  MEM WB

Page 84:

Pipelining: Control Hazards
• Predict taken:
  - Hypothesis: the taken branch address is known very early, long before the outcome of the branch is known
  - The HW assumes the branch is taken
  - Penalty: no penalty if taken, one cycle if untaken
  - Due to loops, taken branches are more frequent than untaken branches
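The penalties quoted for the schemes seen so far can be compared with a short Python sketch. The scheme names are labels chosen here, the predict-taken figure holds only under the hypothesis stated on the slide, and the 60% taken fraction is an arbitrary assumption.

```python
def expected_penalty(scheme, taken_fraction):
    """Average penalty per branch, in clock cycles, for a one-cycle penalty."""
    if scheme == "freeze":
        return 1.0                       # every branch pays one cycle
    if scheme == "predict-not-taken":
        return taken_fraction * 1.0      # only taken branches pay
    if scheme == "predict-taken":
        return (1.0 - taken_fraction) * 1.0   # only untaken branches pay
    raise ValueError(f"unknown scheme: {scheme}")

taken_fraction = 0.6   # hypothetical value
for scheme in ("freeze", "predict-not-taken", "predict-taken"):
    print(scheme, expected_penalty(scheme, taken_fraction))
```

Whenever more than half of the branches are taken, predict-taken has the lowest average penalty of the three.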

Page 85:

Pipelining: Control Hazards
• Delayed branch
• Hypothesis: a branch implies a delay that adds up to the time required to execute n instructions
• The branch delay slot is then filled in with instructions that would be executed whatever the outcome of the branch test
• In DLX, n = 1

Page 86:

Pipelining: Control Hazards: Delayed branch

Untaken branch      IF  ID  EX  MEM WB
Branch delay            IF  ID  EX  MEM WB
i + 1                       IF  ID  EX  MEM WB
i + 2                           IF  ID  EX  MEM WB
i + 3                               IF  ID  EX  MEM WB

Taken branch        IF  ID  EX  MEM WB
Branch delay            IF  ID  EX  MEM WB
Branch target               IF  ID  EX  MEM WB
Branch target + 1               IF  ID  EX  MEM WB
Branch target + 2                   IF  ID  EX  MEM WB

Page 87:

Pipelining: Control Hazards: Slot schedule
• Problem: how to schedule the branch-delay slot
• Three ways
• Best choice: an independent instruction from before the branch
Before:
INSTR1
INSTR2
IF TEST THEN
(delay slot)
…
INSTR N
After:
INSTR1
IF TEST THEN
INSTR2 (moved into the delay slot)
…
INSTR N
• Penalty: none

Page 88:

Pipelining: Control Hazards: Slot schedule
• If the best choice is not possible, e.g., due to a dependency, then one may choose between the following two methods:
1. From target: if it is not possible to select an independent instruction from before the branch (a sure one!), then you must guess. If the chance that the branch is taken is felt to be higher, then you fill the delay slot with an instruction from the target of the branch

Page 89:

Pipelining: Control Hazards: Slot schedule
• From target
[Figure: code sequence INSTR1, INSTR2, IF TEST THEN …, followed by the delay slot; after scheduling, the delay slot holds a copy of INSTR1]
• Penalty: none if the branch is taken, 1 clock cycle if it is untaken
• Assumption: no side effect from executing INSTR1 when the branch is mispredicted (no undo required!)

Page 90:

Pipelining: Control Hazards: Slot schedule
2. From fall through: if it is not possible to select an independent instruction from before the branch (a sure one!), and if the chance that the branch is not taken is felt to be higher, then you fill the delay slot with the instruction at PC+4
[Figure: code sequence INSTR1, INSTR2, IF TEST THEN, delay slot, …, INSTR N, shown before and after filling the delay slot from the fall-through path]

Page 91:

Pipelining: Control Hazards: Slot schedule
• Again, the instruction selected to be placed in the delay slot must be side-effect free
• That instruction must be such that no undo is required if the branch goes in the mispredicted direction
       BEQ R2, R3, Skip
       LW R1, #100
       . . .
Skip   LW R1, #200
       . . .
The second load overwrites the first one

Page 92:

Pipelining: Control Hazards
• The LW example is clearly an ideal one. In reality, it is very difficult to select an instruction for the delay slot
• Furthermore, these schemes are compile-time predictions that may be found to be false at run-time

Page 93:

Pipelining: Control Hazards
• Improvements are possible:
  - Canceling branches: the branch instructions include a prediction bit (taken vs. untaken). If the prediction turns out to be wrong, the branch instruction “cancels” the instruction in the delay slot by writing the NOP bit(s)
• This makes it easier to select instructions for the delay slot: the side-effect-free requirement can be relaxed

Page 94:

Pipelining: Control Hazards
• The use of delayed and canceling branches resulted in no penalty 70% of the time on average with 10 programs of the SPECint92 benchmarks (5 int., 5 f.p.)
• Delayed branches have an extra cost: an interrupt may also occur during the execution of the instruction in the branch delay slot (BDSI). If the branch was taken, then both the address of the BDSI and that of the branch target need to be preserved, and restored when the interrupt has been served

Page 95:

Pipelining: Control Hazards
• The longer the pipeline, the more pipeline stages are required (1) to uncover the current branch target address and (2) to tell the nature of the current branch
• In DLX, one clock cycle (very small)
• In R4000, it is 3 clock cycles (1) and 1 clock cycle (2)

Page 96:

Pipelining: Static Branch Prediction & Compiler Support
• The effectiveness of delayed branch depends on whether our guess turns out to be correct
• Static branch prediction: predicting the outcome of a branch at compile time (vs. dynamic prediction: prediction based on runtime program behaviour)
• Static prediction method 1: observing and analysing the program behaviour
• Static prediction method 2: using profile information collected from earlier runs of the program

Page 97:

Pipelining: Static Branch Prediction & Compiler Support
• Static prediction method 1: observing and analysing the program behaviour
• Observations (10 SPECint92 benchmark programs) show that most branches are taken
  - On average, 62% in integer programs, 70% in f.p. programs (total 67%)
  - Of taken branches, backward branches are at least 1.5 times more frequent than forward branches

Page 98:

Pipelining: Static Branch Prediction & Compiler Support
• Simplest sub-method: predict-as-taken (1.1)
• In our benchmark, a minority of these predictions is wrong (34%)
• Note: on average! Worst misprediction is 59%, best is 9% (in the worst case, predict-as-untaken would give better performance!)

Page 99:

Pipelining: Static Branch Prediction & Compiler Support
• Method 1.2: predict-bw-as-taken, predict-fw-as-untaken
• For some programs and compilers, the frequency of taken fw branches is below 50%
• In this case only, M1.2 is better than M1.1
• This is not true for the 10 SPECint92 programs and in most cases
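The two sub-methods can be compared on a branch trace with a few lines of Python. The trace below is invented for illustration and deliberately contains forward branches that are mostly untaken, i.e. the case in which M1.2 is expected to win; it says nothing about the SPEC results quoted above. Function names are hypothetical.

```python
def predict_as_taken(pc, target):
    """Method 1.1: every branch is predicted taken."""
    return True

def predict_bw_taken_fw_untaken(pc, target):
    """Method 1.2: backward branches taken, forward branches untaken."""
    return target < pc

def misprediction_rate(predictor, trace):
    """trace: list of (branch_pc, target_pc, actually_taken)."""
    wrong = sum(predictor(pc, target) != taken for pc, target, taken in trace)
    return wrong / len(trace)

# Hypothetical trace: one loop branch (backward) and two forward branches.
trace = [
    (0x120, 0x100, True),
    (0x120, 0x100, True),
    (0x120, 0x100, False),   # loop exit
    (0x140, 0x180, False),
    (0x160, 0x1A0, False),
]

m11 = misprediction_rate(predict_as_taken, trace)
m12 = misprediction_rate(predict_bw_taken_fw_untaken, trace)
print(m11, m12)
```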

Page 100:

Pipelining: Static Branch Prediction & Compiler Support
• Static prediction method 2: using profile information collected from earlier runs of the program
• You see what happened in the past and consider this as a good model for the future
• Per-branch prediction
• Key observation and principle: “often,” a given branch has a high-probability behaviour
  - A privileged attribute
  - It is most likely a taken or an untaken branch
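A per-branch, profile-based prediction can be sketched as follows in Python. The profile format, a list of (branch address, outcome) pairs recorded during earlier runs, is an assumption made for this example, as are the function name and the tie-breaking rule (ties are predicted taken).

```python
from collections import Counter

def build_predictions(profile):
    """Map each branch address to its most frequent outcome in the profile.

    profile: iterable of (branch_pc, was_taken) pairs from earlier runs.
    Returns {branch_pc: predicted_taken}.
    """
    taken = Counter()
    total = Counter()
    for pc, was_taken in profile:
        total[pc] += 1
        taken[pc] += was_taken    # True counts as 1, False as 0
    return {pc: 2 * taken[pc] >= total[pc] for pc in total}

# Hypothetical profile: branch 0x120 is mostly taken, branch 0x140 never is.
profile = [(0x120, True), (0x120, True), (0x120, False),
           (0x140, False), (0x140, False)]
predictions = build_predictions(profile)
print(predictions)
```

The compiler would then use the fixed prediction of each branch, for instance to decide how to fill its delay slot.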

Page 101:

Pipelining: Static Branch Prediction & Compiler Support

• Average # of instructions between mispredictions: 20 vs 110

• St.dev: 27 vs. 85 (very large)

Pipelining? Nice idea…

Branches? OK! Done!

Latencies?

Dependencies?

Pipelining? Nice idea… though…

Exceptions? Oh God…

Exceptions

• An exception is an event that is triggered at run time due to the interaction with the environment and results in a (temporary or permanent) suspension of the current application so that the event can be handled

• Examples:
   - A key has been pressed (interrupt)
   - The user invokes a service of the OS
   - A breakpoint is encountered
   - A division-by-zero condition is encountered
   - An overflow or underflow condition
   - A NaN float
   - Misalignments
   - Access to protected or non-existent memory areas
   - Power failures…

Pipelining: Exceptions

• What happens to the pipeline when an exception takes place?

• With pipelining, instructions are no longer atomic

• An instruction is further subdivided into stages

• The instruction is only completed at the end of the last stage

• If an interrupt occurs while an instruction is still in the pipeline, the result may be a half-finished instruction

Pipelining: Exceptions

• Interrupt
   - An external event asks for immediate attention (service) by raising an input line (the INT line)
   - The main program is interrupted, whatever it is doing
   - A jump is made to the interrupt service routine (ISR)
   - After processing the ISR, the main program resumes where it was broken off

• A pipeline (or machine) is said to be restartable if it can handle an exception (e.g. an interrupt), save the state, and restart without affecting the execution of the program being interrupted
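
The save, service, and resume sequence can be sketched on a toy machine. The `Machine` class, its single `acc` register, and the ISR are hypothetical. The sketch models only the architectural state, not pipeline stages.

```python
class Machine:
    """Toy machine: the state is a program counter plus a register file."""

    def __init__(self, program):
        self.program = program      # list of callables acting on the registers
        self.pc = 0
        self.regs = {"acc": 0}
        self.log = []

    def step(self):
        self.program[self.pc](self.regs)
        self.pc += 1

    def interrupt(self, isr):
        """Service an interrupt between two instructions."""
        saved_pc, saved_regs = self.pc, dict(self.regs)   # save the state
        isr(self)                                         # jump to the ISR
        self.pc, self.regs = saved_pc, saved_regs         # restore and resume

    def run(self, interrupt_at=None, isr=None):
        while self.pc < len(self.program):
            if self.pc == interrupt_at:
                self.interrupt(isr)
                interrupt_at = None     # the interrupt is serviced only once
            self.step()
        return self.regs["acc"]

def add_one(regs):
    regs["acc"] += 1

def key_pressed_isr(machine):
    machine.regs["acc"] = -999          # the ISR freely clobbers registers
    machine.log.append("key handled")

undisturbed = Machine([add_one] * 5).run()
machine = Machine([add_one] * 5)
interrupted = machine.run(interrupt_at=2, isr=key_pressed_isr)
```

Both runs end with the same accumulator value, which is what restartable means here: the interrupted program cannot tell that the ISR ran.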

Pipelining: Exceptions

• Precise exceptions: a property of a pipelined machine such that
   - instructions just before the exception are completed, and
   - instructions after the exception can be restarted from scratch

• Often precise exceptions imply a huge penalty

• The IBM PowerPC and others adopt two modes:
   - Precise exceptions mode (slow, for debugging)
   - Performance mode (imprecise, fast)
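
The architectural effect of a precise exception can be sketched as below. The program representation (destination register plus a function of the register file), the register names, and the handler step are assumptions. The sketch does not model stage-by-stage timing, only what the register file looks like when the exception is raised.

```python
def run_precise(program, regs, start=0):
    """Execute `program` in order from index `start`.

    A result is written to `regs` only once its instruction has finished.
    On an arithmetic exception the index of the faulting instruction is
    returned: every earlier instruction has completed, and neither the
    faulting one nor any later one has changed `regs`."""
    for index in range(start, len(program)):
        dest, operation = program[index]
        try:
            value = operation(regs)
        except ArithmeticError:
            return index                # restart point
        regs[dest] = value
    return None

# Hypothetical three-instruction program; the second divides by r0.
PROGRAM = [
    ("r1", lambda r: 10),
    ("r2", lambda r: r["r1"] // r["r0"]),
    ("r3", lambda r: r["r1"] + 1),
]

regs = {"r0": 0}
fault_at = run_precise(PROGRAM, regs)
state_at_fault = dict(regs)

regs["r0"] = 2                          # hypothetical handler repairs the cause
restart_result = run_precise(PROGRAM, regs, start=fault_at)
```

After the fault only `r1` has been written, so the program can be restarted from the faulting instruction as if nothing after it had ever been issued.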

Pipelining: Exceptions

• In the DLX integer pipeline no instruction updates the machine state before the end of the MEM stage

• This makes realising precise exceptions very easy

• The instructions later in the pipeline have not committed yet

• This is not true, e.g., for the autodecrement mode instructions of the VAX, which cause the update of registers in the middle of the execution of an instruction

Pipelining: Exceptions

• If such an instruction is aborted due to an exception, the machine state would be left altered

• Machines with these instructions often have the ability to back out any state change before the instruction has committed

• If an exception occurs, the machine uses this feature to reset the state of the machine to its value before the interrupted instruction started
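
A minimal sketch of backing out a mid-instruction state change, loosely modelled on an autodecrement load. `MemoryFault`, the dictionary-based memory, the register names, and the step size of 4 are all hypothetical; real hardware would track only the registers touched rather than snapshot the whole file.

```python
class MemoryFault(Exception):
    """Hypothetical exception: access to a non-existent memory area."""

def load(memory, address):
    if address not in memory:
        raise MemoryFault(address)
    return memory[address]

def autodecrement_load(regs, memory, base, dest):
    """Decrement the base register, then load from the new address.
    The decrement is a state change in the middle of the instruction,
    so it is undone if the load faults."""
    saved = dict(regs)                  # state before the instruction started
    try:
        regs[base] -= 4                 # mid-instruction register update
        regs[dest] = load(memory, regs[base])
    except MemoryFault:
        regs.clear()
        regs.update(saved)              # back out every partial change
        raise

regs = {"r1": 8, "r2": 0}
memory = {}

try:
    autodecrement_load(regs, memory, "r1", "r2")
    faulted = False
except MemoryFault:
    faulted = True
regs_after_fault = dict(regs)

memory[4] = 7                           # hypothetical handler maps the address
autodecrement_load(regs, memory, "r1", "r2")
```

Because the base register is restored before the exception propagates, the retried instruction decrements it exactly once.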

Pipelining: Exceptions

• On VAX and the 360 family, special instructions use the general-purpose registers as working storage

• In such machines, general-purpose registers are always saved on exception and restored after the exception

• The state of partially completed instructions lies in these registers, which makes the exceptions precise