lec 7 instruction pipeline intro

7/31/2019 Lec 7 Instruction Pipeline Intro

1/35

Instruction Pipeline: Introduction

Ajit Pal

Professor

Department of Computer Science and

EngineeringIndian Institute of Technology Kharagpur

INDIA -721302


2/35

Ajit Pal, IIT Kharagpur

Outline

IntroductionIdeal Conditions

Implementing Instruction pipeline

CPI of a multi-cycle implementationPipeline registers

Speedup

Limits of instruction pipelining


3/35


Introduction

The earliest use of parallelism in designingCPUs (since 1985) to enhance processingspeed was Pipelining

Computers execute billions of instructions,so instruction throughput is what matters

It is an implementation technique wherebymultiple instructions are executed in anoverlapped manner

It takes advantage of temporal parallelismthat exists among the actions needed toexecute an instruction


4/35


Ideal Conditions

Assumptions:

Instructions can be divided intoindependent parts, each takingnearly equal time

Instructions are executed insequence one after the other in theorder in which they are written

Successive instructions are

independent of one anotherThere is no resources constraint


5/35


6/35


Simple RISC DatapathIF ID EX MEM WB


7/35


Pipelining of a RISC-like Processor

All ALU operations are performed on registeroperandsSeparate Instruction and data memoryOnly instructions which access memory are

load and store instructionsInstructions can be broken into the followingfive parts:

Instruction Fetch from instruction memory

Instruction decode and operand readInstruction Execution

Load/store operands (data memory)Write back results in the registers


8/35

Instruction Fetch

IF Cycle: IR Mem [PC]; NPC PC + 4


9/35

Instruction Decode

ID Cycle: A Regs [IR 6-10]; B Regs [IR 11-15];

Imm ((IR16)16 # # IR 16-31)


10/35

Execute

EX Cycle: ALUoutput A + Immor ALUoutput A func B or ALUoutput A op Imm

or ALUoutput NPC + Imm; Cond (A op 0)


11/35

Memory access

MEM Cycle: LMD Mem [ALUoutput ]or Mem [ALUoutput ] B

or If (Cond) PC ALUoutput ; Else PC NPC


12/35

Write Back

WB Cycle: Regs [IR 16-20] ALUoutputor Regs [IR 11-15] ALUoutput

or Regs [IR 11-15] LMD


13/35

CPI for the Multiple-Cycle Implementation

Branches and stores: 4 cycles (no WB), allother instructions: 5 cycles

If 20% of the instructions are branches orloads, CPI = 0.8*5 + 0.2*4 = 4.80

ALU operations can be allowed to completein 4 cycles (no MEM)

If 40% of the instructions are ALUoperations, CPI = 0.4*5 + 0.6*4 = 4.40

Pipelining the implementation can help toreduce the CPI


14/35


Pipeline Registers Pipeline registersare essential part of pipelines:

There are 4 groups of pipeline registers in the 5stage pipeline.

Each group saves output from one stage and passesit as input to the next stage:

IF/ID

ID/EX EX/MEM

MEM/WB

This way, each time something is computed...

Effective address, Immediate value, Registercontent, etc.

It is saved safely in the context of the instructionthat needs it.


15/35


16/35


Pipeline Registers

IF/ID ID/EX EX/MEM MEM/WB

inst. 1

inst. 2

inst. 3

IF


17/35


Pipeline Registers


inst. 1

inst. 2

inst. 3

IF ID

IF


18/35


19/35


Pipeline Registers


inst. 1

inst. 2

inst. 3

IF ID EX

IF ID

IF


20/35


Pipeline Registers


inst. 1

inst. 2

inst. 3

IF ID EX

IF ID

IF


21/35


Pipeline Registers


inst. 1

inst. 2

inst. 3

IF ID EX MEM

IF ID EX

IFID


22/35


Pipeline Registers


inst. 1

inst. 2

inst. 3

IF ID EX MEM

IF ID EX

IFID


23/35


Pipeline Registers

Typically, we will not think too much about pipelineregisters and one just assumes that values arepassed magically down stages of the pipeline


inst. 1

inst. 2

inst. 3

IF ID EX MEM WB

IF ID EX MEM

IF ID EX

. . .

. . .


24/35


Pipeline Register Depiction

instructio

ns

IMALU

DM

IM ALU DM

IMALU

DM

IMALU

DM

Reg Reg

Reg

Reg

Reg

Reg

Reg

IF/ID

ID/EX

EX/MEM

MEM/WB

IF/ID

ID/EX

EX/ME

M

MEM/W

B

IF/ID

ID/EX

EX/MEM

M

EM/WB

IF/ID

ID/EX

EX/MEMEvery stage reads input

and writes output to

the pipeline registers


25/35


Why is Pipelining RISC Processors Easy?

Recall the main principles of RISC:

All operands are in registers

The only operations that affect memory are loadsand stores.

Few instruction formats, fixed encoding (i.e., all

instructions are the same size) Although pipelining could conceivably be

implemented for any architecture,

It would be inefficient.

Pentium has characteristics of RISC and CISC ---CISC Instructions are internally converted toRISC-like instructions.


26/35


Operation of Different Stages IF Cycle

IR Mem [PC]

NPC PC + 4

ID Cycle

A Regs [IR 6-10];

B Regs [IR 11-15];

Imm ((IR16)16 # # IR 16-31)

EX Cycle

ALUoutput A + Imm or

ALUoutput A func B or

ALUoutput A op Imm or

ALUoutput NPC + Imm

Cond (A op 0)

MEM Cycle

LMD Mem [ALUoutput ] or

Mem [ALUoutput ] BorIf (Cond) PC ALUoutputElse PC NPC

WB CycleRegs [IR 16-20] ALUoutputor

Regs [IR 11-15] ALUoutputorRegs [IR 11-15] LMD


27/35


Adding Pipeline Registers


28/35


Basic RISC Pipelining

Basic idea:

Each instruction spends 1 clock cyclein each of the 5 execution stages.

During 1 clock cycle, the pipeline canprocess (in different stages) 5 different

instructions.


29/35


Alternative Visualization


30/35

Speedup Assume that a multiple cycle RISC implementation has a 10 ns

clock cycle, loads take 5 clock cycles and account for 40% of

the instructions, and all other instructions take 4 clock cycles.

If pipelining the machine adds 1 ns to the clock cycle, howmuch speedup in instruction execution rate do we get frompipelining?

Average instruction execution Time (nonpipelined)= Clock cycle x Average CPI= 10 ns x (0.6 x 4 + 0.4 x 5)= 44 ns

Average instruction execution Time (pipelined)

= 10 + 1 = 11 ns Speedup = 44 / 11 = 4 The above expression assumes a pipelining CPI of 1 Should we expect this in practice? Any complications?


31/35


Limits of Pipelining

Increasing the number of pipeline stages in a givenlogic block by a factor of k :

Generally allows increasing clock speed &throughput by a factor of almost k.

Usually less than kbecause of overheads such aslatches and balance of delay in each stage.

But, pipelining has a natural limit:

At least 1 layer of logic gates per pipeline stage!

Practical minimum is usually several gates (2-10).

Commercial designs are rapidly nearing this point.


32/35


Optimal number of stages


33/35


Speedup


34/35


SummaryIntroduced the basic concepts of

Instruction pipelining

Observed that the time required toperform an individual instruction

does not decrease, but throughputincreases

Observed that the increase is

performance is not linear with thenumber of stages


35/35


Thanks!

lec 7 instruction pipeline intro

Documents

computer architecture lec 3 – performance + pipeline...

lec.1. intro to construction management & planning

lec 2 grep sed intro

lec-1 intro to research-2.pptx

computer architecture lec 3 – performance + pipeline...

lec 1-chap 1-intro to engineering

hoa lec iii romansique intro

intro mis lec

summer 2015 intro lec

operations management lec 01 - intro

elem lec intro

dc lec-01 (intro to data com)

lec 1 intro

business law lec. 1a (intro) (22.03.2011)

lec 8 super alloys intro

teaser -intro lec

history- intro lec 1

lec-6 pipeline introduction

db system lec-02 & 03 (intro to dbms)

parasitology lec intro