CPE232 Basic MIPS Architecture 1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

Upload: gillian-atkins

Post on 14-Jan-2016


Page 1: CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides

CPE232 Basic MIPS Architecture 1

Computer Organization

Multi-cycle Approach

Dr. Iyad Jafar

Adapted from Dr. Gheith Abandah slides

http://www.abandah.com/gheith/Courses/CPE335_S08/index.html


Multicycle Datapath Approach

Let an instruction take more than one clock cycle to complete.

Break up instructions into steps where
- each step takes one cycle, while trying to balance the amount of work done in each step
- each cycle uses only one major functional unit, unless units are used in parallel

Not every instruction takes the same number of clock cycles.

In addition to faster clock rates, the multicycle approach lets a functional unit be used more than once per instruction, as long as it is used on different clock cycles. As a result:
- only one memory is needed, but with only one memory access per cycle
- only one ALU/adder is needed, but with only one ALU operation per cycle


Multicycle Datapath Approach, con’t

At the end of a cycle
- Store values needed in a later cycle by the current instruction in internal registers (A, B, IR, MDR, and ALUOut). These registers are invisible to the programmer.
- All of these registers except IR hold data only between a pair of adjacent clock cycles, so they do not need a write control signal.

IR: Instruction Register
MDR: Memory Data Register
A, B: register file read data registers
ALUOut: ALU output register

[Figure: multicycle datapath showing PC, Memory, IR, MDR, Register File, A, B, ALU, and ALUOut]

Data used by subsequent instructions are stored in programmer visible registers (i.e., register file, PC, or memory)


Multicycle Datapath Approach, con’t

As in the single-cycle design, shared functional units need multiplexers at their inputs. There is only one adder (the ALU), and it is used to update the PC, perform ALU operations, compare operands for beq, compute memory addresses, and compute branch addresses.


Multicycle Datapath Approach- Control Signals


The Multicycle Datapath with Control Signals

[Figure: the multicycle datapath with control signals. Components: PC, Memory, IR, MDR, Register File, A, B, ALU, ALUOut, Sign Extend, two Shift Left 2 units, ALU control, and Control. Control signals: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, plus the ALU zero output. Labeled fields: Instr[31-26], Instr[25-0], Instr[15-0], Instr[5-0], PC[31-28].]


Multicycle Machine: 1-bit Control Signals

For each signal: effect when deasserted, then effect when asserted.

RegDst
- Deasserted: the destination register number comes from the rt field
- Asserted: the destination register number comes from the rd field

RegWrite
- Deasserted: none
- Asserted: write is enabled to the selected destination register

ALUSrcA
- Deasserted: the first ALU operand is the PC
- Asserted: the first ALU operand is register A

MemRead
- Deasserted: none
- Asserted: the content of the memory address is placed on Memory data out

MemWrite
- Deasserted: none
- Asserted: the memory location specified by the address is replaced by the value on the Write data input

MemtoReg
- Deasserted: the value fed to the register file is from ALUOut
- Asserted: the value fed to the register file is from memory

IorD
- Deasserted: the PC is used as the address to the memory unit
- Asserted: ALUOut is used to supply the address to the memory unit

IRWrite
- Deasserted: none
- Asserted: the output of memory is written into IR

PCWrite
- Deasserted: none
- Asserted: the PC is written; the source is controlled by PCSource

PCWriteCond
- Deasserted: none
- Asserted: the PC is written if the Zero output from the ALU is also active


Multicycle Machine: 2-bit Control Signals

For each signal: value, then effect.

ALUOp
- 00: the ALU performs an add operation
- 01: the ALU performs a subtract operation
- 10: the funct field of the instruction determines the ALU operation

ALUSrcB
- 00: the second input to the ALU comes from register B
- 01: the second input to the ALU is 4 (to increment the PC)
- 10: the second input to the ALU is the sign-extended offset (the lower 16 bits of IR)
- 11: the second input to the ALU is the sign-extended lower 16 bits of IR, shifted left by two bits

PCSource
- 00: the output of the ALU (PC + 4) is sent to the PC for writing
- 01: the contents of ALUOut (the branch address) are sent to the PC for writing
- 10: the jump address is sent to the PC for writing
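The 2-bit ALUSrcB and PCSource encodings act as multiplexer selects. The following is a minimal Python sketch of that selection, assuming 32-bit wrap-around arithmetic; the helper names (`sign_extend16`, `alu_src_b`, `next_pc`) are illustrative and do not come from the slides.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a 32-bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value

def alu_src_b(select, b, ir):
    """Second ALU operand chosen by the 2-bit ALUSrcB signal."""
    imm = sign_extend16(ir)
    choices = {
        0b00: b,                     # register B
        0b01: 4,                     # constant 4, used to increment the PC
        0b10: imm,                   # sign-extended offset
        0b11: (imm << 2) & MASK32,   # sign-extended offset shifted left by 2
    }
    return choices[select]

def next_pc(select, alu_result, alu_out, pc, ir):
    """Value sent to the PC, chosen by the 2-bit PCSource signal."""
    jump_address = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    choices = {
        0b00: alu_result,    # ALU output (PC + 4)
        0b01: alu_out,       # ALUOut register (branch address)
        0b10: jump_address,  # PC[31:28] joined with IR[25:0] and two zero bits
    }
    return choices[select]
```

A dictionary lookup is used here only to mirror the table layout; the value 11 for PCSource is left undefined, as in the table.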


Breaking Instruction Execution into Clock Cycles

1. IFetch: Instruction fetch and PC update (same for all instructions)

Operations
1.1 Instruction fetch: IR <= Memory[PC]
1.2 Update PC: PC <= PC + 4

Control signal values
- IorD = 0, MemRead = 1, IRWrite = 1
- ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1
- PCSource = 00

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

IFetch Dec Exec Mem WB
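The fetch step can be sketched in Python as follows. This is a minimal sketch, assuming a dictionary keyed by byte address stands in for memory and that both register transfers take effect on the same clock edge; the `state` dictionary, the `ifetch` function, and the example address are illustrative, not part of the slides.

```python
MASK32 = 0xFFFFFFFF

def ifetch(state, memory):
    """Step 1: IR <= Memory[PC]; PC <= PC + 4.

    Both new values are computed from the old PC and then written
    together, mimicking registers that update on the same clock edge.
    """
    pc = state["PC"]
    new_ir = memory[pc]           # IorD = 0, MemRead = 1, IRWrite = 1
    new_pc = (pc + 4) & MASK32    # ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00
    state["IR"] = new_ir
    state["PC"] = new_pc          # PCSource = 00, PCWrite = 1
    return state

# Example: 0x8D090004 encodes lw $t1, 4($t0); the address is arbitrary.
memory = {0x00400000: 0x8D090004}
state = ifetch({"PC": 0x00400000, "IR": 0}, memory)
```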


Breaking Instruction Execution into Clock Cycles

2. Decode: Instruction decode and register fetch (same for all instructions)

The instruction is not yet known, so perform only operations that are harmless for every instruction.

Operations
2.1 Read the two source registers rs and rt and place them in registers A and B, respectively.
A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
2.2 Compute the branch address
ALUOut <= PC + (sign-extend(IR[15:0]) << 2)

Control signal values
- ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
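The decode step can be sketched in Python as follows. This is a minimal sketch, assuming a 32-entry list models the register file and that PC already holds PC + 4 from the fetch step; the function and variable names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a 32-bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value

def decode(state, regs):
    """Step 2: read rs and rt into A and B, and precompute the branch address."""
    ir = state["IR"]
    rs = (ir >> 21) & 0x1F                  # IR[25:21]
    rt = (ir >> 16) & 0x1F                  # IR[20:16]
    state["A"] = regs[rs]
    state["B"] = regs[rt]
    offset = (sign_extend16(ir) << 2) & MASK32
    # ALUSrcA = 0 (PC), ALUSrcB = 11 (shifted offset), ALUOp = 00 (add)
    state["ALUOut"] = (state["PC"] + offset) & MASK32
    return state

# Example: 0x1109FFFD encodes beq $t0, $t1 with a word offset of -3.
regs = [0] * 32
regs[8], regs[9] = 5, 5
state = decode({"PC": 0x00400010, "IR": 0x1109FFFD}, regs)
```

The branch address is computed even when the instruction turns out not to be a branch; the result is simply overwritten in the next step.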


Breaking Instruction Execution into Clock Cycles

3. Execution, memory address computation, or branch completion

The operation in this cycle depends on the instruction type.

Operations
* if memory reference, compute the address
ALUOut <= A + sign-extend(IR[15:0])
ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

* if arithmetic-logic instruction, perform the operation
ALUOut <= A op B
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10


Breaking Instruction Execution into Clock Cycles

3. Execution, memory address computation, or branch completion (continued)

The operation depends on the instruction type.

Operations
* if branch instruction
if (A == B) PC <= ALUOut
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01

* if jump instruction
PC <= {PC[31:28], IR[25:0], 2'b00}
PCSource = 10, PCWrite = 1
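Step 3 as a whole can be sketched in Python as follows. This is a minimal sketch: the `kind` argument stands in for the decoded instruction class, and `add` is used as one representative arithmetic-logic operation. Both are assumptions made for illustration, as are the function names.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a 32-bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value

def execute(state, kind):
    """Step 3 for one instruction class: 'mem', 'add', 'beq', or 'jump'."""
    a, b, ir = state["A"], state["B"], state["IR"]
    if kind == "mem":
        # ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
        state["ALUOut"] = (a + sign_extend16(ir)) & MASK32
    elif kind == "add":
        # ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 (funct selects add here)
        state["ALUOut"] = (a + b) & MASK32
    elif kind == "beq":
        # ALUOp = 01: the ALU subtracts, and Zero is set when A == B
        zero = ((a - b) & MASK32) == 0
        if zero:                              # PCWriteCond = 1
            state["PC"] = state["ALUOut"]     # PCSource = 01
    elif kind == "jump":
        # PCSource = 10, PCWrite = 1
        state["PC"] = (state["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    return state
```

For a taken branch, ALUOut still holds the branch address computed during decode, which is why the comparison and the PC update fit in a single cycle.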


Breaking Instruction Execution into Clock Cycles

4. Memory access or R-type completion

The operation in this cycle depends on the instruction type.

Operations
* if load instruction: read the value from memory into MDR
MDR <= Memory[ALUOut]
MemRead = 1, IorD = 1

* if store instruction: store rt into memory
Memory[ALUOut] <= B
MemWrite = 1, IorD = 1

* if arithmetic-logical instruction: write the ALU result into rd
Reg[IR[15:11]] <= ALUOut
MemtoReg = 0, RegDst = 1, RegWrite = 1


Breaking Instruction Execution into Clock Cycles

5. Memory read completion

Needed for the load instruction only.

Operations
5.1 Store the loaded value in MDR into rt
Reg[IR[20:16]] <= MDR
RegWrite = 1, MemtoReg = 1, RegDst = 0
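Steps 4 and 5 can be sketched together in Python. This is a minimal sketch, assuming a dictionary keyed by byte address models memory and a 32-entry list models the register file; the `kind` argument and the function names are illustrative.

```python
def memory_or_complete(state, regs, memory, kind):
    """Step 4 for one instruction class: 'load', 'store', or 'rtype'."""
    ir = state["IR"]
    if kind == "load":
        state["MDR"] = memory[state["ALUOut"]]   # MemRead = 1, IorD = 1
    elif kind == "store":
        memory[state["ALUOut"]] = state["B"]     # MemWrite = 1, IorD = 1
    elif kind == "rtype":
        rd = (ir >> 11) & 0x1F                   # IR[15:11], RegDst = 1
        regs[rd] = state["ALUOut"]               # MemtoReg = 0, RegWrite = 1
    return state

def load_writeback(state, regs):
    """Step 5 (load only): Reg[IR[20:16]] <= MDR."""
    rt = (state["IR"] >> 16) & 0x1F              # RegDst = 0
    regs[rt] = state["MDR"]                      # MemtoReg = 1, RegWrite = 1
    return regs

# Example: finish lw $t1, 4($t0) after step 3 left the address in ALUOut.
regs = [0] * 32
memory = {0x10010004: 42}
state = {"IR": 0x8D090004, "ALUOut": 0x10010004, "B": 0, "MDR": 0}
state = memory_or_complete(state, regs, memory, "load")
regs = load_writeback(state, regs)
```

The load needs the extra step because the memory output is only captured in MDR at the end of step 4, so the register write has to wait for the following cycle.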


Breaking Instruction Execution into Clock Cycles

In this implementation, not all instructions take 5 cycles

Instruction Class Clock Cycles Required

Load 5

Store 4

Branch 3

Arithmetic-logical 4

Jump 3


Multicycle Performance

Compute the average CPI of the multicycle implementation for a SPECINT2000 program with the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI for each instruction class as given in the previous table.

CPI = Σ (CPI_i x IC_i) / IC
    = 0.25 x 5 + 0.10 x 4 + 0.11 x 3 + 0.02 x 3 + 0.52 x 4
    = 4.12

Compare this to CPI = 1 for the single-cycle implementation. Assume CC_M = (1/5) CC_S, where CC_M and CC_S are the multicycle and single-cycle clock cycle times. Then

Performance_M / Performance_S = (IC x 1 x CC_S) / (IC x 4.12 x (1/5) CC_S)
                              = 1.21

Multicycle is also cost-effective in terms of hardware.
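The calculation above can be checked with a few lines of Python. The dictionary names are illustrative; the instruction mix, the per-class cycle counts, and the 1/5 clock ratio are the ones stated on the slides.

```python
# Instruction mix (fraction of executed instructions) and cycles per class.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Weighted average CPI of the multicycle implementation.
cpi = sum(mix[k] * cycles[k] for k in mix)

# Performance ratio, assuming CC_M = CC_S / 5 and a single-cycle CPI of 1.
# The instruction count and CC_S cancel, leaving 1 / (cpi / 5).
speedup = 1.0 / (cpi * (1 / 5))

print(round(cpi, 2), round(speedup, 2))   # prints 4.12 1.21
```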


Multicycle Control Unit

Multicycle datapath control signals are not determined solely by the bits in the instruction; e.g., the opcode bits tell what operation the ALU should be doing, but not which instruction cycle is to be done next.

Since the instruction is broken into multiple cycles, we need to know what we did in the previous cycle(s) in order to determine the current action.

Must use a finite state machine (FSM) for control:
- a set of states (current state stored in the State Register)
- a next-state function (determined by the current state and the input)
- an output function (determined by the current state and the input)

[Figure: FSM control unit. Combinational control logic takes the instruction opcode and the State Reg output as inputs and produces the datapath control points and the Next State.]


The States of the Control Unit

10 states are required in the FSM control

The sequence of states is determined by the five steps of execution and by the instruction being executed
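A next-state function for such a controller can be sketched in Python. The state names and their numbering below are assumptions, since the state diagram itself is not reproduced in this transcript; the opcode values are the standard MIPS encodings for lw, sw, R-type, beq, and j.

```python
# Hypothetical names for the 10 controller states.
(FETCH, DECODE, MEM_ADDR, MEM_READ, MEM_WB,
 MEM_WRITE, R_EXEC, R_DONE, BRANCH, JUMP) = range(10)

LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02

def next_state(state, opcode):
    """Next-state function: depends on the current state and the opcode."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return {LW: MEM_ADDR, SW: MEM_ADDR, RTYPE: R_EXEC,
                BEQ: BRANCH, J: JUMP}[opcode]
    if state == MEM_ADDR:
        return MEM_READ if opcode == LW else MEM_WRITE
    if state == MEM_READ:
        return MEM_WB
    if state == R_EXEC:
        return R_DONE
    # MEM_WB, MEM_WRITE, R_DONE, BRANCH, and JUMP finish the instruction.
    return FETCH

def cycles_for(opcode):
    """Count the states visited before the controller returns to FETCH."""
    state, count = FETCH, 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == FETCH:
            return count
```

Walking the machine this way gives 5 cycles for a load, 4 for a store or an arithmetic-logical instruction, and 3 for a branch or a jump, matching the cycle counts listed earlier.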


The Control Unit

1. Logic gates
- inputs: present state + opcode, #bits = 10
- outputs: control + next state, #bits = 20
- truth table size = 2^10 rows x 20 columns

2. ROM
- Can be used to implement the truth table above (2^10 x 20 bits = 20 Kbit)
- Each location stores the control signal values and the next state
- Each location is addressed by the opcode and the present state value
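The bit counts above can be reproduced in Python. The split of the 10 input bits into a 6-bit opcode and a 4-bit state, and of the 20 output bits into 16 control bits and 4 next-state bits, is inferred from the 10 states and the signal lists given earlier rather than stated on this slide.

```python
NUM_STATES = 10
OPCODE_BITS = 6  # width of the MIPS opcode field

ONE_BIT_SIGNALS = ["RegDst", "RegWrite", "ALUSrcA", "MemRead", "MemWrite",
                   "MemtoReg", "IorD", "IRWrite", "PCWrite", "PCWriteCond"]
TWO_BIT_SIGNALS = ["ALUOp", "ALUSrcB", "PCSource"]

# Bits needed to number the states 0 .. NUM_STATES - 1.
state_bits = (NUM_STATES - 1).bit_length()

input_bits = state_bits + OPCODE_BITS
control_bits = len(ONE_BIT_SIGNALS) + 2 * len(TWO_BIT_SIGNALS)
output_bits = control_bits + state_bits

rom_bits = (2 ** input_bits) * output_bits
print(input_bits, output_bits, rom_bits // 1024)   # prints 10 20 20
```

Most of the 2^10 locations are never reached, since only 10 of the 16 state encodings and a handful of opcodes are used, which is one reason a plain ROM is a wasteful encoding.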


Micro-programmed Control Unit

A ROM implementation is vulnerable to bugs and is expensive, especially for a complex CPU. Its size increases as the number and complexity of instructions (states) increase.

Use microprogramming
- The next state value may not be sequential
- Generate the next state outside the storage element
- Each state is a microinstruction, and the signals are specified symbolically
- Use labels for sequencing


Sequencer


Microprogram

The microassembler converts the microcode into actual signal values

The sequencing field is used along with the opcode to determine the next state


Multicycle Advantages & Disadvantages

Uses the clock cycle efficiently – the clock cycle is timed to accommodate the slowest instruction step

Multicycle implementations allow functional units to be used more than once per instruction as long as they are used on different clock cycles

but

Requires additional internal state registers, more muxes, and more complicated (FSM) control

[Figure: multicycle timing over Cycles 1 to 10. lw uses IFetch, Dec, Exec, Mem, WB; sw uses IFetch, Dec, Exec, Mem; the R-type instruction starts with IFetch in Cycle 10.]


Single Cycle vs. Multiple Cycle Timing

[Figure: timing comparison. Multiple cycle implementation, Cycles 1 to 10: lw uses IFetch, Dec, Exec, Mem, WB; sw uses IFetch, Dec, Exec, Mem; the R-type instruction starts with IFetch in Cycle 10. Single cycle implementation, Cycles 1 and 2: lw fills Cycle 1; sw finishes early in Cycle 2, and the rest of that cycle is wasted.]

The multicycle clock period is longer than 1/5 of the single-cycle clock period because of state register overhead.