designing a multicycle processor adapted from dave patterson (http.cs.berkeley.edu/~patterson)...

46
Designing a Multicycle Pro cessor Adapted from Dave Patterson (http.cs.berkeley.edu/~patters on) lecture slides

Upload: michelle-hodges

Post on 26-Mar-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Designing a Multicycle Processor

Adapted from

Dave Patterson (http.cs.berkeley.edu/~patterson)

lecture slides

Page 2: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Dave Patterson serving as ACM president

Page 3: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Recap: Processor Design is a Process

° Bottom-up• assemble components in target technology to establish critical

timing

° Top-down• specify component behavior from high-level requirements

° Iterative refinement• establish partial solution, expand and improve

datapath control

processorInstruction SetArchitecture

=>

Reg. File Mux ALU Reg Mem Decoder Sequencer

Cells Gates

Page 4: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Recap: A Single Cycle Datapath

° We have everything except control signals (underlined)• How to generate the control signals?

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

55 5

Rw Ra Rb

32 32-bitRegisters

Rs

Rt

Rt

RdRegDst

Exten

der

Mu

x

Mux

3216imm16

ALUSrc

ExtOp

Mu

x

MemtoReg

Clk

Data InWrEn

32

Adr

DataMemory

32

MemWrA

LU

InstructionFetch Unit

Clk

Zero

Instruction<31:0>

0

1

0

1

01<

21:25>

<16:20>

<11:15>

<0:15>

Imm16RdRsRt

nPC_sel

Page 5: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

The “Truth Table” for the Main Control

R-type ori lw sw beq jump

RegDst

ALUSrc

MemtoReg

RegWrite

MemWrite

Branch

Jump

ExtOp

ALUop (Symbolic)

1

0

0

1

0

0

0

x

“R-type”

0

1

0

1

0

0

0

0

Or

0

1

1

1

0

0

0

1

Add

x

1

x

0

1

0

0

1

Add

x

0

x

0

0

1

0

x

Subtract

x

x

x

0

0

0

1

x

xxx

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010

ALUop <2> 1 0 0 0 0 x

ALUop <1> 0 1 0 0 0 x

ALUop <0> 0 0 0 0 1 x

MainControl

op

6

ALUControl(Local)

func

3

6

ALUop

ALUctr

3

RegDst

ALUSrc

:

Conditions

Page 6: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

PLA Implementation of the Main Control

op<0>

op<5>. .op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

R-type ori lw sw beq jumpRegWrite

ALUSrc

MemtoReg

MemWrite

Branch

Jump

RegDst

ExtOp

ALUop<2>

ALUop<1>

ALUop<0>

Page 7: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Systematic Generation of Control

° In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”

• in general, the controller is a finite state machine

• microinstruction can also control sequencing

Control Logic / Store(PLA, ROM)

OPcode

Datapath

Inst

ruct

ion

Decode

Con

ditio

nsControlPoints

microinstruction

Page 8: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Behavioral models of Datapath Components

entity adder16 isgeneric (ccOut_delay : TIME := 12 ns; adderOut_delay: TIME := 12 ns);port(A, B: in vlbit_1d(15 downto 0); DOUT: out vlbit_1d(15 downto 0); CIN: in vlbit; COUT: out vlbit);end adder16;

architecture behavior of adder16 isbegin

adder16_process: process(A, B, CIN)

variable tmp : vlbit_1d(18 downto 0);variable adder_out : vlbit_1d(15 downto 0);variable carry: vlbit;

begintmp := addum (addum (A, B), CIN);

adder_out := tmp(15 downto 0);carry :=tmp(16);

COUT <= carry after ccOut_delay; DOUT <= adder_out after adderOut_delay; end process;end behavior;

16

1616

A B

DOUT

CinCout

addum adds two n-bit vectors to produce an n+1 bit vector

Page 9: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Note on VHDL variables

° Can only be used in processes

° Are declared before the begin keyword of a process statement

° Assignment operator is :=

° Can be assigned an initial value by adding the := and some constant expression after the type part of the declaration

° Do not have or cause events and are modified differently than signals

° In the following code, if x is a bit signal, then the process will count the number of rising and falling edges that occur on the signal x

count: process (x) variable changes : integer := -1; begin changes:=changes+1; end process;

Page 10: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Behavioral Specification of Control Logic

° Decode / Control-store address modeled by Case statement

° Each arm drives control signals for that operation• just like the microinstruction

• either can be symbolic

entity maincontrol isport(opcode: in vlbit_1d(5 downto 0); equal_cond: in vlbit;

extop out vlbit;ALUsrc out vlbit;ALUop out vlbit_1d(2 downto 0);MEMwr out vlbit;MemtoReg out vlbit;RegWr out vlbit;RegDst out vlbit;nPC out vlbit;

end maincontrol;

Page 11: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Abstract View of our single cycle processor

° looks like a FSM with PC as state

PC

Nex

t P

C

Reg

iste

rF

etch ALU Reg

. W

rt

Mem

Acc

ess

Dat

aM

emInst

ruct

ion

Fet

ch

Res

ult

Sto

re

AL

Uct

r

Reg

Dst

AL

US

rc

Ext

Op

Mem

WrEq

ual

nPC

_sel

Reg

Wr

Mem

Wr

Mem

Rd

MainControl

ALUcontrol

op

fun

Ext

Page 12: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

What’s wrong with our CPI=1 processor?

° Long Cycle Time

° All instructions take as much time as the slowest

° Real memory is not so nice as our idealized memory• cannot always get the job done in one (short) cycle

PC Inst Memory mux ALU Data Mem mux

PC Reg FileInst Memory mux ALU mux

PC Inst Memory mux ALU Data Mem

PC Inst Memory cmp mux

Reg File

Reg File

Reg File

Arithmetic & Logical

Load

Store

Branch

Critical Path

setup

setup

It seems the timing of Branch has been deliberately shortened with the addition of a highly efficient comparator.

Page 13: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Memory Access Time

° Physics => fast memories are small (large memories are slow)

• question: register file vs. memory

° => Use a hierarchy of memories

Storage Array

selected word line

addressstorage cell

bit line

sense ampsaddressdecoder

CacheProcessor

1 cycle

proc

. bu

s

L2Cache

mem

. bu

s

2-3 cycles 20 - 50 cycles

memory

Page 14: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Reducing Cycle Time

° Cut combinational dependency graph and insert register / latch

° Do same work in two fast cycles, rather than one slow one

storage element

Acyclic CombinationalLogic

storage element

storage element

Acyclic CombinationalLogic (A)

storage element

storage element

Acyclic CombinationalLogic (B)

=>

Page 15: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Basic Limits on Cycle Time° Next address logic

• PC = (branch == 1) ? PC + 4 + offset : PC + 4

° Instruction Fetch• InstructionReg <= Mem[PC]

° Register Access• A <= R[rs]

° ALU operation• S <= A + B

PC

Nex

t P

C

Ope

rand

Fet

ch Exec Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

Inst

ruct

ion

Fet

ch

Res

ult

Sto

re

AL

Uct

r

Reg

Dst

AL

US

rc

Ext

Op

Mem

Wr

nPC

_sel

Reg

Wr

Mem

Wr

Mem

Rd

Control

Page 16: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Partitioning the CPI=1 Datapath° Add registers between smallest steps

PC

Nex

t P

C

Ope

rand

Fet

ch Exec Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

Inst

ruct

ion

Fet

ch

Res

ult

Sto

re

AL

Uct

r

Reg

Dst

AL

US

rc

Ext

Op

Mem

Wr

nPC

_sel

Reg

Wr

Mem

Wr

Mem

Rd

Page 17: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Example Multicycle Datapath

PC

Nex

t P

C

Ope

rand

Fet

ch

Ext

ALU Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

Inst

ruct

ion

Fet

ch

Res

ult

Sto

re

AL

Uct

r

Reg

Dst

AL

US

rc

Ext

Op

nPC

_sel

Reg

Wr

Mem

Wr

Mem

Rd

IRA

B

S

M

RegFile

Mem

ToR

eg

Equ

al

Page 18: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Recall: Step-by-step Processor Design

Step 1: ISA => Logical Register Transfers

Step 2: Components of the Datapath

Step 3: RTL + Components => Datapath

Step 4: Datapath + Logical RTs => Physical RTs

Step 5: Physical RTs => Control

Page 19: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 4: R-type (add, sub, . . .)° Logical Register Transfer

° Physical Register Transfers

inst Logical Register Transfers

ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[PC]

ADDU A<– R[rs]; B <– R[rt]

S <– A + B

R[rd] <– S; PC <– PC + 4

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t P

C

IR

Inst

. M

em

Page 20: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 4:Logical immed° Logical Register Transfer

° Physical Register Transfers

inst Logical Register Transfers

ORi R[rt] <– R[rs] OR zx(Im16); PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

ORi A<– R[rs]; B <– R[rt]

S <– A OR ZeroExt(Im16)

R[rt] <– S; PC <– PC + 4

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t P

C

IR

Inst

. M

em

Page 21: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 4 : Load° Logical Register Transfer

° Physical Register Transfers

inst Logical Register Transfers

LW R[rt] <– MEM(R[rs] + sx(Im16));

PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[PC]

LW A<– R[rs]; B <– R[rt]

S <– A + SignEx(Im16)

M <– MEM[S]

R[rt] <– M; PC <– PC + 4

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t P

C

IR

Inst

. M

em

Page 22: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 4 : Store° Logical Register Transfer

° Physical Register Transfers

inst Logical Register Transfers

SW MEM(R[rs] + sx(Im16) <– R[rt];

PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[PC]

SW A<– R[rs]; B <– R[rt]

S <– A + SignEx(Im16);

MEM[S] <– B PC <– PC + 4

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t P

C

IR

Inst

. M

em

Page 23: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 4 : Branch° Logical Register Transfer

° Physical Register Transfers

inst Logical Register Transfers

BEQ if R[rs] == R[rt]

then PC <= PC + 4 + sx(Im16) << 2

else PC <= PC + 4

inst Physical Register Transfers

IR <– MEM[PC]

A<– R[rs]; B <– R[rt]

BEQ|Eq PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[PC]

A<– R[rs]; B <– R[rt]

BEQ|Eq PC <– PC + 4 + sx(Im16)<<2

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t P

C

IR

Inst

. M

em

Page 24: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Alternative datapath: Multiple Cycle Datapath

° Minimizes Hardware: 1 memory (Princeton organization), 1 adder (ALU)

IdealMemoryWrAdrDin

RAdr

32

32

32Dout

MemWr

32

AL

U

3232

ALUOp

ALUControl

Instru

ction R

eg

32

IRWr

32

Reg File

Ra

Rw

busW

Rb5

5

32busA

32busB

RegWr

Rs

Rt

Mu

x

0

1

Rt

Rd

PCWr

ALUSelA

Mux 01

RegDst

Mu

x

0

1

32

PC

MemtoReg

Extend

ExtOp

Mu

x

0

132

0

1

23

4

16Imm 32

<< 2

ALUSelB

Mu

x1

0

Target32

Zero

ZeroPCWrCond PCSel BrWr

32

IorD

AL

U O

ut

2

Note: On p. 323 of COD ed. 3, RAdr & WrAdr are combined into Address.

Page 25: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Seminal works of Mealy and Moore

° G.H. Mealy, A method for synthesizing sequential circuits, Bell Technical Journal, vol. 34, pp. 1045~1079, 1955.

° E.F. Moore, Gedanken-experiments on sequential machines, Automata Studies (C.E. Shannon & J. McCarthy, eds), pp.29~153, Princeton University Press, 1956.

Note: Gedanken = thought, mind

Aphorisms:

° "Anything on earth you want to do is play. Anything on earth you have to do is work. Play will never kill you, work will. I never worked a day in my life." - Dr. Leila Denmark, 105 (in 2004), USA's oldest practicing physician.

° “The purpose of computing is insight, not numbers” and “Anything a faculty member can learn, a student can easily.” - Richard Hamming, inventor of Hamming code.

Page 26: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Control Model° State specifies control points for Register Transfer

° Transfer occurs upon exiting state (same falling edge)

Control State

Next StateLogic

Output Logic

inputs (conditions)

outputs (control points)

State X

Register TransferControl Points

Depends on Input

Page 27: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 4 => Control Specification for multicycle proc

IR <= MEM[PC]

R-type

A <= R[rs]B <= R[rt]

S <= A fun B

R[rd] <= SPC <= PC + 4

S <= A or ZX

R[rt] <= SPC <= PC + 4

ORi

S <= A + SX

R[rt] <= MPC <= PC + 4

M <= MEM[S]

LW

S <= A + SX

MEM[S] <= BPC <= PC + 4

BEQ & EqualBEQ & ~Equal

PC <= PC + 4 PC <= PC + 4 +SX << 2

SW

“instruction fetch”

“decode / operand fetch”

Exe

cute

Mem

ory

read

/writ

e

Writ

e-ba

ck

Page 28: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Traditional FSM Controller

state

6

4

11nextstate

op

Equal

control signals

(current) state

op condnextstate control signals

Truth Table

datapath state

Page 29: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Step 5: datapath + state diagram => control

° Translate RTs (register transfers) into control points

° Assign states

° Then go build the controller

Page 30: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Mapping RTs to Control Signals

IR <= MEM[PC]

R-type

A <= R[rs]B <= R[rt]

S <= A fun B

R[rd] <= SPC <= PC + 4

S <= A or ZX

R[rt] <= SPC <= PC + 4

ORi

S <= A + SX

R[rt] <= MPC <= PC + 4

M <= MEM[S]

LW

S <= A + SX

MEM[S] <= BPC <= PC + 4

BEQ & EqualBEQ & ~Equal

PC <= PC + 4 PC <= PC + 4 +SX << 2

SW

“instruction fetch”

“decode / operand fetch”

Exe

cute

Mem

ory

read

/writ

e

Writ

e-ba

ck

imem_rd, IRen

Aen, Ben

ALUfun, Sen

RegDst, RegWr,PCen

Page 31: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Assigning States (State Assignment)

IR <= MEM[PC]

R-type

A <= R[rs]B <= R[rt]

S <= A fun B

R[rd] <= SPC <= PC + 4

S <= A or ZX

R[rt] <= SPC <= PC + 4

ORi

S <= A + SX

R[rt] <= MPC <= PC + 4

M <= MEM[S]

LW

S <= A + SX

MEM[S] <= BPC <= PC + 4

BEQ & EqualBEQ & ~Equal

PC <= PC + 4 PC <= PC + 4 +SX << 2

SW

“instruction fetch”

“decode / operand fetch”

Exe

cute

Mem

ory

read

/writ

e

Writ

e-ba

ck

0000

0001

0100

0101

0110 1000 1011 00110010

0111 1010

11001001

Page 32: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Detailed Control Specification

0000 ?????? ? 0001 10001 BEQ 0 0011 1 10001 BEQ 1 0010 1 10001 R-type x 0100 1 10001 orI x 0110 1 10001 LW x 1000 1 10001 SW x 1011 1 10010 xxxxxx x 0000 1 10011 xxxxxx x 0000 1 00100 xxxxxx x 0101 0 1 fun 10101 xxxxxx x 0000 1 0 0 1 10110 xxxxxx x 0111 0 0 or 10111 xxxxxx x 0000 1 0 0 1 01000 xxxxxx x 1001 1 0 add 11001 xxxxxx x 1010 1 0 11010 xxxxxx x 0000 1 0 1 1 01011 xxxxxx x 1100 1 0 add 11100 xxxxxx x 0000 1 0 0 1

State Op field Eq Next IR PC Ops Exec Mem Write-Backen en sel AenBenEx Sr ALU SenR W MenM-R Wr Dst

R:

ORi:

LW:

SW:

-all same in Moore machine

Page 33: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Performance Evaluation

° What is the average CPI?• state diagram gives CPI for each instruction type

• workload gives frequency of each type

Type CPIi for type Frequency CPIi x freqi

Arith/Logic 4 40% 1.6

Load 5 30% 1.5

Store 4 10% 0.4

branch 3 20% 0.6

Average CPI:4.1

Page 34: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Controller Design° The state diagrams that define the controller for an instruction s

et processor are highly structured

° Use this structure to construct a simple “microsequencer”

° Control reduces to programming this very simple device• microprogramming

sequencercontrol

datapath control

micro-PCsequencer

microinstruction

Page 35: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Example: Jump-Counter

op-codeMapping ROM

Counterzeroincload

0000i

i+1

i

Page 36: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Using a Jump Counter

IR <= MEM[PC]

R-type

A <= R[rs]B <= R[rt]

S <= A fun B

R[rd] <= SPC <= PC + 4

S <= A or ZX

R[rt] <= SPC <= PC + 4

ORi

S <= A + SX

R[rt] <= MPC <= PC + 4

M <= MEM[S]

LW

S <= A + SX

MEM[S] <= BPC <= PC + 4

BEQ & EqualBEQ & ~Equal

PC <= PC + 4 PC <= PC + SX || 00

SW

“instruction fetch”

“decode”

Exe

cute

Mem

ory

Writ

e-ba

ck

0000

0001

0100

0101

0110

0111

1000

1001

1010

0011 00101011

1100

inc

load inc

zero zero zero zero

zero zeroinc inc inc inc

inc

Page 37: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Our Microsequencer

op-code

Mapping ROM

Micro-PC

Z I Ldatapath control

taken

注意:此圖之控制記憶體已較 Traditional FSM Controller

投影片所示者縮小甚多,請注意

比較此二圖之差異。

5

Page 38: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Microprogram Control Specification

0000 ? inc 10001 1 inc0001 0 load0010 1 zero 1 10011 0 zero 1 00100 x inc 0 1 fun 10101 x zero 1 0 0 1 10110 x inc 0 0 or 10111 x zero 1 0 0 1 01000 x inc 1 0 add 11001 x inc 1 0 11010 x zero 1 0 1 1 01011 x inc 1 0 add 11100 x zero 1 0 0 1

µPC Taken Next IR PC Ops Exec Mem Write-Backen sel A B Ex Sr ALU S R W M M-R Wr Dst

R:

ORi:

LW:

SW:

BEQ

註:表中各暫存器 (IR 、 A、 B、 S、 M) 之載入致能訊號皆省略en。

Page 39: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Mapping ROM– mapping opcode to PC

R-type 000000 0100

BEQ 000100 0011

Ori 001101 0110

LW 100011 1000

SW 101011 1011

opcode PC

(input) (output)

註: Mapping ROM 輸出須配合 load 訊號存入 PC中

Page 40: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Example: Controlling Memory

PC

InstructionMemory

Inst. Reg

addr

Data (instruction)

IR_en

InstMem_rd

IM_wait

Page 41: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Controller handles non-ideal memory

IR <= MEM[PC]

R-type

A <= R[rs]B <= R[rt]

S <= A fun B

R[rd] <= SPC <= PC + 4

S <= A or ZX

R[rt] <= SPC <= PC + 4

ORi

S <= A + SX

R[rt] <= MPC <= PC + 4

M <= MEM[S]

LW

S <= A + SX

MEM[S] <= B

BEQ & EqualBEQ & ~Equal

PC <= PC + 4 PC <= PC + SX || 00

SW

“instruction fetch”

“decode / operand fetch”

Exe

cute

Mem

ory

Writ

e-ba

ck

~wait wait

~wait wait

PC <= PC + 4

~wait wait

Page 42: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Really Simple Time-State Control

inst

ruct

ion

fet

chde

code

Exe

cute

Mem

ory

IR <= MEM[PC]

R-type

A <= R[rs]B <= R[rt]

S <= A fun B

R[rd] <= SPC <= PC + 4

S <= A or ZX

R[rt] <= SPC <= PC + 4

ORi

S <= A + SX

R[rt] <= MPC <= PC + 4

M <= MEM[S]

LW

S <= A + SX

MEM[S] <= B

BEQ & EqualBEQ & ~Equal

PC <= PC + 4PC <= PC + SX || 00

SW

~wait wait

wait

PC <= PC + 4

wait

writ

e-ba

ck

Page 43: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Time-state Control Path° Local decode and control at each stage

Exe

c

Reg

. F

ile

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t P

C

IR

Inst

. M

em

Valid

IRex

Dcd

Ctr

l

IRm

em

Ex

Ctr

l

IRw

b

Mem

Ctr

l

WB

Ctr

l

Valid 位元指示 IR、 IRex 、 IRmem 、 IRwb 中之控制位元

是否已有效;此種方式常用於 pipeline 控制單元之設計。

Page 44: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Overview of Control

° Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

Initial Representation Finite State Diagram Microprogram

Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs

(即Mapping ROMs)

Logic Representation Logic Equations Truth Tables

Implementation PLA ROM Technique

“hardwired control” “microprogrammed control”

Page 45: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Summary

° Disadvantages of the Single Cycle Processor

• Long cycle time

• Cycle time is too long for all instructions except the Load

° Multiple Cycle Processor:

• Divide the instructions into smaller steps

• Execute each step (instead of the entire instruction) in one cycle

° Partition datapath into equal-size chunks to minimize cycle time• ~10 levels of logic between latches

° Follow same 5-step method for designing “real” processors

Page 46: Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

Summary (cont’d)

° Control is specified by finite state diagram

° Specialize state-diagrams easily captured by microsequencer• simple increment & “branch” fields

• datapath control fields

° Control design reduces to Microprogramming

° Control is more complicated with:• complex instruction sets

• restricted datapaths (see the book)

° Simple Instruction set and powerful datapath => simple control• could try to reduce hardware (see the book)

• rather go for speed => many instructions at once (Pipeline) !