1 eecs 150 - components and design techniques for digital systems lec 22 – designing an...

38
1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~culler http://www-inst.eecs.berkeley.edu/~cs150

Post on 21-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

1

EECS 150 - Components and Design

Techniques for Digital Systems

Lec 22 – Designing an

Instruction Set Interpreter

11/18/2004David Culler

Electrical Engineering and Computer SciencesUniversity of California, Berkeley

http://www.eecs.berkeley.edu/~cullerhttp://www-inst.eecs.berkeley.edu/~cs150

Page 2: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

2

Review: Datapath vs Control

• Datapath: Storage, FU, interconnect sufficient to perform the desired functions

– Inputs are Control Points– Outputs are signals

• Controller: State machine to orchestrate operation on the data path

– Based on desired function and signals

Datapath Controller

Control Points

signals

Page 3: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

3

Resource Utilization Charts

• One way to visualize datapath optimizations is through the use of a resource utilization charts.

• These are used in high-level design to help schedule operations on shared resources.

• Resources are listed on the y-axis. Time (in cycles) on the x-axis.

• Example:

memory fetch A1 fetch A2

bus fetch A1 fetch A2

register-file read B1 read B2

ALU A1+B1 A2+B2

cycle 1 2 3 4 5 6 7

• Our list processor has two shared resources: memory and adder

Page 4: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

4

List Example Resource Scheduling• Unoptimized solution: 1. SUMSUM + Memory[NEXT+1]; 2. NEXTMemory[NEXT];

memory fetch x fetch next fetch x fetch nextadder1 next+1 next+1adder2 sum sum

1 2 1 2• Optimized solution: 1. SUMSUM + Memory[NUMA];

2. NEXTMemory[NEXT], NUMAMemory[NEXT]+1;memory fetch x fetch next fetch x fetch nextadder sum numa sum numa

• How about the other combination: add x registermemory fetch x fetch next fetch x fetch nextadder numa sum numa sum

1. XMemory[NUMA], NUMANEXT+1;2. NEXTMemory[NEXT], SUMSUM+X;

• Does this work? If so, a very short clock period. Each cycle could have independent fetch and add. T = max(Tmem, Tadd) instead of Tmem+ Tadd.

Page 5: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

5

Outline

• Review: high level optimization of the list processor

• General notion of instruction execution cycle and the pieces that perform it

• ISA => implementation

• Example

• Generalize and discuss

Page 6: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

6

Approaching an ISA

• Instruction Set Architecture– Defines set of operations, instruction format, hardware

supported data types, named storage, addressing modes, sequencing

• Meaning of each instruction is described by RTL on architected registers and memory

• Given technology constraints assemble adequate datapath

– Architected storage mapped to actual storage– Function units to do all the required operations– Possible additional storage (eg. MAR, MBR, …)– Interconnect to move information among regs and FUs

• Map each instruction to sequence of RTLs• Collate sequences into symbolic controller STD• Lower symbolic STD to control points• Implement controller

Page 7: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

7

Instruction Sequencing

• Example – an instruction to add the contents of two registers (Rx and Ry) and place result in a third register (Rz)

• Step 1: Fetch the ADD instruction from memory into an instruction register

• Step 2: Decode instruction– Instruction in IR has the code of an ADD instruction

– Register indices used to generate output enables for registers Rx and Ry

– Register index used to generate load signal for register Rz

• Step 3: Execute instruction– Enable Rx and Ry output and direct to ALU

– Setup ALU to perform ADD operation

– Direct result to Rz so that it can be loaded into register

Page 8: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

8

Instruction Types

• Data Manipulation– Add, subtract

– Increment, decrement

– Multiply

– Shift, rotate

– Immediate operands

• Data Staging– Load/store data to/from memory

– Register-to-register move

• Control– Conditional/unconditional branches in program flow

– Subroutine call and return

Page 9: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

9

Elements of the Control Unit (aka Instruction Unit)

• Standard FSM Elements– State register

– Next-state logic

– Output logic (datapath/control signaling)

– Moore or synchronous Mealy machine to avoid loops unbroken by FF

• Plus Additional ”Control" Registers (in DP)– Instruction register (IR)

– Program counter (PC)

• Inputs/Outputs– Outputs control elements of data path

– Inputs from data path used to alter flow of program (test if zero)

Page 10: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

10

Reset

InitializeMachine

Register-to-Register

BranchNot Taken

Branch Taken

Instruction Execution

• Control State Diagram (for each diagram)– Reset

– Fetch instruction

– Decode

– Execute

• Instructions partitioned into three classes– Branch

– Load/store

– Register-to-register

• Different sequencethrough diagram for each instruction type

• Controller manipulates the data path to perform the instruction

Init

FetchInstr.

XEQInstr.

Load/StoreBranch

Incr.PC

Page 11: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

11

Cin

AinBin Sum

Cout

FA

HAAin

Bin

Sum

CinCoutHA

Data Path (Hierarchy)

• Arithmetic circuits constructed in hierarchical and iterative fashion– each bit in datapath is

functionally identical

– 4-bit, 8-bit, 16-bit, 32-bit datapaths

Page 12: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

12

16 16

A B

S ZN

Operation

16

Data Path (ALU)

• ALU Block Diagram– Input: data and operation to perform

– Output: result of operation and status information

Page 13: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

13

16

Z

N

OP

16

ACREG

16

16

Data Path (ALU + Registers + interconnect)

• Accumulator– Special register

– One of the inputs to ALU

– Output of ALU stored back in accumulator

• One-address instructions– Operation and address of one operand

– Other operand and destinationis accumulator register

– AC <– AC op Mem[addr]

– ”Single address instructions”(AC implicit operand)

• Multiple registers– Part of instruction used

to choose register operands

Page 14: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

142 bits wide1 bit wide

Data Path (Bit-slice)

• Bit-slice concept: iterate to build n-bit wide datapaths

CO CIALU

AC

R0

frommemory

rs

rt

rd

CO ALU

AC

R0

frommemory

rs

rt

rd

CIALU

AC

R0

frommemory

rs

rt

rd

Page 15: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

15

Instruction Path

• Program Counter– Keeps track of program execution– Address of next instruction to read from memory– May have auto-increment feature or use ALU

• Instruction Register– Current instruction– Includes ALU operation and address of operand– Also holds target of jump instruction– Immediate operands

• Relationship to Data Path– Contents of IR may also be required as input to ALU

» Literals, address offsets– Contents of PC used in branch target calculation

• Relationship to controller– Causes IR <= mem[[PC]]– IR contains OPCODE, which dictate controller outputs

Page 16: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

16

Data Path (Memory Interface)• Memory

– Separate data and instruction memory (Harvard architecture)

» Two address busses, two data busses

– Single combined memory (Princeton architecture)

» Single address bus, single data bus

• Separate memory– ALU output goes to data memory input

– Register input from data memory output

– Data memory address from instruction register

– Instruction register from instruction memory output

– Instruction memory address from program counter

• Single memory– Address from PC or IR

– Memory output to instruction and data registers

– Memory input from ALU output

Page 17: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

17

16

Z

N

OP

16

ACREG16

16loadpath

storepath

Data Memory(16-bit words)

16

OP

16

PCIR16

16

data

addr

rd wr

MARControlFSM

Block Diagram of Processor

• Register Transfer View of Princeton Architecture– Which register outputs are connected to which register inputs

– Arrows represent data-flow, other are control signals from control FSM

– MAR may be a simple multiplexerrather than separate register

– MBR is split in two(REG and IR)

– Load control for each register

Page 18: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

18

ControlFSM

16 16

Z

N

OP

16

ACREG

16loadpath

storepath

Data Memory(16-bit words)

16 16

OP

16

PCIR

16

data

addr

rd wr

Inst Memory(8-bit words)

data

addr

Block Diagram of Processor

• Register transfer view of Harvard architecture– Which register outputs are connected to which register inputs

– Arrows represent data-flow, other are control signals from control FSM

– Two MARs (PC and IR)

– Two MBRs (REG and IR)

– Load control for each register

Page 19: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

19

A simplified Processor Data-path and Memory

• Princeton architecture

• Register file

• Instruction register

• PC incremented through ALU

• Modeled afterMIPS rt000(used in 61Ctextbook byPatterson &Hennessy)– Really a 32 bit

machine

– We’ll do a 16 bitversion

memory has only 255 wordswith a display on the last one

Page 20: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

20

Processor Control

• Synchronous Mealy machine

• Multiple cycles per instruction

Datapath

Control Points

Signals

(conditions)

Page 21: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

21

Announcements

• Reading: 11.3 and 12.1

• HW 9 due Monday 2:10 pm– Check updated handout

• Digital Design in the News:– NY Times, NPR etc. 11-15

RFID on wholesale pill bottles

– J. Stephen Smith – fluidic self-assembly for low-cost RFID tags

– Another side of Moore’s Law

– Power, power, power…

Page 22: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

22

Example: Processor Instructions• Three principal types (16 bits in each instruction)

type op rs rt rd functR(egister) 3 3 3 3 4I(mmediate) 3 3 3 7J(ump) 3 13

• Some of the instructionsadd 0 rs rt rd 0 rd = rs + rtsub 0 rs rt rd 1 rd = rs - rtand 0 rs rt rd 2 rd = rs & rtor 0 rs rt rd 3 rd = rs | rtslt 0 rs rt rd 4 rd = (rs < rt)lw 1 rs rt offset rt = mem[rs + offset] sw 2 rs rt offset mem[rs + offset] = rtbeq 3 rs rt offset pc = pc + offset, if (rs == rt)addi 4 rs rt offset rt = rs + offsetj 5 target address pc = target addresshalt 7 - stop execution until reset

R

I

J

Page 23: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

23

Tracing an Instruction's Execution

• Instruction: r3 = r1 + r2R 0 rs=r1 rt=r2 rd=r3 funct=0

• 1. Instruction fetch– Move instruction address from PC to memory address bus

– Assert memory read

– Move data from memory data bus into IR

– Configure ALU to add 1 to PC

– Configure PC to store new value from ALUout

• 2. Instruction decode– Op-code bits of IR are input to control FSM

– Rest of IR bits encode the operand addresses (rs and rt)

» These go to register file

Page 24: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

24

Tracing an Instruction's Execution (cont’d)• Instruction: r3 = r1 + r2

R 0 rs=r1 rt=r2 rd=r3 funct=0

• 3. Instruction execute– Set up ALU inputs

– Configure ALU to perform ADD operation

– Configure register file to store ALU result (rd)

Page 25: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

25

Tracing an Instruction's Execution (cont’d)• Step 1

Page 26: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

26

Tracing an Instruction's Execution (cont’d)• Step 2

to controller

Page 27: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

27

Tracing an Instruction's Execution (cont’d)• Step 3

Page 28: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

28

Register-Transfer-Level Description

• Control– Transfer data btwn registers by asserting appropriate control signals

• Register transfer notation: work from register to register– Instruction fetch:

mabus PC; – move PC to memory address bus (PCmaEN, ALUmaEN)memory read; – assert memory read signal (mr, RegBmdEN)IR memory; – load IR from memory data bus (IRld)op add – send PC into A input, 1 into B input, add

(srcA, srcB0, scrB1, op)PC ALUout – load result of incrementing in ALU into PC (PCld,

PCsel)– Instruction decode:

IR to controllervalues of A and B read from register file (rs, rt)

– Instruction execution:op add – send regA into A input, regB into B input, add

(srcA, srcB0, scrB1, op)rd ALUout – store result of add into destination register

(regWrite, wrDataSel, wrRegSel)

Page 29: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

29

Register-Transfer-Level Description (cont’d)

• How many states are needed to accomplish these transfers?– Data dependencies (where do values that are needed come from?)

– Resource conflicts (ALU, busses, etc.)

• In our case, it takes three cycles– One for each step

– All operation within a cycle occur between rising edges of the clock

• How do we set all of the control signals to be output by the state machine?– Depends on the type of machine (Mealy, Moore, synchronous Mealy)

Page 30: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

30

Review of FSM Timing

step 1 step 2 step 3

fetch decode execute

IR mem[PC];PC PC + 1;

rd A + BA rsB rt

to configure the data-path to do this here,when do we set the control signals?

Page 31: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

31

instructionexecution

instructiondecode

LWSW ADD J

reset

FSM Controller for CPU (skeletal Moore FSM)

• First pass at deriving the state diagram (Moore machine)– These will be further refined into sub-states

instructionfetch

Page 32: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

32

FSM Controller for CPU (reset and instruction fetch)• Assume Moore machine

– Outputs associated with states rather than arcs

• Reset state and instruction fetch sequence

• On reset (go to Fetch state)– Start fetching instructions– PC will set itself to zero

mabus PC;memory read;IR memory data bus;PC PC + 1;

reset

instructionfetchFetch

Page 33: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

33

FSM Controller for CPU (decode)

• Operation Decode State– Next state branch based on operation code in instruction

– Read two operands out of register file

» What if the instruction doesn’t have two operands?

instructiondecodeDecode

branch based on value ofInst[15:13] and Inst[3:0]

add

Page 34: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

34

FSM Controller for CPU (Instruction Execution)

• For add instruction– Configure ALU and store result in register

rd A + B

– Other instructions may require multiple cycles

instructionexecutionadd

Page 35: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

35

FSM Controller for CPU (Add Instruction)• Putting it all together

and closing the loop– the famous

instructionfetchdecodeexecutecycle

reset

instructionfetchFetch

instructiondecodeDecode

addinstructionexecutionadd

Page 36: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

36

FSM Controller for CPU

• Now we need to repeat this for all the instructions of our processor– Fetch and decode states stay the same

– Different execution states for each instruction

» Some may require multiple states if available register transfer paths require sequencing of steps

Page 37: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

37

Approach an ISA

• Instruction Set Architecture– Defines set of operations, instruction format, hardware

supported data types, named storage, addressing modes, sequencing

• Meaning of each instruction is described by RTL on architected registers and memory

• Given technology constraints assemble adequate datapath

– Architected storage mapped to actual storage– Function units to do all the required operations– Possible additional storage (eg. MAR, MBR, …)– Interconnect to move information among regs and FUs

• Map each instruction to sequence of RTLs• Collate sequences into symbolic controller STD• Lower symbolic STD to control points• Implement controller

Page 38: 1 EECS 150 - Components and Design Techniques for Digital Systems Lec 22 – Designing an Instruction Set Interpreter 11/18/2004 David Culler Electrical

38

Discussion

• How would enhancing the datapath simplify control

– Instruction and data access

– PC arithmetic separate from ALU

– Register file ports

• What determines the cycle time