lectures 11-12 1 lectures 11-12 the goals of this lecture are…mniemier/teaching/2008_b_fall... ·...
TRANSCRIPT
1
Computer Sci. & Engr.University of Notre Dame
1Lectures 11-12
CSE 30321MIPS Single Cycle Dataflow
2
Computer Sci. & Engr.University of Notre Dame
2Lectures 11-12
The goals of this lecture are…• …to show how ISAs map to real HW and affect the
organization of processing logic…
• …and to set up a discussion of pipelining + otherprinciples of modern processing…
3
Computer Sci. & Engr.University of Notre Dame
3Lectures 11-12
The organization of a computer
Control
Datapath
MemoryInput
Output
Von Neumann Model:• Stored-program machine instructions are represented as numbers• Programs can be stored in memory to be read/written just like numbers.
Processor
CompilerToday
we’ll talkabout these
things
4
Computer Sci. & Engr.University of Notre Dame
4Lectures 11-12
Functions of Each Component• Datapath: performs data manipulation operations
– arithmetic logic unit (ALU)
– floating point unit (FPU)
• Control: directs operation of other components– finite state machines
– micro-programming
• Memory: stores instructions and data– random access v.s. sequential access
– volatile v.s. non-volatile
– RAMs (SRAM, DRAM), ROMs (PROM, EEPROM), disk
– tradeoff between speed and cost/bit
• Input/Output and I/O devices: interface to theenvironment– mouse, keyboard, display, device drivers
5
Computer Sci. & Engr.University of Notre Dame
5Lectures 11-12
The Performance Perspective• Performance of a machine determined by
– Instruction count, clock cycles per instruction, clockcycle time
• Processor design (datapath and control) determines:– Clock cycles per instruction
– Clock cycle time
• We will discuss a simplified MIPS implementation
6
Computer Sci. & Engr.University of Notre Dame
6Lectures 11-12
Review of Design Steps• Instruction Set Architecture => RTL
representation
• RTL representation =>– Datapath components
– Datapath interconnects
• Datapath components => Control signals
• Control signals => Control logic
• Writing RTL: How many states (cycles) should aninstruction take?– CPI
– Datapath component sharing
7
Computer Sci. & Engr.University of Notre Dame
7Lectures 11-12
MIPS Instruction Formats
• All MIPS instructions are 32 bits (4 bytes) long.
• R-type:
• I-Type:
• J-type
op (6) rs (5) rt (5) rd (5) shamt (5)
31 26 25 21 20 16 15 11 10 6 5 0
funct (6)
Op (6) rs (5) rt (5) Address/Immediate value (16)
31 26 25 21 20 16 15 0
Op (6) Target address (26)
31 26 25 0
8
Computer Sci. & Engr.University of Notre Dame
8Lectures 11-12
Let’s talk about this generally on theboard first…
• Let’s just look at our instruction formats and “derive”a simple datapath– (we need to make all of these instruction formats
“work”)
9
Computer Sci. & Engr.University of Notre Dame
9Lectures 11-12
The MIPS Subset• To simplify things a bit we’ll just look at a few instructions:
– memory-reference: lw, sw
– arithmetic-logical: add, sub, and, or, slt
– branching: beq, j
• Organizational overview:– fetch an instruction based on the content of PC
– decode the instruction
– fetch operands• (read one or two registers)
– execute• (effective address calculation/arithmetic-logical
operations/comparison)
– store result• (write to memory / write to register / update PC)
At simplest level, thisis how Von Neumann,RISC model works
10
Computer Sci. & Engr.University of Notre Dame
10Lectures 11-12
What we’ll do…• …look at instruction encodings…
• …look at datapath development…
• …discuss how we generate the control signals to makethe datapath elements work…
11
Computer Sci. & Engr.University of Notre Dame
11Lectures 11-12
Implementation Overview
• Abstract / Simplified View:
• 2 types of signals: data and control• Clocking strategy: All storage elements clocked by same
clock edge.
A L U
PC Address
Instruction
InstructionMemory
Ra
Rb
Rw
Data
RegisterFile
DataMemory
Address
Data
Clk
Clk Clk
Clk
simplest view ofVon Neumann, RISC µP
12
Computer Sci. & Engr.University of Notre Dame
12Lectures 11-12
Review of Design Steps• Instruction set Architecture => RTL representation
• RTL representation =>– Datapath components
– Datapath interconnects
• Datapath components => Control signals
• Control signals => Control logic
• Writing RTL: How many states (cycles) should aninstruction take?– CPI
– Datapath component sharing
i.e. PC ! PC + 4(or $4 ! $3 + $2)
need these to do
need these to do
need these to do
13
Computer Sci. & Engr.University of Notre Dame
13Lectures 11-12
• How many cycles should the above take?• You are the architect so you decide!• Less cycles => more to be done in one cycle
What to be Done for EachInstruction?
14
Computer Sci. & Engr.University of Notre Dame
14Lectures 11-12
Single Cycle Implementation
• Each instruction takes one cycle to complete.
• We wait for everything to settle down, and the
right thing to be done
– ALU might not produce “right answer” right away
– (why?)
– we use write signals along with clock to determine
when to write
• Cycle time determined by length of the longest
pathPC instr. fetch &
execute
15
Computer Sci. & Engr.University of Notre Dame
15Lectures 11-12
Instruction Word32
Next AddrLogic
Instruction Fetch Unit• Fetch the instruction: mem[PC] ,
• Update the program counter:– sequential code: PC <- PC+4
– branch and jump: PC <- “something else”
PC
InstructionMemory
Address
Clk
16
Computer Sci. & Engr.University of Notre Dame
16Lectures 11-12
Let’s say we want to fetch……an R-type instruction (arithmetic)
• Instruction format:
• RTL:– Instruction fetch: mem[PC]
– ALU operation: reg[rd] <- reg[rs] op reg[rt]
– Go to next instruction: Pc <- PC+ 4
• Ra, Rb and Rw are from instruction’s rs, rt, rdfields.
• Actual ALU operation and register write should occurafter decoding the instruction.
op (6) rs (5) rt (5) rd (5) shamt (5)
31 26 25 21 20 16 15 11 10 6 5 0
funct (6)
17
Computer Sci. & Engr.University of Notre Dame
17Lectures 11-12
BusA
32 ALU
Datapath for R-Type Instructions
• Register timing:– Register can always be read.– Register write only happens when RegWr is set to high and at the falling
edge of the clock
32 32-bit
RegistersClk
Ra
Rw
Rb
5rs
rd
rt 5
5
RegWr
BusB
32
BusW32
ALUctr
18
Computer Sci. & Engr.University of Notre Dame
18Lectures 11-12
I-Type Arithmetic/Logic Instructions• Instruction format:
• RTL for arithmetic operations: e.g., ADDI– Instruction fetch: mem[PC]
– Add operation: reg[rt] <- reg[rs] + SignExt(imm16)
– Go to next instruction: Pc <- PC+ 4
• Also, immediate instructions
Op (6) rs (5) rt (5) Address/Immediate value (16)
31 26 25 21 20 16 15 0
19
Computer Sci. & Engr.University of Notre Dame
19Lectures 11-12
BusA
32 ALU
Datapath for I-Type A/LInstructions
3232-bit
RegistersClk
Ra
Rw
Rb
5rs
rd
rt 5
5
RegWr
BusB
32
BusW32
MUX
MUX
32
Extender
16
imm16
rt
ALUSrc
RegDst
ALUctr
BusW
32
In MIPS, destinationregisters are in different
places in opcode " thereforewe need a mux
must “zero out” 1st 16 bits…
note that we reuse ALU…
20
Computer Sci. & Engr.University of Notre Dame
20Lectures 11-12
I-Type Load/Store Instructions• Instruction format:
• RTL for load/store operations: e.g., LW– Instruction fetch: mem[PC]
– Compute memory address: Addr <- reg[rs] +SignExt(imm16)
– Load data into register: reg[rt] <- mem[Addr]
– Go to next instruction: Pc <- PC+ 4
• How about store?
Op (6) rs (5) rt (5) Address/Immediate value (16)
31 26 25 21 20 16 15 0
same thing, just skip 3rd step(mem[addr] ! reg[rs])
21
Computer Sci. & Engr.University of Notre Dame
21Lectures 11-12
Datapath for Load/StoreInstructions
BusA
32 ALU3232-bit
RegistersClk
Ra
RwRb
5rs
rd
rt 5
5
RegWr
BusB
32
MUX
MUX
32
Extender
16
imm16
rt
ALUSrcRegDst
ALUctr
Data Memory
Clk
DataIn32
MemWr
WrEn Addr
need a control signal
32 bits of data
address input
22
Computer Sci. & Engr.University of Notre Dame
22Lectures 11-12
I-Type Branch Instructions• Instruction format:
• RTL for branch operations: e.g., BEQ– Instruction fetch: mem[PC]
– Compute conditon: Cond <- reg[rs] - reg[rt]
– Calculate the next instruction’s address:
if (Cond eq 0) then
PC <- PC+ 4 + (SignExd(imm16) x 4)
else ?
Op (6) rs (5) rt (5) Address/Immediate value (16)
31 26 25 21 20 16 15 0
23
Computer Sci. & Engr.University of Notre Dame
23Lectures 11-12
Datapath for Branch Instructions
BusA
32 ALU3232-bit
RegistersClk
Ra
RwRb
5rs
rd
rt 5
5
RegWr
BusB
32
MUX
MUX
32
Extender
16rt
ALUSrcRegDst
ALUctr
Next AddrLogic
PCClk
Zero
imm16
To Instruction Mem
we’ll define this next;(will need PC, zero test
condition from ALU)
24
Computer Sci. & Engr.University of Notre Dame
24Lectures 11-12
Next Address Logic
30ADD
MUX
30
SignExt
16
“0”
PCClk
imm16Branch Zero
CarryIn“1”
30 InstructionMemory
When does the correct new PC become available? Can we do better?
May not want tochange PC if BEQcondition not met
(implicitly says:“this stuff happensanyway so we haveto be sure we don’tchange things we
don’t want to change”)
if branch instructionAND 0, can automaticallygenerate control signal
contains PC + 4(why 30? – subtlety – seeChapter 5 in your text)
25
Computer Sci. & Engr.University of Notre Dame
25Lectures 11-12
J-Type Jump Instructions• Instruction format:
• RTL operations: e.g., BEQ– Instruction fetch: mem[PC]
– Set up PC: PC <- ((PC+ 4)<31:29>CONCAT(target<25:0>) x 4
Op (6) Target address (26)
31 26 25 0
26
Computer Sci. & Engr.University of Notre Dame
26Lectures 11-12
Instruction Fetch Unit
30 ADD
MUX
30
SignExt
16
“0”
PCClk
imm16Branch Zero
CarryIn“1”
30
InstructionMemory
PC<31:28>
Instruction<25:0>
MUX
Jump
Use:New address injump instruction
ORuse 4 MSB
of PC
(why PC<31:28> – subtlety – see
Page 383 in your text)
27
Computer Sci. & Engr.University of Notre Dame
27Lectures 11-12
A Single Cycle Datapath
i
i
PC
Instructionmemory
Readaddress
Instruction
16 32
Add ALUresult
Mux
Registers
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Shift
left 2
4
Mux
3
RegWrite
MemRead
MemWrite
PCSrc
ALUSrc
MemtoReg
ALUresult
ZeroALU
Datamemory
Address
Writedata
Readdata M
ux
Sign
extend
Add
Add Jump.
ALUctr
28
Computer Sci. & Engr.University of Notre Dame
28Lectures 11-12
Let’s trace a few instructions• For example…
– Add $5, $6, $7
– SW 0($9), $10
– Sub $1, $2, $3
– LW $11, 0($12)
29
Computer Sci. & Engr.University of Notre Dame
29Lectures 11-12
Control Logic
(I.e. now, we need to make the HW dowhat we want it to do - add, subtract,
etc. - when we want it to…)
30
Computer Sci. & Engr.University of Notre Dame
30Lectures 11-12
The HW needed, plus control
PC
Instructionmemory
Readaddress
Instruction[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32Instruction [15 0]
0
0Mux
0
1
Control
AddALU
result
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
PCSrc
Datamemory
Writedata
Readdata
Mux
1
Instruction [15 11]
ALUcontrol
Shiftleft 2
ALU
Address
When we talk about control,we talk about these blocks
Single cycle MIPS machine
31
Computer Sci. & Engr.University of Notre Dame
31Lectures 11-12
Implementing Control
• Implementation Steps:
– Identify control inputs and control output (control words)
– Make a control signal table for each cycle
– Derive control logic from the control table
• Do we need a FSM here?
!"#$%"&
'($)
*($(
'($)
Control
input
Control
output
Control inputs:
Opcode (5 bits)
Func (6 bits)
Control outputs:
RegDst
MemtoReg
RegWrite
MemRead
MemWrite
ALUSrc
ALUctr
Branch
Jump
32
Computer Sci. & Engr.University of Notre Dame
32Lectures 11-12
Implementing Control• Implementation Steps:
1. Identify control inputs and control outputs
2. Make a control signal table for each cycle
3. Derive control logic from the control table– This logic can take on many forms: combinational logic,
ROMs, microcode, or combinations…
33
Computer Sci. & Engr.University of Notre Dame
33Lectures 11-12
Single Cycle Control Input/Output• Control Inputs:
– Opcode (6 bits)– How about R-type instructions?
• Control Outputs:– RegDst– ALUSrc– MemtoReg– RegWrite– MemRead– MemWrite– Branch– Jump– ALUctr
these are columns
these are rows
Step 2:Make a control signaltable for each cycle
Step 1: Idenitfyinputs & outputs
34
Computer Sci. & Engr.University of Notre Dame
34Lectures 11-12
Control Signal Table
010000SubAddALUOp
10000Branch
01000Mem. Write
00100Mem. Read
00111Reg. Write
XX100Mem-to-Reg
01100ALUSrc
XX011RegDst
000100101011100011000000000000Op (input)
xxxxxxxxxxxxxxxxxx100010100000Func (input)
BEQSWLWSubAdd
R-type
(outputs)
(inputs)
35
Computer Sci. & Engr.University of Notre Dame
35Lectures 11-12
The HW needed, plus control
PC
Instructionmemory
Readaddress
Instruction[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32Instruction [15 0]
0
0Mux
0
1
Control
AddALU
result
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
PCSrc
Datamemory
Writedata
Readdata
Mux
1
Instruction [15 11]
ALUcontrol
Shiftleft 2
ALU
Address
For MIPS, we have tobuild a Main Control Blockand an ALU Control Block
Single cycle MIPS machine
36
Computer Sci. & Engr.University of Notre Dame
36Lectures 11-12
Main control, ALU control
• Use OP field to generate ALUOp (encoding)– Control signal fed to ALU control block
• Use Func field and ALUOp to generate ALUctr(decoding)– Specifically sets 3 ALU control signals
• B-Invert, Carry-in, operation
MainControl
ALUControl
ALU
ALUOp
Func
OPALUctr
6
6
2
3
(opcode)
Our 2 blocksof control logic
Other cnt.signals
37
Computer Sci. & Engr.University of Notre Dame
37Lectures 11-12
Main control, ALU control
MainControl
ALUControl
ALU
ALUOp
Func
OPALUctr
6
6
2
3
Or in other words…00 = ALU performs add01 = ALU performs sub10 = ALU does what function code says
01000010ALUOp<1:0>
subtractaddadd“R-type”ALU Operation
beqswlwR-type
Outputs of main control,become inputs to ALU control
We have 8 bitsof input to our ALU controlblock; we need 3 bits of
output…
38
Computer Sci. & Engr.University of Notre Dame
38Lectures 11-12
Generating ALUctr• We want these outputs:
ALUctr<2> = B-negate (C-in & B-invert)ALUctr<1> = Select ALU OutputALUctr<0> = Select ALU Output
110
sub
111010001000ALUctr<2:0>
sltaddorandALU Operation
mux
and - 00
or - 01
adder - 10
less - 11
Invert B and C-in must be a 1 for
subtract
36 (and) = 1 0 0 1 0 037 (or) = 1 0 0 1 0 132 (add) = 1 0 0 0 0 034 (sub) = 1 0 0 0 1 042 (slt) = 1 0 1 0 1 0
can ignore these(they’re the same for all…)
func<5:0>
• We have these inputs…ALUOp Funct field ALUctr
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F00 0 X X X X X X0 1 X X X X X X1 X X X 0 0 0 01 X X X 0 0 1 01 X X X 0 1 0 01 X X X 0 1 0 11 X X X 1 0 1 0
Inputs
lw/swbeq
R-type
010 (add)110 (sub)
010 (add)
110 (sub)
000 (and)001 (or)
111 (slt)
Outputs
39
Computer Sci. & Engr.University of Notre Dame
39Lectures 11-12
The Logic
Ex: ALUctr<2>
(SUB/BEQ)
orand
or
andor
(func<5:0>)
F0
F1
F2
F3
(ALUOp)ALUOp0
ALUctr<2>
ALUctr<1>
ALUctr<0>
ALUctr
ALUOp1
Could generategates by hand,
often done w/SW.
ALUOp Funct field ALUctrALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0
0 0 X X X X X X0 1 X X X X X X1 X X X 0 0 0 01 X X X 0 0 1 01 X X X 0 1 0 01 X X X 0 1 0 11 X X X 1 0 1 0
Inputs
lw/swbeq
R-type
010 (add)110 (sub)
010 (add)
110 (sub)
000 (and)001 (or)
111 (slt)
OutputsThis table is used togenerate the actualBoolean logic gates
that produce ALUctr.
1/0X/1
0/X
0/X
1/X
0/X 0/X 0/0
1/0 1/1
1/1
0/0
110/110
40
Computer Sci. & Engr.University of Notre Dame
40Lectures 11-12
Recall…
PC
Instructionmemory
Readaddress
Instruction[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32Instruction [15 0]
0
0Mux
0
1
Control
AddALU
result
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
PCSrc
Datamemory
Writedata
Readdata
Mux
1
Instruction [15 11]
ALUcontrol
Shiftleft 2
ALU
Address
Recall, for MIPS, we have tobuild a Main Control Blockand an ALU Control Block
Single cycle MIPS machine
41
Computer Sci. & Engr.University of Notre Dame
41Lectures 11-12
Well, here’s what we did…
PC
Instructionmemory
Readaddress
Instruction[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32Instruction [15 0]
0
0Mux
0
1
Control
AddALU
result
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
PCSrc
Datamemory
Writedata
Readdata
Mux
1
Instruction [15 11]
ALUcontrol
Shiftleft 2
ALU
Address
Single cycle MIPS machine
orand
or
andor
(func<5:0>)
F0
F1
F2
F3
(ALUOp)ALUOp0
ALUctr<2>
ALUctr<1>
ALUctr<0>
ALUctr
ALUOp1
We came up withthe information togenerate this logic
which would fit herein the datapath.
42
Computer Sci. & Engr.University of Notre Dame
42Lectures 11-12
(and again, remember, realistically logic,ISAs, insturction types, etc. would be
much more complex)
(we’d also have to route all signalstoo…which may affect how we’d like to
organzie processing logic)
43
Computer Sci. & Engr.University of Notre Dame
43Lectures 11-12
Single cycle versus multi-cycle
44
Computer Sci. & Engr.University of Notre Dame
44Lectures 11-12
Single-Cycle Implementation• Single-cycle, fixed-length clock:
– CPI = 1
– Clock cycle = propagation delay of the longest datapathoperations among all instruction types
– Easy to implement
• Single-cycle, variable-length clock:– CPI = 1
– Clock cycle = ! (%(type-i instructions) * propagationdelay of the type “i” instruction datapath operations)
– Better than the previous, but impractical to implement
• Disadvantages:– What if we have floating-point operations?
– How about component usage?
45
Computer Sci. & Engr.University of Notre Dame
45Lectures 11-12
Multiple Cycle Alternative• Break an instruction into smaller steps
• Execute each step in one cycle.
• Execution sequence:– Balance amount of work to be done
– Restrict each cycle to use only one major functionalunit
– At the end of a cycle• Store values for use in later cycles, why?
• Introduce additional “internal” registers
• The advantages:– Cycle time much shorter
– Diff. inst. take different # of cycles to complete
– Functional unit used more than once per instruction