single cycle datapath - georgia institute of...
Single Cycle Datapath
Lecture notes from MKP, H. H. Lee and S. Yalamanchili
(2)
Reading
• Sections 4.1-4.4
• Appendices B.3, B.7, B.8, B.11, D.2
• Note: Appendices A-E in the hardcopy text correspond to chapters 7-11 in the online text.
• Practice Problems: 1, 4, 6, 9
(3)
Introduction
• We will examine two MIPS implementations
  - A simplified version → this module
  - A more realistic pipelined version
• Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j
(4)
Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
  1. Use ALU to calculate
     o Arithmetic result
     o Memory address for load/store
     o Branch target address
  2. Access data memory for load/store
  3. PC ← An address or PC + 4
[Figure: An Encoded Program. 32-bit instruction words stored at consecutive addresses: 8d0b0000, 014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, …]
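To make the encoded program concrete, here is a minimal sketch that splits a 32-bit word into the MIPS field positions used throughout these notes. The function name `decode` and the dictionary layout are assumptions made for illustration, not part of the original slides.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields.

    Hypothetical helper for illustration. Every field is extracted for every
    word; which ones are meaningful depends on the instruction format.
    """
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31:26
        "rs":    (word >> 21) & 0x1F,  # bits 25:21
        "rt":    (word >> 16) & 0x1F,  # bits 20:16
        "rd":    (word >> 11) & 0x1F,  # bits 15:11 (R-format)
        "shamt": (word >> 6) & 0x1F,   # bits 10:6  (R-format)
        "funct": word & 0x3F,          # bits 5:0   (R-format)
        "imm16": word & 0xFFFF,        # bits 15:0  (I-format)
    }


# Three of the words from the encoded program above.
for w in (0x8D0B0000, 0x014B5020, 0x000A082A):
    print(f"{w:08x}", decode(w))
```

The first word decodes to opcode 35 (lw) with rs = 8 and rt = 11; the second is an R-format word (opcode 0) whose funct field is 32 (add).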
(5)
Basic Ingredients
• Include the functional units we need for each instruction – combinational and sequential
[Figure: the functional units.
 - Instruction memory: Instruction address in, Instruction out
 - Program counter (PC)
 - Adder: Add, Sum
 - Sign-extension unit: 16 bits in, 32 bits out
 - Data memory unit: Address, Write data, Read data; controls MemRead, MemWrite
 - Registers: 5-bit Read register 1, Read register 2, Write register; Write data; Read data 1, Read data 2; control RegWrite
 - ALU: 3-bit ALU control; outputs Zero and ALU result]
(6)
Sequential Elements (4.2, B.7, B.11)
• Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
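[Figure: timing diagram of Clk, D, and Q with the falling and rising edges marked; an edge-triggered D flip-flop built from two cascaded D latches (inputs D and C, outputs Q and _Q)]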
(7)
Sequential Elements
• Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
[Figure: register with inputs D, Clk, and Write and output Q; timing diagram of Clk, Write, D, and Q over one cycle time; each bit is held in a D flip-flop built from two cascaded D latches]
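The write-control behavior can be sketched as a small behavioral model. This is an illustration only: the class name `Register` and the method `rising_edge` are hypothetical, and the model ignores setup and hold times.

```python
class Register:
    """Behavioral model of an edge-triggered register with write control.

    Hypothetical sketch: one call to rising_edge() stands for one 0-to-1
    transition of Clk.
    """

    def __init__(self, value=0):
        self.q = value  # stored value, visible on output Q

    def rising_edge(self, d, write):
        # The stored value changes only at a clock edge, and only if Write is 1.
        if write:
            self.q = d
        return self.q


r = Register()
print(r.rising_edge(d=5, write=0))  # Write is 0, so Q keeps its old value
print(r.rising_edge(d=5, write=1))  # Write is 1, so Q takes the value of D
print(r.rising_edge(d=9, write=0))  # value is held until the next enabled edge
```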
(8)
Clocking Methodology
• Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period
• Synchronous vs. Asynchronous operation
Recall: Critical Path Delay
(9)
Register File (B.8)
• Built using D flip-flops (remember ECE 2020!)
[Figure: register file read ports. Read register number 1 and Read register number 2 each control a mux that selects among Register 0 ... Register n, producing Read data 1 and Read data 2. Block symbol: inputs Read register number 1, Read register number 2, Write register, Write data, Write; outputs Read data 1, Read data 2]
(10)
Register File
• Note: we still use the real clock to determine when to write
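[Figure: register file write port. A decoder on the register number selects one of Register 0 ... Register n; the selected line is combined with Write to drive that register's C input, and Register data feeds every register's D input. Block symbol: inputs Read register number 1, Read register number 2, Write register, Write data, Write; outputs Read data 1, Read data 2]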
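A behavioral sketch of the two read ports and the single write port follows. The class and method names are hypothetical, and keeping register 0 at zero is the MIPS convention, added here as an assumption since the slide does not show it.

```python
class RegisterFile:
    """Behavioral model of a register file: two read ports, one write port.

    Hypothetical sketch. Reads are combinational (two muxes); a write happens
    only on a clock edge when Write is asserted.
    """

    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, read_reg1, read_reg2):
        # Each read port is a mux selecting one register's output.
        return self.regs[read_reg1], self.regs[read_reg2]

    def rising_edge(self, write_reg, write_data, write):
        # The decoder picks one register; Write gates the clock to it.
        # Assumption: register 0 always reads as zero (MIPS convention).
        if write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.rising_edge(write_reg=8, write_data=42, write=1)
rf.rising_edge(write_reg=9, write_data=7, write=0)  # ignored: Write is 0
print(rf.read(8, 9))
```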
(11)
Building a Datapath (4.3)
• Datapath
  - Elements that process data and addresses in the CPU
    o Registers, ALUs, muxes, memories, …
• We will build a MIPS datapath incrementally
  - Refining the overview design
(12)
High Level Description
• Single instruction single data stream model of execution
  - Serial execution model
• Commonly known as the von Neumann execution model
  - Stored program model
  - Instructions and data share memory
[Figure: Control directing three stages: Fetch Instructions, Execute Instructions, Memory Operations]
(13)
Instruction Fetch
[Figure: the PC, a 32-bit register clocked by clk, addresses instruction memory and is incremented by 4 for the next instruction. Timing: instruction fetch starts at one clock edge and completes within the cycle time.]
(14)
R-Format Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
Format: op | rs | rt | rd | shamt | funct
(15)
Executing R-Format Instructions
[Figure: register file (5-bit Read register 1, Read register 2, Write register; Write data; Read data 1, Read data 2; control RegWrite) feeding the ALU (3-bit ALU control; outputs Zero and ALU result)]
Format: op | rs | rt | rd | shamt | funct
(16)
Load/Store Instructions
• Read register operands
• Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
Format: op | rs | rt | 16-bit constant
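The address calculation can be sketched in a few lines. The helper names `sign_extend16` and `effective_address` are hypothetical, chosen for this illustration.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    # If bit 15 (the sign bit) is set, the value is negative.
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """ALU add of the base register and the sign-extended offset (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(sign_extend16(0x0004))                   # positive offset stays positive
print(sign_extend16(0xFFFC))                   # 0xFFFC is -4 in 16-bit two's complement
print(hex(effective_address(0x1000, 0xFFFC)))  # base minus 4
```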
(17)
Executing I-Format Instructions
[Figure: register file (RegWrite; Read register 1, Read register 2, Write register), sign-extension unit (16 bits in, 32 bits out), and data memory (Address, Write data, Read data; controls MemRead, MemWrite)]
Format: op | rs | rt | 16-bit constant
(18)
Branch Instructions
• Read register operands
• Compare operands
  - Use ALU, subtract and check Zero output
• Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    o Already calculated by instruction fetch
Format: op | rs | rt | 16-bit constant
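The steps listed above can be sketched as follows. The function names `branch_target` and `next_pc_beq` are hypothetical, and the model covers beq only.

```python
def branch_target(pc, imm16):
    """Target = (PC + 4) + (sign-extended displacement << 2), wrapped to 32 bits."""
    imm16 &= 0xFFFF
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16  # sign-extend
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF           # word displacement


def next_pc_beq(pc, imm16, rs_val, rt_val):
    """beq: the ALU subtracts the operands; Zero selects the branch target."""
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


print(hex(next_pc_beq(0x100, 3, 7, 7)))  # operands equal: branch is taken
print(hex(next_pc_beq(0x100, 3, 7, 8)))  # operands differ: fall through to PC + 4
```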
(19)
Branch Instructions
[Figure: branch datapath. The shift left 2 just re-routes wires; sign extension replicates the sign-bit wire.]
Format: op | rs | rt | 16-bit constant
(20)
Updating the Program Counter
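[Figure: PC update. The PC drives the Read address of instruction memory, which outputs Instruction [31–0]. One adder computes PC + 4. Instruction [15–0] is sign-extended from 16 to 32 bits and shifted, and a second adder computes the branch address. A mux (inputs 0 and 1), labeled Branch, selects the next PC. Instruction [25–21], [20–16], and [15–11] carry the register numbers.]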
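loop: beq  $t0, $0, exit      # leave the loop when the counter $t0 reaches 0
      addi $t0, $t0, -1       # decrement the counter
      lw   $a0, arg1($t1)     # load first argument
      lw   $a1, arg2($t2)     # load second argument
      jal  func               # call func
      add  $t3, $t3, $v0      # accumulate the return value
      addi $t1, $t1, 4        # advance to the next word
      addi $t2, $t2, 4        # advance to the next word
      j    loop               # repeat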
(21)
Composing the Elements
• First-cut data path does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
• Use multiplexers where alternate data sources are used for different instructions
[Figure: An Encoded Program, with the PC holding the address of the current instruction word: 014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, …]
(22)
Full Single Cycle Datapath
Destination register is "instruction-specific":
lw $t0, 0($t4) vs. add $t0, $t1, $t2
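The destination-register choice amounts to a 2-to-1 mux on the register number. A minimal sketch, in which the function name `write_register` is hypothetical and the two instruction words are hand-encoded here for the lw and add examples above:

```python
def write_register(word, reg_dst):
    """Mux on the destination register number.

    reg_dst = 0 selects rt (bits 20:16), as for lw.
    reg_dst = 1 selects rd (bits 15:11), as for R-format instructions.
    """
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    return rd if reg_dst else rt


LW_T0_0_T4 = 0x8D880000     # lw  $t0, 0($t4): op=35, rs=12, rt=8
ADD_T0_T1_T2 = 0x012A4020   # add $t0, $t1, $t2: op=0, rs=9, rt=10, rd=8

# Both write $t0 (register 8), but the number comes from different fields.
print(write_register(LW_T0_0_T4, reg_dst=0))
print(write_register(ADD_T0_T1_T2, reg_dst=1))
```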
(23)
ALU Control (4.4, D.2)
• ALU used for
  - Load/Store: Function = add
  - Branch: Function = subtract
  - R-type: Function depends on funct field

ALU control  Function
000          AND
001          OR
010          add
110          subtract
111          set-on-less-than
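A behavioral sketch of an ALU that honors the control codes in the table above. The function name `alu`, the 32-bit wrapping, and the signed comparison for set-on-less-than are assumptions of this illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (ALU result, Zero) for a 3-bit ALU control code."""
    if control == 0b000:
        result = a & b                           # AND
    elif control == 0b001:
        result = a | b                           # OR
    elif control == 0b010:
        result = (a + b) & MASK32                # add
    elif control == 0b110:
        result = (a - b) & MASK32                # subtract
    elif control == 0b111:
        result = int(to_signed(a) < to_signed(b))  # set-on-less-than
    else:
        raise ValueError(f"unused ALU control code {control:03b}")
    return result, int(result == 0)


print(alu(0b010, 2, 3))   # add
print(alu(0b110, 7, 7))   # subtract equal operands: Zero is 1, as beq relies on
```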
(24)
ALU Control
• Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               010
sw      00     store word        XXXXXX  add               010
beq     01     branch equal      XXXXXX  subtract          110
R-type  10     add               100000  add               010
               subtract          100010  subtract          110
               AND               100100  AND               000
               OR                100101  OR                001
               set-on-less-than  101010  set-on-less-than  111
(XXXXXX = don't care)

• How do we turn this description into gates?
(25)
ALU Controller
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  ALUControl  Case
0       0       X   X   X   X   X   X   010         lw/sw: add
X       1       X   X   X   X   X   X   110         beq: sub
1       X       X   X   0   0   0   0   010         arith: add
1       X       X   X   0   0   1   0   110         arith: sub
1       X       X   X   0   1   0   0   000         arith: and
1       X       X   X   0   1   0   1   001         arith: or
1       X       X   X   1   0   1   0   111         arith: slt

ALUOp is generated from decoding inst[31:26]; the funct field is inst[5:0].
[Figure: ALU control block with inputs ALUOp and funct = inst[5:0], driving the 3-bit ALU control input of the ALU (outputs Zero and ALU result)]
(26)
ALU Control
• Simple combinational logic (truth tables)
[Figure: ALU control block. Inputs: ALUOp (ALUOp1, ALUOp0) and F (5–0), of which F3, F2, F1, and F0 are used. Outputs: Operation (Operation2, Operation1, Operation0)]
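One way to turn the truth table into gates is sketched below. The three Boolean equations are one possible minimization that exploits the don't-care entries; they are offered as an assumption to check against the table, not as the slide's own circuit. The function name `alu_control` is hypothetical.

```python
def alu_control(alu_op, funct):
    """Derive the 3-bit ALU control from 2-bit ALUOp and the 6-bit funct field."""
    op1, op0 = (alu_op >> 1) & 1, alu_op & 1
    f0 = funct & 1
    f1 = (funct >> 1) & 1
    f2 = (funct >> 2) & 1
    f3 = (funct >> 3) & 1

    # One possible gate-level form (assumed minimization, F5 and F4 unused):
    operation2 = op0 | (op1 & f1)
    operation1 = (1 - op1) | (1 - f2)   # NOT ALUOp1 OR NOT F2
    operation0 = op1 & (f3 | f0)
    return (operation2 << 2) | (operation1 << 1) | operation0


# R-type rows of the table: add, subtract, AND, OR, set-on-less-than.
for funct in (0b100000, 0b100010, 0b100100, 0b100101, 0b101010):
    print(f"{funct:06b} -> {alu_control(0b10, funct):03b}")
```

Because lw, sw, and beq ignore the funct field, calling `alu_control(0b00, x)` or `alu_control(0b01, x)` gives the same answer for any `x`.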
(27)
The Main Control Unit
• Control signals derived from instruction
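R-type:      0 (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
Load/Store:  35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
Branch:      4 (31:26) | rs (25:21) | rt (20:16) | address (15:0)

Field annotations:
  - 31:26: opcode
  - rs: always read
  - rt: read, except for load
  - rd (R-type) or rt (load): write for R-type and load
  - address: sign-extend and add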
(28)
Datapath With Control
[Figure annotation: Use rt, not rd]

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
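The control table can be modeled as a lookup keyed by opcode, which is handy for checking answers when asked for the control signals of a given instruction. This is a sketch: the names `CONTROL` and `main_control` are hypothetical, and don't-care entries (X) are represented as `None`.

```python
X = None  # don't care

# Columns follow the table above; ALUOp packs ALUOp1 and ALUOp0 into two bits.
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode (inst[31:26])."""
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode} is not in this subset")
    return CONTROL[opcode]


print(main_control(35))  # lw
```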
(29)
Commodity Processors
[Figure labels: ARM 7; Single Cycle Datapath]
(30)
Control Unit Signals
[Figure: control logic. Inputs: Op5, Op4, Op3, Op2, Op1, Op0. Decoded instruction lines: R-format, lw, sw, beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
To harness the datapath
Inputs: Inst[31:26]

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
Adding a new instruction?
Programmable logic array (PLA) implementation (B.3)
(31)
Controller Implementation
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_ARITH.ALL;
USE IEEE.STD_LOGIC_SIGNED.ALL;

ENTITY control IS
   PORT(
      SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
      SIGNAL RegDst       : OUT STD_LOGIC;
      SIGNAL ALUSrc       : OUT STD_LOGIC;
      SIGNAL MemtoReg     : OUT STD_LOGIC;
      SIGNAL RegWrite     : OUT STD_LOGIC;
      SIGNAL MemRead      : OUT STD_LOGIC;
      SIGNAL MemWrite     : OUT STD_LOGIC;
      SIGNAL Branch       : OUT STD_LOGIC;
      SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
      SIGNAL clock, reset : IN  STD_LOGIC );
END control;
(32)
Controller Implementation (cont.)
ARCHITECTURE behavior OF control IS
   SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
BEGIN
   -- Code to generate control signals using opcode bits
   R_format   <= '1' WHEN Opcode = "000000" ELSE '0';
   Lw         <= '1' WHEN Opcode = "100011" ELSE '0';
   Sw         <= '1' WHEN Opcode = "101011" ELSE '0';
   Beq        <= '1' WHEN Opcode = "000100" ELSE '0';
   RegDst     <= R_format;
   ALUSrc     <= Lw OR Sw;
   MemtoReg   <= Lw;
   RegWrite   <= R_format OR Lw;
   MemRead    <= Lw;
   MemWrite   <= Sw;
   Branch     <= Beq;
   ALUOp( 1 ) <= R_format;
   ALUOp( 0 ) <= Beq;
END behavior;
Each signal assignment is the implementation of one column of the control table.
(33)
R-Type Instruction
(34)
Load Instruction
(35)
Branch-on-Equal Instruction
(36)
Implementing Jumps
• Jump uses word address
• Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
• Need an extra control signal decoded from opcode
Jump format: 2 (31:26) | address (25:0)
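The concatenation can be sketched with shifts and masks. The function name `jump_target` is hypothetical. One assumption to flag: this sketch takes the top 4 bits from PC + 4, as the MKP textbook datapath does; that matches the top 4 bits of the old PC unless PC + 4 crosses a 256 MB boundary.

```python
def jump_target(pc_plus_4, word):
    """Next PC for j: top 4 bits of the PC value, 26-bit address, then 00.

    Assumption: the caller passes PC + 4 (already computed by instruction
    fetch) rather than the old PC.
    """
    address26 = word & 0x03FFFFFF   # bits 25:0 of the instruction
    upper4 = pc_plus_4 & 0xF0000000  # bits 31:28 of the PC value
    return upper4 | (address26 << 2)  # the shift appends the two 0 bits


J_WORD = 0x08100000  # opcode 2 with 26-bit address 0x0100000 (hand-encoded example)
print(hex(jump_target(0x00400014, J_WORD)))
```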
(37)
Datapath With Jumps Added
[Figure: single cycle datapath with jump support, clocked by clk]
(38)
Example: ARM Cortex M3
[Figure: Fitbit Flex circuit board, with the ARM Processor and Bluetooth IC labeled. Image sources: www.ifixit.com, zembedded.com]
(39)
Our Simple Control Structure
• All of the logic is combinational
• We wait for everything to settle down, and the right thing to be done
  - ALU might not produce "right answer" right away
  - We use write signals along with clock to determine when to write
• Cycle time determined by length of the longest path
• We are ignoring some details like setup and hold times
[Figure: State element 1 → Combinational logic → State element 2, within one clock cycle]
(40)
Performance Issues
• Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary period for different instructions
• Violates design principle
  - Making the common case fast
• We will improve performance by pipelining
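To see why the load instruction sets the clock period, here is a small worked sketch. The component delays are made-up illustrative numbers, not values from these notes, and the names `DELAY_PS`, `PATHS`, `path_delay`, and `cycle_time` are hypothetical.

```python
# Illustrative delays in picoseconds. These are assumptions for this sketch;
# substitute the delays given in the problem at hand.
DELAY_PS = {
    "instruction memory": 200,
    "register read": 100,
    "ALU": 200,
    "data memory": 200,
    "register write": 100,
}

# Units each instruction class passes through, in order.
PATHS = {
    "R-format": ["instruction memory", "register read", "ALU", "register write"],
    "lw": ["instruction memory", "register read", "ALU", "data memory",
           "register write"],
    "sw": ["instruction memory", "register read", "ALU", "data memory"],
    "beq": ["instruction memory", "register read", "ALU"],
}


def path_delay(instr):
    """Sum of the unit delays along one instruction's path."""
    return sum(DELAY_PS[unit] for unit in PATHS[instr])


def cycle_time():
    """In a single cycle design, the slowest instruction sets the period."""
    return max(path_delay(instr) for instr in PATHS)


for instr in PATHS:
    print(f"{instr:9s} {path_delay(instr)} ps")
print("cycle time:", cycle_time(), "ps")
```

Under these assumed delays every instruction, including the faster beq, is charged the full lw path delay, which is the inefficiency that pipelining addresses.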
(41)
Summary
• Single cycle datapath
  - All instructions execute in one clock cycle
  - Not all instructions take the same amount of time
  - Software sees a simple interface
  - Can memory operations really take one cycle?
• Improve performance via pipelining, multi-cycle operation, parallelism or customization
• We will address these next
(42)
Study Guide
• Given an instruction, be able to specify the values of all control signals required to execute that instruction
• Add new instructions: modify the datapath and control to effect their execution
  - Modify the dataflow in support, e.g., jal, jr, shift, etc.
  - Modify the VHDL controller
• Given delays of various components, determine the cycle time of the datapath
• Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions
(43)
Study Guide (cont.)
• Given a set of control signal values determine what operation the datapath performs
• Know the bit width of each signal in the datapath
• Add support for procedure calls – jal instruction
(44)
Glossary
• Asynchronous
• Clock
• Controller
• Critical path
• Cycle Time
• Dataflow
• Flip Flop
• Program Counter
• Register File
• Sign Extension
• Synchronous