Lecture 9. MIPS Processor Design – Single-Cycle Processor Design
Prof. Taeweon Suh
Computer Science Education
Korea University
2010 R&E Computer System Education & Research
Korea Univ
Single-Cycle MIPS Processor
• Again, the microarchitecture (CPU implementation) is divided into 2 interacting parts: the datapath and the control
Single-Cycle Processor Design
• Let’s start with a memory access instruction - lw
Example: lw $2, 80($0)
• STEP 1: Instruction fetch
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
[Figure: datapath with the PC register, instruction memory, register file, and data memory; the PC supplies the instruction memory address A and the word read on RD is Instr]
Single-Cycle Processor Design
• STEP 2: Decoding - read the source operand from the register file
Example: lw $2, 80($0)
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
[Figure: datapath with Instr 25:21 (rs) connected to register file address port A1]
Single-Cycle Processor Design
• STEP 2: Decoding - sign-extend the immediate
Example: lw $2, 80($0)
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits

module signext(input  [15:0] a,
               output [31:0] y);
  // Replicate the sign bit a[15] into the upper 16 bits
  assign y = {{16{a[15]}}, a};
endmodule

[Figure: datapath with a Sign Extend block added; Instr 15:0 is extended to the 32-bit SignImm]
Single-Cycle Processor Design
• STEP 3: Execution - compute the memory address
Example: lw $2, 80($0)
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
[Figure: datapath with the ALU added; inputs SrcA and SrcB, outputs ALUResult and Zero, ALUControl2:0 = 010 (add)]
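The address computation in STEP 3 can be sketched as a base register value plus the sign-extended immediate. This is a minimal sketch, not the lecture's code: the module name memaddr and the testbench are assumptions introduced for illustration, and the ALU is reduced to a plain addition.

```verilog
// Hypothetical helper (not from the slides): effective address for lw/sw,
// computed as base register value + sign-extended 16-bit immediate.
module memaddr(input  [31:0] base,   // value read from rs (SrcA)
               input  [15:0] imm,    // Instr 15:0
               output [31:0] addr);  // what the ALU produces for ALUControl = 010
  wire [31:0] signimm;
  assign signimm = {{16{imm[15]}}, imm};  // same operation as signext
  assign addr    = base + signimm;        // ALU add
endmodule

// Small self-checking testbench for any Verilog-2001 simulator.
module memaddr_tb;
  reg  [31:0] base;
  reg  [15:0] imm;
  wire [31:0] addr;

  memaddr dut(base, imm, addr);

  initial begin
    // lw $2, 80($0): register $0 reads as 0, so the address is 80
    base = 32'd0;   imm = 16'd80;   #1;
    if (addr !== 32'd80) $display("FAIL: addr = %0d", addr);
    // Negative offset: 100 + (-4) = 96
    base = 32'd100; imm = 16'hFFFC; #1;
    if (addr !== 32'd96) $display("FAIL: addr = %0d", addr);
    $display("memaddr checks done");
    $finish;
  end
endmodule
```

The second check shows why sign extension (rather than zero extension) matters: without it, the offset 0xFFFC would be added as 65532 instead of -4.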
Single-Cycle Processor Design
• STEP 4: Execution - read data from memory and write it back to the register file
Example: lw $2, 80($0)
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
[Figure: datapath with ReadData from the data memory routed to register file port WD3; Instr 20:16 (rt) drives A3; RegWrite = 1, ALUControl2:0 = 010]
Single-Cycle Processor Design
• We are done with lw
• The CPU starts fetching the next instruction from PC+4

module adder(input  [31:0] a, b,
             output [31:0] y);
  assign y = a + b;
endmodule

adder pcadd1(pc, 32'b100, pcplus4);

[Figure: datapath with an adder computing PCPlus4 = PC + 4, which feeds PC'; RegWrite = 1, ALUControl2:0 = 010]
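The PC update described here can be sketched as a register that loads PC+4 on every rising clock edge. This is only a sketch: the module names flopr and pcstep and the asynchronous reset input are assumptions that do not appear on the slides, and the adder is inlined so the example stands alone.

```verilog
// Hypothetical resettable register (name and reset behavior are assumptions).
module flopr #(parameter WIDTH = 32)
              (input                  clk, reset,
               input      [WIDTH-1:0] d,
               output reg [WIDTH-1:0] q);
  always @(posedge clk, posedge reset)
    if (reset) q <= 0;   // start fetching from address 0
    else       q <= d;   // otherwise load the next PC
endmodule

// Hypothetical wrapper: a PC that advances by 4 each cycle (no branches or jumps).
module pcstep(input         clk, reset,
              output [31:0] pc);
  wire [31:0] pcplus4;
  assign pcplus4 = pc + 32'd4;                 // same function as the adder module
  flopr #(32) pcreg(clk, reset, pcplus4, pc);  // PC' = PC + 4
endmodule
```

After reset is released, pc takes the values 0, 4, 8, ... on successive clock edges.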
Single-Cycle Processor Design
• Let’s consider another memory access instruction - sw
  • The sw instruction needs to write data to the data memory
Example: sw $2, 84($0)
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
[Figure: datapath extended for sw; Instr 20:16 (rt) also drives register file port A2, and the value read is sent to the data memory as WriteData; MemWrite = 1, RegWrite = 0, ALUControl2:0 = 010]
Single-Cycle Processor Design
• Let’s consider arithmetic and logical instructions - add, sub, and, or
  • Write ALUResult to the register file
  • Note that R-type instructions write to the register named in the rd field of the instruction (instead of rt)
R-Type
op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
[Figure: datapath with three multiplexers added: ALUSrc selects SrcB, RegDst selects WriteReg4:0 from Instr 20:16 or Instr 15:11, and MemtoReg selects Result from ALUResult or ReadData. For R-type: RegWrite = 1, RegDst = 1, ALUSrc = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 varies]
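The three selections added for R-type instructions can be sketched with a parameterized 2:1 multiplexer. The names mux2 and rtype_muxes are assumptions introduced for illustration; the select polarities follow the main decoder table in this lecture (RegDst = 1 picks rd, ALUSrc = 1 picks the immediate, MemtoReg = 1 picks memory data).

```verilog
// Generic 2:1 multiplexer (hypothetical name, not from the slides).
module mux2 #(parameter WIDTH = 8)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

// Hypothetical wrapper showing the three muxes for one instruction.
module rtype_muxes(input  [31:0] instr,
                   input  [31:0] rd2, signimm, aluresult, readdata,
                   input         regdst, alusrc, memtoreg,
                   output [4:0]  writereg,
                   output [31:0] srcb, result);
  // Destination register: rt (Instr 20:16) or rd (Instr 15:11)
  mux2 #(5)  wrmux  (instr[20:16], instr[15:11], regdst,   writereg);
  // Second ALU operand: register value or sign-extended immediate
  mux2 #(32) srcbmux(rd2,          signimm,      alusrc,   srcb);
  // Value written back: ALU result or data read from memory
  mux2 #(32) resmux (aluresult,    readdata,     memtoreg, result);
endmodule
```

With regdst = 1, alusrc = 0, and memtoreg = 0, the wrapper behaves as an R-type instruction needs: it writes the ALU result to rd using two register operands.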
Single-Cycle Processor Design
• Let’s consider a branch instruction - beq
  • Determine whether the register values are equal
  • Calculate the branch target address (BTA) from the sign-extended immediate and PC+4
Example: beq $4, $0, around
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
[Figure: datapath extended for beq; SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch, and a mux controlled by PCSrc (derived from Branch and Zero) selects PC'. For beq: RegWrite = 0, RegDst = X, ALUSrc = 0, Branch = 1, MemWrite = 0, MemtoReg = X, ALUControl2:0 = 110]
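The next-PC selection for beq can be sketched as follows. The module name branchlogic and its port names are assumptions for illustration; the logic is the branch target calculation described above plus the usual rule that the branch is taken only when the instruction is a branch and the ALU reports Zero.

```verilog
// Hypothetical module: next-PC selection for beq.
module branchlogic(input  [31:0] pcplus4,
                   input  [31:0] signimm,
                   input         branch,    // from the main decoder
                   input         zero,      // from the ALU: SrcA - SrcB == 0
                   output [31:0] pcbranch,
                   output        pcsrc,
                   output [31:0] pcnext);
  // BTA = PC+4 + (SignImm << 2); the shift converts words to bytes
  assign pcbranch = pcplus4 + {signimm[29:0], 2'b00};
  // Take the branch only when this is a beq and the operands are equal
  assign pcsrc    = branch & zero;
  // PC' mux
  assign pcnext   = pcsrc ? pcbranch : pcplus4;
endmodule
```

For example, with pcplus4 = 8 and an immediate of 3, pcbranch is 8 + 12 = 20, and pcnext is 20 only if both branch and zero are 1.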
Single-Cycle Datapath Example
• We are done with the implementation of the basic instructions
• Let’s see how the or instruction works in this implementation
R-Type
op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
[Figure: complete single-cycle datapath with the control unit; Instr 31:26 feeds Op and Instr 5:0 feeds Funct. For or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 = 001]
Single-Cycle Processor - Control
• As mentioned, the CPU is designed with a datapath and a control unit
• Now, let’s delve into the design of the control part
[Figure: complete single-cycle datapath with the control unit; Instr 31:26 feeds Op and Instr 5:0 feeds Funct]
Control Unit
• The opcode and funct fields come from the fetched instruction
[Figure: control unit made of two blocks. Main decoder: Opcode5:0 in; MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0 out. ALU decoder: ALUOp1:0 and Funct5:0 in; ALUControl2:0 out]
ALU Implementation and Control
F2:0  Function
000   A & B
001   A | B
010   A + B
011   not used
100   A & ~B
101   A | ~B
110   A - B
111   SLT

N = 32 in a 32-bit processor

slt: set less than
Example: slt $t0, $t1, $t2   // $t0 = 1 if $t1 < $t2

[Figure: N-bit ALU symbol (inputs A and B, 3-bit control F, output Y) and its implementation: F2 selects B or ~B and feeds the adder; F1:0 selects the output among AND, OR, the adder sum, and the zero-extended sign bit [N-1] of the sum]
Control Unit: ALU Control
ALUOp1:0  Meaning
00        Add
01        Subtract
10        Look at Funct
11        Not Used

ALUOp1:0  Funct         ALUControl2:0
00        X             010 (Add)
X1        X             110 (Subtract)
1X        100000 (add)  010 (Add)
1X        100010 (sub)  110 (Subtract)
1X        100100 (and)  000 (And)
1X        100101 (or)   001 (Or)
1X        101010 (slt)  111 (SLT)

• The implementation is entirely up to the hardware designers
• But the designers should make sure the implementation is reasonable
  • Memory access instructions (lw, sw) need to use the ALU to calculate the memory target address (addition)
  • Branch instructions (beq, bne) need to use the ALU for the equality check (subtraction)
[Figure: control unit block diagram (main decoder and ALU decoder)]
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01

ALUOp1:0  Meaning
00        Add
01        Subtract
10        Look at Funct field
11        Not Used

[Figure: control unit block diagram (main decoder and ALU decoder)]
How about Other Instructions?
• Now we are done with the control part design
• Let’s examine whether the design is able to execute other instructions, such as addi
Example: addi $t0, $t1, -14
[Figure: complete single-cycle datapath with the control unit, unchanged]
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000  1         0       1       0       0         0         00
How about Other Instructions?
• OK. So far, so good
• How about jump instructions, such as j?
J-Type
op      addr
6 bits  26 bits
[Figure: complete single-cycle datapath with the control unit, unchanged]
How about Other Instructions?
• We need to add some hardware to support the j instruction
  • Logic to compute the target address
  • A mux and a control signal
J-Type
op      addr
6 bits  26 bits
[Figure: datapath extended for j; Instr 25:0 is shifted left by 2 to form bits 27:0 and combined with bits 31:28 of PCPlus4 to form PCJump, and a mux controlled by Jump selects PCJump as PC']
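The jump target logic listed above can be sketched in a few lines. The module name jumplogic and the port name pcnextbr (the output of the existing branch mux) are assumptions for illustration.

```verilog
// Hypothetical module: jump target address and the added Jump mux.
module jumplogic(input  [31:0] pcplus4,
                 input  [25:0] addr,      // Instr 25:0
                 input         jump,      // from the main decoder
                 input  [31:0] pcnextbr,  // PC' chosen by the branch mux
                 output [31:0] pcjump,
                 output [31:0] pcnext);
  // Upper 4 bits come from PC+4; addr << 2 supplies bits 27:0
  assign pcjump = {pcplus4[31:28], addr, 2'b00};
  // The Jump mux overrides the branch/PC+4 choice
  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```

Because only 26 address bits fit in the instruction, a j instruction can reach any word within the 256 MB region that shares the upper 4 bits of PC+4.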
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
addi         001000  1         0       1       0       0         0         00        0
j            000010  0         X       X       X       0         X         XX        1
• There is one more output in the main decoder to support jump instructions: Jump
Verilog Code - Main Decoder and ALU Control
module maindec(input  [5:0] op,
               output       memtoreg, memwrite,
               output       branch, alusrc,
               output       regdst, regwrite,
               output       jump,
               output [1:0] aluop);

  reg [8:0] controls;

  // Bit order, MSB first: regwrite, regdst, alusrc, branch,
  // memwrite, memtoreg, jump, aluop[1:0]
  assign {regwrite, regdst, alusrc, branch, memwrite,
          memtoreg, jump, aluop} = controls;

  // Combinational logic, so blocking assignments are used
  always @(*)
    case(op)
      6'b000000: controls = 9'b110000010; // R-type
      6'b100011: controls = 9'b101001000; // lw
      6'b101011: controls = 9'b001010000; // sw
      6'b000100: controls = 9'b000100001; // beq
      6'b001000: controls = 9'b101000000; // addi
      6'b000010: controls = 9'b000000100; // j
      default:   controls = 9'bxxxxxxxxx; // unsupported opcode
    endcase
endmodule

module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol);

  always @(*)
    case(aluop)
      2'b00: alucontrol = 3'b010;  // add (lw, sw, addi)
      2'b01: alucontrol = 3'b110;  // sub (beq)
      default: case(funct)         // R-type: look at funct
          6'b100000: alucontrol = 3'b010; // ADD
          6'b100010: alucontrol = 3'b110; // SUB
          6'b100100: alucontrol = 3'b000; // AND
          6'b100101: alucontrol = 3'b001; // OR
          6'b101010: alucontrol = 3'b111; // SLT
          default:   alucontrol = 3'bxxx; // unsupported funct
        endcase
    endcase
endmodule

[Figure: control unit block diagram (main decoder and ALU decoder)]
Verilog Code – ALU
module alu(input      [31:0] a, b,
           input      [2:0]  alucont,
           output reg [31:0] result,
           output            zero);

  wire [31:0] b2, sum, slt;

  assign b2  = alucont[2] ? ~b : b;    // invert B for subtract and SLT
  assign sum = a + b2 + alucont[2];    // alucont[2] is the carry-in: a + ~b + 1 = a - b
  assign slt = sum[31];                // sign bit of a - b, zero-extended to 32 bits

  always @(*)
    case(alucont[1:0])
      2'b00: result = a & b2;
      2'b01: result = a | b2;
      2'b10: result = sum;
      2'b11: result = slt;
    endcase

  assign zero = (result == 32'b0);
endmodule

[Figure: ALU symbol, function table, and implementation diagram, repeated from the ALU Implementation and Control slide]
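One way to exercise the ALU is a small self-checking testbench. The sketch below repeats the alu module so that it stands alone; the testbench name alu_tb, the check task, and the chosen operand values are assumptions for illustration, not part of the lecture.

```verilog
// Copy of the alu module from this slide, so the example is self-contained.
module alu(input      [31:0] a, b,
           input      [2:0]  alucont,
           output reg [31:0] result,
           output            zero);
  wire [31:0] b2, sum, slt;
  assign b2  = alucont[2] ? ~b : b;
  assign sum = a + b2 + alucont[2];
  assign slt = sum[31];
  always @(*)
    case(alucont[1:0])
      2'b00: result = a & b2;
      2'b01: result = a | b2;
      2'b10: result = sum;
      2'b11: result = slt;
    endcase
  assign zero = (result == 32'b0);
endmodule

// Hypothetical testbench: applies one operand pair per function code.
module alu_tb;
  reg  [31:0] a, b;
  reg  [2:0]  alucont;
  wire [31:0] result;
  wire        zero;
  integer     errors;

  alu dut(a, b, alucont, result, zero);

  // Wait for the combinational logic to settle, then compare
  task check(input [31:0] expected);
    begin
      #1;
      if (result !== expected) begin
        errors = errors + 1;
        $display("FAIL: F=%b a=%h b=%h result=%h expected=%h",
                 alucont, a, b, result, expected);
      end
    end
  endtask

  initial begin
    errors = 0;
    a = 32'd7; b = 32'd5;
    alucont = 3'b010; check(32'd12);  // A + B
    alucont = 3'b110; check(32'd2);   // A - B
    alucont = 3'b000; check(32'd5);   // A & B: 0111 & 0101
    alucont = 3'b001; check(32'd7);   // A | B: 0111 | 0101
    alucont = 3'b111; check(32'd0);   // SLT: 7 < 5 is false
    a = 32'd5; b = 32'd7;
    alucont = 3'b111; check(32'd1);   // SLT: 5 < 7 is true
    a = 32'd9; b = 32'd9;
    alucont = 3'b110; check(32'd0);   // equal operands, as in beq
    if (zero !== 1'b1) errors = errors + 1;
    $display("%0d errors", errors);
    $finish;
  end
endmodule
```

Note that this SLT takes the sign bit of A - B directly, so it does not account for signed overflow in the subtraction; the test values above stay well within range.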
Single-Cycle Processor Performance
• How fast is the single-cycle processor?
• Clock cycle time (and therefore frequency) is limited by the critical path
  • The critical path is the path that takes the longest time
  • What do you think the critical path is?
    • The path that the lw instruction goes through
[Figure: single-cycle datapath with the lw critical path highlighted. For lw: RegWrite = 1, RegDst = 0, ALUSrc = 1, Branch = 0, MemWrite = 0, MemtoReg = 1, ALUControl2:0 = 010]
Single-Cycle Processor Performance
• Single-cycle critical path:
  Tc = tpcq_PC + tmem + max(tRFread, tsext) + tmux + tALU + tmem + tmux + tRFsetup
• In most implementations, the limiting paths are memory (instruction and data), the ALU, and the register file. Thus,
  Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup

Elements             Parameter
Register clock-to-Q  tpcq_PC
Multiplexer          tmux
ALU                  tALU
Memory read          tmem
Register file read   tRFread
Register file setup  tRFsetup

[Figure: single-cycle datapath with the lw critical path highlighted]
Single-Cycle Processor Performance Example
Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 2(25) + 200 + 20] ps
   = 950 ps

Elements             Parameter  Delay (ps)
Register clock-to-Q  tpcq_PC    30
Multiplexer          tmux       25
ALU                  tALU       200
Memory read          tmem       250
Register file read   tRFread    150
Register file setup  tRFsetup   20

• Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?
  Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
                 = (100 × 10^9)(1)(950 × 10^-12 s)
                 = 95 seconds

fc = 1/Tc
fc = 1/950 ps ≈ 1.05 GHz
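As a cross-check on the figures above, the intermediate steps can be written out. The last line, the share of the cycle spent in the two memory accesses, is an observation derived from the delay table rather than a statement from the slides.

```latex
\begin{aligned}
T_c &= 30 + 500 + 150 + 50 + 200 + 20 = 950\ \text{ps} \\
\text{Execution time} &= (100 \times 10^{9}) \times 1 \times (950 \times 10^{-12}\ \text{s})
                       = 95{,}000 \times 10^{-3}\ \text{s} = 95\ \text{s} \\
f_c &= \frac{1}{950 \times 10^{-12}\ \text{s}} \approx 1.05 \times 10^{9}\ \text{Hz} \\
\frac{2\,t_{mem}}{T_c} &= \frac{500}{950} \approx 53\%
\end{aligned}
```

With these delays, the instruction and data memory reads together account for roughly half of the clock period.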