Lecture 9. MIPS Processor Design – Single-Cycle Processor Design

Prof. Taeweon Suh
Computer Science Education
Korea University

2010 R&E Computer System Education & Research

Single-Cycle MIPS Processor

• Again, the microarchitecture (CPU implementation) is divided into two interacting parts:
  - Datapath
  - Control

Single-Cycle Processor Design

• Let’s start with a memory access instruction: lw
  - Example: lw $2, 80($0)
• STEP 1: Instruction Fetch

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: datapath with the PC register (PC' in, PC out), Instruction Memory (A, RD), Register File (A1, A2, A3, WD3, WE3, RD1, RD2), and Data Memory (A, RD, WD, WE); the fetched instruction is labeled Instr]
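As a rough illustration of what the fetched word contains, the I-type fields can be sliced out of Instr as in the sketch below. The module name `itype_fields` and its port names are assumptions made for this example; they are not part of the lecture's code. For lw $2, 80($0) the encoding is 0x8C020050, which gives op = 100011, rs = 0, rt = 2, and imm = 80.

```verilog
// Illustrative sketch (not from the lecture): slice an I-type instruction
// into its fields. Module and port names are assumptions.
module itype_fields(input  [31:0] instr,
                    output [5:0]  op,    // instr[31:26], opcode
                    output [4:0]  rs,    // instr[25:21], base register for lw
                    output [4:0]  rt,    // instr[20:16], destination for lw
                    output [15:0] imm);  // instr[15:0], 16-bit immediate
  assign op  = instr[31:26];
  assign rs  = instr[25:21];
  assign rt  = instr[20:16];
  assign imm = instr[15:0];
endmodule
```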

Single-Cycle Processor Design

• STEP 2: Decoding
  - Read source operands from the register file
  - Example: lw $2, 80($0)

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: same datapath, with Instr[25:21] (the rs field) routed to the Register File]

Single-Cycle Processor Design

• STEP 2: Decoding
  - Sign-extend the immediate
  - Example: lw $2, 80($0)

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: datapath with a Sign Extend unit; Instr[15:0] feeds Sign Extend, which outputs SignImm]

    module signext(input  [15:0] a,
                   output [31:0] y);
      // Replicate the sign bit a[15] into the upper 16 bits
      assign y = {{16{a[15]}}, a};
    endmodule

Single-Cycle Processor Design

• STEP 3: Execution
  - Compute the memory address
  - Example: lw $2, 80($0)

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: datapath with the ALU added; inputs SrcA and SrcB, outputs ALUResult and Zero; ALUControl[2:0] = 010 (add)]
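The address computation in this step amounts to adding the sign-extended immediate to the base register value. The sketch below is a minimal illustration under that reading; the module name `memaddr` and its port names are assumptions, and in the lecture's datapath the addition is performed by the shared ALU rather than by a dedicated module. For lw $2, 80($0) the base is 0 and the immediate is 80, so the address is 80.

```verilog
// Illustrative sketch (names are assumptions): effective address for lw/sw.
module memaddr(input  [31:0] base,   // RD1: value of register rs
               input  [15:0] imm,    // Instr[15:0]
               output [31:0] addr);  // corresponds to ALUResult for an add
  wire [31:0] signimm;
  assign signimm = {{16{imm[15]}}, imm};  // sign extension of the immediate
  assign addr    = base + signimm;        // base + offset
endmodule
```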

Single-Cycle Processor Design

• STEP 4: Execution
  - Read data from memory and write it back to the register file
  - Example: lw $2, 80($0)

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: ALUResult drives the Data Memory address; ReadData is written back to the Register File; Instr[20:16] (the rt field) selects the destination register; RegWrite = 1; ALUControl[2:0] = 010]

Single-Cycle Processor Design

• We are done with lw
• The CPU starts fetching the next instruction from PC+4

[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'; the value written back to the register file is labeled Result]

    module adder(input  [31:0] a, b,
                 output [31:0] y);
      assign y = a + b;
    endmodule

    // Instantiation in the datapath: pcplus4 = pc + 4
    adder pcadd1(pc, 32'b100, pcplus4);
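The adder only computes PCPlus4; a register must also capture PC' on each clock edge so that the next fetch uses the new address. A minimal sketch of such a PC register follows. The module name `pcreg` and the asynchronous `reset` input are assumptions for illustration and do not appear on the slide.

```verilog
// Illustrative sketch (names and reset behavior are assumptions):
// the PC register loads PC' (pcnext) on every rising clock edge.
module pcreg(input             clk, reset,
             input      [31:0] pcnext,  // PC'
             output reg [31:0] pc);
  always @(posedge clk or posedge reset)
    if (reset) pc <= 32'b0;    // start fetching at address 0 after reset
    else       pc <= pcnext;   // one instruction per clock cycle
endmodule
```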

Single-Cycle Processor Design

• Let’s consider another memory access instruction: sw
  - The sw instruction needs to write data to the data memory
  - Example: sw $2, 84($0)

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: Instr[20:16] (the rt field) is also routed to the Register File as a read address; RD2 is routed as WriteData to the Data Memory WD input; MemWrite = 1, RegWrite = 0, ALUControl[2:0] = 010]

Single-Cycle Processor Design

• Let’s consider arithmetic and logical instructions: add, sub, and, or
  - Write ALUResult to the register file
  - Note that R-type instructions write to the register named by the rd field of the instruction (instead of rt)

R-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

[Figure: three 2:1 multiplexers are added. ALUSrc selects SrcB, MemtoReg selects Result (ALUResult or ReadData), and RegDst selects WriteReg[4:0] (Instr[20:16] or Instr[15:11]). For R-type instructions: RegWrite = 1, RegDst = 1, ALUSrc = 0, MemWrite = 0, MemtoReg = 0; ALUControl[2:0] varies with the operation]
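The choice between rt and rd as the destination register can be sketched with a generic 2:1 multiplexer. The module below is an illustration only: the name `mux2`, its parameter, and the instance shown in the trailing comment are assumptions, not code taken from the slides.

```verilog
// Illustrative sketch (names are assumptions): generic 2:1 multiplexer.
module mux2 #(parameter WIDTH = 32)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;   // s = 0 selects d0, s = 1 selects d1
endmodule

// Possible use for the destination register select (RegDst):
//   mux2 #(5) wrmux(instr[20:16], instr[15:11], regdst, writereg);
// RegDst = 0 picks rt (lw); RegDst = 1 picks rd (R-type).
```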

Single-Cycle Processor Design

• Let’s consider a branch instruction: beq
  - Determine whether the register values are equal
  - Calculate the branch target address (BTA) from the sign-extended immediate and PC+4
  - Example: beq $4, $0, around

I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

[Figure: SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; a multiplexer controlled by PCSrc selects PC' (PCPlus4 or PCBranch); PCSrc is derived from Branch and the ALU Zero output. For beq: RegWrite = 0, RegDst = X, ALUSrc = 0, Branch = 1, MemWrite = 0, MemtoReg = X, ALUControl[2:0] = 110]
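The branch hardware can be sketched as follows. The module name `branchlogic` and its port names are assumptions for illustration; the sketch also assumes that PCSrc is the AND of Branch and Zero, which is how this textbook-style datapath is usually drawn, but the slide text itself does not spell that gate out.

```verilog
// Illustrative sketch (names are assumptions): branch target and PC select.
module branchlogic(input  [31:0] pcplus4,   // PC + 4
                   input  [31:0] signimm,   // sign-extended immediate
                   input         branch,    // from the main decoder
                   input         zero,      // from the ALU (rs - rt == 0)
                   output [31:0] pcbranch,  // branch target address (BTA)
                   output        pcsrc,
                   output [31:0] pcnext);
  assign pcbranch = pcplus4 + {signimm[29:0], 2'b00};  // PCPlus4 + (SignImm << 2)
  assign pcsrc    = branch & zero;                     // assumed: taken only if equal
  assign pcnext   = pcsrc ? pcbranch : pcplus4;
endmodule
```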

Single-Cycle Datapath Example

• We are done with the implementation of the basic instructions
• Let’s see how the or instruction works in this implementation

R-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

[Figure: complete single-cycle datapath with the Control Unit, which takes Op = Instr[31:26] and Funct = Instr[5:0] and drives MemtoReg, MemWrite, Branch, ALUControl[2:0], ALUSrc, RegDst, and RegWrite. For or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl[2:0] = 001]

Single-Cycle Processor - Control

• As mentioned, the CPU is designed with a datapath and a control part
• Now, let’s delve into the design of the control part

[Figure: complete single-cycle datapath with the Control Unit (inputs Op = Instr[31:26] and Funct = Instr[5:0]), shown without signal values]

Control Unit

• The opcode and funct fields come from the fetched instruction

[Figure: the Control Unit consists of a Main Decoder and an ALU Decoder. The Main Decoder takes Opcode[5:0] and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp[1:0]. The ALU Decoder takes ALUOp[1:0] and Funct[5:0] and produces ALUControl[2:0]]

ALU Implementation and Control

F[2:0]   Function
000      A & B
001      A | B
010      A + B
011      not used
100      A & ~B
101      A | ~B
110      A - B
111      SLT

• N = 32 in a 32-bit processor
• slt: set less than
  - Example: slt $t0, $t1, $t2   // $t0 = 1 if $t1 < $t2

[Figure: ALU symbol with N-bit inputs A and B, 3-bit control F, and N-bit output Y. Internally, F[2] selects B or ~B and also feeds the adder; F[1:0] selects the output among AND, OR, the adder sum S, and the zero-extended sign bit S[N-1] (used for SLT)]

Control Unit: ALU Control

ALUOp[1:0]   Meaning
00           Add
01           Subtract
10           Look at Funct
11           Not Used

ALUOp[1:0]   Funct          ALUControl[2:0]
00           X              010 (Add)
X1           X              110 (Subtract)
1X           100000 (add)   010 (Add)
1X           100010 (sub)   110 (Subtract)
1X           100100 (and)   000 (And)
1X           100101 (or)    001 (Or)
1X           101010 (slt)   111 (SLT)

• The implementation is entirely up to the hardware designers
• However, the designers should make sure the implementation is reasonable
• Memory access instructions (lw, sw) need to use the ALU to calculate the memory target address (addition)
• Branch instructions (beq, bne) need to use the ALU for the equality check (subtraction)

[Figure: Control Unit block diagram with the Main Decoder and the ALU Decoder]

Control Unit: Main Decoder

Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp[1:0]
R-type       000000   1         1       0       0       0         0         10
lw           100011   1         0       1       0       0         1         00
sw           101011   0         X       1       0       1         X         00
beq          000100   0         X       0       1       0         X         01

ALUOp[1:0]   Meaning
00           Add
01           Subtract
10           Look at Funct field
11           Not Used

[Figure: Control Unit block diagram with the Main Decoder and the ALU Decoder]

How about Other Instructions?

• Now, we are done with the design of the control part
• Let’s examine whether the design is able to execute other instructions: addi
  - Example: addi $t0, $t1, -14

[Figure: complete single-cycle datapath with the Control Unit]

Control Unit: Main Decoder

Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp[1:0]
R-type       000000   1         1       0       0       0         0         10
lw           100011   1         0       1       0       0         1         00
sw           101011   0         X       1       0       1         X         00
beq          000100   0         X       0       1       0         X         01
addi         001000   1         0       1       0       0         0         00

How about Other Instructions?

• OK. So far, so good…
• How about jump instructions (j)?

J-Type format: op (6 bits) | addr (26 bits)

[Figure: complete single-cycle datapath with the Control Unit]

How about Other Instructions?

• We need to add some hardware to support the j instruction
  - Logic to compute the target address
  - A mux and a control signal

J-Type format: op (6 bits) | addr (26 bits)

[Figure: Instr[25:0] is shifted left by 2 to form bits 27:0 of PCJump; bits 31:28 are taken from PCPlus4; a multiplexer controlled by the new Jump signal selects PCJump as PC']
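The jump target logic in the figure can be sketched as a simple concatenation. The module name `jumptarget` and its port names are assumptions for illustration; the bit ranges follow the labels in the figure (25:0, 27:0, 31:28).

```verilog
// Illustrative sketch (names are assumptions): jump target address.
module jumptarget(input  [31:0] pcplus4,  // PC + 4
                  input  [25:0] addr,     // Instr[25:0]
                  output [31:0] pcjump);
  // Upper 4 bits from PC+4, then the 26-bit addr field, then two zero bits
  // (equivalent to shifting addr left by 2).
  assign pcjump = {pcplus4[31:28], addr, 2'b00};
endmodule
```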

Control Unit: Main Decoder

Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp[1:0]  Jump
R-type       000000   1         1       0       0       0         0         10          0
lw           100011   1         0       1       0       0         1         00          0
sw           101011   0         X       1       0       1         X         00          0
beq          000100   0         X       0       1       0         X         01          0
addi         001000   1         0       1       0       0         0         00          0
j            000010   0         X       X       X       0         X         XX          1

• There is one more output in the main decoder to support jump instructions: Jump

Verilog Code - Main Decoder and ALU Control

    module maindec(input  [5:0] op,
                   output       memtoreg, memwrite,
                   output       branch, alusrc,
                   output       regdst, regwrite,
                   output       jump,
                   output [1:0] aluop);

      reg [8:0] controls;

      // One bit per control signal, in the same order as the vector below
      assign {regwrite, regdst, alusrc, branch, memwrite,
              memtoreg, jump, aluop} = controls;

      always @(*)
        case (op)
          6'b000000: controls <= 9'b110000010; // R-type
          6'b100011: controls <= 9'b101001000; // lw
          6'b101011: controls <= 9'b001010000; // sw
          6'b000100: controls <= 9'b000100001; // beq
          6'b001000: controls <= 9'b101000000; // addi
          6'b000010: controls <= 9'b000000100; // j
          default:   controls <= 9'bxxxxxxxxx; // ???
        endcase
    endmodule

    module aludec(input      [5:0] funct,
                  input      [1:0] aluop,
                  output reg [2:0] alucontrol);

      always @(*)
        case (aluop)
          2'b00: alucontrol <= 3'b010;  // add
          2'b01: alucontrol <= 3'b110;  // sub
          default: case (funct)         // RTYPE
              6'b100000: alucontrol <= 3'b010; // ADD
              6'b100010: alucontrol <= 3'b110; // SUB
              6'b100100: alucontrol <= 3'b000; // AND
              6'b100101: alucontrol <= 3'b001; // OR
              6'b101010: alucontrol <= 3'b111; // SLT
              default:   alucontrol <= 3'bxxx; // ???
            endcase
        endcase
    endmodule

[Figure: Control Unit block diagram with the Main Decoder and the ALU Decoder]

Verilog Code – ALU

    module alu(input      [31:0] a, b,
               input      [2:0]  alucont,
               output reg [31:0] result,
               output            zero);

      wire [31:0] b2, sum, slt;

      assign b2  = alucont[2] ? ~b : b;   // invert B when alucont[2] = 1
      assign sum = a + b2 + alucont[2];   // A + B, or A - B in two's complement
      assign slt = sum[31];               // sign bit of A - B, zero-extended

      always @(*)
        case (alucont[1:0])
          2'b00: result <= a & b2;
          2'b01: result <= a | b2;
          2'b10: result <= sum;
          2'b11: result <= slt;
        endcase

      assign zero = (result == 32'b0);
    endmodule

F[2:0]   Function
000      A & B
001      A | B
010      A + B
011      not used
100      A & ~B
101      A | ~B
110      A - B
111      SLT

[Figure: ALU symbol (N-bit inputs A and B, 3-bit control F, N-bit output Y) and its internal structure, with F[2] selecting B or ~B and F[1:0] selecting the output]

Single-Cycle Processor Performance

• How fast is the single-cycle processor?
• The clock cycle time (and therefore the frequency) is limited by the critical path
  - The critical path is the path that takes the longest time
  - What do you think the critical path is?
• It is the path that the lw instruction goes through

[Figure: complete single-cycle datapath with the lw control values: RegWrite = 1, RegDst = 0, ALUSrc = 1, Branch = 0, MemWrite = 0, MemtoReg = 1, ALUControl[2:0] = 010, PCSrc = 0]

Single-Cycle Processor Performance

• Single-cycle critical path:

$$T_c = t_{pcq\_PC} + t_{mem} + \max(t_{RFread}, t_{sext}) + t_{mux} + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup}$$

• In most implementations, the limiting paths are the memory (instruction and data), the ALU, and the register file, so the register file read rather than the sign extension sets the max term. Thus,

$$T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + 2t_{mux} + t_{ALU} + t_{RFsetup}$$

Elements               Parameter
Register clock-to-Q    $t_{pcq\_PC}$
Multiplexer            $t_{mux}$
ALU                    $t_{ALU}$
Memory read            $t_{mem}$
Register file read     $t_{RFread}$
Register file setup    $t_{RFsetup}$

[Figure: complete single-cycle datapath with the lw control values]

Single-Cycle Processor Performance Example

Elements               Parameter         Delay (ps)
Register clock-to-Q    $t_{pcq\_PC}$     30
Multiplexer            $t_{mux}$         25
ALU                    $t_{ALU}$         200
Memory read            $t_{mem}$         250
Register file read     $t_{RFread}$      150
Register file setup    $t_{RFsetup}$     20

$$T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + 2t_{mux} + t_{ALU} + t_{RFsetup} = [30 + 2(250) + 150 + 2(25) + 200 + 20]\ \text{ps} = 950\ \text{ps}$$

• Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?

$$\text{Execution Time} = (\#\text{instructions})(\text{cycles/instruction})(\text{seconds/cycle}) = (100 \times 10^{9})(1)(950 \times 10^{-12}\ \text{s}) = 95\ \text{seconds}$$

$$f_c = 1/T_c = 1/(950\ \text{ps}) \approx 1.053\ \text{GHz}$$