computer architecture: a constructive approach one-cycle implementation joel emer computer science...

36
Computer Architecture: A Constructive Approach One-Cycle Implementation Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 29, 2012 L5-1 http:// csg.csail.mit.edu/6.s078

Upload: allen-davidson

Post on 01-Jan-2016

220 views

Category:

Documents


0 download

TRANSCRIPT

Computer Architecture: A Constructive Approach

One-Cycle Implementation

Joel EmerComputer Science & Artificial Intelligence Lab.Massachusetts Institute of Technology

February 29, 2012 L5-1http://csg.csail.mit.edu/6.s078

The MIPS ISAProcessor State

32 32-bit GPRs, R0 always contains a 0 PC, the program counter some other special registers

Data types 8-bit byte, 16-bit half word 32-bit word for integers

Load/Store style instruction set data addressing modes- immediate & indexed branch addressing modes- PC relative & register indirect All instructions are 32 bits Byte addressable memory- big endian mode

February 29, 2012 L4-2http://csg.csail.mit.edu/6.s078

r0 - r31

PC

Registers

Floating point, memory management and other systems instructions are not includes in the SMIPS subset

Instruction formats

Only three formats but the fields are used differently by different types of instructions

6 5 5 16opcode rs rt immediate I-type

6 5 5 5 5 6opcode rs rt rd shamt func R-type

6 26opcode target J-type

February 29, 2012 L4-3http://csg.csail.mit.edu/6.s078

Instruction formats cont

Computational Instructions

Load/Store Instructions

opcode rs rt immediate rt (rs) op immediate

6 5 5 5 5 6 0 rs rt rd 0 func rd (rs) func (rt)

rs is the base registerrt is the destination of a Load or the source for a Store

6 5 5 16 addressing modeopcode rs rt displacement (rs) + displacement31 26 25 21 20 16 15 0

February 29, 2012 L4-4http://csg.csail.mit.edu/6.s078

Control InstructionsConditional (on GPR) PC-relative branch

target address = (offset in words)4 + (PC+4) range: 128 KB range

Unconditional register-indirect jumps

Unconditional absolute jumps

target address = {PC<31:28>, target4} range : 256 MB range

6 5 5 16opcode rs offset BEQZ, BNEZ

6 26opcode target J, JAL

6 5 5 16opcode rs JR, JALR

jump-&-link stores PC+4 into the link register (R31)

February 29, 2012 L4-5http://csg.csail.mit.edu/6.s078

Instruction ExecutionExecution of an instruction involves

1. Instruction fetch2. Decode3. Register fetch4. ALU operation5. Memory operation (optional)6. Write back

and the computation of the next instruction address

February 29, 2012 L4-6http://csg.csail.mit.edu/6.s078

Implementing an ISAInstructions fetch

requires an Instruction memory, PCDecode

requires understanding the instruction formatRegister Fetch

requires interaction with a register file with a specific number of read/write ports

ALU must have the ability to carry out the specified ops

Memory operations requires a data memory

Write-back requires interaction with the register file

Update the PC requires arithmetic ops to calculate pc and condition

February 29, 2012 L4-7http://csg.csail.mit.edu/6.s078

A single-cycle implementation

A single-cycle MIPS implementation requires: A register file with 2 read ports and a write port An instruction memory , separate from data memory

so that we can fetch an instruction as well as perform a data operation (Load/store) on the memory

fetch & execute

pc

iMem dMem

rfCPU

February 29, 2012 L4-8http://csg.csail.mit.edu/6.s078

Single-Cycle SMIPS

PC

InstMemory

Decode

Register File

Execute

DataMemory

+4

February 29, 2012 L4-9http://csg.csail.mit.edu/6.s078

Datapath is shown only for convenience; it will be derived automatically from the high-level textual description

Single-Cycle SMIPS code structure

module mkProc(Proc); // State

Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory imem <- mkMemory; Memory dmem <- mkMemory;

February 29, 2012 L4-10http://csg.csail.mit.edu/6.s078

Single-Cycle SMIPS code structure

… rule fetchExecute; //fetch

let instResp = imem(MemReq{…}); //decode let decInst = decode(instResp); //register read let rVal1 = rf[…]; let rVal2 = … //execute let execInst = exec(decInst, pc, rVal1, rVal2); //memory (optional) if (instType Ld || St) data = dmem(…) //writeback if(instType Alu || Ld) rf.wr(rDst,data); pc <= execInst.brTaken ? execInst.addr : pc + 4; endrule endmodule;

February 29, 2012 L4-11http://csg.csail.mit.edu/6.s078

DecodingdInst is all the information we need to do the later phases of execution on the instruction

Not all fields of dInst are defined in each case

We may decide to pass more information from decode to execute for efficiency– in that case the definition of the type of the decoded instruction has to be adjusted accordingly

February 29, 2012 L5-12http://csg.csail.mit.edu/6.s078

Instruction groupingMany instructions have the same implementation except for the ALU function they invoke

We can group such instructions and reduce the amount of code we have to write

Example: R-Type ALU, I-Type ALU, Br Type, Memory Type, …

February 29, 2012 L5-13http://csg.csail.mit.edu/6.s078

immValid

Decoding Instructions: extract fields needed for execution from each instruction

instructionbranchComp

rDst

rSrc1

rSrc2

immext

aluFunc

instType31:26

31:26

5:0

31:26

20:16

15:11

25:21

20:16

15:0

25:0

Lot of pure combinational logic: will be derived automatically from the high-level description

decode

February 29, 2012 L5-14http://csg.csail.mit.edu/6.s078

decode

Decoding Instructions: input-output types

instructionbrComp

rDst

rSrc1

rSrc2

immext

aluFunc

iType31:26, 5:0

31:26

5:0

31:26

20:16

15:11

25:21

20:16

15:0

25:0

Typ

e D

eco

ded

Inst

Bit#(32)

IType

AluFunc

Rindex

Rindex

Rindex

BrType

Bit#(32)

Mux control logic not shown

immValidBool

February 29, 2012 L7-15http://csg.csail.mit.edu/6.s078

Typedefstypedef enum {Alu, Ld, St, J, Jr, Jal, Jalr, Br} IType deriving(Bits, Eq);

typedef enum {Eq, Neq, Le, Lt, Ge, Gt} BrType deriving(Bits, Eq);

typedef enum {Add, Sub, And, Or, Xor, Nor, Slt, Sltu, LShift, RShift, Sra} AluFunc deriving(Bits, Eq);

typedef enum {RAlu, IAlu, LdSt, J, Jr, Br} InstDType deriving(Bits, Eq);

February 29, 2012 L7-16http://csg.csail.mit.edu/6.s078

Decode Functionfunction DecodedInst decode(Bit#(32) inst); DecodedInst dInst = ?; let opcode = inst[ 31 : 26 ]; let rs = inst[ 25 : 21 ]; let rt = inst[ 20 : 16 ]; let rd = inst[ 15 : 11 ]; let funct = inst[ 5 : 0 ]; let imm = inst[ 15 : 0 ]; let target = inst[ 25 : 0 ]; case (getInstDType(inst)) ... endcase return dInst;endfunction

February 29, 2012 L7-17http://csg.csail.mit.edu/6.s078

initially undefin

ed

Instruction EncodingBit#(6) opADDIU = 6’b001001;Bit#(6) opSLTI = 6’b001010;Bit#(6) opLW = 6’b100011;Bit#(6) opSW = 6’b101011;Bit#(6) opJ = 6’b000010;Bit#(6) opBEQ = 6’b000100;…Bit#(6) opFUNC = 6’b000000;Bit#(6) fcADDU = 6’b100001;Bit#(6) fcAND = 6’b100100;Bit#(6) fcJR = 6’b001000;…Bit#(6) opRT = 6’b000001;Bit#(6) rtBLTZ = 5’b00000;Bit#(6) rtBGEZ = 5’b00100;

February 29, 2012 L7-18http://csg.csail.mit.edu/6.s078

Instruction Groupingfunction InstDType getInstDType(Bit#(32) inst) return case (inst[31:26]) opADDIU, opSLTI, …: IAlu; opLW, opSW: LdSt; opJ, opJAL: J; opBEQ, opRT, …: Br;

opFUNC: case (inst[5:0]) fcADDU, fcAND, …: RAlu; fcJR, fcJALR: Jr; endcase; endcase;endfunction

February 29, 2012 L7-19http://csg.csail.mit.edu/6.s078

Decoding Instructions:R-Type ALU case (getInstDType(opcode)) … RAlu: begin dInst.iType = Alu; dInst.aluFunc = case (funct) fcADDU: Add; fcSUBU: Sub; ... endcase; dInst.rDst = rd; dInst.rSrc1 = rs; dInst.rSrc2 = rt;

dInst.immValid = False; end

February 29, 2012 L7-20http://csg.csail.mit.edu/6.s078

Decoding Instructions:I-Type ALU case (getInstDType(opcode)) … IAlu: begin dInst.iType = Alu; dInst.aluFunc = case (opcode) opADDIU: Add; ... endcase; dInst.rDst = rt; dInst.rSrc1 = rs; dInst.imm = signedIAlu(opcode) ? signExtend(imm): zeroExtend(imm);

dInst.immValid = True; end

February 29, 2012 L7-21http://csg.csail.mit.edu/6.s078

Decoding Instructions:Load & Store case (getInstDType(opcode)) … LdSt: begin dInst.iType = opcode==opLW ? Ld : St; dInst.aluFunc = Add; dInst.rDst = rt; dInst.rSrc1 = rs; dInst.rSrc2 = rt; dInst.imm = signExtned(imm);

dInst.immValid = True; end

February 29, 2012 L7-22http://csg.csail.mit.edu/6.s078

Decoding Instructions:Jump case (getInstDType(opcode)) … J: begin dInst.iType = opcode==opJ ? J : Jal; dInst.rDst = 31; dInst.imm = zeroExtend({target, 2’b00});

dInst.immValid = False; end

Jr: begin dInst.iType = funct==fcJR ? Jr : Jalr; dInst.rDst = rd; dInst.rSrc1 = rs;

dInst.immValid = False; end

February 29, 2012 L7-23http://csg.csail.mit.edu/6.s078

Decoding Instructions:Branch case (getInstDType(opcode)) … Br: begin dInst.iType = Br; dInst.brComp = case (opcode) opBEQ : EQ; opBLEZ: LE; ... endcase; dInst.rSrc1 = rs; dInst.rSrc2 = rt; dInst.imm = signExtend({imm, 2’b00});

dInst.immValid = False; end

February 29, 2012 L7-24http://csg.csail.mit.edu/6.s078

Reading Registers

Read registers

RVal2

RVal1RF

RSrc2

Pure combinational logic

February 29, 2012 L7-25http://csg.csail.mit.edu/6.s078

RSrc1

Executing Instructions

execute

dInst

addr

brTaken

data

iType

rDst

ALU

BranchAddresspc

rVal2

rVal1

Pure combinational logic

either for memory reference or branch target

either for rf write or St

February 29, 2012 L7-26http://csg.csail.mit.edu/6.s078

ALUBr

Some Useful Functionsfunction Bool memType (IType i) return (i==Ld || i == St);endfunction

function Bool regWriteType (IType i) return (i==Alu || i==Ld || i==Jal || i==Jalr);endfunction

function Bool controlType (IType i) return (i==J || i==Jr || i==Jal || i==Jalr || i==Br);endfunction

February 29, 2012 L7-27http://csg.csail.mit.edu/6.s078

Execute Functionfunction ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc); ExecInst einst = ?;

Data aluVal2 = (dInst.immValid)? dInst.imm : rVal2 let aluRes = alu(rVal1, aluVal2, dInst.aluFunc); let brAddr = brAddrCal(pc, rVal1, dInst.iType, dInst.imm); einst.itype = dInst.iType; einst.addr = (memType(dInst.iType)? aluRes : brAddr; einst.data = dInst.iType==St ? rVal2 : aluRes; einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp); einst.rDst = dInst.rDst; return einst;endfunction

February 29, 2012 L7-28http://csg.csail.mit.edu/6.s078

ALUfunction Data alu(Data a, Data b, Func func); Data res = case(func) Add : add(a, b); Sub : subtract(a, b); And : (a & b); Or : (a | b); Xor : (a ^ b); Nor : ~(a | b); Slt : setLessThan(a, b); Sltu : setLessThanUnsigned(a, b); LShift: logicalShiftLeft(a, b[4:0]); RShift: logicalShiftRight(a, b[4:0]); Sra : signedShiftRight(a, b[4:0]); endcase; return res;endfunction

February 29, 2012 L7-29http://csg.csail.mit.edu/6.s078

Branch Resolutionfunction Bool aluBr(Data a, Data b, BrType brComp);

Bool brTaken = case(brComp) Eq : (a == b); Neq : (a != b); Le : signedLE(a, 0); Lt : signedLT(a, 0); Ge : signedGE(a, 0); Gt : signedGT(a, 0);endcase;

return brTaken;endfunction

February 29, 2012 L7-30http://csg.csail.mit.edu/6.s078

Branch Address Calculationfunction Addr brAddrCalc(Address pc, Data val, IType iType, Data imm); let targetAddr = case (iType) J, Jal : {pc[31:28], imm[27:0]}; Jr, Jalr : val; default : pc + imm; endcase; return targetAddr;endfunction

February 29, 2012 L7-31http://csg.csail.mit.edu/6.s078

Single-Cycle SMIPSmodule mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkTwoPortedMemory; let iMem = mem.iport ; let dMem = mem.dport;

rule doProc; let inst = iMem(MemReq{op: Ld, addr: pc, data: ?});

let dInst = decode(inst);

Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2);

February 29, 2012 L7-32http://csg.csail.mit.edu/6.s078

Single-Cycle SMIPS cont

let eInst = exec(dInst, rVal1, rVal2, pc);

if(memType(eInst.iType)) eInst.data <- dMem(MemReq{ op: eInst.iType==Ld ? Ld : St, addr: eInst.addr, data: eInst.data});

if(regWriteType(eInst.iType)) rf.wr(eInst.rDst, eInst.data);

pc <= eInst.brTaken ? eInst.addr : pc + 4;

endrule endmodule

February 29, 2012 L7-33http://csg.csail.mit.edu/6.s078

state updates

Single-Cycle SMIPS

PC

InstMemory

Decode

Register File

Execute

DataMemory

+4

The whole system was described using one rule; lots of big combinational functions

performance?

February 29, 2012 L5-34http://csg.csail.mit.edu/6.s078

Harvard-Style Datapath for MIPS

February 29, 2012 http://csg.csail.mit.edu/6.s078 L03-35

0x4

RegWrite

Add

Add

clk

WBSrcMemWrite

addr

wdata

rdataData Memory

we

RegDst BSrcExtSelOpCode

z

OpSel

clk

zero?

clk

addrinst

Inst.Memory

PC rd1

GPRs

rs1rs2

wswd rd2

we

ImmExt

ALU

ALUControl

31

PCSrcbrrindjabspc+4

old way

Hardwired Control Table

http://csg.csail.mit.edu/6.s078 L03-36

Opcode ExtSel BSrc OpSel MemW RegW WBSrc RegDst PCSrc

ALU

ALUi

ALUiu

LW

SW

BEQZz=0

BEQZz=1

J

JAL

JR

JALR

BSrc = Reg / Imm WBSrc = ALU / Mem / PC RegDst = rt / rd / R31 PCSrc = pc+4 / br / rind / jabs

* * * no yes rindPC R31

rind* * * no no * *jabs* * * no yes PC R31

jabs* * * no no * *pc+4sExt16

* 0? no no * *

brsExt16* 0? no no * *

pc+4sExt16Imm + yes no * *

pc+4Imm Op no yes ALU rt

pc+4* Reg Func no yes ALU rd

sExt16Imm Op pc+4no yes ALU rt

pc+4sExt16Imm + no yes Mem rt

uExt16

February 29, 2012

old way