Computer Architecture: A Constructive Approach
One-Cycle Implementation
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
February 29, 2012
http://csg.csail.mit.edu/6.s078
The MIPS ISA
Processor State
  32 32-bit GPRs, R0 always contains a 0
  PC, the program counter
  some other special registers
Data types
  8-bit byte, 16-bit half word
  32-bit word for integers
Load/Store style instruction set
  data addressing modes - immediate & indexed
  branch addressing modes - PC relative & register indirect
  All instructions are 32 bits
  Byte addressable memory - big endian mode
[Figure: registers r0 - r31 and PC]
Floating point, memory management, and other system instructions are not included in the SMIPS subset
Instruction formats
Only three formats, but the fields are used differently by different types of instructions

  I-type:  opcode (6) | rs (5) | rt (5) | immediate (16)
  R-type:  opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
  J-type:  opcode (6) | target (26)
Instruction formats cont
Computational Instructions
  opcode (6) | rs (5) | rt (5) | immediate (16)            rt ← (rs) op immediate
  0 (6) | rs (5) | rt (5) | rd (5) | 0 (5) | func (6)      rd ← (rs) func (rt)
Load/Store Instructions
  opcode (6) | rs (5) | rt (5) | displacement (16)         addressing mode: (rs) + displacement
  bits:  31:26 | 25:21 | 20:16 | 15:0
  rs is the base register
  rt is the destination of a Load or the source for a Store
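The field layouts above can be checked with a few shifts and masks. The following is a minimal Python sketch (not Bluespec); the function name `fields` and the example instruction are illustrative assumptions, while the bit positions and the LW opcode come from the slides.

```python
def fields(inst):
    """Split a 32-bit instruction word into the fields used by the three formats."""
    return {
        "opcode": (inst >> 26) & 0x3F,   # bits 31:26
        "rs":     (inst >> 21) & 0x1F,   # bits 25:21
        "rt":     (inst >> 16) & 0x1F,   # bits 20:16
        "rd":     (inst >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (inst >> 6) & 0x1F,    # bits 10:6  (R-type)
        "func":   inst & 0x3F,           # bits 5:0   (R-type)
        "imm":    inst & 0xFFFF,         # bits 15:0  (I-type)
        "target": inst & 0x3FFFFFF,      # bits 25:0  (J-type)
    }

# Hypothetical example word: lw r2, 8(r1) with opcode 100011, rs=1, rt=2, displacement=8
lw = (0b100011 << 26) | (1 << 21) | (2 << 16) | 8
lw_fields = fields(lw)
```

Every field is extracted unconditionally; which ones are meaningful depends on the format, which is the same approach the decode function takes later in the lecture.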
Control Instructions
Conditional (on GPR) PC-relative branch
  opcode (6) | rs (5) | (5) | offset (16)        BEQZ, BNEZ
  target address = (offset in words) × 4 + (PC+4)
  range: ±128 KB
Unconditional register-indirect jumps
  opcode (6) | rs (5) | (5) | (16)               JR, JALR
Unconditional absolute jumps
  opcode (6) | target (26)                       J, JAL
  target address = {PC<31:28>, target × 4}
  range: 256 MB
jump-&-link stores PC+4 into the link register (R31)
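The two target-address formulas and their ranges can be worked through numerically. This is a minimal Python sketch (not Bluespec); the helper names are illustrative assumptions, and the formulas are the ones stated on the slide.

```python
def sign_extend16(x):
    """Interpret a 16-bit field as a signed integer."""
    return x - 0x10000 if x & 0x8000 else x

def branch_target(pc, offset16):
    """PC-relative branch: (offset in words) * 4 + (PC + 4), wrapped to 32 bits."""
    return (pc + 4 + 4 * sign_extend16(offset16)) & 0xFFFFFFFF

def jump_target(pc, target26):
    """Absolute jump: top 4 bits of the PC concatenated with target * 4."""
    return (pc & 0xF0000000) | (target26 << 2)

# A signed 16-bit word offset reaches 2**15 words in either direction.
branch_reach_bytes = (2 ** 15) * 4   # 128 KB each way
# A 26-bit word target covers 2**26 words inside the current region.
jump_region_bytes = (2 ** 26) * 4    # 256 MB
```

The branch range is 128 KB in each direction because the 16-bit offset is signed, while the jump target is unsigned and only replaces the low 28 bits of the PC.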
Instruction Execution
Execution of an instruction involves
  1. Instruction fetch
  2. Decode
  3. Register fetch
  4. ALU operation
  5. Memory operation (optional)
  6. Write back
and the computation of the next instruction address
Implementing an ISA
Instruction fetch
  requires an instruction memory and a PC
Decode
  requires understanding the instruction format
Register fetch
  requires interaction with a register file with a specific number of read/write ports
ALU
  must have the ability to carry out the specified ops
Memory operations
  require a data memory
Write-back
  requires interaction with the register file
Updating the PC
  requires arithmetic ops to calculate the next pc and the branch condition
A single-cycle implementation
A single-cycle MIPS implementation requires:
  A register file with 2 read ports and a write port
  An instruction memory, separate from data memory, so that we can fetch an instruction as well as perform a data operation (Load/Store) on the memory in the same cycle
[Figure: CPU block performing fetch & execute, containing pc and rf, connected to iMem and dMem]
Single-Cycle SMIPS
[Figure: single-cycle datapath: PC, Inst Memory, Decode, Register File, Execute, Data Memory, and a +4 adder feeding the PC]
Datapath is shown only for convenience; it will be derived automatically from the high-level textual description
Single-Cycle SMIPS code structure
module mkProc(Proc);
  // State
  Reg#(Addr) pc   <- mkRegU;
  RFile      rf   <- mkRFile;
  Memory     imem <- mkMemory;
  Memory     dmem <- mkMemory;

  rule fetchExecute;
    // fetch
    let instResp = imem(MemReq{…});
    // decode
    let decInst = decode(instResp);
    // register read
    let rVal1 = rf[…];
    let rVal2 = …
    // execute
    let execInst = exec(decInst, rVal1, rVal2, pc);
    // memory (optional)
    if (instType Ld || St) data = dmem(…)
    // writeback
    if (instType Alu || Ld) rf.wr(rDst, data);
    pc <= execInst.brTaken ? execInst.addr : pc + 4;
  endrule
endmodule
Decoding
dInst is all the information we need to do the later phases of execution on the instruction
Not all fields of dInst are defined in each case
We may decide to pass more information from decode to execute for efficiency; in that case the definition of the type of the decoded instruction has to be adjusted accordingly
Instruction grouping
Many instructions have the same implementation except for the ALU function they invoke
We can group such instructions and reduce the amount of code we have to write
Example: R-Type ALU, I-Type ALU, Br Type, Memory Type, …
Decoding Instructions: extract fields needed for execution from each instruction
[Figure: decode block. Input: instruction. Outputs and the instruction bits that feed them: instType (31:26), aluFunc (31:26, 5:0), branchComp (31:26), rDst (20:16 or 15:11), rSrc1 (25:21), rSrc2 (20:16), imm (15:0 or 25:0, through ext), immValid]
Lots of pure combinational logic: will be derived automatically from the high-level description
Decoding Instructions: input-output types
[Figure: decode block with types. Input: instruction, Bit#(32). Output: DecodedInst with fields iType: IType (31:26, 5:0), aluFunc: AluFunc (31:26, 5:0), brComp: BrType (31:26), rDst: Rindex (20:16 or 15:11), rSrc1: Rindex (25:21), rSrc2: Rindex (20:16), imm: Bit#(32) (15:0 or 25:0, through ext), immValid: Bool]
Mux control logic not shown
Typedefs
typedef enum {Alu, Ld, St, J, Jr, Jal, Jalr, Br} IType deriving(Bits, Eq);
typedef enum {Eq, Neq, Le, Lt, Ge, Gt} BrType deriving(Bits, Eq);
typedef enum {Add, Sub, And, Or, Xor, Nor, Slt, Sltu, LShift, RShift, Sra} AluFunc deriving(Bits, Eq);
typedef enum {RAlu, IAlu, LdSt, J, Jr, Br} InstDType deriving(Bits, Eq);
Decode Function
function DecodedInst decode(Bit#(32) inst);
  DecodedInst dInst = ?;   // initially undefined
  let opcode = inst[ 31 : 26 ];
  let rs     = inst[ 25 : 21 ];
  let rt     = inst[ 20 : 16 ];
  let rd     = inst[ 15 : 11 ];
  let funct  = inst[  5 :  0 ];
  let imm    = inst[ 15 :  0 ];
  let target = inst[ 25 :  0 ];
  case (getInstDType(inst)) ... endcase
  return dInst;
endfunction
Instruction Encoding
Bit#(6) opADDIU = 6'b001001;
Bit#(6) opSLTI  = 6'b001010;
Bit#(6) opLW    = 6'b100011;
Bit#(6) opSW    = 6'b101011;
Bit#(6) opJ     = 6'b000010;
Bit#(6) opBEQ   = 6'b000100;
…
Bit#(6) opFUNC  = 6'b000000;
Bit#(6) fcADDU  = 6'b100001;
Bit#(6) fcAND   = 6'b100100;
Bit#(6) fcJR    = 6'b001000;
…
Bit#(6) opRT    = 6'b000001;
Bit#(5) rtBLTZ  = 5'b00000;
Bit#(5) rtBGEZ  = 5'b00001;
Instruction Grouping
function InstDType getInstDType(Bit#(32) inst);
  return case (inst[31:26])
    opADDIU, opSLTI, …: IAlu;
    opLW, opSW: LdSt;
    opJ, opJAL: J;
    opBEQ, opRT, …: Br;
    opFUNC: case (inst[5:0])
      fcADDU, fcAND, …: RAlu;
      fcJR, fcJALR: Jr;
    endcase;
  endcase;
endfunction
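The grouping idea can be modeled outside Bluespec to see how opcode and funct together select a group. This is a minimal Python sketch; it covers only the encodings listed on the Instruction Encoding slide (the elided ones are deliberately left out), and returning `None` for anything else is an assumption of this sketch, not behavior defined by the lecture.

```python
OP_ADDIU, OP_SLTI, OP_LW, OP_SW = 0b001001, 0b001010, 0b100011, 0b101011
OP_J, OP_BEQ, OP_FUNC, OP_RT = 0b000010, 0b000100, 0b000000, 0b000001
FC_ADDU, FC_AND, FC_JR = 0b100001, 0b100100, 0b001000

def get_inst_dtype(inst):
    """Return the decode group for a 32-bit instruction, or None if not modeled here."""
    opcode = (inst >> 26) & 0x3F
    funct = inst & 0x3F
    if opcode in (OP_ADDIU, OP_SLTI):
        return "IAlu"
    if opcode in (OP_LW, OP_SW):
        return "LdSt"
    if opcode == OP_J:
        return "J"
    if opcode in (OP_BEQ, OP_RT):
        return "Br"
    if opcode == OP_FUNC:
        # R-format: the funct field picks the group
        if funct in (FC_ADDU, FC_AND):
            return "RAlu"
        if funct == FC_JR:
            return "Jr"
    return None

# Hypothetical example words
addu_r3_r1_r2 = (1 << 21) | (2 << 16) | (3 << 11) | FC_ADDU
jr_r31 = (31 << 21) | FC_JR
```

Each group then needs only one decode arm, with the ALU function chosen inside the arm.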
Decoding Instructions: R-Type ALU
case (getInstDType(inst)) …
  RAlu: begin
    dInst.iType = Alu;
    dInst.aluFunc = case (funct)
      fcADDU: Add;
      fcSUBU: Sub;
      ...
    endcase;
    dInst.rDst = rd;
    dInst.rSrc1 = rs;
    dInst.rSrc2 = rt;
    dInst.immValid = False;
  end
Decoding Instructions: I-Type ALU
case (getInstDType(inst)) …
  IAlu: begin
    dInst.iType = Alu;
    dInst.aluFunc = case (opcode)
      opADDIU: Add;
      ...
    endcase;
    dInst.rDst = rt;
    dInst.rSrc1 = rs;
    dInst.imm = signedIAlu(opcode) ? signExtend(imm) : zeroExtend(imm);
    dInst.immValid = True;
  end
Decoding Instructions: Load & Store
case (getInstDType(inst)) …
  LdSt: begin
    dInst.iType = opcode==opLW ? Ld : St;
    dInst.aluFunc = Add;
    dInst.rDst = rt;
    dInst.rSrc1 = rs;
    dInst.rSrc2 = rt;
    dInst.imm = signExtend(imm);
    dInst.immValid = True;
  end
Decoding Instructions: Jump
case (getInstDType(inst)) …
  J: begin
    dInst.iType = opcode==opJ ? J : Jal;
    dInst.rDst = 31;
    dInst.imm = zeroExtend({target, 2'b00});
    dInst.immValid = False;
  end
  Jr: begin
    dInst.iType = funct==fcJR ? Jr : Jalr;
    dInst.rDst = rd;
    dInst.rSrc1 = rs;
    dInst.immValid = False;
  end
Decoding Instructions: Branch
case (getInstDType(inst)) …
  Br: begin
    dInst.iType = Br;
    dInst.brComp = case (opcode)
      opBEQ : Eq;
      opBLEZ: Le;
      ...
    endcase;
    dInst.rSrc1 = rs;
    dInst.rSrc2 = rt;
    dInst.imm = signExtend({imm, 2'b00});
    dInst.immValid = False;
  end
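To see what the decode arms produce for concrete instructions, here is a minimal Python sketch (not Bluespec) covering ADDIU, LW/SW, and BEQ only. Two assumptions are made: a dictionary stands in for DecodedInst, with missing keys playing the role of undefined fields, and ADDIU is treated as a sign-extending I-type ALU instruction, as in standard MIPS.

```python
OP_ADDIU, OP_LW, OP_SW, OP_BEQ = 0b001001, 0b100011, 0b101011, 0b000100

def sext16(x):
    """Sign-extend a 16-bit value to 32 bits (kept as an unsigned Python int)."""
    return (x | 0xFFFF0000) if x & 0x8000 else x

def decode(inst):
    """Return the decoded fields for the few instructions modeled in this sketch."""
    opcode = (inst >> 26) & 0x3F
    rs, rt = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    imm = inst & 0xFFFF
    if opcode == OP_ADDIU:
        return dict(iType="Alu", aluFunc="Add", rDst=rt, rSrc1=rs,
                    imm=sext16(imm), immValid=True)
    if opcode in (OP_LW, OP_SW):
        return dict(iType="Ld" if opcode == OP_LW else "St", aluFunc="Add",
                    rDst=rt, rSrc1=rs, rSrc2=rt,
                    imm=sext16(imm), immValid=True)
    if opcode == OP_BEQ:
        # Branch offset is in words: append two zero bits after sign extension
        return dict(iType="Br", brComp="Eq", rSrc1=rs, rSrc2=rt,
                    imm=(sext16(imm) << 2) & 0xFFFFFFFF, immValid=False)
    return {}

# Hypothetical example words
addiu = (OP_ADDIU << 26) | (1 << 21) | (2 << 16) | 0xFFFF   # addiu r2, r1, -1
beq = (OP_BEQ << 26) | (1 << 21) | (2 << 16) | 0xFFFE       # beq r1, r2, -2 words
```

Note that immValid is False for the branch even though imm is set: the immediate feeds the branch address calculation, not the second ALU operand.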
Reading Registers
[Figure: register file (RF) read. Inputs: RSrc1, RSrc2. Outputs: RVal1, RVal2]
Pure combinational logic
Executing Instructions
[Figure: execute block. Inputs: dInst, rVal1, rVal2, pc. Internal units: ALU, ALUBr, Branch Address. Outputs: iType, rDst, data (either for rf write or St), addr (either for memory reference or branch target), brTaken]
Pure combinational logic
Some Useful Functions
function Bool memType (IType i);
  return (i==Ld || i==St);
endfunction

function Bool regWriteType (IType i);
  return (i==Alu || i==Ld || i==Jal || i==Jalr);
endfunction

function Bool controlType (IType i);
  return (i==J || i==Jr || i==Jal || i==Jalr || i==Br);
endfunction
Execute Function
function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc);
  ExecInst einst = ?;
  Data aluVal2 = (dInst.immValid) ? dInst.imm : rVal2;
  let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
  let brAddr = brAddrCalc(pc, rVal1, dInst.iType, dInst.imm);
  einst.iType = dInst.iType;
  einst.addr = memType(dInst.iType) ? aluRes : brAddr;
  einst.data = dInst.iType==St ? rVal2 : aluRes;
  einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
  einst.rDst = dInst.rDst;
  return einst;
endfunction
ALU
function Data alu(Data a, Data b, AluFunc func);
  Data res = case(func)
    Add   : add(a, b);
    Sub   : subtract(a, b);
    And   : (a & b);
    Or    : (a | b);
    Xor   : (a ^ b);
    Nor   : ~(a | b);
    Slt   : setLessThan(a, b);
    Sltu  : setLessThanUnsigned(a, b);
    LShift: logicalShiftLeft(a, b[4:0]);
    RShift: logicalShiftRight(a, b[4:0]);
    Sra   : signedShiftRight(a, b[4:0]);
  endcase;
  return res;
endfunction
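The difference between the signed and unsigned cases is easiest to see with concrete 32-bit values. Below is a minimal Python sketch (not Bluespec) of the same operation set; representing values as unsigned Python integers masked to 32 bits is an assumption of this sketch.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret an unsigned 32-bit value as two's-complement."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(a, b, func):
    """32-bit ALU model; shift amounts use only the low 5 bits of b."""
    sh = b & 0x1F
    ops = {
        "Add":    lambda: (a + b) & MASK,
        "Sub":    lambda: (a - b) & MASK,
        "And":    lambda: a & b,
        "Or":     lambda: a | b,
        "Xor":    lambda: a ^ b,
        "Nor":    lambda: ~(a | b) & MASK,
        "Slt":    lambda: int(to_signed(a) < to_signed(b)),
        "Sltu":   lambda: int(a < b),
        "LShift": lambda: (a << sh) & MASK,
        "RShift": lambda: a >> sh,
        "Sra":    lambda: (to_signed(a) >> sh) & MASK,
    }
    return ops[func]()
```

For example, 0xFFFFFFFF is less than 1 under Slt (it is -1 when signed) but not under Sltu, and Sra replicates the sign bit where RShift shifts in zeros.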
Branch Resolution
function Bool aluBr(Data a, Data b, BrType brComp);
  Bool brTaken = case(brComp)
    Eq  : (a == b);
    Neq : (a != b);
    Le  : signedLE(a, 0);
    Lt  : signedLT(a, 0);
    Ge  : signedGE(a, 0);
    Gt  : signedGT(a, 0);
  endcase;
  return brTaken;
endfunction
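The comparisons against zero are signed, which matters for values with the top bit set. A minimal Python sketch (not Bluespec) of the same decision, assuming operands are unsigned 32-bit Python integers:

```python
def to_signed(x):
    """Reinterpret an unsigned 32-bit value as two's-complement."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu_br(a, b, br_comp):
    """Return True when the branch condition named by br_comp holds."""
    sa = to_signed(a)
    return {
        "Eq":  a == b,
        "Neq": a != b,
        "Le":  sa <= 0,   # b is ignored for the compare-with-zero cases
        "Lt":  sa < 0,
        "Ge":  sa >= 0,
        "Gt":  sa > 0,
    }[br_comp]
```

Only Eq and Neq use the second operand; the other four conditions test the first operand against zero.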
Branch Address Calculation
function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm);
  let targetAddr = case (iType)
    J, Jal   : {pc[31:28], imm[27:0]};
    Jr, Jalr : val;
    default  : pc + imm;
  endcase;
  return targetAddr;
endfunction
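The three cases can be tried on concrete addresses. This minimal Python sketch (not Bluespec) mirrors the function as written on the slide, including the default case pc + imm; the earlier Control Instructions slide states the branch base as PC+4, so treat the base used here as following this slide's code rather than as a definitive statement of the ISA.

```python
MASK = 0xFFFFFFFF

def br_addr_calc(pc, val, i_type, imm):
    """Next-PC candidate for control instructions; all values are unsigned 32-bit ints."""
    if i_type in ("J", "Jal"):
        # Keep the top 4 bits of the PC, take the low 28 bits from the immediate
        return (pc & 0xF0000000) | (imm & 0x0FFFFFFF)
    if i_type in ("Jr", "Jalr"):
        # Register-indirect: the target is the register value itself
        return val
    # Default (branches): PC plus the already sign-extended, word-scaled immediate
    return (pc + imm) & MASK
```

The immediate passed in is assumed to be the value decode produced: zero-extended {target, 2'b00} for jumps and sign-extended {imm, 2'b00} for branches.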
Single-Cycle SMIPS
module mkProc(Proc);
  Reg#(Addr) pc  <- mkRegU;
  RFile      rf  <- mkRFile;
  Memory     mem <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;

  rule doProc;
    let inst = iMem(MemReq{op: Ld, addr: pc, data: ?});
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, pc);
    if(memType(eInst.iType))
      eInst.data <- dMem(MemReq{op: eInst.iType==Ld ? Ld : St, addr: eInst.addr, data: eInst.data});
    // state updates
    if(regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
  endrule
endmodule
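The one-rule structure (read all state, compute combinationally, then update state once) can be imitated in software. The following minimal Python sketch (not Bluespec) executes one instruction per call to `step` for a handful of instructions (ADDU, ADDIU, LW, SW, BEQ). The dictionary-based memories, the helper names, the PC+4 branch base, and the test program are all assumptions of this sketch.

```python
MASK = 0xFFFFFFFF

def sext16(x):
    """Sign-extend a 16-bit value to 32 bits (unsigned representation)."""
    return (x | 0xFFFF0000) if x & 0x8000 else x

def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def step(state):
    """Fetch, decode, execute, access memory, and write back in one step."""
    pc, rf = state["pc"], state["rf"]
    inst = state["imem"][pc]                          # fetch
    opcode = (inst >> 26) & 0x3F                      # decode
    rs, rt, rd = (inst >> 21) & 31, (inst >> 16) & 31, (inst >> 11) & 31
    funct, imm = inst & 0x3F, sext16(inst & 0xFFFF)
    rv1, rv2 = rf[rs], rf[rt]                         # register read
    next_pc, dst, data = (pc + 4) & MASK, None, None
    if opcode == 0b000000 and funct == 0b100001:      # ADDU
        dst, data = rd, (rv1 + rv2) & MASK
    elif opcode == 0b001001:                          # ADDIU
        dst, data = rt, (rv1 + imm) & MASK
    elif opcode == 0b100011:                          # LW
        dst, data = rt, state["dmem"].get((rv1 + imm) & MASK, 0)
    elif opcode == 0b101011:                          # SW
        state["dmem"][(rv1 + imm) & MASK] = rv2
    elif opcode == 0b000100:                          # BEQ
        if rv1 == rv2:
            next_pc = (pc + 4 + (imm << 2)) & MASK
    else:
        raise ValueError("instruction not modeled in this sketch")
    if dst is not None and dst != 0:                  # writeback; r0 stays 0
        rf[dst] = data
    state["pc"] = next_pc                             # PC update

program = [
    i_type(0b001001, 0, 1, 5),      # addiu r1, r0, 5
    i_type(0b001001, 0, 2, 7),      # addiu r2, r0, 7
    r_type(1, 2, 3, 0b100001),      # addu  r3, r1, r2
    i_type(0b101011, 0, 3, 0),      # sw    r3, 0(r0)
    i_type(0b100011, 0, 4, 0),      # lw    r4, 0(r0)
]
state = {"pc": 0, "rf": [0] * 32, "dmem": {},
         "imem": {4 * i: w for i, w in enumerate(program)}}
for _ in program:
    step(state)
```

Because everything happens in one step, there is no pipeline state to manage; the cost in hardware is that the clock period must cover the whole fetch-to-writeback path.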
Single-Cycle SMIPS
[Figure: single-cycle datapath (repeated): PC, Inst Memory, Decode, Register File, Execute, Data Memory, and a +4 adder feeding the PC]
The whole system was described using one rule; lots of big combinational functions
performance?
Harvard-Style Datapath for MIPS (old way)
[Figure: Harvard-style datapath. Components: PC, Inst. Memory (addr, inst), GPRs (rs1, rs2, ws, wd, rd1, rd2, we), Imm Ext, ALU with ALU Control and a zero? output (z), Data Memory (addr, wdata, rdata, we), two adders (one adding 0x4 to the PC), and the constant 31 for the link register. Control signals: OpCode, ExtSel, BSrc, OpSel, RegDst, RegWrite, MemWrite, WBSrc, PCSrc (br / rind / jabs / pc+4)]
Hardwired Control Table (old way)

Opcode      ExtSel   BSrc   OpSel   MemW   RegW   WBSrc   RegDst   PCSrc
ALU         *        Reg    Func    no     yes    ALU     rd       pc+4
ALUi        sExt16   Imm    Op      no     yes    ALU     rt       pc+4
ALUiu       uExt16   Imm    Op      no     yes    ALU     rt       pc+4
LW          sExt16   Imm    +       no     yes    Mem     rt       pc+4
SW          sExt16   Imm    +       yes    no     *       *        pc+4
BEQZ z=0    sExt16   *      0?      no     no     *       *        br
BEQZ z=1    sExt16   *      0?      no     no     *       *        pc+4
J           *        *      *       no     no     *       *        jabs
JAL         *        *      *       no     yes    PC      R31      jabs
JR          *        *      *       no     no     *       *        rind
JALR        *        *      *       no     yes    PC      R31      rind

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rt / rd / R31
PCSrc = pc+4 / br / rind / jabs
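A hardwired control table is essentially a lookup from opcode class (plus the zero flag for BEQZ) to a tuple of control signals. The following minimal Python sketch (not Bluespec) encodes the table as data; the row values are as reconstructed from the slide, and the dictionary layout and function name are assumptions of this sketch.

```python
FIELDS = ("ExtSel", "BSrc", "OpSel", "MemW", "RegW", "WBSrc", "RegDst", "PCSrc")

CONTROL = {
    "ALU":      ("*",      "Reg", "Func", "no",  "yes", "ALU", "rd",  "pc+4"),
    "ALUi":     ("sExt16", "Imm", "Op",   "no",  "yes", "ALU", "rt",  "pc+4"),
    "ALUiu":    ("uExt16", "Imm", "Op",   "no",  "yes", "ALU", "rt",  "pc+4"),
    "LW":       ("sExt16", "Imm", "+",    "no",  "yes", "Mem", "rt",  "pc+4"),
    "SW":       ("sExt16", "Imm", "+",    "yes", "no",  "*",   "*",   "pc+4"),
    "BEQZ z=0": ("sExt16", "*",   "0?",   "no",  "no",  "*",   "*",   "br"),
    "BEQZ z=1": ("sExt16", "*",   "0?",   "no",  "no",  "*",   "*",   "pc+4"),
    "J":        ("*",      "*",   "*",    "no",  "no",  "*",   "*",   "jabs"),
    "JAL":      ("*",      "*",   "*",    "no",  "yes", "PC",  "R31", "jabs"),
    "JR":       ("*",      "*",   "*",    "no",  "no",  "*",   "*",   "rind"),
    "JALR":     ("*",      "*",   "*",    "no",  "yes", "PC",  "R31", "rind"),
}

def control(opcode_class, z=None):
    """Look up the control signals; z is only consulted for BEQZ."""
    key = f"BEQZ z={z}" if opcode_class == "BEQZ" else opcode_class
    return dict(zip(FIELDS, CONTROL[key]))
```

A "*" entry is a don't-care: the signal's value does not affect the result for that row, which gives the logic synthesizer freedom to simplify.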