Computer Architecture: A Constructive Approach

Branch Prediction - 2

Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology

April 11, 2012 L17-1 http://csg.csail.mit.edu/6.S078


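Two-Stage pipeline: A robust two-rule solution

[Figure: two-stage pipeline. Blocks: PC, +4, Inst Memory, Decode, Register File, Execute, Data Memory. The ir Pipeline FIFO connects fetch to execute; the nextPC Bypass FIFO feeds back from execute to fetch. Fetch holds fEpoch; execute holds eEpoch.]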

Either fifo can be a normal (>1 element) fifo



Decoupled Fetch and Execute

[Figure: Fetch sends <instructions, pc, epoch> to Execute through the ir FIFO; Execute sends <updated pc> back to Fetch through the nextPC FIFO.]

- Properly decoupled systems permit greater freedom in the independent refinement of blocks.
- The FIFOs must permit concurrent enq and deq.
- For pipelined behavior, ir must behave as deq < enq.
- For proper scheduling, nextPC must behave as enq < deq (deq < enq would be just wrong).


Three one-element FIFOs

- Ordinary: no concurrent enq/deq
- Pipeline: deq before enq, combinational path
- Bypass: enq before deq, combinational path
- Pipeline and Bypass FIFOs can create combinational cycles in the presence of feedback

[Figure: three one-element FIFO circuits (Ordinary FIFO, Pipeline FIFO, Bypass FIFO), each with enq, deq, notFull, and notEmpty signals; the Pipeline and Bypass variants each include an OR gate.]
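The guard behavior of the three one-element FIFOs can be sketched as a small behavioral model. This is Python rather than Bluespec, and the names `OneElementFifo`, `can_enq`, `can_deq`, and `concurrent_enq_deq` are hypothetical, introduced only for illustration. It models the guards alone, not the stored data or the circuit.

```python
from dataclasses import dataclass


@dataclass
class OneElementFifo:
    """Guard logic of a one-element FIFO; kind is 'ordinary', 'pipeline', or 'bypass'."""
    kind: str
    full: bool = False

    def can_enq(self, deq_same_cycle: bool) -> bool:
        # Pipeline FIFO: deq is ordered before enq, so a full FIFO still
        # accepts an enq when it is being dequeued in the same cycle.
        if self.kind == "pipeline":
            return (not self.full) or deq_same_cycle
        return not self.full

    def can_deq(self, enq_same_cycle: bool) -> bool:
        # Bypass FIFO: enq is ordered before deq, so an empty FIFO can be
        # dequeued when it is being enqueued in the same cycle.
        if self.kind == "bypass":
            return self.full or enq_same_cycle
        return self.full


def concurrent_enq_deq(fifo: OneElementFifo) -> bool:
    """True if enq and deq can both fire in the same cycle in this state."""
    return fifo.can_deq(enq_same_cycle=True) and fifo.can_enq(deq_same_cycle=True)
```

In this model an ordinary FIFO never allows both methods in one cycle, a pipeline FIFO allows both only when full, and a bypass FIFO allows both only when empty. Each extra permission depends on the other method firing, which is where the combinational path comes from.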


Multi-element FIFOs

Normal FIFO
- Permits concurrent enq and deq when notFull and notEmpty
- Unlike a pipeline FIFO, does not permit enq when full, even if there is a concurrent deq
- Unlike a bypass FIFO, does not permit deq when empty, even if there is a concurrent enq

Normal FIFO implementations have at least two elements, but they do not have combinational paths, which makes it easier to reduce critical paths at the expense of area.
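A rough behavioral sketch of a normal two-element FIFO follows, again in Python rather than Bluespec. The class `NormalFifo` and its `cycle` method are hypothetical names. The point it illustrates is that both guards are evaluated on the state at the start of the cycle, so neither method depends on the other firing.

```python
from collections import deque


class NormalFifo:
    """Cycle-level model of a normal FIFO (assumed capacity of two elements)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = deque()

    def not_full(self):
        return len(self.items) < self.capacity

    def not_empty(self):
        return len(self.items) > 0

    def cycle(self, enq_value=None, want_deq=False):
        """Run one clock cycle; returns (dequeued value or None, whether enq fired)."""
        # Guards come from the registered state only: no combinational path
        # from deq to enq or from enq to deq.
        can_enq = self.not_full()
        can_deq = self.not_empty()
        out = None
        did_enq = False
        if want_deq and can_deq:
            out = self.items.popleft()
        if enq_value is not None and can_enq:
            self.items.append(enq_value)
            did_enq = True
        return out, did_enq
```

With one element stored, an enq and a deq both fire in the same cycle. When the FIFO is full, the enq is refused even though a deq fires, and when it is empty, the deq returns nothing even though an enq fires.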


A decoupled solution using epoch

- Add fEpoch and eEpoch registers to the processor state; initialize them to the same value.
- The epoch changes whenever Execute determines that the pc prediction is wrong. This change is reflected immediately in eEpoch and eventually in fEpoch via the nextPC FIFO.
- Associate the fEpoch with every instruction when it is fetched.
- In the execute stage, reject (i.e., kill) the instruction if its epoch does not match eEpoch.
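The epoch scheme can be sketched as a small cycle-level simulation. This is a Python approximation, not the Bluespec design. The function `run` and the `program` dictionary (mapping a pc to a branch target, or to nothing for a non-branch) are hypothetical, and instructions are reduced to "taken branch" or "anything else". Execute is modeled as running before Fetch within a cycle, and the local variable `redirect` stands in for the nextPC bypass FIFO.

```python
def run(program, start_pc, cycles):
    """Simulate fetch/execute with epochs; returns (executed pcs, killed pcs)."""
    pc = start_pc
    f_epoch = e_epoch = False      # both epochs start at the same value
    ir = None                      # one-element pipeline register: (pc, epoch)
    executed, killed = [], []
    for _ in range(cycles):
        redirect = None            # models the nextPC bypass FIFO
        # Execute stage: runs first in the cycle.
        if ir is not None:
            ipc, epoch = ir
            if epoch == e_epoch:
                executed.append(ipc)
                target = program.get(ipc)
                if target is not None:        # taken branch: pc+4 guess was wrong
                    e_epoch = not e_epoch     # visible immediately in eEpoch
                    redirect = (target, e_epoch)
            else:
                killed.append(ipc)            # stale epoch: wrong-path instruction
            ir = None
        # Fetch stage: tag the instruction with fEpoch, then pick the next pc.
        ir = (pc, f_epoch)
        if redirect is not None:
            pc, f_epoch = redirect            # fEpoch catches up via nextPC
        else:
            pc += 4
    return executed, killed
```

For example, with a taken branch at pc 4 that jumps to 16, the instruction at pc 8 is fetched under the old epoch in the same cycle that the branch resolves, and it is killed one cycle later.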


Two-stage pipeline: Decoupled

module mkProc(Proc);
  Reg#(Addr) pc   <- mkRegU;
  RFile      rf   <- mkRFile;
  IMemory    iMem <- mkIMemory;
  DMemory    dMem <- mkDMemory;
  PipeReg#(TypeFetch2Decode) ir <- mkPipeReg;
  Reg#(Bool) fEpoch <- mkReg(False);
  Reg#(Bool) eEpoch <- mkReg(False);
  FIFOF#(Tuple2#(Addr,Bool)) nextPC <- mkBypassFIFOF;

  rule doFetch … endrule

  rule doExecute … endrule

endmodule


Two-stage pipeline: doFetch rule

rule doFetch (ir.notFull);              // explicit guard
  let inst = iMem(pc);
  ir.enq(TypeFetch2Decode{pc:pc, epoch:fEpoch, inst:inst});
  if(nextPC.notEmpty) begin
    match{.ipc,.epoch} = nextPC.first;
    pc <= ipc; fEpoch <= epoch; nextPC.deq;
  end
  else pc <= pc + 4;                    // simple branch prediction
endrule


Two-stage pipeline: doExecute rule

rule doExecute (ir.notEmpty);
  let irpc = ir.first.pc;
  let inst = ir.first.inst;
  if(ir.first.epoch==eEpoch) begin
    let eInst = decodeExecute(irpc, inst, rf);
    let memData <- dMemAction(eInst, dMem);
    regUpdate(eInst, memData, rf);
    if (eInst.brTaken) begin
      let nepoch = next(eEpoch);
      eEpoch <= nepoch;
      nextPC.enq(tuple2(eInst.addr, nepoch));
    end
  end
  ir.deq;
endrule
endmodule


Two-Stage pipeline with a Branch Predictor

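[Figure: the two-stage pipeline with a Branch Predictor added next to the PC. Blocks: PC, Inst Memory, Decode, Register File, Execute, Data Memory. The ir FIFO now also carries the predicted pc (ppc); nextPC, fEpoch, and eEpoch are as before.]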


Branch Predictor Interface

interface NextAddressPredictor;
  method Addr prediction(Addr pc);
  method Action update(Addr pc, Addr target);
endinterface


Example: Null Branch Prediction

module mkNeverTaken(NextAddressPredictor);
  method Addr prediction(Addr pc);
    return pc+4;
  endmethod

  method Action update(Addr pc, Addr target);
    noAction;
  endmethod
endmodule

- Replaces PC+4 with …
- Already implemented in the pipeline
- Right most of the time. Why?


Example: Branch Target Prediction (BTB)

module mkBTB(NextAddressPredictor);
  RegFile#(LineIdx, Addr) tagArr    <- mkRegFileFull;
  RegFile#(LineIdx, Addr) targetArr <- mkRegFileFull;

  method Addr prediction(Addr pc);
    LineIdx index = truncate(pc >> 2);
    let tag    = tagArr.sub(index);
    let target = targetArr.sub(index);
    if (tag==pc) return target;
    else return (pc+4);
  endmethod

  method Action update(Addr pc, Addr target);
    LineIdx index = truncate(pc >> 2);
    tagArr.upd(index, pc);
    targetArr.upd(index, target);
  endmethod
endmodule
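The BTB lookup and update can be mirrored in a short behavioral sketch. This is Python, not the Bluespec module. The class name `BTB`, the `index_bits` parameter, and the use of `None` for never-written tags are assumptions made for illustration; the slide's register files have no such initial state.

```python
class BTB:
    """Direct-mapped branch target buffer: full-pc tag plus target per line."""

    def __init__(self, index_bits=4):
        lines = 1 << index_bits
        self.mask = lines - 1
        self.tags = [None] * lines     # assumption: None marks an unused line
        self.targets = [0] * lines

    def _index(self, pc):
        # Mirrors truncate(pc >> 2): drop the byte offset, keep the low bits.
        return (pc >> 2) & self.mask

    def prediction(self, pc):
        i = self._index(pc)
        # Hit only when the stored tag equals the full pc; otherwise fall
        # back to the never-taken guess pc + 4.
        return self.targets[i] if self.tags[i] == pc else pc + 4

    def update(self, pc, target):
        i = self._index(pc)
        self.tags[i] = pc
        self.targets[i] = target
```

Two pcs that share the same index bits evict each other, so after updating with an aliasing pc, the earlier entry falls back to pc + 4 again.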


Two-stage pipeline + BP

module mkProc(Proc);
  Reg#(Addr) pc   <- mkRegU;
  RFile      rf   <- mkRFile;
  IMemory    iMem <- mkIMemory;
  DMemory    dMem <- mkDMemory;
  PipeReg#(TypeFetch2Decode) ir <- mkPipeReg;
  Reg#(Bool) fEpoch <- mkReg(False);
  Reg#(Bool) eEpoch <- mkReg(False);
  FIFOF#(Tuple3#(Addr,Addr,Bool)) nextPC <- mkBypassFIFOF;
  NextAddressPredictor bpred <- mkNeverTaken;   // some target predictor

The definition of TypeFetch2Decode is changed to include the predicted pc:

typedef struct {
  Addr pc; Addr ppc; Bool epoch; Data inst;
} TypeFetch2Decode deriving (Bits, Eq);


Two-stage pipeline + BP: Fetch rule

rule doFetch (ir.notFull);
  let ppc = bpred.prediction(pc);
  let inst = iMem(pc);
  ir.enq(TypeFetch2Decode{pc:pc, ppc:ppc, epoch:fEpoch, inst:inst});
  if(nextPC.notEmpty) begin
    match{.ipc, .ippc, .epoch} = nextPC.first;
    pc <= ippc; fEpoch <= epoch; nextPC.deq;
    bpred.update(ipc, ippc);
  end
  else pc <= ppc;
endrule


Two-stage pipeline + BP: Execute rule

rule doExecute (ir.notEmpty);
  let irpc = ir.first.pc;
  let inst = ir.first.inst;
  let irppc = ir.first.ppc;
  if(ir.first.epoch==eEpoch) begin
    let eInst = decodeExecute(irpc, irppc, inst, rf);
    let memData <- dMemAction(eInst, dMem);
    regUpdate(eInst, memData, rf);
    if (eInst.missPrediction) begin
      let nepoch = next(eEpoch);
      eEpoch <= nepoch;
      nextPC.enq(tuple3(irpc, eInst.brTaken ? eInst.addr : irpc+4, nepoch));
    end
  end
  ir.deq;
endrule
endmodule

Requires changes in decodeExecute to return missPrediction as opposed to brTaken information.


Execute Function

function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc, Addr ppc);
  ExecInst einst = ?;
  let aluVal2 = (dInst.immValid) ? dInst.imm : rVal2;
  let aluRes  = alu(rVal1, aluVal2, dInst.aluFunc);
  let brAddr  = brAddrCal(pc, rVal1, dInst.iType, dInst.imm);
  einst.itype   = dInst.iType;
  einst.addr    = memType(dInst.iType) ? aluRes : brAddr;
  einst.data    = dInst.iType==St ? rVal2 : aluRes;
  einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
  einst.missPrediction = einst.brTaken ? brAddr!=ppc : (pc+4)!=ppc;
  einst.rDst    = dInst.rDst;
  return einst;
endfunction
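The misprediction test in the execute function reduces to comparing the predicted pc with the actual next pc. A minimal Python sketch of that comparison follows. The function name `resolve` and its return shape are hypothetical; the ALU and branch-address computation are assumed to have happened already.

```python
def resolve(pc, ppc, br_taken, br_addr):
    """Return (actual next pc, whether the prediction ppc was wrong)."""
    # A taken branch continues at its target; everything else at pc + 4.
    actual_next = br_addr if br_taken else pc + 4
    return actual_next, actual_next != ppc
```

The first component is also the address that the execute rule sends back on nextPC when the second component is true, so a predictor that guessed the right target for a taken branch causes no redirect at all.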


Multiple predictors

- For multiple predictors to make sense, we first need a pipeline with more than two stages.
- With a slightly different (even a 2-stage) pipeline, we also need to resolve data hazards simultaneously.

Plan
- Present a different two-stage pipeline with data hazards
- Present a three-stage pipeline with
  - One branch predictor
  - Two branch predictors


A different 2-Stage pipeline

[Figure: a different two-stage pipeline. Blocks: PC, Inst Memory, Branch Predictor, Decode, Register File, Execute, Data Memory. The itr FIFO sits between Decode and Execute, a stall signal is added, and nextPC, fEpoch, and eEpoch are as before.]


TypeDecode2Execute

typedef struct {
  Addr pc; Addr ppc; Bool epoch;
  DecodedInst dInst;
  Data rVal1; Data rVal2;     // values instead of register names
} TypeDecode2Execute deriving (Bits, Eq);


The stall function

function Bool stall(Maybe#(Rindx) src1, Maybe#(Rindx) src2,
                    PipeReg#(TypeDecode2Execute) itr);
  let dst = itr.first.dInst.rDst;
  return (itr.notEmpty && isValid(dst) &&
          ((validValue(dst)==validValue(src1) && isValid(src1)) ||
           (validValue(dst)==validValue(src2) && isValid(src2))));
endfunction

src1, src2 and rDst in DecodedInst are changed from Rindx to Maybe#(Rindx) to determine the stall condition.
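The stall condition can be restated as a small Python sketch, with `Optional[int]` standing in for Maybe#(Rindx). The flattened signature is an assumption: instead of passing the itr register, the hypothetical parameters `pending_dst` and `itr_not_empty` carry the destination of the instruction waiting in itr and whether itr holds anything.

```python
from typing import Optional


def stall(src1: Optional[int], src2: Optional[int],
          pending_dst: Optional[int], itr_not_empty: bool) -> bool:
    """True if the instruction being decoded reads a register that the
    instruction still waiting in itr is going to write."""
    if not itr_not_empty or pending_dst is None:
        return False               # nothing in flight, or it writes no register
    # An absent source (None) never equals a valid destination index,
    # which plays the role of the isValid checks.
    return pending_dst == src1 or pending_dst == src2
```

Using an optional value rather than a reserved register index keeps instructions that have no source or destination from causing false stalls.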


A different 2-Stage pipeline

module mkProc(Proc);
  Reg#(Addr) pc   <- mkRegU;
  RFile      rf   <- mkConfigRFile;
  IMemory    iMem <- mkIMemory;
  DMemory    dMem <- mkDMemory;

  PipeReg#(TypeDecode2Execute) itr <- mkConfigPipeReg;

  Reg#(Bool) fEpoch <- mkReg(False);
  Reg#(Bool) eEpoch <- mkReg(False);

  FIFOF#(Tuple3#(Addr,Addr,Bool)) nextPC <- mkBypassFIFOF;

  NextAddressPredictor bpred <- mkNeverTaken;


A different 2-Stage pipeline: doFetch rule

rule doFetch (itr.notFull);
  let inst = iMem(pc);
  let dInst = decode(inst);
  if(!stall(dInst.src1, dInst.src2, itr)) begin
    let ppc = bpred.prediction(pc);
    let rVal1 = rf.rd1(validValue(dInst.src1));
    let rVal2 = rf.rd2(validValue(dInst.src2));
    itr.enq(TypeDecode2Execute{pc:pc, ppc:ppc, epoch:fEpoch,
                               dInst:dInst, rVal1:rVal1, rVal2:rVal2});
    if(nextPC.notEmpty) begin
      match{.ipc, .ippc, .epoch} = nextPC.first;
      pc <= ippc; fEpoch <= epoch; nextPC.deq;
      bpred.update(ipc, ippc);
    end
    else pc <= ppc;
  end
endrule


A different 2-Stage pipeline: doExecute rule

rule doExecute (itr.notEmpty);
  let itrpc = itr.first.pc;
  let dInst = itr.first.dInst;
  let itrppc = itr.first.ppc;
  let rVal1 = itr.first.rVal1;
  let rVal2 = itr.first.rVal2;
  if(itr.first.epoch==eEpoch) begin
    // exec needs the predicted pc to compute missPrediction
    let eInst = exec(dInst, rVal1, rVal2, itrpc, itrppc);
    let memData <- dMemAction(eInst, dMem);
    regUpdate(eInst, memData, rf);
    if(eInst.missPrediction) begin
      let nepoch = next(eEpoch);
      eEpoch <= nepoch;
      nextPC.enq(tuple3(itrpc, eInst.brTaken ? eInst.addr : itrpc+4, nepoch));
    end
  end
  itr.deq;
endrule
endmodule


Concurrency analysis

- nextPC bypass FIFO functionality: enq < deq
  - Hence doExecute happens before doFetch every cycle
- itr pipeline FIFO functionality: deq < enq
  - Hence doExecute happens before doFetch every cycle
- itr pipeline FIFO functionality: first < deq
  - Hence doFetch happens before doExecute every cycle to determine the stall condition
  - Use config pipeline FIFO to remove scheduling constraint
- mkRFile functionality: {rd1, rd2} < wr
  - Hence doFetch happens before doExecute every cycle
  - Use mkConfigRFile to remove scheduling constraint


3-Stage pipeline – 1 predictor

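[Figure: three-stage pipeline with one branch predictor. Blocks: PC, Inst Memory, Branch Predictor, Decode, Register File, Execute, Data Memory. The ir FIFO connects Fetch to Decode and the itr FIFO connects Decode to Execute. Two nextPC FIFOs carry redirects backward, from Execute to Decode and from Decode to Fetch. Epoch registers: fEpoch, dEpoch, eEpoch. A stall signal is generated at Decode.]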


3-Stage pipeline – 1 predictor

module mkProc(Proc);
  Reg#(Addr) pc   <- mkRegU;
  RFile      rf   <- mkConfigRFile;
  IMemory    iMem <- mkIMemory;
  DMemory    dMem <- mkDMemory;
  PipeReg#(TypeFetch2Decode)   ir  <- mkPipeReg;
  PipeReg#(TypeDecode2Execute) itr <- mkConfigPipeReg;
  Reg#(Bool) fEpoch <- mkReg(False);
  Reg#(Bool) dEpoch <- mkReg(False);
  Reg#(Bool) eEpoch <- mkReg(False);
  FIFOF#(Tuple2#(Addr,Addr)) nextPCE2D <- mkBypassFIFOF;
  FIFOF#(Tuple2#(Addr,Addr)) nextPCD2F <- mkBypassFIFOF;
  NextAddressPredictor bpred <- mkNeverTaken;


3-Stage pipeline – 1 predictor: doFetch rule

rule doFetch (ir.notFull);
  let inst = iMem(pc);
  let ppc = bpred.prediction(pc);
  ir.enq(TypeFetch2Decode{pc:pc, ppc:ppc, epoch:fEpoch, inst:inst});
  if(nextPCD2F.notEmpty) begin
    match{.ipc, .ippc} = nextPCD2F.first;
    pc <= ippc; fEpoch <= !fEpoch; nextPCD2F.deq;
    bpred.update(ipc, ippc);
  end
  else pc <= ppc;
endrule


3-Stage pipeline – 1 predictor

rule doDecode (itr.notFull && ir.notEmpty);
  let irpc = ir.first.pc;
  let irppc = ir.first.ppc;
  let inst = ir.first.inst;
  if(nextPCE2D.notEmpty) begin
    dEpoch <= !dEpoch;
    nextPCD2F.enq(nextPCE2D.first);
    nextPCE2D.deq;
    ir.deq;
  end
  else if(ir.first.epoch==dEpoch) begin
    let dInst = decode(inst);
    if(!stall(dInst.src1, dInst.src2, itr)) begin
      let rVal1 = rf.rd1(validValue(dInst.src1));
      let rVal2 = rf.rd2(validValue(dInst.src2));
      itr.enq(TypeDecode2Execute{pc:irpc, ppc:irppc, epoch:dEpoch,
                                 dInst:dInst, rVal1:rVal1, rVal2:rVal2});
      ir.deq;
    end
  end
  else ir.deq;
endrule


3-Stage pipeline – 1 predictor: doExecute rule

rule doExecute (itr.notEmpty);
  let itrpc = itr.first.pc;
  let dInst = itr.first.dInst;
  let itrppc = itr.first.ppc;
  let rVal1 = itr.first.rVal1;
  let rVal2 = itr.first.rVal2;
  if(itr.first.epoch==eEpoch) begin
    // exec needs the predicted pc to compute missPrediction
    let eInst = exec(dInst, rVal1, rVal2, itrpc, itrppc);
    let memData <- dMemAction(eInst, dMem);
    regUpdate(eInst, memData, rf);
    if(eInst.missPrediction) begin
      nextPCE2D.enq(tuple2(itrpc, eInst.brTaken ? eInst.addr : itrpc+4));
      eEpoch <= !eEpoch;
    end
  end
  itr.deq;
endrule
endmodule
