
March 9, 2011 http://csg.csail.mit.edu/6.375 L11-1

Implementing for Correct Concurrency

Nirav Dave
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology


Dealing with Conflicts

When do conflicts arise?

How do we analyze them?

How do we fix them?

How do we make sure we’re okay?


SFIFO

interface SFIFO#(type t, type tr, type v);
    method Action enq(t x);        // enqueue an item
    method Action deq();           // remove oldest entry
    method t first();              // inspect oldest item
    method Action clear();         // make FIFO empty
    method Maybe#(v) find(tr r);   // search FIFO
endinterface

n = # of bits needed to represent the values of type “t”
m = # of bits needed to represent the values of type “tr”
v = # of bits needed to represent the values of type “v”

[Figure: SFIFO module ports. enq: n-bit data, enab, rdy (not full). deq: enab, rdy (not empty). first: n-bit data, rdy (not empty). clear: enab. find: m-bit input, Bool and v-bit outputs.]


Processor Example

[Figure: CPU with pipeline stages fetch, decode, execute, memory, write-back and state elements pc, rf, iMem, dMem]

5-stage processor, with 1-element FIFOs in between stages

Let’s add bypassing


Decode Rule

rule decode (!newStallFunc(instr, d2eQ, e2mQ, m2wQ));
    let fetInst = f2dQ.first();
    f2dQ.deq();
    match {.ra, .rb} = getRARB(fetInst);

    // Bypassing: search through each place in the design that may hold
    // a newer value. fromMaybe takes the default value first.
    let va0 = rf[ra];
    let va1 = fromMaybe(va0, m2wQ.find(ra));
    let va2 = fromMaybe(va1, e2mQ.find(ra));

    let vb0 = rf[rb];
    let vb1 = fromMaybe(vb0, m2wQ.find(rb));
    let vb2 = fromMaybe(vb1, e2mQ.find(rb));

    let newInst = case (fetInst) match
                      Add: return (DAdd .va2 .vb2);
                      …
                  endcase;
    d2eQ.enq(newInst);
endrule

When do we want it to execute?

Decode is correct any time it is allowed to execute
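The bypass chain in the decode rule can be mimicked in a few lines of Python. This is only an illustrative model, not Bluespec: the names `find` and `read_operand` are made up here, and each one-element queue is assumed to hold at most one pending (destination register, value) pair.

```python
from typing import Dict, Optional, Tuple

# Assumption: a queue is either empty (None) or holds one (dest_reg, value).
Entry = Optional[Tuple[int, int]]

def find(q: Entry, reg: int) -> Optional[int]:
    """Model of SFIFO.find: the queued value if the entry writes `reg`."""
    if q is not None and q[0] == reg:
        return q[1]
    return None

def read_operand(rf: Dict[int, int], m2wQ: Entry, e2mQ: Entry, reg: int) -> int:
    """Start from the register file, then let m2wQ and finally e2mQ override."""
    v0 = rf[reg]
    hit_m2w = find(m2wQ, reg)
    v1 = v0 if hit_m2w is None else hit_m2w
    hit_e2m = find(e2mQ, reg)          # younger instruction, checked last
    v2 = v1 if hit_e2m is None else hit_e2m
    return v2

rf = {1: 10, 2: 20}
print(read_operand(rf, m2wQ=(1, 11), e2mQ=(1, 12), reg=1))
print(read_operand(rf, m2wQ=(1, 11), e2mQ=None, reg=1))
print(read_operand(rf, m2wQ=None, e2mQ=None, reg=2))
```

The entry in e2mQ belongs to a younger instruction than the one in m2wQ, which is why it is applied last and wins when both match.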


Some insight into concurrent rule firing

There are more intermediate states in the rule semantics (a state after each rule step)
In the HW, states change only at clock edges

[Figure: rule steps Ri, Rj, Rk in the rule semantics versus HW clocks, where Rj and Rk share a clock]



Parallel execution reorders reads and writes

In the rule semantics, each rule sees (reads) the effects (writes) of previous rules
In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks

[Figure: reads and writes per rule step in the rule semantics versus reads and writes per HW clock]



Correctness

Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution
Consequence: the HW can never reach a state unexpected in the rule semantics

[Figure: rule steps Ri, Rj, Rk in the rule semantics versus HW clocks, where Rj and Rk share a clock]
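The correctness condition can be tried out on toy rules. The following Python sketch is a model under stated assumptions, not the Bluespec scheduler: rules are functions from a state dictionary to the writes they make, no two rules write the same variable, and the helper names (`run_parallel`, `run_sequential`, `equivalent_order`) are invented for this illustration. It checks a single starting state, so it can refute equivalence but not prove it in general.

```python
from itertools import permutations

def run_parallel(rules, state):
    """HW view: every rule reads the state at the clock edge; writes are merged."""
    new_state = dict(state)
    for rule in rules:
        new_state.update(rule(state))
    return new_state

def run_sequential(rules, state):
    """Rule semantics: each rule sees the writes of the rules before it."""
    cur = dict(state)
    for rule in rules:
        cur.update(rule(cur))
    return cur

def equivalent_order(rules, state):
    """Return a sequential order that matches the parallel result, or None."""
    target = run_parallel(rules, state)
    for order in permutations(range(len(rules))):
        if run_sequential([rules[i] for i in order], state) == target:
            return order
    return None

ri = lambda s: {"x": s["x"] + 1}   # x <= x + 1
rj = lambda s: {"y": s["x"]}       # y <= x
swap_a = lambda s: {"x": s["y"]}   # x <= y
swap_b = lambda s: {"y": s["x"]}   # y <= x

s0 = {"x": 1, "y": 5}
print(equivalent_order([ri, rj], s0))          # parallel firing looks like rj then ri
print(equivalent_order([swap_a, swap_b], s0))  # no sequential order gives a swap
```

In the first pair, firing both rules in one clock is indistinguishable from running rj before ri, so they may fire together. The swap pair has no matching order, so the two rules conflict.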



Upshot

Given the concurrency of methods/rules in a system we can determine viable schedules
Some variation due to applicability

BUT we know what schedule we want (mostly)
We should be able to back-propagate results to submodules


Determining Concurrency Properties


Processor: Concurrencies

In-order: F < D < E < M < W
Pipelined: W < M < E < D < F

[Figure: CPU with pipeline stages fetch, decode, execute, memory, write-back and state elements pc, rf, iMem, dMem]



Concurrency requirements for Full Pipelining – Reg File

In-Order RF: (D calls sub) < (W calls upd)

Pipelined RF: (W calls upd) < (D calls sub)

[Figure: CPU with pipeline stages fetch, decode, execute, memory, write-back and state elements pc, rf, iMem, dMem]


Concurrency requirements for Full Pipelining – FIFOs

In-Order FIFOs:
1. m2wQ, e2mQ: find < enq < first < deq
2. d2eQ: find < enq < first < deq, clear

Pipeline FIFOs:
3. m2wQ, e2mQ: first < deq < enq < find
4. d2eQ: first < deq < find < enq

[Figure: CPU with pipeline stages fetch, decode, execute, memory, write-back and state elements pc, rf, iMem, dMem]


Constructing Appropriately concurrent submodules


From Analysis to Design

We need to create modules which behave as needed

Construct modules using “unsafe” primitives to have “safe” behaviors

Three major concepts:
1. Use primitives which remove “false” concurrency orderings (e.g. ConfigRegs vs. Regs)
2. Add RWires for forwarding values intra-cycle
3. Reason carefully to assure that execution appears “atomic”


ConfigReg and RWire

mkReg requires that read < write
mkConfigReg is a Reg without this restriction
Allows us to read stale values (dangerous)

RWire is a “wire”
wset :: a -> Action     writes
wget :: Maybe#(a)       returns the written value if a write happened
wset happens before wget each cycle
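A rough Python model may help fix the three primitives in mind. It is a sketch under assumptions, not the Bluespec library: the classes only borrow the names Reg, ConfigReg and RWire, `tick()` is an invented stand-in for the clock edge, and the scheduling restriction of mkReg is modelled as a run-time error.

```python
class Reg:
    """Reads return the value from the last clock edge; writes land at tick()."""
    def __init__(self, init):
        self.value = init
        self.pending = None
        self.written = False

    def read(self):
        # Models the read < write restriction: no read after a write in a cycle.
        if self.written:
            raise RuntimeError("schedule violates read < write")
        return self.value

    def write(self, v):
        self.pending = v
        self.written = True

    def tick(self):
        if self.written:
            self.value = self.pending
        self.pending, self.written = None, False


class ConfigReg(Reg):
    """Same storage, but a read after a write is allowed and returns the stale value."""
    def read(self):
        return self.value


class RWire:
    """Carries a value within one cycle only; wget is None when nothing was set."""
    def __init__(self):
        self.val = None

    def wset(self, v):
        self.val = ("Valid", v)

    def wget(self):
        return self.val

    def tick(self):
        self.val = None


c = ConfigReg(0)
c.write(1)
print(c.read())   # stale value from before the write
c.tick()
print(c.read())   # the write is visible after the clock edge

w = RWire()
w.wset(7)
print(w.wget())   # the value set earlier in the same cycle
w.tick()
print(w.wget())   # nothing survives the clock edge
```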


Let’s implement some modules


Processor Redux


[Figure: CPU with pipeline stages fetch, decode, execute, memory, write-back and state elements pc, rf, iMem, dMem]



Concurrency: RegFile

The standard library RegFile is implemented with concurrency (sub < upd)
This handles the in-order case

We need to build a RegFile for the pipelined case


BypassRegFile

import RegFile::*;

module mkBypassRegFile#(a l, a h)(RegFile#(a,d))
    provisos (Bits#(a,asz), Bits#(d,dsz), Eq#(a));
    RegFile#(a,d) rfInt <- mkRegFileWCF(l, h);
    RWire#(Tuple2#(a,d)) curWrite <- mkRWire();

    method Action upd(a x, d v);
        rfInt.upd(x, v);
        curWrite.wset(tuple2(x, v));
    endmethod

    // Forward a write made in this cycle; otherwise read the stored value
    method d sub(a x);
        case (curWrite.wget()) matches
            tagged Valid {.wa, .wd} &&& (wa == x): return wd;
            default: return rfInt.sub(x);
        endcase
    endmethod
endmodule
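The forwarding behaviour wanted from the bypass register file (upd < sub within a cycle) can be sketched as a small Python model. This is an assumption-laden illustration rather than the module itself: `tick()` stands in for the clock edge, and `cur_write` plays the role of the RWire.

```python
class BypassRegFile:
    """Cycle-level model: a sub in the same cycle as an upd sees the new value."""

    def __init__(self, size):
        self.regs = [0] * size
        self.cur_write = None          # (addr, value) written this cycle, if any

    def upd(self, addr, val):
        self.cur_write = (addr, val)

    def sub(self, addr):
        if self.cur_write is not None and self.cur_write[0] == addr:
            return self.cur_write[1]   # bypass the pending write
        return self.regs[addr]

    def tick(self):
        if self.cur_write is not None:
            addr, val = self.cur_write
            self.regs[addr] = val      # commit at the clock edge
        self.cur_write = None


rf = BypassRegFile(8)
rf.upd(3, 42)
print(rf.sub(3), rf.sub(2))   # the write to r3 is forwarded, r2 is untouched
rf.tick()
print(rf.sub(3))              # the committed value after the clock edge
```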


Processor Redux


[Figure: CPU with pipeline stages fetch, decode, execute, memory, write-back and state elements pc, rf, iMem, dMem]



One Element SFIFO (Naïve)

module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
    provisos (Bits#(t, tsz));
    Reg#(t)    data <- mkRegU();
    Reg#(Bool) full <- mkReg(False);

    method Action enq(t x) if (!full);
        full <= True; data <= x;
    endmethod
    method Action deq() if (full);
        full <= False;
    endmethod
    method t first() if (full);
        return data;
    endmethod
    method Maybe#(v) find(tr r);
        return (full ? findf(r, data) : tagged Invalid);
    endmethod
    // clear() is not shown on the slide
endmodule

Concurrency: find < first < (enq C deq), where C marks a conflict


One Element SFIFO (In-Order d2eQ #1)

import ConfigReg::*;

module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
    provisos (Bits#(t, tsz));
    Reg#(t)    data <- mkConfigRegU();
    Reg#(Bool) full <- mkConfigReg(False);
    RWire#(t)  enqv <- mkRWire();

    method Action enq(t x) if (!full);
        full <= True; data <= x;
        enqv.wset(x);
    endmethod
    // deq may follow an enq made in the same cycle
    method Action deq() if (full || isValid(enqv.wget()));
        full <= False;
    endmethod
    method t first() if (full);
        return data;
    endmethod
    method Maybe#(v) find(tr r);
        return (full ? findf(r, data) : tagged Invalid);
    endmethod
    // clear() is not shown on the slide
endmodule


find < first < enq < deq


One Element SFIFO (In-Order e2mQ, m2wQ #2)

import ConfigReg::*;

module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
    provisos (Bits#(t, tsz));
    Reg#(t)    data <- mkRegU();
    Reg#(Bool) full <- mkConfigReg(False);
    RWire#(t)  enqv <- mkRWire();

    method Action enq(t x) if (!full);
        full <= True; data <= x;
        enqv.wset(x);
    endmethod
    method Action deq() if (full || isValid(enqv.wget()));
        full <= False;
    endmethod
    // first sees a value enqueued in the same cycle
    method t first() if (full || isValid(enqv.wget()));
        return fromMaybe(data, enqv.wget());
    endmethod
    method Maybe#(v) find(tr r);
        return (full ? findf(r, data) : tagged Invalid);
    endmethod
    // clear() is not shown on the slide
endmodule


find < enq < first < deq


One Element Searchable SFIFO (Pipelined #3)

import ConfigReg::*;

module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
    provisos (Bits#(t, tsz));
    Reg#(t)      data <- mkConfigRegU();
    Reg#(Bool)   full <- mkConfigReg(False);
    RWire#(void) deqw <- mkRWire();
    RWire#(t)    enqw <- mkRWire();   // carries the enqueued value to find

    method Action enq(t x) if (!full || isValid(deqw.wget()));
        full <= True; data <= x;
        enqw.wset(x);
    endmethod
    method Action deq() if (full);
        full <= False;
        deqw.wset(?);
    endmethod
    method t first() if (full);
        return data;
    endmethod
    // find searches the surviving old entry, else a value enqueued this cycle
    method Maybe#(v) find(tr r);
        return ((full && !isValid(deqw.wget())) ? findf(r, data) :
                isValid(enqw.wget()) ? findf(r, fromMaybe(?, enqw.wget())) :
                tagged Invalid);
    endmethod
    // clear() is not shown on the slide
endmodule

first < deq < enq < find
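To see what the ordering first < deq < enq < find buys, here is a cycle-level Python model of a one-element pipeline FIFO. It is a sketch under assumptions, not a translation checked against the Bluespec compiler: methods must be called in the stated order within a cycle, `tick()` stands in for the clock edge, and `match_dest` is a hypothetical search function over (destination, value) entries.

```python
class PipelineSFIFO1:
    """One-element FIFO model with first < deq < enq < find in each cycle."""

    def __init__(self, findf):
        self.findf = findf
        self.full = False
        self.data = None
        self.deq_called = False   # plays the role of the deq RWire
        self.enq_value = None     # plays the role of the enq RWire; (x,) when set

    def first(self):
        if not self.full:
            raise RuntimeError("first: FIFO empty")
        return self.data

    def deq(self):
        if not self.full:
            raise RuntimeError("deq: FIFO empty")
        self.deq_called = True

    def enq(self, x):
        # A full FIFO accepts an enq only if a deq happened earlier this cycle.
        if self.full and not self.deq_called:
            raise RuntimeError("enq: FIFO full")
        self.enq_value = (x,)

    def find(self, r):
        if self.full and not self.deq_called:
            return self.findf(r, self.data)
        if self.enq_value is not None:
            return self.findf(r, self.enq_value[0])
        return None

    def tick(self):
        if self.enq_value is not None:
            self.full, self.data = True, self.enq_value[0]
        elif self.deq_called:
            self.full = False
        self.deq_called = False
        self.enq_value = None


def match_dest(r, entry):
    """Hypothetical findf: return the value if the entry targets register r."""
    dest, value = entry
    return value if dest == r else None


q = PipelineSFIFO1(match_dest)
q.enq((1, 10))
same_cycle_hit = q.find(1)      # find sees the value enqueued this cycle
q.tick()
head = q.first()
q.deq()
q.enq((2, 20))                  # allowed while full because deq came first
old_hit, new_hit = q.find(1), q.find(2)
q.tick()
print(same_cycle_hit, head, old_hit, new_hit, q.first())
```

The second cycle shows the pipelining property: the FIFO is dequeued and refilled in one clock, and find already reports the new entry rather than the departing one.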


One Element Searchable SFIFO (Pipelined #4)

import ConfigReg::*;

module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
    provisos (Bits#(t, tsz));
    Reg#(t)      data <- mkConfigRegU();
    Reg#(Bool)   full <- mkConfigReg(False);
    RWire#(void) deqw <- mkRWire();

    method Action enq(t x) if (!full || isValid(deqw.wget()));
        full <= True; data <= x;
    endmethod
    method Action deq() if (full);
        full <= False;
        deqw.wset(?);
    endmethod
    method t first() if (full);
        return data;
    endmethod
    // find ignores an entry dequeued earlier in this cycle
    method Maybe#(v) find(tr r);
        return ((full && !isValid(deqw.wget())) ? findf(r, data) : tagged Invalid);
    endmethod
    // clear() is not shown on the slide
endmodule


first < deq < find < enq


One Element Searchable SFIFO (Pipelined #4)

import ConfigReg::*;

module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
    provisos (Bits#(t, tsz));
    Reg#(t)      data  <- mkRegU();
    Reg#(Bool)   full  <- mkConfigReg(False);
    RWire#(void) deqEN <- mkRWire();
    Bool deqp = isValid(deqEN.wget());

    method Action enq(t x) if (!full || deqp);
        full <= True; data <= x;
    endmethod
    method Action deq() if (full);
        full <= False;
        deqEN.wset(?);
    endmethod
    method t first() if (full);
        return data;
    endmethod
    method Maybe#(v) find(tr r);
        return ((full && !deqp) ? findf(r, data) : tagged Invalid);
    endmethod
    // clear() is not shown on the slide
endmodule


first < deq < find < enq


Up-Down Counter


Counter Module Interface

interface Counter;
    method Action up();
    method Action down();
    method Bit#(32) _read();
endinterface

Concurrency: up and down should be independent


Naïve Counter Example

module mkCounter(Counter);
    Reg#(Bit#(32)) r <- mkReg(0);

    method Bit#(32) _read(); return r; endmethod
    // up and down both read and write r, so they cannot fire together
    method Action up();   r <= r + 1; endmethod
    method Action down(); r <= r - 1; endmethod
endmodule


Counter Example

import ConfigReg::*;

module mkCounter(Counter);
    Reg#(Bit#(32)) r     <- mkConfigReg(0);
    RWire#(void)   upW   <- mkRWire();
    RWire#(void)   downW <- mkRWire();

    // A single rule applies the net change, so up and down never touch r directly
    rule updateR (True);
        r <= r + (isValid(upW.wget())   ? 1 : 0)
               - (isValid(downW.wget()) ? 1 : 0);
    endrule

    method Bit#(32) _read(); return r; endmethod
    method Action up();   upW.wset(?);   endmethod
    method Action down(); downW.wset(?); endmethod
endmodule

What if we want to call up then _read?
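The question about calling up and then _read can be explored with a small Python model of the RWire-based counter. This is a sketch under assumptions, not the Bluespec module: `tick()` is an invented stand-in for the clock edge at which the update rule's write lands, and `read` models `_read`.

```python
class Counter:
    """Cycle-level model: up/down only raise flags; the count changes at tick()."""

    def __init__(self):
        self.r = 0
        self.up_w = False      # plays the role of RWire upW
        self.down_w = False    # plays the role of RWire downW

    def read(self):
        return self.r          # value from the last clock edge

    def up(self):
        self.up_w = True

    def down(self):
        self.down_w = True

    def tick(self):
        self.r += int(self.up_w) - int(self.down_w)
        self.up_w = self.down_w = False


c = Counter()
c.up()
c.down()
c.tick()
print(c.read())   # up and down in the same cycle cancel out

c.up()
print(c.read())   # a read in the same cycle as up still sees the old count
c.tick()
print(c.read())   # the increment shows up one cycle later
```

In this model a read that follows up within the same cycle returns the old count. Making it return the incremented value would mean folding the up flag into the read, which is the kind of intra-cycle forwarding RWires are used for elsewhere in this lecture.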


Completion Buffer


Completion buffer: Interface

interface CBuffer#(type t);
    method ActionValue#(Token) getToken();
    method Action put(Token tok, t d);
    method ActionValue#(t) getResult();
endinterface

typedef Bit#(TLog#(n)) TokenN#(numeric type n);
typedef TokenN#(16) Token;

[Figure: cbuf with getToken, put (result & token) and getResult]



IP-Lookup module with the completion buffer

module mkIPLookup(IPLookup);
    rule recirculate … ;
    rule exit … ;
    method Action enter(IP ip);
        Token tok <- cbuf.getToken();
        ram.req(ip[31:16]);
        fifo.enq(tuple2(tok, ip[15:0]));
    endmethod
    method ActionValue#(Msg) getResult();
        let result <- cbuf.getResult();
        return result;
    endmethod
endmodule

[Figure: IP-lookup datapath. enter calls getToken and feeds RAM and fifo; a "done?" check sends "no" back around and "yes" into cbuf, which feeds getResult]

For enter and getResult to execute simultaneously, cbuf.getToken and cbuf.getResult must execute simultaneously



IP Lookup rules with completion buffer

rule recirculate (!isLeaf(ram.peek()));
    match {.tok, .rip} = fifo.first();
    fifo.enq(tuple2(tok, (rip << 8)));
    ram.req(ram.peek() + rip[15:8]);
    fifo.deq();
    ram.deq();
endrule

rule exit (isLeaf(ram.peek()));
    // put needs the token that travels with the request in fifo
    match {.tok, .rip} = fifo.first();
    cbuf.put(tok, ram.peek());
    fifo.deq();
    ram.deq();
endrule

For rule exit and method enter to execute simultaneously, cbuf.put and cbuf.getToken must execute simultaneously

For no dead cycles, cbuf.getToken, cbuf.put and cbuf.getResult must all be able to execute simultaneously



Naïve Completion Buffer

import Vector::*;
import RegFile::*;

module mkCBuffer(CBuffer#(t)) provisos (Bits#(t, tsz));
    Vector#(16, Reg#(Bool)) valids <- replicateM(mkReg(False));
    RegFile#(Token, t) data <- mkRegFileFull();
    Reg#(Token) rdP <- mkReg(0);
    Reg#(Token) wrP <- mkReg(0);
    Reg#(Token) cnt <- mkReg(0);

    // Max and nextPointer are not defined on the slide
    method ActionValue#(Token) getToken() if (cnt < Max);
        cnt <= cnt + 1;
        rdP <= nextPointer(rdP);
        valids[rdP] <= False;
        return rdP;
    endmethod
    method Action put(Token tok, t d);
        valids[tok] <= True;
        data.upd(tok, d);
    endmethod
    method ActionValue#(t) getResult() if (valids[wrP]);
        cnt <= cnt - 1;
        wrP <= nextPointer(wrP);
        return data.sub(wrP);
    endmethod
endmodule


Completion buffer: Interface Requirements

[Figure: cbuf with getToken, put (result & token) and getResult]

Rules and methods concurrency requirement to avoid dead cycles: exit < getResult < enter
cbuf methods’ concurrency: cbuf.getResult < cbuf.put < cbuf.getToken



Completion Buffer

import Vector::*;
import RegFile::*;
import ConfigReg::*;

module mkCBuffer(CBuffer#(t)) provisos (Bits#(t, tsz));
    Vector#(16, Reg#(Bool)) valids <- replicateM(mkReg(False));
    RegFile#(Token, t) data <- mkRegFileFull();
    Reg#(Token) rdP <- mkConfigReg(0);
    Reg#(Token) wrP <- mkConfigReg(0);
    Counter cnt <- mkCounter();

    // Max is not defined on the slide
    method ActionValue#(Token) getToken() if (cnt < Max);
        cnt.up();
        rdP <= rdP + 1;
        valids[rdP] <= False;
        return rdP;
    endmethod
    method Action put(Token tok, t d);
        valids[tok] <= True;
        data.upd(tok, d);
    endmethod
    method ActionValue#(t) getResult() if (valids[wrP]);
        cnt.down();
        wrP <= wrP + 1;
        return data.sub(wrP);
    endmethod
endmodule

getResult < put < getToken

Is the ordering correct?

Is valids okay?
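Setting the scheduling questions aside, the functional contract of the completion buffer (tokens issued in order, results written in any order, results read back in token order) can be sketched in Python. This is a purely sequential model under assumptions: one method call at a time, invented names `alloc_ptr` and `drain_ptr` for the two pointers, and an extra `count > 0` check in the result guard that the slide code does not have. It says nothing about whether the three methods can fire in the same clock.

```python
class CompletionBuffer:
    """Sequential model: in-order tokens, out-of-order put, in-order results."""

    def __init__(self, size=16):
        self.size = size
        self.valids = [False] * size
        self.data = [None] * size
        self.alloc_ptr = 0     # next token to hand out
        self.drain_ptr = 0     # oldest outstanding token
        self.count = 0         # tokens handed out and not yet drained

    def get_token(self):
        if self.count == self.size:
            raise RuntimeError("getToken: no free slot")
        tok = self.alloc_ptr
        self.valids[tok] = False
        self.alloc_ptr = (tok + 1) % self.size
        self.count += 1
        return tok

    def put(self, tok, d):
        self.valids[tok] = True
        self.data[tok] = d

    def can_get_result(self):
        # Assumption of this sketch: also require an outstanding token.
        return self.count > 0 and self.valids[self.drain_ptr]

    def get_result(self):
        if not self.can_get_result():
            raise RuntimeError("getResult: oldest result not ready")
        d = self.data[self.drain_ptr]
        self.drain_ptr = (self.drain_ptr + 1) % self.size
        self.count -= 1
        return d


cb = CompletionBuffer()
t0 = cb.get_token()
t1 = cb.get_token()
cb.put(t1, "second")              # the younger request finishes first
blocked = not cb.can_get_result() # the oldest result is still missing
cb.put(t0, "first")
print(blocked, cb.get_result(), cb.get_result())
```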
