
Page 1: Bluespec - Imperial College Londoncas.ee.ic.ac.uk/people/ssingh/bluespec_l3l4.pdf · 2010. 12. 16. · Rules, not clock edges • rules are atomic –they execute within one clock

Bluespec

Lectures 3 & 4

with some slides from Nikhil Rishiyur at Bluespec and Simon Moore at the University of Cambridge

Page 2

Course Resources

• http://cas.ee.ic.ac.uk/~ssingh

• Lecture notes (PowerPoint, PDF)

• Example Bluespec programs used in Lectures

• Complete Photoshop system (Bluespec)

• Links to Bluespec code samples

• User guide, reference guide: doc sub-directory of Bluespec installation

• More information at http://bluespec.com

Page 3

Rules, not clock edges

• rules are atomic

– they execute within one clock cycle

• structure: rule name (explicit conditions); statements; endrule

• conditions:

– explicit – a Boolean expression given in the rule header

– implicit – conditions of the methods the rule calls that must hold for the rule to fire, e.g. fifo.enq is only ready if the FIFO is not full
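A minimal Python sketch of how the two kinds of condition combine (a behavioural model, not BSV; the Fifo class, its capacity and the rule are hypothetical): a rule can fire in a cycle only if its explicit condition is True and the implicit conditions of every method it calls are also True.

```python
class Fifo:
    """Hypothetical bounded FIFO: enq is only 'ready' while the FIFO is not full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def enq_ready(self):
        # Implicit condition of enq: the FIFO must not be full.
        return len(self.items) < self.capacity

    def enq(self, value):
        assert self.enq_ready(), "enq called while its implicit condition is False"
        self.items.append(value)


def run(cycles, fifo, enable):
    """Try to fire a rule 'produce' once per cycle; return the cycles in which it fired."""
    fired = []
    for cycle in range(cycles):
        explicit = enable(cycle)        # the rule's own Boolean guard
        implicit = fifo.enq_ready()     # guard contributed by the method it calls
        if explicit and implicit:       # the rule fires only if both hold
            fifo.enq(cycle)
            fired.append(cycle)
    return fired


print(run(5, Fifo(capacity=2), enable=lambda c: c % 2 == 0))  # [0, 2]
```

With capacity 2 the rule fires in cycles 0 and 2 and is then blocked by the full FIFO, even though its explicit condition is True again in cycle 4.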

Page 4

Rules: powerful alternative to always blocks

• rules for state updates instead of always blocks

• Simple concept: think if…then…

• Rule can execute (or “fire”) only when its conditions are TRUE

• Every rule is atomic with respect to other rules

• Powerful ramifications:

– Executable specification – design around operations as described in specs

– Atomicity of rules dramatically reduces concurrency bugs

– Automates management of shared resources – avoids many complex errors

rule ruleName (<boolean cond>);
  <state update(s)>
endrule

Page 5

Bits, Bools and conversion

• Bit#(width) – vector of bits

• Bool – single bit for Booleans (True, False)

• pack() – converts most types into their bit representation

• unpack() – opposite of pack()

• extend() – widens an integer (signed, unsigned, or bits)

• truncate() – narrows an integer by dropping its most significant bits
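The conversions can be illustrated with plain integers in Python. This is a sketch under our own assumptions, not the BSV library: values are modelled as non-negative bit patterns of a stated width, and we assume extend() sign-extends an Int and zero-extends a UInt or Bit.

```python
def pack_int(value, width):
    """Bit pattern of a signed Int#(width) value (two's complement), like pack()."""
    assert -(1 << (width - 1)) <= value < (1 << (width - 1)), "out of range"
    return value & ((1 << width) - 1)


def unpack_int(bits, width):
    """Reinterpret a width-bit pattern as a signed value, like unpack()."""
    sign_bit = 1 << (width - 1)
    return (bits & (sign_bit - 1)) - (bits & sign_bit)


def sign_extend(bits, width, new_width):
    """Widen by copying the sign bit into the new high bits (assumed extend() on Int)."""
    assert new_width >= width
    if (bits >> (width - 1)) & 1:
        bits |= ((1 << new_width) - 1) & ~((1 << width) - 1)
    return bits


def zero_extend(bits, width, new_width):
    """Widen with zeros (assumed extend() on UInt/Bit): the pattern is unchanged."""
    assert new_width >= width
    return bits


def truncate(bits, new_width):
    """Keep only the low new_width bits."""
    return bits & ((1 << new_width) - 1)


print(hex(pack_int(-3, 8)))            # 0xfd
print(unpack_int(0xFD, 8))             # -3
print(hex(sign_extend(0xFD, 8, 16)))   # 0xfffd
print(hex(zero_extend(0xFD, 8, 16)))   # 0xfd
print(hex(truncate(0x1234, 8)))        # 0x34
```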

Page 6

Reg and Bit/UInt/Int types

• registers (initialised and uninitialised versions):

Reg#(type) name0 <- mkReg(initial_value);
Reg#(type) name1 <- mkRegU;

• some types (unsigned and signed integer, and bits): UInt#(width), Int#(width), Bit#(width)

• example:

Reg#(UInt#(8)) counter <- mkReg(0);
rule count_up; counter <= counter + 1; endrule

• anatomy of the declaration:

– Reg#(UInt#(8)) is the interface type; it takes a type parameter (e.g. UInt#(8)) since Reg is generic

– mkReg is the name of the module to “make” (i.e. instantiate); N.B. modules are typically prefixed “mk”
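A small Python model of count_up (an illustration, not generated code): the rule has no explicit condition, so it fires every cycle, and because UInt#(8) arithmetic is modulo 2^8, the counter wraps from 255 back to 0.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1   # a UInt#(8) holds 0..255


def count_up(cycles, initial=0):
    """Value of `counter` after each cycle in which rule count_up fires."""
    counter = initial
    history = []
    for _ in range(cycles):
        counter = (counter + 1) & MASK   # addition wraps modulo 2**8
        history.append(counter)
    return history


h = count_up(257)
print(h[0], h[254], h[255], h[256])   # 1 255 0 1
```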

Page 7

Registers

interface Reg#(type a);
  method Action _write (a x1);
  method a _read ();
endinterface: Reg

• Polymorphic

• Just library elements

• In one cycle, register reads must execute before register writes

• x <= y + 1 is syntactic sugar for

x._write (y._read + 1)
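The read-before-write behaviour can be modelled with a Python class whose method names mirror the Reg interface. This is a simplified sketch (the clock method and the single pending write are our own assumptions, not the BSV library): _read always returns the value latched at the last clock edge, and _write only takes effect at the next edge.

```python
class Reg:
    """Register model: reads see the value from the last clock edge."""

    _NO_WRITE = object()   # sentinel meaning "no write pending this cycle"

    def __init__(self, init):
        self._q = init              # value visible during the current cycle
        self._d = Reg._NO_WRITE     # value scheduled for the next edge

    def _read(self):
        return self._q

    def _write(self, value):
        self._d = value

    def clock(self):
        """Clock edge: commit the pending write, if any."""
        if self._d is not Reg._NO_WRITE:
            self._q = self._d
        self._d = Reg._NO_WRITE


x, y = Reg(0), Reg(5)
x._write(y._read() + 1)   # what "x <= y + 1" stands for
print(x._read())          # 0: the write is not visible before the clock edge
x.clock()
print(x._read())          # 6
```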

Page 8

Scheduling Annotations

C Conflict

CF Conflict free

SB Sequence before

SBR Sequence before restricted (cannot be in the same rule)

SA Sequence after

SAR Sequence after restricted (cannot be in the same rule)

Page 9

Scheduling Annotations for a Register

         read   write
read     CF     SB
write    SA     SBR

• Two reads are conflict-free (CF): you can have multiple reads of the same register, even in the same rule, sequenced in any order.

• A write is sequenced after (SA) a read.

• A read is sequenced before (SB) a write.

• If you have two writes, one must be sequenced before the other, and they cannot be in the same rule, as indicated by the annotation SBR.
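The table can be encoded directly. In this Python sketch (our own encoding, not part of the Bluespec tools), ANNOTATION[(a, b)] is the relation of method a (row) to method b (column), and may_precede asks whether a call of `first` may be sequenced before a call of `second` on the same register within one cycle.

```python
# Relation of the row method to the column method, copied from the table.
ANNOTATION = {
    ("read", "read"): "CF",
    ("read", "write"): "SB",
    ("write", "read"): "SA",
    ("write", "write"): "SBR",
}


def may_precede(first, second, same_rule):
    """Can `first` be sequenced before `second` on the same register?"""
    ann = ANNOTATION[(first, second)]
    if ann in ("CF", "SB"):
        return True
    if ann == "SBR":
        return not same_rule   # restricted: allowed only across different rules
    return False               # SA: `first` has to come after `second`


print(may_precede("read", "write", same_rule=True))    # True
print(may_precede("write", "read", same_rule=False))   # False
print(may_precede("write", "write", same_rule=True))   # False
```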

Page 10

Updating Registers

Reg#(int) x <- mkReg (0);

rule countup (x < 30);
  int y = x + 1;
  x <= x + 1;
  $display ("x = %0d, y = %0d", x, y);
endrule
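The following Python model predicts the output of countup from the register semantics described earlier (a sketch, not simulator output). y is an ordinary local variable, so it holds x + 1 immediately, while the write to x is only visible in the next cycle, so $display prints the old x.

```python
def countup_trace():
    x = 0                        # Reg#(int) x <- mkReg(0)
    lines = []
    while x < 30:                # explicit condition of rule countup
        y = x + 1                # local variable: value available at once
        new_x = x + 1            # register write: visible only after the edge
        lines.append(f"x = {x}, y = {y}")   # $display reads the old x
        x = new_x                # clock edge
    return lines


trace = countup_trace()
print(trace[0])     # x = 0, y = 1
print(trace[-1])    # x = 29, y = 30
print(len(trace))   # 30
```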

Page 11

Rules of Rules (The Three Basics)

1. Rules are atomic

2. Rules fire at most once per cycle (or not at all)

3. Rules that fire in the same cycle don’t conflict with each other

Page 12

[Figure: registers x and y, clocked by clk; the D input of each register is the Q output of the other register plus 1]

rule r1; x <= y + 1; endrule
rule r2; y <= x + 1; endrule

Page 13

[Figure: as above, registers x and y cross-coupled through +1 adders, clocked by clk]

(* synthesize *)
module rules4 (Empty);
  Reg#(int) x <- mkReg (10);
  Reg#(int) y <- mkReg (100);
  rule r1; x <= y + 1; endrule
  rule r2; y <= x + 1; endrule
  rule monitor; $display ("x, y = %0d, %0d ", x, y); endrule
endmodule

$ ./rules4 -m 5
x, y = 10, 100
x, y = 10, 11
x, y = 10, 11
x, y = 10, 11

Page 14

[Figure: registers x and y cross-coupled through +1 adders, clocked by clk]

(* synthesize *)
module rules5 (Empty);
  Reg#(int) x <- mkReg (10);
  Reg#(int) y <- mkReg (100);
  rule r; x <= y + 1; y <= x + 1; endrule
  rule monitor; $display ("x, y = %0d, %0d ", x, y); endrule
endmodule

$ ./rules5 -m 5
x, y = 10, 100
x, y = 101, 11
x, y = 12, 102
x, y = 103, 13
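A Python model of rules5 (a sketch, not simulator output) reproduces this trace: because both assignments are in the same rule, both right-hand sides read the values from the start of the cycle, exactly like a simultaneous tuple assignment.

```python
def rules5(cycles, x=10, y=100):
    trace = []
    for _ in range(cycles):
        trace.append((x, y))   # rule monitor displays the current values
        x, y = y + 1, x + 1    # rule r: both reads use start-of-cycle values
    return trace


print(rules5(4))   # [(10, 100), (101, 11), (12, 102), (103, 13)]
```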

Page 15

[Figure: registers x and y cross-coupled through +1 adders, clocked by clk]

(* synthesize *)
module rules6 (Empty);
  Reg#(int) x <- mkReg (10);
  Reg#(int) y <- mkReg (100);
  rule r1; x <= y + 1; endrule
  rule r2; y <= x + 1; endrule
  (* descending_urgency = "r1, r2" *)
  rule monitor; $display ("x, y = %0d, %0d ", x, y); endrule
endmodule

$ ./rules6 -m 5
x, y = 10, 100
x, y = 101, 100
x, y = 101, 100
x, y = 101, 100
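r1 and r2 conflict (each reads the register the other writes), so at most one of them fires per cycle. The Python sketch below (a model, not the compiler's algorithm) lets one named rule win every cycle; judging from the printed traces, r2 won in rules4, and the descending_urgency attribute makes r1 win in rules6.

```python
def run(winner, cycles, x=10, y=100):
    """r1: x <= y + 1; r2: y <= x + 1. They conflict, so only `winner` fires."""
    trace = []
    for _ in range(cycles):
        trace.append((x, y))   # rule monitor
        if winner == "r1":
            x = y + 1
        else:
            y = x + 1
    return trace


print(run("r2", 4))   # [(10, 100), (10, 11), (10, 11), (10, 11)]   like rules4
print(run("r1", 4))   # [(10, 100), (101, 100), (101, 100), (101, 100)]   like rules6
```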

Page 16

interface Rules7_Interface;
  method int readValue;
  method Action setValue (int newXvalue);
  method ActionValue#(int) increment;
endinterface

(* synthesize *)
module rules7 (Rules7_Interface);
  Reg#(int) x <- mkReg (0);

  method readValue;
    return x;
  endmethod

  method Action setValue (int newXvalue);
    x <= newXvalue;
  endmethod

  method ActionValue#(int) increment;
    x <= x + 1;
    return x;
  endmethod
endmodule

Page 17

interface Rules7_Interface;
  (* always_ready *) method int readResult;
  (* always_enabled *) method Action setValues (int newX, int newY, int newZ);
endinterface

(* synthesize *)
module rules7 (Rules7_Interface);
  Reg#(int) x <- mkReg (0);
  Reg#(int) y <- mkReg (0);
  Reg#(int) z <- mkReg (0);
  Reg#(int) result <- mkRegU;
  Reg#(Bool) b <- mkReg (False);

  rule toggle; b <= !b; endrule
  rule r1 (b); result <= x * y; endrule
  rule r2 (!b); result <= x * z; endrule

  method readResult = result;
  method Action setValues (int newX, int newY, int newZ);
    x <= newX; y <= newY; z <= newZ;
  endmethod
endmodule

// remaining internal signals
assign x_MUL_y___d8 = x * y ;
assign x_MUL_z___d5 = x * z ;

Page 18

Page 19

interface Rules8_Interface;
  (* always_ready *) method int readResult;
  (* always_enabled *) method Action setValues (int newX, int newY, int newZ);
endinterface

(* synthesize *)
module rules8 (Rules8_Interface);
  Reg#(int) x <- mkReg (0);
  Reg#(int) y <- mkReg (0);
  Reg#(int) z <- mkReg (0);
  Wire#(int) t <- mkWire;
  Reg#(int) result <- mkRegU;
  Reg#(Bool) b <- mkReg (False);

  rule toggle; b <= !b; endrule
  rule computeT;
    if (b) t <= y; else t <= z;
  endrule
  rule r1; result <= x * t; endrule   // fires every cycle; t already selects y or z

  method readResult = result;
  method Action setValues (int newX, int newY, int newZ);
    x <= newX; y <= newY; z <= newZ;
  endmethod
endmodule

// inlined wires
assign t$wget = b ? y : z ;
…
// remaining internal signals
assign x_MUL_t_wget___d6 = x * t$wget ;
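The sharing transformation relies on a simple identity: selecting the operand before a single multiplier gives the same value in each cycle as building two multipliers and selecting the product. A brute-force Python check over a few small values (an illustration only, not derived from the generated Verilog):

```python
from itertools import product


def two_multipliers(b, x, y, z):
    # rules7 style: x * y and x * z are both built; b selects which one is stored
    return x * y if b else x * z


def one_multiplier(b, x, y, z):
    # rules8 style: the wire t muxes the operand (t = b ? y : z), then one x * t
    t = y if b else z
    return x * t


vals = range(-3, 4)
assert all(two_multipliers(b, x, y, z) == one_multiplier(b, x, y, z)
           for b, x, y, z in product([False, True], vals, vals, vals))
print("same value in every case")
```

The identity is what lets the designer trade one multiplier for a mux explicitly; Bluespec does not make that trade on its own.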

Page 20

Page 21

High Level Synthesis

• Most work on high-level synthesis focuses on automating scheduling and allocation to achieve resource sharing.

• Perspective: high-level synthesis in general applies to many aspects of converting high-level descriptions into efficient circuits, but there has been an undue level of effort on resource sharing in an ASIC context.

• Bluespec automates many aspects of scheduling (it makes scheduling composable), but resource usage is under the explicit control of the designer.

• For FPGA-based design this is often a better fit as a programming model.

Page 22

Simple example with concurrency and shared resources

Process 0: increments register x when cond0

Process 1: transfers a unit from register x to register y when cond1

Process 2: decrements register y when cond2

Each register can only be updated by one process on each clock. Priority: 2 > 1 > 0

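Just like real applications, e.g. a bank account: 0 = deposit to checking, 1 = transfer from checking to savings, 2 = withdraw from savings

[Figure: processes 0, 1 and 2, enabled by cond0, cond1 and cond2, update register x (+1 by process 0, -1 by process 1) and register y (+1 by process 1, -1 by process 2); process priority: 2 > 1 > 0]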

Page 23

Fundamentally, we are scheduling three potentially concurrent atomic transactions that share resources.

What if the priorities changed: cond1 > cond2 > cond0? What if the processes are in different modules?

[Figure: processes 0, 1 and 2 updating registers x and y, as on the previous page; process priority: 2 > 1 > 0]

always @(posedge CLK) begin
  if (cond2)
    y <= y - 1;
  else if (cond1) begin
    y <= y + 1; x <= x - 1;
  end
  if (cond0 && !cond1)
    x <= x + 1;
end

* There are other ways to write this RTL, but all suffer from the same analysis

The if conditions are the resource-access scheduling logic, i.e., control logic.

Better scheduling (proc0 may also fire when proc1 is blocked by proc2):

always @(posedge CLK) begin
  if (cond2)
    y <= y - 1;
  else if (cond1) begin
    y <= y + 1; x <= x - 1;
  end
  if (cond0 && (!cond1 || cond2))
    x <= x + 1;
end
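The two always blocks differ only in the condition guarding x <= x + 1. A Python enumeration of all eight combinations of the conditions (a sketch that models only that guard) shows where they disagree: when all three conditions hold, proc2 blocks proc1, so proc0 is free to update x, and only the second version lets it.

```python
from itertools import product


def proc0_fires_v1(cond0, cond1, cond2):
    return cond0 and not cond1                # first always block


def proc0_fires_v2(cond0, cond1, cond2):
    return cond0 and (not cond1 or cond2)     # "better scheduling" block


# Each tuple is (cond0, cond1, cond2).
diffs = [c for c in product([False, True], repeat=3)
         if proc0_fires_v1(*c) != proc0_fires_v2(*c)]
print(diffs)   # [(True, True, True)]
```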

Page 24

With Bluespec, the design is direct

(* descending_urgency = "proc2, proc1, proc0" *)
rule proc0 (cond0);
  x <= x + 1;
endrule

rule proc1 (cond1);
  y <= y + 1;
  x <= x - 1;
endrule

rule proc2 (cond2);
  y <= y - 1;
endrule

Hand-written RTL: explicit scheduling; complex clutter, unmaintainable

BSV:

• Functional correctness follows directly from rule semantics (atomicity)

• Executable spec (operation-centric)

• Automatic handling of shared resource control logic

• Same hardware as the RTL

[Figure: processes 0, 1 and 2 updating registers x and y; process priority: 2 > 1 > 0]

Page 25

Now, let’s make a small change: add a new process and insert its priority

[Figure: as before, plus a new process 3, enabled by cond3, which adds 2 to x and subtracts 2 from y; process priority: 2 > 3 > 1 > 0]

Page 26

Changing the Bluespec design

[Figure: processes 0 to 3 updating x and y, with process 3 (cond3) adding 2 to x and subtracting 2 from y; process priority: 2 > 3 > 1 > 0]

Pre-change:

(* descending_urgency = "proc2, proc1, proc0" *)
rule proc0 (cond0);
  x <= x + 1;
endrule

rule proc1 (cond1);
  y <= y + 1;
  x <= x - 1;
endrule

rule proc2 (cond2);
  y <= y - 1;
endrule

After the change:

(* descending_urgency = "proc2, proc3, proc1, proc0" *)
rule proc0 (cond0);
  x <= x + 1;
endrule

rule proc1 (cond1);
  y <= y + 1;
  x <= x - 1;
endrule

rule proc2 (cond2);
  y <= y - 1;
endrule

rule proc3 (cond3);
  y <= y - 2;
  x <= x + 2;
endrule


Page 27

Changing the Verilog design

[Figure: processes 0 to 3 updating x and y, with process 3 (cond3) adding 2 to x and subtracting 2 from y; process priority: 2 > 3 > 1 > 0]

Pre-change:

always @(posedge CLK) begin
  if (!cond2 && cond1)
    x <= x - 1;
  else if (cond0)
    x <= x + 1;
  if (cond2)
    y <= y - 1;
  else if (cond1)
    y <= y + 1;
end

After the change:

always @(posedge CLK) begin
  if ((cond2 && cond0) || (cond0 && !cond1 && !cond3))
    x <= x + 1;
  else if (cond3 && !cond2)
    x <= x + 2;
  else if (cond1 && !cond2)
    x <= x - 1;
  if (cond2)
    y <= y - 1;
  else if (cond3)
    y <= y - 2;
  else if (cond1)
    y <= y + 1;
end

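Hand-derived control like this is easy to get wrong, so it is worth checking exhaustively. The Python sketch below compares the post-change always block with a reference model built from the priority 2 > 3 > 1 > 0, assuming (as in the figure and in the Verilog) that process 2 only decrements y and that two processes conflict exactly when they update the same register. Under those assumptions the check reports no mismatch in any of the 16 cases.

```python
from itertools import product

WRITES = {0: {"x"}, 1: {"x", "y"}, 2: {"y"}, 3: {"x", "y"}}
DELTA = {0: (1, 0), 1: (-1, 1), 2: (0, -1), 3: (2, -2)}   # (dx, dy) per process


def spec(c0, c1, c2, c3):
    """(dx, dy) from priority 2 > 3 > 1 > 0: a process fires unless a
    higher-priority firing process already updates one of its registers."""
    conds = {0: c0, 1: c1, 2: c2, 3: c3}
    used, dx, dy = set(), 0, 0
    for p in (2, 3, 1, 0):   # descending urgency
        if conds[p] and not (WRITES[p] & used):
            used |= WRITES[p]
            dx += DELTA[p][0]
            dy += DELTA[p][1]
    return dx, dy


def verilog_post_change(c0, c1, c2, c3):
    """Transcription of the post-change always block, as (dx, dy)."""
    if (c2 and c0) or (c0 and not c1 and not c3):
        dx = 1
    elif c3 and not c2:
        dx = 2
    elif c1 and not c2:
        dx = -1
    else:
        dx = 0
    if c2:
        dy = -1
    elif c3:
        dy = -2
    elif c1:
        dy = 1
    else:
        dy = 0
    return dx, dy


mismatches = [c for c in product([False, True], repeat=4)
              if spec(*c) != verilog_post_change(*c)]
print(mismatches)   # []
```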

Page 28

Alternate RTL style (more common)

• Combinatorial explosion

• Case 3’b111 is subtle

• Many repetitions of update actions (cut-and-paste errors)

– cf. “WTO Principle” (Write Things Once, Gerard Berry)

• Difficult to maintain/extend

• Difficult to modularize

[Figure: processes 0, 1 and 2 updating registers x and y; process priority: 2 > 1 > 0]

always @ (posedge clk)
  case ({cond0, cond1, cond2})
    3'b000: begin // nothing happens
      x <= x; y <= y;
    end
    3'b001: begin // proc2 fires
      y <= y-1;
    end
    3'b010: begin // proc1
      x <= x-1; y <= y+1;
    end
    3'b011: begin // proc2 fires (2>1)
      y <= y-1;
    end
    3'b100: begin // proc0
      x <= x+1;
    end
    3'b101: begin // proc2 + proc0
      x <= x+1; y <= y-1;
    end
    3'b110: begin // proc1 (1>0)
      x <= x-1; y <= y+1;
    end
    3'b111: begin // proc2 + proc0
      x <= x+1; // NOTE - subtle!
      y <= y-1;
    end
  endcase

Page 29

Late Specifications

Late specification changes and feature enhancements are challenging to deal with.

• Micro-architectural changes for timing/area/performance, e.g.:

– Adding a pipeline stage to an existing pipeline

– Adding a pipeline stage where pipelining was not anticipated

– Spreading a calculation over more clocks (longer iteration)

– Moving logic across a register stage (rebalancing)

– Restructuring combinational clouds for shallower logic

• Fixing bugs

Bluespec makes it easier to try out multiple macro/micro-architectures earlier in the design cycle

Page 30

Why Rule atomicity improves correctness

Correctness is often couched (formally or informally) as an invariant, e.g.,

“# ingress packets - # egress packets == packet-count register value”

Rule atomicity improves thinking about (and formally proving) invariants, because invariants can be verified one rule at a time.

In contrast, in RTL and thread models, one must think about all possible interleavings (cf. The Problem With Threads, Edward A. Lee, IEEE Computer 39(5), May 2006, pp. 33-42).
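A Python sketch of the per-rule argument, using the packet-count invariant above (the rule names and state fields are hypothetical): each rule is checked on its own against states that satisfy the invariant, and because rules are atomic, any sequence of whole rule firings then preserves it.

```python
import random


def invariant(s):
    return s["ingress"] - s["egress"] == s["count"]


def rule_ingress(s):
    # A packet arrives: both updates happen together, atomically.
    return {**s, "ingress": s["ingress"] + 1, "count": s["count"] + 1}


def rule_egress(s):
    # A packet leaves; guard: count > 0 (otherwise the rule cannot fire).
    if s["count"] == 0:
        return s
    return {**s, "egress": s["egress"] + 1, "count": s["count"] - 1}


RULES = [rule_ingress, rule_egress]

# Step 1: check each rule in isolation on a sample of invariant-satisfying states.
for n_in in range(5):
    for n_out in range(n_in + 1):
        s = {"ingress": n_in, "egress": n_out, "count": n_in - n_out}
        for rule in RULES:
            assert invariant(rule(s)), rule.__name__

# Step 2: any sequence of atomic rule firings then keeps the invariant.
s = {"ingress": 0, "egress": 0, "count": 0}
rng = random.Random(0)
for _ in range(1000):
    s = rng.choice(RULES)(s)
assert invariant(s)
print("invariant holds")
```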

Page 31

Bank Account: Key Benefits

• Executable specifications

• Rapid changes

• But, with fine-grained control of RTL:

– Define the optimal architecture/micro-architecture

– Debug at the source OR RTL level – designer understands both

– The Quality of Results (QoR) of RTL!

Page 32

A more complex example, from CPU design

Speculative, out-of-order

Many, many concurrent activities

[Figure: out-of-order CPU with Fetch, Decode, Re-Order Buffer (ROB), Register File, Branch unit, ALU unit, MEM unit, Instruction Memory and Data Memory, connected by FIFOs]

Page 33

Many concurrent actions on common state: nightmare to manage explicitly

[Figure: Re-Order Buffer entries between the Head and Tail pointers, each with a state (E = Empty, W = Waiting), an instruction, operand 1, operand 2 and a result. Concurrent actions on the ROB: the Decode Unit puts an instr into the ROB; get operands for instr (Register File); writeback results; get a ready ALU instr; get a ready MEM instr; put ALU instr results in ROB; put MEM instr results in ROB; resolve branches. ALU unit(s) and MEM unit(s) execute the instructions.]

Page 34

Branch Resolution

• …

• …

• …

Commit Instr

• Write results to register file (or allow memory write for store)

• Set to Empty

• Increment head pointer

Write Back Results to ROB

• Write back results to instr result

• Write back to all waiting tags

• Set to done

Dispatch Instr

• Mark instruction dispatched

• Forward to appropriate unit

In Bluespec…

…you can code each operation in isolation, as a rule

…the tool guarantees that operations are INTERLOCKED (i.e. each runs to completion without external interference)

Insert Instr in ROB

• Put instruction in first available slot

• Increment tail pointer

• Get source operands

– from the RF or from a previous instr

Page 35

Which one is correct?

What’s required to verify that they’re correct? What if the priorities changed: cond1 > cond2 > cond0? What if the processes are in different modules?

always @(posedge CLK) begin
  if (!cond2 || cond1)
    x <= x - 1;
  else if (cond0)
    x <= x + 1;
  if (cond2)
    y <= y - 1;
  else if (cond1)
    y <= y + 1;
end

[Figure: processes 0, 1 and 2 updating registers x and y; process priority: 2 > 1 > 0]

always @(posedge CLK) begin
  if (!cond2 && cond1)
    x <= x - 1;
  else if (cond0)
    x <= x + 1;
  if (cond2)
    y <= y - 1;
  else if (cond1)
    y <= y + 1;
end
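One way to answer the question is to compare both blocks against a reference model derived from the priorities. In this Python sketch (our own model; it assumes two processes conflict exactly when they update the same register), the first block on the page disagrees with the reference in four of the eight cases (for example, with all conditions False it still decrements x), while the second agrees in all of them.

```python
from itertools import product


def spec(c0, c1, c2):
    """(dx, dy) implied by priority 2 > 1 > 0."""
    fire2 = c2
    fire1 = c1 and not fire2   # proc1 loses y to proc2
    fire0 = c0 and not fire1   # proc0 loses x only to proc1
    dx = (1 if fire0 else 0) - (1 if fire1 else 0)
    dy = (1 if fire1 else 0) - (1 if fire2 else 0)
    return dx, dy


def version(x_minus_cond):
    """Both blocks share this shape; only the condition for x <= x - 1 differs."""
    def step(c0, c1, c2):
        if x_minus_cond(c1, c2):
            dx = -1
        elif c0:
            dx = 1
        else:
            dx = 0
        dy = -1 if c2 else (1 if c1 else 0)
        return dx, dy
    return step


first = version(lambda c1, c2: (not c2) or c1)    # if (!cond2 || cond1)
second = version(lambda c1, c2: (not c2) and c1)  # if (!cond2 && cond1)

for name, impl in (("first", first), ("second", second)):
    bad = [c for c in product([False, True], repeat=3) if impl(*c) != spec(*c)]
    print(name, len(bad))   # first 4, second 0
```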

Page 36

Some Verilog solutions

Functional code and scheduling code are deeply (inextricably) intertwined.

What’s required to verify that they’re correct? What if the priorities changed: cond1 > cond2 > cond0? What if the processes are in different modules?


Page 37

Finite State Machines in Bluespec

for making composable, parallel, nested, suspendable/abortable FSMs

Features:

• FSMs automatically synthesized

• Complex FSMs expressed succinctly

• FSM actions have the same atomic semantics as BSV rule bodies

• Well-behaved on shared resources: no surprises

• With standard BSV interfaces and BSV’s higher-order functions, you can write your own FSM generators

[Figure: FSM building blocks: sequencing, sequential loops, if-then-else, parallel FSMs (fork-join), and hierarchy (with suspend and abort)]

This powerful capability is enabled by higher-order functions, polymorphic types, advanced parameterization and atomic transactions.

Enables exponentially smaller descriptions compared to flat FSMs

Page 38

FSM example (from testbench stimulus section)

Stmt s =
  seq
    action
      rand_packets0.init;
      rand_packets1.init;
    endaction
    par
      for (j0 <= 0; j0 < n; j0 <= j0 + 1) action
        let pkt0 <- rand_packets0.next;
        switch.ports[0].put (pkt0);
      endaction
      for (j1 <= 0; j1 < n; j1 <= j1 + 1) action
        let pkt1 <- rand_packets1.next;
        switch.ports[1].put (pkt1);
      endaction
    endpar
    drain_switch;
  endseq;

FSM fsm <- mkFSM (s);

rule go;
  fsm.start;   // start the FSM instantiated from s (s itself is only a Stmt value)
endrule

Basic FSM statements are “Actions”, just like rule bodies, and have exactly the same atomic semantics. Thus, BSV FSMs are well-behaved with respect to concurrent resource contention and flow control.

Page 39

Strong support for multiple clock and reset domains

• Rich and mature support for MCD (multiple clock domains and resets)

• Clock is a first-class data type

• Cannot accidentally mix clocks and ordinary signals

• Strong static checking ensures that it is impossible to accidentally cross clock domain boundaries (i.e., without a synchronizer)

• No need for linting tools to check domain discipline

• Clock manipulation

• Clocks can be passed in and out of module interfaces

• Library of clock dividers and other transformations

• Module instantiation can specify an alternative clock (instead of inheriting the parent’s default clock)

• (Similarly: Reset and reset domains)

Page 40

Synthesis of Atomic Actions

[Figure: the current state is read; for each rule, a combinational circuit computes a predicate (p1, p2, p3) and a potential next state (d1, d2, d3; the potential update functions); a scheduler selects a maximal subset of applicable rules and produces enable signals for the enabled rules (f1, f2, f3), which drive selector muxes and priority encoders that update the state]

Page 41

Key Issue: How to select a maximal subset of rules for firing?

• Two rules R1 and R2 can execute simultaneously if they are “conflict free”, i.e.

– R1 and R2 do not update the same state; and

– neither R1 nor R2 reads the state that the other updates (“sequentially composable” rules)
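The definition translates into a check on read and write sets. This Python sketch is a simplification (the compiler reasons about method scheduling annotations such as the register annotations shown earlier, not about plain sets), and the Rule records below are our own encoding of earlier examples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflict_free(r1, r2):
    """Disjoint writes, and neither rule reads what the other writes."""
    return (not (r1.writes & r2.writes)
            and not (r1.reads & r2.writes)
            and not (r2.reads & r1.writes))


# rules4/rules6: r1 reads y and writes x; r2 reads x and writes y
r1 = Rule("r1", frozenset({"y"}), frozenset({"x"}))
r2 = Rule("r2", frozenset({"x"}), frozenset({"y"}))
# bank example: proc0 touches only x, proc2 touches only y
proc0 = Rule("proc0", frozenset({"x"}), frozenset({"x"}))
proc2 = Rule("proc2", frozenset({"y"}), frozenset({"y"}))

print(conflict_free(r1, r2))         # False
print(conflict_free(proc0, proc2))   # True
```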

Page 42

Rules of Rules (The Details 1-5/10)

1. Rules are atomic: rules fire completely or not at all, and you can imagine that nothing else happens during their execution.

2. Explicit and implicit conditions may prevent rules from firing.

3. Every rule fires exactly 0 or 1 times every cycle (at this point in our product's history anyway ;)

4. Rules that conflict in some way may fire together in the same cycle, but only if the compiler can schedule them in a valid order to do so -- that is, where the overall effect is as if they had happened one at a time as in (1) above.

5. Rules determine if they are going to fire or not before they actually do so. They are considered in their order of "urgency" (by a "greedy algorithm"): they "will fire" if they "can fire" and are not prevented by a conflict with a rule which has been selected already. It's OK to think of this phase as being completed (except for wires) before any rules are actually executed. This is what "urgency" is about.
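The greedy selection in item 5 can be sketched in a few lines of Python. This illustrates the idea only and is not BSC's implementation; the conflict relation is given explicitly rather than derived, and the example reuses the bank-account rules with urgency proc2, proc1, proc0.

```python
def select_rules(urgency, can_fire, conflicts):
    """Greedy selection in descending urgency: a rule 'will fire' if it
    'can fire' and does not conflict with a rule already selected."""
    selected = []
    for rule in urgency:
        if can_fire[rule] and all(frozenset((rule, s)) not in conflicts
                                  for s in selected):
            selected.append(rule)
    return selected


# Bank example: proc1 conflicts with proc0 (on x) and with proc2 (on y).
CONFLICTS = {frozenset(("proc1", "proc0")), frozenset(("proc1", "proc2"))}
URGENCY = ["proc2", "proc1", "proc0"]

print(select_rules(URGENCY, {"proc0": True, "proc1": True, "proc2": True}, CONFLICTS))
# ['proc2', 'proc0']
print(select_rules(URGENCY, {"proc0": True, "proc1": True, "proc2": False}, CONFLICTS))
# ['proc1']
```

The first call reproduces the subtle 3'b111 case: proc2 wins over proc1, which leaves proc0 free to fire.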

Page 43

Rules of Rules (The Details 6-10/10)

6. After determining which rules are going to fire, the simulator can then schedule their execution. (In hardware it's all done by combinational logic which has the same effect.) Rules do not need to execute in the same order as they were considered for deciding whether they "will fire". For example rule1 can have a higher urgency than rule2, but it is possible that rule2 executes its logic before rule1. Urgency is used to determine which rules "will fire". Earliness defines the order they fire in.

7. All reads from a register must be scheduled before any writes to the same register: any rule which reads from a register must be scheduled "earlier" than any other rule which writes to it.

8. Constants may be "read" at any time; a register *might* have a write but no read.

9. The compiler creates a sequence of steps, where each step is essentially a rule firing. Its inputs are valid at the beginning of the cycle, its outputs are valid at the end of the cycle. Data is not allowed to be driven "backwards" in the schedule: that is, no action may influence any action that happened "earlier" in the cycle. This would go against causality, and constitutes a "feedback" path that the compiler will not allow.

10. If the compiler is not told otherwise, methods have higher urgency than rules, and will execute earlier than rules, unless there's some reason to the contrary. There is a compiler switch to flip this around and make rules have higher urgency.

Page 44

The Swap Conundrum

(* synthesize *)
module rules9 (Empty);
  Reg#(int) x <- mkReg (12);
  Reg#(int) y <- mkReg (17);
  rule r1; x <= y; endrule
  rule r2; y <= x; endrule
  rule monitor; $display ("x, y = %0d, %0d", x, y); endrule
endmodule

$ ./rules9 -m 5
x, y = 12, 17
x, y = 12, 12
x, y = 12, 12
x, y = 12, 12

Page 45

The Swap Conundrum

rule r1 (tick 1): x._write (y._read ()) – reads y, writes x

rule r2 (tick 2): y._write (x._read ()) – reads x, writes y

PROBLEM: register x must be read before it is written, but in this order r2 reads x after r1 has written it
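Applying the read-before-write requirement to both possible orders confirms the problem. In this Python sketch (our own encoding of the reads and writes in rules9), neither order is valid, so r1 and r2 cannot both fire in the same cycle; in the output above only r2 fires.

```python
from itertools import permutations

READS = {"r1": {"y"}, "r2": {"x"}}
WRITES = {"r1": {"x"}, "r2": {"y"}}


def valid_order(order):
    """Every rule that reads a register must come before any other rule writing it."""
    pos = {rule: i for i, rule in enumerate(order)}
    for reader in order:
        for writer in order:
            if reader != writer and READS[reader] & WRITES[writer]:
                if pos[reader] > pos[writer]:
                    return False
    return True


print([o for o in permutations(["r1", "r2"]) if valid_order(o)])   # []
```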

Page 46

(* synthesize *)
module rules10 (Empty);
  Reg#(int) x <- mkReg (12);
  Reg#(int) y <- mkReg (17);
  rule r; x <= y; y <= x; endrule
  rule monitor; $display ("x, y = %0d, %0d", x, y); endrule
endmodule

$ ./rules10 -m 5
x, y = 12, 17
x, y = 17, 12
x, y = 12, 17
x, y = 17, 12

In schedule terms, step 1 (the single rule r) reads x and y at the beginning of the cycle and writes x and y at the end.

Page 47

Wires

• In Bluespec, from a scheduling perspective, registers and wires are dual concepts.

• In one cycle all register reads must execute before register writes.

• In one cycle a wire must be written to (at most once) before it is read (any number of times).

Page 48

Rules of Wires

• Wires truly become wires in hardware: they do not save “state” between cycles (compare to signal in VHDL).

• A wire’s schedule requires that it be written before it is read (as opposed to a register that is read before it is written).

• A wire cannot be written more than once in a cycle.
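These rules can be mirrored by a small Python class (a behavioural model under our own assumptions: the whas name follows the xwire.whas predicate in the compiler's schedule output for rules11, and clock is our own method standing in for the clock edge).

```python
class Wire:
    """Wire model: written at most once per cycle, readable only after that
    write in the same cycle, and empty again in the next cycle."""

    _EMPTY = object()

    def __init__(self):
        self._value = Wire._EMPTY

    def whas(self):
        return self._value is not Wire._EMPTY

    def _write(self, value):
        if self.whas():
            raise RuntimeError("wire written twice in one cycle")
        self._value = value

    def _read(self):
        if not self.whas():
            raise RuntimeError("wire read before it was written (reader is blocked)")
        return self._value

    def clock(self):
        self._value = Wire._EMPTY   # no state survives the clock edge


w = Wire()
w._write(5)
print(w._read())   # 5
w.clock()
print(w.whas())    # False
```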

Page 49

(* synthesize *)
module rules11 (Empty);
  Reg#(int) x <- mkReg (12);
  Reg#(int) y <- mkReg (17);
  Wire#(int) xwire <- mkWire;
  rule r1; x <= y; endrule
  rule r2; y <= xwire; endrule
  rule driveX; xwire <= x; endrule
  rule monitor; $display ("x, y = %0d, %0d", x, y); endrule
endmodule

$ ./rules11 -m 5
x, y = 12, 17
x, y = 17, 12
x, y = 12, 17
x, y = 17, 12

Page 50


$ cat rules11.sched
=== Generated schedule for rules11 ===

Rule schedule
-------------
Rule: monitor
Predicate: True
Blocking rules: (none)

Rule: driveX
Predicate: True
Blocking rules: (none)

Rule: r2
Predicate: xwire.whas
Blocking rules: (none)

Rule: r1
Predicate: True
Blocking rules: (none)

Logical execution order: monitor, driveX, r1, r2

=======================================

Question: is monitor, driveX, r2, r1 a valid schedule?
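One way to reason about the question is to list the ordering constraints that the two scheduling rules impose: a rule that reads a register must come before a rule that writes it, and a rule that writes a wire must come before a rule that reads it. The Python sketch below encodes only those two constraints for rules11. The real compiler also considers conflicts, urgency and more, so treat this as a simplified checker. `RULES`, `must_precede` and `is_valid` are hypothetical names.

```python
from itertools import permutations

REGS = {"x", "y"}      # register: readers go before writers
WIRES = {"xwire"}      # wire: the writer goes before readers

# What each rule of rules11 reads and writes.
RULES = {
    "monitor": {"reads": {"x", "y"}, "writes": set()},
    "driveX":  {"reads": {"x"},      "writes": {"xwire"}},
    "r1":      {"reads": {"y"},      "writes": {"x"}},
    "r2":      {"reads": {"xwire"},  "writes": {"y"}},
}


def must_precede(rules):
    """Pairs (a, b) meaning rule a must come before rule b within the cycle."""
    pairs = set()
    for a, ra in rules.items():
        for b, rb in rules.items():
            if a == b:
                continue
            if ra["reads"] & rb["writes"] & REGS:    # a reads a register that b writes
                pairs.add((a, b))
            if ra["writes"] & rb["reads"] & WIRES:   # a writes a wire that b reads
                pairs.add((a, b))
    return pairs


def is_valid(order, rules=RULES):
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in must_precede(rules))


print(is_valid(["monitor", "driveX", "r1", "r2"]))   # True: the order in rules11.sched
print(is_valid(["monitor", "driveX", "r2", "r1"]))   # False: r1 reads y, which r2 writes
print([order for order in permutations(RULES) if is_valid(order)])
```

Under these two constraints only monitor and driveX may swap places; r1 must precede r2.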


Wire

• Implements the Reg interface (_read and _write methods).

• Implicit condition:

– it is not ready if it has not been written in the current cycle

• In any cycle in which there is no write to a wire, every rule that reads that wire is blocked (it cannot fire).


(* synthesize *)
module rules12 (Empty) ;
   Reg#(int) y <- mkReg (17) ;
   Reg#(int) count <- mkReg (0) ;
   Wire#(int) x <- mkWire ;

   // writes x only on every third cycle
   rule producer ;
      if (count % 3 == 0) x <= count ;
   endrule

   // implicitly conditioned on x having been written this cycle
   rule consumer ;
      y <= x ;
      $display ("cycle %0d: y set to %0d", count, x) ;
   endrule

   rule counter ;
      count <= count + 1 ;
   endrule
endmodule

$ ./rules12 -m 9
cycle 0: y set to 0
cycle 3: y set to 3
cycle 6: y set to 6
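The implicit condition of a Wire explains the gaps in this trace: consumer can fire only in cycles in which producer wrote x. A small Python model that resets the wire at the start of every cycle gives the same three lines. It is a sketch, and `simulate_rules12` is a hypothetical name.

```python
def simulate_rules12(cycles):
    """Toy model of rules12; returns the lines printed by rule consumer."""
    lines = []
    for count in range(cycles):          # rule counter: count <= count + 1 every cycle
        x, written = None, False         # a wire starts every cycle unwritten
        if count % 3 == 0:               # rule producer writes x only on some cycles
            x, written = count, True
        if written:                      # rule consumer: implicit condition "x was written"
            lines.append(f"cycle {count}: y set to {x}")  # (y <= x itself is not modelled)
    return lines


for line in simulate_rules12(9):
    print(line)
```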


DWire

• A Wire with a default value.

• A DWire is always ready.

• If there is a write to a DWire in a cycle then, just like a Wire, it takes that value.

• If there is no write to a DWire in a cycle it assumes a default value (given at instantiation time).


(* synthesize *)
module rules13 (Empty) ;
   Reg#(int) y <- mkReg (17) ;
   Reg#(int) count <- mkReg (0) ;
   Wire#(int) x <- mkDWire (42) ;   // reads as 42 in any cycle without a write

   rule producer ;
      if (count % 3 == 0) x <= count ;
   endrule

   // a DWire is always ready, so this rule fires every cycle
   rule consumer ;
      y <= x ;
      $display ("cycle %0d: y set to %0d", count, x) ;
   endrule

   rule counter ;
      count <= count + 1 ;
   endrule
endmodule

$
cycle 1: y set to 42
cycle 2: y set to 42
cycle 3: y set to 3
cycle 4: y set to 42
cycle 5: y set to 42
cycle 6: y set to 6
cycle 7: y set to 42
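With a DWire the consumer is never blocked, so it fires every cycle and sees 42 whenever producer did not write. A Python sketch of the same behaviour follows (`simulate_rules13` is a hypothetical name). Note that it also prints cycle 0, which the excerpt above does not show.

```python
def simulate_rules13(cycles, default=42):
    """Toy model of rules13, where x is a DWire with default value 42."""
    lines = []
    for count in range(cycles):          # rule counter: count <= count + 1 every cycle
        x = default                      # no write yet this cycle: the DWire shows its default
        if count % 3 == 0:               # rule producer
            x = count                    # a write overrides the default for this cycle only
        # rule consumer: a DWire is always ready, so this fires every cycle
        lines.append(f"cycle {count}: y set to {x}")
    return lines


for line in simulate_rules13(8):
    print(line)
```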


BypassWire

• Closest thing to a wire in Verilog.

• A BypassWire is always ready.

• Rather than having a default value, a BypassWire must be written on every cycle, and the compiler must be able to determine this statically.
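The three wire flavours differ only in what a rule sees when it reads a wire that nobody wrote in the current cycle. The toy Python summary below uses the hypothetical names `read_wire` and `RuleBlocked`. The BypassWire case is really a compile-time check, as the slide says; here it is modelled as a run-time error.

```python
class RuleBlocked(Exception):
    """The reading rule cannot fire this cycle (its implicit condition is false)."""


def read_wire(kind, written, value=None, default=None):
    """Toy model of reading each kind of wire within one cycle.

    kind is "Wire", "DWire" or "BypassWire"; written says whether some rule
    wrote the wire earlier in the same cycle.
    """
    if written:
        return value                     # every kind passes a written value straight through
    if kind == "Wire":
        raise RuleBlocked("Wire not written: rules that read it are blocked")
    if kind == "DWire":
        return default                   # always ready; falls back to its default value
    if kind == "BypassWire":
        # The compiler must prove this never happens, so in a design that
        # compiles this branch is unreachable.
        raise AssertionError("BypassWire must be written on every cycle")
    raise ValueError(f"unknown wire kind: {kind}")


print(read_wire("DWire", written=False, default=42))   # 42
print(read_wire("BypassWire", written=True, value=7))  # 7
```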


FIFOs

• Many FIFO variants are provided in the FIFO, FIFOF and SpecialFIFOs libraries.

• Examples (2- and 4-element FIFOs):
  FIFO#(UInt#(8)) myfifo <- mkFIFO;
  FIFO#(UInt#(8)) biggerfifo <- mkSizedFIFO(4);

• Example BypassFIFO (1 storage element; if the FIFO is empty and enq and deq happen in the same cycle, the data passes straight through). mkBypassFIFO comes from the SpecialFIFOs library:
  FIFO#(UInt#(8)) bypassfifo <- mkBypassFIFO;

• Basic methods:

– enq(value) // enqueue “value”

– first // returns the first element of the FIFO

– deq // dequeue (removes the first element)
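The difference between an ordinary FIFO and a bypass FIFO shows up when a value is enqueued into an empty FIFO and a consumer wants it in the same cycle. The Python sketch below is a toy model, not the library implementation. `ToyFifo`, `can_enq`, `can_deq`, `end_cycle` and `same_cycle` are hypothetical names, and the sketch assumes that an ordinary FIFO evaluates its guards on the contents committed in earlier cycles.

```python
from collections import deque


class ToyFifo:
    """Toy single-clock model of a guarded FIFO (not the library implementation).

    enq() is guarded by "not full", first()/deq() by "not empty". Guards look
    at the contents committed by earlier cycles; with bypass=True, a value
    enqueued into an empty FIFO earlier in the same cycle can be read at once.
    """

    def __init__(self, depth, bypass=False):
        self.depth, self.bypass = depth, bypass
        self.items = deque()             # contents committed by earlier cycles
        self.enqueued = None             # value enqueued this cycle, if any
        self.dequeued = False            # whether deq was called this cycle

    def can_enq(self):
        return self.enqueued is None and len(self.items) < self.depth

    def can_deq(self):
        if self.dequeued:
            return False
        return bool(self.items) or (self.bypass and self.enqueued is not None)

    def enq(self, v):
        if not self.can_enq():
            raise RuntimeError("enq blocked: FIFO full")
        self.enqueued = v

    def first(self):
        if not self.can_deq():
            raise RuntimeError("first blocked: FIFO empty")
        return self.items[0] if self.items else self.enqueued

    def deq(self):
        if not self.can_deq():
            raise RuntimeError("deq blocked: FIFO empty")
        self.dequeued = True

    def end_cycle(self):
        if self.dequeued:
            if self.items:
                self.items.popleft()
            else:                        # bypass: the value passed straight through
                self.enqueued = None
        if self.enqueued is not None:
            self.items.append(self.enqueued)
        self.enqueued, self.dequeued = None, False


def same_cycle(fifo):
    """Producer enqueues 42 into an empty FIFO, then the consumer tries to fire."""
    fifo.enq(42)
    got = None
    if fifo.can_deq():                   # consumer's implicit condition
        got = fifo.first()
        fifo.deq()
    fifo.end_cycle()
    return got, list(fifo.items)


print(same_cycle(ToyFifo(depth=2)))               # (None, [42])
print(same_cycle(ToyFifo(depth=1, bypass=True)))  # (42, [])
```

With the ordinary FIFO the consumer is blocked in this cycle and finds 42 waiting in the next one; with the bypass FIFO the value passes straight through.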


import FIFO::* ;

(* synthesize *)
module rules14 (Empty) ;
   Reg#(int) count <- mkReg (0) ;
   FIFO#(int) fifo <- mkSizedFIFO (30) ;

   // enq is implicitly conditioned on the FIFO not being full
   rule producer (count < 5) ;
      fifo.enq (count*3) ;
      $display ("cycle %0d: enqueuing value %0d", count, count*3) ;
   endrule

   // first and deq are implicitly conditioned on the FIFO not being empty
   rule consumer (count > 5) ;
      int x = fifo.first ;
      fifo.deq ;
      $display ("cycle %0d: dequeued value %0d", count, x) ;
   endrule

   rule counter ;
      count <= count + 1 ;
   endrule
endmodule

$ ./rules14 -m 20
cycle 0: enqueuing value 0
cycle 1: enqueuing value 3
cycle 2: enqueuing value 6
cycle 3: enqueuing value 9
cycle 4: enqueuing value 12
cycle 6: dequeued value 0
cycle 7: dequeued value 3
cycle 8: dequeued value 6
cycle 9: dequeued value 9
cycle 10: dequeued value 12
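The implicit conditions on enq (not full) and first/deq (not empty) are what stop consumer once the FIFO drains. Below is a Python sketch of rules14 in which a `deque` stands in for mkSizedFIFO(30) (`simulate_rules14` is a hypothetical name). Because the explicit conditions count < 5 and count > 5 never hold together, the two rules can be modelled one after the other within a cycle.

```python
from collections import deque


def simulate_rules14(cycles, depth=30):
    """Toy model of rules14; returns the $display lines in order."""
    fifo = deque()                       # stands in for mkSizedFIFO(30)
    lines = []
    for count in range(cycles):          # rule counter: count <= count + 1 every cycle
        # rule producer (count < 5); implicit condition of enq: FIFO not full
        if count < 5 and len(fifo) < depth:
            fifo.append(count * 3)
            lines.append(f"cycle {count}: enqueuing value {count * 3}")
        # rule consumer (count > 5); implicit condition of first/deq: FIFO not empty
        if count > 5 and fifo:
            x = fifo.popleft()           # first followed by deq
            lines.append(f"cycle {count}: dequeued value {x}")
    return lines


for line in simulate_rules14(20):
    print(line)
```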


import FIFO::* ;

(* synthesize *)
module rules15 (Empty) ;
   Reg#(int) count <- mkReg (0) ;
   FIFO#(int) fifo <- mkSizedFIFO (30) ;

   rule producer (count < 5) ;
      fifo.enq (count*3) ;
      $display ("cycle %0d: enqueuing value %0d", count, count*3) ;
   endrule

   // same explicit condition as producer
   rule consumer (count < 5) ;
      int x = fifo.first ;
      fifo.deq ;
      $display ("cycle %0d: dequeued value %0d", count, x) ;
   endrule

   rule counter ;
      count <= count + 1 ;
   endrule
endmodule
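Here both rules share the explicit condition count < 5, so the question is what the implicit conditions allow. The Python sketch below (`simulate_rules15` is a hypothetical name) assumes, as for an ordinary non-bypass FIFO, that the implicit conditions depend only on the contents at the start of the cycle and that enq and deq may fire in the same cycle. The order of the two printed lines within a cycle is the model's choice, not necessarily Bluesim's.

```python
from collections import deque


def simulate_rules15(cycles, depth=30):
    """Toy model of rules15; returns the $display lines and what is left in the FIFO."""
    fifo = deque()                       # stands in for mkSizedFIFO(30)
    lines = []
    for count in range(cycles):          # rule counter: count <= count + 1 every cycle
        # Assumption: implicit conditions see the FIFO as it was at the start of the cycle.
        not_full, not_empty = len(fifo) < depth, len(fifo) > 0
        incoming = None
        if count < 5 and not_full:       # rule producer
            incoming = count * 3
            lines.append(f"cycle {count}: enqueuing value {incoming}")
        if count < 5 and not_empty:      # rule consumer
            x = fifo.popleft()           # first followed by deq
            lines.append(f"cycle {count}: dequeued value {x}")
        if incoming is not None:
            fifo.append(incoming)        # the enqueued value is visible from the next cycle
    return lines, list(fifo)


lines, left_over = simulate_rules15(20)
for line in lines:
    print(line)
print("left in FIFO:", left_over)        # left in FIFO: [12]
```

Under these assumptions the consumer is blocked in cycle 0 (the FIFO is still empty), then runs one value behind the producer, and the last value, 12, is never dequeued because both rules stop once count reaches 5.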


import GetPut::* ;
import Connectable::* ;

module mkProducer (Get#(int)) ;
   Reg#(int) i <- mkReg (0) ;

   rule incrementI ;
      i <= i + 1 ;
   endrule

   method ActionValue#(int) get () ;
      return i ;
   endmethod
endmodule: mkProducer

module mkConsumer (Put#(int)) ;
   Wire#(int) i <- mkWire ;

   // fires only in cycles in which put has written the wire
   rule report ;
      $display ("mkConsumer %d", i) ;
   endrule

   method Action put (int x) ;
      i <= x ;
   endmethod
endmodule: mkConsumer

(* synthesize *)
module mkConnectableExample (Empty) ;
   Get#(int) p <- mkProducer ;
   Put#(int) c <- mkConsumer ;
   mkConnection (p, c) ;
endmodule: mkConnectableExample

Higher-order types: p and c are interfaces (bundles of methods) that are passed as arguments to mkConnection.
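mkConnection is higher order in the sense that it takes interfaces as arguments and produces hardware (a rule) that links them. The Python analogy below builds the connection as a function from two objects. `Producer`, `Consumer` and `mk_connection` are hypothetical stand-ins; the real Get/Put instance of mkConnection is provided by the GetPut and Connectable libraries, and this sketch captures only the idea that it moves one value from get to put when both sides are ready.

```python
class Producer:
    """Stand-in for mkProducer: get() returns register i; rule incrementI bumps it."""

    def __init__(self):
        self.i = 0

    def get(self):
        return self.i                    # start-of-cycle value of register i

    def end_cycle(self):
        self.i += 1                      # rule incrementI: i <= i + 1 takes effect


class Consumer:
    """Stand-in for mkConsumer: put() drives a wire; rule report reads it."""

    def __init__(self):
        self.wire = None                 # None means "not written this cycle"
        self.lines = []

    def put(self, x):
        self.wire = x

    def report(self):
        if self.wire is not None:        # implicit condition of reading a Wire
            self.lines.append(f"mkConsumer {self.wire}")

    def end_cycle(self):
        self.wire = None                 # a wire keeps no value between cycles


def mk_connection(getter, putter):
    """Higher-order helper: takes two interface objects and returns a 'rule'
    (a function) that moves one value from getter to putter."""
    def connect_rule():
        putter.put(getter.get())
    return connect_rule


p, c = Producer(), Consumer()
rules = [mk_connection(p, c), c.report]  # the connection rule runs before report
for _ in range(3):
    for rule in rules:
        rule()
    p.end_cycle()
    c.end_cycle()
print(c.lines)   # ['mkConsumer 0', 'mkConsumer 1', 'mkConsumer 2']
```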


ServerFarm

[Figure: ServerFarm Information Flow. Two DividerServer blocks, each labelled with request and response, plus further request and response labels.]


Conclusions

• Bluespec:

– provides cleaner interfaces

• quicker to create large systems from libraries of components

• easier to refine design

– creates most of the control for you (unless you don’t want it to)

• less likely to get it wrong!

– has strong typing

• helps remove bugs

– provides powerful static elaboration