clockless design language - ilia greenblat
clockless design language
May 2, 2012
First steps from language to silicon
Ilia Greenblat
[email protected]
+972-54-4927322
skype: igreenblat
So no clocks, now what?
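Let’s focus on just one operation (for example, a multiply). We specify and control its sequence:
1. Wait until the inputs become available.
2. Execute the operation.
3. Latch the result and release the inputs.
4. Send the result downstream.
5. Wait until it is accepted.
6. Start over.

The same in pseudo-Verilog:

    input  passive channel [7:0]  A, B;
    latch  [15:0] Creg;
    output active  channel [15:0] C;

    always begin
        fork                // 1. wait until both inputs are available
            wait(A);
            wait(B);
        join
        Creg = B * A;       // 2-3. execute the operation and latch the result
        release A;          // 3. release the inputs
        release B;
        C = Creg;           // 4-5. send the result downstream, wait until accepted
    end                     // 6. start over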
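As a rough illustration of the six steps, here is a Python model (not the author's compiler or runtime) that treats each always block as a generator yielding wait conditions and runs them under a tiny scheduler. The slides do not say which handshake the channels use; this sketch assumes a four-phase (return-to-zero) request/acknowledge protocol, and all names (`Chan`, `run`, `producer`, `multiplier`, `consumer`) are hypothetical.

```python
from typing import Callable, Generator, List, Optional

# A "process" is a generator that yields wait conditions (zero-argument predicates),
# loosely mirroring how an always block blocks on wait(...).
Process = Generator[Callable[[], bool], None, None]


class Chan:
    """Assumed four-phase (return-to-zero) channel: request, acknowledge, data."""

    def __init__(self) -> None:
        self.req = False
        self.ack = False
        self.data = 0


def run(procs: List[Process], max_rounds: int = 10_000) -> None:
    """Resume each process only when its current wait condition holds."""
    waits: List[Optional[Callable[[], bool]]] = [next(p, None) for p in procs]
    for _ in range(max_rounds):
        progressed = False
        for i, p in enumerate(procs):
            cond = waits[i]
            if cond is not None and cond():
                waits[i] = next(p, None)   # None once the process has finished
                progressed = True
        if not progressed:
            return  # every process is blocked or finished


def producer(ch: Chan, values: List[int]) -> Process:
    for v in values:
        ch.data, ch.req = v, True        # offer a datum
        yield lambda: ch.ack             # wait until it is taken
        ch.req = False                   # return to zero
        yield lambda: not ch.ack


def multiplier(a: Chan, b: Chan, c: Chan) -> Process:
    while True:
        yield lambda: a.req and b.req        # 1. wait till both inputs are available
        creg = (a.data * b.data) & 0xFFFF    # 2-3. execute, latch a 16-bit result
        a.ack = b.ack = True                 # 3. release the inputs
        yield lambda: not a.req and not b.req
        a.ack = b.ack = False
        c.data, c.req = creg, True           # 4. send the result down the stream
        yield lambda: c.ack                  # 5. wait until accepted
        c.req = False
        yield lambda: not c.ack              # 6. start over


def consumer(ch: Chan, out: List[int]) -> Process:
    while True:
        yield lambda: ch.req
        out.append(ch.data)
        ch.ack = True
        yield lambda: not ch.req
        ch.ack = False


a, b, c = Chan(), Chan(), Chan()
results: List[int] = []
run([producer(a, [3, 200, 7]), producer(b, [4, 250, 9]),
     multiplier(a, b, c), consumer(c, results)])
print(results)  # [12, 50000, 63]
```

The scheduler only resumes a process when its condition holds, which is the software analogue of each stage "knowing" when its inputs are valid and when its output has been taken.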
Loop example: a bunch of modules communicating asynchronously through channels and tokens. Each module gets a datum or token to work on, arbitrates for shared resources, processes the data and, when ready, passes the result downstream. Each building block implements the request/acknowledge protocol.
Each module “knows” when its inputs become valid, when its own computation has completed, and when its outputs have been consumed so that a new cycle can begin.
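The slides do not say whether channels use a two-phase or four-phase request/acknowledge protocol. Assuming the common four-phase (return-to-zero) variant, each handshake is the transition sequence req+, ack+, req-, ack-, and a small checker like the Python sketch below could validate a trace of one channel's transitions. `count_handshakes` and the trace format are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, bool]  # (signal name, new level)

# Assumed four-phase (return-to-zero) handshake: the only legal order of
# transitions is req+, ack+, req-, ack-, then the cycle repeats.
FOUR_PHASE: List[Event] = [("req", True), ("ack", True), ("req", False), ("ack", False)]


def count_handshakes(events: List[Event]) -> int:
    """Return the number of completed handshakes in a transition trace.

    Raises ValueError at the first transition that breaks the protocol.
    A trailing partial handshake is allowed but not counted.
    """
    for i, ev in enumerate(events):
        expected = FOUR_PHASE[i % 4]
        if ev != expected:
            raise ValueError(f"transition {i}: got {ev}, expected {expected}")
    return len(events) // 4


trace = FOUR_PHASE * 3 + [("req", True)]   # three full cycles, a fourth just started
print(count_handshakes(trace))              # 3

try:
    count_handshakes([("req", True), ("req", False)])  # req dropped before ack
except ValueError as err:
    print(err)
```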
Clockless is an old idea
Why bother?
• Easier implementation at advanced nodes
• Low power: low leakage, high-VT cells
• Fewer concerns about clocking, power grid, OCV (on-chip variation) and sleep modes
• Robustness to aging and voltage swings
• Modularity and integration
Lots of activity
• LARD, Balsa, Tangram: CSP languages
• VerilogCSP: Verilog + macros, for modeling of ..
• Handshake (Philips): out of business
• Tiempo: SystemVerilog + library; full flow, IP provider
• Plus many point tools, but no complete language + flow
So what problem am I trying to solve?
• Good, solid, comprehensive and free
• Entry-level language
• Show a tentative flow to silicon
The proposal
Use slightly extended Verilog as the clockless design-entry language, because it:
• expresses parallelism well
• describes hardware well
• expresses design intent
• is timing-aware
• has all the needed language structures
• is hierarchical and “objective”
• needs only a few additions to make life easier
Procedural Verilog
Verilog written like C describes the sequence of operations. Each “always” block is a sequential process, and many “always” blocks run concurrently. The basic storage element is the latch.
Extensions:
• latch, flipflop
• release, wait overload
• in/out channel
• passive/active channel
• active latch
• several $system functions
    always begin
        wait (dt);                      // wait for dt to go high...
        wait (!dt);                     // ...then low: one complete pulse
        counter = counter + 1;
        if (counter == 100) -> beep;    // trigger the beep event
    end

    always @beep begin
        out = 1;
        #20;                            // delays
        out = 0;
    end
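The level-sensitive `wait (dt); wait (!dt);` pair counts complete high-then-low pulses. As a hedged sketch, the Python function below replays that always block over a list of sampled `dt` levels and reports where the `beep` event would fire. Sampling in discrete steps and the name `beep_times` are assumptions of the model, not part of the language.

```python
from typing import List, Sequence


def beep_times(dt: Sequence[int], beep_at: int = 100) -> List[int]:
    """Return the sample indices at which the 'beep' event would fire.

    Models: always begin wait(dt); wait(!dt); counter = counter + 1;
                        if (counter == beep_at) -> beep; end
    wait() is level-sensitive, so a condition that already holds passes at once.
    """
    fired: List[int] = []
    counter = 0
    waiting_for_high = True              # currently blocked in wait(dt)
    for t, level in enumerate(dt):
        if waiting_for_high and level:
            waiting_for_high = False     # wait(dt) satisfied; now in wait(!dt)
        elif not waiting_for_high and not level:
            waiting_for_high = True      # wait(!dt) satisfied: one full pulse seen
            counter += 1
            if counter == beep_at:       # fires once, as in the original (no reset)
                fired.append(t)
    return fired


samples = [0, 1] * 150       # 150 pulses, each ending on an even index
print(beep_times(samples))   # [200]
```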
The compiler
    latch [15:0] mcounter, scounter;

    initial begin
        mcounter = 0;
    end

    always begin
        wait (!cnt);                  // wait for cnt low...
        wait (cnt);                   // ...then high
        scounter = mcounter + 1;      // compute into the second latch
        mcounter = scounter;          // copy back
    end
The design is turned into a graph: a one-hot ring of state latches.
Each line becomes a state latch.
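To make the “one-hot ring of state latches” idea concrete, the Python sketch below (an illustration, not the compiler's actual output) gives each line of the counter's always block its own state, keeps exactly one state active, and moves the token along when the active line completes. The step functions and `simulate` are hypothetical names; the rule that the token stops at a `wait` until its condition holds is inferred from the slide's code, not from a documented netlist.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]
Step = Callable[[Regs], bool]  # runs one source line; True = token may move on


def counter_ring() -> List[Step]:
    """Hypothetical mapping of the counter's always block: one step per line."""
    def wait_low(r: Regs) -> bool:       # wait (!cnt);
        return r["cnt"] == 0

    def wait_high(r: Regs) -> bool:      # wait (cnt);
        return r["cnt"] == 1

    def increment(r: Regs) -> bool:      # scounter = mcounter + 1;
        r["scounter"] = (r["mcounter"] + 1) & 0xFFFF
        return True

    def copy_back(r: Regs) -> bool:      # mcounter = scounter;
        r["mcounter"] = r["scounter"]
        return True

    return [wait_low, wait_high, increment, copy_back]


def simulate(cnt_samples: List[int]) -> Regs:
    ring = counter_ring()
    token = [1] + [0] * (len(ring) - 1)   # one-hot state latches; line 0 active
    regs: Regs = {"cnt": 0, "mcounter": 0, "scounter": 0}  # initial mcounter = 0
    for level in cnt_samples:
        regs["cnt"] = level
        # Advance the token until a wait blocks. The ring holds both wait(!cnt)
        # and wait(cnt), so one of them stops it for any fixed input level.
        while True:
            i = token.index(1)
            if not ring[i](regs):
                break
            token[i], token[(i + 1) % len(ring)] = 0, 1
        assert sum(token) == 1            # exactly one state latch is set
    return regs


print(simulate([0, 1, 0, 1, 1, 0, 1])["mcounter"])  # 3
```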
Arbitration
Arbitration is needed where a shared resource (such as a RAM) is accessed from two “always” blocks that are not directly related. The compiler inserts arbiters to separate reads from writes, and writes from writes, so a read value cannot be corrupted by a write.
For example:

    output passive channel [15:0] readval;
    always begin
        wait (readval);        // passive channel waits for a request
        readval = mcounter;    // wins arbitration, reads the latch, drives data out
        release readval;       // when the request is negated, de-assert the data
    end
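The arbitration rule can be illustrated in software. In the Python sketch below, `threading.Lock` stands in for a hardware arbiter (a real mutual-exclusion element resolves simultaneous requests in silicon, which this does not model), and a 16-bit latch is deliberately written as two byte halves so that an unarbitrated read could in principle observe a half-finished write. `SharedLatch` and `demo` are hypothetical names.

```python
import threading
from typing import List


class SharedLatch:
    """16-bit latch written as two 8-bit halves (a deliberately non-atomic write)."""

    def __init__(self) -> None:
        self.hi = 0
        self.lo = 0
        self.arbiter = threading.Lock()  # software stand-in for a hardware arbiter

    def write(self, byte: int) -> None:
        with self.arbiter:               # the writer must win arbitration
            self.hi = byte
            self.lo = byte

    def read(self) -> int:
        with self.arbiter:               # the reader must win arbitration too
            return (self.hi << 8) | self.lo


def demo(n: int = 20000) -> List[int]:
    """Run a writer and a reader concurrently; return any torn values observed."""
    latch = SharedLatch()
    torn: List[int] = []

    def writer() -> None:
        for i in range(n):
            latch.write(i & 0xFF)

    def reader() -> None:
        for _ in range(n):
            v = latch.read()
            if (v >> 8) != (v & 0xFF):   # halves disagree: a write seen mid-way
                torn.append(v)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return torn


print(demo())  # []
```

Because both accesses go through the same lock, reads and writes are serialized and every read sees matching halves.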
Usable Verilog Elements

    for (i=0; i<100; i=i+1) begin ... end
    always begin ... end
    while (xx>10) begin ... end
    fork -> eventA; -> eventB; -> eventC; join
    task do_something; ... endtask
    wait (till_something);
    #(delay_time);
    initial begin ... end
    always @eventA begin ... end
    release some;

Also: if, if-else, case, assign and more.
Tentative flow
common testbench for all views
Side Stepping
Suppose we see code like this:

    always @(computeIt) begin
        data = 0;
        for (i = 0; i < 100; i = i + 1) begin
            adat = ram[i];           // read the first operand
            #1;
            bdat = ram[i+100];       // read the second operand
            #1;
            ram[i] = adat - bdat;    // write the difference back
        end
    end

We can use the same extended Verilog syntax to produce clocked, synthesizable RTL. Instead of writing basic clock-reset-data synthesizable RTL, we get to write the design in a procedural fashion.
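One plausible reading of “procedural code to clocked RTL” is: give each statement its own FSM state and spend one clock per state. The Python sketch below assumes exactly that mapping (the slide does not specify it), treats the `#1` delays as the boundaries between states, and checks that the clocked version computes the same RAM contents as a direct transcription of the loop. The RAM word width is not given, so plain Python integers are used; `procedural` and `clocked_fsm` are hypothetical names.

```python
from typing import List, Tuple


def procedural(ram: List[int]) -> List[int]:
    """Direct transcription of the always @(computeIt) loop."""
    ram = list(ram)
    for i in range(100):
        adat = ram[i]
        bdat = ram[i + 100]
        ram[i] = adat - bdat
    return ram


def clocked_fsm(ram: List[int]) -> Tuple[List[int], int]:
    """Hypothetical clocked equivalent: one state per statement, one clock per state.

    Returns the final RAM contents and the number of clock cycles used.
    """
    ram = list(ram)
    state, i, adat, bdat, cycles = "READ_A", 0, 0, 0, 0
    while state != "DONE":
        cycles += 1                      # one rising clock edge
        if state == "READ_A":
            adat = ram[i]
            state = "READ_B"
        elif state == "READ_B":
            bdat = ram[i + 100]
            state = "WRITE"
        elif state == "WRITE":
            ram[i] = adat - bdat
            i += 1
            state = "READ_A" if i < 100 else "DONE"
    return ram, cycles


ram = list(range(200))
result, cycles = clocked_fsm(ram)
print(result == procedural(ram), cycles)  # True 300
```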
Goes on..
The packager creates the best hierarchy and inserts the correct breakers.
The netlist from regular adder synthesis is “dual-railed” here.
The flow assembles the timing constraints.
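“Dual-railed” presumably refers to the common dual-rail encoding, in which each bit travels on two wires: (1, 0) means 1, (0, 1) means 0, (0, 0) is the NULL spacer and (1, 1) is illegal. The Python sketch below assumes that encoding and converts single-rail logic to dual rail with one minterm per input combination (the idea behind DIMS-style synthesis; in a real netlist the minterm gates would be Muller C-elements, which this static model ignores). Completion detection then just checks that every output bit has left NULL. All names are hypothetical, and this is not the flow's actual conversion.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Rail = Tuple[int, int]     # (true wire, false wire) for one bit
NULL: Rail = (0, 0)        # spacer: "no data yet"


def encode(bit: int) -> Rail:
    return (1, 0) if bit else (0, 1)


def decode(r: Rail) -> Optional[int]:
    """0 or 1 for a valid codeword, None for NULL; (1, 1) is illegal."""
    if r == (1, 1):
        raise ValueError("illegal dual-rail codeword (1, 1)")
    return None if r == NULL else r[0]


def dual_rail_gate(fn: Callable[..., int], *ins: Rail) -> Rail:
    """Minterm-style dual-rail gate: the minterm for an input combination fires only
    when every input carries that value; it drives the true wire where fn is 1 and
    the false wire where fn is 0. Any NULL input leaves the output NULL."""
    t = f = 0
    for bits in product((0, 1), repeat=len(ins)):
        if all(r[0] if b else r[1] for r, b in zip(ins, bits)):
            if fn(*bits):
                t = 1
            else:
                f = 1
    return (t, f)


def full_adder(a: Rail, b: Rail, cin: Rail) -> Tuple[Rail, Rail]:
    s = dual_rail_gate(lambda x, y, z: x ^ y ^ z, a, b, cin)
    cout = dual_rail_gate(lambda x, y, z: (x & y) | (z & (x ^ y)), a, b, cin)
    return s, cout


def add(x: int, y: int, width: int = 8) -> Tuple[Optional[int], bool]:
    """Ripple-carry add in dual rail. Returns (sum mod 2**width, complete)."""
    carry = encode(0)
    out: List[Rail] = []
    for i in range(width):
        s, carry = full_adder(encode((x >> i) & 1), encode((y >> i) & 1), carry)
        out.append(s)
    bits = [decode(r) for r in out]
    complete = all(bit is not None for bit in bits)   # completion detection
    value = sum(bit << i for i, bit in enumerate(bits)) if complete else None
    return value, complete


print(add(200, 100))                            # (44, True)
print(full_adder(NULL, encode(1), encode(0)))   # ((0, 0), (0, 0))
```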
Side stepping (again)
FPGA validation needs other tricks to fool the FPGA software into creating the fastest circuit.
Cleaning up
Status
• First implementation of the compiler is working.
• Cadence toolchain is used to assemble the flow to gds.
• Several modules were designed and run through:
• For example: UART, PWM, FIR, PicoBlaze CPU.
• The only validation was with SDF-annotated Verilog and fast SPICE in a tester-like setup.
• The subset appears to be powerful enough to implement these modules. It is still evolving.
• Optimization at various stages is needed to reach the performance targets.
What’s next?
• More code examples to verify the usefulness of the language
• Identify the kinds of designs where this flow can have the biggest advantage and impact
• Select a comprehensive proving-ground project
• Add optimizations to reach performance/area/power goals
• Cover missing validation steps in the flow
Thank You (Hebrew: תודה; Chinese: 谢谢)