a synthesis algorithm for modular design of pipelined circuits
DESCRIPTION
A Synthesis Algorithm for Modular Design of Pipelined Circuits. Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology. Overall Goal. Efficient, Synchronous, Parallel Implementation in Synthesizable Verilog. Modular, - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/1.jpg)
Maria-Cristina MarinescuMartin Rinard
Laboratory for Computer ScienceMassachusetts Institute of Technology
A Synthesis Algorithm for Modular Design of Pipelined
Circuits
![Page 2: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/2.jpg)
Overall Goal
Modular,Asynchronous,
SequentialSpecification
Efficient,Synchronous,
ParallelImplementation
in Synthesizable
Verilog
![Page 3: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/3.jpg)
RESET RESET
Instruction Fetch
RegisterOperand
Fetch
Computeand
Writeback
Example: Specification
PC
IM ... ...
RF wen
inc
br
inc
![Page 4: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/4.jpg)
Specification Properties
• Basic Concepts•State
•Registers and Memories•Conceptually Infinite Queues
•Modules (state transformers)• Queues Provide Modularity
•Decouple Modules•Enable Independent Development•Promote Reusable Modular Designs
![Page 5: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/5.jpg)
RESET RESET
Instruction Fetch
RegisterOperand
Fetch
Computeand
Writeback
Example: Implementation
PC
IM
RF wen
inc
br
inc
![Page 6: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/6.jpg)
Implementation Issues
• Synthesizing Efficient Combinational Logic
• Queue Finitization• Synchronous Global Scheduling
![Page 7: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/7.jpg)
Specification Language
![Page 8: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/8.jpg)
Type Declarationstype reg = int(3), val = int(8), loc = int(8);
type ins = <INC reg> | <JZ reg loc>;
type irf = <INC reg val> | <JZ val loc>;
State Declarationsvar PC: loc, IM: ins[N], RF: val[8];
var IQ: queue(ins), RQ: queue(irf);
![Page 9: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/9.jpg)
Modules
• Each module is set of update rules• Each Update Rule Consists of
•Precondition•Action (set of updates)
• Rule is enabled (and can execute) if precondition is true in current state
• When rule executes, atomically applies updates in action to produce new state
![Page 10: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/10.jpg)
Update Rules And Modules
• Instruction Fetch ModuleTRUE
iq = append(iq,im[pc]), pc = pc + 1;
• Register Operand Fetch Module<INC r> = head(iq) and notin(rq, <INC r _>)
iq = tail(iq), rq = append(rq, <INC r rf[r]>);
<JZ r l> = head(iq) and notin(rq, <INC r _>) iq = tail(iq), rq = append(rq, <JZ rf[r]
l>);
![Page 11: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/11.jpg)
Update Rules And Modules
• Compute and Writeback Module<INC r v> = head(rq)
rf = rf[r = v+1], rq = tail(rq);
<JZ v l> = head(rq) and (v == 0) pc = l, iq = nil, rq = nil;
<JZ v l> = head(rq) and (v !=0) rq = tail(rq);
![Page 12: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/12.jpg)
Abstract Model of Execution
• Conceptually, system execution is a sequence of rule executions
• while TRUE choose an enabled rule
execute rule obtain new state
• Concepts in Abstract Execution Model•Rules execute atomically•Rules execute asynchronously•Rules execute sequentially
![Page 13: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/13.jpg)
Synthesis Algorithm
![Page 14: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/14.jpg)
Synthesis Algorithm
• Starting Point•Asynchronous, sequential abstract
execution•Conceptually infinite queues
• Goal: efficient synchronous global schedule•In each clock cycle, multiple rules execute
•Synchronously and Concurrently •(pipeline stages move together)
•Implement each queue with a finite hardware buffer
•Can read and write buffer in same cycle
![Page 15: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/15.jpg)
Basic Idea
• At Each Clock Cycle•Check each rule to see if enabled
•If so, atomically update state to reflect execution
•For each variable, generate expression that specifies new value at end of cycle
• Challenge:sequential, atomic semantics for rules
• Solution: symbolic rule execution
![Page 16: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/16.jpg)
Final Result for PC
if (<JZ v location> = rq and v == 0) new pc = location
else if ((iq == nil) or (<INC r> = iq and <INC r _> != rq)
or (<JZ r l> = iq and <INC r _> != rq))
new pc = pc+1 else
new pc = pc
![Page 17: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/17.jpg)
Algorithm Outline
• Rule Numbering: for symbolic execution• Relaxation: shorten critical path by testing
intial state• Queue Finitization: ensure rules execute
only if will be room for the result in output queues
• Symbolic Execution• Optimizations• Synthesizable Verilog Generation: from
optimized expressions
![Page 18: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/18.jpg)
Rule Numbering
• Goal: •Resolve Conflicts Between Parallel Rule
Executions• Approach:
•For each state variable, number versions according to order
•Feed results of previous rule into next rule
![Page 19: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/19.jpg)
• TRUE iq1 = append(iq0, im[pc0]), pc1 =
pc0+1;
• <INC r> = head(iq1) and notin(rq1, <INC r _>)
iq2 = tail(iq1), rq2 = append(rq1,
<INC r rf1[r]>);
• <JZ r l> = head(iq2) and notin(rq2, <INC r _>)
iq3 = tail(iq2),
rq3 = append(rq2, <JZ rf2[r] l>);
• <INC r v> = head(rq3)
rf4 = rf3[r v+1], rq4 = tail(rq3);
• <JZ v l> = head(rq4) and v != 0
rq5 = tail(rq4);
• <JZ v l> = head(rq5) and v == 0
pc6 = l, rq6 = nil, iq6 = nil;
![Page 20: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/20.jpg)
Relaxation• Issue:
rule numbering may produce long clock cycle
• Solution: for each rule Ri with precondition
Pi for each variable instance vi in precondition Pi
replace vi with its earliest safe version
... Rk-1: Pk-1 -> vk = ... ... Ri : Pi(vi,...) -> ... ...
vk safe for vi if either•Pi[vk/vi] implies Pi
•(Pi,Pk-1) mutually exclusive
0 1 2 3 =>0 1
3
2
![Page 21: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/21.jpg)
Relaxation Result
• Queues separate pipeline stages• Items traverse one stage per clock
cycle• Safety: If a rule executes in new system
•Then it also executes in old system•And it generates same result
• Liveness: After relaxation, all rules test initial state
•If rule enabled in old system but not in new system, then
•Some rule executes in new system
![Page 22: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/22.jpg)
• TRUE iq1 = append(iq0, im[pc0]),
pc1 = pc0+1;
• <INC r> = head(iq0) and notin(rq0, <INC r _>)
iq2 = tail(iq1), rq2 = append(rq1, <INC r rf1[r]>);
• <JZ r l> = head(iq0) and notin(rq0, <INC r _>)
iq3 = tail(iq2), rq3 = append(rq2, <JZ rf2[r] l>);
• <INC r v> = head(rq0)
rf4 = rf3[r v+1], rq4 = tail(rq3);
• <JZ v l> = head(rq0) and v != 0
rq5 = tail(rq4);
• <JZ v l> = head(rq0) and v == 0
pc6 = l, rq6 = nil, iq6 = nil;
![Page 23: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/23.jpg)
Queue Finitization
• Issue:•Conceptually unbounded queues•Finite hardware buffers
• Assumption: queues start within length at beginning of cycle
• Goal: generate circuit that makes queues remain within length at end of cycle
• Basic Approach: •Before enabled rule executes•Be sure will be room for result in output
queues at end of clock cycle
![Page 24: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/24.jpg)
Queue Finitization Algorithm
• Build Producer-Consumer Graph•Nodes are rules•Edge between rules if first inserts into queue and second removes from queue
• In Example:
12
3
4
5
6
![Page 25: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/25.jpg)
Acyclic Graphs• Process Rules in Topological Sort Order
•Augment execution precondition•If rule inserts into a queue, require that either•there is room in queue when rule executes or
•future rules will execute and remove items to make room in queue
• Each queue has counter of number of elements in queue at start of cycle
• Combinational logic tracks queue insertions and deletions
![Page 26: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/26.jpg)
Example
Instruction Fetch Rule After Queue Finitization
Empty(iq0) or
(<JZ v l> = head(rq0) and v == 0) or
<INC r> = head(iq0) and notin(rq0, <INC r _>) or
<JZ r l> = head(iq0) and notin(rq0, <INC r _>)
iq1 = append(iq0, im[pc0]);
pc1 = pc0+1;
![Page 27: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/27.jpg)
Pipeline Implications
• Counter becomes presence bit for single element queues
• Additional preconditions can be viewed as pipeline stall logic
• Design can be written to generate pipeline forwarding/bypassing instead of stall
![Page 28: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/28.jpg)
Cyclic Graphs• Cyclic Graphs lead to Cyclic Dependences
•Rule 1 depends on rule 2 to remove an item from a queue
•But rule 2 depends on rule 1 to remove an item from another queue
• Algorithm from acyclic case would generate recursive preconditions
rule 2rule 1
![Page 29: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/29.jpg)
Solution to Cyclic Dependence Problem
• Groups of rules must execute together
• Use depth-first search on producer-consumer graph to find cyclic groups
• Augment preconditions to allow all rules in cycle to execute together
• Extensions include paths into and out of cyclic group
![Page 30: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/30.jpg)
Symbolic Execution
• Substitute out all intermediate versions of variables
• Obtain expression for last version of each variable
• Each expression defines new value of corresponding variable
![Page 31: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/31.jpg)
Optimizations
• Optimize expressions from symbolic execution•CSE: avoid unnecessary replication of
HW•Mutual Exclusion Testing:
•Eliminate computation of values that never occur in practice as result of mutually exclusive preconditions
![Page 32: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/32.jpg)
Symbolic Execution with Optimization
Final result for rq, assuming single item queues
if <JZ v l> = head(rq) and v==0new rq = nil
else if <INC r>=head(iq) and notin(rq,<INC r _>)new rq = <INC r rf[r]>
else if <JZ r l>=head(iq) and notin(rq,<INC r _>)new rq = <JZ rf[r] l>
else new rq = nil
![Page 33: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/33.jpg)
Verilog Generation
• Synthesize HW directly from expressions:•Each queue as one or more registers
•Each memory variable as library block
•Each state variable as one or more registers, depending on type
•Each expression as combinational logic that feeds back into corresponding registers
![Page 34: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/34.jpg)
Experimental Results• We have implemented synthesis
system• Used system to generate synthesizable
Verilog for several specifications
(map effort medium, area effort low, constraints 10ns)
Benchmark Cycle Time Area (cells)Bubblesort 9.34ns ~370Butterfly 9.57ns ~412Processor 11.28ns ~387Filter 9.51ns ~252
![Page 35: A Synthesis Algorithm for Modular Design of Pipelined Circuits](https://reader036.vdocuments.us/reader036/viewer/2022062500/56815a94550346895dc80dbf/html5/thumbnails/35.jpg)
Conclusion• Starting Point: (Good for Designer)
Modular, Asynchronous, Sequential Specification with Conceptually Infinite Queues
• Ending Point: (Good for Implementation)
Efficient, Synchronous, Globally Scheduled, Parallel Implementation with Finite Queues in Synthesizable Verilog
• Variety of Techniques:•Symbolic Execution•Queue Finitization