Advanced Digital Design: Asynchronous EDA
by A. Steininger, J. Lechner and R. Najvirt, Vienna University of Technology
© A. Steininger & J. Lechner & R. Najvirt / TU Vienna, Lecture "Advanced Digital Design"
Overview
- Synchronous-Asynchronous Direct Translation (SADT)
- Null Convention Logic
- Syntax-Directed Compilation (Balsa)
- Martin Synthesis (Caltech Asynchronous Synthesis Tools)
Synchronous-Asynchronous Direct Translation (SADT)
- Starting point: synchronous circuit description in a standard HDL
- Synthesis with conventional tools into a synchronous gate-level netlist
- Transformation of the synchronous netlist into an asynchronous netlist
- Technology mapping
- Place and route
- Timing verification
De-synchronization
- An SADT approach
- Design style: bundled data
- Flip-flops are replaced by latches
- The clock is replaced by local asynchronous controllers
- De-synchronized circuits ...
  - never halt (liveness)
  - perform the same computations as the synchronous circuit (flow equivalence)
De-synchronization: Conversion Steps
1. Conversion of flip-flops to latches: each D-FF is split into a master and a slave latch
2. Generation of delay elements for the request signals, matched to the length of the critical path of the combinational logic
3. Implementation and wiring of asynchronous latch controllers
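Step 2 comes down to a longest-path computation over the combinational logic between two latch stages. The sketch below makes two assumptions. The netlist format is hypothetical: each gate name maps to a delay in ps and its fan-in. The fixed safety margin `margin_ps` is also hypothetical. A real flow would take these numbers from static timing analysis of the mapped netlist.

```python
from typing import Dict, List, Tuple

# Hypothetical combinational netlist between two latch stages:
# gate name -> (gate delay in ps, fan-in node names).
# Names that are not keys (q0, q1, q2) stand for outputs of the source latches.
Netlist = Dict[str, Tuple[int, List[str]]]

NETLIST: Netlist = {
    "g1": (20, ["q0", "q1"]),
    "g2": (35, ["q1"]),
    "g3": (25, ["g1", "g2"]),
    "g4": (15, ["g3", "q2"]),
}


def critical_path_delay(netlist: Netlist, outputs: List[str]) -> int:
    """Longest latch-to-output delay through an acyclic netlist."""
    arrival: Dict[str, int] = {}

    def arrive(node: str) -> int:
        if node not in netlist:          # latch output: data valid at t = 0
            return 0
        if node not in arrival:          # memoise each gate's arrival time
            delay, fanin = netlist[node]
            arrival[node] = delay + max(arrive(f) for f in fanin)
        return arrival[node]

    return max(arrive(o) for o in outputs)


def matched_delay(netlist: Netlist, outputs: List[str], margin_ps: int = 10) -> int:
    """Delay for the request line: critical path plus a (hypothetical) margin."""
    return critical_path_delay(netlist, outputs) + margin_ps


print(critical_path_delay(NETLIST, ["g4"]))  # 75
print(matched_delay(NETLIST, ["g4"]))        # 85
```

The margin is added because the delay element must never be faster than the logic it models. Otherwise the latch controller would open the next latch before the data is stable.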
De-synchronization: Circuit Architecture
[Figure: synchronous circuit and the corresponding de-synchronized circuit; Cortadella et al., 06]
De-synchronization: Asynchronous Controllers
- Controllers for the master/slave latches use a 4-phase protocol
- Different controller implementations with more or less concurrency are possible:
  - Non-overlapping
  - Semi-decoupled 4-phase
  - Fully-decoupled 4-phase
  - De-synchronization control
- More concurrency => faster pipeline
- More concurrency => larger controllers
De-synchronization: Flow Equivalence
Definition: Two circuits are flow-equivalent if they ...
- have the same set of latches, and
- for each latch, store the same sequence of values in both circuits
[Cortadella et al., 06]
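Flow equivalence compares only the per-latch value sequences. It can therefore be checked on simulation traces no matter when each latch was written. Below is a minimal sketch that assumes traces are lists of hypothetical `(time, latch, value)` write events. The latch names and timings are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sequences = Dict[str, List[int]]        # latch name -> sequence of stored values


def per_latch_sequences(events: List[Tuple[int, str, int]]) -> Sequences:
    """Group (time, latch, value) write events into one value sequence per latch."""
    seqs: Dict[str, List[int]] = defaultdict(list)
    for _, latch, value in sorted(events):   # sort by time stamp
        seqs[latch].append(value)
    return dict(seqs)


def flow_equivalent(a: Sequences, b: Sequences) -> bool:
    """Same set of latches, and the same stored-value sequence for every latch."""
    return a.keys() == b.keys() and all(a[name] == b[name] for name in a)


# Synchronous reference: every latch is written once per clock cycle (period 10).
sync_events = ([(10 * t, "L1", v) for t, v in enumerate([0, 1, 2])]
               + [(10 * t, "L2", v) for t, v in enumerate([5, 6, 7])])
# Hypothetical de-synchronized run: irregular write times, same values per latch.
desync_events = [(3, "L1", 0), (4, "L2", 5), (9, "L1", 1),
                 (17, "L2", 6), (18, "L1", 2), (30, "L2", 7)]

print(flow_equivalent(per_latch_sequences(sync_events),
                      per_latch_sequences(desync_events)))  # True
```

Note that the relative order of writes to *different* latches is ignored on purpose. The definition only constrains each latch individually.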
De-synchronization: Pros/Cons
Advantages
- Use of standard HDLs
- Use of industrial-strength synthesis tools
- Almost no re-education of hardware designers necessary
- Simple porting of legacy designs
- Negligible area overhead compared to a synchronous implementation
Disadvantages
- A 1-to-1 mapping of synchronous circuits can lead to sub-optimal designs
Click Elements
- Published as an implementation style for data-driven compilation (Haste)
- Also useful for implementing asynchronous equivalents of synchronous circuits
- Use flip-flops for storage
- Most elements can be implemented with cells from a standard (synchronous) library
- An arbiter is still required, though not for SADT
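The behaviour of a click stage fits in a few lines. The sketch assumes the basic 2-phase click stage, in which one phase flip-flop drives both the acknowledge to the predecessor (`a_ack`) and the request to the successor (`b_req`). A click occurs when a new input request is pending and the previous output has been acknowledged. The click clocks the data flip-flops and toggles the phase. The names `ClickStage` and `step` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClickStage:
    """Behavioural sketch of one 2-phase click pipeline stage (assumed structure)."""
    phase: int = 0                # phase flip-flop; drives both a_ack and b_req
    data: Optional[int] = None    # data flip-flops, clocked by the click signal

    @property
    def a_ack(self) -> int:       # acknowledge towards the predecessor
        return self.phase

    @property
    def b_req(self) -> int:       # request towards the successor
        return self.phase

    def click(self, a_req: int, b_ack: int) -> bool:
        # New input pending (a_req != a_ack) and last output consumed (b_ack == b_req).
        return a_req != self.phase and b_ack == self.phase

    def step(self, a_req: int, b_ack: int, d_in: int) -> bool:
        """Evaluate the click logic once; on a click, capture data and toggle phase."""
        if not self.click(a_req, b_ack):
            return False
        self.data = d_in
        self.phase ^= 1
        return True


stage = ClickStage()
print(stage.step(a_req=1, b_ack=0, d_in=5), stage.data)  # True 5  (first token)
print(stage.step(a_req=1, b_ack=0, d_in=9), stage.data)  # False 5 (no new request)
print(stage.step(a_req=0, b_ack=0, d_in=7), stage.data)  # False 5 (successor busy)
print(stage.step(a_req=0, b_ack=1, d_in=7), stage.data)  # True 7
```

Only flip-flops and simple combinational logic are involved. This is why such stages map onto cells from a standard synchronous library.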
Null Convention Logic: Synthesis
(3NCL: three-valued logic with DATA0, DATA1 and NULL; 2NCL: its dual-rail binary implementation)
- RTL synthesis: transform VHDL/Verilog into a 3NCL netlist
  - The netlist contains only AND and INV gates
  - Off-the-shelf synthesis tools
  - NULL values are treated as "don't care"
  - Logic optimizations
- Dual-rail expansion: 3NCL netlist to 2NCL netlist
  - DIMS implementation of the AND and INV gates
  - Produces a delay-insensitive circuit
  - Logic optimizations
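The dual-rail expansion replaces each 3NCL signal by two rails. Below is a minimal sketch that assumes the encoding NULL = (0, 0), DATA0 = (1, 0), DATA1 = (0, 1), with (1, 1) illegal. It also shows that an INV costs nothing after expansion, because it becomes a crossing of the two rails.

```python
from typing import Optional, Tuple

DualRail = Tuple[int, int]     # (rail0, rail1); rail naming is an assumption
NULL: DualRail = (0, 0)        # spacer between two data waves


def encode(v: Optional[int]) -> DualRail:
    """Map a 3NCL value (0, 1, or None for NULL) to its 2NCL dual-rail code."""
    if v is None:
        return NULL
    return (1, 0) if v == 0 else (0, 1)


def decode(dr: DualRail) -> Optional[int]:
    """Inverse of encode; rejects the illegal code (1, 1)."""
    if dr == NULL:
        return None
    if dr == (1, 0):
        return 0
    if dr == (0, 1):
        return 1
    raise ValueError("illegal dual-rail code (1, 1)")


def inv(dr: DualRail) -> DualRail:
    """Dual-rail INV: just cross the two rails, no gate needed."""
    rail0, rail1 = dr
    return (rail1, rail0)


for v in (0, 1, None):
    print(v, "->", decode(inv(encode(v))))
# 0 -> 1
# 1 -> 0
# None -> None
```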
Dual Rail NAND
DIMS implementation [Ligthart et al.,2000]
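DIMS (delay-insensitive minterm synthesis) uses one C-element per combination of input rails. The C-element outputs that belong to the same output rail are ORed. The sketch below models a dual-rail NAND this way at the behavioural level. The class names are hypothetical and gate delays are ignored. It shows the two properties that make the circuit delay-insensitive:
- the output stays NULL until both inputs carry data;
- the output returns to NULL only after both inputs are NULL again.

```python
from typing import Dict, Tuple

DualRail = Tuple[int, int]   # (rail0, rail1): NULL=(0,0), DATA0=(1,0), DATA1=(0,1)


class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, x: int, y: int) -> int:
        if x and y:
            self.out = 1
        elif not x and not y:
            self.out = 0
        return self.out


class DimsNand:
    """DIMS: one C-element per minterm of the input rails, OR-ed per output rail."""

    def __init__(self) -> None:
        self.c: Dict[Tuple[int, int], CElement] = {
            (i, j): CElement() for i in (0, 1) for j in (0, 1)}

    def evaluate(self, a: DualRail, b: DualRail) -> DualRail:
        # Minterm (i, j) combines rail i of input a with rail j of input b.
        m = {(i, j): self.c[(i, j)].update(a[i], b[j]) for i in (0, 1) for j in (0, 1)}
        y0 = m[(1, 1)]                          # NAND = 0 only for a = b = 1
        y1 = m[(0, 0)] | m[(0, 1)] | m[(1, 0)]  # NAND = 1 otherwise
        return (y0, y1)


g = DimsNand()
print(g.evaluate((0, 1), (0, 1)))  # a=1, b=1 -> (1, 0), i.e. DATA0
print(g.evaluate((0, 0), (0, 0)))  # NULL wave -> (0, 0)
print(g.evaluate((1, 0), (0, 0)))  # only a has arrived -> still NULL (0, 0)
print(g.evaluate((1, 0), (0, 1)))  # a=0, b=1 -> (0, 1), i.e. DATA1
print(g.evaluate((0, 0), (0, 1)))  # b not yet NULL -> output holds (0, 1)
```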
Null Convention Logic: Technology Mapping
- The DIMS implementation is inefficient, hence technology mapping onto threshold gates
- The circuit functionality is fully described by the set function of the DIMS implementation
- DIMS smoothing: derive a Boolean network representing the set function
- Each threshold gate has a specific set function
- Perform logic optimization and map the Boolean network onto the available threshold gates
Dual Rail NAND
[Figure: DIMS implementation and its set function; Ligthart et al., 2000]
Null Convention Logic: Threshold Gates
- Library of threshold gates by Theseus
- Covers all unate functions with up to 4 inputs
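A threshold gate THmn, using the naming common for NCL gates, behaves as follows. It sets its output once at least m of its n inputs are asserted and resets it only when all inputs are deasserted. In between, it holds its value (hysteresis). The sketch models only such unweighted THmn gates, not the weighted variants found in NCL gate libraries. With this model, TH22 behaves like a C-element and TH1n like an OR gate.

```python
from typing import List


class ThresholdGate:
    """THmn gate: sets when at least m of its n inputs are 1,
    resets only when all n inputs are 0, and holds its output otherwise."""

    def __init__(self, m: int, n: int) -> None:
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m, self.n, self.out = m, n, 0

    def update(self, inputs: List[int]) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        ones = sum(inputs)
        if ones >= self.m:
            self.out = 1          # set condition: threshold reached
        elif ones == 0:
            self.out = 0          # reset condition: all inputs NULL
        return self.out           # otherwise hold the previous value


th23 = ThresholdGate(2, 3)
print([th23.update(x) for x in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])])
# [0, 1, 1, 0]
```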
Syntax-Directed Compilation
- 1-to-1 mapping of language constructs to handshake circuit components
- Uses a library of highly optimized standard-cell components, which simplifies physical synthesis and verification
- Allows an experienced designer to easily envision the resulting circuit, but limits the optimization potential
Balsa: Handshake Circuits
- Approx. 40 handshake components, connected over channels
- Channel kinds:
  - with an associated data path
  - pure control channels (no data transferred)
- Active ports initiate communication; passive ports respond to requests
- Push channel: data flows from the active to the passive port
- Pull channel: data flows from the passive to the active port
Example: Handshake Components
- Fetch (): transfers data upon request
- Case (@): conditional control flow element
Source: [Balsa Manual]
Example: Modulo-10 Counter
import [balsa.types.basic]

type C_size is nibble
constant max_count = 9

procedure count10 (sync aclk; output count : C_size) is
  variable count_reg : C_size
  variable tmp : C_size
begin
  loop
    sync aclk ;
    if count_reg /= max_count then
      tmp := (count_reg + 1 as C_size)
    else
      tmp := 0
    end || count <- count_reg ;
    count_reg := tmp
  end -- loop
end -- begin
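Behaviourally, each `sync aclk` handshake outputs the current register value and loads the next one. The Python model below makes two assumptions. `count_reg` starts at 0, since the Balsa code does not initialise it. The cast `as C_size` is modelled as truncation to 4 bits. The function name `count10` mirrors the procedure, but the model is only a behavioural sketch, not what the Balsa compiler generates.

```python
from typing import List

MAX_COUNT = 9      # constant max_count = 9
NIBBLE = 16        # C_size is a nibble, i.e. values 0..15


def count10(syncs: int, count_reg: int = 0) -> List[int]:
    """Values sent on `count`, one per `sync aclk` handshake.

    The initial value of count_reg is an assumption of this model."""
    sent: List[int] = []
    for _ in range(syncs):                       # loop ... sync aclk ;
        # Both branches of "||" only read count_reg, so their order is irrelevant.
        if count_reg != MAX_COUNT:
            tmp = (count_reg + 1) % NIBBLE       # (count_reg + 1 as C_size)
        else:
            tmp = 0
        sent.append(count_reg)                   # count <- count_reg
        count_reg = tmp                          # count_reg := tmp
    return sent


print(count10(12))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
```

The separate variable `tmp` is what lets the `if` and the output run in parallel. Neither branch writes `count_reg`, which is only updated after both have finished.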
[Figure: modulo-10 counter example; source: Balsa Manual]
Martin synthesis
- The so-called Martin synthesis process is the seminal work of the asynchronous design group around A. J. Martin at Caltech
- Design entry is CHP (Communicating Hardware Processes); the result is a PRS (production rule set)
- Performs several transformations with designer-modifiable intermediate steps
Process Decomposition
- First transformation
- Reduces processes with complex control structures to simple concurrent subprocesses
- Either syntax-directed (SDD) or data-driven (DDD)
Syntax Directed Decomposition
Rule: A process P containing a construct S can be replaced by two processes P1 and P2 and a new channel C. P1 is P with S replaced by the communication C, and P2 has the form *[[#C -> S; C]].
E.g.
P:  *[A; *[B1 -> S1 [] B2 -> S2]; B]
becomes
P1: *[A; C; B]
P2: *[[#C & B1 -> S1 [] #C & B2 -> S2 [] #C & ~B1 & ~B2 -> C]]
Data Driven Decomposition
- More fine-grained than SDD; at the end, clustering can be performed to merge subprocesses again for better performance
- First transformation to dynamic single assignment (DSA) form: each variable can be written only once in each main loop iteration, e.g.
  *[A?a; X!a; B?a; Y!a]
  becomes
  *[A?a1; X!a1; B?a2; Y!a2]
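For loop bodies consisting only of communications, the DSA renaming is mechanical. The sketch below uses a hypothetical `to_dsa` helper. It assumes every variable is received before it is sent within one iteration and does not handle assignments or selections.

```python
from typing import Dict, List, Tuple

# One action of a loop body: (channel, '?' or '!', variable).
Action = Tuple[str, str, str]


def to_dsa(body: List[Action]) -> str:
    """Rename variables so that each one is written at most once per iteration.

    Assumes every variable is written (received) before it is read (sent)."""
    version: Dict[str, int] = {}
    out: List[str] = []
    for chan, op, var in body:
        if op == "?":                       # receive = write -> fresh name
            version[var] = version.get(var, 0) + 1
        elif var not in version:
            raise ValueError(f"{var} read before it is written")
        out.append(f"{chan}{op}{var}{version[var]}")
    return "*[" + "; ".join(out) + "]"


print(to_dsa([("A", "?", "a"), ("X", "!", "a"), ("B", "?", "a"), ("Y", "!", "a")]))
# *[A?a1; X!a1; B?a2; Y!a2]
```

After renaming, a1 and a2 are independent, so the two halves of the loop body no longer share a variable. This is the precondition for splitting them into separate processes.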
Data Driven Decomposition (2)
- The second transformation is projection
- First, transformations that enable projection, e.g. variable duplication and channel addition:
  *[A?a; x := a, y := ~a; X!x, Y!y]
  *[A?a; a1 := a, a2 := a; x := a1, y := ~a2; X!x, Y!y]
  *[A?a; {Ax!a, Ax?a1}, {Ay!a, Ay?a2}; x := a1, y := ~a2; X!x, Y!y]
- Then projection onto sets of assignments:
  Sets: {A?, a, Ax!, Ay!}, {Ax?, a1, x, X!}, {Ay?, a2, y, Y!}
  Projection: *[A?a; Ax!a, Ay!a], *[Ax?a1; x := a1; X!x], *[Ay?a2; y := ~a2; Y!y]
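To see that projection preserves behaviour, run the original process and the three projected processes on the same input stream. The sketch assumes single-bit values, so `~a` becomes `a ^ 1`. It also models channels as unbounded FIFO buffers (`collections.deque`). CHP channels are unbuffered, but in this choice-free example buffering changes only timing, not the value sequences on X and Y. The function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def original(inputs: List[int]) -> Tuple[List[int], List[int]]:
    """*[A?a; x := a, y := ~a; X!x, Y!y] on single-bit values."""
    xs: List[int] = []
    ys: List[int] = []
    for a in inputs:
        x, y = a, a ^ 1
        xs.append(x)
        ys.append(y)
    return xs, ys


def projected(inputs: List[int]) -> Tuple[List[int], List[int]]:
    """The three projected processes, connected by channels Ax and Ay."""
    A: Deque[int] = deque(inputs)
    Ax: Deque[int] = deque()
    Ay: Deque[int] = deque()
    xs: List[int] = []
    ys: List[int] = []

    def split() -> None:          # *[A?a; Ax!a, Ay!a]
        a = A.popleft()
        Ax.append(a)
        Ay.append(a)

    def proc_x() -> None:         # *[Ax?a1; x := a1; X!x]
        a1 = Ax.popleft()
        xs.append(a1)

    def proc_y() -> None:         # *[Ay?a2; y := ~a2; Y!y]
        a2 = Ay.popleft()
        ys.append(a2 ^ 1)

    procs = [(A, split), (Ax, proc_x), (Ay, proc_y)]
    # Fire every process whose input channel holds a value until all are empty.
    while any(chan for chan, _ in procs):
        for chan, proc in procs:
            if chan:
                proc()
    return xs, ys


bits = [1, 0, 1, 1]
print(original(bits) == projected(bits))  # True
```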
Handshake Expansion (HSE)
- Each communication channel is replaced by handshake signals, e.g.
  *[…; C; …], *[#C -> …; C]
  is transformed into (4-phase handshake)
  *[…; r := 1; [a]; r := 0; [~a]; …], *[r -> …; a := 1; [~r]; a := 0]
- Reshuffling can then be used to increase concurrency and performance (yielding different handshake controllers)
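The 4-phase expansion can be simulated by running both HSE processes as Python generators under a simple interleaving scheduler. The scheduler is our own construction, not part of the Martin flow. Each process yields the condition it waits for ([a], [~a], [r], [~r]) and is resumed once that condition holds on the shared wires. The function names and the iteration count `n` are hypothetical.

```python
from typing import Callable, Dict, Generator, List

Wires = Dict[str, int]
Wait = Callable[[Wires], bool]
Proc = Generator[Wait, None, None]


def active(w: Wires, log: List[str], n: int) -> Proc:
    """*[...; r := 1; [a]; r := 0; [~a]; ...] for n iterations."""
    for _ in range(n):
        w["r"] = 1
        log.append("r+")
        yield lambda s: s["a"] == 1          # [a]
        w["r"] = 0
        log.append("r-")
        yield lambda s: s["a"] == 0          # [~a]


def passive(w: Wires, log: List[str], n: int) -> Proc:
    """*[r -> ...; a := 1; [~r]; a := 0] for n iterations."""
    for _ in range(n):
        yield lambda s: s["r"] == 1          # wait for the request
        w["a"] = 1
        log.append("a+")
        yield lambda s: s["r"] == 0          # [~r]
        w["a"] = 0
        log.append("a-")


def run(n: int) -> List[str]:
    w: Wires = {"r": 0, "a": 0}
    log: List[str] = []
    waiting: Dict[Proc, Wait] = {}
    for p in (active(w, log, n), passive(w, log, n)):
        try:
            waiting[p] = next(p)             # run each process up to its first wait
        except StopIteration:
            pass
    while waiting:
        progress = False
        for p, cond in list(waiting.items()):
            if cond(w):                      # wait satisfied: resume the process
                progress = True
                try:
                    waiting[p] = next(p)
                except StopIteration:
                    del waiting[p]
        if not progress:
            raise RuntimeError("deadlock")
    return log


print(run(2))  # ['r+', 'a+', 'r-', 'a-', 'r+', 'a+', 'r-', 'a-']
```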
Production Rule Expansion (PRE)
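- Transforms the HSE into production rules (PR) in three steps: state variable insertion, PR generation, symmetrisation
- Sequencing must be implemented explicitly: in the HSE below, the state right after [Lr] equals the state right after [~Ra] (Lr = 1, Rr = Ra = La = 0), so the naive rules enable both Rr+ and La+ there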
*[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]
Lr -> Rr+
Ra -> Rr-
~Ra -> La+
~Lr -> La-
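In the HSE, the state right after [Lr] and the state right after [~Ra] are identical (Lr = 1, Rr = Ra = La = 0). Yet the HSE expects Rr+ in the first case and La+ in the second. The sketch below uses our own encoding of the four naive rules (guard, signal, new value). It lists which transitions are enabled in that state.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Rule = Tuple[Callable[[State], bool], str, int]   # guard -> signal := value

NAIVE_RULES: List[Rule] = [
    (lambda s: s["Lr"] == 1, "Rr", 1),   # Lr  -> Rr+
    (lambda s: s["Ra"] == 1, "Rr", 0),   # Ra  -> Rr-
    (lambda s: s["Ra"] == 0, "La", 1),   # ~Ra -> La+
    (lambda s: s["Lr"] == 0, "La", 0),   # ~Lr -> La-
]


def enabled(state: State) -> List[str]:
    """Transitions whose guard holds and that would actually change a signal."""
    return sorted(f"{sig}{'+' if val else '-'}"
                  for guard, sig, val in NAIVE_RULES
                  if guard(state) and state[sig] != val)


# State after [Lr] (HSE expects only Rr+) and, identically, after [~Ra]
# (HSE expects only La+):
state = {"Lr": 1, "Rr": 0, "Ra": 0, "La": 0}
print(enabled(state))  # ['La+', 'Rr+']
```

Both transitions being enabled in one state is exactly the ambiguity that the state variable x resolves.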
State variable insertion:
*[[Lr]; Rr := 1; [Ra]; x := 1; [x]; Rr := 0; [~Ra]; La := 1; [~Lr]; x := 0; [~x]; La := 0]
~x & Lr -> Rr+
Ra -> x+
x -> Rr-
x & ~Ra -> La+
~Lr -> x-
~x -> La-
Symmetrisation:
~x & Lr -> Rr+
~Lr | x -> Rr-
Ra -> x+
~Lr -> x-
x & ~Ra -> La+
Ra | ~x -> La-
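The symmetrised rule set can be checked by closing it with a 4-phase environment on both sides and firing one enabled transition at a time. The environment rules are an assumption and are not part of the slides. With them, the circuit's own transitions should follow the HSE order Rr+, x+, Rr-, La+, x-, La-.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Rule = Tuple[Callable[[State], bool], str, int]     # guard -> signal := value

# Symmetrised production rules of the circuit.
CIRCUIT: List[Rule] = [
    (lambda s: s["x"] == 0 and s["Lr"] == 1, "Rr", 1),   # ~x & Lr -> Rr+
    (lambda s: s["Lr"] == 0 or s["x"] == 1, "Rr", 0),    # ~Lr | x -> Rr-
    (lambda s: s["Ra"] == 1, "x", 1),                    # Ra -> x+
    (lambda s: s["Lr"] == 0, "x", 0),                    # ~Lr -> x-
    (lambda s: s["x"] == 1 and s["Ra"] == 0, "La", 1),   # x & ~Ra -> La+
    (lambda s: s["Ra"] == 1 or s["x"] == 0, "La", 0),    # Ra | ~x -> La-
]

# Hypothetical 4-phase environment on both sides (not part of the slides).
ENVIRONMENT: List[Rule] = [
    (lambda s: s["La"] == 0, "Lr", 1),   # left: issue a request when ack is low
    (lambda s: s["La"] == 1, "Lr", 0),   # left: withdraw request after ack
    (lambda s: s["Rr"] == 1, "Ra", 1),   # right: acknowledge a request
    (lambda s: s["Rr"] == 0, "Ra", 0),   # right: release ack after request drops
]


def simulate(steps: int) -> List[str]:
    state: State = {"Lr": 0, "La": 0, "Rr": 0, "Ra": 0, "x": 0}
    trace: List[str] = []
    for _ in range(steps):
        # A rule is enabled if its guard holds and firing it changes the signal.
        enabled = [(sig, val) for guard, sig, val in CIRCUIT + ENVIRONMENT
                   if guard(state) and state[sig] != val]
        # With this environment the system is fully sequential.
        if len(enabled) != 1:
            raise RuntimeError(f"unexpected enabled set {enabled}")
        sig, val = enabled[0]
        state[sig] = val
        trace.append(sig + ("+" if val else "-"))
    return trace


print(simulate(10))
# ['Lr+', 'Rr+', 'Ra+', 'x+', 'Rr-', 'Ra-', 'La+', 'Lr-', 'x-', 'La-']
```

After La- all signals are 0 again, so the trace repeats with a period of ten transitions.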
Summary
- Synchronous-Asynchronous Direct Translation
  - Synthesis with standard tools
  - Synchronous-asynchronous transformation
- Martin Synthesis
  - Process decomposition
  - Handshake expansion
  - Production rule expansion
References
Jordi Cortadella, Alex Kondratyev, Luciano Lavagno, Christos P. Sotiriou. Desynchronization: Synthesis of Asynchronous Circuits From Synchronous Specifications. 2006.
Alain J. Martin. Programming in VLSI: From Communicating Processes to Self-timed VLSI Circuits. 1987.
Catherine G. Wong, Alain J. Martin. High-Level Synthesis of Asynchronous Systems by Data-Driven Decomposition. 2003.
Ad Peeters, Frank te Beest, Mark de Wit, Willem Mallon. Click Elements: An Implementation Style for Data-Driven Compilation. 2010.