Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines
Arash Saifhashemi
Peter A. Beerel
University of Southern California
USC Asynchronous CAD/VLSI Group (async.usc.edu)
(Thanks to a grant from Intel and NSF)
Patmos 2012, Sep 2012, Newcastle upon Tyne
Asynchronous Circuit Design - Today
Applications
• 3D networks-on-chip (STMicroelectronics)
• Ethernet Switches (Intel SRD)
• Ultra high-speed FPGAs (Achronix)
• Robustness to process variation
• Low-power chip design (Encryption – Tiempo, …)
Basic challenge: automation
Proteus design flow (USC)
• Uses commercial synchronous CAD tools
• Starts from a high-level specification written in SVC (SystemVerilogCSP)
Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G)
• 1.2 B transistors; 90% asynchronous, 13% Proteus
Tiempo TAM16: clockless 16-bit microcontroller
STMicroelectronics WIOMING 3D-IC (July 2012)
Achronix FPGA: 1.7 M LUTs, 2.1 Gbps IO
The Proteus Flow
[Figure: the Proteus flow. Labels: Verilog, SystemVerilog (SVC), SVC2RTL, Synth. RTL, Design Goals, Constraints, Sync Library, Proteus/Sync Library, Synthesis, Clock Gating, Image Netlist, Clock Tree Synthesis, ClockFree, Async Netlist, Physical Design, Final Layout]
Key Features
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Up to 2X higher performance
Tool Status
• Started at USC Async CAD/VLSI
• Commercialized by TimeLess (2008)
• Acquired by Fulcrum (2010)
• Intel acquired Fulcrum (2011)
• Used in Intel Ethernet Alta FM6000 chip
The Problem
• Limited and manual power optimization
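Conditional Communication in Proteus
[Figure: conditional communication. When its condition is 0, a conditional receive does not receive its input and a dummy value is used instead; a conditional send does not send its output.]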
Example: ALU
• SVC description: no conditionality in the high-level description
• Reconverging fanouts
• Unnecessary calculation
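To make the "unnecessary calculation" concrete, here is a minimal sketch, assuming a hypothetical ALU with ADD, SUB and MUL units that all receive the operands every iteration (the reconverging fanout) and a mux that keeps one result. The names `OPS` and `run_unconditional_alu` are illustrative only, not part of the SVC description.

```python
# Sketch (hypothetical names): an ALU described without conditionality.
# Every iteration the operands fan out to ADD, SUB and MUL, all three compute,
# and the results reconverge at a mux that keeps only one of them.
from typing import Callable, Dict, List, Tuple

MASK = 0xFFFFFFFF  # 32-bit datapath, as in the case-study ALU

OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "MUL": lambda a, b: (a * b) & MASK,
}


def run_unconditional_alu(stream: List[Tuple[str, int, int]]) -> Tuple[List[int], int, int]:
    """Return (selected results, operations computed, operations actually used)."""
    results: List[int] = []
    computed = 0
    for opcode, a, b in stream:
        # Fanout: every operator receives the operands and fires.
        outputs = {name: fn(a, b) for name, fn in OPS.items()}
        computed += len(outputs)
        # Reconvergence: the mux forwards one result and drops the rest.
        results.append(outputs[opcode])
    return results, computed, len(stream)


results, computed, used = run_unconditional_alu([("ADD", 2, 3), ("MUL", 4, 5), ("SUB", 1, 2)])
print(results, computed, used)  # [5, 20, 4294967295] 9 3
```

Three operations are useful, nine are performed; the other six results are computed and then discarded.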
Adding Isolation Cells
• All inputs/outputs are unconditional
• Operand isolation with AND-based isolation cells
• Generated by the synchronous RTL synthesizer
• Does not prevent switching in asynchronous circuits
Isolation cells are not effective in asynchronous circuits
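One way to see why AND-based isolation helps a clocked design but not an asynchronous one is to count wire transitions. The sketch below assumes a single-rail bus on the synchronous side and a four-phase dual-rail (return-to-zero) channel on the asynchronous side, a common QDI encoding; the encoding choice and all function names are assumptions for illustration.

```python
# Sketch under stated assumptions: the synchronous side is a single-rail bus
# that holds one value per clock cycle; the asynchronous side is a four-phase
# dual-rail (return-to-zero) channel. Names are hypothetical.
from typing import List


def and_isolate(values: List[int], enables: List[int]) -> List[int]:
    """AND-based operand isolation: force the operand to 0 while disabled."""
    return [v if e else 0 for v, e in zip(values, enables)]


def single_rail_toggles(values: List[int], width: int) -> int:
    """Wire toggles on a clocked bus: only bits that change value toggle."""
    mask = (1 << width) - 1
    toggles, prev = 0, 0
    for v in values:
        toggles += bin((prev ^ v) & mask).count("1")
        prev = v
    return toggles


def dual_rail_toggles(values: List[int], width: int) -> int:
    """Wire toggles on a four-phase dual-rail channel.

    Every token raises exactly one rail per bit (the 0-rail or the 1-rail)
    and then resets it, so each bit costs two transitions per iteration
    whatever its value, including a forced 0.
    """
    return len(values) * width * 2


data = [0x5A, 0xA5, 0x3C, 0xC3]
iso = and_isolate(data, [1, 0, 0, 0])
print(single_rail_toggles(data, 8), single_rail_toggles(iso, 8))  # 24 8
print(dual_rail_toggles(data, 8), dual_rail_toggles(iso, 8))      # 64 64
```

Isolation cuts the single-rail toggles from 24 to 8, while the dual-rail count stays at 64: the isolated 0 is still a token that must be sent and acknowledged every iteration.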
Three-valued logic
• Formal justification of conditioning
• Three-valued logic image model
• Each iteration is modeled by a clock cycle
• Each variable can be 0, 1, or N (no token)
[Figure: the status of each channel during one iteration]
3VL Unconditional Functions
Unconditional functions
• Can be represented using only the three-valued AND, OR, and NOT operators
• Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …
Lemma 1: the output is N iff at least one of the inputs is N.
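Lemma 1 can be checked exhaustively for small gates. The sketch assumes three-valued operators in which N is strict (any N input gives N), which is what "no token" implies: a gate cannot fire without all of its input tokens. Note that this differs from Kleene logic, where 0 AND unknown = 0. The gate helpers are illustrative names built only from AND, OR and NOT.

```python
# Sketch: three-valued operators in which N (no token) is strict, i.e. any N
# input yields N. Gate names below are illustrative; they are composed only
# from AND/OR/NOT.
from itertools import product

N = "N"            # no token on the channel in this iteration
DOMAIN = (0, 1, N)


def NOT(a):
    return N if a == N else 1 - a


def AND(a, b):
    return N if N in (a, b) else a & b


def OR(a, b):
    return N if N in (a, b) else a | b


def NAND(a, b):
    return NOT(AND(a, b))


def XOR(a, b):
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


def AOI21(a, b, c):
    return NOT(OR(AND(a, b), c))


def satisfies_lemma1(gate, arity):
    """Check Lemma 1 exhaustively: output is N iff some input is N."""
    return all((gate(*xs) == N) == (N in xs)
               for xs in product(DOMAIN, repeat=arity))


print(satisfies_lemma1(NAND, 2), satisfies_lemma1(XOR, 2), satisfies_lemma1(AOI21, 3))
```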
SEND/RECEIVE Operators
• Conditional communication
• RECEIVE and SEND are modeled as the Ⓡ and Ⓢ operators
• Behave like buffers when E=1
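A minimal sketch of the two operators in the 3VL model follows. Both behave as buffers when the enable is 1. Two behaviors are assumptions not stated on the slide: a missing enable token (e = N) gives N, and a disabled RECEIVE emits a fixed `DUMMY = 0` (any 0/1 value would do).

```python
# Sketch of the conditional-communication operators in the 3VL model.
# Assumptions (not from the slides): a missing enable token (e = N) yields N,
# and a disabled RECEIVE emits DUMMY = 0; any 0/1 value would serve.
N = "N"
DUMMY = 0


def send(x, e):
    """x Ⓢ e: a buffer when e = 1; when e = 0 nothing is sent (N)."""
    if e == 1:
        return x
    return N


def receive(x, e):
    """x Ⓡ e: a buffer when e = 1; when e = 0 no token is consumed and a dummy value is produced."""
    if e == N:
        return N
    if e == 1:
        return x
    return DUMMY


print(receive(send(1, 1), 1), receive(send(1, 0), 0))  # 1 0
```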
SEND Reconditioning
Assuming y = f(x) is unconditional and e ∉ TFO(y) (e is not in the transitive fanout of y)
Lemma 2:
Application: SEND cells can be moved through logic
• Similar to retiming in synchronous circuits
• SENDs on the inputs of f: less switching when e = 0
• One SEND on the output of f: fewer SENDs
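The trade-off in SEND reconditioning can be illustrated on a single unconditional gate. The sketch checks, over all of {0,1,N}, that one SEND after a 3VL AND matches one SEND per input. It then counts how often the gate actually fires. This is an illustration of the equivalence that makes the move safe, not the paper's statement or proof of Lemma 2; the helper names are hypothetical.

```python
# Sketch: moving a SEND across an unconditional gate (here a 3VL AND).
# Illustration only, not the paper's proof of Lemma 2.
from itertools import product

N = "N"
DOMAIN = (0, 1, N)


def send(x, e):
    # x Ⓢ e: forward the token when e = 1, otherwise no token
    return x if e == 1 else N


def AND(a, b):
    # Unconditional 3VL AND: N on any input gives N (Lemma 1)
    return N if N in (a, b) else a & b


def send_at_output(a, b, e):
    return send(AND(a, b), e)            # one SEND after the gate


def send_at_inputs(a, b, e):
    return AND(send(a, e), send(b, e))   # one SEND per gate input


def equivalent():
    return all(send_at_output(a, b, e) == send_at_inputs(a, b, e)
               for a, b, e in product(DOMAIN, repeat=3))


def gate_firings(sends_at_inputs, enables):
    """Iterations in which the AND gate receives tokens on all of its inputs."""
    firings = 0
    for e in enables:
        ins = (send(1, e), send(1, e)) if sends_at_inputs else (1, 1)
        firings += int(N not in ins)
    return firings


print(equivalent(), gate_firings(False, [1, 0, 0, 1]), gate_firings(True, [1, 0, 0, 1]))
```

With the enable pattern 1, 0, 0, 1 the gate fires 4 times when the SEND sits on its output, but only 2 times when the SENDs sit on its inputs. The cost is two SENDs instead of one.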
Observability in 3V Networks
Local Observability Partial Care (LOPC)
• OPC(f, C, xj) of input xj of a node representing a function f is the condition under which f’s output is not affected as xj changes within C ⊆ {0,1,N}
Global Observability Partial Care (GOPC)
• GOPC(C, x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes within C ⊆ {0,1,N}
• Example: $OPC(\mathit{Mux}, \{0,1\}, i_1) = s^{\{1\}}\, i_2^{\{0,1\}}$
i1 changes in {0,1} are not observable when s = 1 and i2 = 0 or i2 = 1
• $OPC(f, C, x) \rightarrow GOPC(C, x)$
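The mux example can be reproduced by enumeration. The sketch assumes that s = 0 selects i1 and s = 1 selects i2, which matches the s = 1 condition on the slide. It also assumes that the other inputs s and i2 range over {0,1}. The function `opc_i1` is a hypothetical helper that collects every (s, i2) assignment under which varying i1 within C leaves the output unchanged.

```python
# Sketch: compute OPC(Mux, {0,1}, i1) by enumeration.
# Assumptions: s = 0 selects i1, s = 1 selects i2, and s, i2 range over {0,1}.
from itertools import product

N = "N"


def mux(s, i1, i2):
    # Unconditional 3VL mux: N if any input is N
    if N in (s, i1, i2):
        return N
    return i2 if s == 1 else i1


def opc_i1(C=(0, 1), others=(0, 1)):
    """Set of (s, i2) assignments under which i1 varying within C never changes the output."""
    cond = set()
    for s, i2 in product(others, repeat=2):
        outputs = {mux(s, i1, i2) for i1 in C}
        if len(outputs) == 1:
            cond.add((s, i2))
    return cond


print(sorted(opc_i1()))  # [(1, 0), (1, 1)]
```

The result is exactly s = 1 with i2 ∈ {0,1}, i.e. $s^{\{1\}}\, i_2^{\{0,1\}}$.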
GOPC Conditioning
When xj is not observable…
• Add a SEND followed by a RECEIVE
• Move the SENDs using SEND reconditioning
Lemma 3: If $e^{\{0\}} \rightarrow GOPC(\{0,1\}, x_1)$ then: $f(\mathbf{x}) = (f(\mathbf{x})\ \text{Ⓢ}\ e)\ \text{Ⓡ}\ e$
[Figure: conditioning followed by SEND reconditioning on a small network of & and + gates; the conditioned signals carry N and the isolated logic shows no activity]
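GOPC conditioning followed by SEND reconditioning can be checked end to end on a tiny network. The sketch assumes a hypothetical circuit in which y = AND(a, b) drives input i1 of a mux whose other input is c. Since y is unobservable whenever s = 1, the enable e = NOT s satisfies the premise of Lemma 3. It also assumes a disabled RECEIVE emits `DUMMY = 0`.

```python
# Sketch on a hypothetical three-gate network: y = AND(a, b) drives input i1
# of a mux whose other input is c. y is unobservable whenever s = 1, so the
# enable e = NOT s satisfies the premise of Lemma 3 (e = 0 implies GOPC).
# Assumption: a disabled RECEIVE emits DUMMY = 0.
from itertools import product

N = "N"
DUMMY = 0


def send(x, e):
    return x if e == 1 else N          # x Ⓢ e


def receive(x, e):
    return x if e == 1 else DUMMY      # x Ⓡ e (e is always 0 or 1 here)


def AND(a, b):
    return N if N in (a, b) else a & b


def mux(s, i1, i2):
    if N in (s, i1, i2):
        return N
    return i2 if s == 1 else i1        # s = 1 selects i2, so i1 is unobservable


def original(a, b, c, s):
    return mux(s, AND(a, b), c)


def conditioned(a, b, c, s):
    e = 1 - s
    y = receive(send(AND(a, b), e), e)            # SEND then RECEIVE on y
    return mux(s, y, c)


def reconditioned(a, b, c, s):
    e = 1 - s
    y = receive(AND(send(a, e), send(b, e)), e)   # SENDs moved to the AND inputs
    return mux(s, y, c)


def and_is_idle(a, b, s):
    """True when the AND gate receives no tokens after reconditioning."""
    e = 1 - s
    return N in (send(a, e), send(b, e))


same = all(original(*v) == conditioned(*v) == reconditioned(*v)
           for v in product((0, 1), repeat=4))
print(same, and_is_idle(1, 1, 1), and_is_idle(1, 1, 0))  # True True False
```

The primary output is unchanged for all 16 input combinations. After reconditioning, the AND gate sees no tokens whenever s = 1.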
Inserting Isolating Nodes and Recognizing Enable Domains
• Synchronous synthesis tools can insert isolating nodes
• Constrained to insert isolating nodes only on non-critical paths
Node u is in e’s Enable Domain OIED(e) if
• All paths starting from a primary input and ending at u include an
isolating node controlled by e
• Detected using a DFS search
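The enable-domain test can be phrased as a reachability problem. Run a DFS from the primary inputs that refuses to pass through isolating nodes controlled by e. Every node it reaches has at least one unisolated path, so the domain is everything else. The sketch below uses an iterative DFS on a hypothetical netlist given as a fanout map; the netlist, node names and the `isolating` map are all illustrative.

```python
# Sketch: OIED(e) via DFS from the primary inputs, blocked at e's isolating nodes.
from typing import Dict, List, Set


def enable_domain(fanout: Dict[str, List[str]], primary_inputs: List[str],
                  isolating: Dict[str, str], e: str) -> Set[str]:
    """Nodes u such that every path from a primary input to u passes through
    an isolating node controlled by e.

    fanout: node -> successor nodes; isolating: isolating node -> its enable.
    """
    # Nodes reachable from some primary input without crossing an e-isolator.
    unisolated: Set[str] = set()
    stack = list(primary_inputs)
    while stack:
        u = stack.pop()
        if u in unisolated or isolating.get(u) == e:
            continue  # already visited, or this path is blocked by an e-isolator
        unisolated.add(u)
        stack.extend(fanout.get(u, []))
    all_nodes = set(fanout) | {v for vs in fanout.values() for v in vs}
    return all_nodes - unisolated


# Hypothetical netlist: MUL is fed only through isolators controlled by e.
fanout = {
    "a": ["iso_a", "add"], "b": ["iso_b", "add"], "e": ["iso_a", "iso_b"],
    "op": ["mux"], "iso_a": ["mul"], "iso_b": ["mul"],
    "mul": ["mux"], "add": ["mux"], "mux": ["out"],
}
domain = enable_domain(fanout, ["a", "b", "e", "op"], {"iso_a": "e", "iso_b": "e"}, "e")
print(sorted(domain))  # ['iso_a', 'iso_b', 'mul']
```

The isolating nodes themselves belong to the domain, because every path that ends at them includes them. The mux does not, since ADD reaches it without isolation.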
Pre-layout Analysis
• Wu: power of receiving data on all inputs and sending the output (unconditional nodes)
• K: power of conditional nodes
• rf: activity factor
[Equations: total power; power of each domain; domain power after isolation (n inputs); benefit of isolating each domain]
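The slide's equations did not survive extraction, so the following is only a hypothetical cost model built from the quantities defined above. It should not be read as the paper's formulas. The model assumes the domain's unconditional nodes cost the sum of Wu when they fire every iteration. After isolation, they fire only in a fraction rf of iterations, and each of the n domain inputs adds a conditional node costing K. The numbers in the usage line are illustrative, not measured data.

```python
# Hypothetical cost model (an assumption, not the paper's equations):
#   before isolation: the domain's unconditional nodes fire every iteration,
#                     costing sum(W_u);
#   after isolation:  they fire only in a fraction r_f of iterations, and each
#                     of the n domain inputs gains a conditional node costing K.
from typing import List


def domain_power_before(w: List[float]) -> float:
    return sum(w)


def domain_power_after(w: List[float], r_f: float, n: int, k: float) -> float:
    return r_f * sum(w) + n * k


def isolation_benefit(w: List[float], r_f: float, n: int, k: float) -> float:
    """Positive means isolating the domain saves power under this model."""
    return domain_power_before(w) - domain_power_after(w, r_f, n, k)


def break_even_activity(w: List[float], n: int, k: float) -> float:
    """Activity factor above which isolation stops paying off under this model."""
    return 1.0 - n * k / sum(w)


w = [4.0, 6.0]  # illustrative W_u values
print(isolation_benefit(w, 0.25, 2, 1.0), break_even_activity(w, 2, 1.0))
```

Under this model the benefit falls linearly as rf grows and turns negative past the break-even point. That is the qualitative pattern behind a finding like "isolation is detrimental above a certain activity factor".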
Post-layout Experimental Results
• Case study: 32-bit ALU placed and routed
• Back annotated switching activity using a VCD file
• Results:
• Isolating ADD and SUB is detrimental when rADD and rSUB exceed 0.2
• 53% power reduction when isolating only MUL (rf = 0.25)
• Isolating MUL costs about 4% in area, with no performance penalty
Conclusions and Future Work
Conditional communication in async. circuits is not free
• Creates area and performance overheads
• Requires manual or automatic optimization
Asynchronous circuits can/should leverage sync. tools
• This paper is the first to use three-valued logic and observability don’t-cares for power optimization of asynchronous circuits
Our future work
• Evaluate the proposed method on bigger designs
• Adopt other sync power optimization techniques such as clock gating
• Optimize the location of SEND/RECEIVE nodes (Reconditioning)