Copyright 1999 - 2003 © Zenasis Technologies, Inc. Dec 1, 2003
Slide 1
Flex-Cell Optimization
A Paradigm Shift in High-Performance Cell-Based Design
Slide 2
The Power-User Dilemma
[Chart axes: Cost / TTM; Speed, Power, Area]
• Custom: Team = 400, 3 GHz, 3 Years
• Flex-Cell Opt: Team = 10, 520 MHz, 6 Months
• ASIC/COT: Team = 10, 400 MHz, 9 Months
• FPGA
Callouts: “Takes too long!” and “Results aren’t good enough!”
Slide 3
The Timing Dilemma
• Design team clock target: 350 MHz
• Post-logic-synthesis / post-placement STA: only 300 MHz. Problem!!
• Options
  – Design change
    • Rewrite RTL: tapeout delay!!
  – Better technology
    • Smaller geometry: tapeout delay and NRE cost!!
    • Low-k technology: yield hit!!
  – Better tools
    • Flex-Cell Optimization
      – Custom-design benefits in std cell flow
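To make the size of the gap concrete, the short sketch below converts the two frequencies on this slide (350 MHz target, 300 MHz from STA) into clock periods and works out how much critical-path delay would have to be removed. The helper name `period_ns` is only illustrative; nothing here comes from the Zenasis tools.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


target_mhz = 350.0     # design team clock target (from the slide)
achieved_mhz = 300.0   # post-synthesis / post-placement STA result (from the slide)

target_period = period_ns(target_mhz)
achieved_period = period_ns(achieved_mhz)

# Delay that must be removed from the critical path to meet the target.
shortfall_ns = achieved_period - target_period
# The same gap as a fraction of today's critical-path delay ...
required_delay_cut = shortfall_ns / achieved_period
# ... and as a frequency gain relative to today's clock.
required_freq_gain = target_mhz / achieved_mhz - 1.0

print(f"target period      : {target_period:.3f} ns")
print(f"achieved period    : {achieved_period:.3f} ns")
print(f"shortfall          : {shortfall_ns:.3f} ns")
print(f"required delay cut : {required_delay_cut:.1%}")
print(f"required freq gain : {required_freq_gain:.1%}")
```

Closing the gap means removing about one seventh of the critical-path delay, which is the same thing as a frequency gain of about one sixth.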
Slide 4
Root of the Problem
• Various past studies, including a special session at DAC 2000
• Std-cell based design is “an order of magnitude” lower in performance than custom, at the same process node
  – Architecture
  – Fixed cell library
  – Layout
• Fixed cell library can account for as much as 25% of the performance shortfall
Slide 5
Rich vs Smart
• Simply creating a “richer” cell library does not solve the problem
  – Too many cells hinder automated optimization
  – Missing design-specific context information
  – Well-known matching problems for larger cells
• Custom-crafted cells, for a specific design, can inject large timing gains late in the design cycle
• Compute-intensive process
  – Transistor netlist optimization
  – Cell layout creation
  – View generation
Slide 6
Flex-Cell Optimization -- Concept
[Diagram: Flex-Cell Opt linking the Logical Level, Transistor Level, and Physical Level]
Optimization at Gate, Transistor & Physical Levels
Slide 7
Prior Work
• Manual custom-crafting of cells is well established
  – Tactical cells: every high-performance design project uses some
• Automated transistor-level netlist creation/optimization
  – Fishburn, Dunlop (1985): TILOS, transistor sizing
  – Gavrilov et al (1997): Library-less synthesis
  – Kanecko, Tian (1998): Concurrent cell generation and mapping of digital logic
  – Liu, Abraham (1999): Transistor-level synthesis of combinational logic
Slide 8
Flex-Cell Optimization Targets
• Eliminate deficiency due to fixed cell library
  – Boost performance by 15% - 25%
• Close aggressive timing in days
• Retain proven existing cell-based design flow
• Use high-yield process, still get performance
• Minimal increase in die-size or power
• Get custom-design performance from std-cell-based flow
Slide 9
Key Steps
• Post-synthesis netlist
• STA
• Cluster formation (critical paths)
• Flex-cell (custom crafted) creation
• Gate-level optimization
[Figure: a gate cluster with inputs a, b, c, d (4 cells, 22 transistors, 9 wires) is replaced by a single flex-cell (1 cell, 13 transistors, 6 wires)]
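The STA and cluster-formation steps can be illustrated with a toy example. The sketch below is only an assumption about the general idea, not the Zenasis algorithm: it propagates arrival times through a small gate-level DAG, backtracks the slowest path, and takes the gates on that path as a candidate cluster. The gate names (`g1` to `g4`) and their delays are hypothetical; the inputs a, b, c, d follow the figure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical netlist: gate -> (delay in ns, fan-in gates or primary inputs).
netlist = {
    "g1": (0.11, ["a", "b"]),
    "g2": (0.08, ["c", "d"]),
    "g3": (0.12, ["g1", "g2"]),
    "g4": (0.07, ["g3", "a"]),
}


def critical_path(netlist):
    """Return (worst arrival time, path from a primary input to the last gate)."""
    # Only gate-to-gate edges matter for ordering; primary inputs arrive at t = 0.
    deps = {g: [f for f in fanin if f in netlist] for g, (_, fanin) in netlist.items()}
    arrival, pred = {}, {}
    for gate in TopologicalSorter(deps).static_order():
        delay, fanin = netlist[gate]
        latest = max(fanin, key=lambda f: arrival.get(f, 0.0))
        arrival[gate] = arrival.get(latest, 0.0) + delay
        pred[gate] = latest
    end = max(arrival, key=arrival.get)
    path = [end]
    while path[-1] in netlist:          # walk back until a primary input is reached
        path.append(pred[path[-1]])
    path.reverse()
    return arrival[end], path


worst_delay, path = critical_path(netlist)
cluster = [node for node in path if node in netlist]  # gates to consider merging

print(f"critical path: {' -> '.join(path)}  ({worst_delay:.2f} ns)")
print(f"candidate cluster: {cluster}")
```

A real flow would work from slack against constraints and would weigh fan-out, placement, and cluster size when forming clusters; this sketch only shows where the candidates come from.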
Slide 10
Flex-Cell Optimization with Physicals
• Physically-aware STA
  – Placement aware
    • Congestion
    • Blockage
  – Multiple levels of accuracy for route info
    • Steiner estimates
    • Global route
    • Detailed route**
• Physically-driven optimization
  – Physically-aware clustering and mapping
  – Physically-aware gate-level optimizations
  – Low disturbance to existing placement
  – Incremental legalization of placement
  – Incremental re-computation of routes/estimates
Slide 11
Sample Flex-Cell
Gate-Level Cluster (inputs a, b, c, d; output y)
• 4 cells, 9 nets
• Critical path: a -> y
• Rise = 0.26 ns; Fall = 0.31 ns
Tx-Level View of Gate Cluster
• 22 transistors; path depth = 3 levels
After Tx-Level Optimization (Tx Opt)
• 13 transistors; path depth = 2 levels
• Critical path: a -> y
• Rise = 0.12 ns; Fall = 0.10 ns
Custom-Crafted Flex-Cell
• 1 cell, 7 nets
• Critical path: a -> y
• Rise = 0.12 ns; Fall = 0.10 ns
                   Before     After
Rise (critical)    0.26 ns    0.12 ns
Fall (critical)    0.31 ns    0.10 ns
# Cells            4          1
# Transistors      22         13
Path depth         3          2
# Nets             9          7
Slide 12
Transistor-Level Optimization
[Flowchart]
• Input: cluster of standard cells, various context-specific constraints for this cluster, other real-life constraints like process, etc.
• Map to transistor-level candidate netlists for Flex-Cell
• Create transistor-level netlist with systematic redundancy, if permitted
• Transistor sizing
• Fast (pre-layout) characterization
• Meets requirements? (No: iterate)
• Set of candidate Flex-Cells
• Layout synthesis
• Post-layout characterization
• Meets requirements? (No: iterate)
• Detailed characterization
• Output: various interfaces to evaluate and fit Flex-Cell into standard-cell based design flow
Slide 13
Key Issues
• Judicious mix of gate-level and transistor-level optimization
• Judicious mix of discrete and continuous transistor sizing
• Effective use of transistor-level restructuring
• Fast and accurate transistor-level simulation
  – 50x to 100x faster than Spice
• Accurate estimation of parasitics given transistor-level netlist
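The difference between discrete and continuous sizing can be seen on a toy problem. The sketch below is a minimal greedy sizer in the spirit of TILOS (cited under Prior Work): it repeatedly upsizes whichever stage gives the largest delay reduction per unit of added width. It is not the Flex-Cell algorithm. The delay model (a normalized inverter chain), the size grid, the 10% continuous step, and all names are assumptions made for illustration.

```python
def chain_delay(widths, c_load, r_driver=1.0, p=1.0):
    """Delay of an inverter chain in normalized RC units (toy model)."""
    delay = r_driver * widths[0]                 # fixed driver charging stage 0
    for i, w in enumerate(widths):
        load = widths[i + 1] if i + 1 < len(widths) else c_load
        delay += p + load / w                    # parasitic delay + effort delay
    return delay


def greedy_size(widths, c_load, target, next_size):
    """Upsize one stage at a time until the target is met or nothing helps."""
    widths = list(widths)
    while chain_delay(widths, c_load) > target:
        base = chain_delay(widths, c_load)
        best = None                              # (gain per unit width, index, new width)
        for i, w in enumerate(widths):
            w_new = next_size(w)
            if w_new is None:                    # already at the largest allowed size
                continue
            trial = widths[:i] + [w_new] + widths[i + 1:]
            gain = (base - chain_delay(trial, c_load)) / (w_new - w)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, i, w_new)
        if best is None:                         # no single upsizing reduces delay
            break
        widths[best[1]] = best[2]
    return widths


DISCRETE_SIZES = [1, 2, 4, 8, 16]                # hypothetical library drive strengths


def next_discrete(w):
    bigger = [s for s in DISCRETE_SIZES if s > w]
    return bigger[0] if bigger else None


def next_continuous(w):
    return w * 1.1                               # 10% steps approximate continuous sizing


start, c_load, target = [1, 1, 1], 64, 20
sized_discrete = greedy_size(start, c_load, target, next_discrete)
sized_continuous = greedy_size(start, c_load, target, next_continuous)

for label, sized in (("discrete", sized_discrete), ("continuous", sized_continuous)):
    print(f"{label:10s} widths={[round(w, 2) for w in sized]} "
          f"delay={chain_delay(sized, c_load):.2f} area={sum(sized):.2f}")
```

In this toy setup the discrete run stops at widths `[1, 2, 8]`, while the continuous run can stop at sizes that fall between library steps. That freedom is the area and power argument for mixing the two styles.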
Slide 14
Impact On a Sample Critical Path
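[Figure: delays along the original critical path and the optimized path, in which two gate clusters are replaced by Flex-Cell1 and Flex-Cell2]
• Original critical path: total 1.04 (delays shown: 0.29, 0.14, 0.18, 0.25, 0.07, 0.11)
• Optimized path: total 0.82 (delays shown: 0.36, 0.15, 0.24, 0.07)
• Labels 0.20 and 0.04 appear on both paths
• 21% improvement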
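The 21% figure can be checked from the numbers in the figure. The sketch below assumes a particular grouping of the flattened delay labels: six values that sum to the original total of 1.04, and four that sum to the optimized total of 0.82. That grouping is inferred from the sums, not stated on the slide, and the slide gives no units for these values.

```python
# Assumed grouping of the delay labels recovered from the figure.
original_stages = [0.29, 0.14, 0.18, 0.25, 0.07, 0.11]
optimized_stages = [0.36, 0.15, 0.24, 0.07]

original_total = sum(original_stages)
optimized_total = sum(optimized_stages)

# Improvement is measured as the reduction in path delay relative to the original.
improvement = (original_total - optimized_total) / original_total

print(f"original path : {original_total:.2f}")
print(f"optimized path: {optimized_total:.2f}")
print(f"improvement   : {improvement:.1%}")
```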
Slide 15
Results (ZenTime)
• 38K+ instance design
• 16% performance boost
  – 297 MHz --> 344 MHz
• Implemented in a 0.13um process
• Added 132 flex-cells, 5,927 instances
• Without increasing power or area
Slide 16
Impact on Global Timing
• Initial frequency: 297 MHz
• Final frequency: 344 MHz
Slide 17
Timing Optimization Results
Design    Orig   Opt    Improv  Flex Cells  Flex Cell  Design Size  Technology  Process
          (MHz)  (MHz)  (%)     Created     Insts      (#insts)                 Corner
Circuit1  297    345    16%     132         5927       38,130       0.13um      slow
Circuit2  250    277    11%     103         4900       62,801       0.13um      slow
Circuit3  248    279    13%     133         5113       160,610      0.13um      slow
Circuit4  251    294    17%     150         2050       21,814       0.18um      slow
Circuit5  187    219    18%     165         3821       33,940       0.13um      typical
Circuit6  167    193    16%     49          183        18,265       0.18um      typical
Circuit7  562    641    14%     160         2469       9,048        0.13um      typical
Legend: with physicals (def, sdf, …) / with wire loads
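As a cross-check on the table, the sketch below recomputes each improvement from the original and optimized frequencies and reports what share of each design's instances are flex-cell instances. The numbers are copied from the table. The column interpretation (flex cells created, flex-cell instances, design size) is an assumption based on the flattened header, and the variable names are illustrative.

```python
rows = [
    # (design, orig MHz, opt MHz, stated %, flex cells, flex-cell insts, design insts)
    ("Circuit1", 297, 345, 16, 132, 5927, 38130),
    ("Circuit2", 250, 277, 11, 103, 4900, 62801),
    ("Circuit3", 248, 279, 13, 133, 5113, 160610),
    ("Circuit4", 251, 294, 17, 150, 2050, 21814),
    ("Circuit5", 187, 219, 18, 165, 3821, 33940),
    ("Circuit6", 167, 193, 16, 49, 183, 18265),
    ("Circuit7", 562, 641, 14, 160, 2469, 9048),
]

mismatches = []
for name, orig, opt, stated, cells, insts, size in rows:
    gain = 100.0 * (opt - orig) / orig       # frequency improvement in percent
    share = 100.0 * insts / size             # flex-cell instances as % of the design
    if abs(gain - stated) > 0.5:             # more than a rounding difference
        mismatches.append(name)
    print(f"{name}: {gain:5.1f}% computed vs {stated}% stated, "
          f"{cells} flex-cells, {share:4.1f}% of instances")

print("rows differing from the stated figure by more than rounding:", mismatches)
```

With these inputs only Circuit5 falls outside rounding (187 to 219 MHz is about 17.1%, against a stated 18%); one possible explanation is that the stated figure was computed from unrounded frequencies.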
Slide 18
I/O & Design Flow
[Flow diagram]
• Design flow: Front-end Design -> Physical Synthesis -> Flex-Cell Opt -> Detailed Route -> Extraction & Verification -> GDSII (Back-end Design)
• Flex-Cell Opt components: Timing, Physical, Gate-level Opt., Discrete Sizing, Cont. Sizing, Clustering, Interface, Flex-Cell Factory
• Inputs
  – Library: library.lib, library.lef, library.cdl
  – Design: netlist.v, netlist.def
  – Constraints: constr.sdc
  – tech.bsim3, netlist.set_load, netlist.sdf
• Outputs
  – opt_netlist.v, opt_netlist.def
  – flex-cell.est.lib, flex-cell.est.lef
  – flex-cell.cdl
Slide 19
Automated Flex-Cell Generation
Tool Suite and Flow
[Flow diagram]
• Inputs: sized Spice netlists, cell architecture
• Layout: gds, lef, ant. lef
• Functional: eqn.v, mos.v
• Spice: lumpedC.sp, distrRC.sp
• Timing, Power, Noise/glitch: .lib, .db, .tlf
• Reports
• .lib??
Slide 20
Summary
• New dimension in optimization of cell-based designs
• Essential to find the “right balance” between gate-level and transistor-level optimization
• Better design quality, higher runtime
• Timing, Area, Power no longer a simple trade-off
  – Possible to improve more than one, simultaneously
• Many challenges
  – Lots of research opportunities!!