Copyright, 1999 - 2003 © Zenasis Technologies, Inc. Dec 1, 2003

Slide 1

Flex-Cell Optimization

A Paradigm Shift in High-Performance Cell-Based Design

Slide 2

The Power-User Dilemma

• Custom: team = 400, 3 GHz, 3 years
• Flex-Cell Opt: team = 10, 520 MHz, 6 months
• ASIC/COT: team = 10, 400 MHz, 9 months
• FPGA

[Chart axes: Cost / TTM vs. Speed, Power, Area]

Takes too long!

Results aren’t good enough!

Slide 3

The Timing Dilemma

• Design team clock target: 350 MHz
• Post-logic-synthesis / post-placement STA: only 300 MHz. Problem!!
• Options
  – Design change
    • Rewrite RTL: tapeout delay!!
  – Better technology
    • Smaller geometry: tapeout delay and NRE cost!!
    • Low-k technology: yield hit!!
  – Better tools
    • Flex-Cell Optimization
      – Custom-design benefits in std cell flow
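To put the gap in cycle-time terms, the following minimal sketch converts the 350 MHz target and the 300 MHz STA result quoted above into clock periods. The helper name `period_ns` and the variable names are illustrative only and are not part of any Zenasis tool.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


target_mhz = 350.0    # design team's clock target (from the slide)
achieved_mhz = 300.0  # post-synthesis / post-placement STA result (from the slide)

target_period = period_ns(target_mhz)
achieved_period = period_ns(achieved_mhz)

# Time that has to come out of the critical path to hit the target.
shortfall_ns = achieved_period - target_period
# Relative frequency gain needed to go from the achieved to the target clock.
required_gain = target_mhz / achieved_mhz - 1.0

print(f"target period:   {target_period:.3f} ns")
print(f"achieved period: {achieved_period:.3f} ns")
print(f"critical path must shrink by {shortfall_ns:.3f} ns")
print(f"required frequency gain: {required_gain:.1%}")
```

The gap works out to roughly 0.48 ns per cycle, or a frequency gain of about 17%.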

Slide 4

Root of the Problem

• Various past studies, including a special session at DAC 2000
• Std-cell-based design has “an order of magnitude” lower performance than custom, at the same process node
  – Architecture
  – Fixed cell library
  – Layout
• Fixed cell library can account for as much as 25% of the performance shortfall

Slide 5

Rich vs Smart

• Simply creating a “richer” cell library does not solve the problem
  – Too many cells hinder automated optimization
  – Missing design-specific context information
  – Well-known matching problems for larger cells
• Custom-crafted cells, for a specific design, can inject large timing gains late in the design cycle
• Compute-intensive process
  – Transistor netlist optimization
  – Cell layout creation
  – View generation

Slide 6

Flex-Cell Optimization -- Concept

[Figure: level diagrams labelled Logical Level, Physical Level, and Transistor Level; Flex-Cell Opt]

Optimization at Gate, Transistor & Physical Levels

Slide 7

Prior Work

• Manual custom-crafting of cells is well established
  – Tactical cells: every high-performance design project uses some
• Automated transistor-level netlist creation/optimization
  – Fishburn, Dunlop (1985): TILOS, transistor sizing
  – Gavrilov et al. (1997): Library-less synthesis
  – Kanecko, Tian (1998): Concurrent cell generation and mapping of digital logic
  – Liu, Abraham (1999): Transistor-level synthesis of combinational logic

Slide 8

Flex-Cell Optimization Targets

• Eliminate deficiency due to fixed cell library
  – Boost performance by 15% - 25%
• Close aggressive timing in days
• Retain proven existing cell-based design flow
• Use high-yield process, still get performance
• Minimal increase in die-size or power
• Get custom-design performance from std-cell-based flow

Slide 9

Key Steps

• Post synthesis netlist
• STA
• Cluster formation
• Flex-cell (custom crafted) creation
• Gate-level optimization

[Figure: critical paths; a gate cluster of 4 cells, 22 transistors, 9 wires (inputs a, b, c, d) becomes 1 cell, 13 transistors, 6 wires]

Slide 10

Flex-Cell Optimization with Physicals

• Physically-aware STA
  – Placement aware
    • Congestion
    • Blockage
  – Multiple levels of accuracy for route info
    • Steiner estimates
    • Global route
    • Detailed route**
• Physically-driven optimization
  – Physically-aware clustering and mapping
  – Physically-aware gate-level optimizations
  – Low disturbance to existing placement
  – Incremental legalization of placement
  – Incremental re-computation of routes/estimates

Slide 11

Sample Flex-Cell

Gate-Level Cluster
4 Cells, 9 nets; 22 transistors; path depth = 3 levels
Critical Path: a -> y
Rise = 0.26 ns; Fall = 0.31 ns

Custom-Crafted Flex-Cell (after Tx-level optimization)
1 Cell, 7 nets; 13 transistors; path depth = 2 levels
Critical Path: a -> y
Rise = 0.12 ns; Fall = 0.10 ns

                   Before    After
Rise (critical)    0.26 ns   0.12 ns
Fall (critical)    0.31 ns   0.10 ns
# Cells            4         1
# Transistors      22        13
Path depth         3         2
# Nets             9         7

[Figure: Tx-level view of the gate cluster (inputs a, b, c, d; output y) before and after Tx Opt]

Slide 12

Transistor-Level Optimization

Input: cluster of standard cells, various context-specific constraints for this cluster, other real-life constraints like process, etc.

Flow steps:
• Map to transistor-level candidate netlists for Flex-Cell
• Create transistor-level netlist with systematic redundancy, if permitted
• Transistor sizing
• Fast (pre-layout) characterization
• Meets requirements? (No: iterate)
• Set of candidate Flex-Cells
• Layout synthesis
• Post-layout characterization
• Meets requirements? (No: iterate)
• Detailed characterization

Output: various interfaces to evaluate and fit Flex-Cell into standard-cell based design flow
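The two "Meets requirements?" checks in the flow above can be read as a filter over candidate netlists. The sketch below is only an illustration of that idea under stated assumptions: the function `pick_flex_cell`, its callback parameters, and the toy delay numbers are all hypothetical, and the slides do not describe how the actual tool iterates or what it does when a check fails.

```python
from typing import Callable, Iterable, Optional, TypeVar

Netlist = TypeVar("Netlist")


def pick_flex_cell(
    candidates: Iterable[Netlist],
    size: Callable[[Netlist], Netlist],
    fast_delay_ns: Callable[[Netlist], float],
    layout_delay_ns: Callable[[Netlist], float],
    required_ns: float,
) -> Optional[Netlist]:
    """Return the first sized candidate that passes both checks, else None."""
    for netlist in candidates:
        sized = size(netlist)                     # transistor sizing step
        if fast_delay_ns(sized) > required_ns:    # pre-layout check fails
            continue
        if layout_delay_ns(sized) > required_ns:  # post-layout check fails
            continue
        return sized
    return None


# Toy stand-ins (assumptions): a "netlist" is just its delay in ns,
# sizing trims 10%, and layout parasitics add a flat 0.02 ns.
chosen = pick_flex_cell(
    candidates=[0.20, 0.14, 0.11],
    size=lambda d: d * 0.9,
    fast_delay_ns=lambda d: d,
    layout_delay_ns=lambda d: d + 0.02,
    required_ns=0.13,
)
print(chosen)
```

With these toy numbers the first candidate fails the fast check, the second passes it but fails after layout, and the third passes both, which is why a cheap pre-layout estimate is worth running before the expensive layout step.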

Slide 13

Key Issues

• Judicious mix of gate-level and transistor-level optimization
• Judicious mix of discrete and continuous transistor sizing
• Effective use of transistor-level restructuring
• Fast and accurate transistor-level simulation
  – 50x to 100x faster than SPICE
• Accurate estimation of parasitics given transistor-level netlist

Slide 14

Impact On a Sample Critical Path

Original Critical Path
Stage delays: 0.29, 0.14, 0.18, 0.25, 0.07, 0.11; total 1.04

Optimized Path (with Flex-Cell1 and Flex-Cell2)
Total 0.82; delay labels in the figure: 0.20, 0.04, 0.20, 0.04, 0.07, 0.36, 0.15, 0.24

21% Improvement
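The 21% figure can be checked from the numbers on the slide. This sketch assumes that the six values shown for the original critical path are its per-stage delays (they add up to the 1.04 total) and that 0.82 is the optimized total; the slide does not state units, so none are assumed here.

```python
# Per-stage delays of the original critical path, as read from the slide
# (assumption: these six labels are the stages of that path).
original_stages = [0.29, 0.14, 0.18, 0.25, 0.07, 0.11]
original_total = sum(original_stages)

optimized_total = 0.82  # total shown for the optimized path

# Relative reduction in path delay.
improvement = (original_total - optimized_total) / original_total

print(f"original total:  {original_total:.2f}")
print(f"optimized total: {optimized_total:.2f}")
print(f"improvement:     {improvement:.0%}")
```

The reduction is measured against the original path delay, so it is a delay improvement rather than a frequency gain.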

Slide 15

Results (ZenTime)

• 38K+ instance design
• 16% performance boost
  – 297 MHz --> 344 MHz
• Implemented in a 0.13u process
• Added 132 flex-cells, 5,927 instances
• Without increasing power or area

Slide 16

Impact on Global Timing

• Initial frequency: 297 MHz
• Final frequency: 344 MHz

Slide 17

Timing Optimization Results

Design    Orig MHz  Opt MHz  Improv (%)  Flex Cells Created  Flex Cell Insts  Design Size (#insts)  Technology  Process Corner
Circuit1  297       345      16%         132                 5927             38,130                0.13um      slow
Circuit2  250       277      11%         103                 4900             62,801                0.13um      slow
Circuit3  248       279      13%         133                 5113             160,610               0.13um      slow
Circuit4  251       294      17%         150                 2050             21,814                0.18um      slow
Circuit5  187       219      18%         165                 3821             33,940                0.13um      typical
Circuit6  167       193      16%         49                  183              18,265                0.18um      typical
Circuit7  562       641      14%         160                 2469             9,048                 0.13um      typical

Legend: with physicals (def, sdf, …); with wire loads
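As a quick consistency check, the improvement column can be recomputed from the original and optimized frequencies in the table. This is a minimal sketch; the helper `gain_percent` and the dictionary layout are illustrative names chosen here, and only the numbers come from the slide.

```python
def gain_percent(orig_mhz: float, opt_mhz: float) -> float:
    """Frequency gain of opt over orig, in percent."""
    return (opt_mhz - orig_mhz) / orig_mhz * 100.0


# (Orig MHz, Opt MHz, Improv % as quoted on the slide)
results = {
    "Circuit1": (297, 345, 16),
    "Circuit2": (250, 277, 11),
    "Circuit3": (248, 279, 13),
    "Circuit4": (251, 294, 17),
    "Circuit5": (187, 219, 18),
    "Circuit6": (167, 193, 16),
    "Circuit7": (562, 641, 14),
}

for name, (orig, opt, quoted) in results.items():
    computed = gain_percent(orig, opt)
    print(f"{name}: computed {computed:.1f}%, quoted {quoted}%")
```

The recomputed gains agree with the quoted column to within one percentage point, which would be consistent with the MHz figures themselves having been rounded.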

Slide 18

I/O & Design Flow

[Figure: design flow with Flex-Cell Opt inserted into a standard-cell flow]

Flow boxes: Front-end Design; Physical Synthesis; Flex-Cell Opt; Back-end Design (Detailed Route, Extraction & Verification, GDSII)

Flex-Cell Opt blocks: Timing, Physical, Gate-level Opt., Discrete Sizing, Cont. Sizing, Clustering, Interface, Flex-Cell Factory

Inputs (Constraints, Design, Library):
• library.lib, library.lef, library.cdl
• netlist.v, netlist.def, constr.sdc
• tech.bsim3, netlist.set_load, netlist.sdf

Outputs:
• opt_netlist.v, opt_netlist.def
• flex-cell.est.lib, flex-cell.est.lef
• flex-cell.cdl

Slide 19

Automated Flex-Cell Generation

Tool Suite and Flow

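[Figure: tool suite and flow for Flex-Cell view generation]

Inputs: sized spice netlists, cell architecture

Generated views:
• Layout: gds, lef, ant. lef
• Functional: eqn.v, mos.v
• Spice: lumpedC.sp, distrRC.sp
• Timing, Power: .lib, .db, .tlf
• Noise/glitch: .lib??
• Reports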

Slide 20

Summary

• New dimension in optimization of cell-based designs
• Essential to find the “right balance” between gate-level and transistor-level optimization
• Better design quality, higher runtime
• Timing, Area, Power no longer a simple trade-off
  – Possible to improve more than one, simultaneously
• Many challenges
  – Lots of research opportunities!!

Slide 21

The History of Methodology Shifts

[Figure: timeline of methodology shifts, with labels: Netlist schematic, Netlist optimization, Logic synthesis, Physical synthesis, Physical optimization, Flex-cell synthesis, Flex-cell optimization]