simultaneous device and interconnect optimization
ECE902 VLSI Interconnects
Fall 1999, Prof. Lei He 1
Simultaneous Device and Interconnect Optimization
■ Simultaneous device and wire sizing
■ Simultaneous buffer insertion and wire sizing
■ Simultaneous topology construction, buffer insertion and wire sizing
  • WBA tree (student presentation)
  • P-tree
Simultaneous Device and Wiresizing
■ Dominance-property based approach to minimize weighted sum of delay
  • Simultaneous driver/buffer and wiresizing [Cong-Koh, TVLSI’94] [Cong-Koh-Leung, ISLPED’96]
  • Simultaneous transistor and interconnect sizing [Cong-He, PDW’96, ICCAD’96]
■ Lagrangian relaxation based approach to minimize maximum delay
  • Simultaneous buffer and wire sizing [Chen-Chang-Wong, DAC’96]
■ Mathematical programming based approach to minimize area while meeting performance requirement
  • Simultaneous gate and wiresizing [Menezes-Baldick-Pileggi, ICCAD’95]
RC Delay Model for Drivers
■ $R_{min}$ = resistance of min-size driver
■ $d_i$ = size of i-th driver
■ $C_g$ = gate capacitance of min-size driver
■ $C_d$ = diffusion capacitance of min-size driver
[Figure: cascaded drivers $d_1, d_2, \ldots, d_i, \ldots, d_k$ spanned by $t_D(T,D)$; switch-level RC model for minimum-size driver with $R_p$, $R_n$, $C_g$, $C_d$]
■ Delay of i-th driver $= \frac{R_{min}}{d_i}\,(d_i C_d + d_{i+1} C_g)$
■ Delay from 1st to 2nd last driver: $t_D(T,D) = \sum_{i=1}^{k-1}$ (delay of i-th driver)
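The per-stage expression can be evaluated directly. A minimal Python sketch, assuming the delay of the i-th driver is $(R_{min}/d_i)(d_i C_d + d_{i+1} C_g)$ as in the driver model; the function name and the numeric values are illustrative and not from the lecture:

```python
def driver_chain_delay(sizes, r_min, c_g, c_d):
    """Sum of stage delays from the 1st to the 2nd-last driver.

    sizes = [d_1, ..., d_k]. Stage i drives its own diffusion
    capacitance d_i*C_d plus the gate capacitance d_{i+1}*C_g of the
    next driver, through the resistance R_min/d_i.
    """
    total = 0.0
    for d_i, d_next in zip(sizes, sizes[1:]):
        total += (r_min / d_i) * (d_i * c_d + d_next * c_g)
    return total


# Illustrative values only (arbitrary units).
print(driver_chain_delay([1, 3, 9], r_min=1.0, c_g=1.0, c_d=0.5))
```

The last driver is excluded on purpose: its load is the routing tree, which is accounted for in the interconnect delay term.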
Total Delay Measure t(k,D,W)
■ Total delay measure: $t(k,D,W) = t_D(k,D) + t_l(W)$
  • $t_D$ is the delay from 1st to 2nd last driver (written $t_D(T,D)$ in the driver delay model)
  • $t_l(W)$ is the interconnect delay from last driver to sinks
■ Interconnect delay from last driver to sinks: $t_l(W) = \sum_{\text{sink } N_i} \lambda_i \cdot t(N_i)$
  • $\lambda_i$ is a user-specified, normalized, non-negative parameter to prioritize sink $N_i$
Power Dissipation Formulation
■ Short-circuit: $ScP(i) \propto d_i$
  • Short-circuit power $= \sum_{i=1}^{k} ScP(i)$
■ Capacitive: $CP(i) \propto (d_i C_d + d_{i+1} C_g)$ for $i < k$; $CP(k) \propto (d_k C_d + C_{IL})$
  • $C_{IL}$: load due to routing tree
  • Capacitive power $= \sum_{i=1}^{k} CP(i)$
■ Total Power = Capacitive + Short-Circuit
Main Theorem: Relation between Driver and Wire Sizing
■ Given (D,W) and (D’, W’) for k drivers
■ If W = opt-WS(D) and W’ = opt-WS(D’): D dominates D’ => W dominates W’
■ If D = opt-DS(W) and D’ = opt-DS(W’): W dominates W’ => D dominates D’
K-SDWS LU-Bound Algorithm for Delay Optimization
■ Lower bound of SDWS optimal solution
  • W0 = min. width assignment (dominated by opt. sol.)
  • D0 = Opt-DS(W0)
  • W1 = Opt-WS(D0)
  • D1 = Opt-DS(W1)
  • (Di, Wi) monotonically increases
■ (Di, Wi) dominated by optimal solution
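The alternating iteration W0, D0, W1, D1, ... can be sketched in Python on a toy instance. Everything specific here is an assumption made for illustration and not taken from the lecture: the delay model (a driver chain followed by one uniform-width wire and a lumped load), the parameter values, and the brute-force `opt_ds` / `opt_ws` helpers that stand in for Opt-DS and Opt-WS.

```python
import itertools

# Hypothetical parameters (arbitrary units).
R_MIN, C_G, C_D = 1.0, 1.0, 0.5
R_W, C_W, LENGTH, C_LOAD = 0.2, 0.3, 10.0, 5.0
DRIVER_SIZES = [1, 2, 4, 8, 16]
WIRE_WIDTHS = [1, 2, 3, 4]


def delay(drivers, width):
    """Toy delay: cascaded drivers, then one uniform-width wire to a load."""
    t = 0.0
    for d_i, d_next in zip(drivers, drivers[1:]):
        t += (R_MIN / d_i) * (d_i * C_D + d_next * C_G)
    wire_cap = C_W * width * LENGTH
    wire_res = R_W * LENGTH / width
    last = drivers[-1]
    t += (R_MIN / last) * (last * C_D + wire_cap + C_LOAD)
    t += wire_res * (wire_cap / 2 + C_LOAD)
    return t


def opt_ds(width, k):
    """Best driver sizes for a fixed wire width; first driver fixed at 1."""
    candidates = ((1,) + rest
                  for rest in itertools.product(DRIVER_SIZES, repeat=k - 1))
    return min(candidates, key=lambda d: delay(d, width))


def opt_ws(drivers):
    """Best wire width for fixed driver sizes."""
    return min(WIRE_WIDTHS, key=lambda w: delay(drivers, w))


def lower_bound_iteration(k, max_iters=20):
    width = min(WIRE_WIDTHS)          # W0: minimum-width assignment
    drivers = opt_ds(width, k)        # D0 = Opt-DS(W0)
    history = [(drivers, width)]
    for _ in range(max_iters):
        new_width = opt_ws(drivers)   # W_{i+1} = Opt-WS(D_i)
        new_drivers = opt_ds(new_width, k)
        if (new_drivers, new_width) == (drivers, width):
            break                     # fixed point reached
        drivers, width = new_drivers, new_width
        history.append((drivers, width))
    return history


for drivers, width in lower_bound_iteration(k=3):
    print(drivers, width, round(delay(drivers, width), 3))
```

Each step minimizes over one block of variables with the other held fixed, so the delay never increases along the iteration; the dominance claims themselves rest on the theorem, not on this sketch.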
K-SDWS Optimal Algorithm for Delay Optimization
■ Linear search for the optimal stage number, k*
■ Optimal k-SDWS solution
SDWS Optimal Algorithm for Delay Optimization
■ Case 1: the bounds meet
■ Case 2: bounds do not meet
  • Discretize driver sizes of k-th driver between the bounds
  • For each discretized driver
    − compute optimal sizes for k-1 drivers and wires
  • Select best d-SDWS solution
■ Range of the stage number: $1 \le k^* \le k^{D}_{MAX}$, with $k^{D}_{MAX} = \ln\big(C_{IL}(T, W_{MAX})/C_g\big) / \ln s^*$, where $s^* = e^{1 + a/s^*}$ and $a = C_d/C_g$
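The upper limit of the linear search over k can be computed numerically. The sketch below assumes the stage-ratio condition $s^* = e^{1 + a/s^*}$ with $a = C_d/C_g$ (equivalently $s^*(\ln s^* - 1) = a$) and the bound $\ln(C_{IL}/C_g)/\ln s^*$; the function names are hypothetical, and bisection is simply a convenient way to solve the condition:

```python
import math


def optimal_stage_ratio(a, tol=1e-10):
    """Solve s*(ln s - 1) = a for s >= e by bisection.

    The left-hand side is increasing for s > 1 and equals 0 at s = e,
    so the root is bracketed by growing the upper end from e.
    """
    lo = hi = math.e
    while hi * (math.log(hi) - 1.0) < a:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (math.log(mid) - 1.0) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_stage_count(c_load, c_g, c_d):
    """Upper limit on the stage number (a real value, not rounded)."""
    s_star = optimal_stage_ratio(c_d / c_g)
    return math.log(c_load / c_g) / math.log(s_star)


# Illustrative values only.
print(optimal_stage_ratio(1.0), max_stage_count(1000.0, 1.0, 1.0))
```

With no diffusion capacitance (a = 0) the ratio reduces to e, which matches the constant ratio e used by the CDSMIN and DWSA baselines.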
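K-SDWS Optimal Algorithm for Combined Delay and Power Optimization
■ Linear search for the optimal stage number, k*, with $1 \le k^* \le k^{DP}_{MAX} - 1$
  • $k^{DP}_{MAX}$: smallest stage number s.t. $W_{MAX}$ has no monotone driver solution
■ Compute optimal driver sizing solution by MAPLE
  • Solve the optimality conditions: one equation in $d_{i-1}$, $d_i$, $d_{i+1}$ with constants A and B, set to 0 for all i = 2 to k-1, and one equation for the last driver involving $d_{k-1}$, $d_k$ and $C_L/C_g$, set to 0
  • Select the monotone solution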
Experiments to Evaluate SDWS Algorithm
■ Compared with other design methods:
  • CDSMIN (Constant Driver Sizing with ratio e, and MINimum wire width)
  • ODSMIN (Optimal Driver Sizing, MINimum wire width)
  • DWSA [Cong-Koh-Leung, LPDW’94] (independent constant driver sizing with ratio e, optimal wire width)
■ Driver size ratio: $\frac{d_{i+1}}{d_i} = \left(\frac{C_L}{C_g}\right)^{1/k}$
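As a small worked example of the ratio $d_{i+1}/d_i = (C_L/C_g)^{1/k}$, the sketch below lists the sizes of a k-stage chain. It assumes the first driver has size 1 (minimum size); that choice and the function name are illustrative assumptions.

```python
def constant_ratio_sizes(c_load, c_g, k):
    """Sizes d_1..d_k of a chain with equal stage ratio (C_L/C_g)^(1/k).

    Assumes d_1 = 1, so the load C_L looks like a (k+1)-th gate of
    size C_L/C_g.
    """
    ratio = (c_load / c_g) ** (1.0 / k)
    sizes = [ratio ** i for i in range(k)]
    return sizes, ratio


# Illustrative values: a load 64x the minimum gate capacitance, 3 stages.
sizes, ratio = constant_ratio_sizes(64.0, 1.0, 3)
print(ratio, sizes)
```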
Experimental Results on Power-Delay Trade-off
Simultaneous Transistor and Interconnect Sizing [Cong-He, PDW & ICCAD’96]
Given: Initial layout design for multiple nets, table-based models for device delay and interconnect coupling capacitances
Determine: Discrete sizes for transistors/wires
Minimize: α Delay + β Power + γ Area
Objective for Delay Minimization
■ $t(X) = \sum_{i,j} F(i,j)\cdot\frac{R_0(i)}{x_i}\cdot C_0(j)\cdot x_j + \sum_{i,j} F(i,j)\cdot\frac{R_0(i)}{x_i}\cdot C_1(j) + \sum_{i} G(i)\cdot\frac{R_0(i)}{x_i} + \sum_{i} H(i)\cdot\frac{R_0(i)}{x_i}\cdot C_1(i)$
  • $R_0$: resistance for unit-width transistor/wire
  • $C_0$: area capacitance for unit-width transistor/wire
  • $C_1$: fringing capacitance for transistor/wire
  • $X = \{x_1, x_2, \ldots, x_n\}$: discrete widths for transistors/wires
■ To minimize t(X) is a simple CH-posynomial program
Dominance Property for Simple CH-posynomial Programs
■ Theorem [Cong-He, PDW’96]:
  • The dominance property holds for simple CH-posynomial program w.r.t. the local refinement.
    − If X dominates optimal solution X* and X’ = local refinement of X, then X’ dominates X*
    − Symmetric for X dominated by X*
■ To minimize $f(X) = \sum_{p=0}^{m}\sum_{q=0}^{m}\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n} \left(\frac{a_{pi}}{x_i^{p}}\right)\cdot\left(b_{qj}\cdot x_j^{q}\right)$ is a simple CH-posynomial program, where $a_{pi}$ and $b_{qj}$ are positive constants.
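The dominance property can be checked on a small instance. The sketch below applies local refinement to discrete wire sizing of one driver, an N-segment wire and a load under the Elmore delay model, whose terms have the $a/x_i \cdot b\,x_j$ form. The parameter values, the tie-breaking rule and all names are assumptions made for illustration; the brute-force search is only there to compare against the bounds.

```python
import itertools

# Hypothetical parameters (arbitrary units).
R_D, C_L = 3.7, 1.3             # driver resistance, load capacitance
R0, C0, C1 = 0.73, 0.41, 0.17   # unit-width resistance, area cap, fringing cap
SEG_LEN = 1.0
WIDTHS = [1, 2, 3, 4]
N = 4                           # segments; index 0 is nearest the driver


def delay(x):
    """Elmore delay of a driver, an N-segment wire, and a load."""
    caps = [(C0 * w + C1) * SEG_LEN for w in x]
    t = R_D * (sum(caps) + C_L)
    for i, w in enumerate(x):
        downstream = sum(caps[i + 1:]) + C_L
        t += (R0 * SEG_LEN / w) * (caps[i] / 2 + downstream)
    return t


def local_refinement(x, i, prefer_small):
    """Best width for segment i with all other widths held fixed."""
    sign = 1 if prefer_small else -1

    def cost(w):
        trial = x[:i] + [w] + x[i + 1:]
        return (delay(trial), sign * w)   # break ties toward the bound side

    return min(WIDTHS, key=cost)


def lr_bound(start, prefer_small):
    """Sweep local refinement over all segments until nothing changes."""
    x = list(start)
    changed = True
    while changed:
        changed = False
        for i in range(N):
            w = local_refinement(x, i, prefer_small)
            if w != x[i]:
                x[i] = w
                changed = True
    return x


lower = lr_bound([min(WIDTHS)] * N, prefer_small=True)   # start dominated by X*
upper = lr_bound([max(WIDTHS)] * N, prefer_small=False)  # start dominating X*
best = min(itertools.product(WIDTHS, repeat=N), key=delay)
print(lower, list(best), upper)
```

Starting from all-minimum widths gives a lower bound and starting from all-maximum widths gives an upper bound; when the two coincide, the optimum is found without any search.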
Overview of STIS Algorithm
■ Support mixed transistor sizing formulations:
  • find an optimal size for each gate, each pull-up or pull-down block, or each transistor
■ Algorithm flow
  1. Partition devices and interconnects into DC-Connected-Components (DCCs)
  2. Compute TIGHT lower and upper bounds by iterative LR (local refinement) for devices and wires within each DCC
  3. Compute optimal solution within bounds by bottom-up dynamic programming [Lillis-et al, ICCAD’95] within each DCC
Experimental Results
■ Clock nets of a 12.7 Mchip/s all-digital BPSK direct-sequence spread-spectrum IF transceiver chip in the UCLA1 radio for wireless multimedia information systems
■ Clock nets routed interactively with Flint, fabricated in 1.2um SCMOS technology
■ CLK net: 112 inverters and 255 sinks; DCLK net: 31 inverters and 123 sinks
■ Manually designed driver/buffer: cascade chain of 4 inverters
■ Ideal inter-clock skew = 0
Manual Design versus LR-Based Optimizations
■ Transistor sizing formulation can achieve greater delay and skew reduction at a similar power dissipation
■ Runtimes (wire segmenting: 10um)
  • LR-based SBWS 1.18s, STIS 0.88s
  • Dynamic programming runs out of memory
  • Total HSPICE simulation ~2000s

                      manual    SBWS              STIS
max delay (ns)        4.6324    4.3447 (-6.2%)    3.9632 (-14.4%)
average power (mW)    60.85     46.09 (-24.3%)    46.29 (-24.2%)
clock skew            470ps     130ps (-3.6x)     40ps (-11.7x)
Trend of Device Effective Resistance
■ R0 is NOT a constant. It depends on size, input slope tt and output load cl
  • May differ by a factor of 2
  • NOT a function of a single sizing variable

Effective resistance R0 for unit-width n-transistor

size = 100x
cl \ tt     0.05ns    0.10ns    0.20ns
0.225pf     12200     12270     19180
0.425pf     8135      9719      12500
0.825pf     8124      8665      10250

size = 400x
cl \ tt     0.05ns    0.10ns    0.20ns
0.501pf     12200     15550     19150
0.901pf     11560     13360     17440
1.701pf     8463      9688      12470

This invalidates the simple CH-posynomial formulation!
Bounded CH-Posynomial Program and Extended Local Refinement
■ To minimize $f(X) = \sum_{p=0}^{m}\sum_{q=0}^{m}\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n} \left(\frac{a_{pi}(X)}{x_i^{p}}\right)\cdot\left(b_{qj}(X)\cdot x_j^{q}\right)$ is a general CH-posynomial program, when $a_{pi}$ and $b_{qj}$ are arbitrary functions of X, but each has an upper and lower bound.
■ Extended local refinement on $x_i$ w.r.t. X is local refinement using the following coefficients:
  • When X dominates X*, for any p, q and $j \neq i$, we use
    − $a_{pi}^{max}$ instead of $a_{pi}(X)$ for $1/x_i^{p}$
    − $b_{pi}^{min}$ instead of $b_{pi}(X)$ for $x_i^{p}$
    − $a_{qj}^{min}$ instead of $a_{qj}(X)$ for $1/x_j^{q}$
    − $b_{qj}^{max}$ instead of $b_{qj}(X)$ for $x_j^{q}$
  • Symmetric operation when X is dominated by X*
Dominance Property for Bounded CH-Posynomial Program
■ Theorem [Cong-He, ISPD’98]:
  • The dominance property holds for bounded CH-posynomial program w.r.t. the extended local refinement.
    − If X dominates optimal solution X* and X’ = extended local refinement of X, then X’ dominates X*
    − If X is dominated by X* and X’ = extended local refinement of X, then X’ is dominated by X*
■ Application:
  • Device and wire sizing problem
    − under general capacitance model
    − under table-based device delay model
Extended Local Refinement for Device
■ $R_0^{max}(i)$ and $R_0^{min}(i)$ are determined
  • under the assumption that R0 increases w.r.t.
    − increases of size and input slope
    − decrease of output load
  • by table lookup
    − using continuously updated lower and upper bounds on transistor size, input slope and output load
■ When $X \ge X^*$, we use:
  • $R_0^{max}(i)$ for LR optimization on transistor i
  • $R_0^{min}(i)$ for LR optimization on transistors other than i
■ When $X \le X^*$, we use:
  • $R_0^{min}(i)$ for LR optimization on transistor i
  • $R_0^{max}(i)$ for LR optimization on transistors other than i
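The table lookup for the two resistance bounds can be sketched in Python. The values are the size = 100x entries of the effective-resistance table in this lecture; the dictionary layout, the function name, and the restriction to bounds that fall exactly on table grid points (with transistor size held fixed) are simplifying assumptions.

```python
# Effective resistance R0 for a unit-width n-transistor at size = 100x,
# indexed as R0_TABLE[cl in pF][tt in ns].
R0_TABLE = {
    0.225: {0.05: 12200, 0.10: 12270, 0.20: 19180},
    0.425: {0.05: 8135, 0.10: 9719, 0.20: 12500},
    0.825: {0.05: 8124, 0.10: 8665, 0.20: 10250},
}


def r0_bounds(cl_range, tt_range):
    """Return (R0_min, R0_max) over a box of output loads and input slopes.

    Relies on the monotonicity assumption: R0 grows with input slope and
    shrinks with output load, so the extremes sit at opposite corners.
    """
    cl_lo, cl_hi = cl_range
    tt_lo, tt_hi = tt_range
    r0_max = R0_TABLE[cl_lo][tt_hi]   # smallest load, slowest input
    r0_min = R0_TABLE[cl_hi][tt_lo]   # largest load, fastest input
    return r0_min, r0_max


print(r0_bounds((0.225, 0.825), (0.05, 0.20)))
```

As the bounds on slope and load tighten during the iteration, the box shrinks and the gap between the two resistance values narrows.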
Comparison between STIS Formulations
DCLK step-model table-model
sgws 1.16 1.08 (-6.8%)
stis 1.13 (-2.5%) 0.96 (-17.2%)
2cm line step-model table-model
sgws 0.82 0.81 (-0.4%)
stis 0.75 (-8.6%) 0.69 (-16.5%)
■ Different formulations on DCLK and 2cm line
  • Parameters are based on 0.18um process
  • Optimal buffer insertion is used for 2cm line
■ Total runtime
  • LR-based optimization ~10 seconds
  • HSPICE simulation ~3000 seconds
GISS can be Solved as General CH-Posynomial Program
  • 16-bit bus, each a 10mm-long line, 500um per segment
  • MIN: min width (max spacing)
  • GISS/DP: dynamic programming based, under variable ca and cf
  • GISS/LR: LR-based, under general cap table

Center       Average delays (ns)                        Runtimes (s)
spacing      MIN     GISS/DP        GISS/LR         GISS/DP    GISS/LR
2x pitch     1.51    0.80 (-47%)    0.79 (-47%)     183        2.0
3x pitch     1.33    0.52 (-61%)    0.52 (-61%)     189        2.4
4x pitch     1.28    0.42 (-67%)    0.42 (-67%)     511        2.3
5x pitch     1.25    0.37 (-71%)    0.36 (-71%)     1086       4.9
6x pitch     1.23    0.34 (-72%)    0.32 (-73%)     1379       7.7
Simultaneous Device and Interconnect Optimization
■ Simultaneous device and wire sizing
■ Simultaneous buffer insertion and wire sizing
■ Simultaneous topology construction, buffer insertion and wire sizing
Buffer Insertion with Wiresizing [Lillis-Cheng-Lin, ICCAD’95]
■ Objective is to minimize power subject to delay constraints
■ Incorporate the effect of signal slew on buffer delay using piece-wise linear functions
■ In the bottom-up phase, consider discrete wiresizing for each edge e
  • For each option (c, q) and candidate wire width w:
    − cap(e, w) = wire cap. of e with width w
    − res(e, w) = wire res. of e with width w
    − Compute new option (c’, q’):
      c’ = c + cap(e, w)
      q’ = q - res(e, w) × (cap(e, w)/2 + c)
■ Additional pruning rule considered for power minimization: given options (c, q) with power p and (c’, q’) with power p’, prune (c, q) if p’ < p, c’ ≤ c, q’ ≥ q
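The option update and the pruning rule can be sketched in Python. The update of (c, q) and the pruning condition follow the bullets above; the linear wire models for cap(e, w) and res(e, w), the use of added wire capacitance as the power measure, and all names are assumptions made for illustration. Buffer insertion itself is not modeled here.

```python
def prune(options):
    """Drop (c, q, p) if another option has p' < p, c' <= c and q' >= q."""
    kept = []
    for (c, q, p) in options:
        dominated = any(p2 < p and c2 <= c and q2 >= q
                        for (c2, q2, p2) in options)
        if not dominated:
            kept.append((c, q, p))
    return kept


def add_wire(options, widths, length, r_unit, c_unit):
    """Propagate options (c, q, p) upstream across one edge.

    Assumed wire models: cap = c_unit * w * length, res = r_unit * length / w.
    Assumed power measure: p grows by the wire capacitance that is added.
    """
    new_options = []
    for (c, q, p) in options:
        for w in widths:
            cap = c_unit * w * length
            res = r_unit * length / w
            new_c = c + cap
            new_q = q - res * (cap / 2 + c)
            new_options.append((new_c, new_q, p + cap))
    return prune(new_options)


# Illustrative sink: load 1.0, required time 10.0, zero power so far.
print(add_wire([(1.0, 10.0, 0.0)], widths=[1, 2], length=1.0,
               r_unit=1.0, c_unit=1.0))
```

In this example both widths survive: the wider wire leaves more slack but costs more capacitance and power, so neither option dominates the other.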
Simultaneous Buffer Insertion/Sizing and Wiresizing [Chu-Wong, ISPD’97]
■ Assumptions:
  • Consider only area capacitance
  • Continuous wire widths and buffer sizes without bounds
■ Problem:
  • Given a single line, driver resistance, load, and the total number of segments n to be used
  • Objective: find
    (i) the optimal number of buffers to be inserted, their locations and sizes
    (ii) the optimal length and width of each segment
Simultaneous Buffer Insertion/Sizing and Wiresizing (continued)
■ Results and Implications:
  • Closed form formula for optimal number of buffers
  • All segments in the optimal solution are of equal length
  • Closed form formulas for buffer and wire sizes, for any given buffer locations
  • Buffer locations do not matter, as long as delay is the only objective and the buffer and wire sizes are not bounded
    ⇒ For delay minimization, a chain of cascade drivers is as good as using buffers to break a long line
    However, power and area will be affected by buffer locations
■ For interconnect tree, apply the formulas on edges iteratively; keep buffer locations/sizes and wire widths of other edges fixed while optimizing one edge
■ Shortcoming: ignores fringing capacitance, which is significant in deep submicron
Comparison of Several Interconnect Optimization Algorithms
■ T+B+W: Topology (T), followed by optimal buffer insertion and sizing (B=10), then followed by optimal wire sizing (W=18)
■ TB+BW: Simultaneous T and B (B=3), followed by simultaneous buffer and wire sizing (BW) with B=40, W=18
■ Tbw+BW: Simultaneous TBW with small number of B=3 and W=3, then followed by BW as above
■ TBW: Simultaneous TBW with larger number of B=10 and W=8
■ Provided by the UCLA TRIO (Tree, Repeater, & Interconnect Optimization) package
Comparison of Optimization Results by Different Algorithms

                           Algorithms
                           T+B+W    TB+BW    Tbw+BW    TBW
5-pin nets    Delay (ns)   0.40     0.39     0.35      0.34
              Delay (ns)   0.47     0.48     0.38      0.38
              Delay (ns)   0.42     0.41     0.36      0.35
              CPU (s)      0.1      0.1      1.4       15
10-pin nets   Delay (ns)   0.42     0.37     0.34      0.33
              Delay (ns)   0.56     0.56     0.44      0.44
              Delay (ns)   0.47     0.45     0.38      0.38
              CPU (s)      0.8      1.0      6.4       76
20-pin nets   Delay (ns)   0.45     0.43     0.38      0.39
              Delay (ns)   0.54     0.48     0.42      0.41
              Delay (ns)   0.46     0.43     0.38      0.38
              CPU (s)      1.6      4.0      27.6      350