chap6 detailed placement - VAST Lab, cadlab.cs.ucla.edu/~cong/cs258f/chap6_06w.pdf
Detailed Placement
Objectives
• Major: Legalization
– Make placement feasible with as little movement as possible
• Minor: Refinement
– Wirelength
– Timing
– Routability
Legalization
• Greedy
– Tetris [D. Hill, US Patent’02]
• Hierarchical
– Network flow + Dynamic programming [J. Vygen, DATE’98]
• Mixed-size placement
– Constraint graph + Linear programming [Cong & Xie, ASPDAC’06]
Greedy Method – Tetris [D. Hill, US Patent’02]
• Sort cells using X coordinate
• For each cell, determine the closest site location on each row
• Assign the cell to the site with the best heuristic cost
– Incident wirelength
– Displacement
– Weighted sum
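The three steps above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes unit-width sites, a per-row "frontier" marking the leftmost free site, and displacement as the only cost term. The name `tetris_legalize` and the cell tuple layout are hypothetical.

```python
def tetris_legalize(cells, num_rows, row_width, row_height=1.0):
    """Greedy left-to-right legalization sketch.

    cells: list of (name, x, y, width) from global placement.
    Returns {name: (site_x, row)}; raises ValueError if a cell fits nowhere.
    """
    frontier = [0] * num_rows          # leftmost free site in each row (assumption)
    placed = {}
    # Step 1: sort cells by X coordinate.
    for name, x, y, width in sorted(cells, key=lambda c: c[1]):
        best = None
        # Step 2: closest free site on each row.
        for row in range(num_rows):
            site = max(frontier[row], round(x))
            if site + width > row_width:
                continue               # cell does not fit in this row
            # Step 3: heuristic cost, here displacement only.
            cost = abs(site - x) + abs(row * row_height - y)
            if best is None or cost < best[0]:
                best = (cost, row, site)
        if best is None:
            raise ValueError(f"no legal site for cell {name}")
        _, row, site = best
        placed[name] = (site, row)
        frontier[row] = site + width   # sites to the left are never revisited
    return placed


cells = [("a", 0.2, 0.1, 2), ("b", 0.9, 0.6, 2), ("c", 1.1, 0.9, 2)]
print(tetris_legalize(cells, num_rows=2, row_width=6))
```

Because the frontier only moves right, a cell placed early is never reconsidered, which is the limitation of greedy methods in general.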
[Figure: cells already legalized and cells to be legalized, with candidate-site cost labels 20, 15, 10, 5, 10, 15, 20]
Greedy Method
• Consider a single cell or a subset of the cells at a time
• Does not consider the impact on remaining cells
Greedy Method – Tetris [D. Hill, US Patent’02]
• Drawback
– May not find a legal solution
– Keep squeezing the global placement (GP) to compensate
– After squeezing, redo the legalization
Greedy Method -- Domino [K. Doll et al, TCAD’94]
• Overlapped initial placement
• Improve placement
• Divide chip into overlapping regions
• Assign cells within each region
• Iterated if necessary
Greedy Method -- Domino [K. Doll et al, TCAD’94]
• Within each region
– Chop cells into uniform sized subcells
  • Same cost for all subcells from the same cell
  • All subcells are pulled towards the cheapest location
  • As a result, tend to lie side by side
– Transform into min-cost-maximum-flow
  • Pseudo nodes Q and S
  • Capacity from Q to any subcell is ∞
  • Capacity from candidate location to S is 1
  • One edge from each subcell to each candidate location
– Solve by flow-augmentation algorithm
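The network above can be illustrated with a small successive-shortest-path solver. This is a sketch under stated assumptions rather than Domino's implementation: the cost matrix is made up, and each Q-to-subcell edge is capped at 1 here (the slide states ∞) so that every subcell is assigned exactly once. The function name `min_cost_assignment` is hypothetical.

```python
import math


def min_cost_assignment(cost):
    """cost[u][v]: cost of placing subcell u at candidate location v."""
    n_sub, n_loc = len(cost), len(cost[0])
    Q, S = 0, 1 + n_sub + n_loc            # pseudo source and sink
    n = S + 1
    graph = [[] for _ in range(n)]         # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for u in range(n_sub):
        add_edge(Q, 1 + u, 1, 0)           # assumption: capacity 1, not infinity
        for v in range(n_loc):
            add_edge(1 + u, 1 + n_sub + v, 1, cost[u][v])
    for v in range(n_loc):
        add_edge(1 + n_sub + v, S, 1, 0)   # each location holds one subcell

    total = 0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual network.
        dist = [math.inf] * n
        prev = [None] * n
        dist[Q] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v], changed = dist[u] + c, (u, k), True
            if not changed:
                break
        if dist[S] == math.inf:
            break                           # no augmenting path left
        v = S
        while v != Q:                       # push one unit of flow along the path
            u, k = prev[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u
        total += dist[S]

    assignment = {}
    for u in range(n_sub):
        for v, cap, _, _ in graph[1 + u]:
            if v > n_sub and cap == 0:      # saturated subcell-to-location edge
                assignment[u] = v - 1 - n_sub
    return total, assignment


print(min_cost_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))
```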
Greedy Method -- Domino [K. Doll et al, TCAD’94]
• Network transformation
Something omitted here ?
Greedy Method -- Domino [K. Doll et al, TCAD’94]
• Cost on each edge
– Depends on wirelength modeling
– Model I: Ignore unplaced cells
– Model II: Use the coordinates from the previous iteration for approximation
– Model II consistently better than Model I
Greedy Method -- Domino [K. Doll et al, TCAD’94]
circuit     #cell   #pad   #net    #pin    #row
primary1    752     81     904     2941    20
struct      1888    64     1920    5471    19
primary2    2907    107    3029    11229   28
biomed      6417    97     5742    21040   39
industry2   12142   495    13419   48404   66
industry3   15032   68     21924   68290   60
avqsmall    21854   64     22120   60729   80
avqlarge    25114   64     25378   65578   72
Greedy Method -- Domino [K. Doll et al, TCAD’94]
           TimberWolf V.5.4       VNPR                   Gordian-L              Gordian-L + Domino I   Gordian-L + Domino II
Circuit    Area     Runtime       Area     Runtime       Area     Runtime       Area     Runtime       Area     Runtime
primary    20.31    794           23.37    99            20.76    81            19.96    151           20.03    168
struct     6.56     2875          8.23     174           6.48     167           6.38     245           6.2      253
primary2   78.16    5316          95.33    580           78.85    496           78.58    838           78.11    922
biomed     48.22    14631         53.55    5567          45.93    1639          40.43    2742          39.99    2640
industry2  219.52   37521         272.4    16399         233.59   7550          215.61   9835          214.25   9587
industry3  658.01   65652         755.07   11571         598.42   8458          586.92   10257         575.54   10349
avqsmall   161.05   92959         N/A      N/A           129.39   14488         123.57   15963         122.28   17444
avqlarge   168.07   96564         N/A      N/A           141.38   18537         130.1    20549         129.27   21086
average    1        1             1.18     0.21          0.95     0.13          0.9      0.18          0.89     0.18

• Reduces area by 11% on average compared with TimberWolf v.5.4
• Moderate runtime with respect to Gordian-L
Hierarchical Method
• First assign cells into regions on the chip
• Then legalize within each individual region
• Cells remain in a region once assigned there
Hierarchical Method [J. Vygen, DATE’98]
• Determine total amount of cells to move out of one region (zone)
– Regions (zones) correspond to nodes
– Edge between adjacent regions
– Determine source/sink depending on overflow/underflow
– Network solver O(n² log n)
– Result forms a DAG (Why?)
– Scan regions in topological order
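One common argument for the DAG property is that a flow cycle with positive cost could be cancelled to get a cheaper solution, so a minimum-cost flow carries none. The topological scan can then be sketched with Kahn's algorithm. The `flow` dictionary format and the name `topological_regions` are assumptions made for this illustration.

```python
from collections import deque


def topological_regions(flow):
    """flow: {(from_region, to_region): amount of cell area moved}.

    Returns the regions in an order where every region appears before
    the regions it sends cells to. Raises ValueError on a cyclic flow.
    """
    successors, indegree = {}, {}
    for (u, v), amount in flow.items():
        if amount <= 0:
            continue                       # ignore edges that carry no flow
        successors.setdefault(u, []).append(v)
        successors.setdefault(v, [])
        indegree[v] = indegree.get(v, 0) + 1
        indegree.setdefault(u, 0)

    # Regions with no incoming flow are pure sources (overflowing zones).
    queue = deque(sorted(r for r, d in indegree.items() if d == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(indegree):
        raise ValueError("flow contains a cycle")
    return order


print(topological_regions({("A", "B"): 3, ("A", "C"): 2, ("B", "C"): 1}))
```

Scanning in this order means that when a region is processed, all cells arriving from other regions are already known.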
Hierarchical Method [J. Vygen, DATE’98]
• Determine the cells to move between regions (zones)
– C(r) = {c1, c2, … cm}
– Pj(x1, x2, … xk) = C(C1, C2, … Ck), for j = 1, 2, … m and 0 ≤ xi ≤ f(r, ri), such that
  • w(Ci) = xi, Ci ⊆ {c1, c2, … cj} (i = 1, 2, … k)
  • Ci ∩ Cj = ∅ for i ≠ j
  • $\sum_{i=1}^{k} \sum_{c \in C_i} \mathrm{cost}(c, r, r_i)$ is minimized
– Generating all 0 ≤ xi ≤ f(r, ri) takes $\prod_{i=1}^{k} f(r, r_i)$ comparisons
Mixed-size Legalization [Cong & Xie, ASPDAC’06]
• Constraint graph based macro legalization
• Enhanced standard cell legalization
Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Generation
[Figure: overlapping macro pairs labelled "horizontal" or "vertical", with overlap amounts 3 and 4]
• Model pairwise non-overlapping constraints with constraint edges
• Set the constraint edge direction to the one that incurs the least amount of overlap
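The direction rule can be illustrated for a single pair of macros. This is one reading of the slide, not necessarily the paper's exact rule: the overlap is measured along each axis, and the pair is separated along the axis where the overlap is smaller. The tuple layout and the name `constraint_edge` are assumptions.

```python
def constraint_edge(a, b):
    """a, b: macros as (x, y, width, height), (x, y) being the lower-left corner.

    Returns (direction, first, second): 'first' must end up to the left of
    (or below) 'second' in the chosen direction.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap length along each axis (negative if already separated there).
    overlap_x = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_y = min(ay + ah, by + bh) - max(ay, by)

    if overlap_x <= overlap_y:
        # Separating horizontally needs the smaller move; order by centre x.
        a_first = ax + aw / 2 <= bx + bw / 2
        return ("horizontal", "a", "b") if a_first else ("horizontal", "b", "a")
    # Otherwise separate vertically; order by centre y.
    a_first = ay + ah / 2 <= by + bh / 2
    return ("vertical", "a", "b") if a_first else ("vertical", "b", "a")


print(constraint_edge((0, 0, 4, 4), (3, 1, 4, 2)))  # x-overlap 1, y-overlap 2
print(constraint_edge((0, 0, 4, 4), (1, 3, 4, 4)))  # x-overlap 3, y-overlap 1
```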
Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Example
[Figure: layout of macros m1 to m7 with the corresponding horizontal constraint graph (from Vhs) and vertical constraint graph (Vvs to Vvt)]

Macro   Height   Width
m1      4        4
m2      2        4
m3      3        4
m4      9        6
m5      4        6
m6      3        6
m7      6        6

Chip Dimension : 25 x 10
Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Example
[Figure: horizontal and vertical constraint graphs for the example with the resulting layouts of macros m1 to m7; one layout is labelled "Infeasible"]
Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Example
[Figure: two horizontal constraint graphs for the example with layouts of macros m1 to m7, annotated "No Impact"]
Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Adjustment
• Extract the epsilon-network of the constraint graph
– Subgraph made up of critical edges
• Search for a min-cut on the epsilon-network
– Used in logic synthesis for timing [Singh 92, Xu 00, Ph.D Thesis]
• Heuristic edge capacity assigned as
[Equation: $c(e_{ij}) = +\infty$ if the orthogonal direction is infeasible; otherwise a sum of two $\max(\cdot, 0)$ terms in the macro coordinates and heights; the exact expression is not recoverable from this transcript]
• Corresponding epsilon network for the example
Phase I. Constraint Graph based Macro Legalization – Final Constraint Graph
[Figure: Final Horizontal Constraint Graph (Vhs to Vht) and Final Vertical Constraint Graph (Vvs to Vvt) over macros 1 to 7]
Phase I. Constraint Graph based Macro Legalization -- Macro Coordinate Assignment
• Determine the location of macros by minimizing total perturbation from global placement
• Linear constraints based on the edges derived in the constraint graph
– Similar formulation in [Vygen, DAC97, Tang et al, ASPDAC 2005]
• LP formulation can be enhanced to consider weights, WL, etc

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n} |x_i' - x_i| + \sum_{i=1}^{n} |y_i' - y_i| \\
\text{s.t.}\ & x_{i_2}' - x_{i_1}' \ge \frac{w_{i_1} + w_{i_2}}{2} && \text{for the } i\text{th horizontal constraint edge} \\
& y_{i_2}' - y_{i_1}' \ge \frac{h_{i_1} + h_{i_2}}{2} && \text{for the } i\text{th vertical constraint edge}
\end{aligned}
$$
Phase II. Enhanced Cell Legalization
[Figure: two panels, each labelled "Front end" and "Back end"]
Enhanced greedy legalization [D. Hill, US Patent 02, Khatkhate ISPD04]
Macros are still allowed to move, but only horizontally during cell legalization
Impact of LP for Macro Coordinate Assignment

         Greedy              LP
circuit  FWL       RT(s)     FWL       RT(s)
ibm01    2.22E+06  30        2.18E+07  37
ibm02    5.00E+06  87        4.77E+07  70
ibm03    6.67E+06  58        6.68E+07  54
ibm04    7.52E+06  66        7.59E+07  75
ibm05    9.76E+06  66        9.76E+07  66
ibm06    6.00E+06  66        6.06E+07  64
ibm07    1.01E+07  123       1.02E+07  120
ibm08    1.21E+07  152       1.19E+07  157
ibm09    1.29E+07  145       1.28E+07  146
ibm10    2.91E+07  340       2.90E+07  321
ibm11    1.82E+07  195       1.80E+07  206
ibm12    3.52E+07  380       3.48E+07  330
ibm13    2.35E+07  242       2.34E+07  242
ibm14    3.55E+07  452       3.54E+07  443
ibm15    5.04E+07  650       5.03E+07  552
ibm16    5.32E+07  665       5.31E+07  671
ibm17    6.52E+07  948       6.52E+07  923
ibm18    4.32E+07  715       4.31E+07  724
Avg.     1.01      1.02      1.00      1.00

LP helps to reduce the final WL by 1%
Impact of Movable Macros

         Fixed               Movable
circuit  FWL       RT(s)     FWL       RT(s)
ibm01    2.25E+06  37        2.18E+07  37
ibm02    4.83E+06  62        4.77E+07  70
ibm03    6.93E+06  63        6.68E+07  54
ibm04    7.97E+06  67        7.59E+07  75
ibm05    9.75E+06  68        9.76E+07  66
ibm06    6.21E+06  66        6.06E+07  64
ibm07    1.09E+07  138       1.02E+07  120
ibm08    1.18E+07  117       1.19E+07  157
ibm09    1.31E+07  123       1.28E+07  146
ibm10    3.10E+07  236       2.90E+07  321
ibm11    1.90E+07  175       1.80E+07  206
ibm12    3.90E+07  320       3.48E+07  330
ibm13    2.53E+07  244       2.34E+07  242
ibm14    3.62E+07  354       3.54E+07  443
ibm15    5.13E+07  490       5.03E+07  552
ibm16    5.34E+07  466       5.31E+07  671
ibm17    6.65E+07  746       6.52E+07  923
ibm18    4.45E+07  635       4.31E+07  724
Avg.     1.04      0.89      1.00      1.00

Allowing macros to move helps to reduce final WL by 4%
Impact of Backend Contour

         w/o backend         w backend
circuit  FWL       RT(s)     FWL       RT(s)
ibm01    N/A                 2.18E+07  37
ibm02    N/A                 4.77E+07  70
ibm03    N/A                 6.68E+07  54
ibm04    N/A                 7.59E+07  75
ibm05    9.76E+07  67        9.76E+07  66
ibm06    6.06E+07  65        6.06E+07  64
ibm07    N/A                 1.02E+07  120
ibm08    N/A                 1.19E+07  157
ibm09    N/A                 1.28E+07  146
ibm10    N/A                 2.90E+07  321
ibm11    1.80E+07  205       1.80E+07  206
ibm12    N/A                 3.48E+07  330
ibm13    N/A                 2.34E+07  242
ibm14    3.54E+07  445       3.54E+07  443
ibm15    5.03E+07  5.65E+02  5.03E+07  552
ibm16    5.31E+07  674       5.31E+07  671
ibm17    6.52E+07  943       6.52E+07  923
ibm18    4.31E+07  943       4.31E+07  724

Having the backend contour significantly enhances the robustness
Refinement Algorithms
• Analytical Approach
– Linear programming [J. Vygen, DATE’98]
• Combinatorial Approach
– Linear placement [Kahng et al, ASPDAC’99]
– Window interleaving [S. Hur et al, ICCAD’00]
Linear Programming [J. Vygen, DATE’98]
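[Equation, only partially recoverable: minimize the weighted net spans $\sum_{n \in N} w(n)\,(\bar{x}(n) - \underline{x}(n))$ subject to pin-span constraints of the form $\underline{x}(n) \le x(c) + x_{\mathrm{offs}}(p) \le \bar{x}(n)$ for each pin $p$ of net $n$ on cell $c$, non-overlap constraints between consecutive cells $c_i$ and $c_{i+1}$, and the zone bounds $x_{\min}$ and $x_{\max}$]
• Linear programming within each zone
• No experimental results given, although theoretically sound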
Linear Placement [A. Kahng et al, ASPDAC’99]
• Input
– A single row of given dimension with m movable cells Ci, i = 1, 2, … m
– Fixed cells in other rows
– n nets, N1, N2, … Nn
• Output
– Overlap-free placement of Ci within the row with minimum $\sum_{i=1}^{n} \mathrm{HPWL}(N_i)$
• Constraint
– x(C1) < x(C2) < … < x(Cm)
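The objective can be made concrete with a short helper that evaluates the total half-perimeter wirelength (HPWL) of a set of nets. This is a minimal sketch. Representing each net as a list of pin coordinates is an assumption made here, and the function names are hypothetical.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net given its pin (x, y) positions."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # Width plus height of the bounding box around all pins of the net.
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets N1 ... Nn."""
    return sum(hpwl(pins) for pins in nets)


nets = [
    [(0, 0), (3, 4), (1, 1)],   # bounding box 3 x 4
    [(2, 2), (5, 2)],           # bounding box 3 x 0
]
print(total_hpwl(nets))
```

With the cells of the other rows fixed, only the x extent of each net changes as the row's cells move, which is why the following slides reason about horizontal spans only.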
Linear Placement [A. Kahng et al, ASPDAC’99]
[Figure: net N with its fixed cells; fixed_span(N) lies between fl(N) and fr(N), ml(N) and mr(N) mark the movable extremes, and span(N) is the quantity to minimize]
Linear Placement [A. Kahng et al, ASPDAC’99]
• Contribution of cell Ci

$$\mathrm{cost}_i(x) = \sum_{N:\, C_i = R(N)} \max\{m_r(N) - f_r(N),\ 0\} + \sum_{N:\, C_i = L(N)} \max\{f_l(N) - m_l(N),\ 0\}$$

• Piece-wise linear and convex
– Increasing (decreasing) when fr(N) to the left of corresponding li(N) is less (more) than fl(N) to the right of corresponding ri(N)

[Figure: cost function of a cell along the row with breakpoints fr(1), fl(2), fr(3), fr(3), fl(4), fr(2), fr(5), fl(6), fl(7); the flat part is labelled "Minimum Segment"]
Linear Placement [A. Kahng et al, ASPDAC’99]
• Dynamic Programming Algorithm
– pre-computed cell cost functions
• Prefix Algorithm
– piecewise-linearity of cell cost function
• Clumping Algorithm
– convexity of cell cost function
Linear Placement [A. Kahng et al, ASPDAC’99]
• Dynamic programming
– Place C1 … Ci, Ci being at or to the left of sj
– $\sum_{k=1}^{i} \mathrm{cost}_k(x)$ is minimized
– $P_i(s_j) = \min\{P_i(s_{j-1}),\ P_{i-1}(s_j - w_{i-1}) + \mathrm{cost}_i(s_j)\}$
– Runtime = (i range) × (j range) = n × (k − ∑ wi) ≈ O(n²)
Linear Placement [A. Kahng et al, ASPDAC’99]
[Figure: cells Ci-1 and Ci on the row, with the sites sj − wi-1, sj-1 and sj marked]
Pi(j) will be either
– Ci exactly at sj (extend Pi-1(j − wi-1)), or
– Ci to the left of sj (use Pi(j − 1))
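The two cases above translate directly into a table-filling routine. The sketch below assumes integer sites numbered from 0, cell positions given by their left edge, and cost functions supplied as plain callables. `place_row` and its arguments are hypothetical names, not taken from the paper.

```python
import math


def place_row(widths, costs, num_sites):
    """Order-preserving single-row placement by dynamic programming.

    widths[i]: width of cell i in sites; costs[i](x): cost of cell i at site x.
    Returns (total_cost, positions) or None if the cells do not fit.
    """
    m = len(widths)
    # P[i][j]: min cost of cells 0..i with cell i at or to the left of site j.
    P = [[math.inf] * num_sites for _ in range(m)]
    exact = [[False] * num_sites for _ in range(m)]   # True if cell i sits at j
    for i in range(m):
        for j in range(num_sites):
            here = math.inf
            if j + widths[i] <= num_sites:            # cell must end inside the row
                if i == 0:
                    here = costs[i](j)
                elif j - widths[i - 1] >= 0:
                    # Case 1: Ci exactly at sj, extending P[i-1] at sj - w(i-1).
                    here = P[i - 1][j - widths[i - 1]] + costs[i](j)
            # Case 2: Ci strictly to the left of sj.
            left = P[i][j - 1] if j > 0 else math.inf
            if here < left:
                P[i][j], exact[i][j] = here, True
            else:
                P[i][j] = left
    if P[m - 1][num_sites - 1] == math.inf:
        return None
    # Backtrack from the rightmost site to recover the positions.
    positions, j = [0] * m, num_sites - 1
    for i in range(m - 1, -1, -1):
        while not exact[i][j]:
            j -= 1
        positions[i] = j
        if i > 0:
            j -= widths[i - 1]
    return P[m - 1][num_sites - 1], positions


# Two cells of width 2 that both prefer site 3 in a 6-site row.
print(place_row([2, 2], [lambda x: abs(x - 3), lambda x: abs(x - 3)], 6))
```

The table has one entry per (cell, site) pair, which matches the (i range) × (j range) runtime estimate.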
Linear Placement [A. Kahng et al, ASPDAC’99]
• Prefix algorithm
– pcosti (minimum cost of a prefix placement) is monotonically nonincreasing
– pcosti and costi can be represented by a sequence of line segments (min, max, slope, y-intercept)
– Sites bounding maximal line segments of pcosti are a subset of sites bounding pcosti and costi
– Merge the two representations for each Ci
– Complexity O(m²)
Linear Placement [A. Kahng et al, ASPDAC’99]
[Figure: plot of cost versus x showing the curves pcosti-1, costi and pcosti]
Linear Placement [A. Kahng et al, ASPDAC’99]
• Clumping algorithm
– Find the minimum interval for Ci, i.e., where the slope of costi is 0
– Scan adjacent cells starting from the left
– If Ci-1 and Ci cannot be put in their minimum intervals without overlap, replace them with a clumped cell Ci-1’ that has all their pins replicated
– Otherwise place Ci on the leftmost legal site
– Complexity O(m²)
– Use a red-black tree to reduce it to O(m log m)
Linear Placement [A. Kahng et al, ASPDAC’99]
[Figure: clumped cells, the optimal positions for cells, and the directions to the minimum segments of individual cells]
Linear Placement [A. Kahng et al, ASPDAC’99]
Circuit                                   Algorithm
#cell   #rows   #site                     Prefix             Clumping
                          Heuristic       None     Swap      None     Swap
4155    46      41610     % improvement   9.39     13.38     9.49     13.46
                          runtime         56.8     94.1      31.8     69.4
11471   42      78036     % improvement   6        6.18      6        6.22
                          runtime         182.9    289       68.9     198
12260   610     128893    % improvement   1.13     1.38      1.15     1.4
                          runtime         13.9     35.1      14.8     40.2
7309    56      47152     % improvement   7.43     8.13      7.45     8.08
                          runtime         46.5     125.7     35.1     95.9
8829    60      98760     % improvement   3.7      3.54      3.76     3.77
                          runtime         34       43.9      20.7     91.8

Comparison with an industry placement tool
Window Interleaving [S. Hur et al, ICCAD’00]
• Intra-row optimization
– Given window W, choose arbitrary subsequence A and B = W-A
– Interleave A and B to get an optimal arrangement
– Repeat by sliding window right across each row from top to bottom
Window Interleaving [S. Hur et al, ICCAD’00]
• Notation
– A = a1, a2, … an, B = b1, b2, … bm
– Sij is the optimal arrangement with a1, a2, … ai (i ≤ n) and b1, b2, … bj (j ≤ m), C(Sij) the cost of Sij
• Recursion

$$
S_{0,0} = \emptyset, \qquad C(S_{0,0}) = 0, \qquad
S_{ij} =
\begin{cases}
S_{i-1,j}\, a_i, & \text{if } C(S_{i-1,j}\, a_i) < C(S_{i,j-1}\, b_j) \\
S_{i,j-1}\, b_j, & \text{otherwise}
\end{cases}
$$

• Complexity O(nm+p(n+m)), p is the number of pins on incident nets
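The recursion can be sketched as a table over (i, j) that keeps the best arrangement of the first i cells of A and the first j cells of B. The cost function is passed in as a callable on a partial arrangement. The displacement-based cost and the `CELLS` data below are made-up examples, not the wirelength cost used in the paper, and this naive version re-evaluates the cost from scratch instead of updating it incrementally.

```python
def interleave(A, B, cost):
    """Merge sequences A and B, keeping the order within each, to minimize cost.

    cost(seq) returns the cost of a partial left-to-right arrangement.
    """
    n, m = len(A), len(B)
    S = [[None] * (m + 1) for _ in range(n + 1)]
    S[0][0] = []                                   # empty arrangement, cost 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            with_a = S[i - 1][j] + [A[i - 1]] if i > 0 else None
            with_b = S[i][j - 1] + [B[j - 1]] if j > 0 else None
            # Append a_i only if it is strictly cheaper, as in the recursion.
            if with_b is None or (with_a is not None and cost(with_a) < cost(with_b)):
                S[i][j] = with_a
            else:
                S[i][j] = with_b
    return S[n][m]


# Hypothetical cells: name -> (width, preferred left-edge position).
CELLS = {"a1": (1, 0), "a2": (1, 3), "b1": (1, 1), "b2": (1, 2)}


def displacement_cost(seq):
    """Pack cells left to right and sum the distance to preferred positions."""
    x, total = 0, 0
    for name in seq:
        width, target = CELLS[name]
        total += abs(x - target)
        x += width
    return total


print(interleave(["a1", "a2"], ["b1", "b2"], displacement_cost))
```

The table has (n+1)(m+1) entries, which corresponds to the nm term in the stated complexity.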