chap6 detailed placement - vast labcadlab.cs.ucla.edu/~cong/cs258f/chap6_06w.pdf · primary1 752 81...


Detailed Placement

Objectives

• Major: Legalization
  – Make placement feasible with as little movement as possible
• Minor: Refinement
  – Wirelength
  – Timing
  – Routability


Legalization

• Greedy – Tetris [D. Hill, US Patent’02]

• Hierarchical – Network flow + Dynamic programming [J. Vygen, DATE’98]
• Mixed-size placement
  – Constraint graph + Linear programming [Cong & Xie, ASPDAC’06]

Greedy Method – Tetris [D. Hill, US Patent’02]

• Sort cells by X coordinate
• For each cell, determine the closest site location on each row
• Assign the cell to the site with the best heuristic cost
  – Incident wirelength
  – Displacement
  – Weighted sum

[Figure: cells "Legalized already" and a cell "To be legalized"]
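A minimal Python sketch of the Tetris loop, assuming unit-width sites, rows of equal height stacked from y = 0, and displacement (|dx| + |dy|) as the only cost term (the slide also allows incident wirelength or a weighted sum). `Cell` and `tetris_legalize` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    x: float    # global-placement x of the lower-left corner, in site units
    y: float    # global-placement y of the lower-left corner
    width: int  # width in sites


def tetris_legalize(cells, num_rows, row_height, row_sites):
    """Greedy Tetris-style legalization with displacement as the cost.

    Every row keeps a frontier: the first site not yet occupied. Cells are
    visited in increasing x; each cell is tried on every row at the closest
    site at or right of that row's frontier, and goes to the cheapest row.
    Returns {name: (site, row)}; raises ValueError when a cell fits nowhere.
    """
    frontier = [0] * num_rows
    result = {}
    for cell in sorted(cells, key=lambda c: c.x):
        best = None  # (cost, row, site)
        for row in range(num_rows):
            site = max(frontier[row], round(cell.x))
            if site + cell.width > row_sites:
                continue  # the cell would run past the end of this row
            cost = abs(site - cell.x) + abs(row * row_height - cell.y)
            if best is None or cost < best[0]:
                best = (cost, row, site)
        if best is None:
            raise ValueError(f"no legal site left for {cell.name}")
        _, row, site = best
        result[cell.name] = (site, row)
        frontier[row] = site + cell.width
    return result
```

The `ValueError` branch is where the greedy method's weakness shows up: once every row's frontier has moved too far right, a late cell has no legal site even though free space may remain to the left.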


Greedy Method

• Consider single or a subset of the cells at a time

• Does not consider the impact on remaining cells

Greedy Method – Tetris [D. Hill, US Patent’02]

• Drawbacks
  – May not find a legal solution
  – Keep squeezing the global placement (GP) to compensate
  – After squeezing, redo the legalization


Greedy Method -- Domino [K. Doll et al, TCAD’94]

• Overlapped initial placement
• Improve placement
• Divide chip into overlapping regions
• Assign cells within each region
• Iterate if necessary


• Within each region
  – Chop cells into uniform-sized subcells
    • Same cost for all subcells from the same cell
    • All subcells are pulled towards the cheapest location
    • As a result, they tend to lie side by side
  – Transform into min-cost maximum-flow
    • Pseudo source Q and pseudo sink S
    • Capacity from Q to any subcell is ∞
    • Capacity from candidate location to S is 1
    • One edge from each subcell to each candidate location
  – Solve by flow-augmentation algorithm
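A sketch of the min-cost max-flow step, using successive shortest augmenting paths found with a Bellman-Ford-style queue (SPFA), which is one common flow-augmentation scheme. Edge costs are taken as a given matrix `costs[i][j]`, standing in for whichever wirelength model supplies them. One deliberate deviation, labeled as an assumption: the slide lists capacity ∞ on the Q→subcell edges, but this sketch uses capacity 1 so that each subcell ends up on exactly one candidate location. `MinCostFlow` and `domino_assign` are hypothetical names.

```python
from collections import deque

INF = float("inf")


class MinCostFlow:
    """Min-cost max-flow by successive shortest augmenting paths (SPFA)."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]  # edge = [to, residual cap, cost, reverse index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = total_cost = 0
        while True:
            dist = [INF] * len(self.graph)
            prev = [None] * len(self.graph)  # (node, edge index) on the shortest path
            queued = [False] * len(self.graph)
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for idx, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if dist[t] == INF:
                return flow, total_cost  # no augmenting path left
            push, v = INF, t
            while v != s:  # bottleneck capacity along the path
                u, idx = prev[v]
                push = min(push, self.graph[u][idx][1])
                v = u
            v = t
            while v != s:  # augment and update residual capacities
                u, idx = prev[v]
                edge = self.graph[u][idx]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[t]


def domino_assign(costs):
    """costs[i][j]: cost of putting subcell i on candidate location j."""
    m, n_loc = len(costs), len(costs[0])
    Q, S = 0, m + n_loc + 1  # pseudo source and pseudo sink
    net = MinCostFlow(m + n_loc + 2)
    for i in range(m):
        net.add_edge(Q, 1 + i, 1, 0)  # assumption: capacity 1, not the slide's ∞
        for j in range(n_loc):
            net.add_edge(1 + i, 1 + m + j, 1, costs[i][j])
    for j in range(n_loc):
        net.add_edge(1 + m + j, S, 1, 0)  # each location holds one subcell
    _, total = net.solve(Q, S)
    assignment = {}
    for i in range(m):
        for v, cap, _, _ in net.graph[1 + i]:
            if 1 + m <= v <= m + n_loc and cap == 0:
                assignment[i] = v - 1 - m  # saturated edge = chosen location
    return assignment, total
```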



• Network transformation

Something omitted here?


• Cost on each edge
  – Depends on wirelength modeling
  – Model I: Ignore unplaced cells
  – Model II: Use the coordinates from the previous iteration for approximation
  – Model II consistently better than Model I



circuit     #cell   #pad   #net    #pin    #row
primary1    752     81     904     2941    20
struct      1888    64     1920    5471    19
primary2    2907    107    3029    11229   28
biomed      6417    97     5742    21040   39
industry2   12142   495    13419   48404   66
industry3   15032   68     21924   68290   60
avqsmall    21854   64     22120   60729   80
avqlarge    25114   64     25378   65578   72

Circuit     TimberWolf V.5.4      VNPR                  Gordian-L             Gordian-L + Domino I  Gordian-L + Domino II
            Area     Runtime      Area     Runtime      Area     Runtime      Area     Runtime      Area     Runtime
primary1    20.31    794          23.37    99           20.76    81           19.96    151          20.03    168
struct      6.56     2875         8.23     174          6.48     167          6.38     245          6.2      253
primary2    78.16    5316         95.33    580          78.85    496          78.58    838          78.11    922
biomed      48.22    14631        53.55    5567         45.93    1639         40.43    2742         39.99    2640
industry2   219.52   37521        272.4    16399        233.59   7550         215.61   9835         214.25   9587
industry3   658.01   65652        755.07   11571        598.42   8458         586.92   10257        575.54   10349
avqsmall    161.05   92959        N/A      N/A          129.39   14488        123.57   15963        122.28   17444
avqlarge    168.07   96564        N/A      N/A          141.38   18537        130.1    20549        129.27   21086
average     1        1            1.18     0.21         0.95     0.13         0.9      0.18         0.89     0.18

• Gordian-L + Domino II reduces area by 11% on average compared with TimberWolf v.5.4
• Moderate runtime with respect to Gordian-L (average normalized runtime 0.18 vs. 0.13)


Hierarchical Method

• First assign cells into regions on the chip
• Then legalize within each individual region
• Cells remain in a region once assigned there

Hierarchical Method [J. Vygen, DATE’98]

• Determine the total amount of cells to move out of each region (zone)
  – Regions (zones) correspond to nodes
  – Edges between adjacent regions
  – Determine source/sink depending on overflow/underflow
  – Network solver: O(n² log n)
  – Result forms a DAG (Why?)
  – Scan regions in topological order
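Once the network solver has fixed how much cell area flows between adjacent zones, the zones can be scanned so that a zone is processed only after every zone that sends cells into it. A minimal sketch using Kahn's topological sort; the `flows` dictionary keyed by (source zone, destination zone) is an assumed input format, and `topological_region_order` is an illustrative name.

```python
from collections import defaultdict, deque


def topological_region_order(regions, flows):
    """Order regions so each comes after every region that sends it cells.

    flows: dict (src_region, dst_region) -> amount of cell area moved.
    Only edges with positive flow count. Raises ValueError on a cycle.
    """
    indeg = {r: 0 for r in regions}
    succ = defaultdict(list)
    for (u, v), amount in flows.items():
        if amount > 0:
            succ[u].append(v)
            indeg[v] += 1
    ready = deque(r for r in regions if indeg[r] == 0)
    order = []
    while ready:
        r = ready.popleft()
        order.append(r)
        for v in succ[r]:
            indeg[v] -= 1  # one fewer sender still to be processed
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(regions):
        raise ValueError("flow contains a cycle")
    return order
```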


Hierarchical Method [J. Vygen, DATE’98]

• Determine the cells to move between regions (zones)
  – C(r) = {c1, c2, …, cm}
  – For j = 1, 2, …, m and 0 ≤ xi ≤ f(r, ri), Pj(x1, x2, …, xk) = (C1, C2, …, Ck) such that
    • w(Ci) = xi, Ci ⊆ {c1, c2, …, cj} (i = 1, 2, …, k)
    • Ci ∩ Cj = ∅ for i ≠ j
    • the following cost is minimized:

$$\sum_{i=1}^{k} \sum_{c \in C_i} \mathrm{cost}(c, r, r_i)$$

  – Generating all 0 ≤ xi ≤ f(r, ri) takes the following number of comparisons:

$$\prod_{i=1}^{k} f(r, r_i)$$
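The definition of Pj suggests a dynamic program over c1, …, cm whose state is the amount vector (x1, …, xk): each new cell cj either stays in r or joins one Ci that still has room. A sketch assuming integer cell widths and a precomputed table `costs[j][i]` standing in for cost(cj, r, ri); `select_cells_to_move` is a hypothetical name. After the loop, the dictionary `P` holds Pj(x) for every reachable x ≤ targets, which is why the work grows with the product of the bounds.

```python
def select_cells_to_move(widths, costs, targets):
    """Choose disjoint C_1..C_k from c_1..c_m with w(C_i) == targets[i],
    minimizing the sum of costs[j][i] over all c_j in C_i.

    Returns (min_cost, assignment) where assignment[j] is the index i of the
    target region for c_j, or None if c_j stays; returns None if infeasible.
    """
    k = len(targets)
    # P maps an amount vector (x_1..x_k) to (cost, assignment) for the current prefix
    P = {tuple([0] * k): (0, ())}
    for j, w in enumerate(widths):
        nxt = {}
        for x, (cost, assign) in P.items():
            options = [(x, cost, assign + (None,))]  # c_j stays in region r
            for i in range(k):
                if x[i] + w <= targets[i]:  # c_j still fits into C_i
                    y = x[:i] + (x[i] + w,) + x[i + 1:]
                    options.append((y, cost + costs[j][i], assign + (i,)))
            for y, c, a in options:
                if y not in nxt or c < nxt[y][0]:
                    nxt[y] = (c, a)
        P = nxt
    return P.get(tuple(targets))
```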

Mixed-size Legalization [Cong & Xie, ASPDAC’06]

• Constraint graph based macro legalization
• Enhanced standard cell legalization


Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Generation

[Figure: horizontal and vertical constraint edges between pairs of macros]

• Model pairwise non-overlapping constraints with constraint edges
• Set each constraint edge's direction to the one that incurs the least amount of overlap
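One way to read the edge-direction rule, sketched under stated assumptions: for each macro pair, compare the overlap that would have to be removed horizontally with the one that would have to be removed vertically, and constrain the pair in the direction with the smaller amount. Macros are given as centre coordinates plus width and height, and each edge carries the half-width (or half-height) sum used as the minimum centre separation; `build_constraint_graphs` is a hypothetical name.

```python
from itertools import combinations


def build_constraint_graphs(macros):
    """macros: dict name -> (x, y, w, h), with (x, y) the macro centre.

    Returns (h_edges, v_edges) as lists of (from, to, min_separation):
    a horizontal edge (a, b, s) means x_b - x_a >= s, a vertical one y_b - y_a >= s.
    """
    h_edges, v_edges = [], []
    for a, b in combinations(sorted(macros), 2):
        xa, ya, wa, ha = macros[a]
        xb, yb, wb, hb = macros[b]
        # overlap that would have to be removed in each direction (<= 0: none)
        ox = (wa + wb) / 2 - abs(xa - xb)
        oy = (ha + hb) / 2 - abs(ya - yb)
        if ox <= oy:
            left, right = (a, b) if xa <= xb else (b, a)
            h_edges.append((left, right, (wa + wb) / 2))
        else:
            low, high = (a, b) if ya <= yb else (b, a)
            v_edges.append((low, high, (ha + hb) / 2))
    return h_edges, v_edges
```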

Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Example

[Figure: example placement of macros m1 to m7]

Macro   Height   Width
m1      4        4
m2      2        4
m3      3        4
m4      9        6
m5      4        6
m6      3        6
m7      6        6

Chip Dimension: 25 x 10

[Figure: horizontal constraint graph (source Vhs) and vertical constraint graph (Vvs to Vvt) for macros m1 to m7, followed by panels labeled "Infeasible" and "No Impact"]
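Whether a constraint graph fits the chip can be checked with a longest-path computation: the longest source-to-sink path in the horizontal graph is the minimum width the constraints require (25 is available in the example), and likewise for heights in the vertical graph (10 available). A sketch assuming edges carry the centre-to-centre separation and that source and sink edges carry half the macro size; `required_span` is an illustrative name.

```python
from collections import defaultdict


def required_span(nodes, size, edges):
    """Longest source-to-sink path in a constraint DAG.

    nodes: macro names; size[n]: width (horizontal graph) or height (vertical).
    edges: (u, v, sep) meaning centre(v) - centre(u) >= sep.
    The source edge into n has length size[n]/2 and the edge from n to the
    sink also size[n]/2, so the result is the minimum chip extent needed.
    """
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v, sep in edges:
        succ[u].append((v, sep))
        indeg[v] += 1
    dist = {n: size[n] / 2 for n in nodes}  # earliest centre position
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for v, sep in succ[u]:
            dist[v] = max(dist[v], dist[u] + sep)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(nodes):
        raise ValueError("constraint graph has a cycle")
    return max(dist[n] + size[n] / 2 for n in nodes)
```

For instance, chaining m1, m2 and m5 from the table horizontally (separations 4 and 5) requires a width of 14, well within the chip width of 25.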


Phase I. Constraint Graph based Macro Legalization -- Constraint Graph Adjustment

• Extract the epsilon-network of the constraint graph
  – Subgraph made up of critical edges
• Search for a min-cut on the epsilon-network
  – Used in logic synthesis for timing [Singh 92, Xu 00, Ph.D. Thesis]
• Heuristic edge capacity assigned as

$$
c(e_{ij}) =
\begin{cases}
\infty, & \text{if orthogonal infeasible} \\
\max(\cdot, 0) + \max(\cdot, 0), & \text{otherwise}
\end{cases}
$$

  where the two max(·, 0) terms combine y_i, y_j, (h_i + h_j)/2, L(v) and R(v)

• Corresponding epsilon network for the example
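A sketch of the adjustment step, assuming the constraint graph is a DAG whose nodes are listed in topological order (source first, sink last) and that edge capacities have already been computed, for example by the heuristic above; `epsilon_network_min_cut` is a hypothetical name. It keeps the edges whose longest source-to-sink path is within `eps` of the overall longest path, then finds a minimum s-t cut with Edmonds-Karp; the cut edges are the candidates for re-orientation. The sketch assumes the minimum cut is finite.

```python
from collections import defaultdict, deque


def epsilon_network_min_cut(order, edges, capacity, eps=0.0):
    """Min s-t cut of the epsilon-network of a constraint DAG.

    order: nodes in topological order; order[0] is the source, order[-1] the sink.
    edges: (u, v, length) constraint edges.
    capacity: dict (u, v) -> capacity of that edge.
    """
    s, t = order[0], order[-1]
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v, length in edges:
        succ[u].append((v, length))
        pred[v].append((u, length))
    # L[n]: longest path source -> n; R[n]: longest path n -> sink
    L = {n: float("-inf") for n in order}
    L[s] = 0.0
    for u in order:
        for v, length in succ[u]:
            L[v] = max(L[v], L[u] + length)
    R = {n: float("-inf") for n in order}
    R[t] = 0.0
    for v in reversed(order):
        for u, length in pred[v]:
            R[u] = max(R[u], R[v] + length)
    longest = L[t]
    # epsilon-network: edges on a source-sink path within eps of the longest one
    critical = [(u, v) for u, v, length in edges
                if L[u] + length + R[v] >= longest - eps]
    res = defaultdict(float)  # residual capacities
    adj = defaultdict(set)
    for u, v in critical:
        res[(u, v)] += capacity[(u, v)]
        adj[u].add(v)
        adj[v].add(u)  # lets BFS follow reverse residual edges
    while True:  # Edmonds-Karp: augment along BFS shortest paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
    reach = set(parent)  # source side of the minimum cut
    return [(u, v) for u, v in critical if u in reach and v not in reach]
```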

Phase I. Constraint Graph based Macro Legalization -- Final Constraint Graph

[Figure: Final Horizontal Constraint Graph (Vhs to Vht) and Final Vertical Constraint Graph (Vvs to Vvt) for macros 1 to 7]


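Phase I. Constraint Graph based Macro Legalization -- Macro Coordinate Assignment

• Determine the location of macros by minimizing the total perturbation from global placement
• Linear constraints based on the edges derived in the constraint graph
  – Similar formulation in [Vygen, DAC’97; Tang et al, ASPDAC’05]
• LP formulation can be enhanced to consider weights, WL, etc.

$$
\begin{aligned}
\min\quad & \sum_{i=1}^{n} |x_i - x_i'| + \sum_{i=1}^{n} |y_i - y_i'| \\
\text{s.t.}\quad & x_{i_2}' - x_{i_1}' \ge \frac{w_{i_1} + w_{i_2}}{2} && \text{for the } i\text{th horizontal constraint edge} \\
& y_{i_2}' - y_{i_1}' \ge \frac{h_{i_1} + h_{i_2}}{2} && \text{for the } i\text{th vertical constraint edge}
\end{aligned}
$$

where (x_i, y_i) is the global-placement (centre) location of macro i, (x_i', y_i') its legalized location, and the ith constraint edge runs from macro i1 to macro i2.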
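Since the objective and the constraints split into an x part and a y part, each dimension can be solved as its own LP. A sketch that builds one dimension in the (c, A_ub, b_ub) form accepted by LP solvers such as `scipy.optimize.linprog`, linearizing |x_i − x_i'| with an auxiliary variable t_i ≥ |x_i − x_i'|. `build_macro_lp` and `satisfies` are illustrative helpers, and the solve step itself is left out to keep the block dependency-free.

```python
def build_macro_lp(gp, edges):
    """One dimension of the macro-coordinate LP in A_ub z <= b_ub form.

    gp[i]: global-placement coordinate x_i of macro i.
    edges: (u, v, sep) constraint edges meaning x'_v - x'_u >= sep.
    Variables z = [x'_0..x'_{n-1}, t_0..t_{n-1}] with t_i >= |x'_i - x_i|.
    Returns (c, A_ub, b_ub) for: minimize c.z subject to A_ub z <= b_ub.
    """
    n = len(gp)
    c = [0.0] * n + [1.0] * n  # objective: sum of t_i
    A_ub, b_ub = [], []
    for i, x in enumerate(gp):
        up = [0.0] * (2 * n)
        up[i], up[n + i] = 1.0, -1.0  # x'_i - t_i <= x_i
        A_ub.append(up)
        b_ub.append(x)
        down = [0.0] * (2 * n)
        down[i], down[n + i] = -1.0, -1.0  # -x'_i - t_i <= -x_i
        A_ub.append(down)
        b_ub.append(-x)
    for u, v, sep in edges:
        row = [0.0] * (2 * n)
        row[u], row[v] = 1.0, -1.0  # x'_u - x'_v <= -sep
        A_ub.append(row)
        b_ub.append(-sep)
    return c, A_ub, b_ub


def satisfies(A_ub, b_ub, z, tol=1e-9):
    """True if the candidate vector z meets every row of A_ub z <= b_ub."""
    return all(sum(a * zi for a, zi in zip(row, z)) <= b + tol
               for row, b in zip(A_ub, b_ub))
```

When handing these to `linprog`, note that its default variable bounds are (0, None); pass `bounds=(None, None)` if coordinates may be negative.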

Phase II. Enhanced Cell Legalization

[Figure: two panels, each showing the front-end and back-end contours]

Enhanced greedy legalization [D. Hill, US Patent’02, Khatkhate ISPD’04]

Macros are still allowed to move, but only horizontally, during cell legalization


Impact of LP for Macro Coordinate Assignment

circuit   Greedy                LP
          FWL        RT(s)      FWL        RT(s)
ibm01     2.22E+06   30         2.18E+07   37
ibm02     5.00E+06   87         4.77E+07   70
ibm03     6.67E+06   58         6.68E+07   54
ibm04     7.52E+06   66         7.59E+07   75
ibm05     9.76E+06   66         9.76E+07   66
ibm06     6.00E+06   66         6.06E+07   64
ibm07     1.01E+07   123        1.02E+07   120
ibm08     1.21E+07   152        1.19E+07   157
ibm09     1.29E+07   145        1.28E+07   146
ibm10     2.91E+07   340        2.90E+07   321
ibm11     1.82E+07   195        1.80E+07   206
ibm12     3.52E+07   380        3.48E+07   330
ibm13     2.35E+07   242        2.34E+07   242
ibm14     3.55E+07   452        3.54E+07   443
ibm15     5.04E+07   650        5.03E+07   552
ibm16     5.32E+07   665        5.31E+07   671
ibm17     6.52E+07   948        6.52E+07   923
ibm18     4.32E+07   715        4.31E+07   724
Avg.      1.01       1.02       1.00       1.00

LP helps to reduce the final WL by 1%

Impact of Movable Macros

circuit   Fixed                 Movable
          FWL        RT(s)      FWL        RT(s)
ibm01     2.25E+06   37         2.18E+07   37
ibm02     4.83E+06   62         4.77E+07   70
ibm03     6.93E+06   63         6.68E+07   54
ibm04     7.97E+06   67         7.59E+07   75
ibm05     9.75E+06   68         9.76E+07   66
ibm06     6.21E+06   66         6.06E+07   64
ibm07     1.09E+07   138        1.02E+07   120
ibm08     1.18E+07   117        1.19E+07   157
ibm09     1.31E+07   123        1.28E+07   146
ibm10     3.10E+07   236        2.90E+07   321
ibm11     1.90E+07   175        1.80E+07   206
ibm12     3.90E+07   320        3.48E+07   330
ibm13     2.53E+07   244        2.34E+07   242
ibm14     3.62E+07   354        3.54E+07   443
ibm15     5.13E+07   490        5.03E+07   552
ibm16     5.34E+07   466        5.31E+07   671
ibm17     6.65E+07   746        6.52E+07   923
ibm18     4.45E+07   635        4.31E+07   724
Avg.      1.04       0.89       1.00       1.00

Allowing macros to move helps to reduce final WL by 4%


Impact of Backend Contour

circuit   w/o backend           w/ backend
          FWL        RT(s)      FWL        RT(s)
ibm01     N/A        N/A        2.18E+07   37
ibm02     N/A        N/A        4.77E+07   70
ibm03     N/A        N/A        6.68E+07   54
ibm04     N/A        N/A        7.59E+07   75
ibm05     9.76E+07   67         9.76E+07   66
ibm06     6.06E+07   65         6.06E+07   64
ibm07     N/A        N/A        1.02E+07   120
ibm08     N/A        N/A        1.19E+07   157
ibm09     N/A        N/A        1.28E+07   146
ibm10     N/A        N/A        2.90E+07   321
ibm11     1.80E+07   205        1.80E+07   206
ibm12     N/A        N/A        3.48E+07   330
ibm13     N/A        N/A        2.34E+07   242
ibm14     3.54E+07   445        3.54E+07   443
ibm15     5.03E+07   565        5.03E+07   552
ibm16     5.31E+07   674        5.31E+07   671
ibm17     6.52E+07   943        6.52E+07   923
ibm18     4.31E+07   943        4.31E+07   724

Having the backend contour significantly enhances robustness: without it, 10 of the 18 circuits have no result (N/A)

Refinement Algorithms

• Analytical Approach
  – Linear programming [J. Vygen, DATE’98]

• Combinatorial Approach
  – Linear placement [Kahng et al, ASPDAC’99]
  – Window interleaving [S. Hur et al, ICCAD’00]


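Linear Programming [J. Vygen, DATE’98]

$$
\begin{aligned}
\min\quad & \sum_{n \in N} w(n)\,\big(\overline{x}(n) - \underline{x}(n)\big) \\
\text{s.t.}\quad & \underline{x}(n) \le x(c) + x_{\mathrm{offs}}(p) \le \overline{x}(n) \quad \text{for every pin } p \text{ of net } n \text{, where } c \text{ is the cell of } p
\end{aligned}
$$

plus constraints that keep neighboring cells ci, cj from overlapping and keep every cell between xmin and xmax of its zone

• Linear programming within each zone
• No experimental results given, although theoretically sound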

Linear Placement [A. Kahng et al, ASPDAC’99]

• Input
  – A single row of given dimension with m movable cells Ci, i = 1, 2, …, m
  – Fixed cells in other rows
  – n nets, N1, N2, …, Nn
• Output
  – Overlap-free placement of the Ci within the row with minimum

$$\sum_{i=1}^{n} \mathrm{HPWL}(N_i)$$

• Constraint
  – x(C1) < x(C2) < … < x(Cm)



[Figure: net N with its fixed cells, showing span(N), fixed_span(N), fl(N), fr(N), ml(N), mr(N) and the span to "minimize"]


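• Contribution of cell Ci

$$
\mathrm{cost}_i(x) = \sum_{N \in R(C_i)} \max\{m_r(N) - f_r(N), 0\} + \sum_{N \in L(C_i)} \max\{f_l(N) - m_l(N), 0\}
$$

  where R(Ci) (L(Ci)) is the set of nets whose rightmost (leftmost) movable pin is on Ci, and with Ci at x, mr(N) = x + ri(N) and ml(N) = x + li(N)

• Piecewise linear and convex
  – Increasing (decreasing) when the number of nets N ∈ R(Ci) with fr(N) to the left of x + ri(N) is greater (less) than the number of nets N ∈ L(Ci) with fl(N) to the right of x + li(N)

[Figure: costi(x) with breakpoints fr(1), fl(2), fr(3), fr(3), fl(4), fr(2), fr(5), fl(6), fl(7) and its Minimum Segment]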
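A sketch of evaluating costi(x) and locating its minimum segment, assuming costi(x) = Σ_{N∈R(Ci)} max{x + ri(N) − fr(N), 0} + Σ_{N∈L(Ci)} max{fl(N) − x − li(N), 0}, with each net given as a (pin offset, fixed bound) pair. The breakpoints are where each max{…, 0} term switches on or off, and each one raises the slope by 1. `cell_cost` and `minimum_segment` are illustrative names.

```python
import math


def cell_cost(x, right_nets, left_nets):
    """cost_i(x) for cell C_i at x.

    right_nets: (r_i(N), f_r(N)) pairs for nets N in R(C_i)
    left_nets:  (l_i(N), f_l(N)) pairs for nets N in L(C_i)
    """
    cost = sum(max(x + r - fr, 0) for r, fr in right_nets)
    cost += sum(max(fl - x - l, 0) for l, fl in left_nets)
    return cost


def minimum_segment(right_nets, left_nets):
    """Interval of x on which the slope of cost_i is 0.

    Far to the left the slope is -len(left_nets); it rises by 1 at every
    breakpoint f_r(N) - r_i(N) and f_l(N) - l_i(N), so it becomes 0 right
    after the len(left_nets)-th breakpoint in sorted order.
    """
    bps = sorted([fr - r for r, fr in right_nets] + [fl - l for l, fl in left_nets])
    k = len(left_nets)
    lo = bps[k - 1] if k > 0 else -math.inf
    hi = bps[k] if k < len(bps) else math.inf
    return lo, hi
```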



• Dynamic Programming Algorithm

– pre-computed cell cost functions

• Prefix Algorithm

– piecewise-linearity of cell cost function

• Clumping Algorithm

– convexity of cell cost function


• Dynamic programming
  – Place C1 … Ci, with Ci at or to the left of sj, such that

$$\sum_{k=1}^{i} \mathrm{cost}_k(x_k)$$

    is minimized
  – Recurrence:

$$P_i(s_j) = \min\{P_i(s_{j-1}),\ P_{i-1}(s_j - w_{i-1}) + \mathrm{cost}_i(s_j)\}$$

  – Runtime = (i range) × (j range) = n × (k − ∑ wi) ≈ O(n²)



[Figure: Ci−1 and Ci around site sj, with sites sj − wi−1 and sj−1 marked]

Pi(j) will be either
  – Ci exactly at sj (extend Pi−1(j − wi−1)), or
  – Ci to the left of sj (use Pi(j − 1))
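The recurrence can be coded directly over site indices. A sketch assuming integer cell widths in sites, a row of `num_sites` sites, 0-based indices, and an arbitrary per-cell cost callback `cost(i, s)` (for example costi evaluated at site s); it fills P[i][j] as the minimum cost of placing C0 … Ci with Ci's left edge at or left of site j, then backtracks. `dp_linear_placement` is an illustrative name.

```python
import math


def dp_linear_placement(widths, cost, num_sites):
    """Order-preserving placement of C_0..C_{m-1} in one row of num_sites sites.

    widths[i]: width of C_i in sites; cost(i, s): cost of C_i with left edge at site s.
    Returns (total_cost, positions) or None if the cells do not fit.
    """
    m = len(widths)
    INF = math.inf
    # P[i][j]: min cost of C_0..C_i with C_i's left edge at some site <= j
    P = [[INF] * num_sites for _ in range(m)]
    at = [[False] * num_sites for _ in range(m)]  # True: optimum puts C_i exactly at j
    for i in range(m):
        for j in range(num_sites):
            here = INF
            if j + widths[i] <= num_sites:  # C_i fits with its left edge at s_j
                if i == 0:
                    here = cost(0, j)
                elif j >= widths[i - 1]:  # C_{i-1} ends at or before s_j
                    here = P[i - 1][j - widths[i - 1]] + cost(i, j)
            left = P[i][j - 1] if j > 0 else INF  # C_i strictly left of s_j
            if here < left:
                P[i][j], at[i][j] = here, True
            else:
                P[i][j] = left
    total = P[m - 1][num_sites - 1]
    if total == INF:
        return None
    positions, j = [0] * m, num_sites - 1
    for i in range(m - 1, -1, -1):  # backtrack from the last cell
        while not at[i][j]:
            j -= 1
        positions[i] = j
        if i > 0:
            j -= widths[i - 1]
    return total, positions
```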


• Prefix algorithm
  – pcosti (minimum cost of a prefix placement) is monotonically nonincreasing
  – pcosti and costi can be represented by a sequence of line segments (min, max, slope, y-intercept)
  – The sites bounding maximal line segments of pcosti are a subset of the sites bounding pcosti−1 and costi
  – Merge the two representations for each Ci
  – Complexity O(m²)

[Figure: pcosti−1, costi and the resulting pcosti plotted as cost vs. x]


• Clumping algorithm
  – Find the minimum interval for Ci, i.e., where the slope of costi is 0
  – Scan adjacent cells starting from the left
  – If Ci−1 and Ci cannot be put in their minimum intervals without overlap, replace them with a merged cell Ci−1’ with all the pins replicated
  – Otherwise place Ci on the leftmost legal site
  – Complexity O(m²)
  – Use a red-black tree to reduce it to O(m log m)

[Figure: clumped cells, optimal positions for cells, and directions to the minimum segments of individual cells]
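A sketch of the clumping idea under simplifying assumptions: continuous coordinates (no site snapping), a single row [0, row_width], and each cell's cost given by breakpoints measured at its left edge, `right_bps` for terms max(x − b, 0) and `left_bps` for terms max(b − x, 0). Clumps are kept on a plain stack instead of a red-black tree, and each clump sits at the left end of its minimum interval, clamped into the row. `Clump` and `clump_place` are hypothetical names.

```python
import math


class Clump:
    """A run of abutting cells that move together as one block."""

    def __init__(self, cells, width, breakpoints, n_left):
        self.cells = cells              # cell indices, left to right
        self.width = width              # total width of the run
        self.breakpoints = breakpoints  # slope +1 points, measured at the clump's left edge
        self.n_left = n_left            # slope far to the left is -n_left
        self.x = 0.0                    # left edge of the clump

    def min_interval(self):
        b = sorted(self.breakpoints)
        k = self.n_left
        lo = b[k - 1] if k > 0 else -math.inf
        hi = b[k] if k < len(b) else math.inf
        return lo, hi


def clump_place(cells, row_width):
    """cells: (width, right_bps, left_bps) tuples in their fixed left-to-right order."""
    stack = []
    for idx, (w, right_bps, left_bps) in enumerate(cells):
        cur = Clump([idx], w, list(right_bps) + list(left_bps), len(left_bps))
        while True:
            lo, _ = cur.min_interval()
            cur.x = min(max(lo, 0.0), row_width - cur.width)  # clamp into the row
            if stack and stack[-1].x + stack[-1].width > cur.x:
                prev = stack.pop()  # overlap: merge with the clump on the left
                shifted = [b - prev.width for b in cur.breakpoints]
                cur = Clump(prev.cells + cur.cells, prev.width + cur.width,
                            prev.breakpoints + shifted, prev.n_left + cur.n_left)
            else:
                break
        stack.append(cur)
    positions = {}
    for clump in stack:
        x = clump.x
        for idx in clump.cells:
            positions[idx] = x
            x += cells[idx][0]
    return positions
```

Here a cell with `right_bps=[3]` and `left_bps=[3]` has cost |x − 3|, so two width-2 cells wanting x = 3 and x = 4 collide, get merged, and the clump is placed at the left end of the merged minimum interval.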



Circuit                  Algorithm       Prefix            Clumping
#cell   #rows  #site     Heuristic       None     Swap     None     Swap
4155    46     41610     % improvement   9.39     13.38    9.49     13.46
                         runtime         56.8     94.1     31.8     69.4
11471   42     78036     % improvement   6        6.18     6        6.22
                         runtime         182.9    289      68.9     198
12260   610    128893    % improvement   1.13     1.38     1.15     1.4
                         runtime         13.9     35.1     14.8     40.2
7309    56     47152     % improvement   7.43     8.13     7.45     8.08
                         runtime         46.5     125.7    35.1     95.9
8829    60     98760     % improvement   3.7      3.54     3.76     3.77
                         runtime         34       43.9     20.7     91.8

Comparison with an industry placement tool


Window Interleaving [S. Hur et al, ICCAD’00]

• Intra-row optimization
  – Given window W, choose an arbitrary subsequence A and B = W − A
  – Interleave A and B to get an optimal arrangement
  – Repeat by sliding the window right across each row, from top to bottom



Window Interleaving [S. Hur et al, ICCAD’00]

• Notation
  – A = a1, a2, …, an; B = b1, b2, …, bm
  – Sij is the optimal arrangement of a1, a2, …, ai (i ≤ n) and b1, b2, …, bj (j ≤ m); C(Sij) is the cost of Sij
• Recursion

$$
S_{0,0} = \emptyset,\quad C(S_{0,0}) = 0,\qquad
S_{i,j} =
\begin{cases}
S_{i-1,j}\,a_i, & \text{if } C(S_{i-1,j}\,a_i) < C(S_{i,j-1}\,b_j) \\
S_{i,j-1}\,b_j, & \text{otherwise}
\end{cases}
$$

• Complexity O(nm + p(n + m)), where p is the number of pins on incident nets
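A direct transcription of the recursion, assuming `cost` scores any partial arrangement; here it is the displacement of cells packed left to right from x = 0, a stand-in for the wirelength cost. `interleave` and `packed_displacement` are illustrative names. The sketch re-evaluates the cost from scratch at each step, so it does not reach the O(nm + p(n + m)) bound, which relies on incremental cost updates.

```python
def packed_displacement(seq, cells):
    """Cost of a partial arrangement: cells packed from x = 0, sum of |x - target|."""
    x, total = 0, 0
    for name in seq:
        width, target = cells[name]
        total += abs(x - target)
        x += width
    return total


def interleave(A, B, cost):
    """S[i][j] arranges a_1..a_i with b_1..b_j, keeping the order inside A and inside B."""
    n, m = len(A), len(B)
    S = [[None] * (m + 1) for _ in range(n + 1)]
    S[0][0] = []
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                take_a = S[i - 1][j] + [A[i - 1]]
                take_b = S[i][j - 1] + [B[j - 1]]
                # a_i only wins on a strict improvement, as in the recursion
                S[i][j] = take_a if cost(take_a) < cost(take_b) else take_b
            elif i > 0:
                S[i][j] = S[i - 1][j] + [A[i - 1]]
            elif j > 0:
                S[i][j] = S[i][j - 1] + [B[j - 1]]
    return S[n][m]
```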
