tvlsi2002 fixed

Upload: raffi-sk

Post on 03-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/28/2019 Tvlsi2002 Fixed

    1/17

    1

    Fixed-outline Floorplanning :Enabling Hierarchical Design

    Saurabh N. Adya, Member, IEEE, Igor L. Markov, Member, IEEE

    Abstract Classical oorplanning minimizes a linearcombination of area and wirelength. When SimulatedAnnealing is used, e.g., with the Sequence Pair represen-tation, the typical choice of moves is fairly straightfor-ward.

    In this work, we study the xed-outline oorplanformulation that is more relevant to hierarchical designstyle and is justied for very large ASICs and SoCs.We empirically show that instances of the xed-outlineoorplan problem are signicantly harder than related

    instances of classical oorplan problems. We suggest newobjective functions to drive simulated annealing and newtypes of moves that better guide local search in the newcontext. Wirelength improvements and optimization of aspect ratios of soft blocks are explicitly addressed bythese techniques.

    Our proposed moves are based on the notion of oorplan slack. The proposed slack computation canbe implemented with all existing algorithms to evaluatesequence pairs, of which we use the simplest, yet seman-tically indistinguishable from the fastest reported [28].A similar slack computation is possible with many otheroorplan representations. In all cases the computationtime approximately doubles.

    Our empirical evaluation is based on a new oorplan-ner implementation Parquet-1 that can operate in bothoutline-free and xed-outline modes. We use Parquet-1to oorplan a design, with approximately 32000 cells, in37 minutes using a top-down, hierarchical paradigm.

    Index Terms VLSI CAD, Physical Design, Floorplan-ning, Hierachical Design, Placement

    I. INTRODUCTION

    We describe the classical oorplanning framework and compare it to a modern xed-outline formulation.

    A. Classical Outline-Free Floorplanning

    A typical oorplanning formulation entails a collec-tion of blocks , which can represent circuit partitionsin applications. Each block is characterized by area(typically xed) and shape-type , e.g., xed rectangle,rectangle with varying aspect ratio, an L-shape, aT-shape, or a more general rectilinear polygon, etc(such shapes may optimize layouts of special types of circuits, e.g., datapaths). A solution to such a problem,i.e., a oorplan , species a selection of block shapesand overlap-free placements of blocks. Depending onshape constraints, a oorplanning formulation can bediscrete or continuous. For example, if at least oneblock is allowed to assume any rectangular shape

    with xed area and aspect ratio in the interval

    a b (where a b) the solution space is no longer nite ordiscrete. Multiple aspect ratios can be implied by anIP block available in several shapes as well as by ahierarchical partitioning-driven design ow for ASICs[26], [13] where only the number of standard cells ina block (and thus the total area) is known in advance.In many cases, e.g., for row-based ASIC designs,there are only nitely many allowed aspect ratios,but solution spaces containing a continuum are usednone the less, primarily because existing computationalmethods cannot handle such a large discrete solutionspace directly [13]. We point out that in the classicaloorplanning formulations, movable blocks tend tohave xed aspect ratios, but the overall oorplan isnot constrained by an outline. While several recentworks allow for variable block aspect ratios, the moremodern xed-outline formulation (see below) has notbeen addressed.

    Objective functions not directly related to area typi-cally involve a hypergraph that connects given blocks.

    While more involved hypergraph-based objective func-tions have been proposed, the popularity of the HPWL(half-perimeter wirelength) function is due to its sim-plicity and relative accuracy, given that routes are notavailable. The HPWL objective became even morerelevant [13] with the wide use of multi-layer over-the-cell routing in which more nets are routed withshortest paths.

    A fundamental theorem for many oorplan represen-tations, says that at least one area-minimal placement can be represented [20]. This does not hold for ob- jectives that include wirelength because none of theoptimal solutions may be packed which implies that

    more nets can be routed with shortest paths. Figure 1shows a small example that area-minimal placementsdo not hold for minimum wirelength objectives. Wealso note that lack of incremental move structures ina oorplan representation is an important weaknessof typical topological oorplanners. Thus wirelengthhas to be calculated from scratch after each oorplanevaluation. For designs with lot of wires, calculatingthe wirelength from scratch after each move can slowdown the oorplanner considerably [1].

    For the remaining part of this work, we will bedealing with the area and HPWL objectives only, buteven this simplied setting implies multi-objective op-

  • 7/28/2019 Tvlsi2002 Fixed

    2/17

    2

    P1 A B P2

    Fig. 1. Example to show that area-minimal placements

    do not hold for minimum wirelength objectives. Blocks A and B, connected by 2-pin nets to xed pins P1 andP2 respectively. The blocks touch in no optimal solutionif the pins are sufciently far from each other.

    timization . Mathematically, best trade-offs are capturedby the non-dominated frontier (NDF), also known asthe Pareto front .De nition: a solution of a multi-objective optimizationproblem belongs to the non-dominated frontier iff noother solution improves upon one of the objective func-tions while preserving (or improving) other objective

    functions.1

    Of those works on abstract oorplanningthat address both objectives, most minimize a linearcombination [27], [23], [28] with arbitrarily chosencoefcients. By a simple corollary of the denitionof NDF, this produces non-dominated solutions, mostlikely different for different coefcients. Note, how-ever, that area and wirelength have different dimen-sions. Given that net lengths have the same order of magnitude as the x and y dimensions of the oorplanitself, areas tend to be several orders of magnitudelarger than wirelengths and path delays. One can takethe square root of the area so that the the terms becomeof the same magnitude. However, even if one tries

    to connect the area of a region to its perimeter, thatrelation depends on the aspect ratio (the rectangle1x100 has perimeter length 202, and the rectangle10x10 has perimeter length 40, even though both havearea 100). Since aspect ratio changes in the courseof oorplanning, one cannot come up with a xedcoefcient. Moreover, net length may be much largerthan the perimeter because of the large number of nets.Here, again, there is no xed coefcient because the netlength changes during the course of oorplanning andit is difcult to predict optimal net lengths (some netsmay be very short and some may be very long). Theproblem is exacerbated, because for different designs,different coefcients may be required to nd the NDF.In our experiments, area terms dominated wirelengthterms unless highly problem-specic coefcients areused. In other words, it is difcult to fully automatea oorplanner that explores non-dominated solutionswith respect to wirelength and area objectives. Therelationship between linear combination objectives andthe Pareto curve (NDF) is studied in [12]. It is shownthat with a suitable choice of coefcients, any point onthe Lower Convex Hull of the NDF can be found.

    1 The design of optimization heuristics can be viewed as a problemwith at least two objective functions runtime and solution quality.

    This suggests a systematic method of modifying thecoefcients to probe the hull and also characterizesthe limitations of the linear combination approach.

    To summarize, classical oorplan approaches entaildifcult multi-objective optimization and often rely onrepresentations that may not capture any minimumwirelength solutions.

    (a) (b)

    Fig. 2. Layout of a modern chip. Figure (a) shows theactual layout. Figure (b) shows an abstract oorplancaptured from the layout in (a). The example illustratesthe hierarchical nature of designs and the need tosupport a hierarchical ow.

    B. Modern Fixed-outline Floorplanning

    As pointed out in previous works [13], [4], someof fundamental difculties in classical oorplanningare gracefully resolved in the context of modern ASIC

    design. Modern hierarchical ASIC design ows basedon multi-layer over-the-cell routing naturally imply xed-die placement and oorplanning rather than thevariable-die style, associated with channel routing,two layers of metal and feedthroughs. Each top-downstep of such a ow may start with a oorplan of pre-scribed aspect ratio, and with blocks of bounded (butnot xed) aspect ratios. The objective is to minimizewirelength subject to (i) the xed oorplan outlines,and, perhaps (ii) zero white-space. Floorplans with nowhite-space are called mosaic by Hong et al. [11]. (i)implies that the white-space is no longer an objective,but rather a constraint, because it can be computed

    in advance. Modern design ows use hierarchy as ameans to reduce the complexity. Figure 2 (a) showsthe layout of a modern chip. Figure 2 (b) abstracts theactual layout in (a) to generate a hierarchical oorplan.This example serves to illustrate the need to support ahierarchical ow.

    The modern oorplanning formulation was proposedby Kahng [13] and is an inside-out version of theclassical outline-free oorplanning formulation theaspect ratio of the oorplan is xed, but the aspectratios of the blocks can vary. To our knowledge, it hasnot yet been explicitly addressed in the literature, partlydue to the lack of benchmarks. Since our work ad-

  • 7/28/2019 Tvlsi2002 Fixed

    3/17

    3

    dresses this formulation, we re-evaluate the relevanceof classical oorplanning results in the new context:

    1) Zero white-space requirement is practical be-cause at top level (during oorplanning) thereare no unused resources the space betweenblocks can be used for routing, buffers, etc. Forexample, buffer islands are discussed in [7].Therefore, whitespace must be allocated morecarefully. The new formulation makes researchon classical block packing more relevant . Thatis because all wirelength-minimal solutions inthis formulation can be captured by compactedrepresentations such as sequence pairs [20], O-trees [23], B

    -trees [5] and corner block lists[11]. In fact, any oorplan with zero white-space can be captured by known representations,because it is compacted.

    2) Multi-objective minimization of area and wire-length, via linear combinations or otherwise, isno longer an issue since white-space is xed.

    3) Handling blocks with variable aspect ratios ap-pears increasingly important because there maybe very few or no oorplans with a given out-line for any given xed conguration of aspectratios. A number of works [19], [21], [6], [29]handle the oorplan sizing problem, i.e., changesof aspect ratios without reordering blocks, bymethods of mathematical optimization (convexlinear and non-linear programming). However,

    such methods are difcult to combine with com-binatorial optimization and entail excessive run-times. For example [29] cites runtime of 19.5hours for the ami49 benchmark (other works citesmaller runtimes). Additionally, such approachesentail a mix of two very different computationalengines. The implementation reported in [11]appears to handle discrete variable aspect ratiosby randomized re-instantiation of blocks basedon a set of 16 alternatives.

    4) Perhaps, the greatest shortcoming of known ap-proaches to oorplanning with respect to thenew formulation is the lack of appropriate neigh-

    borhood structures, i.e., incremental changes(moves) that preserve the xed outline of theoorplan. Notably, every oorplan encoded bythe corner block list(CBL) representation [11]has zero white-space with respect to the roomsthe oorplan creates (i.e. is mosaic), but CBLbased moves can change the oorplans aspectratio considerably.

    5) Given that the new oorplanning formulation ismore constrained , we see increased relevance of research on accommodating application-specicconstraints, such as alignment, abutment, order,regions [28], symmetry [24], etc.

    We conclude that classical oorplanning is largelyrelevant to the new oorplan formulation proposed byKahng [13], however the new formulation must be ad-dressed through ways other than novel representations.This is primarily due to the fact that known oorplanrepresentations and manipulation algorithms do not allow effective traversals of the solution space without violating important constraints , such as the xed-outline constraint discussed in our work. While suchrepresentations and algorithms may be proposed in thefuture, an alternative approach is to allow temporaryviolations and either tolerate or x them. For example,not every corner block list [11] yields a valid oorplan,but the feasibility constraint is clearly stated in [11] andtolerated by the reported implementation. Constrainedmodern oorplanning has also been addressed recentlyby Feng et. al [9]. Their work assumes initial locationsof blocks to be oorplanned are available and thetechniques are more applicable for incremental oor-planning. There are a number of works that do oor-planning with various realistic objectives (congestion,timing, power, etc). However xed-outline constraintsand the optimization of the HPWL present a simpler,but necessary part of practical oorplanning. Thus, weview simplied oorplanning formulations as a usefullter for promising computational techniques.

    Industrial design instances, can be broadly classiedinto ASICs, SoCs, and Microprocessor. ASIC chipsfrequently contain a handful (1-20) of large macros,a moderate number (100s) of large multi-row cells,

    and many small standard cells up to several millionand increasing. ASIC chips typically have whitespaceranging from 40% to 80%

    SoC designs are similar to ASIC designs, but withmany more large macros in the placement area. Inextreme cases the bulk of the design is concentrated instandard pre-designed library cores, RAMs, etc, withonly a small fraction of movable logic providing minorcontrol functions.

    Microprocessor designs are generally laid out hier-archically, and this approach often leads to many smallpartitions. Some of these partitions are small standard-cell placement instances with very few xed cells, and

    a small number of movable cells (

    10000).Particularly note that many current designs are hi-

    erarchical and are designed top-down [14], [16], [8].As pointed out in [8] oorplanning is becoming in-creasingly important for prototyping hierarchical de-signs. The top level in any hierarchical design owmay use a variable die. Variable-die oorplanning isemployed in [25] because only one level of hierarchygoes through their oorplanner. However, xed-dieoorplanning offers new possibilities. Fixed-outlineoorplanning could be incorporated in a top-downhierarchical ow employing multi-level oorplanningas described in Section III-F. It can also be used in

  • 7/28/2019 Tvlsi2002 Fixed

    4/17

    4

    an ASIC oor-placement ow [1] to place mixed-sizeASIC designs in a xed-die context. A methodology toplace standard-cell designs with numerous macros bycombining oorplanning and standard-cell techniquesis proposed in [1]. The proposed design ow is asfollows:

    An arbitrary black-box (no access to source coderequired) standard-cell placer generates an initialplacement.

    To remove overlaps between macros, a physi-cal clustering algorithm constructs a xed-outlineoorplanning instance.

    A xed-outline oorplanner, generates valid loca-tions of macros.

    With macros considered xed, the black-boxstandard-cell placer is called again to place smallcells.

    This design ow provides a somewhat new killerapplication for the many oorplanning techniquesdeveloped in the last ve years and xed-outlineoorplanning formulations are relevant in the xed-diecontext which is very popular in ASIC design. Floor-planning is heavily used in design ows for complexhierarchical SoC design today. Given the dominanceof xed-outline design, our techniques are applicableto many chips that are being designed today.

    In this work, we study neighborhood structuresfor the well-known sequence pair representation. Ourproposed slack-based moves are more likely to reducethe oorplan span in a given direction (H or V)

    than random pair-wise swaps and block rotations usedin most works based on sequence pairs. Wirelengthminimization and handling aspect ratios of soft blocksare also more transparent with slack-based moves.

    The remaining part of the paper is organized asfollows. Section II discusses the background on oor-planning and the sequence pair representation. Weintroduce the concept of oorplan-slack in Section III.It also discusses better local-search in the annealingcontext and special moves aimed at HPWL mini-mization and handling soft-blocks using slacks. Fixed-outline oorplanning and applications to hierarchicaloorplan design is also explained. Section IV presentsempirical validation of our work, and future directionsare discussed in Section V.

    I I . B ACKGROUND : THE SEQUENCE PAIRFLOORPLAN REPRESENTATION

    An overwhelming majority of oorplanners rely onthe Simulated Annealing framework [26] but differ byinternal oorplan representations.

    The sequence pair representation for classical oor-plans of N blocks has been proposed in [20]. Unlikemost new graph-based representations, it consists of two permutations (orderings) of the N blocks. The twopermutations capture geometric relations between each

    pair of blocks. Recall that since blocks cannot overlap,one of them must be to the left or below from the other,or both. In sequence pair

    a

    b

    a

    b

    a is to left of b (1)

    a

    b

    b

    a

    ! "

    a is above b (2)

    In other words, every two blocks constrain eachother in either vertical or horizontal direction. Thesequence pair representation is shift-invariant since itonly encodes pairwise relative placements. Therefore,placements produced from sequence pairs must bealigned to given horizontal and vertical axes, e.g., x # 0and y # 0. Multiple sequence pairs may encode thesame block placement, e.g., for three identical squareblocks, both $ a c b % c a b %' & and $ a c b %

    c b a %' & encode the placement with a straight ontop of c, and b aligned with c on the right.

    The original work on the sequence pair representa-tion [20] proposed an algorithm to compute placementsfrom a sequence pair by constructing the horizontal(H) and vertical (V) constraint graphs. The H and Vgraphs have N ( 2 vertices each one for each of N block, plus the source and the sink. For everypair of blocks a and b there is a directed edge a ) bin the H graph if a is to the left of b according tothe sequence pair (Formula 1). Similarly there is adirected edge a ) b in the V graph if a is above baccording to the sequence pair (Formula 2) exactlyone of the two cases must take place. Vertices thatdo not have outgoing edges are connected to the sink,

    and vertices that do not have incoming edges areconnected to the source. Both graphs are consideredvertex-weighted, where the weights in the H graphrepresent horizontal sizes of blocks, and the weights inthe V graph represent vertical sizes of blocks. Sourcesand sinks have zero weights.

    B C

    A

    CB

    A,

    ,

    Fig. 3. Two sequence pairs with edges of the horizontal(dashed) and vertical (solid) constraint graphs.

    Block locations are the locations of blocks lowerleft corners. The x locations are computed from the H graph, and y locations are computed from the V graphindependently . Therefore, we will only look at thecomputation of the x locations. One starts by assigninglocation x # 0 to the source. Then, the H graph istraversed in a topological order. To nd the location

  • 7/28/2019 Tvlsi2002 Fixed

    5/17

    5

    of a vertex, one iterates over all incoming edges andmaximizes the sum of the source location and sourcewidth. Figure 3 illustrates the algorithm on two exam-ples. The worst-case and average-case complexity of this algorithm is O $ n2 & , since the two graphs, together,have a xed O $ n2 & number of edges, and topologicaltraversals take linear time in the number of edges.

    We say that a block placement is representable (orcan be captured) by a sequence pair iff there existsa sequence pair which encodes that placement. A fun-damental theorem from [20] implies that at least oneminimal-area placement is representable with sequencepair (in fact, there are many). Therefore, sequence pairrepresentation is justied for area minimization.

    Sequence pairs can be used to oorplan hard rectan-gular blocks by Simulated Annealing [20], [21], [27],[28]. The moves are (i) random swaps of blocks inone of the two sequence pairs, and (ii) rotations of single blocks. Sequence pairs are modied in constanttime, but need to be re-evaluated after each move. Noincremental evaluation algorithms have been reported,therefore the annealer spends most of the time evalu-ating sequence pairs.

    The sequence pair representation and necessary al-gorithms have been extended to handle xed blocks[21] as well as arbitrary convex and concave recti-linear blocks [10]. Recently, the original O $ n 2 & -timeevaluation algorithm [20], has been simplied and spedup to O $ n log $ n & & by Tang et al. [27], and then toO $ n log $ log $ n & & & [28]. Importantly, those algorithms do

    not change the semantics of evaluation they onlyimproved runtime, and lead to better solution qualityby enabling a larger number of iterations during thesame period of time. While O-trees [23] and cornerblock lists [11] can be evaluated in linear time, thedifference in complexity is dwarfed by implementationvariations and tuning, e.g., the annealing schedule. Theimplementation reported by Tang et al. [28] seems tooutperform most known implementations, suggestingthat the sequence pair is a competitive oorplan rep-resentation.

    In our experiments, the simple O $ n 2 & evaluation al-gorithm from [27] performs faster than the O $ n log $ n & & -

    time algorithm from the same paper. We implement theO $ n log $ n & & -time algorithm using C++ STL maps. 2 Onami49 benchmark with 49 blocks the O $ n log $ n & & -timealgorithm is slower than O $ n2 & -time algorithm by afactor of 7X. On a benchmark with 32498 blocks theO $ n log $ n & & -time is slower than O $ n2 & -time algorithmby a factor of 4X. This is primarily due to the simplic-ity and lower implementation overhead of data struc-tures used by the O $ n2 & -time algorithm. A more recentpaper [28] claims that their advanced O $ n log $ log $ n & & & -time algorithm outperforms the quadratic algorithm in

    2 STL maps are implemented using red-black trees.

    practice. Given that it is considerably more involved,but based on the same principles, we choose to base ourwork on the quadratic algorithm, leaving out a potentialspeed up.

    All three algorithms are based on the followingtheorem [27]: The x-span of the oorplan to whichsequence pair $ S 1 S 2 & evaluates, equals to the lengthof the longest common weighted subsequence of S 1and S 2 , where weights are copied from block widths.Analogous statement about the y-span deals with thelongest common subsequence of S R1 and S 2 , where R stands for reversed and weights are copied fromblock heights. Moreover, the computations of x and ylocations of all blocks can be integrated into the longestcommon subsequence computations.

    EF D

    A

    CBXslack

    (a) (b)

    Fig. 4. In Figure (a) X-slack of blocks B and C is shownby the solid arrow. Slack is the distance a block canbe moved in a particular dimension without increasingthe area of the oorplan. In Figure (b) blocks withzero Y-slack are shown. They lie on a critical pathmarked with arrows.

    EF D

    A

    CB

    FE

    B C

    A

    xSlack for Block A =

    ySlack for Block E =

    ABFECD>

  • 7/28/2019 Tvlsi2002 Fixed

    6/17

    6

    and potentially other oorplan representations. It isbased on the following series of observations

    x and y locations are computed independently. in each dimension, the oorplan is constrained by

    one or more critical paths in respective con-straint graphs. A critical path is a path of blocksthat constrain each other in the same direction andare tightly packed so that any change in block location must produce overlaps or increase thespan of the oorplan.

    in each dimension, the computation of block lo-cations based on the constraint graphs is math-ematically identical to the propagation of arrivaltimes in Static Timing Analysis. Formally, STAis performed on an edge-weighted graph, whilethe constraint graphs are vertex-weighted, How-ever, this difference is supercial since a vertex-

    weighted graph can be trivially transformed intoan edge-weighted graph, e.g., by distributing ver-tex weights to incident edges, or otherwise.

    after the x-span X of the oorplan and x-locationsof blocks are known, one can perform a symmetriccomputation of locations in right-to-left direction,assigning location X to the sink vertex. This willbe analogous to the back-propagation of requiredarrival times in Static Timing Analysis.

    by analogy with Static Timing Analysis, the dif-ference between the two locations computed foreach block slack is related to the mostcritical path on which this block lies. In partic-

    ular, zero slacks are always associated with pathsthat constrain the oorplan. Negative slacks areimpossible as long as blocks do not overlap.

    Slacks can be computed with any sequence pairevaluation algorithm that can work in left-to-right andright-to-left modes, which includes all algorithms weare aware of. Figure 5 shows oorplan evaluations forthe same sequence pair in bottom-left mode and top-right mode.

    A fast O $ n2 & algorithm for oorplan evaluation usingsequence pairs was presented by Tang et. al [27]. It isbased on Longest Common Subsequence (LCS) com-

    putation. The positions of blocks are recorded in left-to-right (bottom-to-top) order. The algorithm works asfollows. Assume the blocks are 1 N , and the input se-quence pair is $ X Y & . Both X and Y are a permutationof 1 N . Block position array Position

    b b # 1 N , isused to record x or y coordinate of block b, dependingon weight w $ b & equals to the width or height of block b respectively. To record the indices in both X andY for each block b, the array match

    b b # 1 N isconstructed to be match

    b x # i and match

    b y # jif b # X

    i # Y

    j . The array Length

    1 N is used torecord the length of candidates of the LCS. The actualalgorithm for LCS computation is shown in Figure 6.

    To calculate the actual positions of the blocks two callsto LCS ORIG are made. X-positions are calculated byinitializing the weights array weights

    1 N with thewidths of blocks and invoking LCS ORIG( X Y xlocs ).Y-positions are calculated by initializing the weightsarray w

    1 N with the heights of blocks and invokingLCS ORIG( X R Y ylocs ), where X R is the sequence X in reversed order. Figure 7 shows the pseudo codeto evaluate the oorplan in bottom-left mode, givena sequence pair $ X Y & using LCS computations.

    To evaluate the oorplan in the top-right mode weneed to evaluate the LCS of the two sequences inright-to-left order. This can be achieved easily byreversing the two sequences and invoking LCS ORIG.The pseudo code for SP EVAL REV is presented inFigure 8. It reverses the two sequences, X and Y beforecalling the original algorithm. To compute the slackswe need the locations of the bottom-left corner of eachblock when evaluating the sequence pair in top-rightmode. Lines 12 through 16 of SP EVAL REV achievethis.

    Figure 9 presents pseudo code for the procedureEVAL SLACKS , which evaluates slacks for eachblock in the design, given a sequence pair $ X Y & . Asillustrated in Figure 4 (a), the slack of a block ina oorplanning instance represents the distance (in aparticular dimension) at which this block can be movedwithout changing the outline of the oorplan. Blockswith zero slack in the Y dimension are shown in Figure4 (b). Such blocks must lie on critical paths in the

    relevant constraint graph. Figure 10 annotates blocksin a given oorplan with horizontal ( x) and vertical ( y)slacks.

    The above mentioned code which evaluates slacksfor blocks depends on the fact that LCS ORIG( X Y )and LCS ORIG( X R Y R) procedures give the samevalue for the length of the lcs of two sequences X andY . We formalize this as a lemma below. The followingnotation is used; lcs $ X Y & is the longest commonsub-sequence of sequences X and Y and LCS $ X Y & isthe length of lcs $ X Y &Lemma 1 For two weighted sequences X and Y thelength of lcs of X and Y is the same as the length of

    the lcs of reversed sequences X R

    and Y R

    .Proof The weights of individual blocks in the twosequence pairs $ X Y & and $ X R Y R & are the same.Thus by denition of lcs, lcs $ X Y & # lcs $ X R Y R &and also LCS $ X Y & # LCS $ X R Y R & . Alternatively,as explained in [20], LCS $ X Y & for a sequence pair

    $ X Y & representing a oorplan, is the length of thelongest path in the horizontal constraint graph (HCG)implied by the sequence pair. i.e. the width of theoorplan. Reversing the two sequences to form thesequence pair ( X R Y R) amounts to reversing thedirection of all the edges of the directed acyclic graph(DAG) represented by the HCG. Since the longest

  • 7/28/2019 Tvlsi2002 Fixed

    7/17

  • 7/28/2019 Tvlsi2002 Fixed

    8/17

    8

    1 EVAL SLACKS ( X

    Y ) /*

    X

    Y is the sequence pair*/2 N = NumberOfBlocks;3 xlocs

    1

    N

    ylocs

    1

    N /*Block positions in bottom-left mode*/4 xlocsRev

    1

    N

    ylocsRev

    1

    N /*Block positions in top-right mode*/5 xSlacks

    1

    N

    ySlacks

    1

    N /*X and Y slacks for each block*/6 SP EVAL ORIG( X

    Y

    xlocs

    ylocs );

    7 SP EVAL REV( X

    Y

    xlocsRev

    ylocsRev );8 for i = 1 to N 9 begin10 xSlack

    i xlocsRev

    i xlocs

    i ;11 ySlack

    i ylocsRev

    i ylocs

    i ;12 end13 return;

    Fig. 9. Pseudo code for EVAL SLACKS. xlocs, ylocs, xlocsRev, ylocsRev hold the positions of blocks

    path in this new reversed HCG is the same as the oldHCG, the length of lcs of ( X R Y R) should be the sameas the length of the lcs of ( X Y ). Similar argument

    holds for the vertical direction.We now prove that the slacks for a block cannot benegative and blocks with zero slacks lie on the criticalpath.Theorem 1(a) Slacks for a block cannot be negative.(b) A block has zero slack in a particular dimensioniff it lies on the critical path in that dimension.Proof (a) We will base our discussion on the x-slack of ablock. As computed by procedure EVAL SLACKS(Figure 9), x-slack for each block is the differencein the x-locations of the block calculated in top-

    right mode and bottom-left mode. Let b be a block in the design. The sequence pair $ X Y & can berepresented as $ $ X l b X r & $ Y l b Y r & & . Similarly thereversed sequence pair $ X R Y R & can be represented as

    $ $ X Rr b X Rl & $ Y

    Rr b Y

    Rl & & . The x-location of block b in

    the bottom-left mode is given by, xlocBL b # LCS $ X l Y l & [28].Similarly the x-location of block b in top-right modeis given by, xlocTR b # xSize width b LCS $ X Rr Y

    Rr & ,

    where xSize is the x-span of the oorplan. i.e. xSize # LCS $ X Y & # LCS $ X R Y R & . According toLemma 1, LCS $ X Rr Y

    Rr & # LCS $ X r Y r & . Thus xlocTR b

    can be written as, xlocT R b # LCS $ X Y & width b LCS $ X r Y r &For two sequences X # $ X l b X r & and Y # $ Y l b Y r & , if LCS $ X Y & LCS $ X l Y l & ( width b ( LCS $ X r Y r & ,then the lcs of sequences X and Y is

    $ lcs $ X l Y l & b lcs $ X r Y r & , and the length of thislcs is LCS $ X l Y l & ( width b ( LCS $ X r Y r & , which cannotbe greater than LCS $ X Y & . Thus, by contradiction, thefollowing inequality holds. LCS $ X Y &

    LCS $ X l Y l & ( width b ( LCS $ X r Y r &

    $ LCS $ X Y & width b LCS $ X r Y r & & LCS $ X l Y l &

    0

    xlocT R b xlocBL b

    0

    xSlack b

    0We have proved that for any block b, its x-slack isnon-negative. This holds true for the y-slack too. QED.

    (b) As above, we base our discussion on thex-critical path. A critical path of a oorplan inx-dimension is dened as the longest x-path of theoorplan. There can be more than one critical path.However the length of all the critical paths is thesame and equal to the LCS $ X Y & , where $ X Y & is thesequence pair representing the oorplan. If a block blies on the x-critical path then b is a part of the lcsof X and Y . As shown in theorem 1(a), the followingequality holds. LCS $ X Y & # LCS $ X l Y l & ( width b ( LCS $ X r Y r &

    $ LCS $ X Y & width b LCS $ X r Y r & & LCS $ X l Y l & #

    0

    xlocT R b xlocBL b # 0

    xSlack b # 0Similarly if a block b has zero x-slack, then xlocT R b xlocBL b # 0

    $ LCS $ X Y & width b LCS $ X r Y r & & LCS $ X l Y l & #0

    LCS $ X Y & # LCS $ X l Y l & ( width b ( LCS $ X r Y r &

    b is on the longest x-path or the critical path.The same argument holds for blocks on the y-criticalpath. QED.

    Based on the above theorem we have the followingcorollary.Corollary If a move improves the oorplan spanin x or y direction then it must involve some 0-slack blocks.Proof Let a move M improve the x-span of theoorplan represented by sequence pair $ X Y & . Let

    $ X N Y N & be the sequence pair after the move. M couldbe of any type. e.g. swap, rotate etc. If the move M does not involve any block on the x-critical path, then LCS $ X N Y N &

    LCS $ X Y & because lcs $ X Y & is alsoa sub-sequence of $ X N Y N & . Thus, a move M mustinvolve a block with zero x-slack in order to improve

  • 7/28/2019 Tvlsi2002 Fixed

    9/17

    9

    the oorplans x-span. Similar result holds for they-direction. QED.

    B. Slack-based movesOnce slacks are known, they can be used in move

    selection. Both the timing analysis interpretation aboveand the common subsequence interpretation from [27]imply that if a move (such as pair-wise swap) does notinvolve at least one block with zero slack in a givendimension, then the oorplan span in that dimensioncannot decrease after the move. This is because sucha move cannot improve critical paths or, equivalently,longest common subsequences. Therefore we biasmove selection towards blocks having zero slack inat least one dimension. Of those blocks, the ones withlarge slack in the other direction are potentially goodcandidates for single-block moves, such as rotations,and more gradual aspect ratio changes, discreteor continuous can be chosen efciently. Blockswith two zero slacks, especially small blocks, are goodcandidates for a new type of move, in which a block is moved simultaneously in both sequence pairs tobecome a neighbor of another block (in both sequencepairs, and, thus in placement). Namely, we attempt tomove a critical block C next to a block L with as largeslacks as possible, since large slacks imply that white-space can be created around L (more precise conditionscan be written, but will still be heuristic). Figure10 illustrates such a move. The following exampleillustrates the four possible ways of moving a block close to another by manipulating the sequence pair.

    Example: Consider the ve-block sequence pair a b c d e % c a d e b % % . We wish to move block e close to block a in the oorplan. This can be donein four ways:

    a e b c d % c a e d b % (e is to right of a)

    e a b c d % c e a d b % (e is to left of a)

    a e b c d % c e a d b % (e is below a)

    e a b c d % c a e d b % (e is above a)

    In addition to changing the sequence pair, our im-

    plementation changes block orientation and aspect ratiobased on current slacks. We observe that [22] alreadysuggested the analogy with static timing analysis in thecontext of FPGA placement. However, their algorithmsare rather different and explicitly rely on H and Vconstraint graphs, while our proposed algorithms donot.

    C. Handling Soft Blocks Using Slack Information

    In a oorplanning instance, soft blocks have a xedarea but an aspect ratio which is variable betweencertain pre-determined limits. We added slack-based

    move types to change aspect ratios of soft blocks.During annealing, at regular intervals, a procedurecalled PackSoftBlocks is invoked to shape the softblocks to improve the area of the total oorplan.PackSoftBlocks adopts a greedy approach. A block with low (preferably zero) slack in one dimensionand high slack in the other dimension is chosen. Theheight and the width of such a block is changed withinallowable limits so that its size in the dimension of smaller slack is reduced (to increase the slack). Suchmoves are greedily applied to all soft blocks in thedesign. Figure 11 gives the pseudo code for the greedyprocedure PackSoftBlocks .

    D. Wirelength Minimization

    In classical oorplanning, the global objective is tominimize wirelength and total area of the design. This

    implies multi-objective minimization. Typically, mostsimulated annealing based oorplanners use a linearcombination of area and wirelength as an objective forthe annealer. In our oorplanner, we too use a linearcombination of area and HPWL to evaluate annealermoves. Since area and wirelength have differentdimensions they need to be normalized to give areasonable linear combination. In our implementation,the area term is normalized by the total area of allblocks, and the wirelength term is normalized by thecurrent wirelength of the oorplan at every move.The term in the simulated annealing algorithm iscalculated as follows.

    area #area new area old

    totalArea blocks

    HPWL # HPW L new HPWL old

    HPWL old # $ 1 weight HPWL & area ( weight HPWL HPWL

    weight HPWL is the linear weight attributed to theHPWL term and its value is between 0 and 1. Duringthe annealing, all moves for which is negative areaccepted. All moves with a positive are acceptedif random exp $

    temp initialtemp current & , where random is a

    random number between 0 and 1. Thus the probability

    of accepting a bad move decreases as temp current isreduced.Additional moves are designed to improve the

    wirelength. For a given block a , we calculate, usinganalytical techniques, its ideal location that wouldminimize quadratic wirelength of its incident wires. 3

    We determine the ideal location $ xa ya & of block awhich minimizes the following function.

    N a

    v N

    $ xv xa &2

    ( $ yv ya &2

    3 Analytical techniques are used because they are fast and easy toimplement.

  • 7/28/2019 Tvlsi2002 Fixed

    10/17

    10

    0

    200

    400

    600

    800

    1000

    1200

    0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

    before move: area=1.07e+07

    clkc

    clkd

    cmp1

    cmp2cmp3

    cntd

    cntu

    npd

    nps

    ppd

    pps

    x 0%

    y 0%

    x 0%

    y 0%

    x 0%

    y 0%

    x 0%

    y 0%

    x 0%

    y 0.04%

    x 0%

    y 14.31%

    x 5.22%y 14.31%

    x 5.22%

    y 14.31%

    x 7.71%

    y 17.91%

    x 7.71%

    y 17.91%

    x 16.59%

    y 21.45%

    0

    200

    400

    600

    800

    1000

    1200

    0 2000 4000 6000 8000 10000 12000

    after the move: area=1.02e+07

    clkcclkd

    cmp1

    cmp2 cmp3cntd

    cntu

    npd

    nps

    ppd

    pps

    x 0%

    y 0%

    x 0%

    y 0%

    x 0%y 0%

    x 0%

    y 0%

    x 0%

    y 0.05%

    x 0%

    y 4.12%

    x 0%

    y 4.13%

    x 14.44%

    y 4.20%

    x 14.44%

    y 4.20%

    x 16.68%

    y 8.34%

    x 16.68%

    y 25.02%

    Fig. 10. A slack-based move in a highly suboptimal oorplan of benchmark hp. x and y slacks are shownas percentages of the respective spans of the oorplan. Module cmp3 the smallest with zero y slack ismoved upon the module cntu , which has the high y slack. This move improves vertical span and slightlyworsens the horizontal span, but the oorplan area is reduced.

    1 PackSoftBlocks (Direction)2 Calculate X-Slacks and Y-Slacks for all N blocks ( EVAL SLACKS () );3 Sort blocks in increasing order according to X-Slacks in sortedBlks X-Slacks4 Sort blocks in increasing order according to Y-Slacks in sortedBlks Y-Slacks5 for i = 1 to N6 begin7 if(Direction = Horizontal)8 currBlock = sortedBlks X-Slacks[i];9 else if(Direction = Vertical)10 currBlock = sortedBlks Y-Slacks[i];11 Shape currBlock to increase the slack in the critical direction &

    reduce the slack in the other direction, within allowable limits;12 end13 return;

    Fig. 11. Pseudo code for PackSoftBlocks. PackSoftBlocks is called once with Direction = Horizontal andonce with Direction = Vertical.

    The ideal location $ xa ya & of block a is simply theaverage of the position of all modules connected toblock a . We then identify the block b closest to theideal location. This is done by expanding a circlecentered at the ideal location and identifying the closestblock b. We then attempt to move block a in thesequence pair so that in both sequences it is locatednext to b. As explained in Section III-B, we evaluatethe four possible ways to do that, and choose the best.Thus an attempt is made to move a close to its ideallocation to minimize quadratic wirelength.

    Another type of move attempts to minimize both theoorplan size and wirelength objectives at the sametime. Find a block b closest to the ideal location of the chosen block a such that the block b has largeslack in at least one dimension. Depending on whetherb has a large slack in the X-dimension or in the Y-dimension, we place a with a horizontal relation or avertical constraint relative to b, respectively. Empiricalmeasurements conrm that adding the proposed movetypes improves nal oorplans.

    0

    50

    100

    150

    200

    250

    0 100 200 300 400 500 600 700 800

    n100 area= 194586 WS= 7.75% AR= 3.22 time= 45.91s

    Fig. 12. A oorplan with 100 blocks, generated withouta constraining outline, has aspect ratio 3.22:1.

    E. Fixed-outline Constraints

    Fixed-outline oorplans enable top-down design of very large scale ASICs and SoCs. Figure 12 showsthe result of a oorplan with pure area minimizationwithout any xed outline constraints. The white-spaceachieved is 7.75% with an aspect ratio of 3.22:1.However this oorplan can be completely useless for asituation where 1:1 aspect ratio is imposed by a higher-level oorplan.

    The following notation will be used in our oorplan-

  • 7/28/2019 Tvlsi2002 Fixed

    11/17

    11

    ning formulations. For a given collection of blocks withtotal area A and given maximum white-space fraction , we construct a xed outline with aspect ratio

    1. 4

    H #

    $ 1 ( & A W #

    $ 1 ( & A

    Aside from driving the annealer by area minimiza-tion, we consider the following objective functions:

    1) The sum of the excessive length and width of the oorplan,

    2) The greater of the excessive length and width of the oorplan.

    Denoting the current height and width of the oorplanby H and W , we dene these functions as

    (1) max H H 0 max W W 0 (2) max H H W W

    The choice of these functions is explained by the factthat the xed-outline constraint is satised when and

    only when each of those functions takes value 0.0 orless. For this reason we cannot consider the product of xed outline violations.

    Our experiments show that a classic annealer-based oorplanner is practically unable to satisfy thexed-outline constraints (for all of the three above-mentioned objective functions). Therefore we addition-ally bias the selection of moves. Figure 13 shows theevolution of the xed-outline oorplan during Simu-lated Annealing with slack-based moves. The schemeworks as follows. At regular time intervals duringthe simulated annealing the current aspect ratio iscompared to the aspect ratio of the xed outline. If thetwo are different, then the slack-based moves describedearlier are applied to change the current aspect ratio inthe needed direction. For example, if the width needsto be reduced then we chose the blocks in the oorplanwith smallest slack in the x direction and insert themabove or below the blocks with largest slack in the ydirection. These moves have better chances of reducingthe area and improving the aspect ratio of the currentoorplan at the same time. Through these repeatedmoves during the simulated annealing the structure of the oorplan is biased towards the aspect ratio of thexed outline. Our techniques also work with multi-objective minimizations (area+HPWL) and handle softblocks in designs. As shown in Section IV, our imple-mentation is successful in satisfying a variety of xed-outline constraints. One concern about the intelligentmove selection techniques during simulated annealingis that it may limit the solution space coverage. Ingeneral, these characteristics are necessary if simulatedannealing is to be successful. We interleave the intel-ligent moves with totally randomized moves to ensurethat simulated annealing does not get trapped in a localminima.

    4 The restriction of 1 is imposed without loss of generalitysince our oorplanner can change orientations of individual blocks.

    Current Outline

    Required OutlineSuccess

    Failure

    Restart

    yviolation

    xviolation

    (Initial)(1000 moves)

    (End of annealing)

    Fig. 13. Snap-shots from xed-outline oorplanning.The number of annealing moves is xed, but if theevolving oorplan ts within the required xed-outline,annealing is stopped earlier. If at the end of annealingthe xed-outline constraints are not satised, it isconsidered a failure and a fresh attempt is made.

    F. Hierarchical Layout

    Hierarchical design is becoming increasingly attrac-tive as a way to manage design complexity [18], [15]. Itis argued in [18] that hierarchy is needed for humans,not for algorithms. The need for hierarchical designmakes it imperative that the whole design ow sup-port a hierarchical design methodology. We argue thatxed-outline oorplanning is an integral component of such a multi-level hierarchical ow. As discussed in[15], the top level oorplan might need to be xedearly on in the design cycle to avoid costly iterations

    later. Once the top-level oorplan is nalized, thehigh-level blocks and their shapes impose a xed-outline constraint on the lower levels of design. Adesign team might decide to implement each high-level block at, in which case there would be no needfor a xed-outline oorplanning tool. However, if thedesign team decides to implement high-level blocksin a hierarchical fashion as well then xed-outlineoorplanning becomes necessary. With increasing chipsizes, such a scenario is not unrealistic.

    We use our techniques in xed-outline oorplanningto develop a hierarchical oorplanning ow which is justied for very large ASICS and SoCs. We employ a

    simple connectivity based clustering scheme to createa top-level of hierarchy with clustered blocks. Eachtop-level clustered block is soft with aspect ratiosallowed from 0.75 to 1.5 and has an area equal to1 15 sum of area of all sub blocks . Thus a white-space of 15% is allotted to each top-level block, so thatParquet can nd a solution satisfying the xed-outlineconstraints. The connectivity based clustering we use,is a multi-level greedy approach employing a series of passes till the design reduces to manageable size. Ineach pass we group highly connected cells together.Thus the design size reduces by a factor of 2 in eachpass. We remove any net that connects cells only within

  • 7/28/2019 Tvlsi2002 Fixed

    12/17

  • 7/28/2019 Tvlsi2002 Fixed

    13/17

  • 7/28/2019 Tvlsi2002 Fixed

    14/17

    14

    plots reect the difculty in satisfying xed-outlineoorplans with given aspect ratios, which highly de-pends on the dimensions of the blocks. As seen fromthe plots, our simulated annealer fairly often failed tosatisfy the given outline, however, the probability of success is typically over 50%, i.e., at least ve in tenstarts are successful. This consistent rate of successsuggests that our slack-based moves indeed improvelocal search (simulated annealing without slack-basedmoves is never able to satisfy the xed outline). Alsonote that in most of the unsuccessful attempts the nalsolutions are within 1-2% from the desired outline, yetwe regard them as failures .

    Out of the three objective functions we tried, min-imizing the sum of excessive width and height andminimizing the area is more successful than minimiz-ing the maximum of excessive width or height. Findingan explanation of this empirical result remains an openproblem.

    When we decreased in our experiments, somexed outlines are never satised, which may be dueto the absence of solutions with a given aspect ratioand very small white-space. The plot of probabilityof success of satisfying the xed-outline constraintfor a design(n100) with 12% white-space, is shownin Figure 15. As expected, decreasing worsens theprobability of success. The average runtimes requiredto satisfy the xed-outline constraints also increasewith lower .

    Figure 16 shows the plots of probabilities of success,

    HPWL and average time vs. aspect ratios for designn100 xed-outline constraints in the wirelength min-imization mode. The time taken to satisfy the xed-outline constraints increases signicantly compared tothose in Figure14, because of the overhead of cal-culating the wirelength from scratch for every move.The probabilities of success decrease a little. Howeverfor very skewed aspect ratios, wirelength minimizationsuffers. In the outline-free mode , Parquet achieves anaverage wirelength of 323 and average white-spaceof 10.18% over 50 independent starts for the designn100. Our experiments with other publicly availablebenchmarks (n50, n200, ami49 etc) produced consis-

    tent results.

    D. Hierarchical Layout Context

    We develop a hierarchical oorplanning ow em-ploying the xed-outline oorplanner, Parquet, as anengine. For our experiments we use the publicly avail-able 6 ibm06 placement benchmark [1]. ibm06 is amixed-size placement benchmark with 32498 cells,including 178 macros. The standard cells are of varyingwidths but similar height and a modied standard cell

    6 The benchmarks are available on the Web athttp://vlsicad.eecs.umich.edu/BK/ISPD02bench/

    placement algorithm which can handle macros shouldbe employed to place such a design. However, weperform this experiment to demonstrate the scalabilityof hierarchical oorplanning. We consider all blocksto be hard, thus further constraining the problem. Weemploy a simple connectivity based clustering schemeto create a top-level of hierarchy with 238 clusteredblocks (each top-level block having approximately 128blocks). Big macros are kept out of this clustering.A white-space of 15% is allotted to each top-levelclustered block. Each top-level clustered block is softwith aspect ratios allowed from 0.75 to 1.5. Thetop-level design is oorplanned without any xed-outline constraints and the objective is to minimizearea. The top-level oorplan is shown in Figure 17(a). The top-level clustered blocks impose xed-outlineconstraints on the sub-blocks. Each top-level block isnow oorplanned with these constraints. The nal atoorplan of ibm06 is shown in Figure 17 (b). Weachieved a dead-space of 17.44% in 37 minutes. Incomparison oorplanning ibm06 design at achieved adeadspace of 55.62% in 1070 minutes. We tried speed-ing up the at oorplanning by using the O $ n log $ n & &fast sequence pair evaluation algorithm [27]. However,as pointed out in Section II, O $ n log $ n & & algorithmperformed worse than O $ n2 & algorithm. While, in ourexperiments we only used a single level of hierarchy,the ow could be easily changed to handle multiplelevels of oorplanning hierarchy. Also, we consideredonly area as an objective, but a linear combination

    of area and wirelength can also be considered as anobjective. We perform the hierarchical oorplanningexperiment to demonstrate that multi-level oorplan-ning is much more scalable than at oorplanning.Indeed, oorplanning almost 32K objects at using thesequence pair representation would require inordinateamounts of time. The oorplan obtained by hierar-chical oorplanning can further be compacted usingefcient layout compaction schemes or employing lowtemperature annealing. However, with close to 32Kobjects as in our case, it might be too costly to eventry low temperature annealing with the sequence pairrepresentation. We conclude that in comparison to the

    at oorplanner, the hierarchical oorplanner scalesmuch better in terms of runtime and solution quality.Thus, hierarchical oorplanning can used to oorplana large number of top-level blocks efciently and alsoto support a multi-level hierarchical design ow.

    V. C ONCLUSIONS

    Our work makes an important step in identifying andevaluating fundamental optimization techniques thatare successful for wirelength optimization in large-scale xed-outline oorplanning. In perspective, weexpect our techniques to be useful (i) in oorplanningwith more realistic objective functions, (ii) during

  • 7/28/2019 Tvlsi2002 Fixed

    15/17

    15

    0

    0.2

    0.4

    0.6

    0.8

    1

    1 1.5 2 2.5 3 3.5 4 4.5 5

    P r o

    b a b i l i t y o

    f s u c c e s s

    Aspect Ratio of Fixed Outline

    Probablity(success) vs Aspect Ratio

    12% white-space15% white-space

    10

    12

    14

    16

    18

    20

    1 1.5 2 2.5 3 3.5 4 4.5 5

    R u n

    t i m e s

    ( s e c

    )

    Aspect Ratio of Fixed Outline

    Runtimes vs Aspect Ratio

    12% white-space15% white-space

    Fig. 15. Probability of success and average runtimes for oorplanning design n100 with xed-outlineconstraints. The objective function is area. The maximum white-space for the design is 12%. i.e. 12% .Probabilities and runtimes for a design with maximum white-space of 15% are also provided for comparison.The probabilities of success with 12% white-space are signicantly lower compared to those with 15%white-space. because of smaller . We plotted average of 50 runs for each aspect ratio.

    0

    0.2

    0.4

    0.6

    0.8

    1

    1 1.5 2 2.5 3 3.5 4 4.5 5

    P r o b a b i l i t y o f s u c c e s s

    Aspect Ratio of Fixed Outline

    Probablity(success) vs Aspect Ratio

    obj fn: area

    300

    320

    340

    360

    380

    400

    420

    440

    1 1.5 2 2.5 3 3.5 4 4.5 5

    H P W L

    Aspect Ratio of Fixed Outline

    HPWL vs Aspect Ratio

    obj fn: area

    40

    50

    60

    70

    80

    90

    100

    1 1.5 2 2.5 3 3.5 4 4.5 5

    R u n t i m e s ( s e c )

    Aspect Ratio of Fixed Outline

    Runtimes vs Aspect Ratio

    obj fn: area

    Fig. 16. Probability of success, HPWL and average runtimes for oorplanning design n100 with xed-outline

    constraints in HPWL minimization mode. The maximum white-space for the design is 15%. i.e.

    15% . Theprobabilities of success are slightly lower compared to Figure 14, because of multi-objective minimization.Also, the wirelength minimization suffers when required aspect ratios are skewed. We plotted average of 50runs for each aspect ratio.

    0

    500

    1000

    1500

    2000

    2500

    3000

    3500

    4000

    0 500 1000 1500 2000 2500 3000 3500 4000 4500

    ibm06_Toplevel area= 1.00e+07 AR= 1.04

    0

    500

    1000

    1500

    2000

    2500

    3000

    3500

    4000

    0 500 1000 1500 2000 2500 3000 3500 4000 4500

    ibm06 area= 1.00e+07 WS= 17.44% AR= 1.04 time=37m

    0

    500

    1000

    1500

    2000

    2500

    3000

    3500

    4000

    0 500 1000 1500 2000 2500 3000 3500 4000 4500

    ibm06_noHier area=1.9e+7 WS=55.62% AR=1.09 time=1070m

    (a) (b) (c)

    Fig. 17. Hierarchical Floorplanning of ibm06 design with 32498 blocks. ibm06 is a mixed-size design andhas 178 macros. We use a connectivity based clustering scheme to reduce the design size at top level to238 clustered blocks (each top-level block having approximately 128 blocks). Big macros are kept out of this clustering. All blocks are hard. The top level is oorplanned without any xed-outline constraints. Thetop-level blocks impose xed-outline constraints on the lower level. The top-level oorplan is shown in Figure(a). The nal at oorplan is shown in Figure (b). 17% deadspace was achieved in 37m. In comparison Figure(c) shows the oorplan of ibm06 obtained by at oorplanning. 55% deadspace was achieved in 1070m.

  • 7/28/2019 Tvlsi2002 Fixed

    16/17

    16

    related optimizations with additional degrees of free-dom such as added logic [re-]synthesis, and (iii) innon-classical design ows such as virtual prototyping.These extensions will be explored in our future work.

    This work points out that a non-standard oor-planning formulation xed-outline oorplanningis signicantly harder than classic min-area outline-free oorplanning. We implement an annealing-basedoorplanner Parquet-1 that uses a recently discovered[27] sequence pair evaluation algorithm and study itsperformance both in the xed-outline and outline-freecontexts. We use the concept of slacks in a oorplanfor better local search. We added special techniques,based on slacks of individual blocks in a oorplan,to handle soft blocks. These new techniques whenincorporated into the simulated annealing framework perform well and achieve an area utilization close to99% for MCNC benchmarks within reasonable time.We also introduce special moves based on analyticalmethods to better drive wirelength (HPWL) minimiza-tion during simulated annealing. For the standard for-mulation, our oorplanner is competitive, both in termsof runtime and solution quality, with other leading-edge implementations and represents current state-of-the-art. However, our implementation experiencesserious difculties in the xed-outline context untilthe algorithm is modied. In particular, more relativewhite-space is required to satisfy an outline of a givenarea when its aspect ratio is xed.

    We propose new objectives that more successfully

    drive our annealing-based oorplanner to satisfy xed-outline constraints. New types of slack-based moves,that may be applicable to most oorplanner implemen-tations based on simulated annealing, are introduced.These special moves performed during annealing pro-vide better control of the x and y dimensions of theoorplan. We study the sensitivity of relative white-space in the design on the effectiveness of our proposedmethods. We also study the effect on wirelength mini-mization when trying to achieve various xed-outlines.

    Our experiments show that classical methods failfor xed-outline instances constructed from stan-dard MCNC benchmarks and other publicly available

    benchmarks, but when new objectives and slack-basedmoves are added to our Parquet-1 implementation, itnds acceptable xed-outline oorplans for a varietyof aspect ratios. We also conclude that minimizing thesum of excessive width and height is a more successfulapproach than minimizing the greater of the two.

    We demonstrate a top-down, hierarchical oorplan-ning ow with a single level of hierarchy. We are ableto oorplan 32498 blocks and achieve a dead-space of 17.44% in 37 minutes. In comparison at oorplanningachieved a deadspace of 55.62% in 1070 minutes.

    We do not necessarily advocate our particular wayof doing hierarchical oorplanning, and plan to study

    this our future work. We also note that large designscan be rst partitioned using recursive bisection, butnot necessarily all the way down to the level of detail where the differences in block sizes and shapescomplicate recursive bisection. Our on-going work [1]aims to combine recursive bisection and oorplanningtechniques in the context of mixed-mode placement.In our on-going research we are extending the pro-posed methods to top-down, multi-level hierarchicaloorplanning and related applications to standard-cellplacement with large macro cells.

    REFERENCES

    [1] S. N. Adya and I. L. Markov, Consistent Placement of Macro-blocks Using Floorplanning and Standard-Cell Placement, ISPD 2002 , pp. 12-17

    [2] S. N. Adya, I. L. Markov, Fixed-outline FloorplanningThrough Better Local Search, ICCD 2001 , pp. 328-334.

    [3] A. E. Caldwell, A. B. Kahng, A. A. Kennings and I. L. Markov,Hypergraph Partitioning for VLSI CAD: Methodology forReporting, and New Results, DAC 1999 , pp. 349-354.

    [4] A. E. Caldwell, A. B. Kahng and I. Markov, Can RecursiveBisection Alone Produce Routable Placements?, DAC 2000 ,pp. 477-482.

    [5] Y. C. Chang, Y. W. Chang, G. M. Wu and S. W. Wu, B -Trees:A New Representation for Non-Slicing Floorplans, DAC 2000 ,pp. 458-463.

    [6] P. Chen and E. S. Kuh, Floorplan Sizing by Linear Program-ming Approximation, DAC 2000 , pp. 468-471.

    [7] J. Cong, T. Kong and D.Z. Pan Buffer Block Planning forInterconnect-Driven Floorplanning, ICCAD 1999 , pp. 358-363.

    [8] W. Dai, D. Huang, C. Chang, M. Courtoy, Silicon VirtualPrototyping: The New Cockpit for Nanometer Cip Design ASP-DAC 2003 , pp. 635-639.

    [9] Y. Feng, D. P. Mehta and H. Yang Constrained ModernFloorplanning, ISPD 2003 , pp. 128-134[10] K. Fujuyoshi and H. Murata, Arbitrary Convex and Concave

    Rectilinear Block Packing Using Sequence Pair, ISPD 1999 ,pp. 103-110.

    [11] X. Hong et al., Corner Block List: An Effective and Ef cientTopological Representation of Non-Slicing Floorplan, ICCAD2000 , pp. 8-13.

    [12] A. Jagannathan, S. W. Hur and J. Lillis, A Fast AlgorithmFor Context-Aware Buffer Insertion, ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 7, No. 1,January 2002 , pp. 173-188

    [13] A. B. Kahng, Classical Floorplanning Harmful?, ISPD 2000 ,pp. 207-213.

    [14] J. Koehl, D. E. Lackey, G. Doerre, IBMs 50 Million GateASICs, ASP-DAC 2003 , pp. 628-634.

    [15] D. E. Lackey, Design Planning Methodology for Rapid

    Chip Deployment, IEEE/DATC Electronic Design ProcessWorkshop, 2001[16] A. Mehrotra, L. V. Ginneken and Y. Trivedi, Design Flow

    and Methodology for 50M gate ASIC ASP-DAC 2003 , pp.640-647.

    [17] J. Lin and Y. Chang, TCG: A Transitive Closure Graph BasedRepresentation for Non-Slicing Floorplans, DAC 2001 , pp.764-769.

    [18] L. Scheffer, Data Modeling and Convergence Methodology inIntegration Ensemble, IEEE/DATC Electronic Design ProcessWorkshop, 2001

    [19] T. S. Moh, T. S. Chang and S. L. Hakimi, Globally OptimalFloorplanning for a Layout Problem, IEEE Trans. on Circuitsand Systems - I , vol 43, no. 9, pp. 713-720, September 1996.

    [20] H. Murata, K. Fujiyoshi, S. Nakatake and Y. Kajitani, VLSImodule placement based on rectangle-packing by the sequencepair, IEEE Trans. on CAD, 1996 , vol 15(12), pp. 1518-1524

  • 7/28/2019 Tvlsi2002 Fixed

    17/17

    17

    [21] H. Murata and E. S. Kuh, Sequence-Pair Based PlacementMethods for Hard/Soft/Pre-placed Modules, ISPD 1998 , pp.167-172.

    [22] S. Nag and K. Chaudhary, Post-Placement Residual-OverlapRemoval with Minimal Movement, DATE 99 , pp. 581-586,

    [23] Y. Pang, C. K. Cheng and T. Yoshimura, An Enhanced

    Perturbing Algorithm for Floorplan Design Using the O-treeRepresentation, ISPD 2000 , pp. 168-173.[24] Y. Pang, F. Balasa, K. Lampaert and C. K. Chang, Block

    Placement with Symmetry Constraint Based on the O-TreeNon-Slicing Representation, DAC 2000 , pp. 464-468.

    [25] F. Remond,Monterey Usage at STMicroelectronics, DAC 02 ,http://www.montereydesign.com/customers/ST%20Presentation%20DAC%202002.pdf

    [26] N. Sherwani, Algorithms For VLSI Design Automation,Kluwer , 3rd ed. 1999.

    [27] X. Tang, R. Tian and and D. F. Wong, Fast Evaluationof Sequence Pair in Block Placement by Longest CommonSubsequence Computation, DATE 2000 , pp. 106-111.

    [28] X. Tang and D. F. Wong, FAST-SP: A Fast Algorithm forBlock Placement Based on Sequence Pair, ASPDAC 2001 .

    [29] F. Y. Young, C. C. N. Chu, W. S. Luk and Y. C. Wong,Floorplan Area Minimization Using Lagrangian Relaxation,

    ISPD 2000 , pp. 174-179.

    Saurabh N. Adya received his bachelorsdegree in Electronics and CommunicationEngineering from KREC, Mangalore Uni-versity, India in 1999 and a masters des-gree in Computer Science and Engineer-ing from the University of Michigan, AnnArbor, in 2002. He is currently a doc-toral candidate in the EECS department atthe University of Michigan. From 1999 to2000, he worked as IC Design Engineerat Texas Instruments, India. His current re-

    search interests are in the general area of VLSI CAD and speci cally,physical design for VLSI.

    Igor L. Markov is an assistant professorof Electrical Engineering and ComputerScience at the University of Michigan. Hedid his masters (Math) and doctorate (CS)work at UCLA, where he earned the BestPh.D. student award for 2000 at the Com-puter Science Department.

    Prof. Markovs interests are in quan-tum computing and in combinatorial op-timization with applications to the designand veri cation of integrated circuits. Prof.

    Markovs contributions include the circuit placer Capo and thequantum circuit simulator QuIDDPro.

    Prof. Markov co-authored more than 50 publications and isserving on technical program committees at ICCAD, DATE, ISPD,GLSVLSI, SLIP and IWLS in 2003.