644 ieee transactions on components and ... ieee transactions on components and packaging...

14
644 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 Placement and Routing for 3-D System-On-Package Designs Jacob Rajkumar Minz, Student Member, IEEE, Eric Wong, Student Member, IEEE, Mohit Pathak, Student Member, IEEE, and Sung Kyu Lim, Senior Member, IEEE Abstract—Three-dimensional (3-D) packaging via system-on- package (SOP) is a viable alternative to system-on-chip (SOC) to meet the rigorous requirements of today’s mixed signal system integration. In this article, we present the first physical design algorithms for thermal and power supply noise-aware 3-D place- ment and crosstalk-aware 3-D global routing. Existing approaches consider the thermal distribution, power supply noise, and crosstalk issues as an afterthought, which may require an expen- sive cooling scheme, more decoupling capacitors decap , and additional routing layers. Our goal is to overcome this problem with our thermal/decap/crosstalk-aware 3-D layout automation tools. The traditional design objectives such as performance, area, wirelength, and via are considered simultaneously to ensure high quality results. The related experimental results demonstrate the effectiveness of our approaches. Index Terms—Crosstalk, mixed-signal CAD, placement and routing, power supply noise, system-on-package (SOP), thermal distribution, three-dimensional (3-D) packaging. I. INTRODUCTION S EMICONDUCTOR industry is beginning to question the viability of system-on-chip (SOC) approach due to its low- yield and high-cost problem. Recently, three-dimensional (3-D) packaging via system-on-package (SOP) [1]–[3] has been pro- posed as an alternative solution to meet the rigorous require- ments of today’s mixed signal system integration. 1 The SOP is about 3-D integration of multiple functions in a miniaturized package achieved by thin film embedding. The 3-D SOP con- cept optimizes integrated circuits (ICs) for transistors and the package for integration of digital, RF, optical, sensor and others. It accomplishes this by both build-up SOP, similar to IC fabrica- tion, and by stacked SOP, similar to parallel board fabrication. The uniqueness of 3-D SOP is in the highly integrated or em- bedded RF, optical or digital functional blocks, and sensors, in contrast to stacked ICs and stacked package. Due to the high complexity in designing large-scale SOP under multiple objec- tives and constraints, computer-aided design (CAD) tools have become indispensable. Thus, innovative ideas on CAD tools for Manuscript received November 26, 2004; revised July 30, 2005. This work was supported by Grants from MARCO C2S2 and NSF CAREER under CCF- 0546382. This work was recommended for publication by Associate Editor C. Amon upon evaluation of the reviewers’ comments. The authors are with the School of Electrical and Computer Engi- neering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]). Color versions of Figs. 6, 7, 10, and 19 are available online at http://ieeex- plore.ieee.org. Digital Object Identifier 10.1109/TCAPT.2006.880441 1 A special issue on SOP (vol. 27, issue 2, May 2004) provides a comprehen- sive survey of the state-of-the-art in SOP technology. SOP technology are crucial to fully exploit the potential of this new emerging technology. However, there exists very few tools, if not none, that handle the complexity of automatic 3-D SOP layout generation. This article tackles the three most crucial issues that threaten the reliability of SOP paradigm: thermal distribution, power supply noise, and crosstalk. First, thermal issues can no longer be ignored in high performance 3-D packages due to higher power densities and other issues. High temperatures not only require more advanced heat sinks, they also degrade circuit performance. Interconnect delay increases with temperature, which degrades circuit timing. If timing deteriorates enough, logic faults can occur. Hence thermal issues must be considered early-on in the design process. Second, the continuing trend of reducing power supply voltage has resulted in reduced noise margin, which effects reliability and may even cause functional failures due to spurious transitions. Active devices in 3-D packaging draw a large volume of instantaneous current during switching, which causes simultaneous switching noise (SSN). Third, due to the scaling down of device geometry in deep-submicron technologies, the crosstalk noise between adjacent nets has become a major concern in high performance packaging design. Increased coupling noise can cause signal delays, logic hazards and even malfunctioning of the designs. Thus controlling the level of crosstalk noise in 3-D packages chip is an important task for the designers. However, existing approaches consider these issues as an af- terthought, which may require expensive cooling schemes, more decoupling capacitors decap , and more routing layers. In addition, many time-consuming iterations are required between full-length thermal/SSN/crosstalk simulation and manual layout repair until the convergence to a satisfactory result. We note that the placement of modules in 3-D SOP design has huge impact on thermal distribution and SSN while the routing of the signal nets has direct impact on crosstalk. Therefore, the goal of this article is to present the first physical layout algorithm for 3-D SOP that combines thermal/decap-aware 3-D placement and crosstalk-aware 3-D global routing algorithm. The traditional design objectives such as performance, area, wirelength, and via costs are considered simultaneously to ensure high quality results. The related experimental results demonstrate the effec- tiveness of our approach. The remainder of this paper is organized as follows. Section II discusses previous works. Section III presents the problem for- mulation and an overview of our 3-D algorithms. Section IV presents the SOP placement algorithm. Section V presents the SOP routing algorithm. Section VI presents the experimental re- sults. We conclude in Section VII. 1521-3331/$20.00 © 2006 IEEE

Upload: lamdung

Post on 20-May-2018

227 views

Category:

Documents


0 download

TRANSCRIPT

644 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

Placement and Routing for 3-DSystem-On-Package Designs

Jacob Rajkumar Minz, Student Member, IEEE, Eric Wong, Student Member, IEEE,Mohit Pathak, Student Member, IEEE, and Sung Kyu Lim, Senior Member, IEEE

Abstract—Three-dimensional (3-D) packaging via system-on-package (SOP) is a viable alternative to system-on-chip (SOC) tomeet the rigorous requirements of today’s mixed signal systemintegration. In this article, we present the first physical designalgorithms for thermal and power supply noise-aware 3-D place-ment and crosstalk-aware 3-D global routing. Existing approachesconsider the thermal distribution, power supply noise, andcrosstalk issues as an afterthought, which may require an expen-sive cooling scheme, more decoupling capacitors (=decap), andadditional routing layers. Our goal is to overcome this problemwith our thermal/decap/crosstalk-aware 3-D layout automationtools. The traditional design objectives such as performance, area,wirelength, and via are considered simultaneously to ensure highquality results. The related experimental results demonstrate theeffectiveness of our approaches.

Index Terms—Crosstalk, mixed-signal CAD, placement androuting, power supply noise, system-on-package (SOP), thermaldistribution, three-dimensional (3-D) packaging.

I. INTRODUCTION

SEMICONDUCTOR industry is beginning to question theviability of system-on-chip (SOC) approach due to its low-

yield and high-cost problem. Recently, three-dimensional (3-D)packaging via system-on-package (SOP) [1]–[3] has been pro-posed as an alternative solution to meet the rigorous require-ments of today’s mixed signal system integration.1 The SOP isabout 3-D integration of multiple functions in a miniaturizedpackage achieved by thin film embedding. The 3-D SOP con-cept optimizes integrated circuits (ICs) for transistors and thepackage for integration of digital, RF, optical, sensor and others.It accomplishes this by both build-up SOP, similar to IC fabrica-tion, and by stacked SOP, similar to parallel board fabrication.The uniqueness of 3-D SOP is in the highly integrated or em-bedded RF, optical or digital functional blocks, and sensors, incontrast to stacked ICs and stacked package. Due to the highcomplexity in designing large-scale SOP under multiple objec-tives and constraints, computer-aided design (CAD) tools havebecome indispensable. Thus, innovative ideas on CAD tools for

Manuscript received November 26, 2004; revised July 30, 2005. This workwas supported by Grants from MARCO C2S2 and NSF CAREER under CCF-0546382. This work was recommended for publication by Associate EditorC. Amon upon evaluation of the reviewers’ comments.

The authors are with the School of Electrical and Computer Engi-neering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail:[email protected]).

Color versions of Figs. 6, 7, 10, and 19 are available online at http://ieeex-plore.ieee.org.

Digital Object Identifier 10.1109/TCAPT.2006.880441

1A special issue on SOP (vol. 27, issue 2, May 2004) provides a comprehen-sive survey of the state-of-the-art in SOP technology.

SOP technology are crucial to fully exploit the potential of thisnew emerging technology. However, there exists very few tools,if not none, that handle the complexity of automatic 3-D SOPlayout generation.

This article tackles the three most crucial issues that threatenthe reliability of SOP paradigm: thermal distribution, powersupply noise, and crosstalk. First, thermal issues can no longerbe ignored in high performance 3-D packages due to higherpower densities and other issues. High temperatures not onlyrequire more advanced heat sinks, they also degrade circuitperformance. Interconnect delay increases with temperature,which degrades circuit timing. If timing deteriorates enough,logic faults can occur. Hence thermal issues must be consideredearly-on in the design process. Second, the continuing trendof reducing power supply voltage has resulted in reducednoise margin, which effects reliability and may even causefunctional failures due to spurious transitions. Active devicesin 3-D packaging draw a large volume of instantaneous currentduring switching, which causes simultaneous switching noise(SSN). Third, due to the scaling down of device geometryin deep-submicron technologies, the crosstalk noise betweenadjacent nets has become a major concern in high performancepackaging design. Increased coupling noise can cause signaldelays, logic hazards and even malfunctioning of the designs.Thus controlling the level of crosstalk noise in 3-D packageschip is an important task for the designers.

However, existing approaches consider these issues as an af-terthought, which may require expensive cooling schemes, moredecoupling capacitors decap , and more routing layers. Inaddition, many time-consuming iterations are required betweenfull-length thermal/SSN/crosstalk simulation and manual layoutrepair until the convergence to a satisfactory result. We note thatthe placement of modules in 3-D SOP design has huge impacton thermal distribution and SSN while the routing of the signalnets has direct impact on crosstalk. Therefore, the goal of thisarticle is to present the first physical layout algorithm for 3-DSOP that combines thermal/decap-aware 3-D placement andcrosstalk-aware 3-D global routing algorithm. The traditionaldesign objectives such as performance, area, wirelength, andvia costs are considered simultaneously to ensure high qualityresults. The related experimental results demonstrate the effec-tiveness of our approach.

The remainder of this paper is organized as follows. Section IIdiscusses previous works. Section III presents the problem for-mulation and an overview of our 3-D algorithms. Section IVpresents the SOP placement algorithm. Section V presents theSOP routing algorithm. Section VI presents the experimental re-sults. We conclude in Section VII.

1521-3331/$20.00 © 2006 IEEE

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 645

II. RELATED WORKS

The physical design for 3-D integration has drawn interestfrom both academia and industry recently. Unlike the activephysical design research effort in mixed signal SOC [4], [5] and3-D stacked die technology [6]–[8], however, the research for3-D SOP considerably lacks behind due to its short history. Thephysical design research for SOP has been recently pioneeredby the [9]–[14]. Decoupling capacitor placement is done fortwo-dimensional (2-D) printed circuit board (PCB) designs[15]–[17]. Thermal-aware module placement algorithms areproposed for PCB designs [18], [19]. Many multichip module(MCM) routing algorithms have been proposed in the literature[20]–[25]. Several works on MCM pin redistribution include[26]–[28].

There exists several similarities and differences betweenMCM and SOP routing. In general, the design objectives andconstraints in both MCM and SOP routing are quite similar,where performance, wirelength, via, layer, and crosstalk opti-mization are crucial. In addition, the routing process is dividedinto multiple steps: pin distribution, topology generation, layerassignment, and detailed routing. A notable difference betweenSOP and MCM routing lies in the fact that there exists multipledevice layers in SOP, whereas in MCMs there is only onedevice layer. Therefore, nets are now connecting pins locatedin all intermediate layers in SOP, and the blocks in each layerbehave as obstacles. In MCMs, however, all pins are locatedonly at the top layer, and there exists no obstacles except forthe wires themselves. This makes SOP or 3-D package routingproblem more general than MCM routing.

In SOP designs, some pins in each device layer can bedistributed either to the layer above or below. In addition,some nets that connect blocks in nonneighboring device layersneed to penetrate intermediate device layers. Since the blocksin these intermediate layers become obstacles, designers userouting channels in each device layer to insert “feed-throughvias.” Thus, we have to determine which routing channel to usefor feed-through vias. These routing channels are also used forintra-layer connection. Thus, after feed-through via insertion isdone, we need to decide which remaining channels to use forthe intra-layer connection. We then need to finish connectionbetween the original and distributed pins. In case the pin loca-tion is not known for some “soft blocks,” we need to determinetheir location either along the boundary or underneath the blockduring our local routing step.

III. PROBLEM FORMULATION

A. SOP Layer Structure

The layer structure in multilayer SOP is illustrated in Figs. 1and 2. The placement layers2 contain the blocks (such as ICs,embedded passives, opto-electric components, etc), which fromthe point of view of physical design are just rectangular blockswith pins along the boundary. The interval between two adja-cent placement layers is called the routing interval. A routinginterval contains a stack of routing layers sandwiched betweenpin distribution layers. These layers are actually – routing

2We use placement layer and device layer interchangeably.

Fig. 1. Illustration of 3-D SOP, where A, D, M, and P, respectively, nodeanalog chip, digital chip, memory module, and embedded passive blocks.

Fig. 2. Illustration of the layer structure and routing resource in SOP.The block and white dots respectively denote the original and redistributedpins. The “x” denotes a feed-through pin for an x-net to pass through aplacement layer using a routing channel. The solid, dotted, and arrowed lines,respectively, denote signal wires, vias, and feed-through vias.

layer pairs so that the rectilinear partial net topologies may beassigned to them. The pin distribution layers in each routing in-terval are used to evenly distribute pins from the nets that areassigned to this interval. Then these evenly distributed pins areconnected using the routing layer pairs. Each placement layerconsists of a pair of – routing layers, so routing is permitted.A feed-through via is used to connect two pin distribution layersfrom different routing intervals. Thus, the routing channels ineach placement layer are used for two purposes: i) accommo-date feed-through vias and ii) perform local routing, where lim-ited number of intra-layer connections are made.

In the SOP model the nets are classified into two categories.The nets which have all their terminals in the same placementlayer are called -nets, while the ones having terminals in dif-ferent placement layers are -nets. The -nets can be routed ina single routing interval or indeed within the placement layeritself. On the other hand, the -nets may span more than onerouting interval.

We use the floor connection graph (FCG) [29], illustrated inFig. 3, to model the placement layer, where each routing channelbecomes an edge and each channel intersection point becomesa node. In addition, each soft block becomes a node, and pin as-signment edges are added to this node to connect to all adjacentchannels. This model allows us to determine which boundary to

646 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

Fig. 3. Graph-based modeling of placement layer: (a) a connection betweentwo hard blocks and (b) b is a soft block, and the new pin is located on thebottom boundary. Dotted lines denote pi assignment edges.

use for the pins with unknown location. Each channel edge isassociated with i) via capacity for feed-through vias and ii) wirecapacity for local routing. In addition, pin assignment edgeshave pin capacity for each boundary of a soft block. The routingand pin distribution layers in each routing interval are modeledwith a standard 3-D grid, where each node represents arouting region and each -direction edge represents each hor-izontal/vertical boundary among the regions. Each -directionedge represents a group of vias each region can accommodate.Thus, all edges in this 3-D grid are associated with capacity:and edges for wire capacity, and edges for via capacity.

B. SOP Placement Problem

The following are given as the input to our 3-D SOP place-ment problem: i) a set of blocks thatrepresent the various active and passive components in thegiven SOP design, ii) width, height, and maximum switchingcurrents for each block, iii) a netlistthat specifies how the blocks are connected via electrical wires,iv) , the number of placement layers in the 3-Dpackaging structure, and v) the number of power/ground signallayers along with the location of the power/ground pins.

For each net from a given netlist , let denote thewirelength of . The wirelength is the sum of Manhattandistance in , and directions, where the direction is theheight of the associated vias.3 Let , and , respec-tively, denote the width, height, and area of the placement layer

. Let , i.e., the product ofthe maximum width and the maximum height among all place-ment layers. Let . Let and denotethe average width and height among all placement layers. Let

, i.e., the sum of thedeviation of the dimension among all placement layers. Last,the area objective of 3-D SOP placement, denoted , is theweighted sum of , and .

Let denote the maximum temperature among all blocks.Let denote the total amount of decoupling capacitancerequired to suppress the simultaneous switching noise (SSN)

3We assume that the height of vias connecting two x–y layers in a routingand pin distribution layer pair is of one grid, whereas the height of feed-throughvias that penetrate a placement layer is six grid.

Fig. 4. Illustration of crosstalk computation between two Steiner trees. Thenumbers denote the coupling length. (a) Crosstalk is 7. (b) Crosstalk-freerouting.

under the given tolerance value. Last, the goal of the SOP Place-ment Problem is to find the location of each block in the place-ment layers such that the following cost function is minimizedwhile SSN constraint is satisfied:4

(1)

In addition, decaps are required to be placed adjacent to theblocks that require them.

C. SOP Routing Problem

For each net from a given netlist NL, let and , respec-tively, denote the amount of crosstalk and via associated with .Let denote the coupling length between and as il-lustrated in Fig. 4. We define as follows:

where denote the routing layer that contains net . For eachnet , let denote the Elmore delay [30] at sink . Then, themaximum sink delay of net , denoted , is

. The performance of a SOP global routing is to minimizethe following cost function:

For each routing interval in a 3-D package with place-ment layers, let , and , respectively, denote thetop pin distribution layer pairs, routing layer pairs, and bottompin distribution layer pairs. There exists three kinds of con-nections in each routing interval : top (connection between

and ), middle (connection between and ),and bottom (connection between and ). We usethe routing layer pairs in , and , respectively,for the top, middle, and bottom connections. We construct therouting grid, denoted , that contains all the distributed pins

4In this weighted cost function, the area termA is normalized toA =A ,where A is the footprint area of the initial solution used for Simulated An-nealing. This will ensure the area term stays close to one during the annealingprocess. The same normalization is done for the other objectives as well—wire-length, decap, and thermal.

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 647

Fig. 5. Overall design flow.

from and in a 2-D grid. These pins are con-nected by a graph-based Steiner tree topology generation algo-rithm (we use an existing heuristic [31] for this purpose). Thetotal layers used in a SOP global routing solution is given by

Section V-D discusses how to compute and Section V-Ediscusses how to compute and .

Last, the formal definition of SOP Global Routing Problemis as follows: Given a 3-D placement and netlist, generate arouting topology for each net , assign to a set of routinglayers and assign all pins of to legal locations. All conflictingnets are assigned to different routing layers while satisfying var-ious wire/via capacity constraints. The objective is to minimizethe following cost function:

(2)

We minimize during our layer assignment step andduring our channel assignment step. is the focus during

our topology generation step. The wirelength, via, and crosstalkminimization are addressed in all steps of our global router.

D. Overview of the Approach

An overview of our 3-D SOP physical design tool is shownin Fig. 5. The input to the flow is a module-level netlist thatspecifies how the SOP modules such as digital die, analog die,embedded passives, opto/MEMS modules, etc, are connected.Our 3-D module placement is based on the Simulated Annealingapproach [32], where we examine many candidate 3-D moduleplacement solutions and evaluate each of them using our thermaland power supply noise analyzers. More details are presentedin Section IV. Our 3-D global routing is decomposed into fivesteps, where the input/output (IO) pins from the blocks are firstevenly distributed using pin distribution layers to alleviate con-gestion problem. We then construct routing tree topology for allnets and assign them to the routing layers to minimize crosstalk.

Fig. 6. Top: 3-D grid of a SOP for thermal modeling.

We place the feedthrough-vias for -nets and finish connectionin each placement layer. More details are presented in Section V.

IV. SOP PLACEMENT ALGORITHM

A. Overview of the Algorithm

Simulated Annealing is a very popular approach for moduleplacement due to its high quality solutions and flexibility inhandling various constraints. We extend the existing 2-D Se-quence Pair scheme [33] to represent our 3-D module place-ment solutions. Simulated Annealing procedure starts with aninitial multi-layer placement along with its cost in terms of area,wirelength, decap, and maximum temperature. We then make arandom perturbation (move) to the initial solution to generatea new 3-D placement solution and measure its cost. We per-form 3-D SSN analysis to compute the amount of decap needed.The algorithm does a one time set-up of the thermal matrices.These matrices are used during incremental temperature calcu-lations to evaluate the thermal cost. The thermal modeling andevaluation is explained in Section IV-B. White space insertion(discussed in Section IV-D) is done and the increase in foot-print area is estimated. If the new cost is lower than the old one,the solution is accepted; otherwise the new solution is acceptedbased on some probability that is dependent on temperature ofthe annealing schedule. We examine a predetermined numberof candidate solutions at each temperature. The temperature isdecreased exponentially, and the annealing process terminateswhen the freezing temperature is reached. The actual decap al-location is done after optimization using LP solver discussed inSection IV-D.

B. 3-D Thermal Analysis

We use a 3-D thermal resistance mesh, as shown in Fig. 6, forthermal analysis. Each node models a small volume of the 3-DSOP substrate, and each edge denotes the connectivity betweentwo adjacent regions. This is equivalent to using a discrete ap-proximation of the steady state thermal equation ,where is thermal conductivity, is temperature, and ispower. This results in the matrix equation .

Since the matrix computation involved with thermal analysisfor each candidate 3-D placement solution is expensive, we

648 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

(a) (b)

Fig. 7. Illustration of 3-D power supply modeling. (a) Multilayer powersupply network, where multiple sets of device, routing, and power supplylayers are stacked together. (b) 3-D-grid modeling. where black and gray nodesrespectively denote power supply and consumption nodes.

tackle this problem with the following scheme. Assuming thatthe thermal conductivity of the modules are similar, swappingthe module locations would not change the thermal resistancematrix too much. This means that matrix only needs to becomputed once in the beginning. To calculate the temperatureprofile of a new block configuration, the power profile needsto be updated and then multiplied by . Alternatively, a changein power profile can be defined. Multiplying and willgive change in temperature profile . Adding to the oldtemperature profile will give the new temperature profile. Theseequations summarize the two methods:

. Swapping twoblocks usually has a small effect on the power profile, soshould be sparse. This reduces the number of multiplicationsused by the second method at the expense of doing extra addi-tions and subtractions.

C. 3-D Power Supply Noise Modeling

We model the P/G network for 3-D SOP as a 3-D grid graph asshown in Fig. 7. The edges in the grid-graph have inductive andresistive impedances. The mesh contains power-supply pointsand connection points, which supply and consume currents. Thecurrent distribution for the blocks is inversely proportional to theimpedance of path from source.

The dominant current source for a block is defined as thevoltage source supplying significantly more power to the blockthan any other neighboring sources. The dominant path for ablock is the path from the dominant supply to the block causingthe most drop in voltage. It has been shown experimentally in[34] that the shortest path between the dominant current source(nearest Vdd pins) and the block offers highly accurate SSN es-timation within reasonable runtime. In our 3-D SSN analysisengine, we compute dominant paths for all blocks which is dy-namically updated whenever a new placement solution is eval-uated in terms of SSN. Let be a dominant current path forblock . Then denotes the set ofdominating paths overlapping with ( includes itself).Let be the overlapping segments between path and .Let and denote the resistance and inductance of .

Fig. 8. Illustration of SSN calculation. The dominant current source for blockA is s , which is not located in the same layer. The dominant (shortest) pathp carries I =6 amount of current, where I denotes the current demand of A.The block C draws current from s and s using p ; p , and p (each of thesecarries I =3 amount of current). The resistance of p , the overlap between pand p , contributes to the SSN at B and C .

Fig. 9. LP-based 3-D decap allocation.

After the current paths and their values have been determinedfor all blocks, the SSN for is given by

where is the current in the path , which is the sum of allcurrents through this path to various consumers. An illustrationis shown in Fig. 8. The weight of and its rate of change are theresistive and inductive components of the path. Let denotethe maximum charge drawn from the power supply by block

. If , where is the noisetolerance, the decap allocated to block is given by

where denotes the total number of blocks to be placed. Fi-nally, the decap cost is given by .

D. 3-D Decoupling Capacitor Placement

In our LP-based 3-D decap allocation shown in Fig. 9,denotes the amount of decap allocated from white space toblock . The objective is to maximize the utilization of whitespaces for decap allocation. The constraint (??) limits the totalallocation of a white space to its total area, where denotes thearea of white space , and denotes the neighboring blocksof . The constraint (??) ensures that the total amount of decapallocated to each block does not exceed its demand, wheredenotes the demand.

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 649

Fig. 10. Illustration of 3-D decap allocation: (a) 3-D placement,(b) X-expansion, and (c) XY -expansion, where the darker blocks denote theneighboring blocks of the decap ( = white space) inserted. Note that blocksfrom other layers can utilize the white space for decap insertion.

Unlike the 2-D decap assignment as done in [34], the neigh-boring blocks in 3-D case are the adjacent blocks either from thesame placement layer or from neighboring layers. The assump-tion here is that the white space from different layers can be usedto allocate decap. In order to facilitate this, we introduce prede-termined parameters to control decap allocation to block

from white space module . evaluates the usefulness ofwhitespace to be used as a decap for block . The value of

decreases with increasing distance of whitespace from themodule that requires decap. Hence allocations of decap from farlocated white spaces will be avoided.

The second aspect of the formulation deals with the gener-ation of , the optimal area that will accommodate all decapwithout too much penalty on the final footprint area. The decapbudget for each block is determined through SSN analysisof a particular 3-D placement. The decap allocation involvesseveral iteration between white space insertion and LP solvingto reach a good solution both in terms of completeness of decapallocated and the additional increase in area. The white spaceis generated by expanding the floorplan in the and direc-tion as illustrated in Fig. 10. In a multilayer placement, how-ever, we have to take additional care to balance the expansionin each layer, so as to minimize the expansion of the total foot-print area. The expansion is proportional to the decap demandof each module. In our sequence pair-based 3-D placement, wemodify the horizontal and vertical constraint graphs to expandthe placement into and directions, respectively.

Note that we may have to iterate between white space inser-tion and allocation if the current expansion does not satisfy allthe decap needs. In order to prevent from iterating between LPand white space insertion, we start by adding white spaces gen-erously in the floorplan then use LP to perform compaction. Inour scheme, the -expansion is controlled by parametersdiscussed earlier, where the existing white spaces have higherranks than the newly inserted ones. The actual amount of ex-pansion is determined by a parameter 1, where we increasethe amount of expansion by a factor of for each block thatneeds decap. This ensures that the initial expansion satisfies theLP constraint. Since LP determines the individual , we usethese assignment values to decide which white space insertionwas not necessary and thus can be removed.

As exact decap allocation is a time-consuming process, it isimpractical to solve an LP problem for every candidate solution.However, our white space insertion takes , which is

Fig. 11. Pseudocode for coarse pin distribution algorithm.

computationally equivalent to computing block location usinglongest path computation for area. Therefore, we can affordwhite space insertion (and the calculation of area increase due todecap insertion) at every move. We perform the actual LP-baseddecap allocation at the end of the annealing process as a com-paction stage.

V. SOP GLOBAL ROUTING

A. Overview of the Approach

Our 3-D global router is divided into the following five steps.

1) Pin redistribution: we first determine which set of -netsand -net segments are assigned to each routing interval.The pins from these nets are then evenly distributed in thetop and bottom pin distribution layers. Our pin redistri-bution step is further divided into three steps: coarse pindistribution, net distribution, and detailed pin distribution.

2) Topology generation: Steiner trees are generated for allnets in each routing interval so that the performance ofthe routed design is optimized.

3) Layer assignment: the routed nets are assigned to a uniquerouting pair in the routing layer so that the total numberof layers used is minimized.

4) Channel assignment: for each -net, its location of feed-through via in the routing channel is determined. We alsoassign channels and finish the connections for the i-netsthat are to be routed in each placement layer.

5) Local routing: we finish connections between the pinsnow located in the routing channels and the pins on theblock boundaries. We also determine the location of pinsfrom soft blocks.

We use an existing max-cut partitioning heuristic [35] for thenet distribution in step 1. For step 2, we use an existing RSA/Gheuristic [31] to generate the net topologies. In addition, we usethe congestion-driven rip-up-and-reroute [36] for step 5. There-fore, the focus of this paper is to develop heuristics for the re-maining steps in 1, 3, and 4.

B. Coarse Pin Distribution

A pseudocode for our coarse pin distribution algorithm isshown in Fig. 11. First, we assign all pins in the placement layersto a nearby grid point in CP, a 2-D grid, while trying tobalance the number of pins assigned to each grid point (line

650 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

Fig. 12. Illustration of coarse pin distribution. Pins along the externalboundary are not shown for simplicity.

Fig. 13. Illustration of the gain computation for coarse pin distribution. A netnindicated by the black node is moved from P to P . Then, g (n) = �2; g =0, and g = �1, where deg(P ) = 3 becomes the maximum-degree partition.

1–4). An illustration is shown in Fig. 12. We impose pin ca-pacity for each grid point so that the pins are evenly distributedin CP. Our approach is to visit the pins in a random order andfind the best grid for each pin. For each pin , a grid point thatis closest to the original location of and has not violated thepin capacity constraint is chosen (line 3). After this process isfinished, CP serves the starting point of our min-cut placementbased algorithm, where each grid point corresponds to a parti-tion. We then iteratively improve the quality of this initial solu-tion via move-based approach. In our new heuristic algorithm,our cost function is based on i) how far the new pin location isfrom the initial location, ii) total wirelength, and iii) how evenlydistributed the inter-partition connections are.

For each pin , we define the displacement gain, denoted, to represent how much the distance between the orig-

inal and new location is reduced if is moved to another par-tition. We define the wirelength gain, denoted , to repre-sent how much the length of the nets that contain (estimatedby the half-perimeter of the bounding box) is reduced if ismoved to another partition. For a partition , let denotethe number of nets that have connections to . Then, the cut-size balance factor is defined to be the difference between themaximum and minimum among all partitions. We de-fine the balance gain, denoted , to represent how much thecutsize balance factor is reduced. Our move-based multilevelmincut partitioning algorithm performs cell move based on thecombined gain function

Fig. 13 shows an illustration of the gain computation. In ourmultilevel approach, we first perform restricted multilevel clus-tering (line 5) that preserves the initial placement result,where two pins that are in different partitions initially are notclustered together. At each level of the cluster hierarchy fromtop to bottom (line 8), we compute the combined gain for

Fig. 14. Pseudocode for SOP detailed pin distribution algorithm.

each cluster and perform cluster moves. In order to compute thedisplacement and balance gain of a group of pins ( cluster),we add the individual displacement and balance gain of all pinsin this cluster. When there is no gain at a certain level, we de-compose the clusters into next lower level and perform refine-ment. This process continues until we obtain a solution at thebottom level (line 12).

C. Detailed Pin Distribution

After coarse pin distribution and net distribution are finished,we know which set of nets are assigned to each routing intervalas well as their (evenly distributed) entry/exit points in pin distri-bution layers. However, the coarse pin distribution is done basedon the 2-D grid that merged all multiple placement layers intoone. The even pin distribution in this 2-D grid offers a goodenough reference points for net distribution. But, it does not con-sider even pin distribution in each individual routing interval. Inaddition, it is also possible that pin capacity for each routingregion in each routing interval may be violated. Therefore, thegoal of detailed pin distribution is to address these problems ineach routing interval so that the subsequent topology generationand layer assignment truly benefit from this even pin distribu-tion. In addition, we use a grid large enough for each routing in-terval to legalize pin location, i.e., each grid point contains onlyone pin. Since the crosstalk minimization are addressed duringthe prior steps, the major focus of detailed pin distribution stepis on i) how far the new location is from the original location ob-tained from coarse pin distribution and ii) the total wirelength.

A pseudocode for our net distribution algorithm is shown inFig. 14. Our force-directed heuristic algorithm encourages allpins from the same net to be placed closer to the center of masswhile minimizing the distance between the old and new pin loca-tion. The size of the grids for detailed pin distributionis determined for each routing interval so that each pin can beassigned to a unique grid point (line 1–3). In addition, we projectthe coarse pin distribution result to this new set of grids(line 5). Note that there still exists overlap among the pins in

at this point even though DP is usually finer than CP. Inorder to remove this overlap, we apply an additional force thatslightly pulls each pin toward the center of mass. For each pin

in a net , the displacement force (line 8) for -direction isdefined as follows:

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 651

Fig. 15. Illustration of layer assignment.

where denotes the -coordinate of , the center of massof , and denotes the width of the bounding boxof . We compute using the -coordinates. Note that

. The vector is thenadded to (line 9). This minor change on the originalpin location helps to remove most of the overlap in whilenot increasing the wirelength too much. We then sort the pinsbased on the lexicographic order of new locations and assigneach pin starting from the top-most row in the left-most column(line 10–11). Due to its simplicity, this deterministic algorithmis quite efficient and effective in reducing the additional wire-length required for pin distribution as well as the total wire-length among all nets as shown in Section VI.

D. SOP Layer Assignment

For each routing interval , the routing grid contains allthe (redistributed) pins from the top and bottom pin distributionlayers ( and ). We generate Steiner-tree based routingtopology using the RSA/G heuristic [31] to connect these pins.The goal is to minimize the maximum sink delay as dis-cussed in Section III-C. The routing layer in each routinginterval consists of several layer pairs, where each pair con-sists of one layer for horizontal wires and another layer for ver-tical wires. Thus, we can assign a entire rectilinear routing treeto a routing pair. In addition, two trees that are intersecting canalso be assigned to the same routing pair provided that they donot violated the wire capacity of the routing regions involved.An illustration is shown in Fig. 15. The SOP layer assignmentproblem is to assign each net to a routing layer pair so thatthe wire capacity constraint is satisfied and the total number oflayer pairs used for all routing intervals is minimized. Our ap-proach is to perform layer assignment for each routing intervalindependently.

For each routing interval , we construct a layer constraintgraph (LCG) [37], denoted LCG , as follows: corresponding toeach net in we have a node in LCG . Two nodes

LCG have an edge between them if a net seg-ment and are sharing the same edge in , i.e.,and are sharing the same boundary of a routing region. Thenwe use a node coloring algorithm to assign colors to the nodesin LCG such that no two nodes sharing an edge are assignedthe same color. Let denote the total number of colors usedduring the node coloring, and let denote the wire capacityof the boundary in routing region. Then, the total number oflayer pairs used in this routing interval is computed as follows:

. Last, a node with coloris assigned to layer pair . Let and , respectively, de-note the maximum number of wires used among all horizontaland vertical edges in . Then, the following is a lower-bound

Fig. 16. Pseudocode for SOP layer assignment algorithm.

on the number of layers used in :.

Fig. 16 shows the SOP layer assignment algorithm that in-cludes our coloring heuristic. We first sort all nodes in LCG ina decreasing order of the number of their neighbors (line 3). Let

denote the set of colors used by the neighbors of (line6). We visit the nodes in the sorted order (line 5). In case thereexists an used color that is not included in (line 7), we as-sign this color to (line 12). In case there exist multiple colorsthat satisfy this condition, we use the lowest color. Otherwise,we introduce a new color and assign it to (line 9–10). Last,we assign a layer pair to each net based on its color (line 13).In spite of its simplicity, this greedy algorithm provides resultsthat are very close to the lower bound on total number of layersused as demonstrated in Section VI.

E. Sop Channel Assignment

Our prior topology generation and layer assignment stepsfocus on the connections among the distributed pins in the pindistribution layers. after these steps are finished, there remaintwo kinds of connections: i) connections between the originaland distributed pins and ii) feed-through via insertion. For bothtypes of connections, the routing channels in the placementlayers are used. During the SOP channel assignment step, eachpin in the pin distribution layer is mapped to a routing channelin the neighboring placement layer. In addition, each pin froman x-net that needs to penetrate a placement layer is alsomapped to a routing channel in the placement layer. Last, wegenerate routing topology for each pin-to-channel connectionand assign it to a routing layer pair in the pin distribution layer.Each placement layer is modeled with the floor connectiongraph (FCG) [29] as illustrated in Fig. 3. During the SOP localrouting step, we use the congestion-driven rip-up-and-reroute[36] to i) finish the routing for the given set of two-pin connec-tions while satisfying the wire capacity constraint and ii) decidethe location of pins for soft blocks along their boundaries. Anillustration is shown in Fig. 17.

For each placement layer , let denote the set ofpins that need to be mapped to a routing channel in .contains pins from and . The pins in aregrouped into two sets: terminal pin set for the pins thathave terminals in and feed-through pin set for thepairs of pins that require feed-through vias to penetrate .

652 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

TABLE IAREA/WIRE-DRIVEN VERSUS DECAP-DRIVEN VERSUS THERMAL-DRIVEN ALGORITHMS

Fig. 17. Illustration of (a) channel assignment and (b) local routing.

Let denote the set of routing channels in . Eachchannel is associated with the pin capacity constraint.The goal of SOP Channel Assignment Problem is to map eachpin to a routing channel for 1 and finish

-to- connection while satisfying the pin capacity constraint, i.e., . For each pair of pins , we

map and to the same channel . Let denotethe number of layers used to finish connection between pinsfrom and , and let denote the number oflayers used to finish connection between pins fromand . The objective of SOP channel assignment is tominimize the following cost function:

A pseudocode for SOP channel assignment algorithm isshown in Fig. 18. We visit each placement layer and assignthe feed-through pins and terminal pins to the channels. Inaddition, an L-shaped routing topology for each pin-to-channelis constructed in a 2-D grid (line 3). The signal delay offeed-through vias is larger than that of other types of vias. Sinceeach channel is under a capacity constraint, it is importantto assign the feed-through pins to the nearest channels first.Therefore, our strategy is to perform channel assignment forthe feed-through pins first to minimize the delay of -nets thatrequire feed-through vias. In addition, we give priority to the

Fig. 18. Pseudocode for SOP channel assignment algorithm.

pins that are included in long nets. Thus, we sort the pins basedon the wirelength (line 4). Our heuristic algorithm assigns pinsto channels based on the cost of mapping—we seek a channelwith the best mapping cost for a given pin (line 6–8). We com-pute the cost of mapping for a given pin and a channel asfollows:

costroom

dist bend cong

where room denotes the number of pins can accommodateuntil it violates the capacity constraint, dist is the Man-hattan distance between and bend is the number ofbends in the connection between and , and cong de-notes the total number of existing connections along the pro-posed L-shaped route. We choose the channel with the max-imum cost . Note that a channel is represented with a lineinstead of a point. Thus, distance and bend are basedon the shortest connection between and any point on . Upona pin to channel mapping, we update the usage of channel in

and edges in (line 10). After the channel assign-ment and topology generation for all pin-to-channel connec-tions are finished, we perform layer assignment using our col-oring heuristic presented in Section V-D and compute the totalnumber of layers used.

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 653

VI. EXPERIMENTAL RESULTS

We implemented our algorithms in C /STL and ran our ex-periments on Linux Beowulf clusters. We tested our algorithmswith two sets of benchmarks. The first set is from the standardGSRC floorplan circuits [38]. The second set, named the GTbenchmark, was synthesized from the IBM circuits [39], wherewe use our multilevel partitioner [40] to divide the gate-levelnetlist into multiple blocks. The GSRC benchmarks are small tomedium-sized in terms of both the number of blocks and nets.5

The GT benchmarks contain medium to large number of blockswith dense connections.

A. SOP Placement Results

The dimension of the area was assumed to be in mm andarea per unit decap cost was chosen to be 50 mm . The numberof placement layers is fixed to four for all experiments. In ourexperiments, we used the same parameters across the bench-marks. The technology parameters used were 0.01 m forwire resistance per unit length, 1 pH/ m for wire inductanceper unit length, and 10 fF/ m for wire capacitance per unitlength. We report the average runtime for each benchmark mea-sured in seconds. For our thermal analysis we used a grid sizeof 10 10 7 in the and direction. The number of activelayers was four with three passive layers in between them. Theconductivity of the substrate was chosen to be that of silicon(0.1 W/mm C). The conductivity of the sides of the packagewas fixed at 0.01 W/mm C. The heat sink was assumed to beat the top of the package with a conductivity of 0.5 W/mm Cand the conductivity of the bottom of the package was chosento be 0.2 W/mm C. The power density of the blocks varied from10 W/mm^2 to 300 W/mm^2 depending on the switching cur-rent demands. The switching current demands were generatedusing a formula using a random number and the size of theblock.

Table I shows a comparison among 3-D SOP placement algo-rithms with three different objectives: 1) area/wire-driven (

1 and 0 in (1), Section III–B), 2) decap-driven( 1 and 0), and 3) thermal-driven( 1 and 0). Our baseline algorithm isthe area/wirelength-driven algorithm.

• We achieved a 21% improvement over baseline in decapcost using our decap-driven algorithm, with a 7% increasein final area and 15% increase in wirelength. The temper-ature decreases by 3%.

• Our thermal-driven floorplanning achieved a 21% im-provement over baseline with a dramatic increase intotal area of 76%. Wirelength increased by 28% anddecap requirement increases by 19%. We note that itmay be possible to reduce area and wirelength costs byfine-tuning parameters.

Table I shows the result of our final 3-D placement algorithmthat considers all four objectives: area, wirelength, decap, andthermal ( 1 in (1), Section III-B). Wenote that our area/wire/decap/thermal-driven algorithm achievesan improvement in both decap and temperature over baseline

5The GT benchmark circuits are available for download at our website:www.gtcad.gatech.edu.

Fig. 19. Decap versus temperature correlation plot for n300.

TABLE IIDECAP VERSUS TEMPERATURE CORRELATION CONSTANTS

by 13% and 9%, respectively. There is a reasonable increasein final area and wirelength by 19% and 15%, respectively. Wealso tested our area/wire/decap algorithm under a thermal con-straint. In this case, all candidate solutions during the simulatedannealing are discarded if the maximum temperature is abovea certain threshold (100 C in our case). We observe that thearea, wirelength, and decap results improve simultaneously atthe cost of slight increase in temperature. The CPU times of thealgorithm depend on the number of candidate solutions evalu-ated. The times reported in the table directly reflect the subsetof the configuration space spanned by the algorithm. The con-strained algorithm ran much faster, making a small number ofmoves because fewer moves were accepted.

Fig. 19 shows the temperature and decap requirement of eachblock of the final floorplan of the n300 benchmark. The figureclearly shows that there is little correlation between temperatureand decap. This shows that minimizing one objective does notnecessarily minimize the other objective as a positive correlationwould indicate. It also means that minimizing both objectives isnot mutually exclusive, as a negative correlation would indicate.This matches with the block placement results. Table II showsthe correlation constants.

Table III shows power supply noise simulation results forthree 3-D placement schemes-no-decap aware, decap-aware,and decap-aware decap-placement. The P/G plane structuresize is 246 mm 254 mm, and the top P/G plane pair wasmodeled using cavity resonator model [41] and simulated inHSPICE. The placement layer that uses this P/G plan includes14 active devices. The dc 5 V sources are located at four edgesin the plane pair and fourteen current sources exist in the planepair. As can be seen from the Table III, the SSN for the

654 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

TABLE IIIPOWER SUPPLY NOISE SIMULATION RESULTS. WE REPORT SSN NOISE FOR

EACH BLOCK PLACED IN THE TOP PLACEMENT LAYER. (a) NONOPTIMIZED

CASE WITHOUT ANY DECOUPLING CAPACITOR. FROM TABLE I WE NEED 26.7DECAP UNITS TO SUPPRESS SSN NOISE. (b) OPTIMIZED CASE WITHOUT ANY

DECOUPLING CAPACITOR. FROM TABLE I WE NEED 19.9 DECAP UNITS.(c) OPTIMIZED CASE WITH DECOUPLING CAPACITORS INSERTED

TABLE IVBENCHMARK CHARACTERISTICS OF GSRC AND GT CIRCUITS. WCAP DENOTES

THE WIRE CAPACITY OF EACH BOUNDARY IN THE ROUTING TILES. CP AND DPRESPECTIVELY DENOTE THE DIMENSION OF 2-D GRIDS USED DURING OUR

COARSE AND DETAILED PIN DISTRIBUTION STEPS

TABLE VAREA INFORMATION OF THE 4-LAYER SOP FLOORPLANNING FOR GSRC AND

GT BENCHMARK DESIGNS. THE MIN AND MAX RESPECTIVELY DENOTE THE

DIMENSION OF THE MINIMUM AND MAXIMUM FLOORPLAN AREA, WHEREAS

THE FINAL COLUMN DENOTES THE DIMENSION OF THE OVERALL FOOTPRINT

decap-aware algorithm is lower compared to nondecap-awarealgorithm. The SSN of the noisiest block blk4 is 1.58 V, whichis reduced to 1.42 V by our decap-aware scheme. With the inser-tion of decap, the noise is suppressed to 0.22 V. In addition, thetotal amount of decap required for nondecap-aware algorithmis 26.7, which is reduced to 19.9 with our decap-aware scheme.The largest amount of decap is used for blk5 (0.50 nF), becauseof an increase in its SSN after optimization. The numbers showthat the SSN was efficiently suppressed and the amount of decapreduced by using our algorithms.

B. SOP Routing Results

Table IV shows the characteristics of the GSRC and GTbenchmark designs used during routing (see Table V). InTable VI we compare pin redistribution results. Under the

TABLE VISOP PIN REDISTRIBUTION RESULTS USING OUR COARSE AND DETAILED PIN

DISTRIBUTION ALGORITHMS. WE REPORT THE WIRELENGTH BETWEEN THE

ORIGINAL AND THE NEW LOCATION (dw), TOTAL WIRELENGTH (wl),CROSSTALK (xt) AND TOTAL NUMBER OF LAYER PAIRS USED (1 yr)

IN THE ROUTING LAYERS

TABLE VIITOPOLOGY GENERATION AND LAYER ASSIGNMENT RESULTS. WE

REPORT THE TOTAL WIRELENGTH (WL), ELMORE DELAY OF THE NETS

WITH MAXIMUM SINK DELAY (dly), AND THE LOWER BOUND (low)AND THE ACTUAL NUMBER (1 yr) OF LAYERS USED

DPD columns, we perform DPD ( detailed pin distributionpresented in Section V-C) only, where we skip CPD (coarse pin distribution presented in Section V-B) and assign all-nets to the routing interval below for net distribution. Under

the CPD DPD, we perform CPD using the algorithm, assignall -nets to the routing interval below for net distribution, andperform DPD. DPD serves as our baseline, where CPD+DPDrespectively demonstrate the impact of our coarse pin distri-bution. In all cases, we perform detailed pin distribution tolegalize the pin location, i.e., remove overlaps among the pins.We use the following metrics to evaluate our solutions: wire-length between the original and the new location (dw), totalwirelength (wl), crosstalk (xt) and total number of layer pairsused (lyr) in the routing layers. The displacement (dw) andwirelength (wl) results are scaled by 10 , and the time reportedis the average runtime among the GSRC/GT circuits. Fromthe comparison between DPD and CPD DPD, we note thatthe displacement result (dw) increases by an average of 50%.However, CPD lowers the total wirelength (wl) consistentlyby 10% on average and the number of layers (lyr) by 10% onaverage.

In Table VII we show our topology generation (RSA/G) andlayer assignment (LAYER) results. We used the technologyparameters for 0.13- process for Elmore delay computation.Specifically, the driver resistance of 29.4 k , input capaci-tance of 0.050 fF, unit-length resistance of 0.82 m and

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 655

TABLE VIIISOP CHANNEL ASSIGNMENT RESULTS. WE REPORT THE LAYER USAGE,

WIRELENGTH, AND VIA FOR THE BASELINE (WIRELENGTH MINIMIZATION

ONLY) AND MULTI-OBJECTIVE ALGORITHMS

unit-length capacitance of 0.24 fF/ m are used. We report thetotal wirelength (wl), Elmore delay of the nets with maximumsink delay (dly), and the lower bound and the actual number oflayers used for the top-most routing interval. In general, GSRCbenchmarks have bigger delay than GT benchmarks due tothe larger average wirelength. Our layer assignment algorithmpresented in Section V-D is able to achieve results very closeto the lower bounds discussed in Section V-D. For the GTcircuits, the layer assignment results are within 10% of thelower bound. For the GSRC circuits, we were able to achievethe results equal to the lower bound.

Our channel assignment results are shown in Table VIII.The baseline case is where we optimize the wirelength only.We then compare it to our multi-objective channel assignmentalgorithm that simultaneously optimizes the wirelength, via,and layer usage. Our comparison indicates that the number oflayers is consistently and significantly reduced especially forthe bigger GT benchmarks, where an average improvementof 48% is observed. In case of the second largest benchmarkgt1000, we achieved 57% improvement. This saving on thelayer usage comes at the cost of increase in wirelength andvias. The average increase in wirelength is 12% and 30% forGSRC and GT benchmarks, respectively. The average increasein via usage is 63% and 79% for GSRC and GT benchmarks,respectively. We noted that the channel assignment result isvery sensitive to the weighting constants among the objectivesused in our cost function. This indicates that the solution spaceof the channel assignment problem offers many useful tradeoffpoints.

Table IX reports our local routing results. We report the wire-length, maximum and average routing demand as well as thestandard deviation. Our baseline is the local routing optimizedfor wirelength only. We then compare it against our multi-ob-jective local routing algorithm that simultaneously optimizeswirelength and routing demands. In both cases, the same pindemand constraint is imposed. We note that the improvement ofour multi-objective algorithm over the baseline is significant, es-pecially for GSRC circuits—the routing demands were reducedby 33% on average while the wirelength increased by only 10%.In addition, we reduced the routing demands for the GT bench-marks by 41% on average, with wirelength increase by 20%. Inour biggest benchmarks (gt1500), our routing demand reduction

TABLE IXSOP LOCAL ROUTING RESULTS FOR THE BASELINE (WIRELENGTH

MINIMIZATION ONLY) AND MULTI-OBJECTIVE ALGORITHMS. WE REPORT THE

WIRELENGTH (wl), MAXIMUM (max) AND AVERAGE (avg) ROUTING

DEMAND AS WELL AS THE STANDARD DEVIATION (dev)

Fig. 20. Snapshot of four-layer SOP with three routing interval for n200benchmark. The routing-tile boundaries with heavy usage are shown withthicker lines.

is the largest (53%), which comes with the maximum increase inwirelength (23%). This again indicates that the local routing re-sult is very sensitive to the weighting constants among the objec-tives used in our cost function. The lower standard deviation ofour multi-objective algorithm indicates that the routing demandis more evenly distributed ( lower congestion) compared tothe wirelength-only case. Last, Fig. 20 shows a snapshot of afour-layer SOP with 3 routing interval for n200 benchmark.

VII. CONCLUSION

In this article, we presented the first physical layout algo-rithm for 3-D SOP designs that includes thermal/decap-aware3-D placement and crosstalk-aware 3-D routing algorithm. Weextended the thermal and SSN models to 3-D and used them toguide our 3-D module placement. In addition to estimating theamount of decap needed to keep the SSN at the circuits toler-ance level, we also efficiently used near-by white spaces to allo-cate decap if necessary. Our routing process is divided into pinredistribution, topology generation, layer assignment, channelassignment, and local routing steps. Our major objective is toreduce the amount of crosstalk and layers while satisfying var-ious constraints on routing resource. Our ongoing work includesSOP detailed routing.

656 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006

REFERENCES

[1] R. Tummala and V. Madisetti, “System on chip or system on package?,”IEEE Design Test Comput., vol. 16, no. 2, pp. 48–56, Apr./Jun. 1999.

[2] R. Tummala, “SOP: What is it and why? A new microsystem-integrationtechnology paradigm-Moore’s law for system integration of miniatur-ized convergent systems of the next decade,” IEEE Trans. Adv. Packag.,vol. 27, no. 2, pp. 241–249, May 2004.

[3] R. R. Tummala et al., “The SOP for miniaturized, mixed-signal com-puting, communication, and consumer systems of the next decade,”IEEE Trans. Adv. Packag., vol. 27, no. 2, pp. 250–267, May 2004.

[4] K. Kundert, H. Chang, D. Jefferies, G. Lamant, E. Malavasi, and F.Sendig, “Design of mixed-signal systems-on-a-chip,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 19, no. 12, pp.1561–1571, Dec. 2000.

[5] G. Gielen and R. Rutenbar, “Computer-aided design of analog andmixed-signal integrated circuits,” Proc. IEEE, vol. 88, no. 12, pp.1825–1854, Dec. 2000.

[6] Y. Deng and W. Maly, “Interconnect characteristics of 2.5-D system in-tegration scheme,” in Proc. Int. Symp. Phys. Design, 2001, p. 171.

[7] S. Das, A. Chandrakasan, and R. Reif, “Design tools for 3-D integratedcircuits,” in Proc. Asia South Pacific Design Automat. Conf., 2003, pp.53–56.

[8] B. Goplen and S. Sapatnekar, “Efficient thermal placement of standardcells in 3-D IC’s using a force directed approach,” in Proc. IEEE Int.Conf. Comput.-Aided Design, Nov. 2003, pp. 86–89.

[9] J. Minz and S. K. Lim, “Layer assignment for system on packages,” inProc. Asia South Pacific Design Automat. Conf., Jan. 2004, pp. 31–37.

[10] J. Minz, M. Pathak, and S. K. Lim, “Net and pin distribution for 3-Dpackage global routing,” in Proc. Design, Automat. Test Europe, Feb.2004, pp. 1410–1411.

[11] R. Ravichandran, J. Minz, M. Pathak, S. Easwar, and S. K. Lim, “Phys-ical layout automation for system-on-packages,” in Proc. IEEE Electron.Comp. Technol. Conf., vol. 1, Jun. 2004, pp. 41–48.

[12] J. Minz, S. K. Lim, J. Choi, and M. Swaminathan, “Module placementfor power supply noise and wire congestion avoidance in 3-D pack-aging,” in Proc. IEEE Electr. Perform. Electron. Packag., 2004, pp.123–126.

[13] J. Minz and S. K. Lim, “A global router for system-on-package targetinglayer and crosstalk minimization,” in Proc. IEEE Electr. Perform. Elec-tron. Packag., 2004, pp. 99–102.

[14] J. Minz, E. Wong, and S. K. Lim, “Thermal and congestion-aware phys-ical design for 3-D system-on-pacakge,” in Proc. 55th IEEE Electron.Comp. Technol. Conf., May 31–Jun. 3 2005, pp. 824–831.

[15] J. Choi, S. Chun, N. Na, M. Swaminathan, and L. Smith, “A method-ology for the placement and optimization of decoupling capacitors forgigahertz systems,” in Proc. VLSI Design Symp., 2000, pp. 1–6.

[16] A. Kamo, T. Watanabe, and H. Asai, “An optimization method for place-ment of decoupling capacitors on printed circuit board,” in Proc. IEEEElectr. Perform. Electron. Packag., 2000, pp. 73–76.

[17] Y. Chen, Z. Chen, and J. Fang, “Optimum placement of decouplingcapacitors on packages and printed circuit boards under the guidanceof electromagnetic field simulation,” in Proc. IEEE Electron. Comp.Technol. Conf., Orlando, FL, May 1996, pp. 756–760.

[18] J. Lee, “Thermal placement algorithm based on heat conductionanalogy,” IEEE Trans. Compon. Packag. Technol., vol. 26, no. 2, pp.473–482, Jun. 2003.

[19] K. Deb, P. Jain, N. Gupta, and H. Maji, “Multiobjective placement ofelectronic components using evolutionary algorithms,” IEEE Trans.Compon. Packag. Technol., vol. 27, no. 3, pp. 480–492, Sep. 2004.

[20] K. Khoo and J. Cong, “An efficient multilayer MCM router based onfour-via routing,” IEEE Trans. Comput.-Aided Design Integr. CircuitsSyst., vol. 14, no. 10, pp. 1277–1290, Oct. 1995.

[21] J. D. Cho, K. Liao, S. Raje, and M. Sarrafzadeh, “M2R: Multilayerrouting algorithm for high-performance MCMs,” IEEE Trans. Circuits.Syst. I, vol. 41, no. 4, pp. 253–265, Apr. 1994.

[22] W. W. Dai, “Topological routing in surf: Generating a rubberbandsketch,” in Proc. ACM Design Automat. Conf., 1991, pp. 39–48.

[23] D. Wang and E. Kuh, “A new timing-driven multilayer MCM/IC routingalgorithm,” in Proc. IEEE Multi-Chip Module Conf., 1997, pp. 89–94.

[24] Y. Cha, C. Rim, and K. Nakajima, “A simple and effective greedy multi-layer router for MCMs,” in Proc. Int. Symp. Physical Design, 1997, pp.67–72.

[25] J. Cong and P. Madden, “Performance driven multi-layer general arearouting for PCB/MCM designs,” in Proc. ACM Design Automat. Conf.,Mar. 1998, pp. 356–361.

[26] J. Cho and M. Sarrafzadeh, “The pin redistribution problem in multi-chip modules,” Math. Programm.—Spec. Iss. Appl. Discrete Programm.Comput. Sci., pp. 297–330, 1994.

[27] D. Chang, T. Gonzales, and O. Ibarra, “A flow based approach to the pinredistribution problem for multi-chip modules,” in Proc. Great LakesSymp. VLSI, 1994, p. 114.

[28] J. D. Cho, “An optimum pin redistribution for multichip modules,” inProc. IEEE Multi-Chip Module Conf. (MCMC’96), 1996, p. 111.

[29] J. Cong, “Pin assignment with global routing for general cell design,”IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 10, no.11, pp. 1401–1412, Nov. 1991.

[30] W. Elmore, “The transient response of damped linear networks with par-ticular regard to wideband amplifiers,” J. Appl. Phys., pp. 55–63, 1948.

[31] J. Cong, A. B. Kahng, and K.-S. Leung, “Efficient algorithms for theminimum shortest path steiner arborescence problem with applicationsto VLSI physical design,” IEEE Trans. Comput.-Aided Design Integr.Circuits Syst., vol. 17, no. 1, pp. 24–39, Jan. 1998.

[32] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simu-lated annealing,” Science, pp. 671–680, 1983.

[33] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, “Rectanglepacking based module placement,” in Proc. IEEE Int. Conf. Com-puter-Aided Design, 1995, pp. 472–479.

[34] S. Zhao, C. Koh, and K. Roy, “Decoupling capacitance allocation andits application to power supply noise aware floorplanning,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 1, pp. 81–92,Jan. 2002.

[35] J. D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S. M. Kang,“Crosstalk-minimum layer assignment,” in Proc. IEEE Custom Integr.Circuits Conf., San Diego, CA, 1993, pp. 29.7.1–29.7.4.

[36] L. McMurchie and C. Ebeling, “Pathfinder: A negotiation based perfor-mance-driven router for FPGAs,” in Proc. ACM Int. Symp. Field-Pro-grammable Gate Arrays, 1995, pp. 111–117.

[37] M. Ho, M. Sarrafzadeh, G. Vijayan, and C. Wong, “Layer assignment formultichip modules,” IEEE Trans. Comput.-Aided Design Integr. CircuitsSyst., vol. 9, no. 12, pp. 1272–1277, Dec. 1990.

[38] GSRC, “Floorplan Benchmark Suites,” Tech. Rep., Online. Available:http://www.cse.ucsc.edu/research/surf/GSRC/GSRCbench.html, 2006.

[39] C. J. Alpert, “The ISPD98 circuit benchmark suite,” in Proc. Int. Symp.Phys. Design, 1998, pp. 80–85.

[40] J. Cong and S. K. Lim, “Edge separability based circuit clustering withapplication to multi-level circuit partitioning,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 3, pp. 346–357, Mar.2004.

[41] N. Na, J. Choi, M. Swaminathan, J. P. Libous, and D. P. O’Connor,“Modeling and simulation of core switching noise for asics,” IEEETrans. Adv. Packag., vol. 25, no. 1, pp. 4–11, Feb. 2002.

Jacob Rajkumar Minz (S’05) received the B.Tech.degree in computer science and engineering fromthe Indian Institute of Technology, Kharagpur, in2001 and is currently pursuing the Ph.D. degree atthe School of Electrical and Computer Engineering,Georgia Institute of Technology, Atlanta.

He was with the Advanced VLSI Design Lab, In-dian Institute of Technology, Kharagpur, for one year,where he was involved in the design of digital chips.His areas of interest are physical design automationand algorithms for electronic CAD.

Eric Wong (S’05) received the B.S. degree inelectrical engineering from the State University ofNew York at Binghamton in 2004 and is currentlypursuing the Ph.D. degree in electrical and computerengineering at the Georgia Institute of Technology,Atlanta.

His research interests include physical VLSI de-sign and design automation for VLSI circuits.

MINZ et al.: 3-D SYSTEM-ON-PACKAGE DESIGNS 657

Mohit Pathak (S’05) received the B.Tech. degree incomputer science and engineering from the IndianInstitute of Technology, Kharagpur, in 2004 and iscurrently pursuing the Ph.D. degree in the School ofElectrical and Computer Engineering, Georgia Insti-tute of Technology (Georgia Tech), Atlanta.

He is a Graduate Research Assistant at GeorgiaTech. He was with Magma Design Automation, Indiafor one year, where he worked as a Member of Tech-nical Staff. His areas of interest are physical designautomation and VLSI design.

Sung Kyu Lim (S’94–M’00–SM’05) received theB.S., M.S., and Ph.D. degrees from the ComputerScience Department, University of California,Los Angeles (UCLA), in 1994, 1997, and 2000,respectively.

From 2000 to 2001, he was a Post-DoctoralScholar at UCLA, and a Senior Engineer at AplusDesign Technologies, Inc. He joined the Schoolof Electrical and Computer Engineering, GeorgiaInstitute of Technology, Atlanta, as an Assistant Pro-fessor in August 2001 and the College of Computing

as an Adjunct Assistant Professor in September 2002. He served as a GuestEditor for ACM Transactions on Design Automation of Electronic Systems.His research focus is on the physical design automation for 3-D circuits, 3-Dsystem-on-packages, microarchitectural physical planning, field programmableanalog arrays, and quantum cell automata.

Dr. Lim received the Design Automation Conference Graduate Scholarshipin 2003 and the National Science Foundation Faculty Early Career Develop-ment (CAREER) Award in 2006. He has been on the Advisory Board of theACM Special Interest Group on Design Automation since 2003. He has servedthe Technical Program Committee of several ACM and IEEE conferences onelectronic design automation.