
A PARALLEL INTEGER PROGRAMMING

APPROACH TO GLOBAL ROUTING

by

Tai-Hsuan Wu

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Electrical and Computer Engineering)

at the

UNIVERSITY OF WISCONSIN-MADISON

2011

Fair use allowance: You are allowed to reproduce this dissertation with appropriate credit or citation. If financial profit is involved, i.e., for use in a textbook or sales from distribution of copies, contact me first, even if further into the future. All cited materials are copyright of their respective authors.

©Copyright by Tai-Hsuan Wu 2011

All Rights Reserved

A PARALLEL INTEGER PROGRAMMING APPROACH TO GLOBAL ROUTING

submitted to the Graduate School of the University of Wisconsin-Madison in partial fulfillment of the requirements for the degree of Doctor of Philosophy

By Tai-Hsuan Wu

Date of final oral examination: May 23, 2011

Month and year degree to be awarded: August 2011

The dissertation is approved by the following members of the Final Oral Committee:

Azadeh Davoodi, Assistant Professor, Electrical and Computer Engineering, Chair

Jeffrey T. Linderoth, Associate Professor, Industrial and Systems Engineering

Kewal K. Saluja, Professor, Electrical and Computer Engineering

Mikko H. Lipasti, Professor, Electrical and Computer Engineering

Parameswaran Ramanathan, Professor, Electrical and Computer Engineering


University of Wisconsin – Madison

Abstract

A Parallel Integer Programming Approach to Global Routing

Tai-Hsuan Wu

Chair of the Supervisory Committee: Professor Azadeh Davoodi

Electrical and Computer Engineering

This work introduces a parallel algorithm for an important Electronic Design Automation problem known

as Global Routing. Global routing is a stage in the VLSI design cycle during which millions of

interconnects are planned on the chip. The existing global routing procedures are highly sequential in

nature, and thus developing a parallel algorithm is a challenging task.

In this dissertation, first, a global routing procedure known as GRIP is proposed which heavily relies

on Integer Programming (IP) techniques. GRIP decomposes the global routing problem into smaller

subproblems corresponding to rectangular subregions on the chip together with their net assignments.

GRIP solves the individual subproblems and the connection problem between them. It is the first

successful realization of a heavily IP-centric procedure to Global Routing for large industrial instances.

Due to the effective use of Integer Programming techniques and decomposition, GRIP demonstrates

tremendous improvement in wirelength, with the same or lower usage of the routing resources, compared

to the competing global routers in the open literature.

Even with the limited parallelism available in solving the subproblems and in the procedure that connects them in parallel, GRIP takes significant time (from many hours to days) to solve the challenging ISPD 2007 and 2008 benchmark instances.

To improve the computational runtime in GRIP without much degradation in the solution quality, next,

this work presents PGRIP, a procedure which allows solving the subproblems with a very high degree of

parallelism. Concurrent processing of the routing subproblems is desirable for effective parallelization.

However, achieving global routing solutions with no (or low) over-usage of routing resources is


challenging without strong, coordinated algorithmic control. PGRIP addresses this challenge via a

patching phase which offers a one-time synchronization between the subproblems to minimize the

likelihood of over-usage of routing resources when attempting to connect them after their concurrent

processing. Patching also relies on Integer Programming techniques and provides a one-time feedback to

each subproblem to enhance its connectivity to its adjacent subproblems.

Similar to GRIP, PGRIP maintains the large gap in solution quality compared to the competing academic global routers. Unlike GRIP, it is able to achieve a significantly higher degree of parallelism. The runtimes of different steps in PGRIP can be budgeted by the user. In the computational experiments, PGRIP achieves the same (wall) runtime (of about 75 minutes) regardless of the size and difficulty of the problem instance, while running on a grid of a few hundred CPUs with only 2GB of memory. In return, a more difficult instance has a higher number of subproblems and is solved using a higher number of parallel CPUs. Obtaining similar runtimes is a unique feature of PGRIP, compared to the competing global routing methods, which are shown to take significantly longer for more challenging problem instances. Moreover, the memory usage in both GRIP and PGRIP is significantly lower than in the competing procedures because each processor is assigned to solve a small-sized subproblem.

The third component of this dissertation is utilizing the parallel procedure of PGRIP to solve a

variation of global routing to minimize interconnect power. The work introduces Power-GRIP which

offers an IP formulation for minimizing interconnect power in multi-supply voltage (MSV) domains. The

IP integrates a proposed mathematical modeling for the interconnect power. Power-GRIP adapts and

extends the procedures used in PGRIP, while it is the first work to study interconnect power minimization

in MSV domains. Simulation results demonstrate significant savings in the interconnect power metric for global routing without any degradation in wirelength or routing resource usage, compared to an initially provided, wirelength-optimized routing solution.

In summary, with the aid of large-scale parallelism provided by computational grids, this work demonstrates that integer programming, which was perhaps viewed as too time-consuming and hence impractical for global routing, allows generating significantly higher quality solutions while meeting practical runtime requirements.


TABLE OF CONTENTS

Abstract

Table of Contents

List of Figures

List of Tables

Chapter 1: Introduction

1.1 Motivation

1.2 Contributions of This Dissertation

Chapter 2: Global Routing: Preliminaries and Literature Review

2.1 Problem Definition

2.2 Fundamental Techniques

2.3 Shortcomings of the Existing Techniques

2.4 Multi-objective Global Routing

Chapter 3: GRIP: Global Routing via Integer Programming

3.1 An Integer Program for Global Routing

3.2 Solution Procedure via Price-and-Branch

3.3 Decomposition for Scalability

3.4 Handling Overflow

3.5 Comparison to Optimization-Based Methods

3.6 Simulation Results

Chapter 4: PGRIP: A Parallel Integer Programming Approach to Global Routing

4.1 Challenges of Parallelizing GRIP

4.2 An Integer Programming Formulation of PGRIP

4.3 The Parallel Global Routing Procedure

Chapter 5: Power-GRIP: Power-Driven Global Routing for MSV Domains

5.1 Interconnect Power Modeling in MSV Domains

5.2 Placement of Level Converters

5.3 Power-Driven MSV-Based Global Routing

Chapter 6: Conclusions and Future Works

6.1 Conclusions

6.2 Future Work 1: Layer Directive Global Routing

6.3 Future Work 2: Enhancing the Correlation Between the Placement and Routing Stages

Bibliography


LIST OF FIGURES

1.1 Physical design flow

2.1 Published articles of the Global Routing problem since 1983. The statistical data is obtained from Microsoft Academic Search with the keyword “Global Routing”

2.2 The construction of grid graph for the global routing problem

2.3 A 2-D view of the design after placement (left); A 3-D global routing grid-graph (right)

2.4 Maze routing with bounding box may lead to sub-optimal solution in the presence of routing obstacles (designated in red in the figure)

2.5 Pattern Routing and Monotonic Routing can be used to speed up the shortest path searching process. The search space, however, becomes limited as a result of few pattern possibilities

2.6 Two approaches of tree construction for multi-terminal nets

2.7 Hierarchical Global Routing

2.8 (a) A six-terminal net which can be decomposed into five two-terminal sub-nets (b) Routing these five two-terminal subnets independently yields a sub-optimal solution (c) A solution with a smaller wirelength can be found while sharing the resources

3.1 Overview of the price-and-branch procedure for GRIP

3.2 Improving routes via the shortest path algorithm on a weighted grid-graph

3.3 Procedure to identify new candidate routes with reduced cost via rerouting segments of an existing route

3.4 Modifying grid-graph of a subproblem to handle floating terminals

3.5 (a) Defining subproblems using initial Flute-based net planning. (b) Improving net assignment to the subproblems via detouring

3.6 GRIP’s procedure to define and solve the subproblems

3.7 Connecting route-segments in adjacent subproblems

3.8 Defining few subproblems around the edges with overflow

3.9 Comparison of GRIP-based partitioning versus uniform-based partitioning in the benchmark adaptec1 [3]

4.1 GRIP solves a subproblem with some flexibility in routing the “inter-region” nets, which results in exploring a limited parallelism

4.2 Example of planning inter-region nets when processing two adjacent subproblems independently

4.3 Overview of parallel GRIP

4.4 Example illustrating the PGRIP patching procedure

4.5 Uniform allocation of routing resources for parallel-connecting subproblems

4.6 The projected congestion map of adaptec1 benchmark instance at different phases of PGRIP

5.1 Overview of Global Routing with Multi-Supply Voltage

5.2 MSV-based Global Routing model with level converters

5.3 Decomposition of net with multi supply voltage levels

5.4 Modeling route capacitance on a Global Routing edge

5.5 Dependence of three types of capacitance on edge utilization in metal layer 1

5.6 Valid on-route level converter locations for one net

5.7 Comparison between (b) wirelength-optimized Global Routing and (c) power-optimized Global Routing

5.8 Convex expression of edge capacitance in metal1 with respect to the edge utilization

5.9 Power-aware route generation

5.10 Penalizing the edge capacitance if the rerouting of a net causes a larger edge utilization compared to phase 1

5.11 Decomposition into smaller-sized subproblems similar to GRIP and PGRIP

6.1 Different metal layers have different wire widths. The even numbered metal layers run horizontally across the picture, while the odd numbered layers run perpendicular to the picture

6.2 Overview of layer directive Global Routing

6.3 Inserting buffers in the post-placement stage may create congested regions and cause routability issue


LIST OF TABLES

3.1 The ISPD’07 and ISPD’08 benchmarks

3.2 Runtime information of GRIP (without the overflow step)

3.3 Results of GRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5

4.1 Results of PGRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5

4.2 Estimated overflow of the initial subproblems

4.3 Runtime comparison of PGRIP and GRIP

5.1 Results of the level converter placement for the ISPD'08 benchmarks

5.2 Results of Power-GRIP for the ISPD’08 benchmarks. The wirelength is scaled to 10^5. Power and capacitance are scaled to 10^3


Chapter 1

INTRODUCTION

With the rapid advances in nanometer VLSI process technology, modern circuit design with billions of transistors is becoming increasingly complex, in turn placing even higher computing demands on Electronic Design Automation (EDA) tools. Aggressive technology scaling not only requires simultaneous optimization of conflicting objectives such as power, timing, area, and noise, but also requires considering the impact of manufacturing inaccuracies such as process variations and sub-wavelength lithography. These challenges complicate achieving design closure and prolong the design cycle.

Physical design, which precedes the fabrication of a circuit, is one of the most critical stages in the design flow because it is the step where many of the objectives can be accurately modeled and thus effectively optimized. Physical design is composed of several steps such as placement, clock network synthesis, routing, and design for manufacturability. As shown in Figure 1.1, these steps are applied in an iterative manner. During these iterations, various design objectives are optimized while checking that a set of design rules is satisfied. If the design rules are not satisfied at the end of one iteration in the physical design flow, then a new iteration starts and the process continues.

To achieve design closure faster, it is always desirable to speed up the physical design stage since it is among the most time-consuming steps in the design flow. This can be done with the recent advances in parallel architectures, by accelerating the individual components in the design flow (e.g., parallel placement or clock network synthesis). Alternatively, faster design closure is possible by reducing the number of iterations in the design flow. This may be achieved by improving the algorithms driving the steps in the design flow to generate a higher quality solution. With the recent advances in parallel architectures and cloud computing, one way to improve the quality of the existing algorithms is to revisit alternative procedures that were considered impractical due to their high computational demands.


[Figure 1.1 diagram. Stages, in order of appearance: Logic Design; Physical Design (Placement & CTS, Routing, DFM); Post Layout Simulation; Tape out. Routing consists of Global Routing, Track Assignment, and Detail Routing. Annotation: may incorporate Global Routing in the placement stage for better congestion estimation.]

Figure 1.1: Physical design flow.

1.1 Motivation

In this dissertation, we investigate an alternative optimization technique, Integer Programming, for an important EDA problem known as Global Routing. Integer Programming was considered impractical for Global Routing due to its large single-thread runtime requirement on large-sized industrial design instances. Nevertheless, we demonstrate that by utilizing parallel computing, Integer Programming is applicable and allows obtaining significant improvement in the solution quality while meeting practical runtime requirements.

Global routing fits in a modern physical design flow as shown in Figure 1.1. The routing stage

is further decomposed into three sub-stages of global routing, track assignment, and detail routing.

Global routing plans the approximate routing path for each net for a given placed netlist. The

approximate routing path of each net is then assigned to the available routing tracks, which represent

the physical routing resources in a design. During the detail routing stage, various sophisticated

design rules such as wire spacing constraints and via stacking rules are checked and repaired to

ensure design and manufacturability closure. Among all these three stages, global routing is perhaps

the most critical one. This is because the interconnect planning performed at this stage directly


impacts various design objectives such as the chip area, circuit timing, power consumption, the

complexity of detail routing stage and the number of iterations required to complete the design cycle.

Moreover, the solution obtained from global routing is not only utilized during track assignment and

detail routing, but as shown in Figure 1.1, it may also be passed over to the placement stage in order

to provide a more accurate wirelength and congestion estimation and to enhance the routability of

the design.

Because even the simplest version of the routing problem (i.e., rectilinear routing of a single multi-terminal net with minimum wirelength) is NP-complete [40], a considerable body of work in the past three decades has focused on solving the global routing problem and on its implementation for various design styles such as gate arrays, sea of gates, standard cell-based designs, and custom designs [44] [49] [66] [67].

Global Routing is an inherently sequential EDA problem. On one hand, this is because most

of the state-of-the-art academic and commercial global routers rely on a rip-up and reroute based

procedure, which is iterative by nature and hard to parallelize. The basic step in rip-up and reroute

is to remove one or more routes passing through the congested regions and replace them with new

routes going through the less congested ones. Although various techniques have been proposed to

improve the efficiency of the rip-up and reroute procedure, routing multiple nets simultaneously is

still difficult because of the competition for routing resources [47]. Specifically, if two nets in the

same region are rerouted concurrently, they may use the same routing resource to complete their

new routes and result in unexpected over-utilization of routing resources. On the other hand, one

can try to partition a large-sized design into smaller subregions, and then independently route these

subregions in parallel. Nevertheless, many nets may belong to multiple subregions, and guaranteeing the connectivity between adjacent subregions without over-utilizing the routing resources is not trivial; in fact, it is the main challenge of this approach.
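The competition for routing resources described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited routers: the two-edge instance, the capacities, and the function names are all hypothetical, and "rerouting" is reduced to picking the edge with the most free capacity. When nets are rerouted one after another, each sees the updated usage; when they are rerouted concurrently, they all decide from the same stale snapshot and can overfill the same edge.

```python
from collections import Counter

# Hypothetical toy instance: two candidate edges, each with capacity 2,
# and one unit of capacity already used on each of them.
capacity = {"e1": 2, "e2": 2}
usage = Counter({"e1": 1, "e2": 1})


def least_congested(snapshot):
    """Pick the edge with the most free capacity (ties broken by edge name)."""
    return min(sorted(capacity), key=lambda e: snapshot[e] - capacity[e])


def reroute_sequential(nets):
    """Reroute nets one at a time; each net sees the usage left by earlier nets."""
    current = Counter(usage)
    for _ in nets:
        current[least_congested(current)] += 1
    return current


def reroute_concurrent(nets):
    """Reroute nets 'simultaneously': every net decides from the same snapshot."""
    snapshot = Counter(usage)
    choices = [least_congested(snapshot) for _ in nets]
    current = Counter(usage)
    for edge in choices:
        current[edge] += 1
    return current


def overflow(current):
    """Total usage in excess of capacity, summed over all edges."""
    return sum(max(0, current[e] - capacity[e]) for e in capacity)


nets = ["n1", "n2"]
print(overflow(reroute_sequential(nets)))  # nets spread over e1 and e2
print(overflow(reroute_concurrent(nets)))  # both nets pick e1
```

In this toy instance the sequential order produces no overflow, while the concurrent variant places both nets on the same edge and exceeds its capacity by one unit.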

1.2 Contributions of This Dissertation

In this dissertation, we present three related topics on parallelizing Global Routing via Integer Programming. We first propose a method to decompose a design into rectangular subregions together with their net assignments to form smaller-sized subproblems. These subproblems are solved via an Integer Programming (IP) based procedure in a systematic order, and then a subregion connection phase is applied to generate a complete solution. This procedure significantly improves the solution quality but has a prohibitively large runtime due to the limited parallelism. Thus, to reduce the runtime, the procedure is extended to solve all the subproblems with a much larger degree of parallelism using only a one-time synchronization, which significantly enhances the quality of the subregion connectivity. Finally, we propose an extended IP formulation of Global Routing to minimize the contemporary objective of interconnect power, which is increasingly gaining importance in modern VLSI design. We show that a similar parallel procedure is applicable to solve this extended IP formulation.

1.2.1 Contribution 1: Global Routing via Integer Programming [70] [72]

We first introduce GRIP, a Global Routing technique which heavily relies on integer programming techniques. As the first step towards achieving parallelism, GRIP decomposes the large-sized problem into smaller-sized subproblems. The smaller-sized subproblems are solved individually via an Integer Program (IP), which aims to select one route for each net from a set of promising candidate routes. Later, the route fragments of the same net in adjacent subproblems are connected to form the complete Global Routing solution. To further reduce the overflow, an IP-based overflow reduction phase is applied at the end. The first contribution of this dissertation can be summarized as follows:

• An integer program for the global routing problem which minimizes the wirelength and via costs of the routed nets as its objective. The procedure is directly applied to a 3-D graph model of the problem, thus avoiding a commonly-used layer assignment phase;

• Generation of a set of promising candidate routes for each net using a linear-programming based pricing procedure. The pricing is an iterative procedure that effectively and systematically considers the impact of currently-generated routes when generating new ones;

• A decomposition procedure to make integer programming applicable to large-sized instances. The routing problem is divided into a set of balanced subproblems in terms of the complexity required to solve them. Consequently the runtime of our procedure depends on the number of subproblems, and some of the non-adjacent ones can be processed in parallel;

• A novel method called “floating terminals” for retaining connection flexibility when solving the decomposed subproblems;

• A final “clean-up” integer programming-based procedure for routing a set of designated nets to minimize the overflow.
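The core selection step, choosing one route per net from a set of candidates, can be sketched on a toy instance. The Python sketch below is only an illustration under stated assumptions: the nets, candidate routes, unit edge costs, and capacities are hypothetical, the capacity constraint is the usual global routing one rather than GRIP's exact formulation, and exhaustive enumeration stands in for the price-and-branch procedure, which is what makes the approach viable at scale.

```python
from itertools import product

# Hypothetical toy instance: each net has candidate routes given as
# (cost, set of edges used); cost equals the number of edges (unit edge cost).
candidates = {
    "n1": [(2, {"a", "b"}), (4, {"c", "d", "e", "f"})],
    "n2": [(2, {"a", "g"}), (3, {"h", "i", "g"})],
}
capacity = {edge: 1 for edge in "abcdefghi"}  # every edge holds one route


def select_routes(candidates, capacity):
    """Pick exactly one candidate per net, minimizing total cost without
    exceeding any edge capacity. Exhaustive search, for toy sizes only."""
    nets = sorted(candidates)
    best_cost, best_choice = None, None
    for choice in product(*(range(len(candidates[n])) for n in nets)):
        edge_usage = {}
        total = 0
        for net, k in zip(nets, choice):
            cost, edges = candidates[net][k]
            total += cost
            for edge in edges:
                edge_usage[edge] = edge_usage.get(edge, 0) + 1
        if any(edge_usage[e] > capacity[e] for e in edge_usage):
            continue  # this combination overflows an edge
        if best_cost is None or total < best_cost:
            best_cost, best_choice = total, dict(zip(nets, choice))
    return best_cost, best_choice


print(select_routes(candidates, capacity))
```

Here the two cheapest routes both need edge "a", so selecting each net's shortest route independently is infeasible; the joint selection keeps the short route for n1 and detours n2. This is the kind of interaction between nets that a concurrent formulation accounts for.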

In the simulation results, GRIP achieves an average improvement of 9.23% and 5.24% in the summation of wirelength and via cost for the ISPD’07 and ISPD’08 benchmarks, respectively. These results are compared to the best solutions reported for each case from four state-of-the-art academic global routers. The remarkable improvement is due to a combination of the concurrent nature of IP, effective pricing for candidate route generation, directly working with the 3-D model of the problem, effective decomposition into subproblems, and effective recombination of the solution fragments.

1.2.2 Contribution 2: A Parallel Integer Programming Approach to Global Routing [71]

Although GRIP can produce high quality solutions, it has a prohibitively long runtime to complete global routing. To tackle this issue, a parallel global routing procedure called PGRIP is presented, which is able to significantly reduce the (wall) runtime of GRIP by utilizing many more processors. PGRIP removes a major bottleneck in GRIP by routing all the subproblems independently and ensuring (through a one-time synchronization) that the routing results of adjacent subproblems can be effectively patched together. Moreover, the runtimes of different steps in PGRIP can be budgeted by the user. The memory usage in both GRIP and PGRIP is significantly lower than in the competing procedures because each processor is assigned to solve a small-sized subproblem. In our computational experiments, PGRIP achieves the same (wall) runtime (of about 75 minutes) regardless of the size and difficulty of the problem instance while running on a grid of a few hundred CPUs with only 2GB of memory.

There are several challenges to obtaining a high-quality solution (of small wirelength without over-utilization of routing resources) when processing the subproblems concurrently to realize a parallel global router. The first challenge is to effectively decompose the routing problem into subproblems so that the difficulty of the subproblems is balanced. This step can significantly impact the final solution quality. The second challenge is to generate the subproblem solutions in a manner that can facilitate their connectivity later and avoid overflow. PGRIP addresses both of these challenges. The following summarizes the second contribution of this dissertation:

• To form the subproblems, we extend GRIP to include a formal procedure for the initial estimation of the distribution of the nets. This is a crucial step to obtain a high quality routing solution and to achieve balanced subproblems.

• In order to effectively achieve concurrent processing of individual subproblems, we employ a one-time synchronization approach so that significant portions of the computation can occur completely without centralized control. This synchronization is via our novel use of an integer programming patching procedure.

• Our procedure can finish problems of varying sizes and difficulties within the same time budget, and the number of used processors varies depending on the problem size.
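The control flow implied by a one-time synchronization can be sketched as two independent parallel passes separated by a single coordination step. The Python sketch below only illustrates that pattern and is not PGRIP's procedure: the subproblem data, the `solve` stand-in, and the equal-share `patch` rule are hypothetical, and a thread pool on one machine stands in for a computational grid. In PGRIP the patching step is itself an integer program.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: two adjacent subproblems share one boundary
# that offers 4 routing tracks in total.
BOUNDARY_CAPACITY = 4
subproblems = [{"name": "left", "wanted": 3}, {"name": "right", "wanted": 3}]


def solve(sub, limit=None):
    """Stand-in for a subproblem solver: returns the number of boundary
    tracks the subproblem uses, optionally capped by patching feedback."""
    return sub["wanted"] if limit is None else min(sub["wanted"], limit)


def patch(first_pass):
    """Stand-in for the patching step: give every subproblem an equal
    share of the shared boundary (an assumption made for illustration)."""
    share = BOUNDARY_CAPACITY // len(first_pass)
    return [share for _ in first_pass]


def run(subs):
    with ThreadPoolExecutor() as pool:
        first = list(pool.map(solve, subs))           # independent, no coordination
        limits = patch(first)                         # the single synchronization point
        second = list(pool.map(solve, subs, limits))  # independent again, with feedback
    return first, second


first, second = run(subproblems)
print(sum(first), sum(second))
```

Without feedback, the two subproblems together request more boundary tracks than exist; after the single patching step their combined demand fits the boundary, and no further communication between workers is needed.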

1.2.3 Contribution 3: Power-Driven Global Routing via Integer Programming in MSV domains [73]

To demonstrate the flexibility of the proposed parallel framework for Global Routing, an alternative objective is considered next. Instead of the traditional minimization of wirelength and inter-layer via costs during global routing, this work minimizes the interconnect power, which is becoming an increasingly important design objective.

Specifically, this work presents an IP model for interconnect power minimization during global routing for designs in Multi-Supply Voltage (MSV) domains. The mathematical model captures the dependency on wire size, spacing, and wiring congestion at different metal layers, as well as the supply voltage utilized at each domain on the chip. We show that a procedure similar to GRIP and PGRIP can be extended to handle this variation of the global routing formulation. This work makes the following contributions:

• Extend the wirelength-driven pricing procedure in GRIP to a power-driven one to generate power-efficient candidate routes.

• Employ a two-phase procedure to handle the nonlinearity of the IP formulation heuristically. The first phase is applied to minimize the interconnect capacitances (area, fringe, and congestion-dependent coupling capacitances). The second phase is then used to minimize power by accounting for net activities and voltage levels, while adhering to the interconnect capacitances obtained from the first phase.

The remainder of this dissertation is organized as follows. The definition of Global Routing and its formulation are described in Chapter 2. The basic techniques that are widely adopted in modern global routers are also introduced in that chapter. In Chapter 3, GRIP is presented based on an Integer Programming model of global routing. The price-and-branch procedure, and the decomposition and connection of subregions, are discussed in detail in that chapter. PGRIP and its procedure to solve all subregions in parallel are presented in Chapter 4. Power-GRIP is presented in Chapter 5 to minimize the interconnect power in MSV domains. Finally, Chapter 6 concludes this dissertation and offers a summary and future directions.


Chapter 2

GLOBAL ROUTING: PRELIMINARIES AND LITERATURE REVIEW

The Global Routing problem was first defined by Burstein et al. [10] in 1983 in order to simplify

the “complicated” wire routing problem in VLSI design. Since then, a great deal of research effort

has been dedicated to the global routing problem, and more than four hundred articles have been

published (as shown in Figure 2.1). Specifically, research in global routing has gained momentum

ever since new challenging benchmarks were released during the ISPD 2007 contest [3]. In this

chapter, the global routing problem is defined and some fundamental techniques that are widely

used by the global routing procedures are introduced. An overview of modern state-of-the-art aca-

demic global routers is then presented, followed by a discussion on the global routing challenges for

modern VLSI designs.

Figure 2.1: Published articles on the Global Routing problem since 1983. The statistical data is obtained from Microsoft Academic Search with the keyword “Global Routing”.

Figure 2.2: The construction of the grid graph for the global routing problem (figure labels: cells, global bins, global edges, edge capacity = C).

2.1 Problem Definition

The Global Routing problem can be conceptualized on a grid-graph G = (V,E) as depicted in Fig-

ure 2.2. After placement, a chip is partitioned into rectangular regions called global bins. Each

global bin is a vertex v ∈ V in the grid-graph. The boundary between two adjacent global bins is

modeled as an edge e ∈ E. Each edge e is associated with a cost ce. Also given as input is a set

of (multi-terminal) nets N . Each net Ti is defined by a set of vertices (terminals) in V (Ti ⊂ V ).

At the level of Global Routing, the terminals of the nets are assumed to be located at the center of

each global bin. Routing a multi-terminal net is finding a Steiner tree that connects its terminals to

each other. The cost of the tree is the summation of the costs of its edges. For example, when the

cost of an edge is 1 unit, the cost of the tree reflects the wirelength of the corresponding route. The

Global Routing problem finds a set of Steiner trees connecting the terminals of each net Ti,∀i ∈ N .

Furthermore, each edge e ∈ E is associated with a capacity ue, reflecting the maximum available routing

resources between its corresponding adjacent bins. If the usage of an edge exceeds its capacity,

then the overflow on the edge is the number of units of wire usage in excess of that capacity.
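
The cost and overflow definitions above can be illustrated with a minimal Python sketch. The data layout (routes as lists of bin-to-bin edges, dictionaries for ce and ue) and the function names are assumptions made for illustration only; they are not taken from any router discussed in this chapter.

```python
from collections import Counter

def edge(u, v):
    """Canonical key for the undirected global edge between bins u and v."""
    return (u, v) if u <= v else (v, u)

def tree_cost(tree_edges, cost):
    """Cost of one routed net: the sum of c_e over the edges of its tree."""
    return sum(cost[edge(u, v)] for u, v in tree_edges)

def total_overflow(routes, capacity):
    """Total overflow: sum over all used edges of max(0, usage - u_e)."""
    usage = Counter(edge(u, v) for tree in routes for u, v in tree)
    return sum(max(0, used - capacity[e]) for e, used in usage.items())

# Two nets on a tiny grid; both cross the edge between bins (0, 0) and (0, 1).
routes = [
    [((0, 0), (0, 1))],
    [((0, 1), (0, 0)), ((0, 1), (1, 1))],
]
capacity = {edge((0, 0), (0, 1)): 1, edge((0, 1), (1, 1)): 1}
unit_cost = {e: 1 for e in capacity}  # with unit costs, tree cost equals wirelength

print(tree_cost(routes[1], unit_cost))   # 2
print(total_overflow(routes, capacity))  # 1
```

With unit edge costs, the tree cost coincides with the wirelength of the route, as stated in the text.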

In modern VLSI design, the routing resources are available in many metal layers. For example,

the ISPD’07 benchmarks have six metal layers—three horizontal layers and three vertical layers [3].

Adjacent layers are connected by (inter-layer) vias. In the grid graph, vias are also modeled as edges

with unlimited capacity. Each vertex is connected to its top and bottom vertices corresponding to

the top and bottom layers (if they exist). For various reasons such as reliability, manufacturability,

Figure 2.3: A 2-D view of the design after placement, showing global bins and global edges (left); a 3-D global routing grid-graph with horizontal edges, vertical edges, and vias (right).

area, and signal delay, it is desirable to minimize or control the number of vias during routing. The

grid-graph of global routing can be extended as a 3-D graph as shown in Figure 2.3. The cost of a

via is considered 3 units (to associate a higher penalty with via usage) in the ISPD’07 benchmarks

[3] and 1 unit in the ISPD’08 benchmarks [4].

When evaluating a routing solution, two metrics are typically minimized: total wirelength and total

overflow. The total wirelength is the same as the total cost of the routed nets when the cost

of each via is 1 unit (e.g., ISPD’08 benchmarks [4]). The total overflow is computed as the units

of overflow added over all the edges [50]. Typically, overflow should also be minimized (zero is

desirable) since it directly corresponds to the routability of the design. The wirelength and overflow

are conflicting objectives; to minimize the overflow, the nets passing through the congested regions

need to be detoured, which increases the total wirelength. In modern VLSI design,

a small amount of overflow is allowed during Global Routing. This is because additional routing

resources are reserved for the detail routing stage, when the design rule violations are repaired,

and these resources can also be used to eliminate the overflow. Additionally, runtime is also an

important metric. This is especially true in the cases in which global routing is repeatedly used

to guide a placement algorithm as a congestion estimation tool to improve the routability of the

design [47] [60]. Overall, minimizing wirelength and overflow has traditionally been the objective since

these metrics correlate well with routability, timing, power, design rule violations, and other

interconnect-related issues that are of concern in the later stages of the design.


2.2 Fundamental Techniques

Although the global routing procedures have recently achieved remarkable progress, some of the

fundamental techniques have remained the same. In this section, we review these fundamental

techniques and divide them into three categories: 1) routing techniques for a two-terminal net, 2)

decomposition techniques for routing a multi-terminal net, and 3) frameworks for routing all the nets.

2.2.1 Routing Techniques for A Two-Terminal Net

The fundamental problem in Global Routing is to find the shortest path for connecting two vertices.

For a routing grid graph as shown in Figure 2.2(b), each edge e in the graph is associated with a

weight we. The weight may, for example, reflect the current utilization of the edge. (Edge weights will

be discussed later when the common routing frameworks are presented.) The single two-terminal

net routing problem seeks the path of minimum total edge weight that connects

the two vertices.

• Maze Routing

Maze routing was introduced in [54] to solve the problem of connecting two vertices in a graph

with the shortest path. Despite various improvements which have been proposed to enhance

the maze routing algorithm in the past few decades, the core procedure of this algorithm

remains intact.

The maze routing algorithm is composed of two stages - propagation and backtracking. Dur-

ing the propagation stage, a wave is propagated in a breadth first traversal of the routing graph

from one of the vertices, designated as the source node, until reaching the other vertex, desig-

nated as the sink node. Next during the backtracking stage, a shortest path is identified in the

reverse direction to connect the source and sink nodes. Several traditional algorithms such as

Lee’s Algorithm [43] and Hadlock’s Algorithm [30] have been proposed to realize the maze

routing for the VLSI routing problem.

Nevertheless, the runtimes of these algorithms are the major bottlenecks for today’s large-

scale designs with millions of grid edges. A common improvement is to create a bounding

Figure 2.4: Maze routing (a) with a bounding box and (b) without a bounding box. Maze routing with a bounding box may lead to a sub-optimal solution in the presence of routing obstacles (designated in red in the figure).

box around the source and sink nodes to restrict the search region. If no path can be found

to connect these two nodes in this region, the bounding box will be increased and the path

searching algorithm is applied again. In the presence of routing obstacles (which are for

example introduced by the edges that have been utilized to full capacity by the previously-

routed nets), this restricted-bounding box implementation may lead to a sub-optimal solution

as shown in Figure 2.4.

Dijkstra’s algorithm [21] is the most popular implementation in the latest global routers.

The A* search algorithm [32], which is an extension of Dijkstra’s algorithm,

reorders the nodes to be searched in the graph to speed up the shortest-path search.

Specifically, each node in the graph is associated with a cost which is the summation of edge

weights from the source node to the current node and the estimated weight from the current

node to the sink node. The A* search algorithm then visits these nodes in the increasing order

of their costs. Recently, a new procedure was introduced in [47] to further improve the runtime

of maze routing. In this procedure, an upper bound in terms of wirelength is estimated when

finding a path to connect two nodes. During the breadth first traversal in the maze routing

algorithm, any path that exceeds this upper bound is disregarded so that the searching space

becomes restricted.
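
As a hedged illustration of the A* ordering described above, the following Python sketch routes one two-terminal net on a 2-D grid. It is not the implementation of any cited router. The function name, the `weight(u, v)` callback, and the use of `float("inf")` for a blocked edge are assumptions; the Manhattan-distance estimate is only a valid lower bound under the further assumption that every edge weight is at least 1.

```python
import heapq

def astar_route(width, height, weight, source, sink):
    """Return (cost, path) of a minimum-weight path, or None if none exists.

    weight(u, v) gives the weight of the grid edge (u, v); it is assumed to be
    at least 1, or float('inf') for an edge blocked by an obstacle.
    """
    def estimate(v):
        # Estimated remaining weight: Manhattan distance to the sink.
        return abs(v[0] - sink[0]) + abs(v[1] - sink[1])

    best = {source: 0}            # propagation: best known weight from the source
    parent = {source: None}
    frontier = [(estimate(source), source)]
    while frontier:
        f, u = heapq.heappop(frontier)
        if u == sink:
            path = []             # backtracking: follow parents back to the source
            while u is not None:
                path.append(u)
                u = parent[u]
            return best[sink], path[::-1]
        if f > best[u] + estimate(u):
            continue              # stale queue entry; a cheaper one was processed
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= v[0] < width and 0 <= v[1] < height):
                continue
            g = best[u] + weight(u, v)
            if g < best.get(v, float("inf")):
                best[v] = g
                parent[v] = u
                heapq.heappush(frontier, (g + estimate(v), v))
    return None

# A 3 x 2 grid in which the edge (1,0)-(2,0) is fully utilized, forcing a detour.
blocked = lambda u, v: float("inf") if {u, v} == {(1, 0), (2, 0)} else 1
print(astar_route(3, 2, blocked, (0, 0), (2, 0))[0])  # 4
```

Setting the estimate to zero reduces the search to Dijkstra's algorithm; the estimate only changes the order in which nodes are visited.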

Figure 2.5: (a) Pattern Routing, showing L-shape and Z-shape routes between source S and sink T; (b) Monotonic Routing. Pattern Routing and Monotonic Routing can be used to speed up the shortest-path search. The search space, however, becomes limited as a result of the few pattern possibilities.

• Pattern and Monotonic Routing

As mentioned before, the maze routing algorithm can effectively identify the shortest path

between two terminals. The runtime overhead, on the other hand, is the main shortcoming of

maze routing. In fact, the runtime of the latest academic global routers is mostly occupied by

maze routing. To obtain further speedups, Pattern Routing [42] is alternatively considered. It

is a technique which highly restricts the search space by utilizing specific routing patterns to

connect two terminals. For example, L-shape routing utilizes at most one single bend to route

a two-terminal net, while Z-shape routing contains two bends as shown in Figure 2.5(a).

In general, Pattern Routing can identify a shortest path very quickly, but its solution quality

lags well behind that of maze routing. To enhance the solution quality, Monotonic Routing

was then proposed [59]. The main idea is to expand the search space compared to pattern

routing to improve the solution quality. This algorithm starts from the source node and then

monotonically increases the routing path until reaching the sink node, as shown in Figure

2.5(b). Although it is faster than the maze routing technique, the search area is still restricted

to the bounding box of the source and sink nodes.

Because both Pattern and Monotonic Routing have a limited search space, the

latest academic global routers usually adopt a two-step approach: First, Pattern and/or Monotonic

Routing is applied to route all the nets. Then Maze Routing is applied to identify the

shortest paths for those nets which were routed with a very high weight in the previous step.
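
To make the restricted search space of pattern routing concrete, the sketch below evaluates only the two L-shaped candidates for a two-terminal net and keeps the cheaper one. This is an illustrative sketch under assumed conventions (2-D bin coordinates, a `weight(u, v)` callback); the helper names are hypothetical.

```python
def straight(a, b):
    """Edges of the straight segment from a to b (a and b share x or y)."""
    (x0, y0), (x1, y1) = a, b
    if x0 == x1:
        step = 1 if y1 >= y0 else -1
        return [((x0, y), (x0, y + step)) for y in range(y0, y1, step)]
    step = 1 if x1 >= x0 else -1
    return [((x, y0), (x + step, y0)) for x in range(x0, x1, step)]

def l_shape_route(source, sink, weight):
    """Return (cost, edges) of the cheaper of the two single-bend routes."""
    candidates = []
    for bend in ((sink[0], source[1]), (source[0], sink[1])):
        edges = straight(source, bend) + straight(bend, sink)
        candidates.append((sum(weight(u, v) for u, v in edges), edges))
    return min(candidates, key=lambda c: c[0])

# Edges along the row y = 0 are congested (weight 5); all other edges have weight 1.
congested_row = lambda u, v: 5 if u[1] == 0 and v[1] == 0 else 1
cost, edges = l_shape_route((0, 0), (2, 1), congested_row)
print(cost)  # 3: the route bends at (0, 1) and avoids the congested row
```

Only two candidates are examined regardless of the grid size, which is the source of both the speed and the quality gap relative to maze routing.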

Figure 2.6: Two approaches of tree construction for multi-terminal nets: (a) Rectilinear Minimum Spanning Tree; (b) Rectilinear Steiner Minimal Tree.

2.2.2 Decomposition Techniques for Routing A Multi-terminal Net

In modern VLSI designs, more than 50% of the nets have more than two terminals. The all-pairs

shortest path algorithms such as the Floyd-Warshall algorithm [21] and Johnson’s algorithm [39] are

applicable to route multi-terminal nets. Maze routing algorithm can also be extended to handle this

case. However, these algorithms have a high runtime penalty due to the larger problem size for multi-

terminal nets. Instead, the global routing procedures typically decompose a multi-terminal net into a

set of two-terminal nets, and then route these two-terminal nets individually. The decomposition of

multi-terminal nets itself, however, is an NP-complete problem [28], and there exist many different

algorithms to tackle the decomposition problem.

Two decomposition approaches are common in Global Routing. The first approach is based on

the construction of rectilinear minimum spanning tree (RMST). As shown in Figure 2.6(a), four

terminal points are first sorted by their coordinates, and a spanning tree is constructed by connecting

these terminals in the sorted order. The spanning tree construction may not yield the best routing

topology, but the main advantage of this approach is its fast runtime. In the past (i.e., before the

ISPD’07 global routing contest [3]), the majority of the global routing algorithms relied on this

approach for multi-terminal net decomposition.
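
A spanning-tree decomposition of a multi-terminal net into two-terminal sub-nets can be sketched as follows. Note that this sketch uses Prim's algorithm with rectilinear (Manhattan) distances as a standard stand-in for building an RMST; it does not reproduce the sorted-order construction described above, and the function name is an assumption. The implementation is deliberately simple and unoptimized.

```python
def rmst_decompose(terminals):
    """Split a net into two-terminal sub-nets along a rectilinear MST."""
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    in_tree = [terminals[0]]
    remaining = list(terminals[1:])
    subnets = []
    while remaining:
        # Prim's step: attach the remaining terminal closest to the tree.
        a, b = min(((a, b) for a in in_tree for b in remaining),
                   key=lambda pair: dist(*pair))
        subnets.append((a, b))
        in_tree.append(b)
        remaining.remove(b)
    return subnets

terminals = [(0, 0), (4, 0), (0, 3), (4, 3)]
subnets = rmst_decompose(terminals)
print(len(subnets))  # 3 two-terminal sub-nets for a four-terminal net
```

Each returned pair can then be handed to a two-terminal routing technique such as pattern or maze routing.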

The second approach is to construct the rectilinear Steiner minimal tree (RSMT) as shown in

Figure 2.6(b). Additional Steiner points may be inserted as (additional) net terminals in the decomposed

(sub)nets during the tree construction to obtain a better tree topology. The RSMT construction

process was considered impractical for Global Routing because of its intractable runtime complexity,

until a fast RSMT algorithm called Flute [18] was proposed. For nets with nine or fewer

terminals, Flute uses look-up tables to identify the optimal Steiner tree. For nets with more than

nine terminals, a divide-and-conquer method is applied. FastSteiner [41] is another RSMT algorithm

that can generate better topologies than Flute for nets with more than nine terminals.

Due to the fast runtime and high performance of Flute, it has been adopted by some of the lat-

est academic global routers [12], [76] to decompose the multi-terminal nets. Although the RSMT

approach generates better initial tree topologies, routers need to have the capability to effectively

restructure the tree topologies to avoid congested regions. On the other hand, the RMST approach

has worse initial tree topologies, but its main advantage is the simplicity of its data structure

(it does not need to record the Steiner point locations). However, routers need to spend more effort to

recover from the bad initial solutions, and it is also necessary to have mechanisms to allow resource

sharing among the wire segments of a tree [13], [47], [61].

2.2.3 Frameworks for Routing All the Nets

• Rip-up and Reroute and History-based Routing

A common framework for routing all the nets is an iterative “repair” framework known as

rip-up and reroute. Rip-up and reroute starts with an initial routing solution which contains

overflow. Within one iteration, the nets are decomposed into two-terminal nets and ordered

(typically according to the overflow in their routes). The nets are visited in a specific order.

When visiting each net, its route is ripped-up and the corresponding edge utilizations are

decreased. Next, the net is rerouted, typically using Maze Routing to find a shortest path of

a smaller weight. The edge weights are updated again to reflect the newly routed net. The

process continues for many iterations, typically until a time-limit is reached or all the nets are

routed without overflow.

When visiting the nets, the ordering procedure may significantly affect the solution quality.

As a result, many existing approaches [12], [16], [27], [31], [50], [58], [59], [76] have focused

on improving the net ordering procedure.


During the iterative rip-up and reroute process, the edge weights can be updated according to

a history-based procedure. This procedure, also known as negotiated-congestion routing [50],

was first introduced in the 90’s for routing in FPGAs. Here, the edge weights are gradually

updated such that the routing edges, which consistently remain in the congested regions over

multiple iterations, are assigned a higher weight. This strategy encourages the maze routing

procedure to utilize the edges outside the congested regions to route the nets. In general, the

cost of each edge e can be formulated as [50]

c_e = (b_e + h_e) \cdot p_e \qquad (2.1)

where be is an intrinsic cost, he is the cost reflecting congestion history, and pe is the current

congestion penalty. Different routers may have different penalty functions pe. Typically, giving

a higher penalty to the overflowed edges can reduce the runtime needed to obtain a zero-overflow

solution. However, the wirelength may in turn increase significantly. For example, let ue and re

represent the capacity and current usage of edge e. The penalty function pe in [61] is defined

as

p_e = \begin{cases} \exp\left(k\,(r_e/u_e - 1)\right) & \text{if } r_e/u_e > 1 \\ r_e/u_e & \text{otherwise} \end{cases} \qquad (2.2)

At each rip-up and reroute iteration, a route t for a ripped net is found such that its total cost

(∑e ce, ∀e ∈ t) is minimized. After an alternative route is identified, the congestion history he

is updated using the following equation

h_e^{k+1} = \begin{cases} h_e^{k} + h_{inc} & \text{if } e \text{ has overflow} \\ h_e^{k} & \text{otherwise} \end{cases} \qquad (2.3)

The parameter hinc is a constant value in most implementations. The authors in [50] suggest

that only the nets passing through the congested regions need to be rerouted. This technique

is widely used by many academic routers [12], [61], [76].
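
Equations (2.1) to (2.3) translate directly into a few lines of code. The sketch below is only an illustration of the formulas: the function names, the dictionary-based bookkeeping, and the default values k = 1 and hinc = 1 are assumptions, and edge capacities are assumed to be positive.

```python
import math

def penalty(usage, capacity, k=1.0):
    """Congestion penalty p_e of Equation (2.2); capacity must be positive."""
    ratio = usage / capacity
    return math.exp(k * (ratio - 1)) if ratio > 1 else ratio

def edge_cost(base, history, usage, capacity, k=1.0):
    """Edge cost c_e = (b_e + h_e) * p_e of Equation (2.1)."""
    return (base + history) * penalty(usage, capacity, k)

def update_history(history, usage, capacity, h_inc=1.0):
    """History update of Equation (2.3): add h_inc on edges that overflow."""
    return {e: h + (h_inc if usage[e] > capacity[e] else 0.0)
            for e, h in history.items()}

history = {"e1": 0.0, "e2": 0.0}
usage = {"e1": 3, "e2": 1}
capacity = {"e1": 2, "e2": 2}
history = update_history(history, usage, capacity)
print(history)                                # {'e1': 1.0, 'e2': 0.0}
print(edge_cost(1.0, history["e2"], 1, 2))    # 0.5
```

Because the history term accumulates across iterations, an edge that overflows repeatedly becomes steadily more expensive even if its current usage drops back to capacity.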

Figure 2.7: Hierarchical Global Routing: (a) Bottom-up Routing; (b) Top-down Routing.

• Hierarchical Routing

An alternative framework for routing all the nets is a hierarchical one. The intuition behind

this framework is to partition the large-sized and complicated global routing instance into a

set of smaller but simpler ones. It can also be categorized as a systematic divide-and-conquer

approach. In general, there are two different approaches for hierarchical routing which are

bottom-up routing and top-down routing [14] [15] [19] [77].

The bottom-up routing approach, as shown in Figure 2.7(a), first partitions a design into a set

of fine-grained bins to form the bottom level. The local nets that completely fall inside the

bins are routed and fixed. In the next level, four adjacent bins are merged into a larger bin,

and again the nets that completely fall inside this bin are routed. This process is continued

until all the small bins are merged together. In this approach, short nets are routed first in the

earlier levels and occupy the routing resources. Later, this approach may fail to find feasible


routes for the long nets because of the shortage in the routing resources. Alternatively, the

top-down routing approach starts with one largest bin. The long nets are routed first, as shown

in Figure 2.7(b). Then this bin is gradually partitioned into smaller ones and the short nets

are handled afterwards. Similarly, finding feasible routes for short nets is a major challenge

in this approach.

2.2.4 Literature Review

Existing academic global routers can be categorized into sequential routers [12], [13], [35], [47],

[52], [57], [61], [76] and concurrent routers [7], [9], [10], [16], [34], [65], [77]. Recently, much

attention has been given to the sequential approach due to the faster runtime, but it heavily relies on

the rip-up and reroute technique and suffers from the dependency on net ordering. The concurrent

approach, on the other hand, has potential to generate a higher quality solution. Nevertheless, the

prohibitively longer runtime of the concurrent approach has become a bottleneck for today’s large-

sized Global Routing instances. In the following, we briefly introduce the unique features of these modern

academic global routers.

• Labyrinth [42] applies a quick variation of pattern routing. While it does not achieve competitive

solutions, its source code is available, which facilitated the development of subsequent

routing procedures.

• DpRouter [11] uses an efficient dynamic-programming based pattern routing technique to

achieve better routing solutions for two-pin nets. It also uses a segment-movement RMST

technique to reconstruct the tree structures for avoiding the congested regions.

• ARCHER [57] is an iterative rip-up and reroute approach. It combines several point-to-point

routing techniques to explore the tradeoff between the solution quality and runtime. For the

nets outside the congestion regions, relatively fast routing procedures such as pattern routing

are employed. Expensive routing procedures such as maze routing are then utilized to im-

prove the solution quality. This strategy has been widely used in the subsequent academic

global routers. Furthermore, a Lagrangian relaxation based algorithm is proposed to dynamically

modify the Steiner trees to optimize the routing congestion. Although ARCHER had a competitive

runtime upon its publication, its solution quality and runtime fall significantly behind

those of the subsequent global routers.

• MaizeRouter [52] is primarily based on two complementary edge-based operations including

extreme edge shifting and edge retraction. Extreme edge shifting is a generalization of edge

shifting that has been enhanced to restructure the Steiner tree topologies particularly to obtain

topologies that reduce routing congestion. Edge retraction is a procedure to allow resources

sharing among the wire segments of a tree to reduce the wirelength. MaizeRouter won the

3-D category of the ISPD’07 Global Routing Contest.

• FastRoute [58], [59], [76] is one of the most competitive routers with respect to the runtime.

The Hanan grid structure is used in FastRoute to construct the Steiner trees. Later, the Steiner

tree construction is enhanced to consider the vias simultaneously. The monotonic routing

and multi-source multi-destination maze routing are the key point-to-point routing techniques

utilized in FastRoute. Despite its fast runtime, FastRoute fails to generate a zero overflow

routing solution for many benchmarks in its earlier versions. Recently, this router has been

improved to generate competitive solutions in terms of both wirelength and overflow.

• BoxRouter [16] [17] is one of the concurrent global routing procedures since it iteratively solves

an integer programming (IP) formulation. However, it can also be considered as a sequential

approach. The main idea of BoxRouter is progressive IP. The procedure starts with a small

box around the congested region of the chip and applies IP to route the nets inside the box.

Next, the box is progressively expanded to include more unrouted nets. Although the IP

considers only L-shaped patterns for each two-pin decomposition, one round of maze routing

is applied afterwards to route the nets which could not be successfully routed after solving

the IP. The subsequent BoxRouter 2.0 adds an additional post-routing stage which utilizes

the negotiation-based rip-up and reroute algorithm to further improve the solution quality. In

BoxRouter 2.0, the IP formulation for layer assignment is also extended to take into account

the routing block regions.


• FGR [35] [61] is an abbreviation of “Fairly Good Router”. It extends the PathFinder router

of [50] to handle today’s large-scale global routing instances with multiple routing layers.

It offers several technical novelties, such as a particular function to compute the congestion

penalty, a resource sharing procedure to construct tree structures, and a fast layer assignment

followed by a 3-D clean-up phase. FGR won the 2-D category of the ISPD’07 Global Routing

Contest. It also generated the best solution for many benchmarks at that time. BFGR [35] is a

subsequent version of FGR. BFGR improves the memory usage by clustering the neighboring

overflow edges during the rip-up and reroute procedure, and a new history cost function is also

proposed to enhance the negotiation-based procedure.

• NTHU-Route [12] [27] is a traditional iterative router which also uses a rip-up and reroute

framework. It combines and enhances many different techniques proposed by the previous

academic global routers and is able to generate high-quality solutions. The history-based cost

function is perhaps its core strength to distribute the overflow. NTHU-Route also employs a

congested region identification method to specify the net ordering during rip-up and reroute.

The monotonic routing is the basic point-to-point routing algorithm used in NTHU-Route.

The wirelength reduction in NTHU-Route is achieved by using an adaptive multi-source

multi-sink maze routing method. NTHU-Route won the ISPD’08 Global Routing Contest.

• SideWinder [34] is the latest concurrent global router (before this work). It combines pattern

routing and maze routing in an incremental IP formulation. Similar to BoxRouter, it considers

routing all the two-pin nets with L-shapes first. However, instead of iteratively expanding a

small bounding box, the entire routing grid is considered during each pass. In addition to L-

shapes, it also considers all the Z-shapes and selected C-shapes (slightly detoured routes) in

IP to reduce the overflow. Although more patterns are considered in the IP, the routing results

are not competitive in terms of overflow and wirelength.

• NCTU-GR [47] utilizes the traditional rip-up and reroute based algorithm as its core. Its

main contribution is to “parallelize” the sequential rip-up and reroute algorithm. In the task-

based parallel multi-threaded algorithm of NCTU-GR, multiple nets can be ripped up and

rerouted by different threads concurrently. The concurrent routing process also tries to address

the challenge of unexpected resource over-usage. Moreover, this router utilizes RMST to

construct the tree structure for multi-terminal nets, but it takes advantage of the initially

generated RSMT trees to reduce the wirelength.

2.3 Shortcomings of the Existing Techniques

2.3.1 Sequential Routing Techniques

Perhaps the most straightforward strategy for routing multiple nets is to select a specific net ordering

and then route the nets sequentially in that order. The major advantage of this approach is that the

congestion information from the previously routed nets can be taken into consideration while routing

a current net. Due to its simplicity and practicality, most of the state-of-the-art global routers utilize

this sequential approach. Even the procedure of [17], which uses IP as its core, relies heavily on the

pre-routing and post-routing stages which are inherently sequential.

The drawback of a sequential approach is that the solution quality depends greatly on the order

in which the nets are processed, and it is hard to find a good net ordering. For a given net ordering, it

is often more difficult to route the nets that are considered later since they are subject to more block-

ages. Moreover, the sequential approach is inherently hard to parallelize. One possible approach

is to give different net orderings to different threads (cores), and consider the best solution among

those. This strategy can potentially improve the solution quality, but it still cannot reduce the runtime.

Alternatively, the parallelization can be achieved by routing multiple nets by different threads

concurrently as proposed in [47]. However, resource competition (i.e., multiple nets using the

same routing resources to complete their routes) is the main challenge. The simulation results in

[47] also show a slight degradation in terms of wirelength using this parallelization strategy.

2.3.2 Net Decomposition and Resource Sharing

Many of the state-of-the-art global routers first use Flute [18] to generate a Steiner tree for each net,

and then decompose this tree into two-terminal sub-nets. Later, point-to-point routing algorithms

such as pattern routing and maze routing are used to route these two-terminal sub-nets individually.

This procedure can significantly reduce the software development time; it mainly requires the implementation

and optimization of the net ordering and the edge weights. Nevertheless, the routes of the

Figure 2.8: (a) A six-terminal net which can be decomposed into five two-terminal sub-nets. (b) Routing these five two-terminal sub-nets independently yields a sub-optimal solution. (c) A solution with a smaller wirelength can be found by sharing the resources.

two-terminal sub-nets corresponding to the same net may end up overlapping with each other

after decomposition; further post-processing is necessary in order to compute and consider the actual

wirelength of the route for the un-decomposed net. Figure 2.8 gives an example for this net decom-

position problem. There is a multi-terminal net with six terminal points, and it can be decomposed

into five two-terminal sub-nets as shown in Figure 2.8(a). In Figure 2.8(b), a solution corresponding

to the independent routing of the sub-nets is shown to be sub-optimal. A better solution considering

the overlap and shared resource usage is shown in Figure 2.8(c).
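
The post-processing step mentioned above, computing the actual wirelength of the un-decomposed net, amounts to counting each shared grid edge once. The following sketch illustrates this under the assumption of unit edge costs; the function names and the route representation (lists of bin-to-bin edges) are hypothetical.

```python
def naive_wirelength(subnet_routes):
    """Sum of the sub-net lengths; overlapping edges are counted repeatedly."""
    return sum(len(route) for route in subnet_routes)

def net_wirelength(subnet_routes):
    """Wirelength of the un-decomposed net; each grid edge is counted once."""
    shared = {frozenset(e) for route in subnet_routes for e in route}
    return len(shared)

# Two sub-nets of the same net that both use the edge between (0, 0) and (1, 0).
subnet_routes = [
    [((0, 0), (1, 0)), ((1, 0), (2, 0))],
    [((1, 0), (0, 0)), ((0, 0), (0, 1))],
]
print(naive_wirelength(subnet_routes))  # 4
print(net_wirelength(subnet_routes))    # 3
```

The difference between the two values is the wire that the sub-nets of one net can share, which independent routing of the sub-nets does not account for.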

2.3.3 Multi-layer Global Routing

After the release of the ISPD’07 benchmarks, due to the large size of the 3-D global routing grid-graph,

the subsequent global routing procedures have followed a two-step approach to decompose

and work with smaller-sized subproblems. First, the 3-D global routing grid-graph is projected to a

2-D grid-graph. The capacity of each edge in the grid-graph is the summation of the capacities of its

corresponding edges (that have the same projection) in the 3-D graph. After solving the 2-D routing

problem on the projected graph, next, a procedure called layer assignment [23] is employed, during

which each segment of a route in the 2-D graph is projected back to the 3-D graph, and inter-layer

vias are then used to connect these segments of the same net on different metal layers to each other.
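
The capacity projection from the 3-D grid-graph to the 2-D grid-graph can be sketched as follows. The edge representation, with each vertex given as an (x, y, layer) triple, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def project_capacities(capacity_3d):
    """Sum the capacities of all 3-D edges that share the same 2-D projection.

    capacity_3d maps ((x0, y0, z0), (x1, y1, z1)) to an edge capacity.
    Via edges (z0 != z1) have no 2-D projection and are skipped.
    """
    capacity_2d = defaultdict(int)
    for ((x0, y0, z0), (x1, y1, z1)), cap in capacity_3d.items():
        if z0 != z1:
            continue
        capacity_2d[((x0, y0), (x1, y1))] += cap
    return dict(capacity_2d)

capacity_3d = {
    ((0, 0, 0), (1, 0, 0)): 3,     # horizontal edge on layer 0
    ((0, 0, 2), (1, 0, 2)): 5,     # same projection on layer 2
    ((0, 0, 0), (0, 0, 1)): 10**9, # via edge between layers 0 and 1
}
print(project_capacities(capacity_3d))  # {((0, 0), (1, 0)): 8}
```

The projection discards which layer each unit of capacity came from, which is why a separate layer assignment step is needed afterwards.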

Neglecting the via cost when solving the 2-D routing problem, however, may lead to significant

degradation in solution quality in terms of the total wirelength (assuming the via cost is included

in the wirelength computation). One possible solution is to penalize every bend wire when solving

the routing problem in the 2-D projected graph. Nevertheless, this method still cannot reflect the true

via cost, as one wire may directly go from the bottom metal layer to the top metal layer (stacked

vias), which yields a higher via cost. In this dissertation, a routing framework is proposed which

directly operates on the 3-D global routing grid-graph and avoids the layer assignment phase.

2.4 Multi-objective Global Routing

Due to the deep submicron process technology, the routability and performance of a circuit need

to be considered simultaneously during Global Routing. This is because the interconnect delay has

become a dominant factor that determines the system performance [51]. Moreover, crosstalk noise

from the coupling capacitance between adjacent wires is also an important factor to determine the

circuit performance and should also be considered during the global routing stage [79]. Previous

work of [33] developed a timing driven global router called TIGER, which is an Integer Programming

based approach. In TIGER, the circuit performance is considered by adding a set of path-based

timing constraints to the formulation. In [20], a heuristic-based timing driven global routing method

for standard cell design was proposed. In this approach, the Steiner trees are first constructed and

fixed for the timing critical nets. The non-critical nets are then routed and detoured from the con-

gested regions to improve the routability. The crosstalk avoidance was considered during the global

routing stage in [79]. In this approach, the nets with a crosstalk violation are ripped-up and rerouted

to satisfy the crosstalk constraint.

Furthermore, dynamic power can also be reduced during the global routing stage by rerouting the nets to minimize the power consumed by the routes [62]. Specifically, interconnect power (signal power) can account for a significant portion of dynamic power. For example, interconnect power is reported to be around 30% of dynamic power, and about 18% of overall power, for a 45nm high-performance microprocessor synthesized using a Structured Data Paths design style [62]. Earlier work [48] also reports high interconnect power in Intel microprocessors. To minimize interconnect power, a route can be detoured away from the congested regions to reduce the coupling capacitance between adjacent routes. It can also be rerouted to the lower metal layers, since wires on the lower metal layers are thinner and therefore have less capacitance. Recently, [63] proposed a method to minimize interconnect power during the global routing stage.


Chapter 3

GRIP: GLOBAL ROUTING VIA INTEGER PROGRAMMING

In this chapter, we propose GRIP, a Global Routing procedure that relies heavily on integer programming techniques. Not only is GRIP able to generate solutions for large global routing instances, but the solutions it finds also show a considerable improvement in quality over the best solutions in the open literature. In addition, GRIP has minimal dependency on the nature of the benchmark instances and robustly generates the best solution in each case.

To use integer programming effectively, GRIP decomposes the large routing problem into smaller subproblems. Each subproblem corresponds to a rectangular subregion on the chip together with its net assignments. The smaller subproblems are solved individually, and the route fragments of the same net in adjacent subproblems are later connected. A final phase is applied to reduce overflow in the congested regions. The above steps are based on solving an integer program (IP) that aims to select one route for each net from a set of promising candidate routes.

In the simulation results, GRIP achieves average improvements of 9.23% and 5.24% in the total cost (i.e., wirelength and via cost) for the ISPD’07 and ISPD’08 benchmarks, respectively. These results are compared to the best solution reported for each case from four state-of-the-art academic global routers. The significant improvement is possible due to a combination of the concurrent nature of IP, effective pricing for candidate route generation, working directly with the 3-D model of the problem, effective decomposition into subproblems, and effective recombination of the route fragments of adjacent subproblems.

The remainder of this chapter is divided into seven sections. In Section 3.1, an integer program (IP) for the global routing problem is discussed, which minimizes the wirelength of the routed nets as its primary objective. To solve the IP effectively, Section 3.2 proposes a linear-programming pricing procedure that generates a set of candidate routes for each net. Section 3.3 discusses the decomposition of the original problem into smaller subproblems corresponding to subregions on the chip. It also discusses the procedure to solve each subproblem using a newly introduced concept called “floating terminal”, as well as integrating and connecting the solutions of different subproblems to obtain a complete solution. Consequently, the runtime of GRIP’s procedure depends on the number of decomposed subproblems, some of which can be processed in parallel. In Section 3.4, the IP in GRIP is extended to minimize overflow as a secondary phase, and an algorithmic procedure is introduced to apply this IP in selected congested regions on the chip. Computational results are reported in Section 3.6.

3.1 An Integer Program for Global Routing

In a mathematical description of the Global Routing problem, we are given a grid-graph G = (V,E)

describing the network topology, a set of (multi-terminal) nets given by N = {T1,T2, . . . ,TN}, (with

Ti ⊂ V ), and edge capacities ue and edge costs ce ∀e ∈ E. Denote by T (Ti) the collection of all

Steiner trees (routes) connecting the terminals in Ti, and let the parameter ate = 1 if Steiner tree t

contains edge e ∈ E, ate = 0 otherwise. Define the binary decision variable xit that is equal to 1 if

and only if net Ti is routed with route t ∈T (Ti). An integer program for the Global Routing problem

can be written as

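\begin{align*}
\min_{x,s}\quad & \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} c_{it} x_{it} + \sum_{i=1}^{N} M s_i \tag{ILP-GR}\\
\text{s.t.}\quad & \sum_{t \in \mathcal{T}(T_i)} x_{it} + s_i = 1 \qquad \forall i = 1,\dots,N, \tag{3.1}\\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} a_{te} x_{it} \le u_e \qquad \forall e \in E, \tag{3.2}\\
& x_{it} \in \{0,1\} \qquad \forall i = 1,\dots,N,\ \forall t \in \mathcal{T}(T_i),\\
& s_i \ge 0 \qquad \forall i = 1,\dots,N.
\end{align*}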

The parameter cit is the cost of route t for net Ti which is computed as the total length of the

3-D route, cit = ∑e∋t ce, where the notation e ∋ t denotes that edge e ∈ E is contained in route

t ∈ T (Ti). The equations (3.1) in the (ILP-GR) formulation enforce the routing of each net. The

decision variable si will be positive if net Ti cannot be routed. The objective function trades off the

total routing length with the number of nets that are routed. Typically M is chosen sufficiently large

to ensure that all the nets are routed. The equations (3.2) in the (ILP-GR) formulation ensure that

the given edge capacities are not exceeded.
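To make the formulation concrete, the following is a minimal Python sketch that solves (ILP-GR) by exhaustive enumeration on a toy instance. It is an illustration only, not GRIP's method: GRIP relies on an IP solver, and the function name `solve_ilp_gr`, the data layout, and the unit edge costs (ce = 1) are assumptions made here.

```python
from itertools import product

def solve_ilp_gr(routes, capacity, M):
    """Brute-force (ILP-GR) on a toy instance (illustrative helper, not from GRIP).

    routes[i] lists the candidate routes of net i; a route is a tuple of edge
    names and costs len(route), i.e. c_e = 1 for every edge (assumption).
    Choosing None for a net corresponds to s_i = 1 (unrouted, penalty M).
    """
    best_cost, best_choice = None, None
    options = [list(r) + [None] for r in routes]
    for choice in product(*options):          # exactly one option per net: (3.1)
        usage, cost = {}, 0
        for route in choice:
            if route is None:
                cost += M
                continue
            cost += len(route)
            for e in route:
                usage[e] = usage.get(e, 0) + 1
        if any(usage[e] > capacity[e] for e in usage):
            continue                          # capacity constraint (3.2) violated
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice
```

With two nets that both prefer the same unit-capacity edge, the enumeration moves one of them to a longer route instead of paying the penalty M, which is the trade-off the objective encodes.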


The formulation (ILP-GR) has a number of appealing properties.

1. The exact properties of the route, such as topology and metal layer can be incorporated into

the cost cit of a route, similar to [61]. The formulation can thus directly handle the 3-D Global

Routing problem, avoiding the traditional layer-assignment phase which can be a source of

sub-optimality.

2. The cost of a route can correspond to any other metric such as the area-capacitance of the

route over multiple metal layers.

3. The formulation does not require that the nets be broken a priori into two-terminal sub-nets. Breaking nets before routing can be a significant source of sub-optimality in the final routing [61]. We note that the final version of GRIP does perform some “net-breaking” to define subproblems for scalability.

4. The slack variables si and the corresponding objective penalty factor M push the optimization toward a zero-overflow routing solution. The model is quite flexible: with minor modifications, the integer program can be set to minimize the total overflow.

A significant disadvantage of the formulation (ILP-GR) is its size. First, for a given net Ti, the

number of all the decision variables for this net is equal to |T (Ti)|—the number of possible Steiner

trees connecting the terminals in Ti in the 3-D global routing grid-graph. Second, the number of nets

N and edges E may also be very large. Nevertheless, we use (ILP-GR) as the basis of GRIP. In the

subsequent discussion, we outline the manner in which we deal with the issues posed by the large

formulation size.

3.2 Solution Procedure via Price-and-Branch

GRIP’s procedure to obtain an approximate solution to the above large-scale integer program (IP)

consists of two phases, as shown in Figure 3.1. First, a pricing procedure is used to generate a

set of candidate routes for each net. Second, branch-and-bound is applied to solve the (ILP-GR)

formulation using only the set of generated candidate routes. This two-phase heuristic procedure is

commonly known as price-and-branch [8], [38].


3.2.1 Overview of Candidate Route Generation

To generate a set of candidate routes for each net, GRIP solves a linear-programming (LP) relaxation

of (ILP-GR), a relaxation obtained by replacing the binary requirements on the variables xit ∈ {0,1}

with the weaker constraints 0 ≤ xit ≤ 1. The linear program is solved by a column-generation (CG)

procedure [24] during which a subset of all possible routes (GRIP’s candidate routes) are identified.

Solving the LP relaxation via column generation guarantees obtaining the optimal solution to the

linear program, as if all the routes were explicitly considered.

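To describe the CG procedure, it is helpful to consider the dual (LPD-GR) of the linear programming relaxation of (ILP-GR):

\begin{align*}
\max_{\lambda \le M,\ \pi \le 0}\quad & \sum_{i \in \mathcal{N}} \lambda_i + \sum_{e \in E} \pi_e u_e \tag{LPD-GR}\\
\text{s.t.}\quad & \lambda_i + \sum_{e \ni t} \pi_e \le c_{it} \qquad \forall i = 1,\dots,N,\ \forall t \in \mathcal{T}(T_i). \tag{3.3}
\end{align*}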
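As a brief check of where the bounds in (LPD-GR) come from, the following is a standard LP-duality argument that is not spelled out in the text. Associate λi with constraint (3.1) and πe with constraint (3.2). Each primal variable then yields one dual constraint, and the sense of (3.2) fixes the sign of πe. The upper bound xit ≤ 1 of the relaxation is implied by (3.1) together with si ≥ 0, so it needs no dual variable of its own.

\begin{align*}
x_{it} \ge 0 \;&\Longrightarrow\; \lambda_i + \sum_{e \in E} a_{te}\,\pi_e = \lambda_i + \sum_{e \ni t} \pi_e \le c_{it},\\
s_i \ge 0 \;&\Longrightarrow\; \lambda_i \le M,\\
\text{(3.2) is a } \le \text{ constraint in a minimization} \;&\Longrightarrow\; \pi_e \le 0.
\end{align*}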

In a column generation procedure, only a small subset of all possible routes is explicitly included

in the LP relaxation of (ILP-GR). Let S (Ti)⊂T (Ti) be the set of routes considered for net Ti. The

restricted master problem for (ILP-GR) is

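\begin{align*}
\min_{x \ge 0,\ s \ge 0}\quad & \sum_{i=1}^{N} \sum_{t \in \mathcal{S}(T_i)} c_{it} x_{it} + \sum_{i=1}^{N} M s_i \tag{RMLP-GR}\\
\text{s.t.}\quad & \sum_{t \in \mathcal{S}(T_i)} x_{it} + s_i = 1 \qquad \forall i = 1,\dots,N,\\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{S}(T_i)} a_{te} x_{it} \le u_e \qquad \forall e \in E.
\end{align*}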

Solving (RMLP-GR) yields a (primal) solution (x, s) as well as values λ ≤ M and π ≤ 0 for the

dual variables in (LPD-GR). By linear programming duality, if the values (λ , π) satisfy all the dual

constraints in (3.3), then (x, s) is an optimal solution to the LP relaxation of (ILP-GR). If not, then

the violated dual constraint suggests that adding the associated column (as a new route variable) to

(RMLP-GR) may reduce its objective value, and the process repeats.

To determine if the dual values (λ , π) (generated from (RMLP-GR) by considering only S (Ti))

are feasible in (LPD-GR) (which includes T (Ti)), we must determine if there exists at least one


route t ∈ T (Ti) with λi +∑e∋t πe > cit .

This is itself an optimization problem, known as the pricing problem. The pricing problem can

be decomposed into independent problems for each individual net i = 1, . . . ,N. Specifically, the

pricing problem for net Ti is

\[ \min_{t} \Big\{ c_{it} - \sum_{e \ni t} \pi_e \;\Big|\; t \in \mathcal{T}(T_i) \Big\}. \tag{PP($T_i$)} \]

If the optimal solution value of the formulation (PP(Ti)) is sufficiently small (< λi), then the values (λ, π) are not dual feasible. Specifically, let t∗ be an optimal solution to the formulation (PP(Ti)). If

\[ c_{it^*} - \sum_{e \ni t^*} \pi_e < \lambda_i, \tag{3.4} \]

then t∗ identifies a violated constraint (3.3) in the formulation (LPD-GR). The current solution to (RMLP-GR) can thus be improved by updating S(Ti) to include t∗ as a new column (route).

The CG procedure is summarized as follows:

0. For each i = 1, . . . ,N, initialize S (Ti) with at least one route. (GRIP uses the route generated

for net Ti by the package Flute [18]).

1. Solve the formulation (RMLP-GR), yielding a primal solution (x, s) and dual values (λ , π).

2. For each i = 1, . . . ,N, solve the formulation (PP(Ti)), yielding a route t∗. If cit∗ − ∑e∋t∗ πe < λi,

then S(Ti) = S(Ti) ∪ {t∗}.

3. If an improving route for some net Ti was found in step 2, return to step 1. Otherwise, stop—

the solution (x, s) is an optimal solution to the LP relaxation of (ILP-GR).
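The test applied in step 2 can be sketched as follows. This is a minimal illustration, assuming a hypothetical pool of already-constructed routes to check. GRIP instead searches for such routes heuristically (Section 3.2.2). The function name and the dictionary-based data layout are assumptions.

```python
def find_violating_routes(candidates, lam, pi, cost):
    """Return, per net, the routes whose dual constraint (3.3) is violated.

    candidates[i]: iterable of routes (tuples of edges) to test for net i
    lam[i]: dual value of constraint (3.1); pi[e]: dual value of (3.2)
    cost[e]: edge cost c_e
    (All names are illustrative; this is not GRIP's actual interface.)
    """
    improving = {}
    for i, routes in candidates.items():
        for t in routes:
            c_it = sum(cost[e] for e in t)
            priced = c_it - sum(pi[e] for e in t)  # objective of (PP(T_i))
            if priced < lam[i]:                    # condition (3.4)
                improving.setdefault(i, []).append(t)
    return improving
```

A route that passes this test has negative reduced cost, so adding it as a column may lower the objective of (RMLP-GR).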

Figure 3.1 illustrates these steps. First, an initial route for each net Ti is generated by Flute in step 0. These routes are very close to the minimum Steiner trees for the nets. The total of their wirelength is likely to give a lower bound on the total wirelength in an optimal solution to the Global Routing problem. On the other hand, these routes are initially 2-D routes that only use the lowest horizontal and vertical layers and would result in significant overflow if all used in combination.

Figure 3.1: Overview of the price-and-branch procedure for GRIP. (Flowchart. Price: Step 0, generate an initial candidate route for each net using the package Flute; Step 1, solve (RMLP-GR) using candidate routes to get dual values for (LPD-GR); Step 2, solve the pricing for each net; Step 3, if there is a new route, add it to the candidate routes and return to Step 1. Branch: approximately solve (ILP-GR) using a branch-and-bound solver for the generated candidate routes.)

After step 1, in a primal solution (x, s) with dual values (λ , π), nets Ti ∈ N that are not able

to be routed completely with the existing Steiner trees in the set S (Ti) will have si > 0 and (by

the complementary slackness condition of linear programming) λi = M. Also by linear programming

duality theory, the dual variable πe is the rate of change of the optimal objective value of

(RMLP-GR) per unit change in ue, the capacity of edge e. By observing the condition (3.4), the CG

procedure will naturally seek to find routes for nets Ti with large λi, routing them with edges that

have πe as close to zero as possible. Ideally the routes would use edges with πe = 0, which implies

(again by complementary slackness) that the edge is not being used to capacity. In this way, one can

imagine that the CG procedure helps to iteratively disperse the initial nets from the lower layers to


upper layers and from the congested areas to less congested ones. As a result, it is unnecessary to

utilize layer assignment to manipulate the initial 2-D routes.

To summarize, the strengths of the pricing procedure are:

• When generating new candidate routes at each iteration of the CG procedure, the impact of the candidate routes of previous iterations is effectively taken into account. This is done by re-solving (RMLP-GR), which incorporates the impact of all the existing routes, to get a new fractional solution.

• Within each iteration, solving the pricing problem effectively identifies new candidate routes, since the objective of (RMLP-GR) is always improved. Moreover, a measure of current congestion is also incorporated in selecting the nets to price (see Section 3.2.3).

In our computational experience with the CG procedure in GRIP, the objective value of (RMLP-GR) improved quickly in the first iterations, but the rate of improvement decreased significantly in later iterations. This “tailing off” phenomenon is very common in CG procedures [25]. The improving routes at later iterations of the algorithm almost always come from nets outside the highly congested areas. Further, the wirelengths of the improving routes are almost identical to those of the trees currently available for routing. In these cases, adding the routes to (RMLP-GR) makes little or no improvement to the objective value. A significant portion of the runtime of the CG procedure can be spent on iterations that improve the objective value of (RMLP-GR) only marginally. Thus, to reduce solution time, GRIP typically stops the procedure once the solution value has tailed off. Specifically, if the objective value of (RMLP-GR) has improved by less than 10 wirelength units over the last 20 iterations, the CG procedure is terminated. Next, we discuss details of steps 2 and 3 of the pricing procedure.
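The stopping rule can be expressed in a few lines. The sketch below assumes that the objective value of (RMLP-GR) is recorded once per CG iteration in a list; the function name and argument names are illustrative, while the defaults (20 iterations, 10 wirelength units) are the values quoted above.

```python
def has_tailed_off(objective_history, window=20, min_gain=10.0):
    """True if the objective improved by less than `min_gain` over the last
    `window` iterations. objective_history holds one value per CG iteration."""
    if len(objective_history) <= window:
        return False  # not enough iterations yet to judge
    gain = objective_history[-window - 1] - objective_history[-1]
    return gain < min_gain
```

Since the objective of (RMLP-GR) never increases when columns are added, comparing the current value with the value `window` iterations earlier measures the total gain over that span.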

3.2.2 Solving the Pricing Problem for One Net

In the pricing phase (step 2) of the CG procedure, GRIP solves (PP(Ti)) for each net. We rewrite the objective expression of (PP(Ti)) as

\[ c_{it} - \sum_{e \ni t} \pi_e = \sum_{e \ni t} (c_e - \pi_e), \tag{3.5} \]

where ce is the cost associated with edge e (e.g., ce=1 when considering wirelength and via count). To minimize the above objective for net Ti, GRIP considers a weighted graph with edge weights we = ce − πe. Minimizing the objective of (PP(Ti)) requires finding the smallest-weight Steiner tree on this weighted graph. Finding a minimum-weight Steiner tree is in general NP-Hard [28], so GRIP adopts a (heuristic) approach for finding columns that reduce the objective value of (RMLP-GR) based on local search.

Figure 3.2: Improving routes via the shortest path algorithm on a weighted grid-graph.

Within the pricing problem, condition (3.4) should be evaluated in step 3 of the CG procedure. Given a dual solution (λ, π), the reduced cost of route t of net Ti is

\[ \bar{c}_{it} = c_{it} - \sum_{e \ni t} \pi_e - \lambda_i. \tag{3.6} \]

Consequently, the pricing problem can be viewed as a procedure for identifying a Steiner tree t for net Ti whose reduced cost c̄it < 0. By the complementary slackness condition of linear programming, for any optimal solution (x, s) to (RMLP-GR) and corresponding dual values (λ, π), the reduced cost c̄it = 0 if xit > 0.

GRIP’s local improvement procedure for solving (PP(Ti)) uses this fact as well as the following simple observation. Given a route t ∈ S(Ti), let V(t) be the set of vertices of the terminals and Steiner points in t. If the variable xit > 0, and if there exists a path P′ connecting two vertices u, v ∈ V(t) such that the weight of P′ (with respect to the weights w) is less than the weight of the path P from u to v using edges in t, then the reduced cost of the tree t′ = (t ∪ P′) \ P is negative. Thus, adding the variable corresponding to route t′ to (RMLP-GR) may reduce its objective value. Figure 3.2 shows the insertion of such a u-v path into a base Steiner tree.

To approximately solve (PP(Ti)) for a net Ti, GRIP starts with the tree t ∈ S(Ti) with the largest value of xit. Using edge weights we = ce − πe, a single-source shortest path problem is solved for some u-v paths, where u, v ∈ V(t). If the new u-v path has smaller length than the existing path, a route with negative reduced cost has been identified. To identify sources and sinks for the shortest path problems, the selected route t is decomposed into a set of two-terminal segments rjt by breaking it at the Steiner points of t. The segments are considered in descending order of their weight ∑e∋rjt we. When considering segment rjt, the remaining segments of t are considered as a “base Steiner tree”, and an alternative route for the segment rjt must be found to connect to this base. Zeroing the weights (we = 0 ∀e ∋ t \ rjt) and running Dijkstra’s single-source shortest path algorithm will connect the segment rjt to the base tree in a minimum-cost fashion [61], [76].
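The rerouting step can be sketched with a textbook Dijkstra implementation. The code below is a minimal illustration under stated assumptions, not GRIP's implementation: the graph is an adjacency dictionary, edges are keyed by `frozenset({u, v})`, and every base-tree vertex is seeded at distance 0, which has the same effect as zeroing the base-tree edge weights when the base tree is connected. Because πe ≤ 0, all weights we = ce − πe are non-negative, which Dijkstra's algorithm requires.

```python
import heapq

def reroute_segment(adj, cost, pi, base_vertices, target):
    """Cheapest connection from the base tree to `target` under w_e = c_e - pi_e.

    adj: vertex -> list of neighbours; cost, pi: keyed by frozenset({u, v}).
    Assumes `target` is reachable. Names are illustrative, not from GRIP.
    """
    dist = {v: 0.0 for v in base_vertices}   # base tree costs nothing to reuse
    prev = {}
    heap = [(0.0, v) for v in base_vertices]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                         # stale heap entry
        if u == target:
            break
        for v in adj[u]:
            e = frozenset((u, v))
            nd = d + cost[e] - pi.get(e, 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] in prev:                  # walk back to the base tree
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

The returned path starts at the base-tree vertex where the new segment attaches, which need not be the original Steiner point.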

Dijkstra’s single-source shortest path algorithm [26] generates an entire tree of shortest path

weights, thus possibly identifying many routes that would reduce the objective value of (RMLP-GR).

At each iteration of GRIP, a subset of these routes is added as new columns by uniformly sampling

from all the identified routes. At most 40 routes will be added for the nets inside the congested area

(defined in Section 3.2.3), and at most 16 routes are added for nets outside the congested area.
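A minimal sketch of this sampling rule, assuming the improving routes for a net have already been collected in a list. The function name is hypothetical; the limits 40 and 16 are the values stated above.

```python
import random

def sample_new_columns(improving_routes, in_congested_area, rng=random):
    """Uniformly sample the routes to add as new columns for one net."""
    limit = 40 if in_congested_area else 16
    routes = list(improving_routes)
    if len(routes) <= limit:
        return routes              # few enough routes: keep them all
    return rng.sample(routes, limit)
```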

Figure 3.3 demonstrates how new routes are constructed by rerouting two-terminal u-v segments. In the figure, the cost and capacity of each edge is 1, and there are two initial routes ta and tb for nets Ta and Tb, respectively. After solving (RMLP-GR), GRIP sets the edge weights to we = ce − πe. These edge weights are shown in Figure 3.3(a). Note that the two edges with overflow have large negative dual values (πe << 0), resulting in large positive edge weights. (The penalty for not completely routing a tree is M = 100 in this example.) Based on these edge weights, the cost of routes ta and tb is 205.

Assume that net Ta is selected for pricing first. As shown in Figure 3.3(a), tree ta can be decomposed into three segments, each including a terminal. The total edge weight is maximum in the segment that includes the two edges with large weights, so GRIP starts by rerouting this segment, using the remaining ones for the base Steiner tree [61], [76]. To reroute the segment, it is removed from ta, and the edge weights of the remaining edges on the base Steiner tree are set to zero, as shown in Figure 3.3(b). Thus, GRIP considers the base tree as a backbone when reconnecting ua and va using Dijkstra’s algorithm.

Figure 3.3: Procedure to identify new candidate routes with reduced cost via rerouting segments of an existing route. (Panels (a) to (d); edge-weight labels omitted.)

After reconnecting, an improved route t′a, avoiding the highly-weighted edges, is identified with a cost of 10 units. In a similar fashion, GRIP considers net Tb and reroutes the segment ub-vb as shown in Figure 3.3(c). The new segment t′b for net Tb has a new cost of 9 units. If the reduced costs of t′a and t′b are less than zero, then these routes are added to (RMLP-GR) as new candidate routes. An interesting feature of this pricing algorithm is that the new routes can use different Steiner points than the original ones.


3.2.3 Selecting Nets to Price

For large instances of (ILP-GR), the CG procedure can be significantly accelerated by only solving

the pricing problem (PP(Ti)) for a subset of all the nets. To select the nets Ti ∈ N for which

(PP(Ti)) is solved, GRIP takes advantage of information provided by the solution of (RMLP-GR).

Specifically, if si > 0, then net Ti is not completely routed using the existing routes in S(Ti), so net Ti

will be priced. GRIP first prices all such nets in descending order of si.

To decide whether or not to price the remaining nets with si = 0, GRIP considers measures of congestion in the current LP solution to (RMLP-GR). In the first measure, congested edges are those edges e that have the most negative value of πe. The intuition is that πe provides the rate of change in the objective function of (RMLP-GR) per unit additional capacity on edge e. The second measure identifies a congested edge by letting \(r_i \in \arg\max_{t \in \mathcal{S}(T_i)} x_{it}\) be a route for net Ti with the largest solution value in (RMLP-GR). The value \(\eta_e = \sum_{i=1}^{N} a_{r_i e}\) is the number of units of capacity on edge e that would be used if the routes ri were used for each net Ti ∈ N. If (ηe − ue) is large, then e is highly congested.
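The second congestion measure can be computed directly from the fractional LP solution. The sketch below assumes the solution is stored as a dictionary from each net to a mapping of its candidate routes to their values xit; this layout and the function name are assumptions for illustration.

```python
def congestion_by_dominant_route(x, capacity):
    """Return eta_e - u_e for every edge e.

    x[i]: dict mapping each candidate route (tuple of edges) of net i to x_it
    capacity[e]: edge capacity u_e
    """
    eta = {e: 0 for e in capacity}
    for routes in x.values():
        r_i = max(routes, key=routes.get)  # route with the largest x_it
        for e in r_i:
            eta[e] += 1
    return {e: eta[e] - capacity[e] for e in capacity}
```

Edges with the largest positive values would be the ones flagged as congested under this measure.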

GRIP defines a bounding box (of 3x3 units of grid edges) around an identified-congested edge

e. All nets Ti that contain a terminal inside the bounding box are repriced. GRIP first reprices nets

that are identified by the first congestion measure, followed by nets found by the second measure.

3.2.4 Branch and Bound

Once the CG procedure for the solution of the LP relaxation of (ILP-GR) is complete, either because

no improving routes were found in the pricing phase, or because tailing off was detected, a promising

candidate subset of routes S (Ti)⊂T (Ti) has been identified for each net Ti. Using only these route

variables, the integer program (ILP-GR) is formulated and solved by a black-box commercial integer

programming package. The solution returned by the solver is a feasible solution to the problem.

The proposed approach, based on the direct solution of (ILP-GR), has significant promise to improve the solution quality of existing academic global routers. For example, using this approach, we solved the small 2-D IBM01 circuit of the ISPD’98 suite [1] and were able to improve the wirelength by approximately 5% compared to the best solution found by FGR [61], without any overflow. However, the runtime to achieve this high-quality solution for such a relatively small instance was prohibitively long (a few hours). Thus, in the following section, we discuss mechanisms for decomposing the full global routing (ILP-GR) problem into smaller instances and procedures for combining the solutions in order to generate high-quality solutions to large-scale Global Routing instances. The decomposition procedure considerably reduces the overall runtime.

3.3 Decomposition for Scalability

Many existing global routing algorithms define reasonably-sized subproblems and create a full solution

from integrating the solutions to these subproblems. For example, to achieve a good runtime,

BoxRouter [16] starts by solving an IP over a small rectangular box on the chip and progressively

increases the size of the box to generate new IPs, fixing the solution to the previous IP. However,

this solution fixing when increasing the box size may lead to a degradation in solution quality. The

work [77] proposes a hierarchical IP approach that first solves a small IP to plan the routing of the

longest nets. However, the impact of the shorter nets is neglected.

As demonstrated in Section 3.2, the price-and-branch procedure has the potential to find high-quality solutions, but needs to be accelerated. In this section, we discuss ideas for decomposing the integer program (ILP-GR) into smaller ones that correspond to non-overlapping rectangular areas on the chip, together with their net assignments. For example, if all the terminals of a net fall within a rectangle, then the net is assigned to that subproblem and is bound to be routed inside the rectangle. We first discuss how GRIP’s IP-based procedure can be applied to solve one subproblem. We then discuss the procedures to define subproblems and integrate their solutions.

3.3.1 Solving the Subproblems

A subproblem is characterized by a rectangle on the chip referred to as a subregion, together with a set of nets that must be routed within that area. For some nets, all terminals will lie within the rectangle, but for longer nets, some (or all) of their terminals might be outside the rectangle. Nets whose terminals do not all fall within the rectangle are referred to as inter-region nets. Inter-region nets are partially routed by each subproblem, and subsequently their segments in different subproblems are connected.

To be applicable in a decomposition-based procedure, GRIP must handle the case when a subproblem includes both within-region and inter-region nets. GRIP’s procedure works as follows. Each subproblem defines a new grid-graph G′(V′,E′) and a set of nets N′ ⊂ N. The set N′ is composed of two types of nets: the within-region nets that have all terminals inside the subproblem (Ti ⊆ V′), and the inter-region nets that have at least one terminal outside the subproblem (Ti ⊄ V′). Figure 3.4(a) shows the latter type of net. The net in the figure belongs to three different subproblems. The neighboring boundaries of these subproblems are shown in bold. The routing problem for the bottom-left subproblem views this net as having one fixed and two “floating” terminals. Each floating terminal represents a portion of a subproblem boundary through which the net will connect to another subproblem.

To route inter-region nets in a subproblem, GRIP represents each floating terminal using an auxiliary node that is added to the set of nodes V′ in the grid-graph. Edges connecting the nodes that are on the subproblem boundary to their corresponding auxiliary node are added to the set E′. The added edges have infinite capacity and zero cost in the definition of the integer program (ILP-GR). Figure 3.4(b) illustrates the addition of auxiliary nodes and edges. After applying this simple construction, the integer program (ILP-GR) is well-defined, and can be solved by the procedure outlined in Section 3.2.
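The auxiliary-node construction can be sketched as a small graph update. The code assumes an adjacency-dictionary graph with capacities and costs keyed by `frozenset({u, v})`; these structures and the function name are illustrative assumptions rather than GRIP's data model.

```python
def add_floating_terminal(adj, capacity, cost, boundary_nodes, aux_name):
    """Add one auxiliary node representing a floating terminal.

    Every node on the given boundary portion is linked to the auxiliary node
    by an edge of infinite capacity and zero cost.
    """
    adj[aux_name] = []
    for v in boundary_nodes:
        adj[v].append(aux_name)
        adj[aux_name].append(v)
        e = frozenset((v, aux_name))
        capacity[e] = float("inf")   # never a binding capacity constraint
        cost[e] = 0                  # does not contribute to wirelength
    return aux_name
```

Treating `aux_name` as an ordinary terminal of the net then lets the router choose any boundary node as the crossing point at no extra cost.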

The example of Figure 3.4(b) is for 2-D routing, but in the general 3-D case, each boundary of

a subproblem is a plane and the graph G′ extends to the third dimension, as shown in Figure 3.4(c).

The nodes on this vertical boundary plane are connected to their corresponding auxiliary node.

3.3.2 Decomposition into subproblems

The challenge of decomposing the problem into subproblems is best understood by means of our

initial computational experience. Our first decomposition approach was to define a uniform grid of

subproblems consisting of the same area. Net assignment to the subproblems was based on their

routings by Flute.

This natural but naive decomposition approach resulted in the IPs corresponding to the congested subproblems taking significantly longer to be solved by our procedure (e.g., hours for the congested subproblems and minutes for the less congested ones). Thus, an important objective of the subproblem definition is to achieve balance, resulting in “equally-difficult” problems that take approximately the same time to solve.

Figure 3.4: Modifying grid-graph of a subproblem to handle floating terminals. (Panels (a), (b), (c); the legend marks the floating terminals and the auxiliary node.)

GRIP’s procedure for defining subproblems begins by routing all the nets using the 2-D Steiner

route generated by Flute [18]. For each grid edge e of the 2-D problem, a utilization factor is

defined as the ratio of the number of (Flute) routes that cross edge e to its (projected) capacity ue.

The utilization factor plays an important role in defining the subproblem boundaries.

Next, GRIP applies a recursive bi-partitioning strategy, trying to balance an average edge utilization factor (AEU) for each region. At each step, one rectangular partition is divided into two new rectangles such that the AEU is balanced between the two. The AEU for a partition is defined as the average of the utilization factors of the grid edges in the corresponding rectangle. Moreover, to decide between a vertical or horizontal partitioning, GRIP chooses the one that results in the smaller aspect ratio of the generated rectangles. The recursive bi-partitioning stops when any of the sides of the current partition is less than or equal to 32 units of the routing grid, a size empirically set to generate an IP that can typically be solved by the procedure outlined in Section 3.2 in an acceptable runtime. This partition will then be taken as a subproblem. Figure 3.5(a) shows an example after the first two steps.
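One possible reading of this bi-partitioning procedure is sketched below. Several details are assumptions not fixed by the text: utilization is stored per grid cell in `util[y][x]` rather than per edge, the cut in each direction is the one that minimizes the AEU difference (ties broken toward the middle), and the direction is chosen by the smaller worst-case aspect ratio. The parameter `min_side` must be at least 1.

```python
def aeu(util, rect):
    """Average utilization over rect = (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = rect
    cells = [util[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(cells) / len(cells)

def aspect(rect):
    x0, y0, x1, y1 = rect
    w, h = x1 - x0, y1 - y0
    return max(w, h) / min(w, h)

def split(rect, axis, cut):
    x0, y0, x1, y1 = rect
    if axis == "x":
        return (x0, y0, cut, y1), (cut, y0, x1, y1)
    return (x0, y0, x1, cut), (x0, cut, x1, y1)

def bipartition(util, rect, min_side=32):
    """Recursively split rect until one side is <= min_side."""
    x0, y0, x1, y1 = rect
    if min(x1 - x0, y1 - y0) <= min_side:
        return [rect]
    options = []
    for axis, lo, hi in (("x", x0, x1), ("y", y0, y1)):
        def score(cut):
            a, b = split(rect, axis, cut)
            # balance the AEU first; prefer central cuts on ties (assumption)
            return (abs(aeu(util, a) - aeu(util, b)), abs(2 * cut - lo - hi))
        cut = min(range(lo + 1, hi), key=score)
        a, b = split(rect, axis, cut)
        options.append((max(aspect(a), aspect(b)), a, b))
    _, a, b = min(options, key=lambda o: o[0])
    return bipartition(util, a, min_side) + bipartition(util, b, min_side)
```

The direct summation in `aeu` keeps the sketch short; a prefix-sum table would be the natural replacement for realistic grid sizes.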

Once the subproblems are created, the net assignments suggested by Flute are further improved

by considering the congestion of the subproblem.

Figure 3.5: (a) Defining subproblems using initial Flute-based net planning (before detouring). (b) Improving net assignment to the subproblems via detouring (after detouring).

Figure 3.5(a) illustrates this point. The two nets T1 and T2 are routed using their Steiner routes,

both of which pass through subproblem A. Net T1 does not have any terminals inside subproblem

A. If A is congested, it is better to detour T1 from A, as shown in Figure 3.5(b), reserving the routing

resources for nets that must be routed into the subproblem.

To improve the net assignments to the subproblems, GRIP relies on the fact that subproblems are

solved in a congestion-based ordering, as described in Section 3.3.3. Before solving a subproblem,

GRIP detours as many nets as possible that “pass” through the corresponding subproblem (i.e., do

not have a terminal in it). The remaining (undetoured) nets inside the subproblem are the ones assigned

to it and the corresponding subproblem is then solved. The procedure repeats before solving

the subsequent subproblem.

To detour routes out of a subproblem, a shortest path algorithm is used. For a net that does not have any terminals in the current subproblem, we identify the segment (using its Flute route) that passes through the subproblem, and consequently the two terminals that are connected by this segment. The two terminals are reconnected via a new segment back to the tree backbone using the same maze routing procedure explained in Section 3.2.2 (see Figure 3.5). The weights on the grid graph for the shortest path problem are set as follows. Since the net should be detoured outside the subproblem, the weights of all the grid edges inside the subproblem are set to infinity. For the remaining edges, if an edge is used to capacity by the existing (Flute and detoured Flute) routes, the weight is set to a large positive number (=100). All other edges have a weight of 1. The detouring procedure in GRIP has the benefit that it is dynamic, continually updating the edge weights for rerouting every time a new subproblem is processed. Figure 3.6 shows an overview of GRIP’s flow to define and solve the subproblems.

Figure 3.6: GRIP’s procedure to define and solve the subproblems. (Flowchart: temporarily fix the nets using the Steiner routes generated by Flute; extract subregions by recursive bi-partitioning; then, starting from i = 0, detour inter-region nets from subregion i, solve the IP for subregion i, and increment i.)
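The three-level weighting rule for detouring can be written as a small function. The sketch assumes the set of edges inside the current subproblem, the current edge usage, and the capacities are available as Python containers; these names are illustrative. The values infinity, 100 and 1 are those given above.

```python
def detour_weight(edge, inside_subproblem, usage, capacity):
    """Edge weight for the shortest-path detour of a pass-through net."""
    if edge in inside_subproblem:
        return float("inf")      # the detour must leave the subproblem
    if usage.get(edge, 0) >= capacity[edge]:
        return 100               # edge already used to capacity
    return 1
```

Recomputing these weights before each subproblem is processed would reflect the dynamic updating described above, since `usage` changes as nets are detoured.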

3.3.3 Integration of the Solutions of the Subproblems

Thus far, we have explained how the subproblems are formed and how the net assignments are made

to define the subproblems. GRIP solves the subproblems in a rather sequential order with limited

parallelism. After all the subproblems are solved, a final phase connects the route segments which

pass neighboring subproblems. Both of these phases are explained in this section.

For each subproblem, GRIP first computes the total edge overflow (TEO). The TEO is the total

amount of overflow that would occur in the subproblem if the assigned Steiner (Flute and detoured

Flute) routes were used. Subproblems are processed in the decreasing order of their TEOs. Every

time a subproblem is processed, the floating terminals for a net Ti are fixed at a boundary of the

subproblem, as shown in Figure 3.7. Thus, the net Ti is partially routed, and subsequent subproblems

must respect this partial routing by assuming the imposed boundary-terminal is fixed. If two

consecutive subproblems (in terms of TEO) are not physically adjacent, GRIP processes them in

parallel.
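The TEO-based ordering can be sketched as below. The grouping rule in `schedule_steps` is one possible reading of the text (extend a step while the next subproblem in TEO order is not adjacent to any subproblem already in the step); the function names and data layout are assumptions, not GRIP’s code.

```python
def total_edge_overflow(usage, capacity):
    """TEO of one subproblem: overflow summed over its edges under the assigned Steiner routes."""
    return sum(max(0, usage[e] - capacity[e]) for e in usage)

def schedule_steps(teo, adjacent):
    """Group subproblems into processing steps in decreasing order of TEO.

    teo:      subproblem id -> TEO value
    adjacent: set of frozenset({a, b}) pairs of physically adjacent subproblems
    """
    order = sorted(teo, key=lambda s: -teo[s])
    steps = []
    for s in order:
        # Join the current step only if s touches none of its members.
        if steps and all(frozenset((s, t)) not in adjacent for t in steps[-1]):
            steps[-1].append(s)
        else:
            steps.append([s])
    return steps
```

Subproblems within one returned step could be solved in parallel, while the steps themselves are processed one after another.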

Figure 3.7: Connecting route-segments in adjacent subproblems. (Panels: Subregion 1 and Subregion 2. Legend: first Steiner point; fixed terminal.)

Solving all the subproblems fixes the locations of the floating terminals on the subproblem

boundaries. However, the subproblems are not connected, since the grid-edges between the bound-

aries of subproblems are not considered when solving each subproblem. GRIP uses the same IP-

based procedure to connect the route segments in adjacent subproblems. Specifically, after all the

subproblems are processed, GRIP fixes all the nets that completely fall within a subproblem. For the

inter-region nets, GRIP fixes a “backbone” inside each of its subproblems. To create the backbone,

GRIP removes the branch of the net that connects the boundary terminal to the first Steiner point of

the route in the subproblem. (See Figure 3.7.)

Once these connecting segments are removed, routing resources are freed. GRIP connects these

segments using the formulation (ILP-GR), first fixing all routes of within-region nets and backbones

of the inter-region nets. In the IP, the nets to be routed are two terminal nets crossing the inter-region

boundary, each terminal being a Steiner point of the backbone in the region. By setting the edge

weight we = 0 for all edges in the backbone, the IP effectively connects the two sub-nets at any

location on the backbones. When connecting two neighboring subproblems, remaining (unfixed)

capacity is allocated to the subproblem in a manner that ensures that neighboring subproblems in

all quadrants will be able to be effectively connected.


3.4 Handling Overflow

After connecting the subproblem solutions, GRIP evaluates if any net is left unrouted. In case all

the nets were routed, which we found to be the case in the majority of our tested benchmarks, GRIP

terminates. If nets were left unrouted, then routing those nets using any of the generated candidate

routes will introduce overflow (i.e., the corresponding slack variable in Equation (3.1) is 1). In this

section, we discuss an IP and the specifics of price-and-branch procedure to reduce overflow. We

then discuss how GRIP applies this procedure to selected areas on the chip.

3.4.1 Integer Program for Overflow Reduction

GRIP uses the following IP to minimize overflow:

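\begin{align}
\min_{x,\,o} \quad & \sum_{e \in E} Q_e\, o_e \tag{ILP-OV} \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T}(T_i)} x_{it} = 1 && \forall i = 1,\dots,N, \tag{3.7} \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} a_{te}\, x_{it} \le u_e + o_e && \forall e \in E, \tag{3.8} \\
& x_{it} \in \{0,1\} && \forall i = 1,\dots,N,\ \forall t \in \mathcal{T}(T_i), \notag \\
& o_e \ge 0 && \forall e \in E. \notag
\end{align}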

Compared to (ILP-GR), the slack variable si is removed from the net constraint (3.1), but a

new slack variable oe is added to the edge capacity constraints (3.8). The slack variable oe will be

positive if edge e contains overflow, and the objective is to minimize the overflow over all edges.

In (ILP-OV), Qe is a constant weight that can be set to a different value for each edge. In the case

Qe = 1 ∀e, the objective is to minimize the total overflow. GRIP sets Qe by considering the overflow

produced by routing the nets unrouted by the original IP-based procedure. To route an unrouted net,

GRIP selects from the candidate routes for that net, the route that would lead to minimum additional

overflow. If edge e contains overflow in this complete solution, Qe is set equal to the amount of

overflow. If it does not contain overflow, Qe is set equal to 1.
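The rule for setting the edge weights can be written in a few lines. This is a sketch under the assumption that edge usage in the complete solution and edge capacities are available as dictionaries; the function name is hypothetical.

```python
def overflow_weights(usage, capacity):
    """Q_e = overflow of edge e in the complete solution, or 1 if e has no overflow."""
    q = {}
    for e, cap in capacity.items():
        over = usage.get(e, 0) - cap
        q[e] = over if over > 0 else 1
    return q
```

With this choice, edges that overflow heavily in the complete solution are penalized more strongly in the objective of (ILP-OV).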


3.4.2 Solution Procedure via Price-and-Branch

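Similar to the IP-based procedure of Section 3.2, GRIP utilizes column generation to solve the LP relaxation of (ILP-OV). The dual of the LP relaxation of (ILP-OV) is

\begin{align}
\max \quad & \sum_{i=1}^{N} \lambda_i + \sum_{e \in E} \pi_e u_e \tag{LPD-OV} \\
\text{s.t.} \quad & \lambda_i + \sum_{e \in t} \pi_e \le 0 && \forall i = 1,\dots,N,\ \forall t \in \mathcal{T}(T_i), \notag \\
& -Q_e \le \pi_e \le 0 && \forall e \in E, \notag \\
& \lambda_i \ \text{free}. \notag
\end{align}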

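GRIP starts with a small subset of routes and solves the restricted master problem for (ILP-OV):

\begin{align}
\min_{x \ge 0,\ o \ge 0} \quad & \sum_{e \in E} Q_e\, o_e \tag{RMLP-OV} \\
\text{s.t.} \quad & \sum_{t \in \mathcal{S}(T_i)} x_{it} = 1 && \forall i = 1,\dots,N, \notag \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{S}(T_i)} a_{te}\, x_{it} \le u_e + o_e && \forall e \in E. \notag
\end{align}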

At the first iteration, S (Ti) only contains one route per net—the route used to obtain the

complete solution. GRIP solves (RMLP-OV) to obtain the dual values λ and π . The pricing

problem is solved to identify a new route t∗ that violates the first constraint of (LPD-OV) (i.e., $\lambda_i + \sum_{e \in t^*} \pi_e > 0$), indicating that the objective of (RMLP-OV) may be improved if t∗ is added to S (Ti).

When solving the pricing problem to identify a negative reduced cost route for each net, the

edge weight we = −πe is used. Note that πe ≤ 0, so we ≥ 0, and Dijkstra’s single-source shortest

path algorithm can be used to identify the promising routes, exactly as in the procedure described

in Section 3.2.2. Just as in the GRIP procedure for solving (ILP-GR), the linear program solution

process is terminated when tailing off is detected, and the resulting routes are given to a commercial

branch and bound solver to find an integer solution to (ILP-OV).
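For intuition, the pricing step for a single two-terminal net can be sketched as below, where the Steiner tree reduces to a path. The adjacency format, the function name, and the tolerance are assumptions for illustration; multi-terminal nets need the route-generation procedure of Section 3.2.2, which is not reproduced here.

```python
import heapq

def price_two_terminal_net(adj, pi, lam, source, target):
    """Shortest path under weights w_e = -pi_e, then test the dual constraint.

    adj: node -> list of (neighbor, edge_id)
    pi:  edge_id -> dual value pi_e (each <= 0)
    lam: dual value lambda_i of the net constraint
    Returns (route_edges, improving); improving is True when
    lam + sum(pi_e for e in route) > 0, i.e. the route violates (LPD-OV).
    """
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, e in adj[u]:
            nd = d - pi[e]  # edge weight is -pi_e >= 0
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, e)
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, False
    route = []
    node = target
    while node != source:
        node, e = prev[node]
        route.append(e)
    route.reverse()
    # lam + sum(pi_e) equals lam - dist[target]
    return route, (lam - dist[target]) > 1e-9
```

Since the path length under weights −πe equals −∑πe, the shortest path is also the route that maximizes the left-hand side of the dual constraint, so a single Dijkstra run decides whether any improving path exists for the net.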

Figure 3.8: Defining a few subproblems around the edges with overflow. (Legend: edge with max overflow; fixed terminal. Labels in the figure: A, B, T1.)

When selecting nets to price, GRIP first focuses on the nets that pass through the edges with overflow. Specifically, at one iteration of column generation, for each edge e with oe > 0, all the routes t with xit > 0 containing edge e ∈ E(t) are selected and repriced. Since the number of edges with oe > 0 is typically small, all the corresponding routes are selected, repriced, and their new route variables are added at the same iteration of column generation. The process repeats until no improving routes are found, or until it tails off, that is, until the objective value of (RMLP-OV) has not improved by more than 1 unit in the last 20 iterations. Once this procedure terminates, the resulting set of candidate routes is used to solve (ILP-OV) to obtain one route for each net.
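The tail-off test can be sketched as a small helper. The function name and the list-based history are assumptions; only the thresholds (1 unit, 20 iterations) come from the text.

```python
def tailed_off(objective_history, window=20, min_gain=1.0):
    """Return True when the (RMLP-OV) objective improved by at most `min_gain`
    over the last `window` column-generation iterations.

    objective_history: objective value after each iteration (minimization).
    """
    if len(objective_history) <= window:
        return False  # not enough iterations yet to judge
    return objective_history[-window - 1] - objective_history[-1] <= min_gain
```

A column-generation loop would call this after each solve of the restricted master problem and stop when it returns True or when pricing finds no improving route.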

3.4.3 Defining Subproblems for Overflow Reduction

After integrating the subproblem solutions, overflow was not observed in the majority of the tested

benchmarks. Only for three benchmarks in the ISPD’08 suite was overflow observed. Further,

overflow was typically confined to a very few “hot spots”. GRIP exploits this observation to define

small-sized subproblems with their net assignments on which to apply the IP-based procedure for

overflow reduction, described in Section 3.4.1. The routes on the other portions of the chip remain

intact.

GRIP defines the subproblems on which to reduce overflow in a sequential order but solves the

corresponding subproblems in parallel. To define the subproblems, GRIP traverses the grid edges in

descending order of their overflow values in a complete solution.

GRIP defines a rectangular subproblem of 40x40 grid edges centered at each overflow edge. If


an overflow edge is already included in a previously-defined subproblem, a new subproblem will

not be defined. Moreover, if defining the subproblem for an overflown edge results in overlap with

a previously defined subproblem, the new subproblem will be shifted until the overlap is removed.

All the routes inside a subproblem will be rerouted using the IP-based procedure for overflow

reduction. If a net has terminals outside the subproblem, fixed-terminal location(s) will be defined on

the subproblem boundary, based on its route generated from the previous steps. The fixed terminal

locations are honored when solving the subproblem. Figure 3.8 illustrates this point. This process

also ensures that the segments of the rerouted nets in congested subproblems will remain connected.
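The first part of the subproblem definition (visit overflow edges in descending order, skip edges already covered, center a 40x40 window on the rest) can be sketched as follows. This simplified illustration identifies each edge by a single grid coordinate and deliberately omits both the shifting of overlapping windows and clamping to the chip boundary; all names are hypothetical.

```python
def define_overflow_windows(overflow, size=40):
    """Return half-open windows (x0, y0, x1, y1) around overflow edges.

    overflow: (x, y) coordinate of an edge -> overflow amount in the complete solution.
    Note: overlap shifting, described in the text, is NOT modeled here.
    """
    windows = []
    for (x, y), amount in sorted(overflow.items(), key=lambda kv: -kv[1]):
        if amount <= 0:
            continue  # only edges with overflow trigger a subproblem
        covered = any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in windows)
        if covered:
            continue  # edge already lies in a previously defined subproblem
        x0, y0 = x - size // 2, y - size // 2
        windows.append((x0, y0, x0 + size, y0 + size))
    return windows
```

Each resulting window would then be rerouted with the overflow-reduction IP while routes outside the windows stay fixed.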

3.5 Comparison to Optimization-Based Methods

A number of other authors have proposed optimization-based methods for global routing, and the

purpose of this section is to attempt to place our work in context of these previous contributions.

An early description of applying a pricing procedure to solve the global routing problem is given

by [36]. This work is perhaps the most similar to the GRIP algorithm, in that it relies on column

generation, on defining subregions, and on pasting partial solutions together. In their work, there is

no mention of solving the IP, only its LP relaxation, and computational results are not reported.

The works [7] and [65] both focus on developing fast algorithms for approximately solving the

(full) LP relaxation of the global routing problem. In these approaches, the actual, primal, integer-

valued, routing solution is done by a randomized rounding procedure. This is quite different from

GRIP. GRIP is based on a price-and-branch approach for approximately solving the integer pro-

gram. So both procedures of solving the LP relaxation (pricing), and obtaining an integer solution

(branching) are different.

The paper [56] takes a similar approach to that of [7] and [65], but designs an algorithm that can specifically account for the effects of wire spacing during yield optimization. The work [38] is a full

branch-and-price procedure that mathematically shares many commonalities with GRIP. However,

the work [38] is designed specifically for the switchbox routing problem, and the instance sizes are

small enough so that region-based decomposition, done in GRIP, is not needed.

Similar to (ILP-GR), the paper [77] also suggests IP formulations for the global routing problem.

To solve the formulations in [77], column generation is not employed, but rather a set of possible


routes for each net is iteratively constructed during a congestion estimation phase. Generating routes

during the LP solution process, as done in GRIP, has the significant advantage of exploiting the dual

information to suggest good routes. The work [77] additionally considers a number of different

objectives besides wirelength or overflow. The paper [78] builds on the work of [77], by describing

different hierarchical approaches, where the routing problems are solved either top-down or bottom-up. Computational results are given for chips with up to around 25,000 cells and nets.

BoxRouter [17] uses IP formulations as a fundamental component of their algorithm. These IP

formulations are not Steiner-tree packing formulations like (ILP-GR). A fundamental idea behind

BoxRouter is that of progressive IP, where the IP formulation is solved first for a subregion, and

portions of this solution are fixed before proceeding to other regions. This is quite a different

approach than GRIP’s area-based decomposition and patching.

3.6 Simulation Results

We first evaluate the quality of the subproblems defined by GRIP’s procedure in terms of

reaching balance in computation. Next, we present the results reporting the solution quality and

runtime of GRIP.

3.6.1 Evaluation of Subproblem Definition

As discussed in Section 3.3.2, it is a challenge to decompose a large-sized global routing instance

into a set of “balanced” subproblems. A bad decomposition approach may lead to long runtime or

even an unroutable solution for the congested subproblems. To demonstrate the degree of balance of

GRIP’s subproblem definition, we considered the adaptec1 benchmark instance from the ISPD’07

benchmark suites [3].

We followed the same decomposition procedure as described in Section 3.3.2 for defining the

subproblems but applied a more aggressive detouring for the long inter-region nets and solved the

subproblems in parallel. Note that all the floating terminals of inter-region nets were modeled as

the auxiliary nodes, and the connection of subproblems was not considered in this experiment. To

measure the computation effort in the pricing phase, we report the number of iterations in the pricing

phase as reported by MOSEK 5.0 [55], which we used to solve each subproblem. Next, to measure the effort in the branch-and-bound phase, we report the number of nodes in the branch-and-bound tree. We collect these two measurements for each subproblem.

Figure 3.9: Comparison of GRIP-based partitioning versus uniform-based partitioning in the benchmark adaptec1 [3]. (Panels: GRIP Partition and Uniform Partition, each plotted against Subproblem ID.)

We then applied an alternative uniform partitioning for comparison. We defined uniform-sized

subproblems of 32x32 routing grid units and utilized the same routing information as generated

in the previous approach. Similarly, all the subproblems are processed in parallel for which the

MOSEK iterations and the number of explored nodes in the branch-and-bound tree were recorded.

We then compare these two subproblem generation approaches and present the comparison plots in

Figure 3.9.

From this experiment, even though uniform partitioning uses the same initial routes of GRIP’s

partitioning to define the partitions, we can see that GRIP’s partitioning achieves better balance

in terms of number of MOSEK iterations. The uniform partitioning is also highly unbalanced with

respect to the number of branch-and-bound nodes. The plot for uniform partition shows two clusters, one with 1000 nodes (which we used as a threshold to stop the time-consuming branch-and-bound procedure) and one with less than 200 nodes. Although GRIP’s partitioning is more balanced, we still have a small number of unbalanced subproblems.

Table 3.1: The ISPD’07 and ISPD’08 benchmarks

Benchmark        # Nets    Grid       # Layers  V.cap  H.cap
adaptec1 (07)    176715    324x324    6         70     70
adaptec2 (07)    207972    424x424    6         80     80
adaptec3 (07)    368494    774x779    6         62     62
adaptec4 (07)    401060    774x779    6         62     62
adaptec5 (07)    548073    465x468    6         110    110
newblue1 (07)    270713    399x399    6         62     62
newblue2 (07)    373790    557x463    6         110    110
newblue3 (07)    551667    973x1256   6         80     80
newblue4 (08)    636195    455x458    6         84     84
newblue5 (08)    1257555   637x640    6         88     88
newblue6 (08)    1286452   463x464    6         132    132
newblue7 (08)    2635625   488x490    8         212    212
bigblue1 (08)    282974    227x227    6         110    110
bigblue2 (08)    576816    468x471    6         52     52
bigblue3 (08)    1122340   555x557    8         148    148
bigblue4 (08)    2228903   403x405    8         202    202

3.6.2 Comparison of the Solution Quality and Runtime

GRIP was implemented using C++. For solving individual LPs and IPs, MOSEK 5.0 [55] and

CPLEX 6.5 [22] were used, respectively. We report results on the ISPD’07 [3] and ISPD’08 [4]

benchmarks. In the ISPD’07 benchmarks, each via is considered to have a cost of 3 units while in

the ISPD’08 benchmarks, it has a cost of 1 unit. Table 3.1 reports the total number of routed nets

and the grid size for each benchmark. Column 4 shows the number of metal layers. The last two

columns report the projected edge capacities for vertical and horizontal layers.

Table 3.3 reports the solution quality of GRIP for these benchmark instances (benchmark solutions can be downloaded at http://wiscad.ece.wisc.edu/gr/). For each benchmark, the total overflow is given in the column TOV, and the total wirelength (WL) is broken down into both edge and via cost. GRIP is compared to four recent academic global routers: FGR 1.1

[61], FastRoute 4.0 [76], NTHU-Route 2.0 [12], and BoxRouter 2.0 [16]. For each router, we report

the percentage improvement in wirelength found by GRIP, as well as the total overflow.

Considering wirelength, GRIP consistently generates the best result for each benchmark. The

improvement numbers are quite significant. The improvement is larger (9.23%) in the ISPD’07

benchmarks since the via cost is 3 for these benchmarks, and therefore the benefits of avoiding layer

assignment and directly generating 3-D routes are more significant. At the same time, if the same

GRIP solutions for the ISPD’07 benchmarks (for via cost of 3) are evaluated assuming via cost is 1,

still an improvement of on average 5.25% in wirelength is obtained.

GRIP generates solutions with no overflow for the majority of the benchmarks, so the overflow

reduction procedure of Section 3.4 need not be applied. For three ISPD’08 benchmarks, newblue4,

newblue7, and bigblue4, the overflow found by GRIP (before applying the overflow step) is reported

in column 10 of Table 3.3. The corresponding overflow and wirelength numbers after applying the

overflow reduction procedure are reported in the last two columns of Table 3.3. GRIP generates

the best known overflow for the first two benchmarks and quite comparable overflow for bigblue4,

while maintaining the wirelength improvement. For these three benchmarks the average degradation

in wirelength compared to GRIP without overflow step is about 0.11%.

GRIP was run on a heterogeneous grid of CPUs with 2GB of memory, shared by many users, and

controlled by the Condor grid computing toolkit [46]. Table 3.2 reports the run time information, not

including the overflow step. The number of subproblems created by the decomposition procedure

for each benchmark is given in the second column.

As discussed in Section 3.3.2, GRIP uses a congestion-based ordering to process and solve the

subproblems. Based on the ordering, at each step, GRIP solves multiple independent subproblems in

parallel. Column 5 in Table 3.2 gives the number of such processing steps for each benchmark. The average and maximum numbers of parallel-processed subproblems at each step are also reported in columns 6 and 7. The wall clock times are given in column 3. The runtime unit is minutes. These wall clock times are computed for the case when the grid would not be shared with other users. The actual wall clock time for solving the instances was larger, as jobs submitted to the Condor-controlled grid often waited in the job queue while higher-priority jobs were run.

Table 3.2: Runtime information of GRIP (without the overflow step).

                          Runtime (min)              #Parallel Subp.
Benchmark        #Subp.   Wall    Total CPU  #Steps  Ave.    Max.
adaptec1 (07)    100      388     2247       12      8.3     18
adaptec2 (07)    169      455     2677       16      10.6    23
adaptec3 (07)    576      478     5168       32      18.0    38
adaptec4 (07)    570      509     5258       30      19.0    51
adaptec5 (07)    225      584     7133       16      14.1    30
newblue1 (07)    144      483     3076       18      8.0     15
newblue2 (07)    238      467     5228       23      10.4    18
newblue3 (07)    1170     1430    6768       61      19.2    39
newblue4 (08)    174      529     3974       20      8.5     19
newblue5 (08)    311      821     6598       31      9.5     21
newblue6 (08)    140      448     5096       15      8.9     16
newblue7 (08)    325      985     5377       36      9.0     18
bigblue1 (08)    49       339     2770       12      3.9     7
bigblue2 (08)    172      690     3793       21      8.0     20
bigblue3 (08)    208      731     3448       28      7.3     16
bigblue4 (08)    215      726     4400       27      7.6     21

We also report the total CPU runtimes, computed as the sum of the runtimes spent by the CPUs to solve the subproblems, in column 4 of Table 3.2. Here, we can see that GRIP’s significant improvement in solution quality comes at a considerable CPU time expense. However, comparing the total runtime to the wall clock time, it can be seen that even the small level of parallelism in GRIP can yield significant improvement, and reduce computational time to nearly acceptable levels. The next chapter describes further exploiting parallelism to obtain similar high-quality solutions in even shorter wall clock times.
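One simple way to read the comparison between total CPU time and wall clock time is their ratio, which approximates the average number of CPUs kept busy. The snippet below computes it for three rows copied from Table 3.2; the ratio itself is an illustrative metric chosen here, not one reported in the dissertation.

```python
# (wall clock minutes, total CPU minutes) taken from Table 3.2
runtimes = {
    "adaptec1": (388, 2247),
    "adaptec5": (584, 7133),
    "newblue3": (1430, 6768),
}

def cpu_to_wall_ratio(wall, cpu):
    """Total CPU time divided by wall clock time (average concurrency)."""
    return cpu / wall

ratios = {name: round(cpu_to_wall_ratio(w, c), 2) for name, (w, c) in runtimes.items()}
for name, r in ratios.items():
    print(f"{name}: {r}")
```

For these three rows the ratio lies roughly between 5 and 12, in line with the limited level of parallelism described in the text.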

For the benchmarks with overflow, an additional 30 minutes of walltime was used for solving

the problem (RMLP-OV) to generate the candidate routes, and a 5 hour limit was used for solving

the IP using branch-and-bound. In the overflow case, the IP solver took significantly longer than

when solving similar IPs whose primary objective was wirelength. For each benchmark, very few

subproblems were defined around the congested areas for overflow reduction. These subproblems

were processed in parallel.

Table 3.3: Results of GRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5.

                 FGR 1.1        FastRoute 4.0  NTHU-Route 2.0 BoxRouter 2.0  GRIP (without OV step)       GRIP (with OV step)
Benchmark        TOV    WL(%)   TOV    WL(%)   TOV    WL(%)   TOV    WL(%)   TOV    WL     Edge   Via     TOV    WL     #sp
adaptec1 (07)    0      8.42    0      10.99   0      8.79    0      11.99   0      81.0   36.5   44.5    –      –      –
adaptec2 (07)    0      8.33    0      10.01   0      9.33    0      12.60   0      82.4   33.7   48.7    –      –      –
adaptec3 (07)    0      7.14    0      9.39    0      7.69    0      10.61   0      185.4  97.5   87.9    –      –      –
adaptec4 (07)    0      3.94    0      7.84    0      7.36    0      7.57    0      172.3  91.5   80.8    –      –      –
adaptec5 (07)    0      8.11    0      11.73   0      8.18    0      11.65   0      238.9  104.8  134.1   –      –      –
newblue1 (07)    526    10.99   0      8.51    0      7.76    400    9.73    0      83.9   24.9   59.0    –      –      –
newblue2 (07)    0      6.18    0      10.49   0      9.83    0      9.83    0      121.4  48.0   73.4    –      –      –
newblue3 (07)    39908  10.14   31634  14.28   31454  6.50    38958  9.48    52518  156.1  76.2   79.9    45960  157.6  38
Avg. WL Impr.           7.91%          10.40%         8.18%          10.43%
newblue4 (08)    262    4.00    144    7.12    138    4.65    200    3.92    196    124.2  83.2   41.0    136    124.4  2
newblue5 (08)    0      4.36    0      5.88    0      3.79    0      4.35    0      222.8  147.6  75.2    –      –      –
newblue6 (08)    0      5.44    0      6.64    0      3.61    0      5.15    0      170.5  102.4  68.1    –      –      –
newblue7 (08)    1458   4.12    62     5.91    68     4.97    208    6.36    110    335.5  189.5  146.0   54     335.8  1
bigblue1 (08)    0      6.33    0      7.24    0      4.02    0      5.76    0      53.7   37.2   16.5    –      –      –
bigblue2 (08)    0      5.94    0      10.03   0      5.07    0      4.89    0      86.0   48.3   37.7    –      –      –
bigblue3 (08)    0      4.40    0      3.44    0      3.43    0      3.91    0      126.2  78.7   47.5    –      –      –
bigblue4 (08)    414    4.72    152    8.67    162    4.48    472    4.69    232    220.5  121.9  98.6    180    220.7  1
Avg. WL Impr.           4.91%          6.87%          4.25%          4.88%

Chapter 4

PGRIP: A PARALLEL INTEGER PROGRAMMING APPROACH TO GLOBAL ROUTING

In this chapter, we present PGRIP [71], a parallel Integer Programming procedure for the Global

Routing problem. The goal of PGRIP is to eliminate the bottlenecks that result in a limited paral-

lelism in GRIP. An obvious way to parallelize GRIP would be to parallelize the branch-and-bound

search when solving the Integer Program (IP) of each subproblem. However, achieving high effi-

ciency from a general purpose parallel IP solver running on hundreds of concurrent processors is

a difficult task and an area of active research [45] [75]. Similar to GRIP, the approach taken here

works by decomposing the chip into subproblems but one in which subproblems may be routed inde-

pendently, ensuring (through a one-time synchronization) that resulting routings of the subproblems

can be effectively patched together. The patching itself is also accomplished by solving an IP. The

end result of the work is a parallel global router that is based on the extended IP procedure of GRIP,

but allows for concurrent processing of the subproblems and significant parallelism.

There are several challenges to obtain high-quality solutions from a parallel global router that

relies on concurrent processing of subproblems. The first challenge is effective decomposition of the

routing problem into subproblems—this step can significantly impact the final solution quality. The

second challenge is to generate the subproblem solutions in a manner that later facilitates their con-

nectivity and avoids overflow. Our work addresses both of these challenges. Specific contributions

of our work include the following items.

1. To form the rectangular subregions and the corresponding subproblems, we extend GRIP to

include a formal procedure for the initial estimation of the distribution of the nets. This step

is crucial to obtain a high quality routing solution and to achieve subproblems with balanced

computation runtimes.

2. In order to effectively achieve concurrent processing of individual subproblems, we employ a

one-time synchronization approach so that significant portions of the computation can occur


completely without centralized control. This synchronization is via our novel use of an integer

programming “patching” procedure.

3. Our procedure can accept as input a target runtime and produce a high-quality solution within

this limit. The runtime can alternatively be expressed as limits on the number of iterations of

each computational step.

We use various instances of the IP formulation of GRIP for overflow reduction as a core compo-

nent at different phases of our massively parallel procedure. We also introduce a parallel procedure

to independently connect neighboring subproblems.

Similar to GRIP, PGRIP has low memory requirements as it loads individual subproblems within

the local memory of each CPU or core. Specifically in our experiments, cores with a maximum of

2GB of memory were required. The resulting algorithm is highly scalable, concurrently using up

to 725 cores while solving the ISPD’07 and ISPD’08 [3] [4] benchmarks. In contrast, in GRIP,

parallelism was limited to roughly 20 concurrent processes. Our routing procedure also achieves

high quality solutions with a runtime limit of 75 minutes, both in terms of wirelength and overflow.

The remainder of the chapter is organized into four sections. Section 4.1 discusses the challenges

of parallelizing GRIP. An Integer Programming formulation for PGRIP is presented in Section 4.2.

Section 4.3 explains the details of our parallel procedure. Specifically, the problem decomposi-

tion method in PGRIP is first introduced followed by the explanation of the pricing, patching, and

repricing phases. The computational results are reported in Section 4.4.

4.1 Challenges of Parallelizing GRIP

Recall that in GRIP, a large-sized global routing problem instance is first decomposed into smaller

subproblems. Each subproblem is a rectangular subregion on the chip together with its net as-

signment. The subproblems in GRIP are processed in the descending order of “difficulty” and the

ordering was crucial to obtain a high quality solution. As a result, only some of the non-neighboring

subproblems could be solved concurrently resulting in a limited parallelism. Since the majority of

the runtime in GRIP is consumed by the sequential processing of the subproblems, our objective in

PGRIP is to concurrently route all the subproblems on different processors so that the (wall) runtime can be improved. However, there are significant challenges to making the GRIP procedure operate effectively without the centralized algorithmic control.

Figure 4.1: GRIP solves a subproblem with some flexibility in routing the “inter-region” nets, which results in exploring a limited parallelism. (Panels: (a), (b).)

The first challenge is the defining of subproblems as it can significantly affect both the routing

solution quality and the runtime. Defining the subproblems results in categorizing the nets into inter-

region and intra-region nets. While solving the IP formulation can likely generate good solution

quality for intra-region nets, the assignment of inter-region nets to the subproblems can highly

impact both the wirelength and overflow.

In addition, the subproblem definition can highly impact the runtime. Since our main objective

for a parallel implementation is to improve the runtime, we need to ensure the defined subproblems

will take similar computational effort. As we demonstrated in Section 3.6.1, a bad partitioning

strategy may lead to “unbalanced” subproblems, and the subproblems within the congested regions

were dominating the parallel runtime. It took several hours to solve the congested subproblems,

while the rest of the subproblems only needed a few minutes.

Therefore, one challenge in PGRIP is to define the subproblems in a proper manner so that the

planning of the inter-region nets can be effectively obtained to improve the solution quality, and the

difficulty of each subproblem can be balanced to improve the (wall) runtime.

The second challenge in parallelizing GRIP is to ensure that the connectivity between subproblems can be accomplished so that the inter-region nets can be routed with no (or low) overflow. As shown in Figure 4.1(a), we have limited routing resources (one edge in each metal layer between the boundaries of adjacent subproblems) to connect the segments of the inter-region nets. When solving each subproblem, the “floating-terminal” concept provides the flexibility to route the inter-region nets to connect anywhere on the subproblem boundary, as shown in Figure 4.1(b). During sequential solving of the subproblems in the order of their difficulty, the connections of the inter-region nets on the subproblem boundaries were gradually fixed. These boundary “floating-terminal” locations were then honored by the subsequently-solved subproblems. However, in a parallel implementation with concurrent processing of the subproblems, connecting different segments of multiple inter-region nets in the same adjacent subproblems may easily result in overflow on the subproblem boundaries.

Figure 4.2: Example of planning inter-region nets when processing two adjacent subproblems independently. (Panels: (a), (b), (c). Labels: nets Ta and Tb; Subproblem 1 and Subproblem 2; sub-nets Ta1, Ta2, Tb1, Tb2.)

Figure 4.2 shows an example to demonstrate this issue for connecting multiple inter-region nets

in the adjacent subproblems. There are two inter-region nets crossing two subproblems, as shown in

Figure 4.2(a). A congested area is designated (in red) along the boundaries of the subproblems, and

these two subproblems are processed concurrently. If each subproblem is routed independently, then the connection of an inter-region net to the boundary of each subproblem will be unaware of the boundary connection of its other sub-net in the neighboring subproblem. As shown in Figure 4.2(b), congestion may happen between the subproblem boundaries due to the long connection routes, which may cause overflow. This issue happens because the floating-terminal locations of each inter-region net are not known. Alternatively, as shown in Figure 4.2(c), if the floating-terminal locations of inter-region nets from the adjacent subproblems are known, it is more likely that the inter-region nets can be connected between the subproblem boundaries with less overflow.

Overall, connecting the inter-region nets to achieve an overflow-free solution without a strongly

coordinated algorithmic control between the subproblems is one of the major challenges to paral-

lelize Global Routing using this partition-based strategy. This is the reason why GRIP requires an

ordering and almost-sequential processing of the subproblems to gradually fix the floating-terminal

locations on the subproblem boundaries to ensure the connectivity of inter-region nets.

4.2 An Integer Programming Formulation of PGRIP

A mathematical description of PGRIP, which is a slight variation of the one given in formulations

(ILP-GR) and (ILP-OV) for GRIP, goes as follows. We are given a grid-graph G = (V,E) describing

the network topology, a set of (multi-terminal) nets given by N = {T1,T2, . . . ,TN}, (with Ti ⊂ V ),

and edge capacities ue and weights ce ∀e ∈ E, as discussed in Chapter 3 for GRIP. Denote by T (Ti)

the collection of all Steiner trees (routes) connecting the terminals in Ti, and let the parameter ate = 1

if Steiner tree t contains edge e ∈ E, ate = 0 otherwise. Define the binary decision variable xit that

is equal to 1 if and only if net Ti is routed with route t ∈ T (Ti). An integer program for the global

routing problem can be written as

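\begin{align}
\min_{x,\,o} \quad & \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} c_{it}\, x_{it} + \sum_{e \in E} Q_e\, o_e \tag{ILP-PGR} \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T}(T_i)} x_{it} = 1 && \forall i = 1,\dots,N, \notag \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} a_{te}\, x_{it} \le u_e + o_e && \forall e \in E, \notag \\
& x_{it} \in \{0,1\} && \forall i = 1,\dots,N,\ \forall t \in \mathcal{T}(T_i), \notag \\
& o_e \ge 0 && \forall e \in E. \notag
\end{align}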

The parameter cit is the cost of route t for net Ti , reflecting its wirelength (including vias), as

described in formulation (ILP-GR) in Chapter 3.

The first set of equations enforces the routing of each net; for each net Ti exactly one route

will be selected. The second set of equations enforces the edge capacity constraint. The decision


variable oe will be positive if routing of the nets on edge e results in overflow, and the objective

function trades off the total wirelength with the total overflow. Typically Qe is chosen sufficiently

large to avoid overflow as much as possible.
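The trade-off controlled by Qe can be illustrated with a toy instance. The sketch below is not the price-and-branch procedure used in this work; it simply enumerates every route assignment for a tiny problem, and the function name `solve_ilp_pgr`, the edge labels, and all numbers are assumptions made for illustration.

```python
from itertools import product

def solve_ilp_pgr(routes, capacity, penalty):
    """Exhaustively solve a tiny (ILP-PGR) instance.

    routes[i] lists (cost, edges) candidates for net i, capacity[e] is u_e,
    and penalty[e] is Q_e. Returns (objective, chosen route index per net).
    """
    best = None
    for choice in product(*(range(len(r)) for r in routes)):
        usage = {e: 0 for e in capacity}
        wirelength = 0
        for i, t in enumerate(choice):
            cost, edges = routes[i][t]
            wirelength += cost
            for e in edges:
                usage[e] += 1
        # o_e is the smallest value with usage_e <= u_e + o_e and o_e >= 0.
        overflow_cost = sum(
            penalty[e] * max(0, usage[e] - capacity[e]) for e in capacity
        )
        objective = wirelength + overflow_cost
        if best is None or objective < best[0]:
            best = (objective, choice)
    return best

# Two nets compete for edge "a" (capacity 1); each also has a longer detour.
routes = [
    [(2, {"a"}), (3, {"b"})],
    [(2, {"a"}), (4, {"c"})],
]
capacity = {"a": 1, "b": 1, "c": 1}

cheap = solve_ilp_pgr(routes, capacity, {e: 0.5 for e in capacity})
strict = solve_ilp_pgr(routes, capacity, {e: 100 for e in capacity})
```

With the small penalty, both nets keep their shortest route and one unit of overflow is accepted; with the large penalty, the first net is detoured and the overflow disappears.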

Compared to the formulations (ILP-GR) and (ILP-OV) presented in Chapter 3 for GRIP, the

formulation (ILP-PGR) has an objective which can be considered as a combination of the objectives

in (ILP-GR) and (ILP-OV) to minimize both wirelength and overflow simultaneously. Furthermore,

(ILP-PGR) has the same constraints as (ILP-OV) and, unlike (ILP-GR), does not have a slack variable for each net. Moreover, whereas the GRIP implementation of (ILP-OV) used the same penalty Qe for every edge, PGRIP can assign a different penalty Qe to each edge, which we will later show is particularly helpful in avoiding overflow in our parallel implementation.

In PGRIP, the formulation (ILP-PGR) is used to solve each subproblem, and we approximately solve (ILP-PGR) by a two-phase price-and-branch procedure similar to that of GRIP. Note that although the bounded ranges of the dual variables λi and πi of the formulation (ILP-PGR) differ from those of (ILP-GR), this does not affect the candidate route generation procedure described in Section 3.2.1.

4.3 The Parallel Global Routing Procedure

In this section, we discuss the details of our parallel global router that removes the requirement

of sequential processing of subproblems. Similar to GRIP, we first generate a routing solution

for each subproblem, and then attempt to connect these partial routing solutions. Unlike GRIP,

these computations can be done almost completely independently. Our parallel global router is also

fundamentally different from GRIP in the way it uses the IP formulation (ILP-PGR) at different

stages of the algorithm, and in the manner in which candidate routes to populate the IP (ILP-PGR)

are generated.

Figure 4.3 gives an overview of our approach. When solving individual subproblems, we modify

the pricing procedure for candidate route generation so that each subproblem receives a one-time

feedback encoding information about the candidate routes of its neighboring subproblems. Given

this information, the subproblem solver “reprices” the nets in order to generate candidate route

fragments that are more likely to connect without overflow.

[Figure 4.3: Overview of parallel GRIP. Subproblem 1 through Subproblem n each produce a partial routing solution; IP-based patching returns feedback to enhance connectivity.]

More specifically, the subproblems first undergo a quick, initial phase to generate a small set of candidate routes. Next, each subproblem sends information on the utilization of its boundaries by

inter-region candidate routes to one or more “master” CPU(s). The master CPU(s) then considers

pairs of neighboring subproblems. For each pair, a “patching” integer program is solved which

locates a desired window on the subproblem boundary for the pseudo-terminal for each inter-region

net. The subproblems then incorporate this feedback in a (longer) reprice procedure to generate

candidate routes that obey these restrictions on the locations of the pseudo-terminals and are more

likely to connect neighboring subproblems without overflow. After the adjusted pricing, a routing

solution is generated for each subproblem using a branch-and-bound based IP solver. In a final

phase, a parallel and distributed IP-based procedure is applied to connect the route fragments from

the neighboring subproblems.

Another important aspect of our parallel global router is the generation of individual subprob-

lems. Defining the initial subproblems can highly impact the final solution quality (as we show in

our simulations).

In this section, we provide more details about each step of our procedure—defining the subprob-

lems (Section 4.3.1), initial pricing at the subproblems (Section 4.3.2), distributed IP-based patching

(Section 4.3.3), adjusted pricing at the subproblems (Section 4.3.4), and parallel connecting of the

subproblems (Section 4.3.5).

4.3.1 Defining the subproblems

Defining the subproblems is a crucial step in our procedure which significantly affects the solution

quality and runtime. Poorly defined subproblems contain highly congested areas with many nets. Congested subproblems are usually difficult to solve and may result in overflow. In addition, they typically take much longer to solve, resulting in idle time in our parallel procedure, which relies on finishing all subproblems before connecting them.

Two tasks are accomplished by subproblem definition - subproblem boundaries are specified

and nets are assigned to the subproblems. There are different ways to accomplish these tasks. One

way is to first define the boundaries, for example via recursive bi-partitioning of the chip area. The

assignment of each net is defined next, for example based on its 2D-projected route given by the

package Flute [18]. However, defining the assignments solely based on the “Flute estimate” can

result in highly-congested subproblems.

To mitigate the congestion, one could attempt to detour the routes generated by Flute into less

congested subregions, for a better assignment. However, detouring requires knowledge of the sub-

problem boundaries. On the other hand, defining the boundaries without considering an estimate

of the routes and congestion hot-spots during bi-partitioning might significantly limit the amount of

detouring. Therefore the two tasks of subproblem definition are inter-dependent.

A primary contribution of this work is to extend GRIP to obtain a more effective and formal

procedure for subproblem definition. The procedure works as follows:

1) The first step is to generate a routing of all the nets to guide the bi-partitioning. GRIP relies solely

on Flute to generate this routing, while we combine Flute with the IP formulation (ILP-PGR) in the

following manner. First, Flute is used to generate projected 2-D routes for each net. The short nets

are fixed in place, and the linear programming relaxation of (ILP-PGR) is solved. In (ILP-PGR), the

parameter Qe corresponding to edge overflow oe is set to 1 for all e ∈ E. In practice, we provide as

input a target runtime limit (controlling the number of iterations of column generation) after which

we stop the procedure to get a fractional solution to (ILP-PGR). We then associate a weight with

each route proportional to its fractional value in the solution to the relaxed problem. The weights

are used to select one route for each net via a random procedure where the probability of selecting

a route is proportional to its weight. This is a well-known “randomized rounding” procedure and is

better than selecting the route with the highest fractional value for each net, as we also verified in

our implementation.

2) Using the estimated routing generated in Step (1), recursive bi-partitioning is applied to define

the subproblem boundaries. When partitioning, the total number of nets is used as the metric to

balance at each step of the bi-partitioning. Our computational experience indicated that this metric

(as opposed to the AEU used by GRIP) was more correlated with the final solution quality and did

a better job of balancing the computational effort (pricing and branch-and-bound) for solving each

subproblem. The bi-partitioning is terminated when the number of nets in a subproblem is smaller

than 4000, a value empirically determined based on observing the runtime of many subproblems.

In contrast, GRIP stopped when one of the boundaries of a subproblem becomes less than 32 grid

units. Due to the large global routing grid-size, the rectangular subregions formed in GRIP are

smaller and more similar. In contrast, stopping the bi-partitioning based on the number of nets may

result in rectangular subregions that are more varying in size.

3) After fixing the boundaries, we traverse the subproblems sequentially and apply the detouring

procedure of GRIP [72]. The subproblems are processed in the order of their estimated TEO from

the solution obtained in Step (1). Note that we are not solving the subproblem, but merely perturbing

some of the assignments made in Step 1 to obtain more balanced subproblems. This step has a

negligible contribution to the runtime of our routing procedure.
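The randomized rounding used in Step (1) can be sketched as follows. This is a minimal illustration, assuming the fractional LP solution is available as a mapping from each net to its candidate routes and their fractional values; the function name, the data layout, and the sample values are hypothetical and not taken from the PGRIP implementation.

```python
import random

def randomized_rounding(fractional, rng):
    """Pick one route per net with probability proportional to its LP value.

    fractional[net] maps route id -> fractional value x_it from the relaxation.
    """
    chosen = {}
    for net, values in fractional.items():
        route_ids = list(values)
        weights = [values[r] for r in route_ids]
        chosen[net] = rng.choices(route_ids, weights=weights, k=1)[0]
    return chosen

# Hypothetical fractional solution for three nets.
lp_solution = {
    "n1": {"r1": 0.7, "r2": 0.3},
    "n2": {"r1": 1.0},
    "n3": {"r1": 0.5, "r2": 0.0, "r3": 0.5},
}
routing = randomized_rounding(lp_solution, random.Random(0))
```

A route with fractional value zero is never selected, and a net whose relaxation is already integral keeps its route. Passing an explicit `random.Random` instance keeps the rounding reproducible.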

4.3.2 Initial Pricing at the Subproblems

After defining the subproblems, we apply an initial procedure to estimate the utilization of the

boundaries and the locations of the pseudo-terminals to connect the inter-region nets for each sub-

problem. This initial procedure is done independently for each subproblem, which implies that we

allow the generation of candidate routes for inter-region nets that may connect anywhere on the

boundary of the subproblem. After the initial pricing is completed, information from adjacent sub-

problems is sent to a “patching” process (see Section 4.3.3) that determines a window (restricted

region) on the boundary for the location of each pseudo-terminal.

The initial pricing is done by solving the (linear programming relaxation) of the formulation

(ILP-PGR) as described in Section 4.2. A time-bound (or iteration limit) is imposed on the initial

pricing phase. In our experiments, a limit of five minutes was used for this step.

The (ILP-PGR) formulation requires the definition of parameters Qe for each edge overflow

variable oe. In the initial pricing phase, we set Qe to be equal to the Manhattan distance of edge e

from the center of the subproblem. Thus, grid edges that are closer to the boundaries have a larger

overflow penalty. As we have previously noted, a major goal (and challenge) of the concurrent

processing of subproblems is to avoid overflow along boundaries when connecting the subproblems.

The weighted overflow penalization is an important factor towards achieving this goal.
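As an illustration of the distance-based penalty, the sketch below computes Qe for a grid edge inside a rectangular subproblem. Measuring the distance from the edge midpoint is an assumption, as are the function name and the coordinate convention; the text only states that Qe equals the Manhattan distance of the edge from the subproblem center.

```python
def overflow_penalty(edge, x_min, y_min, x_max, y_max):
    """Return Q_e as the Manhattan distance from the edge midpoint to the center.

    edge is ((x1, y1), (x2, y2)) in grid coordinates; the subproblem is the
    rectangle [x_min, x_max] x [y_min, y_max]. Using the midpoint is an assumption.
    """
    (x1, y1), (x2, y2) = edge
    mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
    center_x, center_y = (x_min + x_max) / 2, (y_min + y_max) / 2
    return abs(mid_x - center_x) + abs(mid_y - center_y)

# An 8 x 8 subproblem: an edge near the center versus one on the right boundary.
inner = overflow_penalty(((4, 4), (5, 4)), 0, 0, 8, 8)
outer = overflow_penalty(((8, 1), (8, 2)), 0, 0, 8, 8)
```

The boundary edge receives a penalty of 6.5 against 0.5 for the central edge, so overflow near the boundary is discouraged more strongly.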

In the initial pricing phase, the inter-region nets are allowed to have a pseudo-terminal anywhere

on the corresponding subproblem boundary (see Figure 4.1(a)). In order to assess the utilization of

boundaries by pseudo-terminals in a subproblem, it is important to generate candidate routes for all

the nets in the subproblem, not only the inter-region ones.

4.3.3 Distributed IP-based Patching

Patching is an IP-based procedure that receives as an input two neighboring subproblem boundaries

and the locations of pseudo-terminals on the boundaries from the initial pricing phase. A separate

patching procedure is applied for each pair of neighboring boundaries. The purpose of the patching

phase is to generate feedback to the corresponding two subproblems to enhance their connectivity

through the subsequent repricing phase.

Consider the example in Fig. 4.4(a). There are two adjacent subproblems M, L and two inter-

region nets na, nb. The initial pricing phase is first applied independently on these two subproblems

to generate candidate routes for both nets. After this phase, net na has 3 candidate routes in sub-

problem M and 2 in subproblem L as shown in Figure 4.4(d). Based on the candidate routes, in

Figure 4.4(e), the corresponding pseudo-terminals Ma1 to Ma3 for net na in subproblem M and La1,

La2 in subproblem L are then identified. Similarly, the candidate routes and their corresponding

pseudo-terminals for net nb are shown in Figure 4.4(g) and (h) respectively.

Patching simultaneously considers the connection combinations for all the nets crossing the two boundaries (e.g., both net na and nb have six combinations in Figure 4.4(e) and (h)). Each of these combinations is encoded as a spanning window on the boundaries as shown in Figure 4.4(f) and (i). The output of patching is a “restricting” window for each net, describing the permissible range of locations of its two pseudo-terminals on the two boundaries. This restricting window is selected from the set of existing windows. For each net, one window is generated for the two boundaries, but different nets can have different windows. These windows are then passed as the feedback to the two subproblems during the repricing phase.

[Figure 4.4: Example illustrating the PGRIP patching procedure. Panel (a) shows subregions M and L with inter-region nets na and nb; panel (b) shows the grid-graph G′(V′, E′) with a vertex v and an edge e; panels (d)-(f) show the candidate routes, pseudo-terminals (Ma1 to Ma3, La1, La2), and spanning windows (ca1 to ca6) of net na; panels (g)-(i) show the candidate routes, pseudo-terminals (Mb1, Mb2, Lb1 to Lb3), and spanning windows (cb1 to cb6) of net nb.]

The patching problem can be posed as an integer program. Assume a routing grid-graph G′ =

(V ′,E ′), where v ∈V ′ is a vertex which represents possible locations of the pseudo-terminal, and e ∈

E ′ is an edge on the boundary of one of the subproblems. An example graph is given in Fig. 4.4(b).

Edges e ∈ E ′ are given a modified capacity that is the sum of the capacities of the boundary edge in

one subproblem and its “mirror” edge in its neighboring one.

For each net i, the IP considers |Li|×|Mi| possible combinations for connecting its two portions.

For net i, each possible combination is denoted by a “virtual route” t spanning its virtual terminals

in V ′. Define the parameter ate = 1 if virtual route t contains edge e ∈ E ′, ate = 0 otherwise. For

each virtual route t for net i, define the binary decision variable xit that will equal to 1 if route t

is selected for net i, and 0 otherwise. Define the parameter cit which is the length of the virtual

route t of net i, in terms of number of the edges in G′. As an example, in Figure 4.4(f), for net na,

we have 6 combinations (virtual routes) with their corresponding span {ca1,ca2,ca3,ca4,ca5,ca6} =

{2,3,1,0,2,1} . Similarly, Figure 4.4(i) demonstrates the combinations of virtual routes and their

corresponding spans for net nb.
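The enumeration of virtual routes and their spans can be sketched in a few lines. The boundary positions below are hypothetical: they are chosen only so that the computed spans reproduce the values {2,3,1,0,2,1} quoted for net na, and the ordering of combinations is likewise an assumption.

```python
from itertools import product

def virtual_routes(m_positions, l_positions):
    """Enumerate the |M_i| x |L_i| virtual routes of one net.

    Positions are indices along the shared boundary; the span c_it of a virtual
    route is the number of boundary edges between its two pseudo-terminals.
    """
    return [
        {"m": m, "l": l, "span": abs(m - l)}
        for m, l in product(m_positions, l_positions)
    ]

# Hypothetical positions for Ma1..Ma3 and La1, La2 along the boundary.
routes_a = virtual_routes([4, 1, 0], [2, 1])
spans = [r["span"] for r in routes_a]
```

Three pseudo-terminals in M and two in L give the six combinations of the example, including one combination with span zero where both pseudo-terminals sit at the same boundary position.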

For N nets to connect, the patching problem for two neighboring boundaries is mathematically

described as the following integer program:

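$$
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{N} \sum_{t=1}^{|L_i| \times |M_i|} c_{it}\, x_{it} \;+\; \sum_{i=1}^{N} Q\, s_i && \text{(ILP-PATCH)} \\
\text{s.t.}\quad & \sum_{t=1}^{|L_i| \times |M_i|} x_{it} + s_i = 1 && \forall i = 1, \ldots, N, \\
& \sum_{i=1}^{N} \sum_{t=1}^{|L_i| \times |M_i|} a_{te}\, x_{it} \le u_e && \forall e \in E', \\
& x_{it} \in \{0,1\} && \forall i = 1, \ldots, N,\ \forall t = 1, \ldots, |L_i||M_i|.
\end{aligned}
$$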

The first set of equations enforces selection of one virtual route for each net. The parameter Q is set to be large enough to force all “slack variables” si to take value zero, if possible. The second set of

equations ensures that the given virtual edge capacities are not exceeded. If si = 0 for net i ∈ N, then

exactly one xit variable will be 1 for net i. The corresponding virtual route t, characterized by its two

vertices in V ′, specifies the window on the boundaries of subproblem for net i ∈ N. All subsequent

routes generated by the next pricing phase must obey this constraint. If si > 0 for net i ∈ N in the

solution to (ILP-PATCH), this indicates that inter-region net i is very difficult to route effectively, so

its window is set to the entire boundary of the subproblem.

Recall the capacity of each virtual edge is the summation of the capacities of its corresponding

edges on the subproblem boundaries, which indicate the available routing resources. Using the

edge capacity constraints, we model the routing resource utilization when trying to choose a set of

windows for all the inter-region nets from the set of identified route-combinations connecting the

nets in the two subproblems.
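To make the roles of the slack variables and the capacity constraints concrete, the following sketch solves a tiny patching instance by exhaustive enumeration. It stands in for the branch-and-bound solver only for illustration; the function name, the encoding of a window as a (lo, hi) pair covering virtual edges lo..hi-1, and the sample data are all assumptions.

```python
from itertools import product

def solve_patching(windows, capacity, slack_penalty):
    """Exhaustively solve a tiny (ILP-PATCH) instance.

    windows[i] lists the candidate windows (lo, hi) of net i along the shared
    boundary; a window covers virtual edges lo..hi-1. Choosing None for a net
    sets its slack s_i = 1. capacity[e] is the virtual edge capacity u_e.
    """
    best = None
    options = [list(w) + [None] for w in windows]
    for choice in product(*options):
        usage = [0] * len(capacity)
        cost = 0
        for window in choice:
            if window is None:
                cost += slack_penalty  # Q * s_i with s_i = 1
                continue
            lo, hi = window
            cost += hi - lo  # span c_it of the virtual route
            for e in range(lo, hi):
                usage[e] += 1
        if any(u > c for u, c in zip(usage, capacity)):
            continue  # violates a virtual edge capacity constraint
        if best is None or cost < best[0]:
            best = (cost, choice)
    return best

# Two nets, four virtual edges; both nets prefer a window over edge 1.
windows = [[(0, 2), (1, 2)], [(1, 2), (1, 3)]]
roomy = solve_patching(windows, [1, 2, 1, 1], slack_penalty=100)
tight = solve_patching(windows, [1, 1, 1, 1], slack_penalty=100)
```

When edge 1 has capacity 2, both nets receive their shortest window. When its capacity drops to 1, no pair of windows fits, so one net is left with slack; in PGRIP such a net would have its window set to the entire boundary.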

The parallel routing algorithm solves one patching problem for each pair of subproblem bound-

aries that share at least one inter-region route. These instances of the patching IP are independent

from one another and can be solved in a distributed manner by many CPUs or, since the CPU time

required to solve the patching IPs is minimal, by a single designated processor, as in our implemen-

tation.

In summary, by solving the patching formulation, we consider the impact of all the possible combinations for connecting a net to the boundaries of its two adjacent subproblems, simultaneously for all the inter-region nets that cross those boundaries. Because the runtime of the branch-and-bound procedure grows with the number of combinations, we limit the problem size: we found that it is not necessary to consider all the possible combinations between the candidate routes of a net in the two adjacent subproblems. The most promising combinations are identified by selecting the candidate routes which have the largest fractional values after the quick initial pricing. In our implementation, we considered at most 50 combinations for each net by selecting 10 candidate routes from the more congested subproblem and 5 from the less congested subproblem.
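The selection of promising combinations can be sketched as a top-k filter followed by a Cartesian product. The limits of 10 and 5 routes come from the text; the function name, the data layout, and the sample fractional values are hypothetical.

```python
from itertools import product

def promising_combinations(routes_congested, routes_other, k_congested=10, k_other=5):
    """Pair the top-valued candidate routes from two adjacent subproblems.

    Each argument maps route id -> fractional LP value after initial pricing.
    """
    def top(values, k):
        return sorted(values, key=values.get, reverse=True)[:k]

    return list(product(top(routes_congested, k_congested), top(routes_other, k_other)))

# Hypothetical fractional values for one net in two neighboring subproblems.
congested = {f"M{j}": 1.0 / (j + 1) for j in range(14)}
other = {f"L{j}": 1.0 / (j + 1) for j in range(3)}
combos = promising_combinations(congested, other)
```

Here the less congested side has only three candidate routes, so 30 combinations are produced; the bound of 50 is reached only when both sides have enough candidates.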

[Figure 4.5: Uniform allocation of routing resources for parallel-connecting subproblems. Each quadrant is labeled with an allocation of 0.5.]

4.3.4 Adjusted Pricing at the Subproblems

After each patching IP is solved, the solution, in the form of restricted windows for each inter-

region net is sent back to the processors responsible for the subproblems. At this point, previously

generated candidate routes that do not connect within the specified window range are filtered from

further consideration. Next, a new pricing phase begins wherein candidate routes are generated for

each net while imposing the constraint that the nets can only connect to the boundaries within their

specified windows. This pricing uses the same (ILP-PGR) formulation as the initial pricing explained in Section 4.3.2. In

earlier experimental work, we tried connecting subproblems using a heuristic method, but found the

IP to generate solutions of much lower overflow.
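The filtering step at the start of the adjusted pricing can be sketched as follows, assuming each candidate route of an inter-region net is summarized by the boundary position of its pseudo-terminal and that the window is an inclusive range of positions. Both the representation and the names are assumptions for illustration.

```python
def filter_by_window(candidates, window):
    """Keep candidate routes whose pseudo-terminal lies inside the window.

    candidates maps route id -> boundary position of its pseudo-terminal;
    window is an inclusive (lo, hi) range of positions from the patching step.
    """
    lo, hi = window
    return {rid: pos for rid, pos in candidates.items() if lo <= pos <= hi}

# Hypothetical candidate routes of one net and its restricting window.
candidates = {"t1": 0, "t2": 3, "t3": 5, "t4": 9}
kept = filter_by_window(candidates, (2, 5))
```

Routes t1 and t4 connect outside the window and are dropped; the repricing then generates new candidates only within positions 2 to 5.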

A time limit is imposed on the adjusted pricing phase (e.g., a limit of 20min in our experiments).

Once the adjusted pricing phase is over, a commercial branch-and-bound IP solver is called to

generate a solution for each subproblem that obeys the patching window constraints.

4.3.5 Parallel Connecting of the Subproblems

The parallel global routing procedure concludes with a final connection and polishing phase. Specif-

ically, after concurrently solving all the subproblems, for each inter-region net, the final segment that

connects its “backbone” to the subproblem boundary is removed, as shown in Figure 3.7.

We then fix all the nets that fall completely inside the subproblems. We also fix the backbones

of the inter-region nets, and implement an IP-based price and branch procedure similar to GRIP to

connect the backbones of the inter-region nets. Here we show how this connection phase can be

done in a distributed manner and independently for each pair of neighboring boundaries.

As shown in Fig. 4.5, we divide each subproblem into quadrants. Each quadrant is adjacent to

two neighboring subproblems (e.g., top-right quadrant is adjacent to the top and the right neigh-

boring subproblems). For each routing edge, we divide its remaining capacity (not utilized by the

fixed routes) into equal portions to be allocated for solving two “connection” problems of its two

corresponding subproblems.

For each pair of neighboring boundaries, we solve an IP-based connection problem. Each of the two subproblems contributes the two quadrants adjacent to the shared boundary, so overall we consider the edges of four quadrants in the connection problem. For example, for the two boundaries shown in Fig. 4.5, we use the top-left and bottom-left quadrants of the right subproblem with the top-right and bottom-right quadrants of the left one. For each edge, we use half of its remaining capacity. We then solve (ILP-PGR) to connect the backbones of the inter-region nets.


The parallel global routing procedure was implemented in C++. For solving individual linear pro-

grams (for pricing) and integer programs (for branch-and-bound), the software packages MOSEK

5.0 and CPLEX 6.5, respectively, were used. Parallel processing of subproblems was performed by

submitting jobs to a grid of hundreds of heterogeneous CPUs of 2GB memory, managed by the Con-

dor resource management system. The algorithm was evaluated for the ISPD’07 [3] and ISPD’08

[4] benchmarks. This is to specifically allow full comparison with the GRIP solutions.

A 10min runtime limit was imposed on solving the relaxed (ILP-PGR) to define subproblems

(Section 4.3.1). For the initial pricing (Section 4.3.2), repricing (Section 4.3.4), and pricing to con-

nect the subproblems (Section 4.3.5), we set runtime limits of 5min, 20min, and 20min, respectively.

For solving the IP using branch-and-bound after candidate route generation we used a runtime limit

of 10min. We did not limit the patching procedure since this step was very fast. (In general the num-

ber of nets crossing between two subproblems is fairly small). As a result we report slight variation

in the runtimes of our algorithm on different benchmark instances.

In Table 4.1, we compare the solution quality obtained by the parallel IP-based global routing procedure with existing approaches. For each benchmark, the total overflow (indicated by TOF), total cost of wirelength and via (indicated by “WL”) and the breakdown between wirelength and via (indicated by “Edge” and “Via” respectively) are reported for PGRIP. For other approaches, we report the percentage improvement in total cost of wirelength and via (indicated by WL(%)), and the overflow. Our solutions were evaluated using the ISPD’08 script and were made available (see footnote 1).

Table 4.1: Results of PGRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5.

| Benchmark | PGRIP TOF | PGRIP WL | PGRIP Edge | PGRIP Via | FGR TOF | FGR WL(%) | FastRoute TOF | FastRoute WL(%) | NTHU-Route TOF | NTHU-Route WL(%) | GRIP TOF | GRIP WL(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| adaptec1 (07) | 0 | 82.3 | 36.5 | 45.8 | 0 | 7.00 | 0 | 9.60 | 0 | 7.38 | 0 | -1.56 |
| adaptec2 (07) | 0 | 83.4 | 33.8 | 49.6 | 0 | 7.20 | 0 | 8.90 | 0 | 8.21 | 0 | -1.24 |
| adaptec3 (07) | 0 | 186.5 | 97.5 | 88.9 | 0 | 6.61 | 0 | 8.87 | 0 | 7.15 | 0 | -0.58 |
| adaptec4 (07) | 0 | 173.2 | 91.5 | 81.7 | 0 | 3.44 | 0 | 7.36 | 0 | 6.88 | 0 | -0.52 |
| adaptec5 (07) | 0 | 241.5 | 104.8 | 136.6 | 0 | 7.13 | 0 | 10.79 | 0 | 7.20 | 0 | -1.07 |
| newblue1 (07) | 0 | 84.9 | 25.0 | 59.9 | 526 | 9.97 | 0 | 7.46 | 0 | 6.71 | 0 | -1.14 |
| newblue2 (07) | 0 | 123.3 | 48.2 | 75.1 | 0 | 4.73 | 0 | 9.11 | 0 | 8.43 | 0 | -1.55 |
| newblue3 (07) | 41K | 156.3 | 76.0 | 80.3 | 30K | 10.02 | 32K | 14.17 | 31K | 6.38 | 53K | -1.03 |
| Avg. Impr. | | | | | | 6.58% | | 8.87% | | 7.42% | | -1.09% |
| newblue4 (08) | 132 | 124.9 | 83.4 | 41.4 | 262 | 3.65 | 144 | 6.78 | 138 | 4.29 | 152 | -0.44 |
| newblue5 (08) | 0 | 223.9 | 147.7 | 76.0 | 0 | 3.95 | 0 | 5.47 | 0 | 3.38 | 0 | -0.44 |
| newblue6 (08) | 0 | 172.0 | 102.5 | 69.5 | 0 | 4.61 | 0 | 5.83 | 0 | 2.78 | 0 | -0.88 |
| newblue7 (08) | 54 | 338.4 | 189.8 | 148.6 | 1458 | 3.37 | 62 | 5.17 | 68 | 4.22 | 74 | -0.83 |
| bigblue1 (08) | 0 | 54.0 | 37.3 | 16.7 | 0 | 5.81 | 0 | 6.72 | 0 | 3.49 | 0 | -0.54 |
| bigblue2 (08) | 0 | 86.5 | 48.4 | 38.1 | 0 | 5.38 | 0 | 9.50 | 0 | 4.50 | 0 | -0.64 |
| bigblue3 (08) | 0 | 126.5 | 78.7 | 47.8 | 0 | 4.20 | 0 | 3.24 | 0 | 3.22 | 0 | -0.24 |
| bigblue4 (08) | 176 | 221.1 | 122.0 | 99.1 | 414 | 4.54 | 152 | 8.50 | 162 | 4.30 | 186 | -0.22 |
| Avg. Impr. | | | | | | 4.44% | | 6.40% | | 3.77% | | -0.53% |

Excluding GRIP, the solutions obtained by the parallel global routing algorithm improve significantly in total cost for every instance (ranging from 3.37% to 10.79%). Compared to GRIP, which has the best reported solution but impractical runtimes, on average we only have 1.1% and 0.5% degradation in total cost (WL) on ISPD’07 and ISPD’08, respectively.

Furthermore, the solutions obtain zero overflow for any benchmark that already had a zero-overflow solution from other tools. For benchmarks newblue4 and newblue7, the solutions from the parallel global router have the smallest overflows reported so far (even better than GRIP). This is likely due to a better definition of the initial subproblems and measuring overflow directly in the IP formulation.

Footnote 1: Benchmark solutions can be downloaded at http://wiscad.ece.wisc.edu/gr/

Table 4.2: Estimated overflow of the initial subproblems.

| Benchmark | step1+step2+step3 Avg. | step1+step2+step3 Max. | step1+step2 Avg. | step1+step2 Max. | Flute+step2 Avg. | Flute+step2 Max. |
|---|---|---|---|---|---|---|
| adaptec1 (07) | 1.29 | 59 | 1.41 | 70 | 3.56 | 144 |
| adaptec2 (07) | 1.13 | 85 | 1.19 | 94 | 2.50 | 183 |
| adaptec3 (07) | 0.72 | 49 | 0.77 | 52 | 2.71 | 158 |
| adaptec4 (07) | 0.27 | 54 | 0.31 | 59 | 0.97 | 111 |
| adaptec5 (07) | 2.17 | 100 | 2.34 | 118 | 4.77 | 306 |
| newblue1 (07) | 0.43 | 40 | 0.49 | 43 | 0.94 | 73 |
| newblue2 (07) | 0.41 | 79 | 0.45 | 83 | 0.96 | 131 |
| newblue3 (07) | 0.84 | 660 | 0.92 | 748 | 2.16 | 1119 |
| newblue4 (08) | 1.12 | 88 | 1.13 | 101 | 2.14 | 147 |
| newblue5 (08) | 2.15 | 75 | 2.80 | 89 | 4.94 | 155 |
| newblue6 (08) | 1.03 | 93 | 1.19 | 113 | 2.68 | 192 |
| newblue7 (08) | 7.64 | 252 | 8.33 | 302 | 13.17 | 574 |
| bigblue1 (08) | 19.31 | 87 | 22.78 | 99 | 28.77 | 167 |
| bigblue2 (08) | 5.83 | 59 | 6.28 | 62 | 8.61 | 93 |
| bigblue3 (08) | 9.43 | 126 | 10.27 | 147 | 17.26 | 373 |
| bigblue4 (08) | 7.02 | 71 | 7.95 | 82 | 11.33 | 221 |

Next we analyze the quality of defining subproblems before starting their concurrent processing.

We measure it based on an estimate of overflow after defining the subproblems. Recall in defining

subproblems we apply a 3-step approach (see Section 4.3.1): 1) initial routing using a relaxed ILP, 2)

defining boundaries, and 3) detouring routes of step 1 to distribute the net assignments. We estimate

the edge overflow based on the routes generated in step 3, and calculate the average and maximum

edge overflows in each subproblem. We report the average of each of the two quantities over all

the subproblems in columns 2 and 3 of Table 4.2. We also report these two quantities, assuming

detouring is not applied (so the edge overflow is estimated only using the routes of relaxed ILP).

These are reported in columns 4 and 5. As can be seen, the average and maximum overflows of a

subproblem based on the estimate of routes in step 1 and step 3 are not very different. This means detouring does not result in much improvement after applying the relaxed ILP.

[Figure 4.6: The projected congestion map of the adaptec1 benchmark instance at different phases of PGRIP, panels (a)-(d).]

To better show that the relaxed ILP is helpful to define subproblems, we also report the average

and maximum overflow of the subproblems if 2-D routes are taken from Flute, and then subproblems

are generated using step 2. These are reported in columns 6 and 7. Now we can see that our

procedures can significantly improve overflow in initial subproblems before starting their concurrent

processing.

Figure 4.6 demonstrates the projected congestion map of adaptec1 at different phases of our algorithm. In Figure 4.6(a), an initial net planning is generated by Flute [18]. As can be seen, this congestion map is highly unbalanced and has many hot spots (i.e., edges with overflow). Recall that after defining the subregion boundaries, the linear programming relaxation of (ILP-PGR) is applied followed by detouring as many inter-region nets as possible, as described in Section 4.3.1. After detouring the inter-region nets (which is right before solving the individual subregions), the number of edges with overflow and the maximum overflow (hotspots) are both decreased, as shown in Figure 4.6(b). This implies that we now have more balanced subregions. We then apply the price-and-branch and the patching procedure to solve these subregions concurrently. Figure 4.6(c) shows the congestion map based on the solutions of the subregions before connecting them. Finally, a legal solution is generated after connecting the subregions, which is shown in Figure 4.6(d).

Table 4.3: Runtime comparison of PGRIP and GRIP.

| Benchmark | PGRIP #CPU | PGRIP WCPU(m) | PGRIP TCPU(m) | GRIP E[#CPU] | GRIP WCPU(m) | GRIP TCPU(m) |
|---|---|---|---|---|---|---|
| adaptec1 (07) | 90 | 76 | 2101 | 8.3 | 388 | 2247 |
| adaptec2 (07) | 110 | 76 | 2704 | 10.6 | 455 | 2677 |
| adaptec3 (07) | 211 | 77 | 6319 | 18.0 | 478 | 5168 |
| adaptec4 (07) | 221 | 79 | 5221 | 19.0 | 509 | 5258 |
| adaptec5 (07) | 280 | 77 | 3175 | 14.1 | 584 | 7133 |
| newblue1 (07) | 122 | 76 | 2306 | 8.0 | 483 | 3076 |
| newblue2 (07) | 215 | 77 | 4192 | 10.4 | 467 | 5228 |
| newblue3 (07) | 258 | 82 | 14590 | 19.2 | 1430 | 6768 |
| newblue4 (08) | 255 | 77 | 2944 | 8.5 | 529 | 3974 |
| newblue5 (08) | 504 | 80 | 4953 | 9.5 | 821 | 6598 |
| newblue6 (08) | 459 | 78 | 2219 | 8.9 | 448 | 5096 |
| newblue7 (08) | 725 | 86 | 4788 | 9.0 | 985 | 5377 |
| bigblue1 (08) | 124 | 76 | 956 | 3.9 | 339 | 2770 |
| bigblue2 (08) | 243 | 77 | 3411 | 8.0 | 690 | 3793 |
| bigblue3 (08) | 326 | 78 | 2690 | 7.3 | 731 | 3448 |
| bigblue4 (08) | 453 | 82 | 3096 | 7.6 | 726 | 4400 |
| Avg. | 287 | 78 | 4104 | 11.0 | 629 | 4563 |

Table 4.3 reports the runtime comparison of PGRIP and GRIP. The number of parallel processing jobs for each approach is reported in columns 2 and 5, respectively. The wall runtimes of PGRIP and GRIP are given in columns 3 and 6, indicated by WCPU. In columns 4 and 7, we report the total runtime (indicated by TCPU). Our target wall runtime in PGRIP was 75 minutes, and the reported runtimes had slight variations, ranging from 76 to 86 minutes. The variations came from

the unbounded runtime of the patching procedure (see Section 4.3.3) and steps 2 and 3 in defining

subproblems. Note that in PGRIP, all the subproblems were processed concurrently, so the number of CPUs in our experiments was equal to the number of subproblems, indicating that we were able to take advantage of a significant amount of parallelism. A nice feature of our approach is that the runtimes in PGRIP are quite similar over different benchmarks, indicating a high level of scalability. Although GRIP had slightly better solution quality than PGRIP, it also had much larger wall runtimes (on average, more than 10 hours to complete a benchmark). This is because in GRIP, only some of the non-adjacent subproblems can be processed in parallel. PGRIP and GRIP had similar total runtimes, although PGRIP used a more complicated procedure to handle the inter-region nets. We do not report the runtimes of competing methods. In comparison,

FastRoute [76], which has the fastest runtime over all the other academic global routers, takes many

hours to complete unroutable benchmarks, while PGRIP consistently finishes all benchmarks in

about 75 minutes.

Another feature of PGRIP is that by changing its runtime limit, we can explore the tradeoff

between runtime and solution quality; for example by allowing more runtime on the pricing phase,

we can generate more candidate routes which in turn can improve the solution quality. Take adaptec1

as an example. Changing the runtime limit of the pricing phases to 30min (instead of 20min in our

first experiment) results in an additional 1.13% improvement in WL. The total runtime is 96min.

Reducing the runtime limit of pricing to 10min reduces the total runtime to 57min but degrades

the WL by 4.41%.

Chapter 5

POWER-GRIP: POWER-DRIVEN GLOBAL ROUTING FOR MSV DOMAINS

In this chapter, we present Power-GRIP [73] - a Power-Driven global routing procedure which

supports designs with multiple supply voltages. Power consumption is a primary design objective in

many application domains. Dynamic power still remains the dominant portion of the overall power

spectrum. Design with Multi-Supply Voltage (MSV) allows a significant reduction in dynamic power by taking advantage of its quadratic dependence on the supply voltage.
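For reference, the quadratic dependence is commonly expressed with the standard switching-power model below. The symbols are not defined in this chapter and are introduced here only for illustration: α is the activity factor, C the switched capacitance, V_dd the supply voltage, and f the clock frequency.

$$
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
$$

As a hypothetical example, lowering the supply voltage of a net from 1.2 V to 1.0 V with all other factors unchanged scales its dynamic power by $(1.0/1.2)^2 \approx 0.69$, a reduction of roughly 31%.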

Dynamic power is dissipated in combinational and sequential logic cells, clock network, and

the (remaining) local and global interconnects. We refer to the latter as interconnect power. The

interconnects are complex structures in nanometer technologies that span over many metal layers.

The power of a route segment depends on its width, metal layer, and spacing relative to its adjacent

parallel-running routes. These factors determine the area, fringe, and coupling capacitances which

impact power. Furthermore, in MSV designs, the power of a routed net depends on its corresponding

supply voltage. For example, a route will have lower power if all its terminal-cells have the (same)

lower supply voltage. If a net connects a driver cell of lower voltage to a sink cell of higher voltage,

its route includes a level converter (LC) and is decomposed into two segments of low and high

supply voltages, corresponding to before and after the LC.

We propose a global routing method that optimizes the interconnect power in MSV designs.

Figure 5.1 shows a generic design flow for an MSV-based Global Routing. After placement and

voltage assignment, the location and supply voltage of each cell are known. The supply voltage

can be determined for example through voltage island generation [29] [69], or through a row-based

assignment in a standard cell methodology. Furthermore, LC(s) are added to any net that connects

a driver cell to a set of sink cells of higher supply voltage. Next, Global Routing is applied to

minimize the overall wirelength, where the LCs are also included as terminals of a net.

For a given wirelength-optimized Global Routing solution, we propose to further detour the nets

in order to optimize the interconnect power. The interconnect power can be approximated during

Figure 5.1: Overview of Global Routing with Multi-Supply Voltage. (Flow blocks: Placement & Voltage Assignment; Net extension with level converters; Wirelength-Optimized Global Routing; Level converter based net decomposition; Power-Optimized Global Routing; input: Tolerable wirelength degradation factor.)

Global Routing since at this stage the metal layers of each route segment are known. Furthermore,

the spacing of parallel routes can be estimated from the routing congestion. Given a wirelength-

optimized solution, the nets can be rerouted to trade off wirelength with power. For example, nets on higher metal layers can be rerouted to lower ones, which have smaller wire widths and hence less area capacitance.

Nets can also be rerouted to spread the congestion, thereby increasing their spacings for less cou-

pling capacitance. Activity factor and supply voltage can be incorporated as a power-weight for

each route segment.

We present a mathematical formulation for MSV-based Global Routing to minimize power, and

present integer programming-based techniques to solve the formulation. As part of power saving,

our methods spread the routing congestion and ensure no additional overflow (of routing resources)

and a bounded degradation in wirelength compared to the initial solution.

To the best of our knowledge, this is the first work on power-driven global routing in MSV designs. The previous works [68] [78] consider maze-routing in conjunction with buffer insertion to

minimize interconnect and buffer power. In [74], co-design of power-grid and interconnect routing

networks is considered. More recently, [63] discusses power-driven Global Routing; however, it does not consider the MSV case. It also relies on the availability of power-efficient candidate routes for each net, but generates such candidate routes heuristically.

As part of the contributions of this work, we show a formal procedure to generate power-efficient

candidate routes from the initial WL-optimized solution while taking into account the overall WL

degradation. This is based on a price-and-branch procedure for the proposed power-driven IP. A

parallel solution procedure similar to PGRIP is adopted for decomposing and concurrent solving of

Figure 5.2: MSV-based Global Routing model with level converters. (Labels: cell, global bins, level converter, VL and VH regions, edge capacity re.)

subproblems.

The remainder of this chapter is organized as follows. Section 5.1 describes our MSV-based interconnect model. The level converter placement strategy is introduced in Section 5.2. Section 5.3 discusses our formulation and solution procedure for power minimization. Simulation results are presented in Section 5.4.

5.1 Interconnect Power Modeling in MSV Domains

In this section, we discuss an MSV-based Global Routing model. We assume the level converters

are placed for some of the nets and the supply voltage of each cell is known.

5.1.1 Interconnect Modeling in MSV Designs

We are given a grid-graph G= (V,E) model of the Global Routing problem, where each vertex v∈V

corresponds to a global bin containing a number of cells. Each edge e ∈ E represents the boundary

of two adjacent bins. A capacity re is associated with each edge e, reflecting the maximum number

of routes that can pass between two adjacent bins. A net i ∈ {1, . . . ,N} is identified by its terminal

cells, which are a subset of the vertices V . In MSV-based Global Routing, the level converters are

also considered as net terminals. During Global Routing, a Steiner tree ti in G is found for each net

i to connect its terminals. The length of ti is taken to be its wirelength.

Figure 5.2 demonstrates an example. The chip is divided into regions. Each region has either a

low (VL) or high (VH) supply voltage. A routed net is specified in the figure. The net has one driver

Figure 5.3: Decomposition of net with multi supply voltage levels. (Panels: (a) initial route with source s (vL), sinks t1, t2, t3 (vH), and level converters; (b) decomposed sub-nets n1 (vL), n2 (vH), n3 (vH); (c) rerouted sub-nets, marking which shared edges can and cannot be merged.)

terminal with VL voltage and three sink terminals of VH voltage. There are two level converters in

this route and both of them are also considered as additional terminals of the net.

For power-driven MSV-based Global Routing, we first decompose a net which contains level

converters into a set of sub-nets. We reroute each sub-net as an individual net during power op-

timization. Consequently, we have Nd > N number of nets after decomposition. For example, in

Figure 5.3(a), the initial global route is shown with its level converters. The net is decomposed into

three sub-nets, each of which will be rerouted independently. As shown in Figure 5.3(b), the first

sub-net connects the driver terminal in VL to the two level converters. The second one connects one

level converter to one VH terminal. The third one connects the other level converter to the other two

VH terminals.

The decomposition of each net is done using its initial route and the location(s) of its level

converter(s), assuming they are determined before this stage. For a net containing level converters,

starting from its driver terminal, a sub-net corresponding to a low supply voltage is formed that

connects the driver terminal to a set of level converters and/or a set of sink terminals of the same

supply voltage. Next, one or more sub-nets are formed that connect the level converters to the sink

terminals of the same (and higher) voltage level. The BFS algorithm is utilized to traverse the initial

route in our implementation. For example, in Figure 5.3(b), we start traversing from the source node

until reaching the two level converters. All the touched edges form the first sub-net n1 which has

a low supply voltage. Next, we continue traversing from each of the level converters individually

until reaching all the sink nodes, using which the sub-nets n2 and n3 with high supply voltage are

then identified.

Our net decomposition procedure is able to find a minimum number of sub-nets for each net that

contains a level converter such that each sub-net has only one corresponding supply voltage. Note

that after rerouting the sub-nets, it is possible that these sub-nets may pass through the same edge(s)

as shown in Figure 5.3(c). If the sub-nets which pass through the same edges have the same voltage

level, (e.g., the sub-nets n2 and n3 in Figure 5.3(c)), then we can merge these sub-nets to release the

over-utilized routing resources. The above procedure is given for the case when two supply voltages

VL and VH exist, which is also the case considered in this dissertation. For a higher number of voltage domains, the procedure can be extended in a similar way.
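The traversal described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation used in this work: the route is assumed to be a tree given as an adjacency dictionary, exactly two voltage levels exist, and the names `decompose_net`, `tree_adj` and `level_converters` are hypothetical.

```python
from collections import deque

def decompose_net(tree_adj, source, level_converters):
    """Split a routed tree into single-voltage sub-nets by BFS.

    tree_adj: dict mapping each vertex of the route to its neighbours on the route.
    source: the driver terminal (low supply voltage).
    level_converters: set of vertices that hold a level converter.
    Returns a list of sub-nets, each a list of (u, v) edges. The first sub-net
    is the one rooted at the driver.
    """
    subnets = []
    visited = {source}
    roots = deque([source])  # the driver first, then every level converter reached
    while roots:
        root = roots.popleft()
        edges = []
        frontier = deque([root])
        while frontier:
            u = frontier.popleft()
            for v in tree_adj[u]:
                if v in visited:
                    continue
                visited.add(v)
                edges.append((u, v))
                if v in level_converters:
                    # The voltage changes here, so v becomes the root of a new sub-net.
                    roots.append(v)
                else:
                    frontier.append(v)
        if edges:
            subnets.append(edges)
    return subnets


# A route shaped like Figure 5.3(a): one driver, two level converters, three sinks.
route = {
    "s": ["a"], "a": ["s", "lc1", "lc2"],
    "lc1": ["a", "t1"], "t1": ["lc1"],
    "lc2": ["a", "b"], "b": ["lc2", "t2", "t3"],
    "t2": ["b"], "t3": ["b"],
}
subnets = decompose_net(route, "s", {"lc1", "lc2"})
```

Each level converter closes the sub-net that reaches it and opens the next one, so every sub-net carries a single supply voltage.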

5.1.2 Power Modeling

Each decomposed net i ∈ {1, ...,Nd} has a corresponding supply voltage Vi and switching activity

αi. The required interconnect power for a Global Routing solution is estimated as

\[
P = f_{clk} \times \left( \sum_{i=1}^{N_d} \alpha_i V_i^2 \left( C_i^{sink} + C_i^{route} \right) \right), \tag{5.1}
\]

where $f_{clk}$ is the frequency. As seen in Equation 5.1, the capacitance of routed net i is the sum of the capacitances of its sink cells (denoted by $C_i^{sink}$) and of its route (denoted by $C_i^{route}$). Here $C_i^{sink}$ is a constant that does not depend on the re-routing, so it is excluded from the optimization. Note that the power of the level converters is considered fixed and is thus also not part of the interconnect power optimization. The capacitance $C_i^{route}$ for a routed net i is the sum of the capacitances of the unit-length edges that are contained in route $t_i$ (given by notation $e \ni t_i$):

\[
C_i^{route} = \sum_{e \ni t_i} C_e^u. \tag{5.2}
\]

The parameter $C_e^u$ is the capacitance of one routed edge e ∈ E. This capacitance is a function of the metal layer $l_e$, wire width $w_e$ and wire spacing $s_e$ of the edge e. Specifically,

\[
C_e^u = C_a(l_e, w_e) + 2 C_f(l_e, w_e, s_e) + 2 C_c(l_e, w_e, s_e), \tag{5.3}
\]

where $C_a$ and $C_f$ are the area and fringe capacitances with respect to the substrate, and $C_c$ is the coupling capacitance. As indicated, these capacitances are functions of the metal layer, wire width, and spacing, and are provided by the technology library through a lookup table.

Figure 5.4: Modeling route capacitance on a Global Routing edge.

In this work, we assume that only one (and a different) wire width is associated with each metal

layer, so we exclude the parameter we, and for each edge e ∈ E, its metal layer le is known. The

spacing for edge e is estimated from the edge utilization ue in a Global Routing solution. Given

the utilization ue and the length of edge e, (computed from the chip dimension and the routing

grid granularity), the spacing se is calculated to allow maximum spacing between its corresponding

routes. Figure 5.4 shows an example for ue = 3. This simple averaging strategy may be adjusted

if more information is available at the Global Routing stage; (e.g., the adjustment may be due to

the fixed short nets which fall inside a single global routing bin). With this approximation, we

can express the capacitance of a unit-length route-edge in terms of the edge’s metal layer and its

utilization. The total capacitance of edge e is given by the product of the per-unit capacitance Cue

and the utilization ue: Ce =Cue ×ue.
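As a concrete reading of Equations 5.1 to 5.3, the following Python sketch estimates the interconnect power of a routing solution. It is a hedged illustration only: `cap_fn(layer, utilization)` is a hypothetical stand-in for the technology lookup table that returns the per-unit capacitance of an edge, and the toy linear function used in the example is not taken from any real library.

```python
from collections import Counter

def interconnect_power(f_clk, nets, routes, edge_layer, cap_fn):
    """Estimate P as in Equation 5.1.

    nets: list of dicts with keys 'alpha' (activity), 'voltage', 'c_sink'.
    routes: routes[i] is the list of grid edges used by net i.
    edge_layer: dict mapping each edge to its metal layer.
    cap_fn: hypothetical lookup (layer, utilization) -> per-unit edge capacitance.
    """
    # The utilization of an edge is the number of routes that pass through it.
    utilization = Counter(e for route in routes for e in route)
    total = 0.0
    for net, route in zip(nets, routes):
        # Equation 5.2: route capacitance is the sum over the edges of the route.
        c_route = sum(cap_fn(edge_layer[e], utilization[e]) for e in route)
        total += net["alpha"] * net["voltage"] ** 2 * (net["c_sink"] + c_route)
    return f_clk * total


# Toy data: two nets sharing edge 'e2'; capacitance grows linearly with utilization.
nets = [
    {"alpha": 0.5, "voltage": 1.0, "c_sink": 0.0},
    {"alpha": 1.0, "voltage": 2.0, "c_sink": 0.1},
]
routes = [["e1", "e2"], ["e2"]]
layers = {"e1": 1, "e2": 1}
power = interconnect_power(2.0, nets, routes, layers, lambda layer, u: 0.1 * u)
```

Because the per-unit capacitance depends on the utilization, rerouting one net changes the power attributed to every other net sharing the affected edges.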

Figure 5.5 (left) shows the curves representing area, fringe, and coupling capacitances for metal

layer 1 with respect to edge utilization for a 45nm library [6], assuming each Global Routing edge

is 2µm long. The sum of the three capacitances ($C_e^u$) is shown on the right.

Figure 5.5: Dependence of three types of capacitance on edge utilization in metal layer 1.

5.2 Placement of Level Converters

In this section, we discuss our on-route level converter placement strategy. This strategy is used

to generate the simulation results in Section 5.4. Given the placement and initial global routing

information, this strategy searches the available placement space near the initial global routes for

placing level converter(s). It has minimum overhead to the design flow since re-legalizing cells is

not necessary in this strategy. Note that this strategy is not designed for general placement, and it is also unnecessary for designs with predefined supply voltage regions and already-inserted level converters.

To guarantee the connectivity, the level converters are placed on the wirelength-optimized route,

initially provided for each net. This also ensures the addition of level converters does not cause extra

congestion; it allows connecting each level converter to the initial route conveniently just by adding

vias from the level converter to the initial route. Randomly placing the level converters may harm

the Global Routing congestion and degrade total wirelength or overflow.

We list a set of requirements to identify valid level converter insertion cases for a net i with given

route ti. We assume the net has a single source and may have multiple sink terminals.

1. The level converter is located at a vertex v in ti (v ∋ ti).

2. This vertex v should fall inside a VH voltage island.

3. The global bin corresponding to v should have enough space to add the level converter. We

denote the available space of v by Av and compute it after placement. (See Figure 5.6).

Figure 5.6: Valid on-route level converter locations for one net. (Labels: VL and VH islands, route ti, available space Av, insertion cases i1, i2, i3, i4.)

4. For k vertices v1, ...,vk satisfying the above 3 conditions, if all have the same distance to the

source terminal (in terms of the number of edges on ti), we require k level converters to be

added on these vertices simultaneously.

Figure 5.6 shows the set of potential level converter locations of net i with initial route ti. The

source is the terminal in the VL island. Note that one vertex in ti cannot be used because it is inside the

VL island. We have four cases for valid level converter insertion locations indicated by i1, i2, i3 and

i4. In the latter case, two level converters should be placed on the net after the diverging point on

the route to ensure VH is delivered to both sink terminals. For a single-source net i, we identify all

the cases for valid level converter insertion locations using a breadth first traversal on ti and denote

this set by Li. In this example |Li|= 4. For each case l ∈ Li, we further compute a corresponding

power pil using Equation 5.1, where the edge utilization required to compute coupling capacitance

is obtained from the initial wirelength-optimized solution. The power includes the interconnect

portions on ti and the level converter(s).

To select one level converter insertion case for each net, we define binary variable xil to be

equal to 1 if and only if case l ∈ Li is selected for net i. The level converter placement problem is

expressed as the following Integer Program (IP) which can efficiently be solved using a solver, as

we elaborate in our experiments.

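\[
\begin{aligned}
\min_{x,s} \quad & \sum_{i=1}^{N} \sum_{l \in L_i} p_{il} x_{il} + \sum_{i=1}^{N} M s_i \\
\text{s.t.} \quad & \sum_{l \in L_i} x_{il} + s_i = 1 && \forall i = 1, \dots, N \\
& \sum_{i=1}^{N} \sum_{l \in L_i} a_{vl} x_{il} \le A_v && \forall v \in V \\
& s_i \ge 0 && \forall i = 1, \dots, N \\
& x_{il} \in \{0,1\} && \forall i = 1, \dots, N, \ \forall l \in L_i,
\end{aligned}
\tag{IP-LC}
\]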

where the parameter avl is equal to 1 if in case l, a level converter is placed at vertex v. The first set

of constraints ensures at most one level converter insertion case is selected for each net. The slack

variable si will be positive if there is no available space for placing level converters for net i and

is heavily penalized by a large positive constant M to maximize the number of placed level converters. The second set of constraints ensures level converters are placed only where free placement space is available.
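To make the structure of (IP-LC) concrete, the sketch below solves a tiny instance by exhaustive enumeration. This is purely illustrative and assumes a handful of nets; it does not replace an IP solver. The function name and the data layout (`cases`, `space`) are hypothetical, and `big_m` plays the role of M.

```python
from itertools import product

def solve_ip_lc_bruteforce(cases, space, big_m=1e6):
    """Enumerate all selections for a tiny (IP-LC) instance.

    cases[i]: list of (power, vertices) insertion cases for net i, where
              vertices lists the bins that receive a level converter.
    space[v]: number of level converters that fit at vertex v (A_v).
    Returns (objective, choice); choice[i] is a case index, or None when no
    case is selected for net i (slack s_i = 1).
    """
    best = (float("inf"), None)
    options = [list(range(len(c))) + [None] for c in cases]
    for choice in product(*options):
        used = {}
        cost = 0.0
        for i, l in enumerate(choice):
            if l is None:
                cost += big_m  # penalty for leaving net i without level converters
                continue
            p, vertices = cases[i][l]
            cost += p
            for v in vertices:
                used[v] = used.get(v, 0) + 1
        # Second constraint set: respect the available space of every bin.
        feasible = all(n <= space.get(v, 0) for v, n in used.items())
        if feasible and cost < best[0]:
            best = (cost, choice)
    return best


cases = [
    [(1.0, ["a"]), (2.0, ["b"])],
    [(1.5, ["a"]), (5.0, ["c", "d"])],
]
obj, choice = solve_ip_lc_bruteforce(cases, {"a": 1, "b": 1, "c": 1, "d": 1})
```

In this toy instance both nets prefer bin `a`, which holds only one level converter, so the cheapest feasible selection moves the first net to bin `b`.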

In addition, it may not be possible to place level converters on a vertex v of the Global Routing

grid because its corresponding global bin is highly congested. We therefore associate for each

vertex v, a constant parameter Av, indicating its available placement space. In our experiments, the

available space is calculated for each global bin according to the placement density.

After solving IP-LC to obtain the level converter location(s) for each net, the nets passing

through the congested region may not be able to find a valid level converter insertion location. In

this case, it is necessary to detour these nets. We first insert all the level converters identified by solving (IP-LC) and decompose the corresponding nets. The remaining available space for each vertex v is then calculated. We create a bounding box around each failing net, and search the vertices inside the bounding box for available resources. If more than one vertex has available space, we select the one closest to the source node. If none of the vertices has space available, the bounding box

is then expanded to explore more nearby vertices. Once the insertion location is identified, Maze

Routing is utilized to connect the level converter. Note that additional overflow may be introduced

during the detouring. Fortunately, the detouring case is rare, and the introduced overflow may be

recovered in the later phase.
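The fallback search can be sketched as follows. This is an assumption-laden illustration: vertices are taken to be integer grid coordinates, `available` is assumed to contain only vertices that are valid for insertion (for example, inside a VH island), "closest" is interpreted as Manhattan distance, and the box grows by one bin per step. None of these details are specified in the text above.

```python
def find_lc_vertex(bbox, source, available, grid_w, grid_h, max_expand=10):
    """Search an expanding bounding box for a bin with free space.

    bbox: (x0, y0, x1, y1), inclusive grid coordinates around the failing net.
    source: (x, y) of the driver terminal.
    available: dict mapping (x, y) to the remaining free space of that bin.
    Returns the chosen (x, y), or None if nothing is found within max_expand steps.
    """
    x0, y0, x1, y1 = bbox
    for _ in range(max_expand + 1):
        candidates = [
            (x, y)
            for x in range(x0, x1 + 1)
            for y in range(y0, y1 + 1)
            if available.get((x, y), 0) > 0
        ]
        if candidates:
            # Prefer the candidate nearest to the source (Manhattan distance).
            return min(
                candidates,
                key=lambda p: abs(p[0] - source[0]) + abs(p[1] - source[1]),
            )
        # No space inside the box: grow it by one bin in every direction.
        x0, y0 = max(0, x0 - 1), max(0, y0 - 1)
        x1, y1 = min(grid_w - 1, x1 + 1), min(grid_h - 1, y1 + 1)
    return None


spot = find_lc_vertex((1, 1, 2, 2), (1, 1), {(5, 5): 1, (2, 3): 1}, 10, 10)
```

Once a vertex is returned, the level converter would still have to be connected to the net, which the text does with Maze Routing.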

Figure 5.7: Comparison between (b) wirelength-optimized Global Routing and (c) power-optimized Global Routing. (Panel (a) shows the three nets: n1 with VL and α=0.3, n2 with VH and α=0.7, n3 with VL and α=0.4.)

5.3 Power-Driven MSV-Based Global Routing

In this section, we first utilize an example to show the motivation behind the power-driven global

routing in MSV-domains. Next we present a mathematical formulation of the power-driven MSV-

based Global Routing. We then discuss integer programming-based techniques to obtain high-

quality solutions to the formulation.

5.3.1 Motivational Example

Figure 5.7 demonstrates the advantage of the power-optimized Global Routing over traditional

wirelength-optimized Global Routing. Three nets with different activities and voltage levels are

presented in Figure 5.7(a). Nets n1 and n3 have low activities and low voltage level, and relatively

long initial routes. Net n2, on the contrary, is a short net but has a high activity and high voltage

level. The initial routes of these three nets share two common edges, as shown in Figure 5.7(a). To

reduce the interconnect power, these nets must be detoured to the neighboring edges. In the tradi-

tional wirelength-optimized Global Routing approach, shown in Figure 5.7(b), net n1 is chosen to

be detoured since it is a long net with multi-terminals. In fact, for most of the latest academic global

routers, net n2 is considered to be fixed due to its short wirelength. Fixing the short net n2, however, forfeits the opportunity for power saving. Alternatively, when both the activity and voltage level are considered during power optimization, net n2 is detoured to achieve more power saving, as depicted in Figure 5.7(c).

Figure 5.8: Convex expression of edge capacitance in metal1 with respect to the edge utilization. (Axes: Edge Utilization (ue) versus Edge Capacitance.)

5.3.2 Mathematical Formulation

As described in Section 5.1.2, the per-unit capacitance of an edge e ($C_e^u$) is a function of its metal layer and the edge utilization. Typically, this function is a convex increasing function, as depicted in Figure 5.8. We represent the function $C_e^u$ by a set of line segments denoted by $Q_e^u$. For example, the set $Q_e^u$ is composed of 7 line segments in the library used in this work [6]. Each line segment $q \in Q_e^u$ is of the form $m_q^u u_e + r_q^u$, for a given range of $u_e$, where $m_q^u$ and $r_q^u$ are derived from the library for that range. For each of the 8 metal layers in our library, the curve $C_e^u$ is represented as 7 piecewise-linear segments.

Since the per-unit capacitance is convex, its value may be expressed in our mathematical opti-

mization problem for Global Routing with the following set of linear inequalities:

\[
m_q^u u_e + r_q^u \le C_e^u, \quad \forall q \in Q_e^u. \tag{5.4}
\]

For a given edge utilization $u_e$, the corresponding $C_e^u$ is obtained from the line equation that gives the largest value of $m_q^u u_e + r_q^u$ for $q \in Q_e^u$.
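The role of the inequalities in (5.4) can be checked numerically: when the capacitance is being minimized, the smallest value satisfying all of them is the maximum over the line segments. The sketch below uses made-up slopes and intercepts, not values from the library in [6].

```python
def unit_capacitance(u_e, segments):
    """Evaluate a convex piecewise-linear curve as the max of its segments.

    segments: list of (slope, intercept) pairs, one per line segment q.
    """
    return max(m * u_e + r for m, r in segments)


# Two illustrative segments: a flat one that is active at low utilization
# and a steeper one that takes over at high utilization.
segments = [(0.001, 0.05), (0.004, 0.0)]
low = unit_capacitance(10, segments)
high = unit_capacitance(30, segments)
```

At a utilization of 10 the first segment is the larger one, while at 30 the second segment dominates, which reproduces the convex shape of Figure 5.8.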

To model Global Routing, we are given a routing grid graph G = (V,E), a set of decomposed

multi-terminal nets denoted by Nd , and edge capacities re. Let Ti be a collection of all Steiner trees

that can route net i. We later discuss how to approximate Ti by generating a set of power-efficient

candidate trees with consideration of wirelength degradation. Each tree t ∈ Ti is associated with

a binary decision variable xit which is equal to 1 if and only if it is selected to route net i. Let the

parameter ate be equal to 1 if tree t contains edge e (if e ∋ t). The Global Routing problem for power

minimization is given by:

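\[
\begin{aligned}
\min_{x,s,C^u} \quad & \sum_{i=1}^{N_d} \sum_{t \in T_i} \alpha_i V_i^2 \Big( \sum_{e \ni t} C_e^u \Big) x_{it} + \sum_{i=1}^{N_d} M s_i \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i = 1, \dots, N_d \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le r_e && \forall e \in E \\
& m_q^u \Big( \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \Big) + r_q^u \le C_e^u && \forall e \in E, \ \forall q \in Q_e^u \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le W_0 (1 + \beta) \\
& s_i \ge 0 && \forall i = 1, \dots, N_d \\
& x_{it} \in \{0,1\} && \forall i = 1, \dots, N_d, \ \forall t \in T_i.
\end{aligned}
\tag{IP-POW}
\]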

The first term in the expression of the objective function is the interconnect power as explained

in Section 5.1.2. It includes activity αi and voltage Vi of net i. The capacitance of a route t of net i is

obtained by adding the unit edge capacitances Cue for all the edges e ∋ t. Here the route t ∈ Ti will

be selected for net i only if xit = 1.

The first set of constraints selects at most one route for each net. The slack variable si is equal

to 1 if net i cannot be routed, and the variable is penalized in the objective function by a large

parameter M to maximize the number of routed nets. The term ∑Ndi=1 ∑t∈Ti atexit represents the

edge utilizations ue. The second set of constraints ensures that the edge utilizations are within the

given edge capacities. The third set of constraints determines the per-unit edge capacitance Cue for

each edge e from its utilization, using the discussed piece-wise linear model. The fourth constraint

ensures the new wirelength is within a factor β of the initially-provided wirelength W0. Here wit

denotes the wirelength of route t of net i.

While the constraints of the presented (IP-POW) formulation are all linear, the objective expres-

sion is nonlinear since it includes multiplication of variables xit and Cue . We approximately solve the

formulation using the following two-phase heuristic approach:

1. We minimize the capacitance of all the edges (∑∀e∈E Ce) by rerouting the nets passing through

the congested region and ignore the net activities αi and voltage levels Vi.

2. We minimize an estimate of total power obtained by including αi and Vi for each net while

assuming the capacitance is fixed and obtained from step 1. However, to reduce the introduced

error, we heavily penalize the mismatch between the capacitance obtained from phase 1 and

the actual capacitance found at phase 2 for each edge during the optimization.

In the next two subsections we discuss these two phases in detail. For each phase, we first give

its IP formulation and then discuss the details of the procedure to efficiently solve the formulation

including our method for generation of the power-efficient candidate routes.

5.3.3 Phase 1: Minimizing Total Capacitance

Using the piecewise linear approximation for the per-unit capacitance Cue given by Equation (5.4),

we may also approximate the total capacitance as

\[
C_e = C_e^u \times u_e \ge m_q^u u_e^2 + r_q^u u_e \quad \forall q \in Q_e^u.
\]

This (convex) nonlinear expression may be re-linearized, resulting in another piecewise linear ex-

pression for the total edge capacitance that may be used in our linear integer program for minimizing

the total capacitance.

\[
C_e \ge m_q u_e + r_q \quad \forall q \in Q_e. \tag{5.5}
\]
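The text does not state how the segments in $Q_e$ are derived. One possible construction, shown here only as an assumption, takes the chords of the total capacitance $u_e \times C_e^u(u_e)$ between consecutive integer utilizations. Since the utilization is an integer in the IP and the total capacitance is convex, the maximum over these chords matches the total capacitance at every integer point. Note that this yields one segment per unit of capacity rather than a fixed small number of segments.

```python
def relinearize(unit_segments, capacity):
    """Build segments (m_q, r_q) for the total capacitance of one edge.

    unit_segments: (slope, intercept) pairs describing the per-unit capacitance.
    capacity: edge capacity r_e; utilization ranges over 0..capacity.
    """
    def total(u):
        # Total capacitance = utilization times per-unit capacitance.
        return u * max(m * u + r for m, r in unit_segments)

    segments = []
    for u in range(capacity):
        slope = total(u + 1) - total(u)      # chord between u and u + 1
        intercept = total(u) - slope * u
        segments.append((slope, intercept))
    return segments


unit_segments = [(0.0, 0.1), (0.01, 0.0)]    # illustrative values only
total_segments = relinearize(unit_segments, 20)

def total_capacitance(u, segs=total_segments):
    return max(m * u + r for m, r in segs)
```

A coarser set of segments could be obtained by keeping only a subset of the chords, at the cost of underestimating the capacitance between the retained breakpoints.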

1) Formulation

The formulation of phase 1 is given by the following IP:

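\[
\begin{aligned}
\min_{x,s,C} \quad & \sum_{e \in E} C_e + \sum_{i=1}^{N_d} M s_i \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i = 1, \dots, N_d \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le r_e && \forall e \in E \\
& m_q \Big( \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \Big) + r_q \le C_e && \forall e \in E, \ \forall q \in Q_e \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le W_0 (1 + \beta) \\
& x_{it} \in \{0,1\} && \forall i = 1, \dots, N_d, \ \forall t \in T_i \\
& s_i \ge 0 && \forall i = 1, \dots, N_d.
\end{aligned}
\tag{POW-P1}
\]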

The objective expression is similar to the formulation (IP-POW) but the first term is replaced by

∑∀e∈E Ce which represents an estimate of the total interconnect capacitance. The third set of con-

straints is also updated; the variable Ce replaces Cue in the previous formulation, and the coefficients

in the piecewise linear model are updated by using Equation 5.5.

2) A Price-and-Branch Solution Procedure

We approximately solve (POW-P1) using a two-step heuristic. First, a pricing procedure

is used to generate a set of candidate routes for each net that are power-efficient while considering

the wirelength degradation. The pricing step approximates Ti in the formulation to contain a small

set of power-efficient candidate routes, instead of all the potential routes of net i. Second, branch-

and-bound is applied to solve (POW-P1), selecting one route for each net from the set of generated

candidate routes. The standard branch and bound algorithm can be carried out using a commercial

solver. This two-step procedure of generating candidate routes and then running branch and bound

is commonly known as price-and-branch [8], [38]. We apply the same price-and-branch procedure

as demonstrated in Section 3.1 for power improvement. The major technical difference in this

procedure is in the pricing step to find power-efficient candidate routes, which we next discuss in

detail.

3) Overview of Pricing for Route Generation

We solve a linear-programming relaxation of (POW-P1) by replacing the binary requirements on the

variables xit with constraints 0≤ xit ≤ 1∀i,∀t. The linear program is solved by an iterative procedure

known as column-generation [24]. In column generation, we start by replacing Ti (set of all possible

routes of net i) in formulation (POW-P1) by subset Si ⊂Ti, initially containing one candidate route

per net. We then gradually expand Si, adding new routes that may decrease the objective function.

Adding the new candidate routes is via a power-aware pricing condition for each net.

Before explaining the procedure in more detail, we first give the following notations:

1. We refer to the LP relaxation of (POW-P1) in which Ti is replaced by Si and 0 ≤ xit ≤ 1 as the "restricted master problem", denoted by (RMLP-P1); the solution of (RMLP-P1) for a given Si is denoted by (x, s, C);

2. We refer to the dual of the restricted master problem by (D-RMLP-P1). The solution of (D-RMLP-P1) consists of (λ ≤ M, π ≤ 0, µ ≥ 0, θ ≤ 0), corresponding to the dual variables for the first, second, third, and fourth (wirelength bound) sets of constraints in the relaxed (POW-P1), respectively.

Figure 5.9: Power-aware route generation. (Left: initial routes ta and tb with the edge weights of the first iteration; right: rerouted t'a and tb with the edge weights of the second iteration.)

The iterative column generation procedure including the pricing condition is enumerated below:

1. For each net i = {1, . . . ,Nd}, initialize Si with one route. (In this work we start with the

solution of [12]).

2. Solve (RMLP-P1), yielding a primal solution (x, s,C) and dual values (λ , π, µ, θ ) in (D-

RMLP-P1).

3. Generate a new route t∗ for each net i ∈ {1, . . . ,Nd}. Using the solution of step 2, evaluate the pricing condition: if $\lambda_i > \sum_{e \ni t^*} \sum_{q \in Q_e} m_q \mu_{eq} - \sum_{e \ni t^*} (\pi_e + \theta)$, then $S_i = S_i \cup \{t^*\}$.

4. If an improving route for some net i was found in step 3, return to step 2. Otherwise, stop; the solution (x, s, C) is an optimal solution to (RMLP-P1).

Step 3 gives the pricing condition in terms of the solution of the dual problem (D-RMLP-P1)

obtained at the current iteration. This step can determine for a given new route t∗, if it should be

added to the set Si to reduce the objective of (RMLP-P1). However, it does not specify how a

new route should be found such that the pricing condition gets satisfied. We discuss a convenient

graph-based procedure to generate new route t∗ which satisfy the pricing condition.

4) Route Generation for One Net

To find the improving routes for net i, we associate a weight we for edge e in the Global Routing

grid:

\[
w_e = \max_{q \in Q_e} (m_q \mu_{eq}) - \pi_e - \theta. \tag{5.6}
\]

By the theory of linear programming, for each edge e, at most one dual variable µeq,q ∈ Qe will

be positive in an optimal solution to (D-RMLP-P1). Thus, considering route t∗, we can compute the

pricing condition as λi > ∑∀e∋t∗ we. We take advantage of this interpretation to identify promising

route t∗ which satisfies the pricing condition. Given a route t ∈Si obtained from previous iterations,

we obtain t∗ by rerouting branches of t with the updated edge weights so that the overall weights of

rerouted branches are reduced.
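The edge weights of Equation 5.6 and the resulting pricing test can be written compactly. The sketch below assumes the dual values have already been read from an LP solver into plain dictionaries; the container layout and the function names are hypothetical.

```python
def edge_weight(e, slopes, mu, pi, theta):
    """Equation 5.6: w_e = max_q(m_q * mu_eq) - pi_e - theta.

    slopes: list of slopes m_q of the piecewise-linear model.
    mu[e]: list of duals mu_eq (>= 0), one per segment q.
    pi[e]: dual of the capacity constraint of edge e (<= 0).
    theta: dual of the wirelength constraint (<= 0).
    """
    return max(m * mu_q for m, mu_q in zip(slopes, mu[e])) - pi[e] - theta

def prices_out(route, lam_i, weights):
    """A route is worth adding to S_i if lambda_i exceeds its total weight."""
    return lam_i > sum(weights[e] for e in route)


slopes = [0.01, 0.02]
mu = {"e1": [0.0, 1.0], "e2": [1.0, 0.0]}
pi = {"e1": -0.5, "e2": 0.0}
theta = -0.1
weights = {e: edge_weight(e, slopes, mu, pi, theta) for e in mu}
```

With non-negative slopes, all three terms of the weight are non-negative given the stated signs of the duals, which is what allows a shortest-path search to be used for route generation.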

We explain the procedure with the example of Figure 5.9. Considering two nets a and b, suppose

we are initially given the routes ta and tb for these two nets. After step 2 at the first iteration of

column generation, we obtain edge weights which are given in the figure on the left. To obtain a

new route t∗a for net a, we reroute different branches of ta. For each terminal, we identify a branch

as the segment connecting it to the first Steiner point on ta. We then reroute this branch by running Dijkstra’s single-source shortest path algorithm [26] on the weighted graph with the weights of the

first iteration, similar to [59], [61]. The route t∗a is shown in the right figure. After adding t∗a to Sa

we proceed to the second iteration and obtain new edge weights which are shown in the right figure.
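Rerouting a single branch amounts to a shortest-path query between the terminal and its Steiner point under the current edge weights. The following is a generic textbook version of Dijkstra's algorithm, given as a sketch under the assumption that the grid is stored as an adjacency dictionary and that edge weights are non-negative; it is not the routine used in this work.

```python
import heapq

def dijkstra_path(adj, weight, src, dst):
    """Return the minimum-weight path from src to dst as a list of vertices.

    adj: dict mapping each vertex to its neighbouring vertices.
    weight: dict mapping frozenset({u, v}) to the non-negative weight w_e.
    Returns None if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weight[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# A 2x2 grid: the branch from 'a' to 'd' avoids the heavily weighted edges.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
weight = {
    frozenset(("a", "b")): 1.0, frozenset(("b", "d")): 1.0,
    frozenset(("a", "c")): 0.1, frozenset(("c", "d")): 0.2,
}
branch = dijkstra_path(adj, weight, "a", "d")
```

Edges that are congested or costly in capacitance receive large weights from the duals, so the rerouted branch naturally moves away from them.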

The discussed pricing procedure is similar to the procedure introduced in Chapter 3. However, it

differs in the pricing condition and the way edge weights are set up. For solving (RMLP-P1) and

its dual at each iteration we use the solver CPLEX 12.0. After obtaining the final set Si, again we

use CPLEX 12.0 for the branch and bound step to get the final solution. We further accelerate the

process by applying a simple problem decomposition that we will discuss in Section 5.3.5.

5.3.4 Phase 2: Considering Activity and Voltage

At phase 2, we approximate the per-unit edge capacitances using the solution from phase 1, and

re-route the nets to minimize an approximation of the total power. Since the utilization (and hence

capacitance) corresponding to the routing solution of phase 2 may be different from phase 1, we

heavily penalize any mismatch in our optimization.

1) Formulation

We compute the following quantities after phase 1:

1. We define a new "effective" capacity for each edge e as $\tilde{r}_e = \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it}$, where $x_{it}$ is the value of the routing solution from phase 1.

2. We define the new per-unit capacitance as $\tilde{C}_e^u = C_e / \tilde{r}_e$, where $C_e$ is the value of the edge capacitance from the solution found in phase 1.
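These two quantities follow directly from the phase 1 solution, as the sketch below shows. The data layout is hypothetical, and edges that carry no route are simply skipped, because the ratio is undefined for them and the text does not say how they are treated.

```python
from collections import Counter

def phase1_quantities(selected_routes, total_cap):
    """Compute the effective capacity and fixed per-unit capacitance per edge.

    selected_routes: list of routes (lists of edges) chosen in phase 1.
    total_cap: dict mapping each edge to its total capacitance C_e from phase 1.
    Returns (r_eff, c_unit), both keyed by edge; unused edges are omitted.
    """
    # Effective capacity = utilization of the edge in the phase 1 solution.
    r_eff = Counter(e for route in selected_routes for e in route)
    c_unit = {e: total_cap[e] / r_eff[e] for e in r_eff}
    return dict(r_eff), c_unit


r_eff, c_unit = phase1_quantities(
    [["e1", "e2"], ["e2"]],
    {"e1": 0.1, "e2": 0.3, "e3": 0.0},
)
```

Here edge `e2` carries two routes, so its effective capacity is 2 and its total capacitance of 0.3 is split evenly between them.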

With these definitions, the formulation of phase 2 is the following integer linear program:

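\[
\begin{aligned}
\min_{x,s,\varepsilon} \quad & \sum_{i=1}^{N_d} \sum_{t \in T_i} \alpha_i V_i^2 \Big( \sum_{e \ni t} \tilde{C}_e^u \Big) x_{it} + \sum_{i=1}^{N_d} M_1 s_i + \sum_{e \in E} M_2 \varepsilon_e \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i = 1, \dots, N_d \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le \tilde{r}_e + \varepsilon_e && \forall e \in E \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le W_0 (1 + \beta) \\
& 0 \le \varepsilon_e \le r_e - \tilde{r}_e && \forall e \in E \\
& x_{it} \in \{0,1\} && \forall i = 1, \dots, N_d, \ \forall t \in T_i \\
& s_i \ge 0 && \forall i = 1, \dots, N_d.
\end{aligned}
\tag{POW-P2}
\]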

The first term in the objective expression is an estimate of the total power of the nets, where $\sum_{e \ni t} \tilde{C}_e^u$ sums the fixed approximate per-unit capacitances of the edges contained in route t, obtained from the solution of phase 1 as discussed before. The first set of constraints ensures at most one route is selected per net; a heavy penalty of $M_1$ is incurred if $s_i > 0$, which is reflected in the second term of the objective function. The second set of constraints bounds the new utilization of each edge by $\tilde{r}_e + \varepsilon_e$, where $\tilde{r}_e$ is the effective capacity from phase 1 and $\varepsilon_e$ is a new variable that is heavily penalized by a large factor $M_2$ in the objective function if $\varepsilon_e > 0$. In other words, we heavily penalize any rerouting that causes a larger edge utilization than in phase 1. This in effect forces the routing process to keep the mismatch in the edge utilizations as small as possible, so that the capacitance (which is a function of utilization) remains close to that of phase 1. We also enforce $\varepsilon_e + \tilde{r}_e \le r_e$ in the fourth set of constraints to ensure the edge utilization does not exceed its actual capacity $r_e$. Finally, the third set of constraints ensures the increase in wirelength is bounded by the factor β.

Figure 5.10: Penalizing the edge capacitance if the rerouting of a net causes a larger edge utilization compared to phase 1. (Axes: Edge Utilization (ue) versus Edge Capacitance; annotated example: ue = 26 with capacitance 0.123, and the penalty line 0.123 + 0.01 × εe with slope M2e.)

Note that in the objective expression of this formulation, a large constant parameter M2 can be chosen to penalize the over-utilization of edge resources. Alternatively, to obtain a more accurate estimate of the edge capacitances, we can penalize the over-usage with a linear function of the utilization for each edge, as in our implementation. Figure 5.10 shows an example. The edge utilization ue for this particular edge after phase 1 is 26 units, and the corresponding edge capacitance $\tilde{C}_e^{26}$ is 0.123 according to the look-up table. The linear function 0.123 + 0.01 × εe is then used to estimate the edge capacitance when the edge resources are over-utilized. In this case, the edge-specific penalty M2e is chosen to be 0.01, which is the slope of this function.

Figure 5.11: Decomposition into smaller-sized subproblems similar to GRIP and PGRIP. (Labels: VL and VH islands, fixed terminal.)

2) Solving using Price-and-Branch

The solution procedure is quite similar to the one explained in the previous Section 5.3.3 for phase 1.

Here, we just note the differences. We denote the restricted master problem by (RMLP-P2) and its

solution by (x, s, ε). The dual of the restricted master is denoted by (D-RMLP-P2) and its solution

is (λ , π, θ ), corresponding to the first, second and third set of inequalities in relaxed (POW-P2),

respectively.

The initial set Si is set to all the candidate routes generated from phase 1. This helps to quickly

generate a high quality solution for phase 2. It also ensures that the solution of phase 1 is included

as a feasible solution in phase 2.

The pricing condition is given by the inequality $\lambda_i > \alpha_i V_i^2 \left( \sum_{e \in t} C_e^u \right) - \sum_{e \in t} (\pi_e + \theta)$ and is used to define the edge weights $w_e = \alpha_i V_i^2 C_e^u - \pi_e - \theta$, $\forall e \in E$.
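As a rough sketch of how the pricing condition can be evaluated for one candidate route, the snippet below computes the edge weights and checks whether the total weight of a route is below λi. All names (`edge_weights`, `is_improving_route`, the dictionaries) are hypothetical, the dual values are made up for illustration, and the sketch assumes that the squared supply voltage appears in the edge weight exactly as it does in the pricing condition.

```python
from typing import Dict, Hashable, Iterable

Edge = Hashable


def edge_weights(alpha_i: float, v_i: float, c_u: Dict[Edge, float],
                 pi: Dict[Edge, float], theta: float) -> Dict[Edge, float]:
    """w_e = alpha_i * V_i^2 * C_e^u - pi_e - theta for every edge e."""
    return {e: alpha_i * v_i ** 2 * c_u[e] - pi[e] - theta for e in c_u}


def is_improving_route(route: Iterable[Edge], weights: Dict[Edge, float],
                       lambda_i: float, tol: float = 1e-9) -> bool:
    """True if lambda_i exceeds the total weight of the route's edges."""
    return sum(weights[e] for e in route) < lambda_i - tol


if __name__ == "__main__":
    # Illustrative (made-up) capacitances and dual values for two edges.
    w = edge_weights(0.5, 1.1, {"a": 0.1, "b": 0.2}, {"a": -0.01, "b": 0.0}, -0.002)
    print(w, is_improving_route(["a", "b"], w, lambda_i=0.2))
```

Because the condition is a sum of per-edge terms, a cheapest route for net i under these weights can be searched with a shortest-path style procedure on the routing grid.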

5.3.5 Decomposition

To further accelerate the solution of our two-phase formulation, we adopt a problem decomposition similar to that of GRIP and PGRIP. We divide each voltage island into a set of rectangular subregions by recursive bipartition of the island while balancing the total number of nets that fall inside each subregion. For a given subregion, in order to decide which nets fall in it, we use the initial wirelength-optimized solution of [12]. We stop when the number of decomposed nets in each subregion is at most 3000, a limit that we determined empirically for the benchmarks from the ISPD’08 suite used in our experiments. Figure 5.11 shows an example. Note that in PGRIP, each subregion can have up to 4000 nets. This is because the IP formulation (ILP-PGR) of PGRIP has fewer constraints than the formulations (POW-P1) and (POW-P2), and therefore the IP solver can handle more nets in one subproblem.
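The recursive bipartition can be sketched as follows. This is only an illustration under simplifying assumptions: each net is represented by a single point, the cut is placed at the median net coordinate along the longer side of the region, and the names `Region` and `bipartition` are hypothetical. How nets are actually assigned to subregions from the solution of [12] is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Region:
    x0: float
    y0: float
    x1: float
    y1: float
    nets: List[Point]  # one representative point per net (an assumption)


def bipartition(region: Region, max_nets: int = 3000) -> List[Region]:
    """Split a region recursively until each part holds at most max_nets nets."""
    if max_nets < 1:
        raise ValueError("max_nets must be at least 1")
    if len(region.nets) <= max_nets:
        return [region]
    # Cut across the longer side, at the median net, to balance the net counts.
    axis = 0 if (region.x1 - region.x0) >= (region.y1 - region.y0) else 1
    pts = sorted(region.nets, key=lambda p: p[axis])
    mid = len(pts) // 2
    cut = pts[mid][axis]
    if axis == 0:
        first = Region(region.x0, region.y0, cut, region.y1, pts[:mid])
        second = Region(cut, region.y0, region.x1, region.y1, pts[mid:])
    else:
        first = Region(region.x0, region.y0, region.x1, cut, pts[:mid])
        second = Region(region.x0, cut, region.x1, region.y1, pts[mid:])
    return bipartition(first, max_nets) + bipartition(second, max_nets)


if __name__ == "__main__":
    island = Region(0, 0, 10, 1, [(float(i), 0.5) for i in range(10)])
    for part in bipartition(island, max_nets=3):
        print(part.x0, part.x1, len(part.nets))
```

Splitting at the median keeps the two halves within one net of each other, so the recursion depth grows only logarithmically with the number of nets.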

Next, each subproblem is defined as one rectangular subregion with the set of nets assigned to it. If a net passes through multiple subregions, we fix its terminal locations on the subproblem boundaries to those of the wirelength-optimized solution (see Figure 5.11). This allows each subproblem to be solved independently, without the hassle of later connecting the segments of a route in adjacent subproblems. The subproblems are then solved in parallel without any synchronization.

Even though in our decomposition each subproblem is in effect assigned a low or high voltage level, it is possible that the nets assigned to it have different supply levels. For example, a high-voltage net may just pass through a subregion in a low-voltage island. Or a net with a level converter (which will have portions at high and low voltage levels after net decomposition) may fall in a high-voltage island.

Overall this decomposition is extended from PGRIP, but we make use of our initially-provided

Global Routing solution for more effective decomposition to determine the fixed terminal locations

on the boundaries for independent and parallel processing of the subproblems.


5.4.1 Benchmark Instances

In order to test our solution procedure and determine whether or not significant power savings were

possible without increasing wirelength, we modified known benchmarks to include multi-supply

voltages. Modifying the benchmarks required us to generate timing data, power data, and place

level converters. We implemented the procedure of [69] to generate voltage islands for two voltage

levels of VL = 0.9V and VH = 1.1V. The procedure required a sequential netlist with gate-level delay and power models.


Timing Modeling:

We assumed the locations of the sequential elements in the ISPD’08 benchmarks [4] using the

following procedure. First, we obtained a Directed Acyclic Graph (DAG) representation of the

benchmarks from the variation provided by the corresponding ISPD’06 placement benchmarks [2].

Using the placement benchmarks, we obtained a DAG by starting from the designated Primary

Inputs and traversing in forward direction until reaching the Primary Outputs. We also assumed the

nets with more than 50 terminals to be clock trees to identify sequential elements.
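The two steps above (forward traversal from the Primary Inputs, and flagging large nets as clock trees) can be sketched as follows. The data layout (a fanout dictionary and a net-to-terminals dictionary) and the function names are assumptions made for this illustration; only the threshold of 50 terminals comes from the text.

```python
from collections import deque
from typing import Dict, Iterable, List, Sequence, Set


def forward_reachable(fanout: Dict[str, Sequence[str]],
                      primary_inputs: Iterable[str]) -> List[str]:
    """Visit nodes in breadth-first order, starting from the primary inputs."""
    order = list(dict.fromkeys(primary_inputs))  # drop duplicates, keep order
    seen = set(order)
    queue = deque(order)
    while queue:
        node = queue.popleft()
        for succ in fanout.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                order.append(succ)
                queue.append(succ)
    return order


def clock_nets(net_terminals: Dict[str, Sequence[int]], threshold: int = 50) -> Set[str]:
    """Nets with more than `threshold` terminals are treated as clock trees."""
    return {name for name, terms in net_terminals.items() if len(terms) > threshold}


if __name__ == "__main__":
    fanout = {"pi1": ["g1"], "g1": ["g2", "po"], "g2": ["po"]}
    print(forward_reachable(fanout, ["pi1"]))
    print(clock_nets({"clk": list(range(51)), "n1": [0, 1, 2]}))
```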

We then assumed that the delay of each cell (or node in the DAG) is proportional to its size (for unit load), where the unit delay was assumed to be that of the inverter in the 45nm library [6] used in this work. We considered the loading in our cell delay modeling to be proportional to the cell size, which was also given in the placement benchmarks.

Power Modeling:

We generated the activity factor of each net randomly from a uniform distribution between 0.1 and 0.9. The 45nm library used in this work contained information about the total capacitance (area, fringe, coupling) for each of the 8 metal layers. We used the method described in Section 5.3 to extract a piecewise-linear model for Ce and Cue for each of the 8 metal layers. For each metal layer, we considered the minimum wire size given in the library. To map edge utilization to spacing, we assumed the length of each edge of the Global Routing grid to be 2µ; for a given utilization we assumed the maximum spacing between the routes mapped to the same Global Routing edge.

5.4.2 Level Converter Placement

In our first experiment, we report the results of our level converter placement algorithm for the nets that contained a level converter (i.e., nets that had a source terminal in a VL island with fanout terminals in VH islands). We considered the following case in our experiment: we routed all the nets using the initial wirelength-optimized solution of NTHU-Route2.0 [12], and then solved our formulation (IP-LC) to obtain the level converter locations subject to the area density constraints. We consider the obtained results as the base case for power comparison in our second experiment.

Recall the placement of level converters can impact the power of each route by decomposing

it into multiple segments where each segment has a high or low supply level. Using Equation 5.1,


Table 5.1: Results of the level converter placement for the ISPD’08 benchmarks.

Bench      #Net    #NetLC  #LC    Power     WCPU(min)
adaptec1   177K    9K      20K    432242    5
adaptec2   208K    8K      17K    336881    7
adaptec3   368K    17K     43K    1056778   8
adaptec4   401K    16K     36K    751120    13
adaptec5   548K    32K     85K    1199591   11
newblue1   271K    75K     16K    318922    10
newblue2   374K    22K     47K    453234    17
newblue4   531K    38K     79K    927712    9
newblue5   892K    26K     84K    1469859   14
newblue6   835K    31K     91K    1367000   17
newblue7   1647K   28K     72K    2201835   21
bigblue1   197K    9K      26K    619321    6
bigblue2   429K    15K     43K    560723    13
bigblue3   666K    23K     60K    814957    12
bigblue4   1134K   17K     51K    1254323   15

we compute the total power of the nets which need level conversion. This includes the power of the level converters and of the different route segments of the decomposed nets after inserting the level converters.

Table 5.1 reports our power comparison results. For each benchmark, we report the total number of nets and the number of nets which require level conversion in columns 2 and 3, respectively. The total number of level converters in our case is given in column 4. The number of level converters is larger than the number of nets in column 3, indicating that for some nets it may be better to add extra LCs but place them closer to the sink terminals, in order to reduce the route portion that is driven by high voltage and save power. In column 5, we report the power of ([12]+LC) for the nets including the ones with level conversion. We use these power numbers as the base case for our next experiment. Finally, the wall clock time of the level converter placement (indicated by WCPU) is given in column 6. As can be seen, this step completes in 5 to 21 minutes for every benchmark.


Table 5.2: Results of Power-GRIP for the ISPD’08 benchmarks. The wirelength is scaled to 10^5. Power and capacitance are scaled to 10^3.

                             initial solution ([12]+LC)  phase 1                phase1+phase2
Bench     #Net   #Netd  #SP  W0       C        P         -WL(%)  -C(%)  -P(%)   -WL(%)  -C(%)  -P(%)
adaptec1  177K   197K   130  54.2     953.3    432.2     0.05    11.70  8.57    0.07    15.48  16.17
adaptec2  208K   224K   195  53.0     750.0    336.9     0.12    10.34  6.93    0.14    14.57  15.13
adaptec3  368K   411K   359  132.7    2187.0   1056.8    0.01    11.51  8.67    0.34    13.55  13.94
adaptec4  401K   437K   296  123.0    1613.8   751.1     0.02    12.16  8.46    0.04    16.92  17.20
adaptec5  548K   632K   454  158.7    2543.0   1199.6    0.38    8.60   6.08    0.43    10.23  10.88
newblue1  271K   287K   195  47.0     612.2    318.9     0.11    13.39  9.87    0.22    17.45  18.40
newblue2  374K   421K   312  77.6     894.9    453.2     0.04    14.19  7.87    0.09    19.20  19.34
newblue4  531K   610K   462  133.7    1955.4   927.7     0.02    13.39  9.61    0.54    17.45  17.61
newblue5  892K   975K   658  234.7    3405.3   1469.9    0.89    11.55  6.75    0.86    14.00  13.47
newblue6  835K   926K   532  180.2    2834.9   1367.0    0.62    12.56  9.35    0.57    16.35  17.80
newblue7  1647K  1719K  670  360.2    5004.4   2201.8    0.01    15.12  11.20   0.17    19.63  20.93
bigblue1  197K   222K   152  57.0     1110.4   619.3     0.23    10.68  7.20    0.16    12.17  12.56
bigblue2  429K   472K   275  92.4     1283.8   560.7     0.14    11.64  7.86    0.10    14.76  14.33
bigblue3  666K   725K   453  133.0    1664.6   815.0     0.91    15.22  10.99   0.93    20.25  20.31
bigblue4  1134K  1184K  509  233.0    3006.6   1254.3    0.18    16.03  12.12   0.28    22.31  22.46
Avg.                                                     0.25    12.54  8.77    0.34    16.29  16.70

5.4.3 Power Saving during Global Routing

Using the initial WL-optimized solution of [12], and after fixing the locations of level converters, we applied net decomposition (as described in Section 5.1.1). Table 5.2 reports the number of nets and decomposed nets in columns 2 and 3, respectively. We then applied our power-driven Global Routing procedure using a wirelength degradation factor of β = 0, so no wirelength degradation was allowed. We used CPLEX 12.0 [37] to solve our two-phase formulation, and processed the subproblems in parallel by submitting the jobs to a grid of CPUs, each with 2GB of memory. The number of subregions (same as the number of processors) is given in column 4 (#SP) in Table 5.2.

We then compared three routing solutions.

• The initial WL-optimized solution of [12];

• The solution after applying phase 1, obtained by solving the formulation (POW-P1);

• The solution by further applying phase 2, obtained by solving (POW-P1) followed by (POW-P2).


For each case, we report the wirelength (WL), the total capacitance C ($\sum_{i=1}^{N_d} C_i^{route}$, where $C_i^{route}$ is defined in (5.2)), given in units of fF, and the Global Routing power metric P from (5.1), excluding the constant portions of the expression.

The results are reported in Table 5.2 in columns 5 to 13. For the initial solution, we report the wirelength (W0) of the NTHU-R2.0 routes that have been augmented with the extra via-only segment(s) to connect the level converters to the original routes. (As a result, there is a slight increase in wirelength compared to the numbers reported in [12].) For the solutions of phase 1 and phase 2, we report only the percentage improvement in total wirelength, C, and P, all with respect to the initial solution.
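As a reading aid for the percentage columns, a reduction of $r$ percent corresponds to an absolute value of $X_0 (1 - r/100)$, where $X_0$ is the value for the initial solution. The numbers below are only arithmetic on the adaptec1 row of Table 5.2 and are rounded; they are not separately reported values.

```latex
P_{\text{phase 1}} \approx 432.2 \times (1 - 0.0857) \approx 395.2,
\qquad
P_{\text{phase 1 + phase 2}} \approx 432.2 \times (1 - 0.1617) \approx 362.3
```

Both values are in the units of Table 5.2, that is, scaled to $10^3$.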

As can be seen, applying phase 1 of the power-reduction heuristic results in an average saving of 8.77% in P. Recall that these savings are solely due to capacitance reduction (as can be seen from the higher improvement rate in C compared to P). By further applying phase 2, we see additional improvement in P (16.70% on average). The improvement in C after phase 2 (16.29% on average) is also larger than after phase 1 (12.54%), even though phase 1 solely focuses on optimizing C. This is because we start phase 2 by including all the candidate routes generated from phase 1. Notice that in both phase 1 and phase 2 there is improvement (reduction) in wirelength compared to W0. It is important to note that no extra overflow was introduced in the power-optimized solutions.

In our simulations, we explicitly bounded the runtime for phase 1 and phase 2. The wall clock runtime limits for phase 1 and phase 2 were set to 30min and 40min, respectively, for all benchmarks. The number of processors (same as the number of subregions) is given in column 4.

Another feature of our approach is that we can explore the tradeoff between wirelength and power by controlling the wirelength degradation factor β. Take adaptec1 for instance: allowing 2% degradation in wirelength results in an extra 4.5% power saving.


Chapter 6

CONCLUSIONS AND FUTURE WORKS

6.1 Conclusions

In this dissertation, we presented three related topics on parallelizing Global Routing via integer

programming. In Chapter 3, we presented GRIP which is a procedure for global routing via integer

programming. GRIP is based on solving an IP formulation by column generation and branch and

bound to select candidate routes for each net. The method uses dual information to create a dynamic

congestion metric and directly solves the 3-D model of the routing problem. GRIP uses techniques

to decompose the large-scale problem instances into subproblems of manageable size and recon-

nects the subproblem solutions again using IP. In the case of overflow, GRIP can apply an overflow

reduction procedure again using integer programming.

To further improve the runtime, in Chapter 4, we proposed PGRIP, a parallel global routing procedure that can independently process subproblems with minimal synchronization among them. The parallel implementation relies heavily on the IP formulation and the procedure to solve it. Our goal was to show that integer programming (which was considered too computationally intensive for global routing of industrial-sized designs) can be used with the aid of massive parallelism to significantly improve the solution quality while meeting practical runtime requirements.

In Chapter 5, we proposed Power-GRIP, which minimizes an interconnect power metric for designs with multi-supply voltage in the global routing stage. We presented an IP formulation which considers power saving opportunities by reducing the area, fringe and congestion-dependent coupling capacitances at each metal layer, while accounting for the activity and supply voltage of each route segment. We showed significant savings in the power metric for global routing (16.70% on average for the ISPD’08 benchmarks) without any degradation in wirelength or overflow. We also demonstrated that a parallel procedure similar to that of PGRIP can be applied to solve this extended IP formulation.

In a broader sense, this work aims to show that parallelism can be used to invest in alternative

computational techniques which may have been considered as too time-consuming in the past, in

order to improve the solution quality.


6.2 Future Work 1: Layer-Directive Based Global Routing

As device performance continues to scale following Moore’s law, the scaled interconnect performance has remained essentially constant. In the past, the wire resistance was so low that the interconnect delay was negligible, and the circuit delay was dominated by the device delay. In today’s deep sub-micron process technology, however, interconnect delay dominates the circuit delay due to increasing wire resistance and side-wall capacitance [51].

In 2007 and 2008, the release of large-sized Global Routing benchmarks [3], [4] resulted in

monumental progress of academic global routers. The evaluation metrics suggested by these bench-

marks, namely the total wirelength, via count, and overflow, were improved remarkably. These

metrics, however, are no longer sufficient to catch up with the demands imposed by modern process

technology. Specifically, the main concern of these metrics is that they fail to capture the intercon-

nect delay in Global Routing.

In modern VLSI designs, different metal layers have different wire widths. As shown in Figure 6.1, the wires in the lower metal layers have smaller widths so that the routing resources can be increased. The higher metal layers, on the other hand, tend to have wider wires for manufacturability and reliability reasons. When wirelength is the main objective in the Global Routing problem, nets are pushed down to the lower metal layers as much as possible so that the wirelength can be minimized. In the past, pushing the nets to the lower metal layers did not hurt the interconnect delay, since the wire resistance was so small that the per-unit wire RC delay was almost identical across all metal layers. In today’s deep sub-micron process technology, however, pushing down the nets increases the interconnect delay significantly because of the high RC in the lower metal layers.

One solution to this problem is to identify a set of timing critical nets, and promote these nets to the upper metal layers to decrease their net delay. To achieve this, it is necessary to incorporate “per net” layer information in Global Routing. The timing critical nets should also be correctly identified and assigned to proper layers. Recently, a new set of global routing benchmarks [53] has been released which contains layer information for each net. For most existing global routers, it is not difficult to incorporate layer information. The main challenges are identifying the critical nets and assigning them to proper layers.

[Figure 6.1 labels: metal layers M1, M2, M3, M4, M5, M6.]

Figure 6.1: Different metal layers have different wire widths. The even numbered metal layers run horizontally across the picture, while the odd numbered layers run perpendicular to the picture.

6.2.1 Our Proposed Method

Our objective is to promote as many timing critical nets to the upper layers as possible so that the interconnect delay can be minimized. Ideally, we can identify all the nets with negative timing slacks and promote these nets to the upper metal layers. However, there are several problems with this simple strategy. First of all, the routing resources of the upper metal layers are limited due to the wider wire widths. For timing critical designs, it is difficult to route all the nets with negative slacks using upper metal layers. Furthermore, the timing information is only an estimate before Global Routing, since there is no physical routing path yet. Nets with positive slacks before Global Routing may become timing critical afterwards, and vice versa.

To utilize the limited upper layer routing resources effectively, we propose an iterative layer directive based global routing strategy which is built on top of the commercial router Zroute [64]. As shown in Figure 6.2, we start by using a set of tight selection criteria to identify the most critical nets, and promote these nets to the upper metal layers. After one iteration, the selection criteria are relaxed to identify more critical nets, and the design is then rerouted. Note that after one iteration, we recalculate the net delay of the critical nets and remove those nets that have short wirelength and large positive slacks. Nevertheless, we keep the short nets with slightly positive slacks in the critical net set to prevent the bouncing effect. Also, we never remove the long nets from the critical net set, since pushing down the long nets may significantly increase the net delay. In the following, we explain the two main portions of this strategy in detail.
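The rule for updating the critical net set after an iteration can be sketched as follows. The class name `CriticalNet`, the function `prune_critical`, and both thresholds are hypothetical; the text does not give numerical values for "short" wirelength or "large" positive slack, so they are left as parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CriticalNet:
    name: str
    wirelength: float  # wirelength after the latest routing iteration
    slack: float       # recalculated timing slack (negative means violating)


def prune_critical(critical: List[CriticalNet], short_wirelength: float,
                   large_positive_slack: float) -> List[CriticalNet]:
    """Remove only nets that are both short and have a large positive slack.

    Long nets are never removed, and short nets whose slack is only slightly
    positive stay in the set to avoid bouncing between iterations.
    """
    return [
        net for net in critical
        if not (net.wirelength < short_wirelength and net.slack > large_positive_slack)
    ]


if __name__ == "__main__":
    nets = [
        CriticalNet("long_net", 900.0, 0.8),
        CriticalNet("short_safe", 40.0, 0.9),
        CriticalNet("short_marginal", 40.0, 0.05),
    ]
    kept = prune_critical(nets, short_wirelength=100.0, large_positive_slack=0.5)
    print([net.name for net in kept])
```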

[Figure 6.2 flowchart boxes: Identify Critical Nets (Wirelength, Slack, Slew, Bottleneck); Divide Critical Nets into M Bins; Set i = M; Route Bin i; Rip Up Scenic Routes and Push Down for Rerouting; loop while i > 0 (i--); when i == 0, Route Remaining Nets (if the last critical net bin does not have scenic routes); Rip Up All Nets; Finish Layer Directive Global Routing.]

Figure 6.2: Overview of layer directive Global Routing.

6.2.2 Identification of Critical Nets

We select the critical nets based on four criteria: wirelength, slack, input slew, and bottleneck. Clearly, the nets with long wirelength and/or large negative slacks should be considered critical. Although input slew does not directly affect net delay, it is still worthwhile to consider nets which have a high input slew from their driver cells as critical nets. This is because improving the delay of these nets can help to reduce the “total” path delay. The bottleneck nets are those nets having large fan-in or fan-out cones. Improving the delay of these nets can significantly improve the performance of the entire design since they are shared by many paths. Therefore, the bottleneck nets are worth promoting to the upper metal layers even if they have only slightly negative slacks.
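One possible way to combine the four criteria is sketched below. The field names, the `Criteria` thresholds, and the exact way the tests are combined (a logical OR, with bottleneck nets requiring only a negative slack) are assumptions made for this illustration; relaxing the selection between iterations would then amount to passing looser thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NetTiming:
    name: str
    wirelength: float  # estimated wirelength
    slack: float       # timing slack; negative means violating
    input_slew: float  # slew at the driver cell input
    cone_size: int     # size of the larger of the fan-in and fan-out cones


@dataclass(frozen=True)
class Criteria:
    long_wirelength: float
    large_negative_slack: float  # a negative number, for example -0.5
    high_slew: float
    large_cone: int


def is_critical(net: NetTiming, c: Criteria) -> bool:
    """A net is selected if any of the four criteria applies."""
    if net.wirelength >= c.long_wirelength:
        return True
    if net.slack <= c.large_negative_slack:
        return True
    if net.input_slew >= c.high_slew:
        return True
    # Bottleneck nets qualify even with only a slightly negative slack.
    return net.cone_size >= c.large_cone and net.slack < 0


def select_critical(nets: List[NetTiming], c: Criteria) -> List[NetTiming]:
    return [net for net in nets if is_critical(net, c)]


if __name__ == "__main__":
    tight = Criteria(long_wirelength=500.0, large_negative_slack=-0.5,
                     high_slew=0.3, large_cone=1000)
    nets = [
        NetTiming("long", 800.0, 0.2, 0.1, 10),
        NetTiming("bottleneck", 50.0, -0.01, 0.1, 5000),
        NetTiming("ordinary", 50.0, 0.2, 0.1, 10),
    ]
    print([net.name for net in select_critical(nets, tight)])
```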


6.2.3 Bin Routing of Critical Nets

After identifying the critical nets, we divide them into several bins based on their wirelength. The longer the wirelength of a net, the higher the metal layers to which it should be promoted. The number of bins is based on the available routing layers for critical nets. For example, if we use metal4 to metal6 for routing critical nets, then the critical nets are divided into two bins. The first bin uses metal6 and metal5 for routing, and the second bin uses metal6 to metal4. Our bin routing procedure starts from the bin using the highest metal layers. After routing this bin, we detect its scenic routes, which have a high ratio between the global routing wirelength and the initial estimated wirelength. These detected scenic routes are ripped up and pushed down to the next bin for rerouting. In particular, the number of scenic routes is a good indication of congestion. After routing all the critical net bins, all the other nets are then routed with no layer constraints.
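The bin routing loop can be sketched as follows. The router itself is abstracted as a callback `route_fn(net, bin_index)` that returns the routed wirelength; this callback, the scenic ratio threshold of 1.5, and the choice to keep scenic routes in the last bin are all assumptions of this sketch, since the text does not specify them.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def bin_layer_ranges(top_layer: int, lowest_layer: int) -> List[Tuple[int, int]]:
    """Layer range (top, bottom) per bin; metal4..metal6 gives two bins."""
    return [(top_layer, low) for low in range(top_layer - 1, lowest_layer - 1, -1)]


def is_scenic(routed_wl: float, estimated_wl: float, ratio_threshold: float) -> bool:
    """A route is scenic if it is much longer than its initial estimate."""
    return routed_wl > ratio_threshold * estimated_wl


def route_bins(bins: Sequence[Sequence[str]], estimated_wl: Dict[str, float],
               route_fn: Callable[[str, int], float],
               ratio_threshold: float = 1.5) -> Dict[str, Tuple[int, float]]:
    """Route bins from the highest layers down, pushing scenic routes to the next bin."""
    placed: Dict[str, Tuple[int, float]] = {}
    pushed_down: List[str] = []
    last = len(bins) - 1
    for i, nets in enumerate(bins):
        to_route = pushed_down + list(nets)
        pushed_down = []
        for net in to_route:
            routed_wl = route_fn(net, i)
            if i < last and is_scenic(routed_wl, estimated_wl[net], ratio_threshold):
                pushed_down.append(net)  # rip up and reroute in the next bin
            else:
                placed[net] = (i, routed_wl)
    return placed


if __name__ == "__main__":
    print(bin_layer_ranges(6, 4))
    wl = {("a", 0): 11.0, ("b", 0): 20.0, ("b", 1): 12.0}
    result = route_bins([["a", "b"], []], {"a": 10.0, "b": 10.0},
                        lambda net, i: wl[(net, i)])
    print(result)
```

In this toy run, net b is scenic in the first bin (20 against an estimate of 10) and is rerouted in the second bin, where more layers are available.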

6.2.4 Discussion

Our proposed layer directive global routing approach is robust in the sense that it can handle de-

signs with different geometry aspect ratios and routing blockages. One may argue that the iterative

approach is too time-consuming. From our initial experiment, however, we show that this proposed

approach can achieve significant circuit timing improvement, and therefore the post routing opti-

mization efforts can be reduced significantly. This is an ongoing project, and more features are still

under development.

6.3 Future Work 2: Enhancing the Correlation Between the Placement and Routing Stages

As we discussed in the first chapter, the physical design is decomposed into several stages, and then an iterative approach is utilized to simultaneously consider the design objectives against a list of design constraints. The decomposition helps to simplify the design flow so that each stage can focus on solving a particular optimization problem. However, it also creates a correlation issue among the stages, especially between the placement and routing stages. For example, during the placement stage, the wirelength of a net is estimated since there is no physical routing information. When the total wirelength is the main objective in the placement stage, cells are usually squeezed together so that the “estimated” wirelength can be minimized. The squeezed placement, however, may lead to congested regions and cause routability issues during the routing stage.

One solution to enhance the correlation between the placement and routing stages is to incorporate Global Routing in the placement stage to guide the placement engine. As shown in Figure 1.1, Global Routing can be incorporated in both the placement and post-placement optimization stages to improve the routability and all the other main objectives such as timing, power, and cell density. Almost all the existing commercial EDA tools have the capability to incorporate Global Routing in the placement stage to enhance the design routability. Recently, academic researchers have started to focus on this issue because of the ISPD’11 placement contest [5]. The correlation between the post-placement optimization and routing stages, nevertheless, is still an open research area, and we focus on this particular issue.

6.3.1 Incorporation of Global Routing in the Post-Placement Optimization Stage

After placing the cells in a design, several techniques such as cell sizing and buffer insertion are utilized in the post-placement optimization stage to improve the design performance. For example, if a net has long wirelength and negative timing slack, then buffers can be added to this net to improve its timing. However, since there is no physical routing information at this stage, the wirelength of a net has to be estimated, which may lead to over-optimization or under-optimization of a design. On one hand, when a design is over-optimized by inserting too many buffers or sizing up cells too much, the power consumption can increase significantly. Moreover, inserting too many buffers may increase the cell density and result in routability issues. For example, as shown in Figure 6.3, there is an 8-bit bus with long wirelength located at a high metal layer. Buffers are inserted in the middle of the bus to improve its timing, and these buffers could create a congested routing region. This is because many routing resources are needed to connect the high metal layer bus to the inserted buffers (located at metal1). If the design is under-optimized, on the other hand, more pressure is put on the routing stage (strict timing requirements for critical nets). Therefore, incorporating Global Routing to obtain more accurate routing information is the key step to enhance the solution quality of the post-placement optimization stage.

[Figure 6.3 labels: metal layers M1, M2, M3, M4, M5, M6.]

Figure 6.3: Inserting buffers in the post-placement stage may create congested regions and cause routability issues.

Our proposed Global Routing algorithm for the post-placement optimization stage is similar to what we discussed in the previous section, with several enhancements. First of all, this global router needs to support incremental routing. For example, if a buffer is added to a long critical net which was assigned to the upper metal layers, then this net should be pushed down to release the upper layer resources. Also, other critical nets can be promoted to decrease their net delay. This step must be done incrementally to reduce the runtime. Second, the router must have the capability to guide the optimizer. For example, if promoting a long critical net to the upper metal layers can satisfy the timing requirement, then this net should not be touched by the optimizer, and vice versa. This requirement can be met by assigning different RC scalings to different nets. To conclude, we hope that by incorporating Global Routing in the post-placement optimization stage, we can bridge the gap between these two stages and accelerate the path to design closure.


BIBLIOGRAPHY

[1] ISPD 1998 global routing benchmark suite, [online] http://www.ece.ucsb.edu/ kastner/labyrinth.

[2] ISPD 2006 placement contest and benchmark suite, [online] http://archive.sigda.org/ispd2006/contest.html.

[3] ISPD 2007 global routing contest and benchmark suite, [online] http://www.sigda.org/ispd2007/rcontest/.

[4] ISPD 2008 global routing contest and benchmark suite, [online] http://www.sigda.org/ispd2008/contests/ispd08rc.html.

[5] ISPD 2011 routability-driven placement contest and benchmark suite, [online] http://www.ispd.cc/contests/11/ispd2011 contest.html.

[6] Nangate 45 nm open cell library, [online] http://www.nangate.com. 2008.

[7] Christoph Albrecht. Global routing by new approximation algorithms for multicommodity flow. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20:622–632, 2001.

[8] Cynthia Barnhart, Ellis L. Johnson, George L. Nemhauser, Martin W. P. Savelsbergh, and Pamela H. Vance. Branch-and-price: Column generation for solving huge integer programs. Operations Research, 46:316–329, 1998.

[9] Laleh Behjat and Andy Chiang. Fast integer linear programming based models for VLSI global routing. In IEEE/ACM International Symposium on Circuits and Systems, pages 6238–6243, 2005.

[10] Michael Burstein and Richard N. Pelavin. Hierarchical wire routing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2:223–234, 1983.

[11] Zhen Cao, Tong Jing, Jinjun Xiong, Yu Hu, Lei He, and Xianlong Hong. Dprouter: A fast and accurate dynamic-pattern-based global routing algorithm. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 256–261, 2007.

[12] Yen-Jung Chang, Yu-Ting Lee, and Ting-Chi Wang. NTHU-Route 2.0: a fast and stable global router. In IEEE/ACM International Conference on Computer Aided Design, pages 338–343, 2008.


[13] Huang-Yu Chen, Chin-Hsiung Hsu, and Yao-Wen Chang. High-performance global routing with fast overflow reduction. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 582–587, 2009.

[14] Tai-Chen Chen and Yao-Wen Chang. Multilevel full-chip gridless routing considering optical proximity correction. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 1160–1163, 2005.

[15] Tai-Chen Chen, Yao-Wen Chang, and Shyh-Chang Lin. A novel framework for multilevel full-chip gridless routing. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 636–641, 2006.

[16] Minsik Cho, Katrina Lu, Kun Yuan, and David Z. Pan. BoxRouter 2.0: A hybrid and robust global router with layer assignment for routability. ACM Transactions on Design Automation of Electronic Systems, 14:1–21, 2009.

[17] Minsik Cho and David Z. Pan. BoxRouter: A new global router based on box expansion and progressive ILP. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26:2130–2143, 2007.

[18] Chris C. N. Chu and Yiu-Chung Wong. Flute: Fast lookup table based rectilinear steiner minimal tree algorithm for VLSI design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27:70–83, 2008.

[19] Jason Cong, Jie Fang, Min Xie, and Yan Zhang. MARS - a multilevel full-chip gridless routing system. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24:382–394, 2004.

[20] Jason Cong and Patrick H. Madden. Performance driven global routing for standard cell design. In IEEE/ACM International Symposium on Physical Design, pages 73–80, 1997.

[21] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to algorithms, second edition. 2001.

[22] CPLEX Optimization, Inc., Incline Village, NV. Using the CPLEX Callable Library, Version 9, 2005.

[23] Ke-Ren Dai, Wen-Hao Liu, and Yih-Lang Li. Efficient simulated evolution based rerouting and congestion-relaxed layer assignment on 3-D global routing. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 570–575, 2009.

[24] George B. Dantzig and Philip Wolfe. Decomposition principle for linear programs. Operations Research, 8:101–111, 1960.


[25] Jacques Desrosiers and Marco E. Lubbecke. A primer in column generation. In G. Desaulniers,J. Desrosiers, and M. M. Solomon, editors, Column Generation, chapter 1. Springer, 2005.

[26] Edsger W. Dijkstra. A note on two problems in connetion with graphs. Numerische Mathe-matik, 1:269–271, 1996.

[27] Jhih-Rong Gao, Pei-Ci Wu, and Ting-Chi Wang. A new global router for modern designs. InIEEE/ACM Asia and South Pacific Design Automation Conference, pages 232–237, 2008.

[28] Michael R. Garey and David S. Johnson. The rectilinear Steiner tree problem is NP-complete.SIAM Journal of Applied Math, 32:826–834, 1977.

[29] Liangpeng Guo, Yici Cai, Qiang Zhou, and Xianlong Hong. Logic and layout aware volt-age island generation for low power design. In IEEE/ACM Asia and South Pacific DesignAutomation Conference, pages 666–671, 2007.

[30] F. O. Hadlock. A shortest path algorithm for grid graphs. Networks, 7:323–334, 1977.

[31] Raia T. Hadsell and Patrick H. Madden. Improved global routing through congestion estima-tion. In IEEE/ACM Design Automation Conference, pages 28–31, 2003.

[32] Peter Hart, Nils Nilsson, and Bertram Raphael. A formal basis for the heuristic determinationof minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4:100–107,1968.

[33] Xianlong Hong, Tianxiong Xue, Jin Huang, Chung kuan Cheng, and Ernest S. Kuh. TIGER:an efficient timing-driven global router for gate array and standard cell layout design. IEEETransactions on Computer-aided Design of Integrated Circuits and Systems, 16:1323–1331.

[34] Jin Hu, Jarrod A. Roy, and Igor L. Markov. Sidewinder: a scalable ILP-based router. In ACM International Workshop on System-Level Interconnect Prediction, pages 73–80, 2008.

[35] Jin Hu, Jarrod A. Roy, and Igor L. Markov. Completing high-quality global routes. In IEEE/ACM International Symposium on Physical Design, pages 35–41, 2010.

[36] T. C. Hu and Man-Tak Shing. A decomposition algorithm for circuit routing. Mathematical Programming Essays in Honor of George B. Dantzig Part I, 24:87–103, 1985.

[37] International Business Machines Corp., Armonk, NY. Using the CPLEX Callable Library, Version 12, 2009.

[38] David Grove Jørgensen and Morten Meyling. A branch-and-price algorithm for switch-box routing. Networks, 40:13–26, 2002.

[39] Donald B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24:1–13, 1977.

[40] Andrew B. Kahng and Gabriel Robins. On optimal interconnections for VLSI. Kluwer Academic Publishers, Boston, MA, 1995.

[41] Andrew B. Kahng and Alexander Z. Zelikovsky. Highly scalable algorithms for rectilinear and octilinear Steiner trees. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 827–833, 2003.

[42] Ryan Kastner, Elaheh Bozorgzadeh, and Majid Sarrafzadeh. Pattern routing: Use and theory for increasing predictability and avoiding coupling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(7):777–790, 2002.

[43] Chin Yang Lee. An algorithm for path connections and its applications. IEEE Transactions on Electronic Computers, 10:346–365, 1961.

[44] Kai-Win Lee and Carl Sechen. A global router for sea-of-gates circuits. In IEEE/ACM European Design and Test Conference, pages 242–247, 1991.

[45] Jeffrey T. Linderoth. Topics in Parallel Integer Optimization. PhD thesis, Georgia Institute of Technology, 1998.

[46] Michael J. Litzkow, Miron Livny, and Matt W. Mutka. Condor - A hunter of idle workstations. In International Conference on Distributed Computing Systems, pages 104–111, 1988.

[47] Wen-Hao Liu, Wei-Chun Kao, Yih-Lang Li, and Kai-Yuan Chao. Multi-threaded collision-aware global routing with bounded-length maze routing. In IEEE/ACM Design Automation Conference, pages 200–205, 2010.

[48] Nir Magen, Avinoam Kolodny, Uri Weiser, and Nachum Shamir. Interconnect-power dissipation in a microprocessor. In System-Level Interconnect Prediction, pages 7–13, 2004.

[49] Malgorzata Marek-Sadowska. Route planner for custom chip design. In IEEE/ACM International Conference on Computer Aided Design, pages 246–249, 1986.

[50] Larry McMurchie and Carl Ebeling. PathFinder: a negotiation-based performance-driven router for FPGAs. In Symposium on Field Programmable Gate Arrays, pages 111–117, 1995.

[51] Joe W. McPherson. Reliability challenges for 45nm and beyond. In IEEE/ACM Design Automation Conference, pages 176–181, 2006.

[52] Michael D. Moffitt. MaizeRouter: Engineering an effective global router. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27:2017–2026, 2008.

[53] Michael D. Moffitt. Global routing revisited. In IEEE/ACM International Conference on Computer Aided Design, pages 805–808, 2009.

[54] Edward F. Moore. Shortest path through a maze. In Annals of Computation Laboratory, pages 285–292, 1959.

[55] MOSEK ApS, Copenhagen, Denmark. The MOSEK C API manual, Version 5.0, 2008.

[56] Dirk Müller. Optimizing yield in global routing. In IEEE/ACM International Conference on Computer Aided Design, pages 480–486, 2006.

[57] Muhammet Mustafa Ozdal and Martin D. F. Wong. Archer: a history-driven global routing algorithm. In IEEE/ACM International Conference on Computer Aided Design, pages 488–495, 2007.

[58] Min Pan and Chris C. N. Chu. Fastroute: a step to integrate global routing into placement. In IEEE/ACM International Conference on Computer Aided Design, pages 464–471, 2006.

[59] Min Pan and Chris C. N. Chu. Fastroute 2.0: A high-quality and efficient global router. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 250–255, 2007.

[60] Min Pan and Chris C. N. Chu. IPR: An integrated placement and routing algorithm. In IEEE/ACM Design Automation Conference, pages 59–62, 2007.

[61] Jarrod A. Roy and Igor L. Markov. High-performance routing at the nanometer scale. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27:1066–1077, 2008.

[62] Rupesh S. Shelar and Marek Patyra. Impact of local interconnects on timing and power in a high performance microprocessor. In IEEE/ACM International Symposium on Physical Design, pages 145–152, 2010.

[63] Hamid Shojaei, Tai-Hsuan Wu, Azadeh Davoodi, and Twan Basten. A pareto-algebraic framework for signal power optimization in global routing. In IEEE/ACM International Symposium on Low Power Electronics and Design, pages 407–412, 2010.

[64] Synopsys, Inc., Mountain View, CA. IC Compiler User Guide: Zroute, 2010.

[65] Tamas Terlaky, Anthony Vannelli, and Hu Zhang. On routing in VLSI design and communication networks. Discrete Applied Mathematics, 156(11):2178–2194, 2008.

[66] Richard W. Thaik, Ngee Lek, and Sung-Mo Kang. A new global router using zero-one integer linear programming techniques for sea-of-gates and custom logic arrays. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11:1479–1494, 1992.

[67] Benjamin S. Ting and Bou Nin Tien. Routing techniques for gate array. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2:301–312, 1983.

[68] Di Wu, Jiang Hu, and Rabi Mahapatra. Coupling aware timing optimization and antenna avoidance in layer assignment. In IEEE/ACM International Symposium on Physical Design, pages 20–27, 2005.

[69] Huaizhi Wu, I-Min Liu, Martin D. F. Wong, and Yusu Wang. Post-placement voltage island generation under performance requirement. In IEEE/ACM International Conference on Computer Aided Design, pages 309–316, 2005.

[70] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. GRIP: scalable 3D global routing using integer programming. In IEEE/ACM Design Automation Conference, pages 320–325, 2009.

[71] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. A parallel integer programming approach to global routing. In IEEE/ACM Design Automation Conference, pages 194–199, 2010.

[72] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. GRIP: Global routing via integer programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(1):72–84, 2011.

[73] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. Power-driven global routing for multi-supply voltage domains. In IEEE/ACM Design, Automation and Test in Europe, pages 443–448, 2011.

[74] Jinjun Xiong and Lei He. Full-chip multilevel routing for power and signal integrity. Integration, 40(3):226–234, 2007.

[75] Yong Xu, Ted K. Ralphs, Laszlo Ladanyi, and Matthew J. Saltzman. Computational experience with a software framework for parallel integer programming. INFORMS Journal on Computing, 21:383–397, 2009.

[76] Yue Xu, Yanheng Zhang, and Chris Chu. Fastroute 4.0: global router with efficient via minimization. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 576–581, 2009.

[77] Zhen Yang, Anthony Vannelli, and Shawki Areibi. An ILP based hierarchical global routing approach for VLSI ASIC design. Optimization Letters, 1:281–297, 2007.

[78] Ahmed Youssef, Zhen Yang, Mohab Anis, Shawki Areibi, Anthony Vannelli, and Mohamed Elmasry. A power-efficient multipin ILP-based routing technique. IEEE Transactions on Circuits and Systems I: Regular Papers, 57:225–235, 2010.

[79] Hai Zhou and Martin D. F. Wong. Global routing with crosstalk constraints. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18:1683–1688, 1999.
