(semi)deﬁnitely the future!apdio.pt/documents/10180/12431/slides_curso_sdo.pdf ·...

(Semi)Definitely the Future!Course on Semidefinite Optimization

Miguel F. AnjosMathématiques et génie industriel

Coimbra, 9 May 2011

Miguel F. Anjos (Polytechnique Montréal) Course on SDP Coimbra, 9 May 2011 1 / 205

Agradecimentos


“Read all about it! Read all about it!”

Handbook on Semidefinite, Cone and Polynomial Optimization:Theory, Algorithms, Software and Applications edited by M.F.Anjos and J.B. Lasserre, to be published by Springer in 2011.


Outline for today

1 Motivation

2 A Leisurely Introduction to SDP

3 The Maximum-Cut (Max-Cut) Problem

4 Single-Row Layout

5 Interior-Point Methods

6 Warmstarting Interior-Point Methods

7 The New Frontier: Polynomial Optimization

8 Time Permitting: Distance Geometry

9 Conclusion


Motivation


Cone Optimization

Cone optimization refers to the problem of optimizing a linear functionover the intersection of an affine space and a pointed closed convexcone.


Cone Optimization

The simplest example of cone optimization is the familiar linearprogramming (LP) problem:

max cT x min bT y

s.t. aTi x = bi , i = 1, . . . ,m s.t. z =

m∑i=1

yiai − c

x ≥ 0 z ≥ 0

LP consists of optimizing a linear function subject to linear constraintsand a cone constraint, which in the case of LP is the non-negativeorthant:

{x ∈ <n : xi ≥ 0 for all i}.


Closed Convex Cones

The non-negative orthant

K = {x ∈ <n : xi ≥ 0 for all i}

is a pointed closed convex cone since:

K ∩−K = {0} (pointed/proper)it contains its boundary (closed)if x , y ∈ K, then αx + (1− α)y ∈ K for all α ∈ (0,1) (convex)if x ∈ K, then αx ∈ K for all α ≥ 0 (cone)


The Positive Semidefinite Cone

The set of symmetric positive semidefinite (psd) matrices

P := {X ∈ Sn : λi(X) ≥ 0 for all i}

is also a pointed closed convex cone.

The interior of P is the set of positive definite (pd) matrices:

{X ∈ Sn : λi(X) > 0 for all i}

and the boundary of P are the singular psd matrices.


(Positive) Semidefinite Programming

This gives us a second example of a cone optimization problem,namely semidefinite optimization, or semidefinite programming (SDP):

sup C • X inf bT y

s.t. Ai • X = bi , i = 1, . . . ,m s.t. Z =m∑

i=1yiAi − C

X � 0 Z � 0

where A • B = trace AB =∑

i,j AijBij .


Why Closed Convex Cones?

Convexity is of paramount importance in optimization - much more sothan linearity!

[...] the great watershed in optimization isn’t between linearityand nonlinearity, but convexity and nonconvexity.

- T.R. Rockafellar, SIAM Review (1993)

Convex optimization problems have many advantageous properties,including:

a powerful duality theory (just like LP);polynomial-time solvable in theory (just like LP).


But why SDP specifically?

There are many other possible choices of convex cones for definingcone optimization problems.

However, SDP is (basically) the most general class of symmetriccones.

Symmetric cones permit the use of primal-dual interior-point methodsthat are efficient in practice.


Importance of SDP

SDP problems have become an important class of optimizationproblems for several reasons:

1 Because SDP problems are solvable in polynomial time, anyproblem that can be expressed using SDP is also solvable inpolynomial time.

2 SDP problems can be solved efficiently in practice. This can bedone by using one of the software packages available, oralternatively by implementing a suitable algorithm.

3 SDP can be used to obtain tight approximations for hard problemsin integer and global optimization.

4 SDP problems are applicable to a wide range of practicalapplications in areas such as control theory, circuit design, sensornetwork localization, and principal component analysis.


A Leisurely Introduction to SDP


A Simple Problem in 2 Dimensions

Consider the problemmin x + ys.t. xy ≥ 2

x , y ≥ 0

This is clearly not an LP problem.


Graphic Representation

The graphical representation of the problem is:

6

-x

y .

..........

..........

..........

..........

......

..........

..........

..........

..........

...

..........

..........

..........

..........

..........

..........

..........

.......

..........

..........

..........

....

..........

..........

..........

.

..........

..........

........

..........

..........

..........

......

......................................

........................................

..............

.................

...................

.............

................

......... ............ ......... ........... ........ .......... ........ .......... ........ ......... ................. ................. ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................ ................

��

��

��

��

��

��

��

��

��

��

Direction of optimization

r(√2,√

2)


Formulation as an SDP

To verify that the problem above can be formulated an SDP problem,we write it as:

min x11 + x22

s.t. x12 =√

2X � 0,

where

X =

(x11 x12x12 x22

)is a 2× 2 symmetric matrix.The equivalence holds because X � 0 implies

x11 ≥ 0, x22 ≥ 0, and x11x22 − x212 ≥ 0.

This ability to model convex quadratic (and more generally, convexpolynomial) constraints is the fundamental strength of SDP.


Visualizing the 2× 2 psd matrices


An Important Example: Eigenvalue Optimization

Eigenvalues of a matrix variable are important in many contexts, boththeoretical and practical.

Suppose we wish to minimize the maximum eigenvalue of a symmetricmatrix A(z) whose entries depend linearly on the variable z:

A(z) = A0 + z1A1 + z2A2 + . . .+ z`A`

for some ` ≥ 1.The problem of interest is thus

minz∈<

λmax(A(z)).


Eigenvalue Optimization (ctd)

minz∈<

λmax(A(z)). (1)

To formulate this problem using SDP, first observe that (1) is equivalentto

min ts.t. t ≥ λmax(A(z)).

Since t ≥ λmax(A(z)) is equivalent to λmin(t I− A(z)) ≥ 0, thisconstraint is equivalent to t I− A(z) � 0.Hence, problem (1) is equivalent to

max −ts.t. t I− A0 − z1A1 − z2A2 − . . .− z`A` � 0.

which is an SDP problem.


Löwner Partial Ordering

The dual SDP problem can equivalently be written without using thedual variable Z:

sup bT y

s.t. C−m∑

i=1yiAi � 0.

The resulting constraint is called a linear matrix inequality (LMI).

The inequality is interpreted according to the partial order induced onSn by the positive semidefinite cone.This is called the Löwner partial order, and is defined as follows:

A � B ⇔ A− B � 0.


Minimizing the Spectral Norm

By a similar reasoning, the problem of minimizing the spectral norm ofA(z):

minz∈<|λmax(A(z))|

can be written as the following SDP problem:

max −ts.t. t I− A0 − z1A1 − z2A2 − . . .− z`A` � 0

t I + A0 + z1A1 + z2A2 + . . .+ z`A` � 0.


The Second-Order Cone (or Lorentz Cone)

An (n + 1)-dimensional second-order cone (SOC) is the set of all

vectors (x0, x1, . . . , xn) that satisfy x0 ≥√

x21 + . . .+ x2

n .

The non-negative orthant is a special case of SOC.

By the same token, a second-order cone is a special case of SDP:x0 0 0 · · · x10 x0 0 · · · x20 0 x0 · · · x3...

......

. . ....

x1 x2 x3 · · · x0

� 0.


The Second-Order in 3 Dimensions


The Elliptope

The n-dimensional elliptope is the feasible set of the SDP problem

min C • Xs.t. diag (X) = e

X � 0.

In other words, the elliptope of dimension(n

2

)is the set of all symmetric

matrices n × n that are psd and have ones on the diagonal.

This special set comes up in many applications of SDP, including someof today’s topics.


Small Elliptopes

If X ∈ S2, we obtain the elliptope in <1:(1 xx 1

)� 0⇒ x ∈ [−1,1].

If X ∈ S3, we obtain the elliptope in <3:x

yz

∈ <3 :

1 x yx 1 zy z 1

� 0

.

We can visualize this set in <3.


The Elliptope in 3 Dimensions


Geometry of the Elliptope

The vertices of the elliptope correspond to the psd matrices with allentries equal to ±1.

For n = 2, we have two vertices:(1 11 1

)and

(1 −1−1 1

).

For n = 3, there are four vertices:(1 1 11 1 11 1 1

),

(1 1 −11 1 −1−1 −1 1

),

(1 −1 1−1 1 −1

1 −1 1

),

(1 −1 −1−1 1 1−1 1 1

).


Geometry of the Elliptope

Unlike a polyhedron, the elliptope has extreme points that are notvertices.

This occurs first when n = 3: the matrix1 12

12

12 1 1

212

12 1

is not a vertex, but it is an extreme point of the elliptope since it cannotbe expressed as a convex combination of the four vertices.


Polyhedral and Non-polyhedral Faces

The feasible set of an SDP problem may have both polyhedral andnon-polyhedral faces.

A simple example of such a feasible set is:1 x 0x y 00 0 x − y

� 0


Algebraic Characterizations of psd MatricesThere are many ways to express the psd condition on a matrix X.

A few of them are:

X � 0 ⇔ eigenvalues λi(X) ≥ 0 for all i⇔ vT Xv ≥ 0 for all v ∈ <n

⇔ ∃ X12 ∈ Sn s.t. X

12 X

12 = X

⇔ ∃ `1, `2, . . . , `n s.t. Xij = `Ti `j

⇔ X • Z ≥ 0 for all Z � 0 (Fejer’s Theorem)⇔ All principal minors of X are non-negative

A sufficient (but not necessary) condition for psd:

Xii ≥∑j 6=i

|Xij | for all i ⇒ X � 0


Algebraic Characterizations of pd Matrices

A few of them are:

X � 0 ⇔ eigenvalues λi(X) > 0 for all i⇔ vT Xv > 0 for all v ∈ <n,v 6= 0⇔ ∃ linearly independent `1, `2, . . . , `n s.t. Xij = `Ti `j

⇔ All principal minors are positive(can be relaxed to n nested principal minors)

A sufficient (but not necessary) condition for pd:

Xii >∑j 6=i

|Xij | ⇒ X � 0


Schur Complement Theorem

TheoremIf

M =

(A B

BT C

)and A is pd, then

M � 0 ⇐⇒ C − BT A−1B � 0.


SDP problems with rational data

Unlike in LP, an SDP problem may have an irrational optimal solution,even if all the data C, Ai , b are rational (or even integer).

For example, consider the SDP problem:

max x12s.t. x11 = 1

x22 = 2(x11 x12x12 x22

)� 0.

Then: 2 ≥ x212 ⇒ x∗12 =

√2.


The Dual Cone

For any convex cone K in a normed vector space V , the dual cone K∗is defined as

K∗ := {y ∈ V : 〈x,y〉 ≥ 0 ∀x ∈ K}.

We have that:For any cone K, K∗ is a closed convex cone;The non-negative orthant is self-dual (obvious);The SOC is self-dual, by the Cauchy-Schwarz inequality;The psd cone is self-dual, by Fejer’s Theorem:

X � 0⇔ X • Z ≥ 0 for all Z � 0.


The Dual Cone and the Dual ProblemConsider the general cone optimization problem

min 〈c,x〉s.t. 〈ai ,x〉 = bi , i = 1, . . . ,m ← yi

x ∈ K

The Lagrangian dual is:

maxy

{minx∈K〈c,x〉+

m∑i=1

yi (bi − 〈ai ,x〉)

}= max

y

{m∑

i=1

biyi + minx∈K

⟨c−

m∑i=1

yiai ,x

⟩}

The inner minimization is unbounded below unless⟨c−

m∑i=1

yiai ,x

⟩≥ 0 for all x ∈ K

in which case the minimum is zero.


The Dual Cone and the Dual Problem (ctd)

Since the outer problem is a maximization problem, it is thereforeequivalent to

maxm∑

i=1biyi

s.t. c−m∑

i=1yiai ∈ K∗

which is equivalent to

maxm∑

i=1biyi

s.t.m∑

i=1yiai + z = c

z ∈ K∗

This is the dual cone optimization problem.


Other Important Examples of Convex Cones

The completely positive cone:

C := {X ∈ Sn : X =k∑

i=1

vivTi ,vi ≥ 0}

The copositive cone:

C∗ := {X ∈ Sn : vT Xv ≥ 0 for all v ≥ 0}

The doubly non-negative cone:

D := P ∩N

and its dualD∗ = P ⊕N


Relationships Between Various Convex Cones (n ≥ 5)

M1

M3M2

N P

M5

D∗

M6

C

M4

D C∗


Self-Concordant Barrier Functions

Use of an IPM requires a self-concordant barrier function for the coneunderlying the feasible set.

Although such a function (the Universal Barrier Function) exists for alarge variety of convex cones, it is very hard to compute in general.

However, efficient self-concordant barriers exist for symmetric cones.


Symmetric Cones

Symmetric cones arise from direct products of the following five typesof cones:

SOCsymmetric psd matrices over the reals (psd cone)Hermitian psd matrices over the complex numbers (can beexpressed as a psd cone of twice the size);Hermitian psd matrices over the quaternions (can be expressedas a psd cone of four times the size);One exceptional 27-dimensional cone (3× 3 Hermitian psdmatrices over the octonions).

Thus, SDP is (basically) the most general class of symmetric cones.


SDP DualityConsider a primal-dual SDP pair in the following form:

min 〈C,X〉 max 〈b,y〉

s.t. 〈Ai ,X〉 = bi , i = 1, . . . ,m s.t.m∑

i=1yiAi + Z = C

X � 0 Z � 0

Like in LP, we have a weak duality theorem.

TheoremIf X is primal feasible and (y, Z) is dual feasible then 〈C, X〉 ≥ 〈b, y〉.

The proof is just like for LP:

〈C, X〉 − 〈b, y〉 = 〈Z, X〉+m∑

i=1

yi〈Ai , X〉 −m∑

i=1

yi〈Ai , X〉 = 〈Z, X〉 ≥ 0.

The difference between the primal and dual objective values forfeasible solutions X and (y, Z) is called the duality gap.


Beyond weak duality, however, the picture differs. For example, for LP,if the primal is feasible and bounded, orif the dual is feasible and bounded,

then both primal and dual have optimal solutions, and the duality gapis zero at optimality.

For SDP, the situation is more complicated, as the following twoexamples demonstrate.


Zero Duality Gap Without AttainmentConsider the SDP problem

inf X11

s.t.(

X11 11 X22

)� 0.

It is feasible and bounded yet the optimal value zero cannot be

attained because(

0 11 X22

)is not psd for any value of X22.

The dual problem is

max y1

s.t.(

1 00 0

)− y1

(0 1

212 0

)� 0

or equivalentlymax y1

s.t.(

1 − y12

−y12 0

)� 0

with y∗1 = 0 optimal. (In fact, y1 = 0 is the only feasible solution.)Miguel F. Anjos (Polytechnique Montréal) Course on SDP Coimbra, 9 May 2011 44 / 205

Positive Duality GapThe SDP problem

min X11s.t. X11 + 2X23 = 1

X22 = 0X11 X12 X13X12 X22 X23X13 X23 X33

� 0

has optimal value 1.The dual SDP problem is

max y1

s.t.

1 0 00 0 00 0 0

− y1

1 0 00 0 1

20 1

2 0

− y2

0 0 00 1 00 0 0

=

1− y1 0 00 −y2 − y1

20 y1

2 0

� 0

and the psd constraint implies y1 = 0 for every feasible solution,hence the optimal value is 0. (Take e.g. y∗1 = 0, y∗2 = 0.)


How to Bridge the Gap (& Ensure Strong Duality)?


Constraint Qualification

To avoid this kind of difficulty and obtain a strong duality result for SDP,we must require that the primal-dual SDP pair satisfy a constraintqualification (CQ).

This is a standard concept in non-linear optimization. Arguably themost commonly used CQ is Slater’s CQ.

DefinitionSlater’s CQ holds if both primal and dual have feasible positive definitematrices.

We then have the following result.

TheoremUnder Slater’s CQ, both primal and dual have optimal solutions, andthe duality gap is zero at optimality.


Verifying Slater’s CQ

Examplemin 〈C,X〉 max 〈e,y〉

s.t. 〈eieTi ,X〉 = 1, i = 1, . . . ,m s.t.

m∑i=1

yieieTi + Z = C

X � 0 Z � 0


Optimality ConditionsFrom the weak duality of SDP, we have that the duality gap equals

〈C, X〉 − bT y = 〈Z, X〉 ≥ 0.

Since both X and Z are psd, 〈X,Z〉 = 0 implies XZ = ZX = 0, thus weobtain the sufficient optimality conditions:

Ai • X = bi , i = 1, . . . ,m, (primal feasibility)X � 0

Z +m∑

i=1yiAi = C (dual feasibility)

Z � 0XZ = 0 (complementarity)

If Slater’s CQ holds, they are also necessary for optimality.These optimality conditions can be used as the starting point fordefining IPMs to solve SDP problems (later).


Maximum-Cut: A prototypical global optimizationproblem


Maximum-Cut: A prototypical global optimizationproblem

Given a graph G = (V ,E) and weights wij for all edges (i , j) ∈ E , findan edge-cut of maximum weight, i.e. find a set S ⊆ V s.t. the sum ofthe weights of the edges with one end in S and the other in V \ S ismaximum.We assume wlog that wii = 0 for all i ∈ V , and that G is complete(assign wij = 0 if edge ij 6∈ E).


Example of Cut


Optimal Cut


Standard ILP Formulation

maxn∑

i=1

n∑j=i+1

wijyij

s.t. yij + yik + yjk ≤ 2,1 ≤ i < j < k ≤ nyij − yik − yjk ≤ 0,1 ≤ i < j ≤ n, k 6= i , jyij ∈ {0,1},1 ≤ i < j ≤ n

where

yij =

{1 if edge ij is cut0 otherwise,

yij = yji , and wij denotes the weight of edge ij .


The Cut Polytope

The convex hull of the feasible integer points is called the cut polytope.

This structure has been studied in depth, and many families ofcutting planes are well understood, see e.g. Deza and Laurent(1997).It is known that for many graphs, e.g. planar graphs, the LPrelaxation is exact.This approach has been highly successful for solving spin glassproblems in physics (Liers, Jünger, Reinelt and Rinaldi (2005))and is available online at the Spin Glass Server:

http://www.informatik.uni-koeln.de/ls_juenger/research/sgs/


http://www.informatik.uni-koeln.de/ls_juenger/research/sgs/

Quadratic Formulation of Max-Cut

Whereas the ILP formulation is edge-based, we use a node-basedquadratic formulation.

Let the vector v ∈ {−1,+1}n represent any cut in the graph viathe interpretation that the sets {i |vi = +1} and {i |vi = −1} specifythe partition.Then max-cut may be formulated as:

maxn∑

i=1

n∑j=i+1

wij

(1−vi vj

2

)s.t. v2

i = 1, i = 1, . . . ,n.


The Basic SDP Relaxation of Max-Cut(Schrijver, unpublished)

Consider the change of variable X = vvT , v ∈ {±1}n.Then max-cut is equivalent to

max Q • Xs.t. diag (X) = e

rank(X) = 1X � 0,

where Q = 14 (Diag (We)−W).

Removing the rank constraint, we obtain the basic SDP relaxation ofmax-cut.

Q: How good is this SDP relaxation?


Goemans and Williamson (1995):0.878-approximation algorithm

Theorem

If wij ≥ 0 for all edges ij, then

max-cut opt valueSDP relax opt value

≥ α

where α := min0≤ξ≤π

2π

ξ1−cos ξ ≈ 0.87856.

In fact, Goemans and Williamson proved a stronger result:they described a randomized algorithm that

from an optimal solution X∗ of the SDP relaxationgenerates a cut with expected weight ≥ α (SDP relax opt value).


Recall that X∗ � 0⇒ ∃ `1, `2, . . . , `n s.t. Xij = `Ti `j

...........................

............................

............................

............................

...........................

..........................

..........................

...........................

........................................................

................................................................................................................................................................

......

...........................

..........................

........................

..

...........................

............................

............................

............................

..........

..........

.......

..........

..........

.......

............................

............................

......................

......

........................

...

........................

..

..........................

...........................

............................

............................ ............................ ........................... ........................... ........................................................

............................

...........................

..........................

..........................

...........................

............................

............................

............................

...........................

��

`1

@@

@@@I

`2

�`3

CCCCCCCW`4

SSSSSSw`n−1

��

��1`n

XXXXX

XXyr

`T r > 0 `T r < 0

��

��

Since the optimal max-cut value is at least the expected value of thiscut, the theorem follows.


Constant Relative Accuracy Estimate

With no restriction on the edge weights, Nesterov (1998) provedconstant relative accuracy estimates for the basic SDP bound.

Define

µ∗ = max vT Qv µ∗ = min vT Qvs.t. v ∈ {−1,1}n s.t. v ∈ {−1,1}n

ψ∗ = max Q • X ψ∗ = min Q • Xs.t. diag (X) = e, X � 0 s.t. diag (X) = e,X � 0

ands(β) := β ψ∗ + (1− β)ψ∗, β ∈ [0,1].


Constant Relative Accuracy Estimate (ctd)

TheoremWithout any assumption on the matrix Q,

|s( 2π )− µ∗|µ∗ − µ∗

≤ π

2− 1 <

47.

(µ∗ = 0 in the absence of negative edge weights.)


Improving the Basic SDP Relaxation

A straightforward idea is to add the triangle inequalities:

Xij + Xik + Xjk ≥ −1,Xij − Xik − Xjk ≥ −1,

−Xij + Xik − Xjk ≥ −1,−Xij − Xik + Xjk ≥ −1

for all i < j < k .


Improving the Basic SDP Relaxation (ctd)Theoretically, this does not help much in the worst case:

Karloff (1999) showed that the ratio

expected weight of cutmax-cut opt value

can be arbitrarily close to α.Feige and Schechtman (2002) showed that the approximationratio

weight of cut from GW algorithmmax-cut opt value

can be arbitrarily close to α,and that the integrality ratio

max-cut opt valueSDP relax opt value

is no better than ≈ 0.891.Miguel F. Anjos (Polytechnique Montréal) Course on SDP Coimbra, 9 May 2011 63 / 205

Improving the Basic SDP Relaxation (ctd)

In practice, however, the triangle inequalities help for small- tomedium-sized examples, say graphs with up to a few hundred nodes:

Helmberg and Rendl (1998): IPM to solve the basic SDP +triangle inequalitiesHelmberg (2004): Bundle method to solve the basic SDP +triangle inequalitiesKrishnan and Mitchell (2006): Cut-and-price by reformulating thebasic SDP as a semi-infinite LPRendl, Rinaldi and Wiegele (2007):IPM + bundle + triangle inequalities = Biqmac

http://biqmac.uni-klu.ac.at/


http://biqmac.uni-klu.ac.at/

More generally...

The rows and columns of the matrix variable in the basic SDPrelaxation are indexed by the binary variables corresponding to theedges incident on node 1:

X =

1 X12 · · · · · · X1,n

1. . . Xi,j

. . .1

Consider semidefinite relaxations with the rows and columns of thematrix variable indexed by subsets of the set of variables.


Lasserre Hierarchy

1 v1 · · · · · · vn X12 · · · Xpairs · · · Xn−1,n · · · Xtriples · · · X1,2,...,n1 X12 . . . X1n

1 · · · X2n. . .

...1

1. . .

1. . .

11


Properties of the Lasserre Hierarchy

Lasserre (2001,2002):

TheoremThe SDP relaxation with all subsets of nodes is exact for max-cut.

A. and Wolkowicz (2001) and Laurent (2001):

TheoremLet Y ∗ be optimal for the SDP with subsets of up to k nodes. Ifrank Y ∗ ≤ k then the optimal value of the SDP relaxation is the optimalvalue of max-cut.

In particular, this shows that after n − 1 steps, the Lasserre lifting isexact.Laurent (2001) also showed that the Lasserre relaxations are tighterthan the corresponding Sherali-Adams RLT relaxations orLovász-Schrijver relaxations.


Drawback of the Lasserre Hierarchy

However, the SDP problems along the hierarchy grow exponentiallyfast.

Can we find a more efficient construction? Yes!More on this later...


The Single-Row Facility Layout Problem


The Single-Row Facility Layout Problem (SRFLP)

The SRFLP, also known as the 1-D space allocation problem, consistsin finding an optimal linear placement of facilities with varyingdimensions on a straight line.

An instance of the problem consists of:n 1-D facilities {1, . . . ,n}with positive lengths `1, . . . , `n

and (usually non-negative) pairwise connectivities cij .

We seek a placement of the facilities so as to minimize the totalweighted sum of the center-to-center distances between all pairs offacilities.


Practical Application - Manufacturing

Arrangement of machines along a path traveled by an automatedguided vehicle (minimize total traveled distance)


Practical Application - Circuit Design

Layout in VLSI design (minimize wirelength)


Connections to Other Ordering Problems

Note that for the SRFLP, the distance between two facilities dependson the facilities placed between them.

Consider the distance between 1 and 2 in this example:

If all `i are equal, the SRFLP becomes an instance of the LinearOrdering Problem, which is itself a special case of the QuadraticAssignment Problem (QAP).


Early Research

The first papers on the SRFLP considered globally optimal algorithms:Simmons (1969) proposed a branch-and-bound algorithm;Love and Wong (1976) considered an MILP model;Picard & Queyranne (1981) and Beghin-Picavet & Hansen (1982)developed dynamic programming algorithms.

Globally optimal solutions were obtained only for instances with lessthan 10 facilities.

For many years after this, almost all the research on this problemfocused on locally optimal methods.


Locally Optimal Algorithms

Representative papers of this research:Heragu and Kusiak (1991) used non-linear optimization methods;Heragu and Alfa (1992) used a simulated annealing algorithm;Kumar et al. (1995) proposed a greedy heuristic algorithm;de Alvarenga et al. (2000) considered simulated annealing andtabu search.

The last two papers contained best results in the literature until 2005,on instances with up to 30 facilities.


The Return of Globally Optimal Algorithms

In the last few years, there has been a lot of progress with LP- andSDP-based approaches:

A., Kennings & Vannelli (2005) proposed an SDP model, andcomputed lower bounds for instances with up to 80 facilities;Amaral (2006) proposed a new MILP model and solved instanceswith up to 15 facilities;Amaral and Letchford (2006) performed a careful polyhedral studyof the SRFLP, and compute strong bounds for instances with up to30 facilities;


The Return of Globally Optimal Algorithms (ctd)

A. & Vannelli (2006) combined the SDP relaxation with cuttingplanes to solve instances with up to 30 facilities;Amaral (2008) proposed a new MILP formulation with cuttingplanes, and solved instances with up to 35 facilities.Sanjeevi and Kianfar (2010) have shown that several validinequalities presented by Amaral are facet-defining.A. and Yen (2008) proposed a new SDP relaxation with cuttingplanes, solved instances with up to 36 facilities, and obtainedlower bounds for instances with up to 100 facilities.

Today: SDP modelling for the SRLFP.


Optimizing Over Permutations (Simmons (1969))

Let π = (π1, . . . , πn) denote a permutation of the indices[n] := {1,2, . . . ,n} of the facilities.

The problem is then

minπ∈Πn

∑i<j

cij

[12`i + Dπ(i , j) +

12`j

]

where Πn denotes the set of all permutations π of [n], and Dπ(i , j)denotes the sum of the lengths of the facilities between i and j in thelinear arrangement defined by π.


Optimizing Over Permutations (ctd)

Rewriting the objective function as

minπ∈Πn

∑i<j

cijDπ(i , j) +∑i<j

12

cij(`i + `j)

where the second summation is a constant independent of π, we seethe problem is really to minimize

∑i<j

cijDπ(i , j) over all permutations π.


Definition of the Binary Variables

For each pair i , j of facilities, define the ±1 variable

Rij :=

{1, if facility i is to the right of facility j−1, if facility i is to the left of facility j

The order of the subscripts matters: Rij = −Rji .

To accurately formulate the problem, we must ensure that the variablesrepresent a valid permutation. In particular,

if i is to the right of j and j is to the right of k ,then i is to the right of k .


Definition of the Binary Variables (ctd)

We can formulate this condition as

if Rij = Rjk , then Rij = Rik ,

or equivalently(Rij + Rjk )(Rij − Rik ) = 0,

which expanded gives the quadratic constraint

RijRjk − RijRik − RikRjk = −1.

These(n

3

)constraints on the Rij variables are not only necessary, but

also sufficient to obtain precisely all possible permutations of the nfacilities.


Binary Variables↔ Permutations

Let

Rn :={ρ ∈ {±1}(

n2) : RijRjk − RijRik − RikRjk = −1 ∀ i < j < k

}.

TheoremThe function f : Rn → Πn defined as f (ρ) = (π1, . . . , πn) where

πk :=

n + 1 +∑j 6=k

Rkj

2for k = 1,2, . . . ,n,

is a bijection.


Binary Formulation

To express the objective function in terms of the variables Rij , observethat the sum of the lengths of the facilities between i and j can beexpressed as ∑

k 6=i,j

`k

(1− RkiRkj

2

)

Therefore, we can formulate the problem as:

min∑i<j

[cij∑

k 6=i,j`k

(1−Rki Rkj

2

)]+∑i<j

12cij(`i + `j)

s.t.RijRjk − RijRik − RikRjk = −1 for all triples i < j < kR2

ij = 1 for all pairs i < j


First Matrix Formulation and SDP Relaxation

Define the matrix variable X = ρρT with(n

2

)rows and columns.

Then Xij,kl = RijRkl for all pairs of facilities, and we can formulate theproblem as:

min

(∑i<j

cij2

)(n∑

k=1`k

)−∑i<j

cij2

[ ∑k 6=i,j

`kXki,kj

]s.t.

Xij,jk − Xij,ik − Xik ,jk = −1 for all triples i < j < kXij,ij = 1 for all pairs i < jrank(X) = 1X � 0

Omitting the rank constraint yields the SDP relaxation.The number of linear constraints is O(n3), which is too large for verylarge instances.


Second Matrix Formulation and SDP Relaxation

For this reason, we consider a second SDP-based formulation:

min

(∑i<j

cij2

)(n∑

k=1`k

)−∑i<j

cij2

[ ∑k 6=i,j

`kXki,kj

]s.t.

n∑k 6=i,j,k=1

Xij,jk −n∑

k 6=i,j,k=1Xij,ik −

n∑k 6=i,j,k=1

Xik ,jk = −(n − 2)

for all pairs i < jXij,ij = 1 for all pairs i < jrank(X) = 1X � 0

and we omit the rank constraint to obtain a relaxation with only O(n2)linear constraints.


Obtaining Layouts from the SDP Solution

Let X ∗ be the optimal solution to either SDP relaxation.

For each row rs of X ∗, if we set Rrs = +1, then we can assign thevalue Xrs,ij to the variable Rij , for every pair (i , j) 6= (r , s).

Using these values, we then generate the πk values as in the Theorem,and sorting these we obtain a permutation of the facilities.


Tightening the relaxations

A standard way to tighten semidefinite relaxations of binaryoptimization problems is to add inequalities that are valid for all therank-one solutions.

A common approach is to:1 solve the SDP relaxation;2 add some violated inequalities;3 re-optimize;4 repeat from step 2 until no more violated inequalities are found.


Globally Optimal Solutions for Literature Instances

A. and Vannelli (2006) obtained the first provably global optimalsolutions for five well-known instances in the literature using triangleinequalities and the first SDP relaxation.

Instance Number SDP Improved CPU Best layout Gapof bound SDP time by SDP-based

facilities bound (sec) heuristicLit-1 8 2324.5 2324.5 0.2 2324.5 0%Lit-2 10 d2773.9e = 2774.0 2781.5 3.4 2781.5 0%Lit-3 11 d6847.6e = 6848.0 6933.5 32.6 6933.5 0%Lit-4 20 d15285.9e = 15286.0 15549.0 1613.5 15549.0 0%Lit-5 30 d43963.7e = 43964.0 44965.0 57057.0 44965.0 0%

(Computations performed with CSDP on a 2.0GHz Dual Opteron & 16Gb of RAM.)


Global Bounds For Very Large Instances

A. and Yen (2008) used the second SDP relaxation to obtain tightbounds for some extremely large instances:

Number of Number of Range of gaps Averagefacilities instances CPU time

56 5 3.49% - 4.47% 3h 03m64 5 2.80% - 5.15% 9h 00m72 5 2.89% - 4.03% 23h 20m81 5 3.44% - 4.97% 2d 0h 24m

100 5 3.18% - 3.81% 9d 17h 0m

(Computations with CSDP on a 2.4GHz Quad Opteron & 16Gb of RAM.)


Ongoing Related Research

Faster algorithms for solving the SDP relaxations (Hungerländer &Rendl)Extension to continuous 2-dimensional layout (Adams & A.)

See book chapter:Global Approaches for Facility Layout and VLSI Floorplanning by A. &Liers, to appear in Handbook on Semidefinite, Cone andPolynomial Optimization: Theory, Algorithms, Software andApplications edited by A. and J.B. Lasserre.Available on Optimization Online until publication.


Interior-Point Methods


Solving SDP ProblemsNumerous algorithms have been proposed (and, in most cases,implemented) for solving SDP problems:

Ellipsoid methodInterior-point methods (IPMs)Spectral bundle methodLow-rank methodAugmented Lagrangian methodsSmoothing methodsBoundary point methodand more...

The first two are the only ones with provable polynomial-timeconvergence.IPMs are the only ones that are polynomial-time and efficient inpractice.


IPM Variants

Within the framework of IPMs, many variants have been proposed,analyzed, and implemented:

Primal-dual path-followingInfeasiblePotential reductionDual scalingPrimal-dual completion-basedand more...

Modern LP software packages contain both simplex and interior-pointsolvers, often several variants for each.


A basic primal-dual path-following IPM

Let’s again start with LP:

max cT x min bT y

s.t. aTi x = bi , i = 1, . . . ,m s.t. z =

m∑i=1

yiai − c

x ≥ 0 z ≥ 0

A primal-dual solution (x,y, z) is optimal if and only if

A x = b, x ≥ 0

AT y + z = c, z ≥ 0

x ◦ z = 0

However, it is not straightforward to solve this set of conditions for(x,y, z).


Solving the optimality conditions

A x = b, x ≥ 0 (primal feasibility)AT y + z = c, z ≥ 0 (dual feasibility)

x ◦ z = 0 (complementarity)

The simplex method ensures that both primal feasibility andcomplementarity hold, and works to achieve dual feasibility.

Path-following IPMs maintain primal feasibility and dual feasibility, andwork to achieve complementarity.

The first key notion is that of interior point.


Interior Points

An interior point (x,y, z) is a point that satisfies:

A x = b, AT y + z = c, x > 0, z > 0

Complementarity does not hold for any interior point.

Therefore

relax x ◦ z = 0 by changing it to x ◦ z = µ e,

where µ ≥ 0.


The Central Path

The resulting conditions:A x = b

AT y + z = c

x ◦ z = µ e

have a unique solution (which is an interior point) for each µ > 0.Since the solution depends on µ, we denote it as:

(x(µ),y(µ), z(µ)).

As we vary µ, we get a sequence of interior points.As µ→ 0, we converge to a primal-dual optimal solution.The entire sequence of points traces a path inside the feasible region.We call this path the central path.


Following the central path

max x1s.t. x1 + x2 + x3 = 1

x1, x2, x3 ≥ 0


Following the central path in practice

max x1s.t. x1 + x2 + x3 = 1

x1, x2, x3 ≥ 0

00.1

0.20.3

0.40.5

0.60.7

0.80.9

1

00.1

0.20.3

0.40.5

0.60.7

0.80.9

1

0

0.2

0.4

0.6

0.8

1

x1x2

x 3


Outline of a primal-dual path-following IPMA path-following IPM approximately follows the central path to anε-optimal interior-point solution.

Basic outline: Given a tolerance ε > 0, initial µ > 0 and correspondingpoint (x(µ),y(µ), z(µ))

1 Decrease µ2 Approximate the point on the central path corresponding to the

new value of µ by applying Newton’s method to

A x = b, x ≥ 0AT y + z = c, z ≥ 0

x ◦ z = µ e

starting with the point corresponding to the previous µ.3 Repeat until the duality gap bT y− cT x < ε.


And for SDP?

Exactly the same outline with the optimality conditions:

Ai • X = bi , i = 1, . . . ,m

Z =m∑

i=1

yiAi − C

X Z = 0

and the interiority condition defined by both X and Z being positivedefinite:

X � 0,Z � 0.


Newton System

At each step of Newton’s method, we solve the system:

Ai •∆X = 0, i = 1, . . . ,m,

∆Z +m∑

i=1∆yiAi = 0

∆XZ + X∆Z = σµI− XZ−∆X∆Z

(2)

where σ ∈ [0,1] and µ = X•Zn .

This is a linear system with m +(n+1

2

)+ n2 equations and(n+1

2

)+ m +

(n+12

)unknowns.

Therefore, unlike for the LP case, we have an overdetermined system.

The reason this happens is that XZ is in general non-symmetric even ifboth X and Z are symmetric.


SDP Search DirectionsNumerous ways have been proposed to address this difficulty, leadingto more than twenty different search directions in the literature. Hereare four of them:

AHO direction (Alizadeh-Haeberly-Overton): Symmetrize theequation XZ = µI to XZ+ZX

2 = µI. This approach basically reduces thenumber of equations to the number of variables.This method tends to be slower than several others in practice.

HKM direction (Helmberg-Rendl-Vanderbei-Wolkowicz/Kojima-Shindoh/Monteiro-Hara): Solve the system (2) for non-symmetric ∆X,then symmetrize:

∆X← ∆X + (∆X)T

2.

This approach basically increases the number of variables to equal thenumber of equations, and then symmetrizes the ∆X matrix.The HKM direction if fast, and it is arguably the most popular method.


SDP Search Directions (ctd)NT direction (Nesterov-Todd): This direction is a member of theMonteiro-Zhang family of directions:

HP(∆XZ + X∆Z) = µI− HP(XZ)

whereHP(M) =

12

[PMP−1 + P−T MT PT

]and the matrix P determines the symmetrization strategy. Thesedirections are obtained by making the number of equations equal tothe number of variables, and then applying Newton’s method.

Note that P = I yields AHO, and P = Z12 yields HKM.

The NT direction corresponds to P =[X

12 (X

12 ZX

12 )

12 X

12

] 12 . This choice

of P leads to a direction with several favourable theoretical propertiesas it is the unique matrix satisfying

P−1XP−1 = PZP = Diagonal , i.e., X = P2ZP2.


SDP Search Directions (ctd)

Gauss-Newton direction(Kruk-Muramatsu-Rendl-Vanderbei-Wolkowicz): This direction is not amember of the Monteiro-Zhang family.

It is obtained by looking for a least-squares solution of the non-linearequations

Ai • X = bi , i = 1, . . . ,m,

Z +m∑

i=1yiAi = C

XZ = µI

Therefore, the idea is to apply the classical approach for solvingnon-linear least-squares problems, commonly called theGauss-Newton method.


Cone optimization SoftwareSome of the most commonly used are:

IPMs:CSDP (parallel)SDPA (parallel)SeDuMi (Matlab; SOCP)SDPT3 (Matlab; SOCP)DSDP

Non-IPMs:SBMethodSDPLRPENSDP (commercial)

All solve SDP problems, only some handle SOCs explicitly.Miguel F. Anjos (Polytechnique Montréal) Course on SDP Coimbra, 9 May 2011 107 / 205

Warmstarting Interior-Point Methods


Warmstarting

Warmstarting: the use of information from a solved problemto accelerate the optimization of a similar problem

1 Same structure and size, similar dataI changes of right-hand side in Benders decompositionI changes of objective function in Dantzig-Wolfe decomposition

2 Modified structure, more or less (of) the same dataI series of LP/SDP relaxations for combinatorial optimizationI removal of variables/constraints in branch-and-bound schemesI addition of variables/constraints in column-generation approaches


Inherent Problem of Interior-Point Warmstarts

Consider re-optimizing after a perturbation of the constraints of an LPproblem:

The same issue arises in SOCP and SDP.


(Primal-Dual) Interior-Point Methods for LP

(LP) min cT xs.t. Ax = b

x ≥ 0

(LD) max bT y

s.t. AT y + s = cs ≥ 0

First-Order (Karush-Kuhn-Tucker) Optimality Conditions (perturbed)

Ax = b (x ≥ 0) AT y + s = c (s ≥ 0) xisi = µ (for all i)

Basic Idea of Interior-Point Methods1 START from an initial point that is well centered and well balanced

x0i s0

i ≈ µ where µ� 0 for all i

2 APPLY a series of Newton iterations while decreasing µ to zero3 STOP with sufficiently small feasibility residuals and duality gap µ.


Warmstarts After Problem Perturbations for LP

min (c◦ + ∆c)T xs.t. (A◦ + ∆A)x = b◦ + ∆b

x ≥ 0

max (b◦ + ∆b)T y

s.t. (A◦ + ∆A)T y + s = c◦ + ∆cs ≥ 0

Let (x◦ ≥ 0, y◦, s◦ ≥ 0) be optimal for the initial problem

A◦x◦ = b◦ A◦T y◦ + s◦ = c◦ X ◦S◦e = 0

then (x◦, y◦, s◦) is typically infeasible for the perturbed problem

r◦b = b − Ax◦ = (b◦ + ∆b)− (A◦ + ∆A)x◦ = ∆b −∆Ax◦

r◦c = c − AT y◦ − s = (c◦ + ∆c)− (A◦ + ∆A)T y◦ − s◦ = ∆c −∆AT y◦

Good news: Feasibility can be handled by an infeasible IPM (IIPM)Bad news: Initial interiority must be somehow achieved...


Warmstart Strategies for LP

Feasible Interior-Point ApproachesStorage of previous iterates (Mitchell & Todd 1992, Mitchell & Borchers1996, Gondzio 1998, Gondzio & Vial 1999)Adjustments of initial points (Mitchell 2000, Yildirim & Wright 2002,Gondzio & Grothey 2003, John & Yildirim 2008)Unblocking technique based on sensitivity (Gondzio & Grothey 2008)

Infeasible Non-Interior (“Exterior-Point”) ApproachesShifted-barrier methods (Freund 1991, 1996, Polyak 1992, Mitchell 1994)Exact primal-dual penalty method (Benson & Shanno 2007)Exact primal-dual slack approach (Engau, A. & Vannelli 2009)


Infeasible Non-Interior (“Exterior-Point”) ApproachesShifted-barrier problems (Freund 1991, 1996, Polyak 1992)

min cT x − µ∑

log(xi + µhi) s.t. Ax = b, x + µh > 0

Exact primal-dual penalty method (Benson & Shanno 2007)

min cT x + dT ξ

s.t. Ax = b0 ≤ x + ξ ≤ uξ ≥ 0

max bT y − uTψ

s.t. AT y + s = c− ψ ≤ s ≤ dψ ≥ 0

Exact primal-dual slack approach (Engau, A. & Vannelli 2009)

min cT xs.t. Ax = b

x − ξ = 0ξ ≥ 0

max bT y

s.t. AT y + s = cs − ψ = 0ψ ≥ 0


Theoretical Complexity of Slack Approach

Iteration complexity and worst-case iteration count

Comparable computational cost per iteration as standard form LPDetailed analysis yields same worst-case iteration bound as IIPM

Initialization schemes for theoretical analysis

Standard form: (x0, y0, s0) = (ζe,0, ζe) where ζ ≥ ‖(x∗, s∗)‖∞Slacked form: (x0, y0, s0, ξ0, ψ0) = (x◦, y◦, s◦, ζe, ζe)

Iteration complexity of slacked-form IIPM (Engau, A., Vannelli 2009)

O(n) log

(max

{∥∥r0b

∥∥εb

,

∥∥r0c∥∥

εc,

∥∥r0x∥∥

εx,

∥∥r0s∥∥

εs,ζ2nεd

})

Easy to show that max{∥∥r0

x∥∥ , ∥∥r0

s∥∥} ≤ ζ2n for ζ ≥ ‖(x◦, s◦,1)‖∞


Preliminary Warmstart Tests for Data Perturbations

(similar to Benson & Shanno 2007 and Gondzio & Grothey 2008)

Problem test set: perturbable and feasible Netlib LP test problems

“LP” solver: SDPT3 (freely available, easy supply of initial points)with supplemental “free-variable” code by A. & Burer 2008

Perturbation: 10% or 20 (random) entries of A, b, c on average

∆ci =

{εδ if c◦i = 0εδc◦i otherwise

where ε ∈ [−1,1], δ ∈ {0.1,0.01,0.001}

(same scheme for b and A but preserving sparsity structure of A)

Performance measures: perturbation levels, number of iterations

WCR = Warm-to-Coldstart-Ratio =number of warmstart iterationsnumber of coldstart iterations


Netlib Results: WCR Statistics with Slack Approach

(Feasible) Perturbations WCR Statistics∆ δ Pert (#) avg stdv min med maxb 0.001 15.3 46 0.23 0.14 0.06 0.17 0.63b 0.01 14.9 42 0.28 0.14 0.11 0.25 0.65b 0.1 15.0 40 0.39 0.18 0.15 0.36 0.82c 0.001 17.9 62 0.24 0.13 0.08 0.20 0.64c 0.01 18.5 59 0.33 0.17 0.11 0.28 0.90c 0.1 18.0 54 0.44 0.22 0.13 0.40 1.13A 0.001 19.2 60 0.32 0.20 0.13 0.25 0.87A 0.01 19.9 58 0.41 0.25 0.14 0.37 1.74A 0.1 19.9 58 0.58 0.27 0.21 0.56 1.75Abc 0.001 52.8 46 0.33 0.15 0.12 0.29 0.64Abc 0.01 51.6 42 0.46 0.17 0.13 0.45 0.83Abc 0.1 51.5 36 0.74 0.30 0.38 0.68 1.66


Netlib Results: WCR Profiles with Slack Approach


Netlib Results: Comparison of WCR Averages

The relative WCR provides a meaningful comparison.Our performance is competitive among these strategies.


Standard Cutting-Plane Method for SDP Relaxations

(SDP) min C • X s.t. A(X ) = b, P(X ) ≥ q, X � 0

A(X ) = (A1 •X , . . . ,Am •X )T = b ∈ Rm are m equality constraintsP(X ) = (P1 • X , . . . ,Pk • X )T ≥ q ∈ Rk are k � m (valid) inequal.

1 Drop P(X ) ≥ q and find optimal (X , y ,S) from the initial relaxation2 Let I = I(X ) = {i : vi = qi − Pi • X > 0} and add “cuts” PI(X ) ≥ qI

(SDPI) min C • Xs.t. A(X ) = bPI(X ) ≥ qI

X � 0

(SDDI) max bT y + qTI z

s.t. AT (y) + PTI (z) + S = C

z ≥ 0S � 0

3 Find a new solution (X , y , z,S) and repeat Step 2 until I(X ) = ∅.


Motivation For An Integrated Approach

Interior-Point Algorithms

+ take larger steps especially during its first few iterations– take smaller steps as we move closer to the optimal solution

⇒ Add new cuts as quickly as possible (before reaching optimality)!(also used in methods of Helmberg & Rendl 1998, Helmberg 2000, 2004)

– spent the majority of work in computing new Newton directions

⇒ Remove unnecessary cuts as quickly as possible (reduce system)!

Cutting-Plane Methods

+ do not require optimal solutions, especially during early iterations– must be restarted if iterates get too close to the interior’s boundary

⇒ Slack active or violated cuts, continue optimization from warmstart!


Background: Active Set Detection Using LP IndicatorsLet Ix = {i : x∗i > 0} and Is = {i : s∗i > 0} with (x∗, y∗, s∗) optimal to

(LP) min cT x s.t. Ax = b, x ≥ 0

(LD) max bT y s.t. AT y + s = c, s ≥ 0

Variable Indicator (optimization folklore)

limk→∞ x (k)i

{> 0= 0

and limk→∞ s(k)i

{= 0 for i ∈ Ix> 0 for i ∈ Is

Primal-Dual Indicator (Ye 1990, Gay 1991)

limk→∞x (k)

i

s(k)i

=

{∞0

and limk→∞s(k)

i

x (k)i

=

{0 for i ∈ Ix∞ for i ∈ Is

Tapia Indicator (Tapia 1980, El-Bakry, Tapia & Zhang 1994)

limk→∞x (k+1)

i

x (k)i

=

{10

and limk→∞s(k+1)

i

s(k)i

=

{0 for i ∈ Ix1 for i ∈ Is


SeparationLet vk

i = qi − Pi • X k be the cut violation at an iterate (X k , yk , zk ,Sk )

(SDPI) min C • Xs.t. A(X ) = bPI(X ) ≥ qI

X � 0

(SDDI) max bT y + qTI zI

s.t. AT (y) + PTI (zI) + S = C

zI ≥ 0S � 0

Addition of cuts uses variable or Tapia indicator

I+ ={

i /∈ I : vki ≥ min

{ν+, ν+

v vk−1i

}}(no primal-dual indicator because we have no dual variable zi yet)

Removal of cuts uses variable, Tapia, and primal-dual indicator

I− ={

i ∈ I : vki ≤ min

{ν−, ν−v vk−1

i , ν−z zki

}}(no removal unless all indicators predict elimination of cut violation)


Integrated AlgorithmStart from the initial LP/SDP relaxation without any cuts P(X ) ≥ q

min C • Xs.t. AX = b, X � 0PIX − r = qI , r − ξ = 0, ξ ≥ 0

max bT y + qTI z

s.t. S = C −AT y − PTI z � 0

z − ψ = 0, ψ ≥ 0

1 Init (X 0, y0,S0), µ0 = (X 0 • S0)/N, v0 = q − P(X 0), I0 = ∅, k = 0.

2 Stop if max{µk ,∥∥b −A(X k )

∥∥ , ∥∥C − Sk −AT (yk )∥∥} < ε, vk ≤ 0.

3 Add and remove cuts Ik+1 = Ik ∪ Ik+ \ Ik

− (using indicators).

4 Slack new variables r ki and zk

i = 0 at√µ for i ∈ Ik

+ for warmstart.

5 Compute new iterate (X k+1, yk+1, zk+1, r k+1,Sk+1, ξk+1, ψk+1).

6 Set k = k + 1, µk = (X k • Sk )/n, vk = q − P(X k ); go to Step 2.


Implementation, Testing, and ValidationSDP Solver: SDPT3 Version 3.02 (Toh, Todd & Tütüncü 1999/2003)

options.maxit = 2: stops for cut separation every other iterationadditional routines for handling of free variables (A. & Burer 2008)routines for primal-dual slack warmstarting (Engau, A. & Vannelli 2009)

Test Problems: Instances of maxcut and single-row facility-layoutimproved by adding triangle inequalities as cuts

Rndw: randomly generated graphs of varying sizes, densities & edgeweightsS2/3ng/pm: 2D/3D spin glasses with Gaussian and ±1 interactionsSRFLP: single-row facility layout instances

Result Comparison: Each problem is solved in three different ways

NOCUT solves only the initial SDP relaxation without any additional cutsHIPCUT applies our algorithm adding up to 200 cuts every other iterationCOPYCUT solves only the final relaxation with HIPCUT cuts at optimality


Maxcut Example: Groundstates of Ising Spin Glasses

problem in statistical physics

spins either point up or downand interact with each other

spins that interact via a positivebond want to point in the samedirection, spins with a negativebond in opposite directions

Frustration: no configurationexists that satisfies all bonds

Groundstate: a configurationthat minimizes frustration, orassociated spin glass energy

An exact groundstate on an 8x8grid. Each spin either points up(dark gray) or down (light gray).


Results For Randomly Generated Complete Graphs

NOCUT HIPCUT COPYCUTProblem time (iter) ∆ cuts time (iter) time (iter) γR50d50w0-10 0.88 (11) 2.64 1236 26.56 (23) 37.95 (20) 0.70R50d90w0-10 0.90 (11) 1.87 957 21.97 (25) 21.78 (22) 1.01R50d50w1-1 0.87 (12) 2.16 1151 38.40 (28) 37.90 (21) 1.01R50d90w1-1 0.73 (12) 0.79 1850 88.06 (35) 150.32 (27) 0.59R100d50w0-10 1.99 (12) 1.92 1668 98.09 (31) 95.22 (26) 1.03R100d90w0-10 1.46 (12) 1.14 1610 93.83 (33) 85.39 (26) 1.10R100d50w1-1 1.58 (12) 1.53 1555 85.35 (31) 68.39 (23) 1.25R100d90w1-1 1.63 (12) 0.60 1835 111.62 (35) 116.93 (25) 0.95R150d50w0-10 4.05 (12) 2.75 2810 371.86 (46) 399.91 (28) 0.93R150d90w0-10 3.70 (12) 1.57 2755 348.03 (47) 368.60 (28) 0.94R150d50w1-1 3.91 (12) 2.12 2780 367.74 (46) 389.41 (28) 0.94R150d90w1-1 3.89 (13) 0.92 3261 537.12 (65) 561.97 (28) 0.96R200d50w0-10 4.90 (12) 1.07 2790 383.75 (40) 402.53 (28) 0.95R200d90w0-10 4.47 (12) 0.64 2808 376.33 (41) 403.69 (28) 0.93R200d50w1-1 4.50 (12) 0.85 2780 376.75 (41) 382.54 (27) 0.98R200d90w1-1 4.34 (12) 0.31 2823 353.40 (40) 373.58 (26) 0.95


Results For Randomly Generated Spin Glasses

NOCUT HIPCUT COPYCUTProblem time (iter) ∆ cuts time (iter) time (iter) γS2g64 0.94 (11) 5.05 3335 339.84 (40) 392.90 (18) 0.86S2g144 3.25 (12) 4.20 3108 513.20 (47) 635.70 (28) 0.81S2g196 6.97 (13) 4.62 2752 512.23 (49) 607.56 (34) 0.84S2pm64 0.92 (12) 11.28 3053 274.26 (39) 497.25 (29) 0.55S2pm144 3.45 (13) 11.25 2548 375.58 (51) 480.21 (39) 0.78S2pm196 6.75 (14) 11.99 3245 1188.98 (73) 1144.35 (39) 1.04S3g64 0.92 (11) 8.26 2299 108.28 (30) 138.87 (17) 0.78S3g125 2.42 (12) 38.06 4133 939.03 (60) 1143.50 (28) 0.82S3g216 4.86 (13) 8.92 5322 2166.37 (62) 2284.27 (27) 0.95S3pm64 0.97 (12) 13.34 2872 210.65 (34) 393.85 (28) 0.53S3pm125 2.50 (13) 38.10 4149 1038.10 (63) 1008.47 (26) 1.03S3pm216 4.94 (13) 14.05 5066 1715.17 (59) 1977.86 (27) 0.87


Results For SRFLP BenchmarksNOCUT HIPCUT COPYCUT

Problem bound time (iter) bound ∆ cuts time (iter) time (iter) γBeHa82_4 77.9 0.5 (9) 78.0 0.1 60 0.1 (8) 0.1 (9) 0.87LoWo76_5 150.3 0.1 (10) 151.0 0.4 317 1.6 (12) 3.6 (11) 0.45HeKu91_5 1100.0 0.1 (9) 1100.0 0.0 332 1.9 (13) 3.4 (10) 0.57HeKu91_6 1989.7 0.1 (11) 1990.0 0.0 517 4.2 (15) 8.7 (10) 0.49HeKu91_7 4648.2 0.2 (11) 4730.0 1.7 801 3.3 (16) 5.1 (11) 0.65HeKu91_8 6145.7 0.3 (11) 6295.0 2.4 1356 15.4 (23) 20.6 (13) 0.75Si69_8 794.8 0.3 (12) 801.0 0.7 1231 10.2 (19) 16.5 (13) 0.62Si69_9 4670.6 0.4 (11) 4695.5 0.5 2068 51.0 (31) 74.0 (17) 0.69Si69_10 2771.9 0.6 (14) 2781.5 0.3 1702 27.0 (25) 31.1 (11) 0.87Si69_11 6829.9 0.9 (14) 6933.5 1.5 2207 59.6 (30) 70.4 (14) 0.85HeKu91_12 22483.2 1.2 (12) 23365.0 3.9 3175 179.5 (46) 244.8 (20) 0.73HeKu91_15 43706.5 5.1 (15) 44600.0 2.0 5503 957.8 (56) 1269.7 (24) 0.75Am06_15 6155.3 4.7 (14) 6305.0 2.4 5940 1583.5 (83) 1967.5 (30) 0.80Am08_17 9094.9 11.5 (15) 9254.0 1.7 6225 1634.9 (87) 1919.8 (26) 0.85Am08_18 10467.9 14.7 (14) 10650.5 1.7 5606 734.4 (85) 776.3 (27) 0.95HeKu91_20 116639.4 28.6 (15) 119710.0 2.6 8094 2478.6 (112) 3050.0 (38) 0.81AnVa08_25 4463.5 54.9 (15) 4618.0 3.4 8199 3708.5 (96) 4313.8 (43) 0.86

Old news: The initial SDP bound for SRFLP is already very tight.Good news: Our algorithm is very robust for large number ofiterations and when adding inequalities at iterates close tooptimality.Best news: Performance is very promising!


Warmstarting: Summary and Current Research

Accomplishments so far:

analytical and computational support of new warmstarting schemevalidation for data perturbations and new variables/constraintsiteration reduction by 30 – 50% compared to coldstart

Current research work:

Theoretical support of algorithm’s convergence and complexityEfficient implementation within a branch-and-bound framework.


The New Frontier: Polynomial Optimization


Polynomial OptimizationPolynomial optimization problems (POPs) consist of optimizing amultivariate polynomial objective subject to multivariate polynomialconstraints:

Polynomial Optimization Problem (POP)

z = sup f (x)s.t. gi(x) ≥ 0 i = 1, . . . ,m.

Numerous classes of problems can be modelled as POPs, including:Linear ProblemsMixed-Binary Problems

xi ∈ {0,1} ⇔ xi(1− xi) = 0

Quadratic Problems (Convex / Non-convex)Thus, solving POPs is in general NP-hard.


Relaxations of POPsPolynomial Optimization Problem (POP)

z = sup f (x)s.t. gi(x) ≥ 0 i = 1, . . . ,m.

Many tractable relaxations of POPs have been proposed usinglinear, second-order cone, and semidefinite techniques.In particular, sum-of-squares (SOS) decompositions which lead tosemidefinite programming (SDP) relaxations are

I theoretically very strong:F Sequences of relaxations converging to the optimal value in the limitF Exact (exponential-sized) relaxations for pure binary POPs

I but prohibitively expensive for practical computation.

Research objective (Ghaddar, Vera, A.):Obtain improved cone optimization relaxations (SDP and/or SOCP)that are computationally efficient.


General POP PerspectiveGiven a general POP problem:

(POP) z = sup f (x)s.t. gi(x) ≥ 0 i = 1, . . . ,m.

If λ is the optimal value of POP, then POP is equivalent to

inf λs.t. λ− f (x) ≥ 0 ∀x ∈ S := {x : gi(x) ≥ 0, i = 1, . . . ,m}

which we rewrite as

inf λs.t. λ− f (x) ∈ Pd (S)

wherePd (S) = {p(x) ∈ Rd [x ] : p(s) ≥ 0 for all s ∈ S}

is the cone of polynomials of degree at most d that are non-negativeover S.


Understanding Pd(S)

The set

Pd (S) = {p(x) ∈ Rd [x ] : p(x) ≥ 0 for all x ∈ S}

is in general a very complex object.

It is always a convex coneIn most cases the decision problem for Pd (S) is NP-hard:

Decision problem for Pd (S)

Given p(x), decide if p(x) ∈ Pd (S)(i.e. if p(x) ≥ 0 for all x ∈ S)

Idea: use algebraic geometry results to approximate (orrepresent) Pd (S) in tractable ways, i.e., using only linear,second-order, and semidefinite cones.


A General Recipe for Relaxations of POP

We relax λ− f (x) ∈ Pd (S) to

λ− f (x) ∈ K for a suitable K ⊆ Pd (S).

Then

inf λs.t. λ− f (x) ∈ K

provides an upper bound for the original problem.

• The choice of K is a key factor in obtaining good bounds on theproblem.

•We are restricted by the need for the optimization over K to betractable.


SOS Approach - Lasserre (2001), Parrilo (2000)For each r > 0, define the approximation Kr ⊆ Pd (S) as

Kr :=

(Ψr +

m∑i=1

gi(x)Ψr−deg(gi )

)∩ Rd [x ]

where Ψd denotes the cone of real polynomials of degree at most dthat are SOSs of polynomials, and Rd [x ] denotes the set ofpolynomials in the variables x of degree at most d .

The corresponding relaxation can be written as

(Lr ) zr = infλ,σi

λ

s.t. λ− f (x) = σ0(x) +∑m

i=1 σi(x)gi(x)σ0(x) is SOS of degree ≤ rσi(x) is SOS of degree ≤ r − deg(gi(x)), i = 1, . . . ,m.


Convergence of the SOS Approach

Under mild conditions zr → z:

LemmaSuppose that

KdG ⊆ K

d+1G ⊆ · · · ⊆ Kr

G ⊆ Pd (S),

where G is a compact semialgebraic set (not necessarily convex) andthere exists a real-valued polynomial u(x) with u(x) ∈

∑mi=0 gi(x)Ψ

such that {u(x) ≥ 0} is compact. Then

KrG ↑ Pd (S) as r →∞,

and thereforezr

G ↑ z as r →∞.


Solving the SOS Relaxation

For each r , the relaxation (Lr ) can be cast as an SDP problem, sinceσ(x) is a SOS of degree 2k if and only if

σ(x) =

1...xi...

xixj...∏|k |

x

T

M

1...xi...

xixj...∏|k |

x

with M � 0.

Note that Ψd = Ψd−1 for every odd degree d .


Example: Application to Max-cut

The formulation

max xT Qxs.t. x ∈ {−1,1}n

can be recast as

min λ

s.t. λ− xT Qx ∈ P2 ({−1,1}n)

The r = 2 SOS relaxation is precisely the dual SDP problem of thebasic SDP relaxation.


Size of the SOS Relaxation

Good news: Under mild conditions, zr → z.Bad news: For a problem with n variables and m inequality

constraints, the size of the relaxation is:One psd matrix of dimension

(n+rr

);

m psd matrices, each of dimension(n+r−deg(gi )

r−deg(gi )

)(n+rr

)linear constraints.

To overcome the size blow-up one may exploit the structure (sparsity,symmetry, convexity) to get smaller SDP programs.Many authors have contributed here: Gatermann, Helton, Kim, Kojima,Lasserre, Netzer, Nie, Parrilo, Pasechnik, Riener, Schweighofer,Sotirov, Theobald, etc.

Our objectiveAvoid the blow-up by keeping r constant (and small).


A Small Numerical Exampleinfx ,y (x − 1)2 + (y − 1)2

s.t. x2 − 4xy − 1 ≥ 0yx − 3 ≥ 0y2 − 4 ≥ 0

122 − (x − 2)2 − 4(y − 1)2 ≥ 0

L2 relaxation (Optimal value: 9.4083)supλ,σi (·)

λ

s.t. λ− (x − 1)2 − (y − 1)2 = σ0(x , y) +∑4

i=1 σi(x , y)gi(x , y)(6× 14 lin. system )

σ0(x , y) is SOS of degree 2 (3× 3 matrix)σi(x , y) is SOS of degree 0 (4 non-negative constants)

0 5 10 15

0

1

2

3

4

5

6

7

Figure: Structure of the linear system for L2


Example (ctd)L46 relaxation (Optimal value: 36.0654 51.7386)

supλ,σi (·) λ

s.t. λ− (x − 1)2 − (y − 1)2 = σ0(x , y) +∑4

i=1 σi(x , y)gi(x , y)(1528× 73245 lin. system )

σ0(x , y) is SOS of degree 46 (610× 610 matrix)σi(x , y) is SOS of degree 24 (36× 36 SDP matrices)

0 10 20 30 40 50 60 70

0

2

4

6

8

10

12

14

16

0 50 100 150 200

0

5

10

15

20

25

Figure: Structure of the linear system for L46


Lasserre’s Hierarchy for our Example

To solve

infx ,y (x − 1)2 + (y − 1)2

s.t. x2 − 4xy − 1 ≥ 0yx − 3 ≥ 0y2 − 4 ≥ 0122 − (x − 2)2 − 4(y − 1)2 ≥ 0

r 2 4 6# vars 14 73 245# constraints 6 15 28Bound 9.40 36.06 51.73

There is no need to run relaxations for r > 6, because an optimalsolution (and optimality certificate) can be extracted from solution to L6.


Improving the approximation without growing rRecall

(POP) z = sup f (x)s.t. x ∈ S := {x : gi(x) ≥ 0, i = 1, . . . ,m}

(Lr (G)) zr (G) = infλ,σi

λ

s.t. λ− f (x) = σ0(x) +∑m

i=1 σi(x)gi(x)σ0(x) is SOS of degree ≤ rσi(x) is SOS of degree ≤ r − deg(gi(x)),

i = 1, . . . ,m.

Observe that(Lr ) is defined in terms of the functions used to describe SCall this set G = {gi(x) : i = 1, . . . ,m}

GoalImprove our description of S by growing G in such a way that thebound obtained from Lr improves, for fixed r .


Back to our Example

We start withG = {x2 − 4xy − 1, yx − 3, y2 − 4,122 − (x − 2)2 − 4(y − 1)2}

For all (x , y) ∈ S = {x : gi(x) ≥ 0, i = 1, . . . ,m},

p1(x , y) = 0.079x2+0.072xy+0.325x−0.850y2−0.339y−0.213 ≥ 0

We say that p1(x , y) is a valid (polynomial) inequality for S.

Let G1 = G ∪ {p1(x , y)}Then

z2(G1) = 22.8393 > 9.4083 = z2(G)


Why Stop at p1?Add valid inequalities iteratively

Start with G0 = G.Given Gi , generate pi valid (inequality) for S. Let Gi+1 = Gi ∪ {pi}.

p1(x , y) = 0.079x2 + 0.072xy + 0.325x − 0.850y2 − 0.339y − 0.213

p2(x , y) = 0.053x2 + 0.082xy + 0.205x − 0.764y2 − 0.533y − 0.282

p3(x , y) = 0.069x2 + 0.002xy − 0.239x − 0.770y2 + 0.551y − 0.200

p4(x , y) = −0.019x2 + 0.338xy + 0.097x − 0.691y2 − 0.577y − 0.254

p5(x , y) = 0.070x2 + 0.071xy − 0.158x − 0.858y2 − 0.425y − 0.214

p6(x , y) = 0.052x2 + 0.047xy + 0.012x − 0.935y2 − 0.130y − 0.321

p7(x , y) = 0.046x2 + 0.006xy − 0.182x − 0.707y2 + 0.652y − 0.195

p8(x , y) = 0.023x2 + 0.093xy + 0.116x − 0.566y2 − 0.621y − 0.519

p9(x , y) = 0.049x2 + 0.002xy + 0.117x − 0.647y2 − 0.592y − 0.463

i 0 1

2 3 4

9.4083 22.8393

30.1062 32.2653 40.1754

i 5 6 7 8 943.1587 49.3414 51.5485 51.7135 51.7382


Generating Valid Inequalities for POPsRecall

z = inf λs.t. λ− f (x) ∈ Pd (S)

zr (G) = inf λs.t. λ− f (x) ∈ Kr (G)

LemmaLet G be a description for S, and let p(x) valid inequality for S. Then

zr (G ∪ {p(x)}) ≥ zr (G)

How to generate a valid improving inequality?Given G description of S, find p(x) valid for S such that

zr (G ∪ {p(x)})>zr (G)


Valid Inequality Generation for POPs

GoalGiven G describing S, find p(x) ∈ Pd (S) \ Kr (G)

Issues to address:1 Generate p(x) ∈ Pd (S).2 Ensure that p(x) /∈ Kr (G)

Issue 1: Generate p(x) ∈ Pd (S)

There is no tractable representation for Pd (S)

Sol: Generate p(x) ∈ Kr+2(G) ∩ Rr [x ] ⊂ Pd (S).


Valid Inequality Generation for POPs

GoalGiven G describing S, find p(x) ∈ Pd (S) \ Kr (G)

Issues to address:1 Generate p(x) ∈ Pd (S).2 Ensure that p(x) /∈ Kr (G)

Issue 2: Ensure p(x) /∈ Kr (G)

Let Y be dual solution of Lr (G)

Then Y ∈ Kr (G)∗

and therefore p ∈ Kr (G)⇒ 〈p,Y 〉 ≥ 0.

⇒ Look for p(x) such that 〈p,Y 〉 < 0.


Inequality Generating Subproblem

Given G and Ymin 〈p,Y 〉s.t. p(x) ∈ Kr+2(G)

‖p‖ = 1

The normalization is necessary, otherwise the problem is unboundedFor any c > 0, p(x) ≥ 0⇔ cp(x) ≥ 0.


Another Small ExampleOptimal value = 0

min x1 − x1x3 − x1x4 + x2x4 + x5 − x5x7 − x5x8 + x6x8

s.t. x3 + x4 ≤ 1x7 + x8 ≤ 10 ≤ xi ≤ 1 ∀i ∈ {1, · · · ,8}.

r 2 4 6 8

Obj. unb. -0.03550 -0.00192 -Time 1.02 2.81 726.50 > 5 hrs

Table: Lasserre’s Hierarchy

Iter. 0 1 2 3 4 5 10 50

Obj. unb. -0.109 -0.073 -0.069 -0.068 -0.066 -0.057 -0.014Time

Subproblem - 1.5 1.8 2.1 1.9 2.0 2.5Master problem 0.2 0.3 0.3 0.4 0.5 0.5 0.6Cumulative 0.2 2.0 4.2 6.7 9.2 11.8 26.1 200.1

Table: Inequality Generation


Another Small ExampleOptimal value = 0

min x1 − x1x3 − x1x4 + x2x4 + x5 − x5x7 − x5x8 + x6x8

s.t. x3 + x4 ≤ 1x7 + x8 ≤ 10 ≤ xi ≤ 1 ∀i ∈ {1, · · · ,8}.

-0.350

-0.250

-0.150

-0.050

0.050

0 5 10 15 20 25 30 35 40 45 50

Iteration

Lower Bound

Figure: Convergence of Inequality GenerationMiguel F. Anjos (Polytechnique Montréal) Course on SDP Coimbra, 9 May 2011 153 / 205

Special Case: Binary Quadratic POPs

Consider the general binary quadratic POP:

max f (x)s.t. gi(x) ≥ 0 ∀i ∈ I = {1, . . . ,m}

x ∈ {−1,1}n.

where f (x) and gi(x) are polynomials of degree at most 2.

We write the following equivalent formulation:

min λs.t. λ− f (x) ∈ P2(S ∩ {−1,1}n)

where S = {x : gi(x) ≥ 0}.


Valid Inequality Generation for Binary Quadratic POPsWe make use of the following theorem:

Theorem (Peña-Vera-Zuluaga (2006))Let S be a compact set. For any degree d,

p(x) ∈ Pd (x ∈ S : xj ∈ {−1,1})⇔

p(x) = (1 + xj)r+(x) + (1− xj)r−(x) + (1− x2j )c(x),

where r+(x), r−(x) ∈ Pd (S) and c(x) ∈ Rd−1[x ].

We can approximate P2(S ∩ {−1,1}n) by

Qj2(G) = {(1 + xj)r+(x) + (1− xj)r−(x) + (1− x2

j )c(x) :

r+(x), r−(x) ∈ K2(G), c(x) ∈ R1[x ]}

and we have that

K2(G) ⊂ Qj2(G) ⊂ P2(S ∩ {−1,1}n)


Valid Inequality Generation for Binary Quadratic POPs

GoalGiven G describing S, find p(x) ∈ P2(S ∩ {−1,1}n) \ K2(G)

Given G and Ymin 〈p,Y 〉s.t. p(x) ∈ Qj

2(G)‖p‖ = 1

Note that there is exactly one subproblem per binary variable j .Moreover,

the size of Qj2(G) is only twice size of K2(G)

while the size of K4(G) is ∼ n2 times size of K2(G)


Convergence Result

TheoremWhen the polynomial inequality generation scheme is applied to abinary quadratic optimization problem with linear constraints Ax = b,and the initial set is

G0 =

{n − ‖x‖2,

∑i

(ATi x − bi)

2,−∑

i

(ATi x − bi)

2

},

then if all the subproblems have an optimal value 0, then the algorithmhas converged to a global optimal solution.


Computational Results

Quadratic Knapsack Problem

max xT Px

s.t. wT x ≤ cx ∈ {−1,1}n

Lasserre r = 4 Lasserre r = 2 Poly. Ineq. Gen.n Optimal Obj. Time (s) Obj. Time (s) Iter. 0 Iter. 1 Iter. 5 Iter. 10 Time (s)

10 1653 1707.3 28.1 1857.7 0.8 1857.7 1821.9 1797.4 1784.8 5.820 8510 8639.7 17269.1 9060.3 2.9 9060.3 9015.3 8925.9 8850.3 35.430 18229 - - 19035.9 4.3 19035.9 18920.2 18791.7 18727.2 196.640 2679 - - 4735.9 6.8 4735.9 4590.7 4248.2 4126.7 1009.750 16192 - - 21777.9 19.2 21777.9 21390.3 20162.1 19407.1 7014.360 58451 - - 62324.4 126.6 62324.4 62019.1 60906.0 60585.5 17961.170 16982 - - 23884.9 231.4 23884.9 23484.0 22852.8 - 15582.280 - - - 80482.7 365.4 80482.7 79738.9 - - 11072.3

(5-hour time limit)



Max-2-Sat Problem

minm∑

j=1

∏i∈Pj

(1 + xi)

2×∏

k∈Nj

(1− xk )

2

s.t. x ∈ {−1,1}n

Lasserre r = 4 Lasserre r = 2 Poly. Ineq. Gen.n m Optimal Obj. Time (s) Obj. Time (s) Iter. 0 Iter. 1 Iter. 5 Iter. 10 Time (s)

10 45 4 4.00 15.6 3.54 0.4 3.54 3.80 4.00 6.015 65 3 3.00 1103.8 2.75 0.9 2.75 2.77 2.90 2.97 16.020 85 8 8.00 11567.2 6.93 1.7 6.93 7.04 7.29 7.42 28.530 130 6 - - 5.47 6.5 5.47 5.54 5.69 5.80 175.640 170 13 - - 12.00 14.7 12.00 12.16 12.41 12.63 793.550 215 15 - - 13.86 41.3 13.86 14.12 14.67 14.96 4025.860 252 - - - 11.37 106.0 11.37 11.56 11.94 12.19 14548.670 295 - - - 17.08 258.3 17.08 17.35 17.55 - 16722.380 340 - - - 16.77 579.1 16.77 16.88 - - 10062.8

(5-hour time limit)



Degree Three Binary POPs

max∑|α|≤3

cαxα

s.t. aT x ≤ bx ∈ {−1,1}n

Lasserre r = 6 Lasserre r = 4 Poly. Ineq. Gen.n Optimal Obj. Time (s) Obj. Time (s) Iter. 0 Iter. 1 Iter. 5 Iter. 10 Time (s)

5 58 58.00 9.6 59.37 2.1 67.16 58.45 58.00 5.210 139 139.00 4866.0 148.97 35.9 154.59 148.85 143.41 139.12 75.315 1371 - - 1524.71 1436.2 1582.04 1575.49 1519.88 1494.01 1319.920 1654 - - 1707.95 18106.6 1718.53 1716.00 1708.66 1705.15 15763.925 - - - - - 3967.12 3960.78 - - 14287.3

(5-hour time limit)


What About Using SOCs?

For binary quadratic POPs, it is possible to obtain purely second-ordercone (SOC) relaxations.For this part, we separate out the verious types of constraints asfollows:

max xT Qx + pT xs.t. aT

j x = bj ∀j ∈ {1, . . . , t}cT

j x ≤ dj ∀j ∈ {1, . . . ,u}xT Fjx + eT

j x = kj ∀j ∈ {1, . . . , v}xT Gjx + hT

j x ≤ lj ∀j ∈ {1, . . . ,w}xi ∈ {−1,1} ∀i ∈ {1, · · · ,n}


Building blocks for conic representations - IConsider the following example:

S = {(x , y) : x2 + y2 ≤ 1}p(x , y) = 8− 2x − x2 − 12xy + 2y2

Observe thatp(3,0) < 0, butp(x , y) ≥ 0 for all (x , y) ∈ S, since

p(x , y) = (1− 2x + 2y)2 + (1 + x − 2y)2 + 6(1− x2 − y2)

The well-known general result is called S-lemma:

Theorem (S-lemma)

Let B := {x : ‖x‖2 ≤ 1}. Then

p(x) ∈ P2(B) iff p(x) = Ψ2(x) + c(1− ‖x‖2)

where Ψ2(x) is a degree-2 SOS and c ≥ 0.


Applying the S-lemmaHow to find Ψ2(x) and c?Every degree-2 SOS has the form

Ψ2(x) =[

1 xT]M[

1x

]with M � 0.

Proving non-negativity of p(x , y) reduces to an SDP feasibility problem: 8 −1 0−1 −1 −6

0 −6 2

= M + c

1 0 00 −1 00 0 −1

, with M � 0, c ≥ 0.

CorollaryOptimizing a polynomial of degree ≤ 2 over the ball can be cast as anSDP problem.


Building blocks for conic representations - IIS = {(x , y) : x2 + y2 ≤ 1}p(x , y) = 5− 4x + 3y

Observe that p(x , y) ≥ 0 for all (x , y) ∈ F since

p(x , y) = 5 + (−4,3)

[xy

]≥ 5− ‖(−4,3)‖ = 0

LemmaLet B = {x : ‖x‖2 ≤ 1} then

p(x) ∈ P1(B) iff p(x) = aT[

1x

],aß{(uo,u) : ‖u‖ ≤ uo}.

CorollaryOptimizing a degree 1 polynomial over the ball can be cast as anSOCP problem.


What about P3(B)?Theorem (Nesterov (2003))The decision problem over P3(B) is NP-hard.

In summary:

Let B = {x : ‖x‖2 ≤ 1}. The decision problem p(x) ∈ Pd (B)

is easy for d = 1,2;is NP-hard for d ≥ 3.

ActuallyDecision problem for Pd (S) is NP-hard for most (interesting) choices ofS and d ≥ 1. These include:

S polytope and d ≥ 2S = H := {0,1}n and d ≥ 2S = H ∩ polytope and d ≥ 1 (binary programming)


Back to BQPP

We can rewrite BQPP as:

min λ

s.t. λ− (xT Qx + pT x) ∈ P2(H ∩ F )

whereH := {−1,1}n

F = {x : aTj x = bj , cT

j x ≥ dj , xT Fjx + eTj x = kj , xT Gjx + hT

j x ≥ lj},and


A Hierarchy of Relaxations by Zuluaga-Vera-Peña

One hierarchy of approximations to P2(H ∩ F )

min λ

s.t. λ− (xT Qx + pT x) ∈ Kr

is obtained using the cones

Kr :=(

Ψr+2 +∑

i(1− x2i )Rr [x ] +

∑j(bj − aT

j x)Rr+1[x ]

+∑

j(dj − cTj x)Ψr+1 +

∑j(kj − xT Fjx − eT

j x)Rr [x ]

+∑

j(lj − xT Gjx − hTj x)Ψr

)∩ R2[x ]

for integer r ≥ 0.


Convergence of Zuluaga-Vera-Peña Hierarchy

TheoremThe sequence of cones Kr satisfies

Kr ⊆ Kr+1 ⊆ · · · ⊆ P2(H∩F ) and int(P2(H∩F )) ⊆∞⋃

r=0

Kr ⊆ P2(H∩F ).

Hence λ∗BQPPKr↑ z∗BQPP.


However...

The size of the relaxations grows exponentially in r .

For this reason, we start with the smallest relaxation (r = 0):

K0 =Ψ2(x) +∑

i

(1− x2i )R0 +

∑j

(bj − aTj x)R1[x ] +

∑j

(dj − cTj x)R+

0 [x ]

+∑

j

(kj − xT Fjx − eTj x)R0 +

∑j

(lj − xT Gjx − hTj x)R+

0

anduse SOCs to improve its approximation of P2(H ∩ F )

while controlling the computational effort required to solve theresulting relaxations.


Useful Lemma

x ∈ {−1,1}n ⇒ ‖x‖2 = n.

This leads to the following specialized lemma that we will use:

LemmaIf f (x) is a polynomial of degree one and B′ := {x : ‖x‖2 = n}, then

f (x) ∈ P1(B′) if and only if f (x) = f T(√

nx

)with f ∈ SOCn+1.


First Improved Relaxation

Using the previous lemma, we obtain (BQPPSS):

min λ

s.t. λ− (xT Qx + pT x) =(1 xT

)S(

1x

)+∑

i

(1 + xi)αTi

(√n

x

)+∑

i

(1− xi)βTi

(√n

x

)+∑

i

γi(1− x2i )

+∑

j

δj(x)(bj − aTj x) +

∑j

(dj − cTj x)ηT

j

(√n

x

)+∑

j

θj(kj − xT Fjx − eTj x) +

∑j

ξj(lj − xT Gjx − hTj x),

S ∈ Sn+1+ , αi , βi , ηj ∈ Ln+1, δj ∈ R1[x ], γi , θj ∈ R, ξj ∈ R+


Second Improved RelaxationAdding products of linear constraints strengthens further: (BQPPSS+)

min λ s.t. λ− (xT Qx + pT x) =(1 xT

)S(

1x

)+∑

i

(1 + xi )αTi

(√n

x

)+∑

i

(1− xi )βTi

(√n

x

)+∑

i

γi (1− x2i )

+∑

j

δj (x)(bj − aTj x) +

∑j

(dj − cTj x)ηT

j

(√n

x

)+∑

j

θj (kj − xT Fjx − eTj x) +

∑j

ξj (lj − xT Gjx − hTj x)

+∑i,k

σik (dk − cTk x)(1 + xi ) +

∑i,k

µik (dk − cTk x)(1− xi )

+∑k≤l

νkl (dk − cTk x)(dl − cT

l x) +∑i≤j

τij (1− xi )(1− xj )

+∑i≤j

ωij (1 + xi )(1 + xj ) +∑i,j

φij (1− xi )(1 + xj )

S ∈ Sn+1+ , αi , βi , ηj ∈ Ln+1, γi , θj ∈ R, ξj , σik , µik , νkl , τij , ωij , φij ∈ R+


Pure SOC RelaxationAlternatively, we can relax (BQPPSS) by removing the SOS term.In the absence of the SOS term, the valid inequalities

−1 ≤ xixj ≤ 1

are no longer satisfied, and can be added to the relaxation: (BQPPSOC)min λ s.t. λ− (xT Qx + pT x) =∑

i

(1 + xi )αTi

(√n

x

)+∑

i

(1− xi )βTi

(√n

x

)+∑

i

γi (1− x2i )

+∑

j

δj (x)(bj − aTj x) +

∑j

(dj − cTj x)ηT

j

(√n

x

)+∑

j

θj (kj − xT Fjx − eTj x) +

∑j

ξj (lj − xT Gjx − hTj x)

+∑i<j

ν+ij (1 + xixj ) +

∑i<j

ν−ij (1 + xixj )

αi , βi , ηj ∈ Ln+1, δj ∈ R1[x ], γi , θj ∈ R, ξj , ν+ij , ν

−ij ∈ R+


Application I: General Quadratic Polynomial ProblemsLasserre introduced SDP relaxations for binary polynomial programsby approximating P2(H ∩ F ) using the cone

Γr :=

(Ψr+2 +

∑i

(1− x2i )Ψr +

∑i

(x2i − 1)Ψr +

∑i

(bi − aTi x)Ψr

+∑

i

(aTi x − bi )Ψr +

∑i

(di − cTi x)Ψr +

∑i

(ki − xT Fix − eTi x)Ψr

+∑

i

(xT Fix + eTi x − ki )Ψr +

∑i

(li − xT Gix − hTi x)Ψr

)∩ R2[x ],

for even r ≥ 0. Taking r = 0, we obtain a relaxation for BQPP:

(BQPPLas) min λ s.t. λ− q(x) ∈ Γ0

Theorem

λ∗BQPPLas≥ λ∗BQPPSS

≥ λ∗BQPPSS+≥ z∗BQPP


Application II: Quadratic Knapsack Problems

The QKP problem:max xT Qx + pT xs.t. wT x ≤ c

x ∈ {−1,1}n


Previous SDP Relaxation for QKP

min λ

s.t. λ− (xT Qx + pT x) =(1 xT

)S(

1x

)+(c − wT x)

(∑i d−i (1− xi) +

∑i d+

i (1 + xi))

+∑

i ci(1− x2i )

where d+i ,d

−i nonnegative, and ci free variables.

Equivalent to dual of Helmberg-Rendl-Weismantel (2000), thestrongest previously known SDP relaxation for QKP.

Theorem

λ∗QKPHRW4-D= λ∗QKPHRW4

≥ λ∗QKPSS+≥ z∗QKP


Computational Setup

All relaxations were implemented in Matlab 7.9.0SeDuMi 1.3 was used to solve the conic programming problemsAll computations were carried out on a 1200 MHz Sun Sparcmachine


Computational Results for General BQPP Problems

For each size n100 randomly generated instancesdensity (number of nonzero coefficients in the objective function)between 20% to 100%.the number of linear and quadratic constraints (m) varies from 1 ton2 .

We implemented Lasserre’s relaxation using our code and the conedefinition above.The gaps (in %) are calculated as follows:

gap = 100× ubrelaxation − ubbest

ubbest,

where the best upper bound is the one obtained by the (BQPPSS+)relaxation.


Computational Results for General BQPP

n m (BQPPSS+ ) (BQPPSS) (BQPPLas) (BQPPSOC)Time Gap Time Gap Time Gap Time

40 1 78.18 1.66 56.30 34.89 34.59 29.97 33.075 122.37 2.33 67.54 36.38 44.66 28.18 37.47

20 306.31 5.71 88.80 50.60 48.27 38.60 44.1150 1 268.93 0.68 179.74 5.12 112.49 15.16 48.72

5 397.34 3.44 193.86 17.71 122.32 39.05 117.7525 1245.49 12.27 258.77 94.54 142.29 43.08 190.33

60 1 970.00 3.15 626.87 19.61 375.24 65.83 94.165 1169.37 3.69 663.09 40.75 397.93 39.75 183.34

30 5637.18 9.42 850.83 58.95 473.50 52.10 650.4670 1 2793.31 0.93 2515.31 29.44 1214.23 31.51 165.63

5 3848.18 2.50 2532.18 53.64 1245.09 26.98 549.2235 15420.53 14.85 2429.09 47.51 1446.99 46.99 1818.69


Summary of Computational Results for General BQPP

(BQPPSOC) is the most computationally efficient relaxation in mostcases.(BQPPSOC) frequently has better gaps than (BQPPPLas)With more linear constraints, (BQPPLas) is slightly more efficientbut its bounds are weaker than those provided by (BQPPSOC).


Computational Results for QKP

We generated test instances using the approach of Pisinger (2007).

Qij and pj values are taken from discrete uniform randomdistributions in [1,100] and [1,50] respectivelycapacity c is uniformly distributed in [50,

∑nj=1 pj ]

density d of Q varies from 10% to 90%size varies from 20 to 100 items.


Computational Results for QKP

n d (QKPSS+ ) (QKPSS) (QKPHRW4) (QKPSOC)Time Gap Time Gap Time Gap Time

60 10 705.30 0.21 673.37 0.59 394.70 3.30 138.2560 30 552.28 0.35 644.20 0.48 312.36 0.37 79.4660 50 726.82 0.13 682.12 0.30 355.53 3.92 78.1860 70 663.95 0.01 797.11 0.05 343.42 0.01 58.5460 90 357.10 0.00 478.40 0.01 391.59 0.00 38.2180 10 4407.65 0.09 5008.22 0.43 2584.26 3.40 564.7580 30 3327.65 0.00 4388.67 3.87 2143.41 3.51 264.9480 50 6694.43 0.06 4650.24 0.41 2494.70 9.14 365.0180 70 5422.86 0.01 5419.69 0.06 2979.25 1.21 270.3980 90 4178.58 0.01 5052.10 0.03 2958.44 0.02 185.92100 10 23975.63 0.04 18831.11 0.15 9883.59 0.33 1867.44100 30 31499.01 0.10 17673.77 0.38 9832.83 13.38 973.49100 50 27958.75 0.26 18867.58 1.47 8553.04 19.05 1308.24100 70 20428.73 0.01 24684.33 0.08 9482.50 0.22 431.02100 90 12182.11 0.00 14881.31 0.05 10280.16 0.00 484.25


Observations

(QKPSOC) is the most computationally efficient for all instancesThe bounds from (QKPSS), and hence those from (QKPSS+) aswell, are strictly tighter than the ones from (QKPHRW4)(QKPSOC) sometimes provides better bounds than (QKPHRW4)(QKPSOC) performs particularly well for instances with high density.


Distance Geometry


Fundamental Problem of Distance Geometry

Given a set of(n

2

)distances, determine whether there exists a set of n

points in k -dimensional Euclidean space that generates the givenpairwise distances.Such a set of points, if it exists, is called a realization (of the distances)in <k .

ExampleSuppose n = 3 and d12 = 3, d13 = 4, d23 = 5. Then:

for k = 3: (0,0,0), (0,3,0), (0,0,4) is a realizationfor k = 2, similarfor k = 1: impossible

The early work in this area is presented in the seminal book of Crippenand Havel (1988).


Practical Application: Biology and Chemistry

Distances between atoms in a protein can be determined theoretically(potential energy minimization) or experimentally (NMR spectroscopy).

The problem is to determine the structure of a protein given a (partialor complete) set of pairwise distances between the atoms in theprotein.

Understanding the structure of a protein is key because the structureof a protein specifies the function of the protein, and hence itschemical and biological properties.


Practical Application: Anatomy and Anthropology

Arises in the use landmark data to analyze morphological differencesin the faces and heads of humans.

There is a set of standardised soft-tissue facial landmarks that includethe pronasale (the nasal apex, or “tip of the nose”), andthe soft-tissue pogonion (the most prominent point on the chin).

The distances between them makes it possible to quantify phenomenasuch as the changes in facial geometry due to growth or the normallevels of facial asymmetry in humans.


Facial Landmarks (Ferrario et al. (1993))


Facial Landmarks List (Ferrario et al. (1993))


Landmark Distances (Ferrario et al. (1993))

Of course, what is really of interest is the relative position of each ofthese landmarks on each subject, so we need a representation that isinvariant under translation, rotation, and reflection.

An Euclidean distance matrix is ideal to represent this data.


Euclidean distance matrices

A distance matrix is a real symmetric n × n matrix with zero diagonaland (i , j) element equal to d2

ij .

For our example with d12 = 3, d13 = 4, d23 = 5, the distance matrix is

D =

0 9 169 0 25

16 25 0

.

When a set of n points can be found in some Euclidean space suchthat their

(n2

)pairwise distances generate the values

√Dij , the matrix D

is said to be an Euclidean distance matrix (EDM).


First PSD Characterization

There are many interesting connections between the psd matrices andthe EDMs.

First, if D is an n × n distance matrix,define P to be the (n − 1)× (n − 1) matrix with entries

pij :=12

(din + djn − dij) for i , j = 1, . . . ,n − 1.

TheoremD is an EDM if and only if P is psd.

Note that if P is psd, then the corresponding D is unique.


Second PSD Characterization

TheoremD is an EDM if and only if −1

2JDJ is psd, where J = In×n − esT for anys such that eT s = 1.

If D is an EDM, then the rank of −12JDJ gives the minimal realization

dimension of D.


Obtaining a Realization

To recover a set of points that realize D in <r , construct a matrix Π asfollows:

Π =(√λ1v1

√λ2v2 . . .

√λr vr

)n×r

where r = rank (−12JDJ) and λi > 0 are the non-zero eigenvalues with

corresponding eigenvectors vi , i = 1, . . . , r .

Then the rows of Π provide a realization of D in <r .

The choice of s in defining J determines the center of mass of theresulting set of points.


Orienting a Realization

To determine the orientation of the realization, use a rotation matrix Q.Indeed, since QQT = I for any rotation,

ΠQ

gives the same realization of D, albeit rotated according to Q.

This leads to the following question:Given two matrices Π1 and Π2 of the same dimensions, do theyrepresent the same set of points up to rotation?

If not, how close are they to representing the same set of points, i.e.,can the difference be quantified?


Procrustes ProblemOne way to formulate this question is:

min ‖Π1Q− Π2‖Fs.t. Q is a rotation matrix.

Because

‖Π1Q− Π2‖2F = trace (ΠT1 Π1) + trace (ΠT

2 Π2)︸︷︷︸constant

−2 trace (ΠT2 Π1Q),

it is equivalent to maximizing trace (ΠT2 Π1Q).

This maximum occurs precisely when ΠT2 Π1Q is symmetric and psd.

Furthermore, using the singular value decomposition of ΠT2 Π1:

ΠT2 Π1 = USVT

the optimal solution can be expressed as: Q∗ = VUT .Miguel F. Anjos (Polytechnique Montréal) Course on SDP Coimbra, 9 May 2011 196 / 205

Nearest Euclidean Distance Matrix

Suppose that D is a distance matrix obtained empirically, so that itsentries have errors.

If −12JDJ is not psd, then we know that D is not an EDM.

However, we know that the values in D are approximatemeasurements of the entries of some unknown EDM.

Therefore, a natural question is to ask for the nearest EDM to D.


Nearest Euclidean Distance Matrix (ctd)

For this purpose, define

K(M) := diag (M)eT + e(diag (M))T − 2M.

Then we can write the problem of finding the nearest EDM to D as:

min ‖D−K(X)‖Fs.t. X � 0,

and the nearest EDM to D is D∗ = K(X∗).

The good news is that this problem is a nice cone optimizationproblem...


The Bad News...

This provides no guarantees about the realization dimension of D∗.

Requiring K(X) to be realizable in dimension bounded above by k isequivalent to adding the constraint rank (X) ≤ k .

The resulting optimization problem is an NP-hard problem.


SDP in Distance Geometry: Huge Area of Research!

Problem classes:sensor localizationgraph realizabilitygraph rigidity

Important issues to address:incomplete datanoisy datainconsistent data

See survey article:Euclidean Distance Matrices, Semidefinite Programming, and SensorNetwork Localization by Alfakih, A., Piccialli and Wolkowicz (2009).


Conclusion


Main challenges in this area

1. How to solve (very) large SDP relaxations?

Selected recent developments:DSDP (Ye et al.(since 2000))Row-by-row methods (Wen, Goldfarb, Ma and Scheinberg (2009))Alternating direction augmented Lagrangian methods (Wen,Goldfarb and Yin (2009))SDPNAL: Semi-smooth Newton-CG augmented Lagrangianmethod (Zhao, Sun and Toh (2010))New Robust Algorithm for SDP (Doan, Kruk and Wolkowicz(2011))


Main challenges in this area2. How to exploit structure in general conic relaxations?

Selected recent developments:Toroidal and related global structures in satisfiability problems(A. (2006, 2007); A. and Vieira (2011))Exploiting chordality structure in solving the SDP problems(Kobayashi, Kim, Kojima (2006), among others)Block-diagonal SDP hierarchies for 0/1 optimization(Gvozdenovic, Laurent and Vallentin (2007))Group structure and symmetry in certain problems:

I Code upper bounds using the Terwilliger algebra (Schrijver (2005))I Quadratic assignment problems with group symmetry (de Klerk and

Sotirov (2007))I Improved representations via matrix *-algebras (de Klerk et al.

(since 2009)


“Read all about it! Read all about it!”

Handbook on Semidefinite, Cone and Polynomial Optimization:Theory, Algorithms, Software and Applications edited by M.F.Anjos and J.B. Lasserre, to be published by Springer in 2011.


In conclusion...

This is an exciting and unbelievably active research area.There are still numerous open questions, both theoretical andalgorithmic.Furthermore, this area is ripe for real-world applications andpoised to leave its mark in the future.

There is still many discoveries to make - join in!

For papers, references, questions, you are welcome to contact me:

[email protected]


(semi)deﬁnitely the future!apdio.pt/documents/10180/12431/slides_curso_sdo.pdf ·...

Documents