
Page 1: Convex Optimization Lecture 13 - University of Chicago (ttic.uchicago.edu/~nati/Teaching/TTIC31070/2015/Lecture13.pdf)

Convex Optimization Lecture 13

Today: Interior-Point (continued)

• Central Path method for SDP

• Feasibility and Phase I Methods

• From Central Path to Primal/Dual

Page 2:

Central Path Log-Barrier Method

Access to:
• 2nd-order oracle for f0, fi
• Explicit access to A, b
• Strictly feasible point x(0)

Assumptions:
• f0 convex and self-concordant
• fi convex quadratic (or linear)
• x(0) strictly feasible with fi(x(0)) < 0

• Overall #Newton iterations: N = O(√m · (log(1/ε) + log log(1/δ)))

• Overall runtime: ≈ √m · (n³ + cost of oracle evaluations) · log(1/ε)

Init: Feasible x(0) and some t(0)
Do: Solve t(k)-barrier problem using Newton starting at x(k)
    x(k+1) ← x∗(t(k))
    Stop if m/t ≤ ε
    t(k+1) ← μ · t(k) (for some parameter μ > 1)

[Photos: Arkadi Nemirovski, Yurii Nesterov, John von Neumann, Narendra Karmarkar]
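A minimal numerical sketch of the Init/Do loop above, for the special case f0(x) = ½ xᵀPx + qᵀx with linear inequality constraints Ax ≤ b and no equality constraints. Function names, parameter defaults, and tolerances here are illustrative assumptions, not from the lecture:

```python
# Minimal sketch: log-barrier central path method for
#   min 0.5 x^T P x + q^T x   s.t.  A x <= b   (no equality constraints).
import numpy as np

def newton_centering(P, q, A, b, x, t, tol=1e-9, max_iter=50):
    """Minimize t*f0(x) - sum_i log(b_i - a_i^T x) by damped Newton."""
    def phi(z):
        s = b - A @ z
        if np.any(s <= 0):
            return np.inf                      # outside the barrier's domain
        return t * (0.5 * z @ P @ z + q @ z) - np.sum(np.log(s))
    for _ in range(max_iter):
        s = b - A @ x                          # slacks, > 0 along the path
        grad = t * (P @ x + q) + A.T @ (1.0 / s)
        hess = t * P + A.T @ ((1.0 / s**2)[:, None] * A)
        dx = np.linalg.solve(hess, -grad)
        if -grad @ dx / 2 < tol:               # Newton decrement test
            break
        step = 1.0                             # backtracking line search
        while phi(x + step * dx) > phi(x) + 0.25 * step * (grad @ dx):
            step *= 0.5
        x = x + step * dx
    return x

def barrier_method(P, q, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    x, t, m = x0.copy(), t0, A.shape[0]
    while m / t > eps:                         # duality-gap bound m/t
        x = newton_centering(P, q, A, b, x, t) # centering step: x <- x*(t)
        t *= mu                                # t <- mu * t
    return x
```

Each centering call is warm-started at the previous iterate, which is what makes the geometric increase of t cheap in practice.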

Page 3:

Optimizing with Matrix Inequalities

fi : Rn → S^ki (symmetric ki × ki matrices)

min_{x∈Rn} f0(x)
s.t. fi(x) ⪯ 0
     Ax = b

Central path given by solutions to:

min_{x∈Rn} f0(x) − (1/t) ∑i log det(−fi(x))
s.t. Ax = b
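For a linear matrix inequality F(x) = F0 + ∑j xj Fj ⪯ 0, the barrier term −log det(−F(x)) and its gradient are easy to compute; here is a small sketch (the function name and data layout are assumptions). Coordinate j of the gradient is ⟨(−F(x))⁻¹, Fj⟩, which is exactly the (−fi)⁻¹-type term that appears in the stationarity condition on the next slide:

```python
# Sketch: value and gradient of phi(x) = -log det(-F(x))
# for an LMI F(x) = F0 + sum_j x_j Fj with F(x) negative definite.
import numpy as np

def logdet_barrier(F0, Fs, x):
    F = F0 + sum(xj * Fj for xj, Fj in zip(x, Fs))
    M = -F                                   # positive definite iff x strictly feasible
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("x is not strictly feasible")
    Minv = np.linalg.inv(M)
    # d/dx_j [-log det(-F(x))] = tr((-F(x))^{-1} Fj) = <(-F(x))^{-1}, Fj>
    grad = np.array([np.sum(Minv * Fj) for Fj in Fs])  # trace inner product
    return -logdet, grad
```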

Page 4:

The t-barrier problem for

min_{x∈Rn} f0(x)  s.t. fi(x) ⪯ 0, Ax = b

is

min_{x∈Rn} f0(x) − (1/t) ∑_{i=1}^m log det(−fi(x))  s.t. Ax = b

with optimum x∗(t) and dual optimum ν∗(t); x∗(t) is strictly feasible.

L(x, Λ, ν) = f0(x) + ∑i ⟨Λi, fi(x)⟩ + ⟨ν, Ax − b⟩

Lt(x, ν) = f0(x) − (1/t) ∑i log det(−fi(x)) + ⟨ν, Ax − b⟩

0 = ∇x Lt(x∗(t), ν∗(t)) = ∇f0(x∗(t)) + ∑i (−1/t) fi(x∗(t))⁻¹ ⨀ ∇fi(x∗(t)) + Aᵀν∗(t)

Define Λ∗i(t) = (−1/t) fi(x∗(t))⁻¹ ≻ 0. Then

∇x L(x∗(t), Λ∗(t), ν∗(t)) = ∇f0(x∗(t)) + ∑i Λ∗i(t) ⨀ ∇fi(x∗(t)) + Aᵀν∗(t) = 0

so

g(Λ∗(t), ν∗(t)) = inf_x L(x, Λ∗(t), ν∗(t)) = L(x∗(t), Λ∗(t), ν∗(t))
= f0(x∗(t)) − (1/t) ∑i ⟨fi(x∗(t))⁻¹, fi(x∗(t))⟩ + ⟨ν∗(t), Ax∗(t) − b⟩
= f0(x∗(t)) − ∑_{i=1}^m ki/t

(Here ⟨·, ·⟩ is the trace inner product, ⨀ pairs the matrix dual variable with each coordinate of ∇fi, and ki is the side length of fi's matrix block, so ⟨fi⁻¹, fi⟩ = ki.)

How suboptimal is x∗(t)?

(Λ∗(t), ν∗(t)) is dual (strictly) feasible with

f0(x∗(t)) − g(Λ∗(t), ν∗(t)) = ∑_{i=1}^m ki/t

Page 5:

Optimizing with Matrix Inequalities

• An optimum x∗(t) for the t-barrier problem is ∑i ki/t suboptimal for the constrained problem
• Central Path method:

min_{x∈Rn} f0(x)
s.t. fi(x) ⪯ 0 (fi(x) ∈ S^ki)
     Ax = b

min_{x∈Rn} f0(x) − (1/t) ∑i log det(−fi(x))
s.t. Ax = b

Init: Feasible x(0) and some t(0)
Do: Solve t(k)-barrier problem using Newton starting at x(k)
    x(k+1) ← x∗(t(k))
    Stop if ∑i ki/t ≤ ε
    t(k+1) ← μ · t(k) (for some parameter μ > 1)

Page 6:

Central Path Method for SDP

Access to:
• 2nd-order oracle for f0, fi
• Explicit access to A, b
• Strictly feasible point x(0)

Assumptions:
• f0 : Rn → R convex and self-concordant
• fi : Rn → S^ki convex quadratic (or linear)
• x(0) strictly feasible with fi(x(0)) ≺ 0

• Overall #Newton iterations: N = O(√(∑i ki) · (log(1/ε) + log log(1/δ)))

• Overall runtime: ≈ √(∑i ki) · (n³ + cost of oracle evaluations) · log(1/ε)

Init: Feasible x(0) and some t(0)
Do: Solve t(k)-barrier problem using Newton starting at x(k)
    x(k+1) ← x∗(t(k))
    Stop if ∑i ki/t ≤ ε
    t(k+1) ← μ · t(k) (for some parameter μ > 1)

[Photos: Arkadi Nemirovski, Yurii Nesterov]

Page 7:

Feasibility and Phase I Methods

Recall that in the Log Barrier Central Path method we need to start with a (strictly) feasible x(0). Two phases:

• Phase I : Solve feasibility problem

• Phase II : Use solution as starting point for barrier method

We can convert feasibility to an optimization problem:

(P)  Find x ∈ Rn
     s.t. fi(x) ≤ 0
          Ax = b

⇒

(P̄)  min_{x∈Rn, s∈R} s
     s.t. fi(x) ≤ s
          Ax = b

This optimization problem is always feasible: we can start from a solution to Ax(0) = b and set s = maxi fi(x(0)).

Then we can apply the log barrier method to solve the optimization problem.

Page 8:

min_{x∈Rn, s∈R} s
s.t. fi(x) ≤ s
     Ax = b

How well do we need to optimize?

• If we find a P̄-feasible (x, s) with s < 0 ⇒ x is strictly P-feasible

• If we get an ε-suboptimal solution to P̄ with s > ε ⇒ P is infeasible

• Otherwise, there could be a solution that is feasible but not strictly so
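As a concrete illustration for linear constraints fi(x) = ciᵀx + di, here is a phase-I sketch written with cvxpy as an off-the-shelf solver, purely for brevity (an assumption for illustration; the lecture would solve P̄ with the barrier method itself, and all data below is random):

```python
# Phase-I sketch for linear constraints f_i(x) = c_i^T x + d_i.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 10, 20, 3
C, d = rng.standard_normal((m, n)), rng.standard_normal(m)
A = rng.standard_normal((p, n))
b = A @ rng.standard_normal(n)        # ensures Ax = b is solvable

x, s = cp.Variable(n), cp.Variable()
prob = cp.Problem(cp.Minimize(s), [C @ x + d <= s, A @ x == b])
prob.solve()
if s.value < 0:
    print("strictly feasible point for (P):", x.value)
elif prob.value > 0:                  # up to solver tolerance
    print("(P) is infeasible")
else:
    print("feasible but possibly not strictly so")
```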

Page 9:

Can convert feasibility to optimization with matrix constraints too:

Find x ∈ Rn
s.t. fi(x) ⪯ 0
     Ax = b

⇒

min_{x∈Rn, s∈R} s
s.t. fi(x) ⪯ sI
     Ax = b

Finally, note that we can also reduce optimization to feasibility:

min f0(x)
s.t. fi(x) ≤ 0
     Ax = b

⇒

(Ps)  Find x
      s.t. fi(x) ≤ 0
           f0(x) ≤ s
           Ax = b

then search over s.
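The "search over s" is typically bisection, since (Ps) is feasible exactly when s is at least the optimal value. A sketch, where feasible(s) is a hypothetical oracle deciding (Ps), e.g., implemented with a phase-I method as above:

```python
# Sketch: reduce optimization to feasibility by bisection on s.
def bisect(feasible, lo, hi, tol=1e-6):
    """Assumes feasible(hi) is True and feasible(lo) is False."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid          # a point with f0(x) <= mid exists
        else:
            lo = mid          # the optimal value lies above mid
    return hi
```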

Page 10:

From Central Path to Primal/Dual

Let us review our approach. We would like to solve the KKT of (P):

(KKT)
Ax = b
fi(x) ≤ 0
λ ≥ 0
∇f0(x) + ∑i λi∇fi(x) + Aᵀν = 0
λifi(x) = 0

At each iteration we consider problem (Pt), i.e., solving:

Ax = b
∇f0(x) + ∑i (−1/(t fi(x))) ∇fi(x) + Aᵀν = 0

And we do this by Newton: linearize w.r.t. x (and ν) around x(k).

Page 11:

This can be viewed as solving modified KKT:

(KKTt)
Ax = b
fi(x) ≤ 0
λ ≥ 0
∇f0(x) + ∑i λi∇fi(x) + Aᵀν = 0
λifi(x) = −1/t

Solve by:
(i) Eliminate λi = −1/(t fi(x)), and get a problem in (x, ν)

(ii) Linearize w.r.t. (x, ν) around x(k)

Instead, in P/D we maintain both x(k) and λ(k), and linearize (KKTt) w.r.t. both x and λ around x(k) and λ(k), without first eliminating λ.

Page 12:

Primal-dual method

Define the residuals:

rpri(x) = Ax − b ∈ Rp

rdual(x, λ, ν) = ∇f0(x) + ∑i λi∇fi(x) + Aᵀν ∈ Rn

rcent(t)(x, λ) = (λ1f1(x) + 1/t, ..., λmfm(x) + 1/t) ∈ Rm

Jointly:

r(t)(x, λ, ν) = (rpri, rdual, rcent(t)) ∈ Rp+n+m

If x, λ, ν satisfy r(t)(x, λ, ν) = 0 (and fi(x) < 0, λ > 0), then x = x∗(t), λ = λ∗(t), and ν = ν∗(t).
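A direct transcription of these residuals into code (grad_f0, f, and Df are assumed callables returning ∇f0(x), the vector f(x) ∈ Rm, and the m × n Jacobian of f = (f1, ..., fm)):

```python
# Sketch: the three primal-dual residuals, as defined on this slide.
import numpy as np

def residuals(x, lam, nu, t, grad_f0, f, Df, A, b):
    r_pri = A @ x - b                                # in R^p
    r_dual = grad_f0(x) + Df(x).T @ lam + A.T @ nu   # in R^n
    r_cent = lam * f(x) + 1.0 / t                    # in R^m
    return r_pri, r_dual, r_cent
```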

Page 13:

Therefore, at each iteration we approximately solve:

r(t)(x + ∆x, λ + ∆λ, ν + ∆ν) = 0

s.t. fi(x + ∆x) ≤ 0

λ + ∆λ ≥ 0

This is done by linearizing w.r.t. ∆x,∆λ:

r(t)(x, λ, ν + ∆ν) + ∇x r(t)(x, λ, ν)ᵀ∆x + ∇λ r(t)(x, λ, ν)ᵀ∆λ = 0

Boils down to:

The first block component of rt,

rdual = ∇f0(x) + Df(x)ᵀλ + Aᵀν,

is called the dual residual, and the last block component, rpri = Ax − b, is called the primal residual. The middle block,

rcent = −diag(λ)f(x) − (1/t)1,

is the centrality residual, i.e., the residual for the modified complementarity condition.

Now consider the Newton step for solving the nonlinear equations rt(x, λ, ν) = 0, for fixed t (without first eliminating λ, as in §11.3.4), at a point (x, λ, ν) that satisfies f(x) ≺ 0, λ ≻ 0. We will denote the current point and Newton step as

y = (x,λ, ν), ∆y = (∆x, ∆λ, ∆ν),

respectively. The Newton step is characterized by the linear equations

rt(y + ∆y) ≈ rt(y) + Drt(y)∆y = 0,

i.e., ∆y = −Drt(y)⁻¹rt(y). In terms of x, λ, and ν, we have

⎡ ∇²f0(x) + ∑_{i=1}^m λi∇²fi(x)   Df(x)ᵀ        Aᵀ ⎤ ⎡ ∆x ⎤     ⎡ rdual ⎤
⎢ −diag(λ) Df(x)                  −diag(f(x))   0  ⎥ ⎢ ∆λ ⎥ = − ⎢ rcent ⎥     (11.54)
⎣ A                               0             0  ⎦ ⎣ ∆ν ⎦     ⎣ rpri  ⎦

The primal-dual search direction ∆ypd = (∆xpd, ∆λpd, ∆νpd) is defined as the solution of (11.54).

The primal and dual search directions are coupled, both through the coefficient matrix and the residuals. For example, the primal search direction ∆xpd depends on the current value of the dual variables λ and ν, as well as x. We note also that if x satisfies Ax = b, i.e., the primal feasibility residual rpri is zero, then we have A∆xpd = 0, so ∆xpd defines a (primal) feasible direction: for any s, x + s∆xpd will satisfy A(x + s∆xpd) = b.

Comparison with barrier method search directions

The primal-dual search directions are closely related to the search directions used in the barrier method, but not quite the same. We start with the linear equations (11.54) that define the primal-dual search directions. We eliminate the variable ∆λpd, using

∆λpd = −diag(f(x))⁻¹ diag(λ) Df(x) ∆xpd + diag(f(x))⁻¹ rcent,

which comes from the second block of equations. Substituting this into the first block of equations gives

⎡ Hpd   Aᵀ ⎤ ⎡ ∆xpd ⎤     ⎡ rdual + Df(x)ᵀ diag(f(x))⁻¹ rcent ⎤     ⎡ ∇f0(x) + (1/t) ∑_{i=1}^m (1/(−fi(x))) ∇fi(x) + Aᵀν ⎤
⎣ A     0  ⎦ ⎣ ∆νpd ⎦ = − ⎣ rpri                               ⎦ = − ⎣ rpri                                                ⎦     (11.55)

while always maintaining fi(x) < 0 and λi > 0
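A sketch that assembles and solves (11.54) directly (hess_f0 and hess_fi are assumed callables returning the Hessians of f0 and of each fi; the residual signs follow the book's convention above, where rcent = −diag(λ)f(x) − (1/t)1):

```python
# Sketch: one primal-dual Newton step via the system (11.54).
import numpy as np

def pd_newton_step(x, lam, nu, t, grad_f0, hess_f0, f, Df, hess_fi, A, b):
    n, m, p = x.size, lam.size, b.size
    fx, J = f(x), Df(x)
    # residuals in the sign convention of (11.54):
    r_dual = grad_f0(x) + J.T @ lam + A.T @ nu
    r_cent = -lam * fx - 1.0 / t            # = -diag(lam) f(x) - (1/t) 1
    r_pri = A @ x - b
    H = hess_f0(x) + sum(lam[i] * hess_fi(x, i) for i in range(m))
    K = np.zeros((n + m + p, n + m + p))
    K[:n, :n] = H
    K[:n, n:n + m] = J.T
    K[:n, n + m:] = A.T
    K[n:n + m, :n] = -lam[:, None] * J      # -diag(lam) Df(x)
    K[n:n + m, n:n + m] = -np.diag(fx)
    K[n + m:, :n] = A
    dy = np.linalg.solve(K, -np.concatenate([r_dual, r_cent, r_pri]))
    return dy[:n], dy[n:n + m], dy[n + m:]  # (dx, dlam, dnu)
```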

Page 14:

It follows that:

rpri(x) = 0 ⇒ x is primal feasible

rdual(x, λ, ν) = 0 ⇒ ∇xL(x, λ, ν) = 0, so x minimizes L, and

g(λ, ν) = f0(x) + ∑i λifi(x) + νᵀ(Ax − b) > −∞,

so (λ, ν) are dual feasible

If in addition we have rcent = 0, then:

g(λ, ν) = f0(x) + ∑i λi · (−1/(λit)) + 0 = f0(x) − m/t

So the gap between (P) and (D): f0(x) − g(λ, ν) ≤ m/t

⇒ suboptimality ≤ m/t

Page 15:

Even if rcent ≠ 0, as long as rpri = 0 and rdual = 0, then

g(λ, ν) = f0(x) + ∑i λifi(x)

⇒ f0(x) − g(λ, ν) = −∑i λifi(x) = η̂(x, λ)

where η̂(x, λ) > 0 is the surrogate gap, and we are η̂ suboptimal.

Page 16:

Primal-dual interior-point algorithm

• Start at initial x(0), λ(0), ν(0) s.t. fi(x(0)) < 0 and λi(0) > 0

• Iterate:

– Determine t(k): set t(k) = µ · m / η̂(x(k), λ(k))

– Compute search direction:
  Linearize (KKTt(k)) for x = x(k) + ∆x, λ = λ(k) + ∆λ, ν = ν(k) + ∆ν
  Solve to obtain ∆x(k), ∆λ(k), ∆ν(k)

– Set step size s(k) by line search on ‖r(t)(x, λ, ν)‖, ensuring fi(x) < 0 and λi > 0

– Update: (x(k+1), λ(k+1), ν(k+1)) = (x(k), λ(k), ν(k)) + s(k)(∆x(k), ∆λ(k), ∆ν(k))

– Stop if: ‖rpri‖ < εfeas and ‖rdual‖ < εfeas (approx. feasible), and η̂(x(k), λ(k)) < ε

Important: x(k) need not be feasible – OK if Ax(k) ≠ b
Also, (λ(k), ν(k)) need not be feasible – g(λ(k), ν(k)) can be −∞

Advantages: single loop, no phase I
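Putting the pieces together, here is a sketch of the whole loop, reusing pd_newton_step from the sketch above; µ, the tolerances, and the crude backtracking rule are illustrative assumptions:

```python
# Sketch of the full primal-dual interior-point loop.
import numpy as np

def pd_interior_point(x, lam, nu, oracles, A, b, mu=10.0,
                      eps=1e-8, eps_feas=1e-8, max_iter=200):
    grad_f0, hess_f0, f, Df, hess_fi = oracles

    def rnorm(x_, lam_, nu_, t):
        return np.linalg.norm(np.concatenate([
            A @ x_ - b,
            grad_f0(x_) + Df(x_).T @ lam_ + A.T @ nu_,
            lam_ * f(x_) + 1.0 / t]))

    for _ in range(max_iter):
        eta = -f(x) @ lam                        # surrogate gap eta_hat
        t = mu * lam.size / eta                  # t(k) = mu * m / eta_hat
        dx, dlam, dnu = pd_newton_step(x, lam, nu, t, grad_f0,
                                       hess_f0, f, Df, hess_fi, A, b)
        s = 1.0                                  # keep lam > 0 and f(x) < 0
        while np.any(lam + s * dlam <= 0) or np.any(f(x + s * dx) >= 0):
            s *= 0.5
        while (s > 1e-12 and
               rnorm(x + s * dx, lam + s * dlam, nu + s * dnu, t)
               > (1 - 0.01 * s) * rnorm(x, lam, nu, t)):
            s *= 0.5                             # backtrack on ||r_t||
        x, lam, nu = x + s * dx, lam + s * dlam, nu + s * dnu
        feas = (np.linalg.norm(A @ x - b) < eps_feas and
                np.linalg.norm(grad_f0(x) + Df(x).T @ lam + A.T @ nu)
                < eps_feas)
        if feas and -f(x) @ lam < eps:           # approx. feasible, small gap
            break
    return x, lam, nu
```

Note that feasibility (Ax = b) is only reached in the limit, which is exactly why no phase I is needed here, as the next slide explains.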

Page 17:

Why no need for phase I?

We don’t need to ensure Ax = b, but we do need fi(x) < 0 and λ > 0.

We can rewrite (P ) as:

min_{x∈Rn, s∈R} f0(x)
s.t. fi(x) ≤ s
     Ax = b
     s = 0

Now we can start with any x(0) s.t. fi(x(0)) < ∞, then set s = maxi fi(x(0)) + 1.

Page 18:

If finding such x(0) is hard, we can rewrite as:

min_{x∈Rn, s∈R, x1,...,xm∈Rn} f0(x)
s.t. fi(xi) ≤ s
     Ax = b
     s = 0
     x = xi ∀i

Then we can find a point in the domain of each fi separately.
But many more variables (mn).