l14 interior point - university of maryland, college...

INTERIOR POINT METHODS

WHAT’S AN INTERIOR POINT METHOD?

minimize f(x)

r2f(x)�x = �rf(x)

x x+ ↵�x

solve: invert Hessianline search

minimize f(x)

subject to Ax = b

L(x,�) = f(x) + h�, Ax� biLagrangian✓

r2f(x) AT

A 0

◆✓�x��

◆=

✓�rf(x)

b

◆

What about inequality constraints?

IMPLICIT FORM CONSTRAINTS

X�(z) =

(0, if z 0

1, otherwise

minimize f(x) + X�(g(x))

subject to Ax = b

minimize f(x)

subject to Ax = b

g(x) 0

INTERIOR POINT METHODS

X�(z) =

(0, if z 0

1, otherwise

IP methods solve this

When objective is smooth,we can use constrained

Newton, BFGS, CG, etc…


subject to Ax = b

LOG BARRIER FUNCTION

X�(z) =

(0, if z 0

1, otherwise

X�(z) ⇡ �µ log(�z)

−5 −4 −3 −2 −1 0 1−2

−1

0

1

2

3

4

µ = 1µ =

1

2

µ =1

10


subject to Ax = b

BARRIER METHOD

X�(z) ⇡ �µ log(�z)

−5 −4 −3 −2 −1 0 1−2

−1

0

1

2

3

4

µ = 1µ =

1

2

µ =1

10

Pick some “small”

Solve the problem with Newton’s method

µ

Problem: Newton direction is bad! Excessive backtracking!

why?

minimize f(x)� µ log(�g(x))

subject to Ax = b

GEOMETRIC INTERPRETATIONMinimize this?

How far should you go?

Newton’smethod only

sees this

GEOMETRIC INTERPRETATION


sees this

smoothed curve(big )µ

Smoothing globalizes information



sees this

smoothed curve(big )µ

minimizer

xµ




sees this

smoothed curve(small )µ

minimizer

xµ


BARRIER IP METHOD

minimize f0(x)� µ log(�g(x))

subject to Ax = b

• µ µ/10

While “not converged:”

xµ

• Choose some large µ

• Solve for xµ

“central path”“centering step”

idea: collapse the quadratic region around the solution

CENTRAL PATHminimize f(x)�

mX

i=1

µ log(�gi(x))

subject to Ax = b

optimality condition

LagrangianL(x,�, ⌫) = f(x) + h⌫, gi+ h�, Ax� bi

rf(x)�mX

i=1

rgi(x)µ

gi(x)+AT ⌫ = 0

CENTRAL PATHoptimality condition

Lagrangian

optimality condition

Lagrange multipliers

L(x,�, ⌫) = f(x) + h⌫, gi+ h�, Ax� bi

@xL = rf(x) +X

rgi(x)⌫i +AT� = 0

rf(x)�mX

i=1

rgi(x)µ

gi(x)+AT ⌫ = 0

⌫i = � µ

gi(x)

LAGRANGIAN INTERPRETATION

A central point minimizes the Lagrangian with Lagrange multipliers

Given xµ:primal objective: f(xµ)

⌫i = � µ

gi(x)

d(�, ⌫) = f(xµ) +mX

i=1

⌫igi(xµ) + h�, Axµ � bi

= f(xµ)�mX

i=1

µ

gi(xµ)gi(xµ) + h�, 0i

= f(xµ)�mµ

duality gap = f(xµ)� d(�, ⌫) = mµ

DUALITY GAP

Points on the central path have smallduality gap!

This proves that as gets small, we approacha solution.µ

If we shrink by constant factor of 10, thenafter iterations! gap< ✏ log10(✏)

duality gap = f(xµ)� d(�, ⌫) = mµ

COMPLEXITY

The total number of Newton steps needed to get a duality gap less than is bounded by

lpm log2

⇣mµ0

✏

⌘m✓ 1

2�+ log2 log2 ✏

◆✏

where is a constant that only depends on the backtracking parameters for the Armijo line search.

�

see Boyd and Vandenberghe, sec 11.5

Theorem

Interior point methods achieve polynomial complexity for most convex programs, including LP’s!

centering

path-following

PRIMAL-DUAL IP’S

minimize cTx

subject to Ax = b

x � 0

standard form LP

L(x,�, ⌫) = cTx+ h⌫,�xi+ h�, b�AxiLagrangian

AT�+ ⌫ = c

Ax = b

⌫ixi = 0, 8ix, ⌫ � 0

primal optimalitydual optimalitycomp slackness

primal/dual feasibility

KKT system

PRIMAL-DUAL IP’S

AT�+ ⌫ = c

Ax = b

⌫ixi = 0, 8ix, ⌫ � 0



KKT system

Primal-Dual IP’s use Newton’s method for non-linear equations to solve the KKT system

problem:this equation

is non-smooth

x

⌫

remind you of anything?

SMOOTHED KKT



KKT system

x

⌫

⌫ixi = 0 ⌫ixi = µ

Let µ ! 0

AT�+ ⌫ = c

Ax = b

⌫ixi = µ, 8ix, ⌫ � 0



KKT systemAT�+ ⌫ = c

Ax = b

⌫ixi = µ, 8ix, ⌫ � 0

NEWTON STEP

F (x,�, ⌫) = 0solve

F (x,�, ⌫) =

0

@AT�+ ⌫ � c

Ax� bXV � µ

1

A

X = diag(x), V = diag(⌫)

X = diag(x)

V = diag(⌫)

modifiedequation

NEWTON STEP

F (x,�, ⌫) = 0solve

J(x,�, ⌫)

0

@�x��x

1

A = �F (x,�, ⌫)

F (x,�, ⌫) =

0

@AT�+ ⌫ � c

Ax� bXV � µ

1

A X = diag(x)

V = diag(⌫)

Newton system0

@0 AT IA 0 0V 0 X

1

A

0

@�x��⌫

1

A =

0

@�rc�rb

�XV 1+ µ1

1

A

<latexit sha1_base64="Z65qf/l8a0tV6UDuDL6P6KDUYR0=">AAAItXicfVVdb9s2FJW7rvOyj6bb416IuR2yNg0st0D3MqBZimF9CNYNjWsgdAyKpizOFKmSVGOP5V/Z/9qv2S5lu5UlOwIkXvKcey55L0UmheDG9vv/dm59cvvTO591Pz/44suvvr57eO+boVGlpuyCKqH0KCGGCS7ZheVWsFGhGckTwd4k87OAv3nHtOFKvrbLgo1zMpM85ZRYGJoc/oNzYh2l1Ls++gGdXr2G70uEMToFo1+90BmuOyN/sHLwDr9gwhK0CPDaxgICT0l9RJYI+Z8/OD3WExpgaJOqHQ0DlFnrYo8eIZyXqDbgJ4e9/km/elDbiNdGL1o/ryb37vyHp4qWOZOWCmLMZdwv7NgRbTkVDKZfGlYQOicz5khuzDJPPHoQQpomFgZ3Yosq7/7g4EFt9LK06U9jx2VRWiYpOAKWlgJZhULm0ZRrRq1YgkGo5jAfRDOiCbVQny0plyg1tyQxEAJLdk1VnhM5dSEz/nIwdliw1B7hhM24hKVpsvSuF/veAGEGvNUIwprPMvtjU4Qa4y9jEFm5U9g9xvfiynPV8dsOymZMX3PDoOiWLaxDH0ea4oLnsG1Bw62iw1TUtW9y0o3SyxQ1ownQlh8ihQ5qRcmICBIpJA+2iRs0NTQJ8YmcVRXfUg+I2IUYC0iWqIVD+BgfmzL5C8oF1Qu9agrbfC6nvqpIRolwo+YUuHzn3ZV7HDcBy8PiSLkLU8rWlhVYLe+PjApur31ehLjzRy3xeV4BO6IWoHkFCSC6mUdGhFmvMkncn01PMIoaftbCM85lqBQYE3c2gaykdtlk5bU15buWBGmp52WwKzGBM2+SJvMdNLPFMnyWk3aNQvVuLIRpUHbrAOU9liQRcFC+b4E5WXi3xHSqLNqwGiQmhJLw490P1iS+34DPMn7TJszpaQ0+bcKpraG/tlIV4q5Km7q48Q9CJcOVw95KFX4dVlBVSjjIHHtbVleLx+gIw9+76VfH0AsGZ7Jm56D5e8E0sUo/hBA5pOn8fOj3ErjkOf8bprOxMKhO9/PJYsNfWzfzoXZw4lTffRSiZ1W1oMXHwbqJyOWGyPcrmkxzCTt23e6kOVxoBVGr7x6GhR0IK101cEjBlRk3L8i2MRycxE9OBn887T3/ZX15dqPvou+joyiOnkXPo9+iV9FFRDu3Ow87TzpPu8+64+60m66otzprn2+jraer/gdfDjCG</latexit>

PRIMAL DUAL IP METHODSecret Sauce

PD methods do not have separate centering steps. They shoot straight for the solution

While “not converged:”

• Calculate average “duality measure:” µ = xT ⌫n

• Calculate target duality measure: �µ

• Solve the Newton system

• Update iterates stepsizechosen to stay

near central path

agressiveness

0

@0 AT IA 0 0V 0 X

1

A

0

@�x��⌫

1

A =

0

@�rc�rb

�XV 1+ �µ1

1

A

x x+ ↵p�x

(�, ⌫) (�, ⌫) + ↵d(��,�⌫)

ADAPTIVE PARAMETER CHOICE

µtarget = �µ

↵maxp = min

�xi<0

xi

�xi↵maxd = min

�⌫i<0

⌫i�⌫i

find largest steps that don’t violate feasibility x, ⌫ � 0

↵p = min{1, ⌘↵maxp } ↵d = min{1, ⌘↵max

d }

new duality old duality

x x+ ↵p�x

(�, ⌫) (�, ⌫) + ↵d(��,�⌫)

� =

✓(x+ ↵p�x)T (⌫ + ↵d�⌫)

µ

◆3

see Necedal and Wright, chapter 14

COMPOSITE NEWTON’S METHODJ(x)�x = �F (x)

x x+ ↵�x

classical Newton

factorizing thisis expensive!

Let’s try to use it twice!

J(x)�xp = �F (x)

J(x)�xc = �F (x+�xp)

x x+ ↵(�xp +�xc)

predictorcorrectorupdate

Why would you do this?

COMPOSITE NEWTON’S METHODJ(x)�xp = �F (x)

J(x)�xc = �F (x+�xp)

x x+ ↵(�xp +�xc)

predictorcorrectorupdate

Allows you to use the Hessian 2X!Can prove cubic convergence (vs quadratic)

better than Newton? why?

It’s free!

see Tapia, Zhang, Saltzman, Weiser ‘90

MEHROTRA’S PREDICTOR-CORRECTOR METHOD

=0

(x+�x)T (⌫ +�⌫) = xT ⌫ + xT�⌫ + ⌫T�x+�xT�⌫ = �xT�⌫

I wantedthis to be 0!

subtract from here

0

@0 AT IA 0 0V 0 X

1

A

0

@�x��⌫

1

A =

0

@�rc�rb

�XV 1

1

A

MEHROTRA’S METHOD0

@0 AT IA 0 0V 0 X

1

A

0

@�xp

��p

�⌫p

1

A =

0

@�rc�rb

�XV 1

1

A

predictor

we now have(x+�xp)

T (⌫ +�⌫p) = �xTp �⌫p

0

@0 AT IA 0 0V 0 X

1

A

0

@�xc

��c

�⌫c

1

A =

0

@�rc�rb

�XV 1+ �µ1��Xp�Vp

1

A

x x+ ↵p�xc

(�, ⌫) (�, ⌫) + ↵d(��c,�⌫c)

finally

corrector

PHASE I VS PHASE IIIP’s require strict feasibility

minimize f(x)

subject to Ax = b

g(x) 0

Ax = b

g(x) < 0

starting point satisfies

strict

minimize z

subject to Ax = b

gi(x) z, 8i

phase Iminimize f(x)

subject to Ax = b

g(x) 0

phase IIif z<0

MOST PROBLEMS HAPPEN IN PHASE I

minimize f(x)

subject to Ax b

Ax � b

redundant constraints

will this work?

minimize f(x)

subject to Ax = b

minimize f(x)

subject to Ax = b

g(x) 0

(very) poor conditioning

difficult toinvert

PROPERTIES OF IP METHODS

fast linear/superlinear convergencepro’s

high per-iteration complexity

very high precision achievable

extremely reliableno asymptotic slowdown

totally automated

con’s

useless for more than 10K unknowns

COMPLEXITYEllipsoid method : Leonid Khachiyan1979

Solves LP’s with n variables and b bits of precision in time

First polynomial-time proof for LP method

Solves most standard convex problems in poly time

O(N6L)

COMPLEXITYProjection Algorithm : Narendra Karmarkar1984

Solves LP’s with n variables and b bits of precision in with smaller constant

First polynomial-time proof for efficient LP method

“Faster than barrier method,” but Philip Gill proved they are the same

O(N3.5L)

Other IP methods proved polynomial: Yurii Nesterov and others

BIG OPEN PROBLEMSIs there a strongly polynomial time algorithm for LP?

Weakly polynomial: number of operations depends on size of input numbers, not just on how many numbers are input

Strongly polynomial: runtime independent of input size

ex: Euclidean GCD algorithm

ex: sorting

Does a polytope with N faces and D dimensions have polynomial diameter?

Diameter: maximum distance/edges connecting 2 verticesIf not then impossible to prove polynomial runtime for simplex method

l14 interior point - university of maryland, college...

Documents