Lecture 15
Newton Method and Self-Concordance
October 23, 2008
Outline
• Self-concordance Notion
• Self-concordant Functions
• Operations Preserving Self-concordance
• Properties of Self-concordant Functions
• Implications for Newton’s Method
• Equality Constrained Minimization
Classical Convergence Analysis of Newton’s Method
• Given a level set L0 = {x | f(x) ≤ f(x0)}, it requires for all x, y ∈ L0:
  ‖∇²f(x) − ∇²f(y)‖ ≤ L‖x − y‖,   mI ⪯ ∇²f(x) ⪯ MI
for some constants L > 0, m > 0, and M > 0
• Given a desired error level ε, we are interested in an ε-solution of the problem, i.e., a vector x such that f(x) ≤ f* + ε
• An upper bound on the number of iterations needed to generate an ε-solution is given by
  (f(x0) − f*)/γ + log₂ log₂(ε₀/ε)
where γ = σβη²m/M², η ∈ (0, m²/L), and ε₀ = 2m³/L²
• This follows from
  (L/(2m²))‖∇f(xk+K)‖ ≤ ((L/(2m²))‖∇f(xk)‖)^(2^K) ≤ (1/2)^(2^K)
where k is such that ‖∇f(xk)‖ ≤ η and η is small enough so that ηL/(2m²) ≤ 1/2. By the above relation and strong convexity of f, we have
  f(xk+K) − f* ≤ (1/(2m))‖∇f(xk+K)‖² ≤ (2m³/L²)(1/2)^(2^(K+1))
• The bound is conceptually informative, but not practical
• Furthermore, the constants L, m, and M change with affine transformations of the space, while Newton's method is affine invariant
• Can a bound be obtained in terms of problem data that is affine invariant
and, moreover, practically verifiable?
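To get a feel for how loose the classical bound is, here is a quick numeric sketch in Python; every constant below (m, M, L, σ, β, the initial gap) is an illustrative stand-in, since in practice m, M, and L are rarely known:

```python
import math

# All constants below are illustrative stand-ins; in practice m, M, L are unknown.
m, M, L = 1.0, 100.0, 10.0          # strong convexity / Hessian bound / Lipschitz
sigma, beta = 0.1, 0.5              # backtracking parameters
eta = 0.9 * m**2 / L                # any eta in (0, m^2/L)
gamma = sigma * beta * eta**2 * m / M**2
eps0 = 2 * m**3 / L**2
gap, eps = 10.0, 1e-10              # assumed f(x0) - f* and target accuracy
print(gap / gamma + math.log2(math.log2(eps0 / eps)))   # ~2.5e8 iterations(!)
```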
Self-concordance
• Nesterov and Nemirovski introduced a notion of self-concordance and a
class of self-concordant functions
• Importance of self-concordance:
  • Possesses the affine invariance property
  • Provides a new tool for analyzing Newton's method that exploits the affine invariance of the method
  • Results in a practical upper bound on the number of Newton iterations
  • Plays a crucial role in the performance analysis of interior-point methods
Def. A function f : R → R is self-concordant when f is convex and
  |f′′′(x)| ≤ 2f′′(x)^{3/2} for all x ∈ dom f
The rate of change in curvature of f is bounded by the curvature
Note: One can use a constant κ other than 2 in the definition
Examples
• Linear and quadratic functions are self-concordant [f′′′(x) = 0 for all x]
• Negative logarithm f(x) = −ln x, x > 0, is self-concordant:
  f′′(x) = 1/x²,   f′′′(x) = −2/x³,   |f′′′(x)|/f′′(x)^{3/2} = 2 for all x > 0
• Exponential function eˣ is not self-concordant:
  f′′(x) = f′′′(x) = eˣ,   |f′′′(x)|/f′′(x)^{3/2} = eˣ/e^{3x/2} = e^{−x/2},
  so |f′′′(x)|/f′′(x)^{3/2} → ∞ as x → −∞
• Even-power monomial x^{2p}, p > 2, is not self-concordant:
  f′′(x) = 2p(2p−1)x^{2p−2},   f′′′(x) = 2p(2p−1)(2p−2)x^{2p−3},
  |f′′′(x)|/f′′(x)^{3/2} ∝ |x|^{2p−3}/x^{3(p−1)} = |x|^{−p},
  so |f′′′(x)|/f′′(x)^{3/2} → ∞ as x ↓ 0
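As a sanity check, the ratio |f′′′(x)|/f′′(x)^{3/2} can be evaluated numerically for the examples above; this Python sketch (the helper name is ours, not from the slides) confirms the ratio is exactly 2 for −ln x and unbounded for eˣ:

```python
import numpy as np

def ratio(f2, f3, x):
    """Self-concordance ratio |f'''(x)| / f''(x)**1.5 (helper name is ours)."""
    return abs(f3(x)) / f2(x) ** 1.5

# Negative logarithm: f''(x) = 1/x^2, f'''(x) = -2/x^3 -> ratio is exactly 2
xs = np.linspace(0.1, 5.0, 50)
print(max(ratio(lambda x: 1 / x**2, lambda x: -2 / x**3, x) for x in xs))

# Exponential: f'' = f''' = e^x -> ratio e^{-x/2} is unbounded as x -> -inf
for x in [0.0, -5.0, -20.0]:
    print(x, ratio(np.exp, np.exp, x))
```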
Scaling and Affine Invariance
• In the definition of a self-concordant function, one can use any other constant κ
• Let f be self-concordant with constant κ. Then f̃(x) = (κ²/4)f(x) is self-concordant with constant 2:
  |f̃′′′(x)|/f̃′′(x)^{3/2} = (8κ²/(4κ³)) · |f′′′(x)|/f′′(x)^{3/2} = (2/κ) · |f′′′(x)|/f′′(x)^{3/2} ≤ 2
• Self-concordance is affine invariant: for f self-concordant and x = ay + b, the function g(y) = f(ay + b) is self-concordant with the same κ:
  • Convexity is preserved: g is convex
  • g′′′(y) = a³f′′′(x), g′′(y) = a²f′′(x), so
    |g′′′(y)|/g′′(y)^{3/2} = (|a|³/|a|³) · |f′′′(x)|/f′′(x)^{3/2} ≤ κ
Self-concordant Function in Rn
Def. A function f : Rⁿ → R is self-concordant when it is self-concordant along every line, i.e.,
• f is convex
• g(t) = f(x + tv) is self-concordant for all x ∈ dom f and all v
Note: The constant κ of self-concordance is independent of the choice of x and v, i.e.,
  |g′′′(t)| ≤ 2g′′(t)^{3/2}
for all x ∈ dom f, v ∈ Rⁿ, and t ∈ R such that x + tv ∈ dom f
Operations Preserving Self-concordance
Note: To start with, these operations have to preserve convexity
• Scaling with a factor of at least 1: when f is self-concordant and a ≥ 1, then af is also self-concordant
• The sum f₁ + f₂ of two self-concordant functions is self-concordant (extends to any finite sum)
• Composition with an affine mapping: when f(y) is self-concordant, then f(Ax + b) is self-concordant
• Composition with the ln-function: Let g : R → R be a convex function with dom g = R₊₊ and
  |g′′′(x)| ≤ 3g′′(x)/x for all x > 0
Then f(x) = −ln(−g(x)) − ln(x) over {x > 0 | g(x) < 0} is self-concordant
HW7
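A small numeric illustration of the sum and affine-composition rules, assuming as an example (ours, not from the slides) the interval barrier f(x) = −ln(x) − ln(1 − x), where the second term is −ln composed with x ↦ 1 − x:

```python
import numpy as np

xs = np.linspace(1e-3, 1 - 1e-3, 1000)
f2 = 1 / xs**2 + 1 / (1 - xs) ** 2        # f''(x)
f3 = -2 / xs**3 + 2 / (1 - xs) ** 3       # f'''(x)
print(np.max(np.abs(f3) / f2**1.5))       # <= 2, consistent with the sum rule
```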
Implications
• When f′′(x) > 0 for all x (then f is strictly convex), the self-concordance condition can be written as
  |d/dx (f′′(x)^{−1/2})| ≤ 1 for all x ∈ dom f.
(Indeed, d/dx (f′′(x)^{−1/2}) = −f′′′(x)/(2f′′(x)^{3/2}), so this is exactly |f′′′(x)| ≤ 2f′′(x)^{3/2}.)
• Assuming that for some t > 0 the interval [0, t] lies in dom f, we can integrate from 0 to t and obtain
  −t ≤ ∫₀ᵗ d/dx (f′′(x)^{−1/2}) dx ≤ t.
• This implies the following lower and upper bounds on f′′(t):
  f′′(0)/(1 + t√f′′(0))² ≤ f′′(t) ≤ f′′(0)/(1 − t√f′′(0))²,   (1)
where the lower bound is valid for all t ≥ 0 with t ∈ dom f, and the upper bound is valid for all t ∈ dom f with 0 ≤ t < f′′(0)^{−1/2}.
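A quick numeric check of Eq. (1), assuming the barrier f(x) = −ln x − ln(1 − x) along the direction v = 1 starting from x0 = 0.3 (our choice of example):

```python
import numpy as np

def f2(x):                                 # f''(x) for f(x) = -ln(x) - ln(1-x)
    return 1 / x**2 + 1 / (1 - x) ** 2

x0 = 0.3
c = np.sqrt(f2(x0))                        # sqrt(f''(0)) in the notation of Eq. (1)
ts = np.linspace(0.0, 0.99 / c, 200)       # upper bound needs t < f''(0)^{-1/2}
vals = f2(x0 + ts)                         # direction v = 1 stays inside (0, 1)
lower = f2(x0) / (1 + ts * c) ** 2
upper = f2(x0) / (1 - ts * c) ** 2
print(np.all(lower <= vals), np.all(vals <= upper))   # True True
```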
Bounds on f
Consider f : Rⁿ → R with ∇²f(x) ≻ 0. Let v be a descent direction, i.e., such that ∇f(x)ᵀv < 0, and consider g(t) = f(x + tv) − f(x) for t > 0. Integrating the lower bound of Eq. (1) twice, we obtain
  g(t) ≥ g(0) + tg′(0) + t√g′′(0) − ln(1 + t√g′′(0))
Consider now v = Newton's direction, i.e., v = −[∇²f(x)]⁻¹∇f(x), and let h(t) = f(x + tv). We have
  h′(0) = ∇f(x)ᵀv = −λ²(x),   h′′(0) = vᵀ∇²f(x)v = λ²(x).
Integrating the upper bound of Eq. (1) twice, we obtain for 0 ≤ t < 1/λ(x)
  h(t) ≤ h(0) − tλ²(x) − tλ(x) − ln(1 − tλ(x))   (2)
Newton Direction for Self-concordant Functions
Let f : Rⁿ → R be self-concordant with ∇²f(x) ≻ 0 for all x ∈ L0
• Using self-concordance, it can be seen that the Newton decrement λ(x) satisfies
  λ(x) = sup_{v ≠ 0} (−vᵀ∇f(x))/(vᵀ∇²f(x)v)^{1/2}
with the supremum attained at v = −[∇²f(x)]⁻¹∇f(x)
HW7
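The two expressions for λ(x) can be compared directly; this sketch uses a random positive definite H and gradient g as stand-ins for ∇²f(x) and ∇f(x):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)               # stand-in positive definite Hessian
g = rng.standard_normal(n)                # stand-in gradient

lam = np.sqrt(g @ np.linalg.solve(H, g))  # closed form (g^T H^{-1} g)^{1/2}
v = -np.linalg.solve(H, g)                # claimed maximizer
print(lam, -v @ g / np.sqrt(v @ H @ v))   # equal: the supremum is attained at v
V = rng.standard_normal((10000, n))       # random directions stay below lam
print(np.max(-V @ g / np.sqrt(np.einsum('ij,jk,ik->i', V, H, V))) <= lam + 1e-12)
```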
Self-Concordant Functions: Newton Method Analysis
• Consider Newton’s method, started at x0, with backtracking line search.
• Assume that f : Rⁿ → R is self-concordant and strictly convex.
  (If dom f ≠ Rⁿ, assume the level set {x | f(x) ≤ f(x0)} is closed.)
• Also, assume that f is bounded from below.
Under the preceding assumptions, an optimizer x* of f over Rⁿ exists. HW7
• Analysis is similar to the classical one, except that
  • Self-concordance replaces the strong convexity and Lipschitz Hessian assumptions
  • The Newton decrement replaces the role of the gradient norm
Convergence Result
Theorem 1 Assume that f is a self-concordant, strictly convex function that is bounded below over Rⁿ. Then, there exist η ∈ (0, 1/4) and γ > 0 such that
• Damped phase: If λk > η, we have f(xk+1) − f(xk) ≤ −γ.
• Pure Newton phase: If λk ≤ η, the backtracking line search selects α = 1 and
  2λk+1 ≤ (2λk)²
where λk = λ(xk).
Proof
Let h(α) = f(xk + αdk). As seen in Eq. (2), we have for 0 ≤ α < 1/λk
  h(α) ≤ h(0) − αλk² − αλk − ln(1 − αλk)   (3)
We use this relation to show that the stepsize ᾱ = 1/(1 + λk) satisfies the exit condition of the line search. In particular, letting α = ᾱ, we have
  h(ᾱ) ≤ h(0) − ᾱλk² − ᾱλk − ln(1 − ᾱλk) = h(0) − λk + ln(1 + λk)
Using the inequality −t + ln(1 + t) ≤ −t²/(2(1 + t)) for t ≥ 0, and σ ≤ 1/2, we obtain
  h(ᾱ) ≤ h(0) − λk²/(2(1 + λk)) ≤ h(0) − σλk²/(1 + λk) = h(0) − σᾱλk².
Therefore, at the exit of the line search, we have αk ≥ β/(1 + λk), implying
  h(αk) ≤ h(0) − σβλk²/(1 + λk).
When λk > η, since h(αk) = f(xk+1) and h(0) = f(xk), we have
  f(xk+1) ≤ f(xk) − γ,   γ = σβη²/(1 + η).
Assume now that λk ≤ (1 − 2σ)/2. From Eq. (3) and the inequality
  −x − ln(1 − x) ≤ x²/2 + x³ for 0 ≤ x ≤ 0.81,
we have
  h(1) ≤ h(0) − λk²/2 + λk³ ≤ h(0) − σλk²,
where the last inequality follows from λk ≤ (1 − 2σ)/2.
Thus, the unit step satisfies the sufficient decrease condition (no damping).
The proof that 2λk+1 ≤ (2λk)² is left for HW7.
The preceding material is from Section 9.6 of the book by Boyd and Vandenberghe.
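A sketch of damped Newton with backtracking, illustrating the two phases of Theorem 1 on an analytic-centering-type objective; the problem data and the parameters σ, β are our choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))          # random problem data (illustrative)

def f(x):                                # self-concordant: sum of -ln of affine
    s = 1 - A @ x
    if np.any(s <= 0) or np.any(np.abs(x) >= 1):
        return np.inf                    # outside dom f
    return -np.log(s).sum() - np.log(1 - x**2).sum()

def grad_hess(x):
    s = 1 - A @ x
    g = A.T @ (1 / s) + 2 * x / (1 - x**2)
    H = (A / s[:, None]).T @ (A / s[:, None]) \
        + np.diag((2 + 2 * x**2) / (1 - x**2) ** 2)
    return g, H

sigma, beta = 0.1, 0.5                   # backtracking parameters
x = np.zeros(n)
for k in range(50):
    g, H = grad_hess(x)
    d = -np.linalg.solve(H, g)           # Newton direction
    lam2 = -g @ d                        # lambda(x)^2, Newton decrement squared
    if lam2 / 2 <= 1e-10:
        break
    fx, alpha = f(x), 1.0
    while f(x + alpha * d) > fx - sigma * alpha * lam2:
        alpha *= beta                    # backtracking line search
    x = x + alpha * d
    print(k, np.sqrt(lam2), alpha)       # alpha hits 1 once lambda_k is small,
                                         # then lambda_k decreases quadratically
```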
Newton’s Method for Self-concordant Functions
Let f : Rⁿ → R be self-concordant with ∇²f(x) ≻ 0 for all x ∈ L0
• The analysis of Newton's method with backtracking line search gives:
  • For an ε-solution, i.e., x with f(x) ≤ f* + ε,
  • an upper bound on the number of iterations:
    (f(x0) − f*)/Γ + log₂ log₂(1/ε),   Γ = σβη²/(1 + η),   η = (1 − 2σ)/4
• Explicitly:
  (f(x0) − f*)/Γ + log₂ log₂(1/ε) = (20 − 8σ)/(σβ(1 − 2σ)²) · [f(x0) − f*] + log₂ log₂(1/ε)
• The bound is practical (when f* can be underestimated)
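For instance, with the common backtracking parameters σ = 0.1 and β = 0.5, the leading constant evaluates to 600 iterations per unit of initial gap (the gap of 10 below is a hypothetical value):

```python
import math

sigma, beta, eps = 0.1, 0.5, 1e-10
C = (20 - 8 * sigma) / (sigma * beta * (1 - 2 * sigma) ** 2)
print(C)                                         # 600.0
gap = 10.0                                       # assumed initial gap f(x0) - f*
print(C * gap + math.log2(math.log2(1 / eps)))   # ~6005 iterations, worst case
```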
Equality Constrained Minimization
  minimize   f(x)
  subject to Ax = b
Assumptions:
• The function f is convex and twice continuously differentiable
• The matrix A ∈ R^{p×n} has rank A = p
• The optimal value f* is finite and attained
• If Ax = b for some x ∈ relint(dom f), there is no duality gap and there exists an optimal dual solution
• KKT optimality conditions: x* is optimal if and only if there exists λ* such that
  Ax* = b,   ∇f(x*) + Aᵀλ* = 0
• Solving the problem is equivalent to solving the KKT conditions: n + p equations in n + p variables, x* ∈ Rⁿ and λ* ∈ Rᵖ
Equality Constrained Quadratic Minimization
  minimize   (1/2)xᵀPx + qᵀx + r
  subject to Ax = b
with P ∈ Sⁿ₊
Important in extending Newton's method to equality constraints
KKT optimality conditions:
  [ P  Aᵀ ] [ x* ]   [ −q ]
  [ A  0  ] [ λ* ] = [  b ]
• The coefficient matrix is referred to as the KKT matrix
• The KKT matrix is nonsingular if and only if
  Ax = 0, x ≠ 0  ⟹  xᵀPx > 0   [P is positive definite on the null space of A]
• Equivalent conditions for nonsingularity of the KKT matrix:
  • P + AᵀA ≻ 0
  • N(A) ∩ N(P) = {0}
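A minimal numpy sketch of solving the KKT system directly; P, q, A, b are random stand-ins, not data from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
M = rng.standard_normal((n, n))
P = M @ M.T                          # P positive definite (hence in S^n_+)
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))      # full row rank with probability 1
b = rng.standard_normal(p)

KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, lam_star = sol[:n], sol[n:]
print(np.allclose(A @ x_star, b))                        # primal feasibility
print(np.allclose(P @ x_star + q + A.T @ lam_star, 0))   # stationarity
```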
Newton’s Method with Equality Constraints
Almost the same as Newton's method, except for two differences:
• The initial iterate x0 has to be feasible: Ax0 = b
• The Newton directions dk have to be feasible: Adk = 0
Given iterate xk, Newton's direction dk is determined as a solution to
  minimize   f(xk) + ∇f(xk)ᵀd + (1/2)dᵀ∇²f(xk)d
  subject to A(xk + d) = b
• The KKT conditions (wk is the optimal dual for the quadratic minimization):
  [ ∇²f(xk)  Aᵀ ] [ dk ]   [ −∇f(xk) ]
  [    A      0 ] [ wk ] = [     0    ]
Newton’s Method with Equality Constraints
Given a starting point x ∈ dom f with Ax = b and tolerance ε > 0.
Repeat:
1. Compute Newton's direction dN(x) by solving the corresponding KKT system.
2. Compute the decrement λ(x).
3. Stopping criterion: quit if λ(x)²/2 ≤ ε.
4. Line search: choose stepsize α by backtracking line search.
5. Update: x := x + αdN(x).
• When f is strictly convex and self-concordant, the bound on the
number of iterations required to achieve an ε-accuracy is the same
as in the “unconstrained case”
Newton's Method with Infeasible Points
Useful when determining an initial feasible iterate x0 is difficult
Linearize the optimality conditions Ax* = b and ∇f(x*) + Aᵀλ* = 0 at some x + d ≈ x*, where x is possibly infeasible, and use w ≈ λ*
• We have
  ∇f(x*) ≈ ∇f(x + d) ≈ ∇f(x) + ∇²f(x)d
• Approximate KKT conditions:
  A(x + d) = b,   ∇f(x) + ∇²f(x)d + Aᵀw = 0
• At an infeasible xk, the system of linear equations determining dk and wk:
  [ ∇²f(xk)  Aᵀ ] [ dk ]     [ ∇f(xk)  ]
  [    A      0 ] [ wk ] = − [ Axk − b ]
Note: When xk is feasible, the system of equations is the same as in the
feasible point Newton’s method
Equality Constrained Analytic Centering
  minimize   f(x) = −∑ᵢ₌₁ⁿ ln xᵢ
  subject to Ax = b
Feasible point Newton's method: g = ∇f(x), H = ∇²f(x)
  [ H  Aᵀ ] [ d ]   [ −g ]
  [ A  0  ] [ w ] = [  0 ]
with
  g = (−1/x₁, ..., −1/xₙ)ᵀ,   H = diag[1/x₁², ..., 1/xₙ²]
• The Hessian is positive definite
• KKT matrix, first row: Hd + Aᵀw = −g ⟹ d = −H⁻¹(g + Aᵀw)
• KKT matrix, second row: Ad = 0, which combined with the first row gives AH⁻¹(g + Aᵀw) = 0
• The matrix A has full row rank, thus AH⁻¹Aᵀ is invertible, hence
  w = −(AH⁻¹Aᵀ)⁻¹AH⁻¹g,   H⁻¹ = diag[x₁², ..., xₙ²]
• The matrix −AH⁻¹Aᵀ is known as the Schur complement of H in the KKT matrix (for any nonsingular H)
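The elimination above gives a closed-form Newton step; a minimal numpy sketch (the helper name is ours, with a random A and a feasible x > 0 as stand-ins, taking b := Ax):

```python
import numpy as np

def centering_step(A, x):
    """Newton direction d and dual w at a feasible x > 0 (helper name is ours)."""
    g = -1 / x                               # gradient of -sum_i ln(x_i)
    Hinv = x**2                              # H^{-1} = diag(x_i^2)
    S = (A * Hinv) @ A.T                     # A H^{-1} A^T
    w = -np.linalg.solve(S, (A * Hinv) @ g)  # w = -(A H^{-1} A^T)^{-1} A H^{-1} g
    d = -Hinv * (g + A.T @ w)                # d = -H^{-1}(g + A^T w)
    return d, w

rng = np.random.default_rng(3)
p, n = 2, 6
A = rng.standard_normal((p, n))
x = rng.uniform(0.5, 1.5, n)                 # feasible for b := A @ x
d, w = centering_step(A, x)
print(np.allclose(A @ d, 0))                 # the direction preserves feasibility
```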
Network Flow Optimization
  minimize   ∑ₗ₌₁ⁿ φₗ(xₗ)
  subject to Ax = b
• Directed graph with n arcs and p + 1 nodes
• Variable xₗ: flow through arc l
• Cost φₗ: flow cost function for arc l, with φₗ′′(t) > 0
• Node-incidence matrix Ã ∈ R^{(p+1)×n} defined as
  Ãᵢₗ = 1 if arc l originates at node i,  −1 if arc l ends at node i,  0 otherwise
• Reduced node-incidence matrix A ∈ R^{p×n} is Ã with the last row removed
• b ∈ Rᵖ is the (reduced) source vector
• Rank A = p when the graph is connected
KKT System for Infeasible Newton's Method
  [ H  Aᵀ ] [ d ]     [ g ]
  [ A  0  ] [ w ] = − [ h ]
where h = Ax − b is a measure of infeasibility at the current point x
• g = [φ₁′(x₁), ..., φₙ′(xₙ)]ᵀ
• H = diag[φ₁′′(x₁), ..., φₙ′′(xₙ)] with positive diagonal entries
• Solve via elimination:
  w = (AH⁻¹Aᵀ)⁻¹[h − AH⁻¹g],   d = −H⁻¹(g + Aᵀw)
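A sketch of the eliminated infeasible-start step, assuming the cost φₗ(t) = t ln t − t (so φ′(t) = ln t and φ′′(t) = 1/t > 0); A, b here are random stand-ins for a reduced incidence matrix and source vector:

```python
import numpy as np

def infeasible_step(A, b, x):
    """One infeasible-start Newton step via the elimination formulas above."""
    g = np.log(x)                        # phi'(t) = ln t for phi(t) = t ln t - t
    Hinv = x                             # phi''(t) = 1/t, so H^{-1} = diag(x_l)
    h = A @ x - b                        # infeasibility at the current point
    S = (A * Hinv) @ A.T                 # A H^{-1} A^T
    w = np.linalg.solve(S, h - (A * Hinv) @ g)
    d = -Hinv * (g + A.T @ w)
    return d, w

rng = np.random.default_rng(4)
p, n = 3, 8
A = rng.standard_normal((p, n))          # stand-in for a reduced incidence matrix
b = rng.standard_normal(p)
x = np.ones(n)                           # infeasible start: A @ x != b in general
d, w = infeasible_step(A, b, x)
print(np.allclose(A @ (x + d), b))       # feasibility is restored in one step,
                                         # since the constraint is linear
```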