orie 6326: convex optimization [2ex] operators · orie 6326: convex optimization operators...

91
ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1 / 47

Upload: others

Post on 23-Aug-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

ORIE 6326: Convex Optimization

Operators

Professor Udell

Operations Research and Information EngineeringCornell

May 15, 2017

1 / 47

Page 2: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Beating the lower bound

problem: lower bound for subgradient method is slow!

solution: find a more powerful oracle

2 / 47

Page 3: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Beating the lower bound

problem: lower bound for subgradient method is slow!solution: find a more powerful oracle

2 / 47

Page 4: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proximal operator

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I proxf : Rd → Rd

I generalized projection: if 1C is the indicator of set C ,

prox1C(w) = ΠC (w)

I implicit gradient step: if z = proxf (x)

∂f (z) + z − x = 0

z = x − ∂f (z)

3 / 47

Page 5: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proximal operator

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I proxf : Rd → Rd

I generalized projection: if 1C is the indicator of set C ,

prox1C(w) = ΠC (w)

I implicit gradient step: if z = proxf (x)

∂f (z) + z − x = 0

z = x − ∂f (z)

3 / 47

Page 6: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proximal operator

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I proxf : Rd → Rd

I generalized projection: if 1C is the indicator of set C ,

prox1C(w) = ΠC (w)

I implicit gradient step: if z = proxf (x)

∂f (z) + z − x = 0

z = x − ∂f (z)

3 / 47

Page 7: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proximal operator

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I proxf : Rd → Rd

I generalized projection: if 1C is the indicator of set C ,

prox1C(w) = ΠC (w)

I implicit gradient step: if z = proxf (x)

∂f (z) + z − x = 0

z = x − ∂f (z)

3 / 47

Page 8: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Maps from functions to functions

no consistent notation for map from functions to functions.

for a function f : Rd → R,

I prox maps f to a new function proxf : Rd → Rd

I proxf (x) evaluates this function at the point x

I ∇ maps f to a new function ∇f : Rd → Rd

I ∇f (x) evaluates this function at the point x

4 / 47

Page 9: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0

(identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 10: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 11: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2

(shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 12: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 13: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x |

(soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 14: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 15: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0)

(projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 16: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 17: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi )

(separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 18: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 19: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1

(soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 20: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 21: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗

(soft-thresholding on singular values)

5 / 47

Page 22: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Let’s evaluate some proximal operators!

define the proximal operator of the function f : Rd → R

proxf (x) = argminz

(f (z) +1

2‖z − x‖22)

I f (x) = 0 (identity)

I f (x) = x2 (shrinkage)

I f (x) = |x | (soft-thresholding)

I f (x) = 1(x ≥ 0) (projection)

I f (x) =∑d

i=1 fi (xi ) (separable)

I f (x) = ‖x‖1 (soft-thresholding on each index)

I f (X ) = ‖X‖∗ (soft-thresholding on singular values)

5 / 47

Page 23: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proxable functions

we say a function f is proxable if it’s easy to evaluate proxf (x)

all examples from previous slide are proxable

6 / 47

Page 24: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Roadmap

suppose f is smooth, g is non-smooth. solve

minimize f (x) + g(x)

using proximal operators together with gradient steps?

I the proximal operator gives a fast method to step towards theminimum of g

I gradient method works well to step towards minimum of f

I put it together with gradients to make fast optimizationalgorithms

to do this elegantly, we will need more theory. . .

7 / 47

Page 25: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Outline

Relations

Fixed points

Averaged operators

Cocoercive and coercive operators

Operators and functions

Resolvents and proxs

8 / 47

Page 26: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Functions

in much of what follows, we’ll need to assume functions are

I closed: epi(f ) is a closed set

I convex: f is convex

I proper: dom f is non-empty

which we abbreviate as CCP

9 / 47

Page 27: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Relations

(x , ∂f (x)) and (x ,proxf (x)) define relations on Rn

I a relation R on Rn is a subset of Rn × Rn

I domR = {x : (x , y) ∈ R}I let R(x) = {y : (x , y) ∈ R}I if R(x) is always empty or a singleton, we say R is a function

I any function f : R→ R is a relation

10 / 47

Page 28: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Relations: examples

I empty relation: ∅I full relation: Rn × Rn

I identity: {(x , x) : x ∈ Rn}I zero: {(x , 0) : x ∈ Rn}I subdifferential: {(x , g : x ∈ dom f , g ∈ ∂f (x)}

11 / 47

Page 29: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Operations on relations

if R and S are relations, define

I composition: RS = {(x , z) : (x , y) ∈ R, (y , z) ∈ S}I addition: R + S = {(x , y + z) : (x , y) ∈ R, (x , z) ∈ S}I inverses: R−1 = {(y , x) : (x , y) ∈ R}

use inequality on sets to mean the inequality holds for any element inthe set, e.g.,

f (y) ≥ f (x) + ∂f T (y − x)

12 / 47

Page 30: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Example: fenchel conjugates and the subdifferential

if f is CPP, (f ∗)∗ = f ∗∗ = f , so

(u, v) ∈ (∂f )−1 ⇐⇒ (v , u) ∈ ∂f⇐⇒ u ∈ ∂f (v)

⇐⇒ 0 ∈ ∂f (v)− u

⇐⇒ v ∈ argminx

(f (x)− uT x)

⇐⇒ v ∈ argmaxx

(uT x − f (x))

⇐⇒ f (v) + f ∗(u) = uT v

⇐⇒ u ∈ argmaxy

(yT v − f ∗(y))

⇐⇒ 0 ∈ v − ∂f ∗(u))

⇐⇒ (u, v) ∈ ∂f ∗

this shows ∂f ∗ = ∂f −1

13 / 47

Page 31: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Outline

Relations

Fixed points

Averaged operators

Cocoercive and coercive operators

Operators and functions

Resolvents and proxs

14 / 47

Page 32: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Zeros of a relation

I x is a zero of R if 0 ∈ R(x)

I the zero set of R is R−1(0) = {x : (x , 0) ∈ R}

x is a zero of ∂f iff x solves minimize f (x)

15 / 47

Page 33: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Zeros of a relation

I x is a zero of R if 0 ∈ R(x)

I the zero set of R is R−1(0) = {x : (x , 0) ∈ R}

x is a zero of ∂f iff x solves minimize f (x)

15 / 47

Page 34: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Lipschitz operators

relation F has Lipschitz constant L if for all (x , u) ∈ F and (y , v) ∈ F ,

‖u − v‖ ≤ L‖x − y‖

fact: if F is Lipschitz, then F is a function.proof:

if (x , u) ∈ F and (x , v) ∈ F ,

‖u − v‖ ≤ L‖x − x‖ = 0

I the relation F is nonexpansive if L ≤ 1

I the relation F is contractive if L < 1

16 / 47

Page 35: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Lipschitz operators

relation F has Lipschitz constant L if for all (x , u) ∈ F and (y , v) ∈ F ,

‖u − v‖ ≤ L‖x − y‖

fact: if F is Lipschitz, then F is a function.proof: if (x , u) ∈ F and (x , v) ∈ F ,

‖u − v‖ ≤ L‖x − x‖ = 0

I the relation F is nonexpansive if L ≤ 1

I the relation F is contractive if L < 1

16 / 47

Page 36: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points

x is a fixed point of F if x = F (x)

examples:

I F (x) = x : every point is a fixed point

I F (x) = 0: only 0 is a fixed point

I a contractive operator on Rn can have at most one FPproof: if x and y are FPs, ‖x − y‖ = ‖F (x)− F (y)‖ < ‖x − y‖contradiction

I a nonexpansive operator F need not have a fixed pointproof: translation

17 / 47

Page 37: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points

x is a fixed point of F if x = F (x)

examples:

I F (x) = x : every point is a fixed point

I F (x) = 0: only 0 is a fixed point

I a contractive operator on Rn can have at most one FP

proof: if x and y are FPs, ‖x − y‖ = ‖F (x)− F (y)‖ < ‖x − y‖contradiction

I a nonexpansive operator F need not have a fixed pointproof: translation

17 / 47

Page 38: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points

x is a fixed point of F if x = F (x)

examples:

I F (x) = x : every point is a fixed point

I F (x) = 0: only 0 is a fixed point

I a contractive operator on Rn can have at most one FPproof: if x and y are FPs, ‖x − y‖ = ‖F (x)− F (y)‖ < ‖x − y‖contradiction

I a nonexpansive operator F need not have a fixed pointproof: translation

17 / 47

Page 39: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points

x is a fixed point of F if x = F (x)

examples:

I F (x) = x : every point is a fixed point

I F (x) = 0: only 0 is a fixed point

I a contractive operator on Rn can have at most one FPproof: if x and y are FPs, ‖x − y‖ = ‖F (x)− F (y)‖ < ‖x − y‖contradiction

I a nonexpansive operator F need not have a fixed point

proof: translation

17 / 47

Page 40: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points

x is a fixed point of F if x = F (x)

examples:

I F (x) = x : every point is a fixed point

I F (x) = 0: only 0 is a fixed point

I a contractive operator on Rn can have at most one FPproof: if x and y are FPs, ‖x − y‖ = ‖F (x)− F (y)‖ < ‖x − y‖contradiction

I a nonexpansive operator F need not have a fixed pointproof: translation

17 / 47

Page 41: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed point iteration

to find a fixed point of F , try the fixed point iteration

x (k+1) = F (x (k))

Q: when does this converge?

18 / 47

Page 42: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed point iteration

to find a fixed point of F , try the fixed point iteration

x (k+1) = F (x (k))

Q: when does this converge?

18 / 47

Page 43: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed point iteration: contractive

Banach fixed point theorem: if F is a contraction, the iteration

x (k+1) = F (x (k))

converges to the unique fixed point of F

properties: if L is the Lipschitz constant of F ,

I distance to fixed point decreases monotonically:

‖x (k+1) − x?‖ = ‖F (x (k))− F (x?)‖ ≤ L‖x (k) − x?‖

(iteration is Fejer-monotone)

I linear convergence with rate L

19 / 47

Page 44: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proof

if F is a contraction, x (k+1) = F (x (k)) converges to unique fixed point

proof:

if F has Lipschitz constant L < 1,

I sequence x (k) is Cauchy:

‖x (k+`) − x (k)‖ ≤ ‖x (k+`) − x (k+`−1)‖+ · · ·+ ‖x (k+1) − x (k)‖≤ (L`−1 + . . .+ 1)‖x (k+1) − x (k)‖

≤ 1

1− L‖x (k+1) − x (k)‖

≤ Lk

1− L‖x (1) − x (0)‖

I so it converges to a point x?, which must be the (unique) FP

I converges to x? linearly with rate L

‖x (k)−x?‖ = ‖F (x (k−1))−F (x?)‖ ≤ L‖x (k−1)−x?‖ ≤ Lk‖x (0)−x?‖

20 / 47

Page 45: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proof

if F is a contraction, x (k+1) = F (x (k)) converges to unique fixed point

proof: if F has Lipschitz constant L < 1,

I sequence x (k) is Cauchy:

‖x (k+`) − x (k)‖ ≤ ‖x (k+`) − x (k+`−1)‖+ · · ·+ ‖x (k+1) − x (k)‖≤ (L`−1 + . . .+ 1)‖x (k+1) − x (k)‖

≤ 1

1− L‖x (k+1) − x (k)‖

≤ Lk

1− L‖x (1) − x (0)‖

I so it converges to a point x?, which must be the (unique) FP

I converges to x? linearly with rate L

‖x (k)−x?‖ = ‖F (x (k−1))−F (x?)‖ ≤ L‖x (k−1)−x?‖ ≤ Lk‖x (0)−x?‖

20 / 47

Page 46: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Outline

Relations

Fixed points

Averaged operators

Cocoercive and coercive operators

Operators and functions

Resolvents and proxs

21 / 47

Page 47: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed point iteration: nonexpansive

if F is nonexpansive, the iteration

x (k+1) = F (x (k))

need not converge to a fixed point even if one exists.

proof:

I let F rotate its argument by θ degrees around the origin.

I then F is nonexpansive and has a fixed point at x? = 0.

I but if ‖x (0)‖ = r , then ‖F (x (k))‖ = r for all k .

22 / 47

Page 48: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed point iteration: nonexpansive

if F is nonexpansive, the iteration

x (k+1) = F (x (k))

need not converge to a fixed point even if one exists.

proof:

I let F rotate its argument by θ degrees around the origin.

I then F is nonexpansive and has a fixed point at x? = 0.

I but if ‖x (0)‖ = r , then ‖F (x (k))‖ = r for all k .

22 / 47

Page 49: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Averaged operators

an operator F is averaged if

F = θG + (1− θ)I

for θ ∈ (0, 1), G nonexpansive

fact: if F is averaged, then x if FP of F ⇐⇒ x is FP of Gproof:

x = Fx = θGx + (1− θ)Ix = θGx + (1− θ)x

θx = θGx

x = Gx

=⇒ if G is nonexpansive, F = 12 I + 1

2G is averaged with same FPs

23 / 47

Page 50: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Averaged operators

an operator F is averaged if

F = θG + (1− θ)I

for θ ∈ (0, 1), G nonexpansive

fact: if F is averaged, then x if FP of F ⇐⇒ x is FP of Gproof:

x = Fx = θGx + (1− θ)Ix = θGx + (1− θ)x

θx = θGx

x = Gx

=⇒ if G is nonexpansive, F = 12 I + 1

2G is averaged with same FPs

23 / 47

Page 51: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Averaged operators

an operator F is averaged if

F = θG + (1− θ)I

for θ ∈ (0, 1), G nonexpansive

fact: if F is averaged, then x if FP of F ⇐⇒ x is FP of Gproof:

x = Fx = θGx + (1− θ)Ix = θGx + (1− θ)x

θx = θGx

x = Gx

=⇒ if G is nonexpansive, F = 12 I + 1

2G is averaged with same FPs

23 / 47

Page 52: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed point iteration: averaged

if F = θG + (1− θ)I is averaged (θ ∈ (0, 1), G nonexpansive),the iteration

x (k+1) = F (x (k))

converges to a fixed point if one exists.

(also called the damped, averaged, or Mann-Krasnosel’skii iteration.)

properties:

I distance to fixed point decreases monotonically (Fejer-monotone)

I sublinear convergence of fixed point residual

‖Gx (k) − x (k)‖2 ≤ 1

(k + 1)θ(1− θ)‖x (0) − x?‖2

24 / 47

Page 53: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proof

proof follows [Ryu and Boyd, 2015]

use ‖(1− θ)a + θb‖2 = (1− θ)‖a‖2 + θ‖b‖2 − θ(1− θ)‖a− b‖2(proof by expanding), and set

x (k+1) − x? = (1− θ)(x (k) − x?) + θ(Gx (k) − x?) = (1− θ)a + θb

so a− b = x (k) − x? − (Gx (k) − x?) = x (k) − Gx (k). then

‖x (k+1) − x?‖2

= (1− θ)‖x (k) − x?‖2 + θ‖Gx (k) − x?‖2 − θ(1− θ)‖x (k) − Gx (k)‖2

≤ (1− θ)‖x (k) − x?‖2 + θ‖x (k) − x?‖2 − θ(1− θ)‖x (k) − Gx (k)‖2

= ‖x (k) − x?‖2 − θ(1− θ)‖x (k) − Gx (k)‖2,

(using the fact that G is nonexpansive for the first inequality)

this shows the iteration is Fejer monotone

25 / 47

Page 54: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Proof (II)

sum the last inequality over iterations k . it telescopes:

‖x (k+1) − x?‖2 ≤ ‖x (0) − x?‖2 − θ(1− θ)k∑

i=0

‖x (i) − Gx (i)‖2

k∑i=0

‖x (i) − Gx (i)‖2 ≤ 1

θ(1− θ)‖x (0) − x?‖2

mini=0,...,k

‖x (i) − Gx (i)‖2 ≤ 1

(k + 1)θ(1− θ)‖x (0) − x?‖2

‖x (k) − Gx (k)‖2 ≤ 1

(k + 1)θ(1− θ)‖x (0) − x?‖2

where the last inequality uses the fact that G is nonexpansive.

26 / 47

Page 55: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

How to design an algorithm

I look for an operator FI whose fixed points solve optimization problemsI which is contractive or averaged

I iterate x (k+1) = F (x (k))

I get convergenceI if F is contractive, get linear convergenceI if F is averaged, get sublinear convergence

27 / 47

Page 56: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Outline

Relations

Fixed points

Averaged operators

Cocoercive and coercive operators

Operators and functions

Resolvents and proxs

28 / 47

Page 57: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Properties of operators

we’ll need these properties of operators:

I LipschitzI contractiveI nonexpansiveI averaged

I monotoneI maximalI strongly monotone == coercive

I cocoercive

29 / 47

Page 58: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Properties of optimization operators

we’ll show connection to optimization:

I gradient of convex function is monotone

I gradient of strongly convex function is strongly monotone

I gradient of smooth function is cocoercive

I prox of convex function is 12 -averaged

I gradient step I − 2β∇f of smooth function is nonexpansive

(and so smaller stepsizes are averaged)

I prox of strongly convex function is contractive

I gradient step of SSC function is contractive

30 / 47

Page 59: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Monotone operators

I a relation F is monotone if, for all (x , u) ∈ F and (y , v) ∈ F

(u − v)T (x − y) ≥ 0.

example: ∂f is monotone

I a relation F is α-strongly monotone if, for all (x , u) ∈ F and(y , v) ∈ F

(u − v)T (x − y) ≥ α‖x − y‖2.

(also called α-coercive)example: ∂f is strongly monotone iff f is strongly convex

I a relation F is maximal monotone if there is no monotonerelation that properly contains it (as a subset of Rn × Rn)examples:

I if F : Rn → Rn is continuous and monotone, then F is maximalmonotone

I if f is CCP then ∂f is maximal monotone

useful: makes sure FP iteration doesn’t leave domain of F

31 / 47

Page 60: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Monotone operators

I a relation F is monotone if, for all (x , u) ∈ F and (y , v) ∈ F

(u − v)T (x − y) ≥ 0.

example: ∂f is monotone

I a relation F is α-strongly monotone if, for all (x , u) ∈ F and(y , v) ∈ F

(u − v)T (x − y) ≥ α‖x − y‖2.

(also called α-coercive)example: ∂f is strongly monotone iff f is strongly convex

I a relation F is maximal monotone if there is no monotonerelation that properly contains it (as a subset of Rn × Rn)examples:

I if F : Rn → Rn is continuous and monotone, then F is maximalmonotone

I if f is CCP then ∂f is maximal monotone

useful: makes sure FP iteration doesn’t leave domain of F

31 / 47

Page 61: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Monotone operators

I a relation F is monotone if, for all (x , u) ∈ F and (y , v) ∈ F

(u − v)T (x − y) ≥ 0.

example: ∂f is monotone

I a relation F is α-strongly monotone if, for all (x , u) ∈ F and(y , v) ∈ F

(u − v)T (x − y) ≥ α‖x − y‖2.

(also called α-coercive)example: ∂f is strongly monotone iff f is strongly convex

I a relation F is maximal monotone if there is no monotonerelation that properly contains it (as a subset of Rn × Rn)examples:

I if F : Rn → Rn is continuous and monotone, then F is maximalmonotone

I if f is CCP then ∂f is maximal monotone

useful: makes sure FP iteration doesn’t leave domain of F31 / 47

Page 62: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cocoercive operators

an operator F is 1β -cocoercive if

(F (x)− F (y))T (x − y) ≥ 1

β‖F (x)− F (y)‖2.

example: we’ve already seen that ∇f is cocoercive if f is β-smooth

I F is 1β -cocoercive =⇒ F is β-Lipschitz

(proof by Cauchy-Schwarz)

I F is α-coercive ⇐⇒ F−1 is α-cocoercive(proof by interchanging x and F (x))

32 / 47

Page 63: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cocoercive operators

an operator F is 1β -cocoercive if

(F (x)− F (y))T (x − y) ≥ 1

β‖F (x)− F (y)‖2.

example: we’ve already seen that ∇f is cocoercive if f is β-smooth

I F is 1β -cocoercive =⇒ F is β-Lipschitz

(proof by Cauchy-Schwarz)

I F is α-coercive ⇐⇒ F−1 is α-cocoercive(proof by interchanging x and F (x))

32 / 47

Page 64: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cocoercive operators

an operator F is 1β -cocoercive if

(F (x)− F (y))T (x − y) ≥ 1

β‖F (x)− F (y)‖2.

example: we’ve already seen that ∇f is cocoercive if f is β-smooth

I F is 1β -cocoercive =⇒ F is β-Lipschitz

(proof by Cauchy-Schwarz)

I F is α-coercive ⇐⇒ F−1 is α-cocoercive(proof by interchanging x and F (x))

32 / 47

Page 65: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cocoercive operators

an operator F is 1β -cocoercive if

(F (x)− F (y))T (x − y) ≥ 1

β‖F (x)− F (y)‖2.

example: we’ve already seen that ∇f is cocoercive if f is β-smooth

I F is 1β -cocoercive =⇒ F is β-Lipschitz

(proof by Cauchy-Schwarz)

I F is α-coercive ⇐⇒ F−1 is α-cocoercive

(proof by interchanging x and F (x))

32 / 47

Page 66: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cocoercive operators

an operator F is 1β -cocoercive if

(F (x)− F (y))T (x − y) ≥ 1

β‖F (x)− F (y)‖2.

example: we’ve already seen that ∇f is cocoercive if f is β-smooth

I F is 1β -cocoercive =⇒ F is β-Lipschitz

(proof by Cauchy-Schwarz)

I F is α-coercive ⇐⇒ F−1 is α-cocoercive(proof by interchanging x and F (x))

32 / 47

Page 67: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Gradient step is nonexpansive (or averaged)

F is 1β -cocoercive ⇐⇒ I − 2

βF is nonexpansive

(and smaller step sizes are averaged)

proof:

‖(I − 2

βF )y − (I − 2

βF )x‖

= ‖y − 2

βFy − x − 2

βFx‖

= ‖y − x‖2 − 4

β(Fy − Fx)T (y − x) +

4

β2‖Fy − Fx‖2

= ‖y − x‖2 − 4

β

((Fy − Fx)T (y − x)− 1

β‖Fy − Fx‖2

)≤ ‖y − x‖2

inequality is definition of 1β -cocoercivity

33 / 47

Page 68: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Gradient step is nonexpansive (or averaged)

F is 1β -cocoercive ⇐⇒ I − 2

βF is nonexpansive

(and smaller step sizes are averaged)

proof:

‖(I − 2

βF )y − (I − 2

βF )x‖

= ‖y − 2

βFy − x − 2

βFx‖

= ‖y − x‖2 − 4

β(Fy − Fx)T (y − x) +

4

β2‖Fy − Fx‖2

= ‖y − x‖2 − 4

β

((Fy − Fx)T (y − x)− 1

β‖Fy − Fx‖2

)≤ ‖y − x‖2

inequality is definition of 1β -cocoercivity

33 / 47

Page 69: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Outline

Relations

Fixed points

Averaged operators

Cocoercive and coercive operators

Operators and functions

Resolvents and proxs

34 / 47

Page 70: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

∂f as an operator

if f is convex, ∂f is maximal monotone

if f is β-smooth,

I ∂f = ∇f is 1β -cocoercive

I ∂f = ∇f is β-Lipschitz

I

if f is α-strongly convex

I ∂f is α-strongly monotone

35 / 47

Page 71: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Gradient method converges

if f is convex and β-smooth,

I I − 2β∇f is nonexpansive,

I gradient mapping I − t∇f is averaged for t ∈ (0, 2β )

I so fixed point iteration

x (k+1) = (I − 2

β∇f )x (k)

converges

I from convergence guarantee for damped iteration,

1

(k + 1)θ(1− θ)‖x (0) − x?‖2 ≥ min

i=0,...,k‖(I − 2

β∇f )x (k) − x (k)‖2

≥ mini=0,...,k

(2

β)2‖∇f (x (k))‖2

36 / 47

Page 72: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Strong convexity and smoothness

f is α-strongly convex iff f ∗ is 1α -smooth

proof:

∂f −1 = ∂f ∗.∂f α-coercive ⇐⇒ ∂f ∗ α-cocoercive ⇐⇒ f ∗ 1

α -smooth.

moral: strong convexity and (strong) smoothness are dual

37 / 47

Page 73: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Strong convexity and smoothness

f is α-strongly convex iff f ∗ is 1α -smooth

proof: ∂f −1 = ∂f ∗.∂f α-coercive ⇐⇒ ∂f ∗ α-cocoercive ⇐⇒ f ∗ 1

α -smooth.

moral: strong convexity and (strong) smoothness are dual

37 / 47

Page 74: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Gradient update is contractive for SSC functions

suppose f is α-strongly convex and β-smooth

the relation

I − t∇f = {(x , x − t∇f (x)) : x ∈ dom f }

is contractive if t ∈ (0, 2αβ2 ). best contraction factor if t = αβ2

proof:

‖x − t∇f (x)− (y − t∇f (y))‖2

≤ ‖x − y‖2 + t2‖∇f (x)−∇f (y)‖2 − 2t(∇f (x)−∇f (y))T (x − y)

≤ ‖x − y‖2 + t2β2‖x − y‖2 − 2tα‖x − y‖2

≤ (1− 2tα + t2β2)‖x − y‖2

note: stronger proof (using co+cocoercive inequality from slide 28 ofGD lecture) shows I − t∇f is Lipschitz with parameterL = max{|1− tα|, |1− tβ|}. if t = 2

α+β , L = κ−1κ+1

38 / 47

Page 75: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Outline

Relations

Fixed points

Averaged operators

Cocoercive and coercive operators

Operators and functions

Resolvents and proxs

39 / 47

Page 76: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Resolvent operator

for relation F , define the resolvent of F

RF = (I + F )−1

consider resolvent of F

I (I + F ) = {(x , x + y) : (x , y) ∈ F}I RF = (I + F )−1 = {(x + y , x) : (x , y) ∈ F}I RF = {(u, v) : (u − v) ∈ F (v)}

40 / 47

Page 77: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Prox is the resolvent of ∂f

I proxf = R∂f = (I + ∂f )−1

proof: let z ∈ proxf (x),

z = argminz

f (z) +1

2‖z − x‖2

0 ∈ ∂f (z) + z − x

(x − z) ∈ ∂f (z)

I proxf = ∇h∗ where h(x) = f (x) + 12‖x‖

2

proof: h is CCP and ∂h = ∂f + I , so

∇h∗ = (∂h)−1 = (I + ∂f )−1

I proxf is a functionproof: h is strongly convex, so h∗ is smooth

41 / 47

Page 78: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Prox is the resolvent of ∂f

I proxf = R∂f = (I + ∂f )−1

proof: let z ∈ proxf (x),

z = argminz

f (z) +1

2‖z − x‖2

0 ∈ ∂f (z) + z − x

(x − z) ∈ ∂f (z)

I proxf = ∇h∗ where h(x) = f (x) + 12‖x‖

2

proof: h is CCP and ∂h = ∂f + I , so

∇h∗ = (∂h)−1 = (I + ∂f )−1

I proxf is a functionproof: h is strongly convex, so h∗ is smooth

41 / 47

Page 79: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Prox is the resolvent of ∂f

I proxf = R∂f = (I + ∂f )−1

proof: let z ∈ proxf (x),

z = argminz

f (z) +1

2‖z − x‖2

0 ∈ ∂f (z) + z − x

(x − z) ∈ ∂f (z)

I proxf = ∇h∗ where h(x) = f (x) + 12‖x‖

2

proof: h is CCP and ∂h = ∂f + I , so

∇h∗ = (∂h)−1 = (I + ∂f )−1

I proxf is a function

proof: h is strongly convex, so h∗ is smooth

41 / 47

Page 80: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Prox is the resolvent of ∂f

I proxf = R∂f = (I + ∂f )−1

proof: let z ∈ proxf (x),

z = argminz

f (z) +1

2‖z − x‖2

0 ∈ ∂f (z) + z − x

(x − z) ∈ ∂f (z)

I proxf = ∇h∗ where h(x) = f (x) + 12‖x‖

2

proof: h is CCP and ∂h = ∂f + I , so

∇h∗ = (∂h)−1 = (I + ∂f )−1

I proxf is a functionproof: h is strongly convex, so h∗ is smooth

41 / 47

Page 81: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Projection is resolvent of indicator function

let f = IC , the indicator function of the convex set C

I ∂f is the normal cone operator

∂f (x) = NC (x) =

{∅ x 6∈ C

{w : wT (z − x) ≤ 0, ∀z ∈ C} x ∈ C

I R∂f = proxf is

(I + ∂IC )−1(x) = argminu

(IC (u) +1

2‖u − x‖2 = ΠC (x)

42 / 47

Page 82: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Resolvent and Cayley operators

for relation F , recall the resolvent of F

RF = (I + F )−1

and define the Cayley operator (sometimes called reflection operator)

CF = 2RF − I = 2(I + F )−1 − I

properties:

I if F is monotone, then RF and CF are nonexpansive (and henceare functions)

I if F is maximal monotone, then RF and CF have full domain

I zeros of F are FPs of RF and CF

we’ll prove the first and last properties; second is Minty’s theorem

43 / 47

Page 83: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points of R and C are zeros of F

fact: fixed points of R and C are zeros of F

proof:

0 ∈ F (x) ⇐⇒ x ∈ (I + F )(x)

⇐⇒ (I + F )−1(x) 3 x

⇐⇒ x = R(x)

(using the fact that R is a function)

and if x = R(x),

C (x) = 2R(x)− I (x) = 2x − x = x

in particular, this means fixed points of proxf (x) are zeros of ∂f

44 / 47

Page 84: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points of R and C are zeros of F

fact: fixed points of R and C are zeros of F

proof:

0 ∈ F (x) ⇐⇒ x ∈ (I + F )(x)

⇐⇒ (I + F )−1(x) 3 x

⇐⇒ x = R(x)

(using the fact that R is a function)

and if x = R(x),

C (x) = 2R(x)− I (x) = 2x − x = x

in particular, this means fixed points of proxf (x) are zeros of ∂f

44 / 47

Page 85: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Fixed points of R and C are zeros of F

fact: fixed points of R and C are zeros of F

proof:

0 ∈ F (x) ⇐⇒ x ∈ (I + F )(x)

⇐⇒ (I + F )−1(x) 3 x

⇐⇒ x = R(x)

(using the fact that R is a function)

and if x = R(x),

C (x) = 2R(x)− I (x) = 2x − x = x

in particular, this means fixed points of proxf (x) are zeros of ∂f

44 / 47

Page 86: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Resolvent is nonexpansive

fact: R = RλF is nonexpansive

proof:

if (x , u) ∈ R and (y , v) ∈ R,

x ∈ u + F (u), y ∈ v + F (v)

so for some fu ∈ F (u), fv ∈ F (v),

x − y = u − v + fu − fv

(u − v)T (x − y) = ‖u − v‖2 + (u − v)T (fu − fv )

(u − v)T (x − y) ≥ ‖u − v‖2

so R is 1-cocoercive. this implies R is nonexpansive, too:

‖u − v‖‖x − y‖ ≥ ‖u − v‖2

‖x − y‖ ≥ ‖u − v‖

note: this also proves projections and proxs are nonexpansive

45 / 47

Page 87: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Resolvent is nonexpansive

fact: R = RλF is nonexpansive

proof: if (x , u) ∈ R and (y , v) ∈ R,

x ∈ u + F (u), y ∈ v + F (v)

so for some fu ∈ F (u), fv ∈ F (v),

x − y = u − v + fu − fv

(u − v)T (x − y) = ‖u − v‖2 + (u − v)T (fu − fv )

(u − v)T (x − y) ≥ ‖u − v‖2

so R is 1-cocoercive. this implies R is nonexpansive, too:

‖u − v‖‖x − y‖ ≥ ‖u − v‖2

‖x − y‖ ≥ ‖u − v‖

note: this also proves projections and proxs are nonexpansive45 / 47

Page 88: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cayley operator is nonexpansive

fact: C = CF is nonexpansive

proof:

‖C (x)− C (y)‖2 = ‖2(u − v)− (x − y)‖2

= 4‖u − v‖2 − 4(x − y)T (u − v) + ‖x − y‖2

≤ ‖x − y‖2

using 1-cocoercivity of R (from above): ‖u − v‖2 ≤ (u − v)T (x − y).

note:

I this also shows R = 12C + 1

2 I is 12 -averaged

I more generally, θC + (1− θ)I is θ-averaged for any θ ∈ (0, 1)

46 / 47

Page 89: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

Cayley operator is nonexpansive

fact: C = CF is nonexpansive

proof:

‖C (x)− C (y)‖2 = ‖2(u − v)− (x − y)‖2

= 4‖u − v‖2 − 4(x − y)T (u − v) + ‖x − y‖2

≤ ‖x − y‖2

using 1-cocoercivity of R (from above): ‖u − v‖2 ≤ (u − v)T (x − y).

note:

I this also shows R = 12C + 1

2 I is 12 -averaged

I more generally, θC + (1− θ)I is θ-averaged for any θ ∈ (0, 1)

46 / 47

Page 90: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

More contractions!

I if F is α-strongly monotone

I then (I + F ) is 1 + α-strongly monotone

I so RF = (I + F )−1 is (1 + α)-cocoercive

I and hence RF is 11+α -Lipschitz

moral: we now have two ways to manufacture contractions:

I as a gradient update of an SSC function

I as a proximal update of a strongly convex function

47 / 47

Page 91: ORIE 6326: Convex Optimization [2ex] Operators · ORIE 6326: Convex Optimization Operators Professor Udell Operations Research and Information Engineering Cornell May 15, 2017 1/47

More contractions!

I if F is α-strongly monotone

I then (I + F ) is 1 + α-strongly monotone

I so RF = (I + F )−1 is (1 + α)-cocoercive

I and hence RF is 11+α -Lipschitz

moral: we now have two ways to manufacture contractions:

I as a gradient update of an SSC function

I as a proximal update of a strongly convex function

47 / 47