

    Convex Optimization Boyd & Vandenberghe

    1. Introduction

    mathematical optimization

    least-squares and linear programming

    convex optimization

    example

    course goals and topics

    nonlinear optimization

    brief history of convex optimization


    Mathematical optimization

(mathematical) optimization problem

    minimize   f_0(x)
    subject to f_i(x) ≤ b_i,  i = 1, ..., m

x = (x_1, ..., x_n): optimization variables

f_0 : R^n → R: objective function

f_i : R^n → R, i = 1, ..., m: constraint functions

optimal solution x* has smallest value of f_0 among all vectors that satisfy the constraints


    Examples

portfolio optimization

    variables: amounts invested in different assets
    constraints: budget, max./min. investment per asset, minimum return
    objective: overall risk or return variance

device sizing in electronic circuits

    variables: device widths and lengths
    constraints: manufacturing limits, timing requirements, maximum area
    objective: power consumption

data fitting

    variables: model parameters
    constraints: prior information, parameter limits
    objective: measure of misfit or prediction error


    Solving optimization problems

general optimization problem

    very difficult to solve

    methods involve some compromise, e.g., very long computation time, or not always finding the solution

exceptions: certain problem classes can be solved efficiently and reliably

    least-squares problems

    linear programming problems

    convex optimization problems


    Least-squares

    minimize ‖Ax − b‖_2^2

solving least-squares problems

    analytical solution: x* = (A^T A)^{-1} A^T b

    reliable and efficient algorithms and software

    computation time proportional to n^2 k (A ∈ R^{k×n}); less if structured

    a mature technology

using least-squares

    least-squares problems are easy to recognize

    a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
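A quick numerical illustration (not part of the original slides; the data A, b are random stand-ins): the analytical formula agrees with a library least-squares solver.

import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 5
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# analytical solution x* = (A^T A)^{-1} A^T b via the normal equations
x_formula = np.linalg.solve(A.T @ A, A.T @ b)

# numerically preferred QR/SVD-based solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_formula, x_lstsq))  # True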


    Linear programming

    minimize   c^T x
    subject to a_i^T x ≤ b_i,  i = 1, ..., m

solving linear programs

    no analytical formula for solution

    reliable and efficient algorithms and software

    computation time proportional to n^2 m if m ≥ n; less with structure

    a mature technology

using linear programming

    not as easy to recognize as least-squares problems

    a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ_1- or ℓ_∞-norms, piecewise-linear functions)


    Convex optimization problem

    minimize   f_0(x)
    subject to f_i(x) ≤ b_i,  i = 1, ..., m

objective and constraint functions are convex:

    f_i(αx + βy) ≤ αf_i(x) + βf_i(y)    if α + β = 1, α ≥ 0, β ≥ 0

    includes least-squares problems and linear programs as special cases


    solving convex optimization problems

    no analytical solution

    reliable and efficient algorithms

    computation time (roughly) proportional to max{n^3, n^2 m, F}, where F is cost of evaluating the f_i's and their first and second derivatives

    almost a technology

using convex optimization

    often difficult to recognize

    many tricks for transforming problems into convex form

    surprisingly many problems can be solved via convex optimization


    Example

m lamps illuminating n (small, flat) patches

(figure: lamp j at distance r_kj and angle θ_kj from patch k)

intensity I_k at patch k depends linearly on lamp powers p_j:

    I_k = ∑_{j=1}^m a_kj p_j,    a_kj = r_kj^{-2} max{cos θ_kj, 0}

problem: achieve desired illumination I_des with bounded lamp powers

    minimize   max_{k=1,...,n} |log I_k − log I_des|
    subject to 0 ≤ p_j ≤ p_max,  j = 1, ..., m


    how to solve?

1. use uniform power: p_j = p, vary p

2. use least-squares:

       minimize ∑_{k=1}^n (I_k − I_des)^2

   round p_j if p_j > p_max or p_j < 0

3. use weighted least-squares:

       minimize ∑_{k=1}^n (I_k − I_des)^2 + ∑_{j=1}^m w_j (p_j − p_max/2)^2

   iteratively adjust weights w_j until 0 ≤ p_j ≤ p_max

4. use linear programming:

       minimize   max_{k=1,...,n} |I_k − I_des|
       subject to 0 ≤ p_j ≤ p_max,  j = 1, ..., m

   which can be solved via linear programming

of course these are approximate (suboptimal) solutions


5. use convex optimization: problem is equivalent to

       minimize   f_0(p) = max_{k=1,...,n} h(I_k / I_des)
       subject to 0 ≤ p_j ≤ p_max,  j = 1, ..., m

   with h(u) = max{u, 1/u}

   (figure: plot of h(u) over 0 ≤ u ≤ 4; h equals 1 at u = 1 and grows in both directions)

f_0 is convex because maximum of convex functions is convex

exact solution obtained with effort ≈ modest factor × least-squares effort
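A minimal CVXPY sketch of this convex formulation (not from the slides; CVXPY is an assumed tool and the coefficients a_kj below are invented):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 30                       # lamps, patches (arbitrary sizes)
A = rng.uniform(0.1, 1.0, (n, m))   # assumed illumination coefficients a_kj
I_des, p_max = 1.0, 1.0

p = cp.Variable(m)
I = A @ p
# max_k h(I_k/I_des) with h(u) = max{u, 1/u}; inv_pos keeps it convex
objective = cp.Minimize(cp.maximum(cp.max(I / I_des),
                                   cp.max(I_des * cp.inv_pos(I))))
prob = cp.Problem(objective, [p >= 0, p <= p_max])
prob.solve()
print(round(prob.value, 4))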


additional constraints: does adding (1) or (2) below complicate the problem?

1. no more than half of total power is in any 10 lamps

2. no more than half of the lamps are on (p_j > 0)

answer: with (1), still easy to solve; with (2), extremely difficult

moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult problems


    Course goals and topics

    goals

1. recognize/formulate problems (such as the illumination problem) as convex optimization problems

2. develop code for problems of moderate size (1000 lamps, 5000 patches)

3. characterize optimal solution (optimal power distribution), give limits of performance, etc.

    topics

    1. convex sets, functions, optimization problems

    2. examples and applications

    3. algorithms


    Nonlinear optimization

    traditional techniques for general nonconvex problems involve compromises

local optimization methods (nonlinear programming)

    find a point that minimizes f_0 among feasible points near it

    fast, can handle large problems

    require initial guess

    provide no information about distance to (global) optimum

global optimization methods

    find the (global) solution

    worst-case complexity grows exponentially with problem size

    these algorithms are often based on solving convex subproblems


    Brief history of convex optimization

theory (convex analysis): ca. 1900–1970

algorithms

    1947: simplex algorithm for linear programming (Dantzig)

    1960s: early interior-point methods (Fiacco & McCormick, Dikin, ...)

    1970s: ellipsoid method and other subgradient methods

    1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)

    late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

applications

    before 1990: mostly in operations research; few in engineering

    since 1990: many new applications in engineering (control, signal processing, communications, circuit design, ...); new problem classes (semidefinite and second-order cone programming, robust optimization)


    Convex Optimization Boyd & Vandenberghe

    2. Convex sets

    affine and convex sets

    some important examples

    operations that preserve convexity

    generalized inequalities

    separating and supporting hyperplanes

    dual cones and generalized inequalities


    Affine set

line through x_1, x_2: all points

    x = θx_1 + (1 − θ)x_2    (θ ∈ R)

(figure: points on the line through x_1 and x_2 for θ = 1.2, 1, 0.6, 0, −0.2)

affine set: contains the line through any two distinct points in the set

example: solution set of linear equations {x | Ax = b}

(conversely, every affine set can be expressed as solution set of a system of linear equations)


    Convex set

line segment between x_1 and x_2: all points

    x = θx_1 + (1 − θ)x_2

with 0 ≤ θ ≤ 1

convex set: contains line segment between any two points in the set

    x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1 − θ)x_2 ∈ C

examples (one convex, two nonconvex sets)


    Convex combination and convex hull

convex combination of x_1, ..., x_k: any point x of the form

    x = θ_1 x_1 + θ_2 x_2 + ··· + θ_k x_k

with θ_1 + ··· + θ_k = 1, θ_i ≥ 0

convex hull conv S: set of all convex combinations of points in S


    Convex cone

conic (nonnegative) combination of x_1 and x_2: any point of the form

    x = θ_1 x_1 + θ_2 x_2

with θ_1 ≥ 0, θ_2 ≥ 0

convex cone: set that contains all conic combinations of points in the set


    Hyperplanes and halfspaces

hyperplane: set of the form {x | a^T x = b} (a ≠ 0)

halfspace: set of the form {x | a^T x ≤ b} (a ≠ 0)

(figure: a hyperplane through x_0 with normal a, and the two halfspaces a^T x ≥ b and a^T x ≤ b that it bounds)

    a is the normal vector

    hyperplanes are affine and convex; halfspaces are convex


    Euclidean balls and ellipsoids

(Euclidean) ball with center x_c and radius r:

    B(x_c, r) = {x | ‖x − x_c‖_2 ≤ r} = {x_c + ru | ‖u‖_2 ≤ 1}

ellipsoid: set of the form

    {x | (x − x_c)^T P^{-1} (x − x_c) ≤ 1}

with P ∈ S^n_{++} (i.e., P symmetric positive definite)

other representation: {x_c + Au | ‖u‖_2 ≤ 1} with A square and nonsingular


    Norm balls and norm cones

norm: a function ‖·‖ that satisfies

    ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0

    ‖tx‖ = |t| ‖x‖ for t ∈ R

    ‖x + y‖ ≤ ‖x‖ + ‖y‖

notation: ‖·‖ is general (unspecified) norm; ‖·‖_symb is particular norm

norm ball with center x_c and radius r: {x | ‖x − x_c‖ ≤ r}

norm cone: {(x, t) | ‖x‖ ≤ t}

Euclidean norm cone is called second-order cone

(figure: the second-order cone in R^3, plotted over (x_1, x_2, t))

norm balls and cones are convex


    Polyhedra

solution set of finitely many linear inequalities and equalities

    Ax ⪯ b,  Cx = d

(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)

(figure: polyhedron P bounded by hyperplanes with normals a_1, ..., a_5)

polyhedron is intersection of finite number of halfspaces and hyperplanes


    Positive semidenite cone

notation:

    S^n is set of symmetric n × n matrices

    S^n_+ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n × n matrices

        X ∈ S^n_+  ⟺  z^T X z ≥ 0 for all z

    S^n_+ is a convex cone

    S^n_{++} = {X ∈ S^n | X ≻ 0}: positive definite n × n matrices

example: [x  y; y  z] ∈ S^2_+

(figure: boundary surface of S^2_+, plotted over (x, y, z))


    Operations that preserve convexity

    practical methods for establishing convexity of a set C

1. apply definition

       x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1 − θ)x_2 ∈ C

2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity

    intersection

    affine functions

    perspective function

    linear-fractional functions


    Intersection

    the intersection of (any number of) convex sets is convex

    example:

    S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}

where p(t) = x_1 cos t + x_2 cos 2t + ··· + x_m cos mt

(figures, for m = 2: the band |p(t)| ≤ 1 over 0 ≤ t ≤ π/3, and the resulting region S in the (x_1, x_2)-plane)


    Affine function

suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)

the image of a convex set under f is convex

    S ⊆ R^n convex  ⟹  f(S) = {f(x) | x ∈ S} convex

the inverse image f^{-1}(C) of a convex set under f is convex

    C ⊆ R^m convex  ⟹  f^{-1}(C) = {x ∈ R^n | f(x) ∈ C} convex

examples

    scaling, translation, projection

    solution set of linear matrix inequality {x | x_1 A_1 + ··· + x_m A_m ⪯ B} (with A_i, B ∈ S^p)

    hyperbolic cone {x | x^T P x ≤ (c^T x)^2, c^T x ≥ 0} (with P ∈ S^n_+)


    Perspective and linear-fractional function

perspective function P : R^{n+1} → R^n:

    P(x, t) = x/t,    dom P = {(x, t) | t > 0}

images and inverse images of convex sets under perspective are convex

linear-fractional function f : R^n → R^m:

    f(x) = (Ax + b) / (c^T x + d),    dom f = {x | c^T x + d > 0}

images and inverse images of convex sets under linear-fractional functions are convex


    example of a linear-fractional function

    f(x) = x / (x_1 + x_2 + 1)

(figures: a convex set C in the (x_1, x_2)-plane and its image f(C))


    Generalized inequalities

a convex cone K ⊆ R^n is a proper cone if

    K is closed (contains its boundary)

    K is solid (has nonempty interior)

    K is pointed (contains no line)

examples

    nonnegative orthant K = R^n_+ = {x ∈ R^n | x_i ≥ 0, i = 1, ..., n}

    positive semidefinite cone K = S^n_+

    nonnegative polynomials on [0, 1]:

        K = {x ∈ R^n | x_1 + x_2 t + x_3 t^2 + ··· + x_n t^{n−1} ≥ 0 for t ∈ [0, 1]}


generalized inequality defined by a proper cone K:

    x ⪯_K y  ⟺  y − x ∈ K,    x ≺_K y  ⟺  y − x ∈ int K

examples

    componentwise inequality (K = R^n_+)

        x ⪯_{R^n_+} y  ⟺  x_i ≤ y_i,  i = 1, ..., n

    matrix inequality (K = S^n_+)

        X ⪯_{S^n_+} Y  ⟺  Y − X positive semidefinite

these two types are so common that we drop the subscript in ⪯_K

properties: many properties of ⪯_K are similar to ≤ on R, e.g.,

    x ⪯_K y, u ⪯_K v  ⟹  x + u ⪯_K y + v


    Minimum and minimal elements

⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x

x ∈ S is the minimum element of S with respect to ⪯_K if

    y ∈ S  ⟹  x ⪯_K y

x ∈ S is a minimal element of S with respect to ⪯_K if

    y ∈ S, y ⪯_K x  ⟹  y = x

example (K = R^2_+)

    x_1 is the minimum element of S_1

    x_2 is a minimal element of S_2

(figure: sets S_1 and S_2 with the points x_1 and x_2 marked)


    Separating hyperplane theorem

if C and D are disjoint convex sets, then there exists a ≠ 0, b such that

    a^T x ≤ b for x ∈ C,    a^T x ≥ b for x ∈ D

(figure: disjoint sets C and D separated by a hyperplane with normal a)

the hyperplane {x | a^T x = b} separates C and D

strict separation requires additional assumptions (e.g., C is closed, D is a singleton)


    Supporting hyperplane theorem

supporting hyperplane to set C at boundary point x_0:

    {x | a^T x = a^T x_0}

where a ≠ 0 and a^T x ≤ a^T x_0 for all x ∈ C

(figure: set C with supporting hyperplane and outward normal a at x_0)

supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C


    Dual cones and generalized inequalities

dual cone of a cone K:

    K* = {y | y^T x ≥ 0 for all x ∈ K}

examples

    K = R^n_+:  K* = R^n_+

    K = S^n_+:  K* = S^n_+

    K = {(x, t) | ‖x‖_2 ≤ t}:  K* = {(x, t) | ‖x‖_2 ≤ t}

    K = {(x, t) | ‖x‖_1 ≤ t}:  K* = {(x, t) | ‖x‖_∞ ≤ t}

first three examples are self-dual cones

dual cones of proper cones are proper, hence define generalized inequalities:

    y ⪰_{K*} 0  ⟺  y^T x ≥ 0 for all x ⪰_K 0


    Minimum and minimal elements via dual inequalities

minimum element w.r.t. ⪯_K

    x is minimum element of S iff for all λ ≻_{K*} 0, x is the unique minimizer of λ^T z over S

(figure: set S supported at its minimum element x by a hyperplane with normal λ)

minimal element w.r.t. ⪯_K

    if x minimizes λ^T z over S for some λ ≻_{K*} 0, then x is minimal

    if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λ^T z over S

(figure: set S with minimal elements x_1, x_2 and supporting normals λ_1, λ_2)


    optimal production frontier

different production methods use different amounts of resources x ∈ R^n

production set P: resource vectors x for all possible production methods

efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R^n_+

example (n = 2): x_1, x_2, x_3 are efficient; x_4, x_5 are not

(figure: production set P plotted with axes labor and fuel, with x_1, ..., x_5 marked)


    Convex Optimization Boyd & Vandenberghe

    3. Convex functions

    basic properties and examples

    operations that preserve convexity

    the conjugate function

    quasiconvex functions

    log-concave and log-convex functions

    convexity with respect to generalized inequalities


    Denition

f : R^n → R is convex if dom f is a convex set and

    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

(figure: graph of a convex f, with the chord from (x, f(x)) to (y, f(y)) lying above the graph)

f is concave if −f is convex

f is strictly convex if dom f is convex and

    f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, x ≠ y, 0 < θ < 1


    Examples on R

convex:

    affine: ax + b on R, for any a, b ∈ R

    exponential: e^{ax}, for any a ∈ R

    powers: x^α on R_{++}, for α ≥ 1 or α ≤ 0

    powers of absolute value: |x|^p on R, for p ≥ 1

    negative entropy: x log x on R_{++}

concave:

    affine: ax + b on R, for any a, b ∈ R

    powers: x^α on R_{++}, for 0 ≤ α ≤ 1

    logarithm: log x on R_{++}


Examples on R^n and R^{m×n}

affine functions are convex and concave; all norms are convex

examples on R^n

    affine function f(x) = a^T x + b

    norms: ‖x‖_p = (∑_{i=1}^n |x_i|^p)^{1/p} for p ≥ 1; ‖x‖_∞ = max_k |x_k|

examples on R^{m×n} (m × n matrices)

    affine function

        f(X) = tr(A^T X) + b = ∑_{i=1}^m ∑_{j=1}^n A_ij X_ij + b

    spectral (maximum singular value) norm

        f(X) = ‖X‖_2 = σ_max(X) = (λ_max(X^T X))^{1/2}


    Restriction of a convex function to a line

f : R^n → R is convex if and only if the function g : R → R,

    g(t) = f(x + tv),    dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ R^n

can check convexity of f by checking convexity of functions of one variable

example. f : S^n → R with f(X) = log det X, dom f = S^n_{++}

    g(t) = log det(X + tV) = log det X + log det(I + t X^{-1/2} V X^{-1/2})

         = log det X + ∑_{i=1}^n log(1 + t λ_i)

where λ_i are the eigenvalues of X^{-1/2} V X^{-1/2}

g is concave in t (for any choice of X ≻ 0, V); hence f is concave
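A numerical sanity check of this example (not in the slides; X and V below are random, with X positive definite): sample g(t) = log det(X + tV) on a grid and verify the second differences are nonpositive.

import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)          # random positive definite X
V = rng.standard_normal((n, n))
V = (V + V.T) / 2                    # random symmetric direction

ts = np.linspace(-0.1, 0.1, 201)     # small interval keeps X + tV positive definite
g = np.array([np.linalg.slogdet(X + t * V)[1] for t in ts])
second_diff = g[:-2] - 2 * g[1:-1] + g[2:]
print((second_diff <= 1e-9).all())   # True: g is concave along the line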


    Extended-value extension

extended-value extension f̃ of f is

    f̃(x) = f(x),  x ∈ dom f;    f̃(x) = ∞,  x ∉ dom f

often simplifies notation; for example, the condition

    0 ≤ θ ≤ 1  ⟹  f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)

(as an inequality in R ∪ {∞}) means the same as the two conditions

    dom f is convex

    for x, y ∈ dom f,

        0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)


    First-order condition

f is differentiable if dom f is open and the gradient

    ∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_n)

exists at each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex iff

    f(y) ≥ f(x) + ∇f(x)^T (y − x)    for all x, y ∈ dom f

(figure: graph of f and its tangent f(x) + ∇f(x)^T (y − x) at (x, f(x)))

first-order approximation of f is global underestimator


    Second-order conditions

f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

    ∇²f(x)_ij = ∂²f(x) / ∂x_i ∂x_j,    i, j = 1, ..., n,

exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain

    f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f

    if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex


    Examples

quadratic function: f(x) = (1/2) x^T P x + q^T x + r (with P ∈ S^n)

    ∇f(x) = Px + q,    ∇²f(x) = P

convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖_2²

    ∇f(x) = 2A^T (Ax − b),    ∇²f(x) = 2A^T A

convex (for any A)

quadratic-over-linear: f(x, y) = x²/y

    ∇²f(x, y) = (2/y³) [y, −x] [y, −x]^T ⪰ 0

convex for y > 0

(figure: surface plot of f(x, y) = x²/y)


log-sum-exp: f(x) = log ∑_{k=1}^n exp x_k is convex

    ∇²f(x) = (1 / 1^T z) diag(z) − (1 / (1^T z)²) z z^T    (z_k = exp x_k)

to show ∇²f(x) ⪰ 0, we must verify that v^T ∇²f(x) v ≥ 0 for all v:

    v^T ∇²f(x) v = [ (∑_k z_k v_k²)(∑_k z_k) − (∑_k v_k z_k)² ] / (∑_k z_k)² ≥ 0

since (∑_k v_k z_k)² ≤ (∑_k z_k v_k²)(∑_k z_k) (from the Cauchy-Schwarz inequality)

geometric mean: f(x) = (∏_{k=1}^n x_k)^{1/n} on R^n_{++} is concave

(similar proof as for log-sum-exp)
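A small numerical check (not in the slides) that the log-sum-exp Hessian above is positive semidefinite at random points:

import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(6)
    z = np.exp(x)
    s = z.sum()
    H = np.diag(z) / s - np.outer(z, z) / s**2   # the Hessian formula above
    assert np.linalg.eigvalsh(H).min() >= -1e-12  # PSD up to round-off
print("Hessian PSD at all sampled points")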


    Epigraph and sublevel set

α-sublevel set of f : R^n → R:

    C_α = {x ∈ dom f | f(x) ≤ α}

sublevel sets of convex functions are convex (converse is false)

epigraph of f : R^n → R:

    epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

(figure: graph of f with the region epi f above it shaded)

f is convex if and only if epi f is a convex set


Jensen's inequality

basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,

    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

extension: if f is convex, then

    f(E z) ≤ E f(z)

for any random variable z

basic inequality is special case with discrete distribution

    prob(z = x) = θ,    prob(z = y) = 1 − θ


    Operations that preserve convexity

    practical methods for establishing convexity of a function

1. verify definition (often simplified by restricting to a line)

2. for twice differentiable functions, show ∇²f(x) ⪰ 0

3. show that f is obtained from simple convex functions by operations that preserve convexity

    nonnegative weighted sum

    composition with affine function

    pointwise maximum and supremum

    composition

    minimization

    perspective


    Positive weighted sum & composition with affine function

nonnegative multiple: αf is convex if f is convex, α ≥ 0

sum: f_1 + f_2 convex if f_1, f_2 convex (extends to infinite sums, integrals)

composition with affine function: f(Ax + b) is convex if f is convex

examples

    log barrier for linear inequalities

        f(x) = −∑_{i=1}^m log(b_i − a_i^T x),    dom f = {x | a_i^T x < b_i, i = 1, ..., m}

    (any) norm of affine function: f(x) = ‖Ax + b‖


    Pointwise maximum


if f_1, ..., f_m are convex, then f(x) = max{f_1(x), ..., f_m(x)} is convex

examples

    piecewise-linear function: f(x) = max_{i=1,...,m} (a_i^T x + b_i) is convex

    sum of r largest components of x ∈ R^n:

        f(x) = x_[1] + x_[2] + ··· + x_[r]

    is convex (x_[i] is ith largest component of x)

    proof:

        f(x) = max{x_{i_1} + x_{i_2} + ··· + x_{i_r} | 1 ≤ i_1 < i_2 < ··· < i_r ≤ n}


    Pointwise supremum


if f(x, y) is convex in x for each y ∈ A, then

    g(x) = sup_{y ∈ A} f(x, y)

is convex

examples

    support function of a set C: S_C(x) = sup_{y ∈ C} y^T x is convex

    distance to farthest point in a set C:

        f(x) = sup_{y ∈ C} ‖x − y‖

    maximum eigenvalue of symmetric matrix: for X ∈ S^n,

        λ_max(X) = sup_{‖y‖_2 = 1} y^T X y


    Composition with scalar functions


composition of g : R^n → R and h : R → R:

    f(x) = h(g(x))

f is convex if

    g convex, h convex, h̃ nondecreasing

    g concave, h convex, h̃ nonincreasing

proof (for n = 1, differentiable g, h):

    f''(x) = h''(g(x)) g'(x)² + h'(g(x)) g''(x)

note: monotonicity must hold for extended-value extension h̃

examples

    exp g(x) is convex if g is convex

    1/g(x) is convex if g is concave and positive


    Vector composition


composition of g : R^n → R^k and h : R^k → R:

    f(x) = h(g(x)) = h(g_1(x), g_2(x), ..., g_k(x))

f is convex if

    g_i convex, h convex, h̃ nondecreasing in each argument

    g_i concave, h convex, h̃ nonincreasing in each argument

proof (for n = 1, differentiable g, h):

    f''(x) = g'(x)^T ∇²h(g(x)) g'(x) + ∇h(g(x))^T g''(x)

examples

    ∑_{i=1}^m log g_i(x) is concave if g_i are concave and positive

    log ∑_{i=1}^m exp g_i(x) is convex if g_i are convex


    Minimization


if f(x, y) is convex in (x, y) and C is a convex set, then

    g(x) = inf_{y ∈ C} f(x, y)

is convex

examples

    f(x, y) = x^T A x + 2 x^T B y + y^T C y with

        [A  B; B^T  C] ⪰ 0,    C ≻ 0

    minimizing over y gives g(x) = inf_y f(x, y) = x^T (A − B C^{-1} B^T) x

    g is convex, hence Schur complement A − B C^{-1} B^T ⪰ 0

    distance to a set: dist(x, S) = inf_{y ∈ S} ‖x − y‖ is convex if S is convex


    Perspective


the perspective of a function f : R^n → R is the function g : R^n × R → R,

    g(x, t) = t f(x/t),    dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex

examples

    f(x) = x^T x is convex; hence g(x, t) = x^T x / t is convex for t > 0

    negative logarithm f(x) = −log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R²_{++}

    if f is convex, then

        g(x) = (c^T x + d) f( (Ax + b) / (c^T x + d) )

    is convex on {x | c^T x + d > 0, (Ax + b)/(c^T x + d) ∈ dom f}

    The conjugate function


the conjugate of a function f is

    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

(figure: graph of f with a supporting line of slope y; the intercept is (0, −f*(y)))

f* is convex (even if f is not)

will be useful in chapter 5


    examples


negative logarithm f(x) = −log x

    f*(y) = sup_{x > 0} (xy + log x)

          = −1 − log(−y)  if y < 0,    ∞ otherwise

strictly convex quadratic f(x) = (1/2) x^T Q x with Q ∈ S^n_{++}

    f*(y) = sup_x (y^T x − (1/2) x^T Q x)

          = (1/2) y^T Q^{-1} y
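A numerical check of the quadratic conjugate formula (not in the slides; Q and y are random stand-ins), computing the supremum with a generic optimizer:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)            # random Q in S^n_{++}
y = rng.standard_normal(n)

# f*(y) = sup_x (y^T x - (1/2) x^T Q x), computed numerically
res = minimize(lambda x: 0.5 * x @ Q @ x - y @ x, np.zeros(n))
closed_form = 0.5 * y @ np.linalg.solve(Q, y)  # (1/2) y^T Q^{-1} y
print(np.isclose(-res.fun, closed_form))       # True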

Quasiconvex functions

f : R^n → R is quasiconvex if dom f is convex and the sublevel sets

    S_α = {x ∈ dom f | f(x) ≤ α}

are convex for all α

f is quasiconcave if −f is quasiconvex; f is quasilinear if it is quasiconvex and quasiconcave

Examples

    √|x| is quasiconvex on R

    ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear

    log x is quasilinear on R_{++}

    f(x_1, x_2) = x_1 x_2 is quasiconcave on R²_{++}

    linear-fractional function

        f(x) = (a^T x + b) / (c^T x + d),    dom f = {x | c^T x + d > 0}

    is quasilinear

    distance ratio

        f(x) = ‖x − a‖_2 / ‖x − b‖_2,    dom f = {x | ‖x − a‖_2 ≤ ‖x − b‖_2}

    is quasiconvex


    internal rate of return


cash flow x = (x_0, ..., x_n); x_i is payment in period i (to us if x_i > 0)

we assume x_0 < 0 and x_0 + x_1 + ··· + x_n > 0

present value of cash flow x, for interest rate r:

    PV(x, r) = ∑_{i=0}^n (1 + r)^{−i} x_i

internal rate of return is smallest interest rate for which PV(x, r) = 0:

    IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}

IRR is quasiconcave: superlevel set is intersection of halfspaces

    IRR(x) ≥ R  ⟺  ∑_{i=0}^n (1 + r)^{−i} x_i ≥ 0 for 0 ≤ r ≤ R
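Since PV(x, r) is decreasing in r for such cash flows, IRR(x) can be computed by bisection; a small sketch (not from the slides; the cash flow is invented):

def pv(x, r):
    """Present value of cash flow x at interest rate r."""
    return sum(xi / (1 + r) ** i for i, xi in enumerate(x))

def irr(x, lo=0.0, hi=10.0, tol=1e-9):
    """Bisection for the smallest r >= 0 with PV(x, r) = 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pv(x, mid) > 0:   # PV still positive: the rate must be higher
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(irr([-1.0, 0.6, 0.6]), 4))  # ~0.1307 per period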


    Properties


modified Jensen inequality: for quasiconvex f

    0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

first-order condition: differentiable f with convex domain is quasiconvex iff

    f(y) ≤ f(x)  ⟹  ∇f(x)^T (y − x) ≤ 0

(figure: a sublevel set through x, with ∇f(x) normal to a supporting hyperplane)

sums of quasiconvex functions are not necessarily quasiconvex


    Log-concave and log-convex functions


a positive function f is log-concave if log f is concave:

    f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ}    for 0 ≤ θ ≤ 1

f is log-convex if log f is convex

powers: x^a on R_{++} is log-convex for a ≤ 0, log-concave for a ≥ 0

many common probability densities are log-concave, e.g., normal:

    f(x) = (1 / √((2π)^n det Σ)) e^{−(1/2)(x − x̄)^T Σ^{-1} (x − x̄)}

cumulative Gaussian distribution function Φ is log-concave

    Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du

    Properties of log-concave functions


twice differentiable f with convex domain is log-concave if and only if

    f(x) ∇²f(x) ⪯ ∇f(x) ∇f(x)^T    for all x ∈ dom f

product of log-concave functions is log-concave

sum of log-concave functions is not always log-concave

integration: if f : R^n × R^m → R is log-concave, then

    g(x) = ∫ f(x, y) dy

is log-concave (not easy to show)

    consequences of integration property


convolution f ∗ g of log-concave functions f, g is log-concave

    (f ∗ g)(x) = ∫ f(x − y) g(y) dy

if C ⊆ R^n convex and y is a random variable with log-concave pdf, then

    f(x) = prob(x + y ∈ C)

is log-concave

proof: write f(x) as integral of product of log-concave functions

    f(x) = ∫ g(x + y) p(y) dy,    g(u) = 1 if u ∈ C, 0 if u ∉ C;  p is pdf of y

    example: yield function


    Y(x) = prob(x + w ∈ S)

x ∈ R^n: nominal parameter values for product

w ∈ R^n: random variations of parameters in manufactured product

S: set of acceptable values

if S is convex and w has a log-concave pdf, then

    Y is log-concave

    yield regions {x | Y(x) ≥ α} are convex

    Convexity with respect to generalized inequalities


f : R^n → R^m is K-convex if dom f is convex and

    f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, 0 ≤ θ ≤ 1

example f : S^m → S^m, f(X) = X² is S^m_+-convex

proof: for fixed z ∈ R^m, z^T X² z = ‖Xz‖_2² is convex in X, i.e.,

    z^T (θX + (1 − θ)Y)² z ≤ θ z^T X² z + (1 − θ) z^T Y² z

for X, Y ∈ S^m, 0 ≤ θ ≤ 1

therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²

Convex Optimization Boyd & Vandenberghe


    4. Convex optimization problems

    optimization problem in standard form

    convex optimization problems

    quasiconvex optimization

    linear optimization

    quadratic optimization

    geometric programming

    generalized inequality constraints

    semidefinite programming

    vector optimization

    Optimization problem in standard form


    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

x ∈ R^n is the optimization variable

f_0 : R^n → R is the objective or cost function

f_i : R^n → R, i = 1, ..., m, are the inequality constraint functions

h_i : R^n → R are the equality constraint functions

optimal value:

    p* = inf{f_0(x) | f_i(x) ≤ 0, i = 1, ..., m, h_i(x) = 0, i = 1, ..., p}

    p* = ∞ if problem is infeasible (no x satisfies the constraints)

    p* = −∞ if problem is unbounded below

    Optimal and locally optimal points


x is feasible if x ∈ dom f_0 and it satisfies the constraints

a feasible x is optimal if f_0(x) = p*; X_opt is the set of optimal points

x is locally optimal if there is an R > 0 such that x is optimal for

    minimize (over z) f_0(z)
    subject to        f_i(z) ≤ 0, i = 1, ..., m,  h_i(z) = 0, i = 1, ..., p
                      ‖z − x‖_2 ≤ R

examples (with n = 1, m = p = 0)

    f_0(x) = 1/x, dom f_0 = R_{++}: p* = 0, no optimal point

    f_0(x) = −log x, dom f_0 = R_{++}: p* = −∞

    f_0(x) = x log x, dom f_0 = R_{++}: p* = −1/e, x = 1/e is optimal

    f_0(x) = x³ − 3x: p* = −∞, local optimum at x = 1

    Implicit constraints


the standard form optimization problem has an implicit constraint

    x ∈ D = ∩_{i=0}^m dom f_i  ∩  ∩_{i=1}^p dom h_i

we call D the domain of the problem

the constraints f_i(x) ≤ 0, h_i(x) = 0 are the explicit constraints

a problem is unconstrained if it has no explicit constraints (m = p = 0)

example:

    minimize f_0(x) = −∑_{i=1}^k log(b_i − a_i^T x)

is an unconstrained problem with implicit constraints a_i^T x < b_i

    Feasibility problem


    find       x
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

can be considered a special case of the general problem with f_0(x) = 0:

    minimize   0
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

p* = 0 if constraints are feasible; any feasible x is optimal

p* = ∞ if constraints are infeasible

    Convex optimization problem


standard form convex optimization problem

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               a_i^T x = b_i,  i = 1, ..., p

f_0, f_1, ..., f_m are convex; equality constraints are affine

problem is quasiconvex if f_0 is quasiconvex (and f_1, ..., f_m convex)

often written as

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               Ax = b

important property: feasible set of a convex optimization problem is convex

    example


    minimize   f_0(x) = x_1² + x_2²
    subject to f_1(x) = x_1 / (1 + x_2²) ≤ 0
               h_1(x) = (x_1 + x_2)² = 0

f_0 is convex; feasible set {(x_1, x_2) | x_1 = −x_2 ≤ 0} is convex

not a convex problem (according to our definition): f_1 is not convex, h_1 is not affine

equivalent (but not identical) to the convex problem

    minimize   x_1² + x_2²
    subject to x_1 ≤ 0
               x_1 + x_2 = 0

    Local and global optima


any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and y is optimal with f_0(y) < f_0(x)

x locally optimal means there is an R > 0 such that

    z feasible, ‖z − x‖_2 ≤ R  ⟹  f_0(z) ≥ f_0(x)

consider z = θy + (1 − θ)x with θ = R / (2‖y − x‖_2)

    ‖y − x‖_2 > R, so 0 < θ < 1/2

    z is a convex combination of two feasible points, hence also feasible

    ‖z − x‖_2 = R/2 and

        f_0(z) ≤ θf_0(y) + (1 − θ)f_0(x) < f_0(x)

which contradicts our assumption that x is locally optimal

Optimality criterion for differentiable f_0

x is optimal if and only if it is feasible and

    ∇f_0(x)^T (y − x) ≥ 0    for all feasible y

(figure: feasible set X with −∇f_0(x) as outward normal at x)

if nonzero, ∇f_0(x) defines a supporting hyperplane to feasible set X at x

unconstrained problem: x is optimal if and only if

    x ∈ dom f_0,    ∇f_0(x) = 0

equality constrained problem

    minimize f_0(x) subject to Ax = b

x is optimal if and only if there exists a ν such that

    x ∈ dom f_0,    Ax = b,    ∇f_0(x) + A^T ν = 0

minimization over nonnegative orthant

    minimize f_0(x) subject to x ⪰ 0

x is optimal if and only if

    x ∈ dom f_0,  x ⪰ 0,  ∇f_0(x)_i ≥ 0 if x_i = 0;  ∇f_0(x)_i = 0 if x_i > 0

    Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa

    some common transformations that preserve convexity:

eliminating equality constraints

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               Ax = b

is equivalent to

    minimize (over z) f_0(Fz + x_0)
    subject to        f_i(Fz + x_0) ≤ 0,  i = 1, ..., m

where F and x_0 are such that

    Ax = b  ⟺  x = Fz + x_0 for some z

introducing equality constraints

    minimize f_0(A_0 x + b_0)

    subject to f_i(A_i x + b_i) ≤ 0,  i = 1, ..., m

is equivalent to

    minimize (over x, y_i) f_0(y_0)
    subject to             f_i(y_i) ≤ 0,  i = 1, ..., m
                           y_i = A_i x + b_i,  i = 0, 1, ..., m

introducing slack variables for linear inequalities

    minimize   f_0(x)
    subject to a_i^T x ≤ b_i,  i = 1, ..., m

is equivalent to

    minimize (over x, s) f_0(x)
    subject to           a_i^T x + s_i = b_i,  i = 1, ..., m
                         s_i ≥ 0,  i = 1, ..., m

epigraph form: standard form convex problem is equivalent to

    minimize (over x, t) t
    subject to           f_0(x) − t ≤ 0
                         f_i(x) ≤ 0,  i = 1, ..., m
                         Ax = b

minimizing over some variables

    minimize   f_0(x_1, x_2)
    subject to f_i(x_1) ≤ 0,  i = 1, ..., m

is equivalent to

    minimize   f̃_0(x_1)
    subject to f_i(x_1) ≤ 0,  i = 1, ..., m

where f̃_0(x_1) = inf_{x_2} f_0(x_1, x_2)

    Quasiconvex optimization


    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               Ax = b

with f_0 : R^n → R quasiconvex, f_1, ..., f_m convex

can have locally optimal points that are not (globally) optimal

(figure: a quasiconvex f_0 with a locally optimal point (x, f_0(x)) that is not globally optimal)

convex representation of sublevel sets of f_0

if f_0 is quasiconvex, there exists a family of functions φ_t such that:

    φ_t(x) is convex in x for fixed t

    t-sublevel set of f_0 is 0-sublevel set of φ_t, i.e.,

        f_0(x) ≤ t  ⟺  φ_t(x) ≤ 0

example

    f_0(x) = p(x) / q(x)

with p convex, q concave, and p(x) ≥ 0, q(x) > 0 on dom f_0

can take φ_t(x) = p(x) − t q(x):

    for t ≥ 0, φ_t convex in x

    p(x)/q(x) ≤ t if and only if φ_t(x) ≤ 0

quasiconvex optimization via convex feasibility problems

    φ_t(x) ≤ 0,  f_i(x) ≤ 0,  i = 1, ..., m,  Ax = b    (1)

    for fixed t, a convex feasibility problem in x

    if feasible, we can conclude that t ≥ p*; if infeasible, t ≤ p*

Bisection method for quasiconvex optimization

    given l ≤ p*, u ≥ p*, tolerance ε > 0.
    repeat
        1. t := (l + u)/2.
        2. Solve the convex feasibility problem (1).
        3. if (1) is feasible, u := t; else l := t.
    until u − l ≤ ε.

requires exactly ⌈log₂((u − l)/ε)⌉ iterations (where u, l are initial values)
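A sketch of this bisection loop (not from the slides) for a linear-fractional objective f_0(x) = (c^T x + d)/(e^T x + f), with φ_t(x) = (c^T x + d) − t(e^T x + f), using CVXPY feasibility problems on made-up data; the initial l, u are assumed to bracket p*:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 8
G, h = rng.standard_normal((m, n)), rng.uniform(1.0, 2.0, m)
c, d = rng.standard_normal(n), 5.0
e, f = rng.standard_normal(n), 10.0

x = cp.Variable(n)
l, u, eps = -10.0, 10.0, 1e-6
while u - l > eps:
    t = (l + u) / 2
    # feasibility of phi_t(x) <= 0 over the feasible set (q(x) > 0 assumed)
    feas = cp.Problem(cp.Minimize(0),
                      [c @ x + d - t * (e @ x + f) <= 0,
                       e @ x + f >= 1e-6, G @ x <= h])
    feas.solve()
    if feas.status == "optimal":
        u = t
    else:
        l = t
print(round((l + u) / 2, 4))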

    Linear program (LP)


    minimize   c^T x + d
    subject to Gx ⪯ h
               Ax = b

convex problem with affine objective and constraint functions

feasible set is a polyhedron

(figure: polyhedron P with objective direction −c and optimal vertex x*)

    Examples

diet problem: choose quantities x_1, ..., x_n of n foods

    one unit of food j costs c_j, contains amount a_ij of nutrient i

    healthy diet requires nutrient i in quantity at least b_i

to find cheapest healthy diet,

    minimize   c^T x
    subject to Ax ⪰ b,  x ⪰ 0

piecewise-linear minimization

    minimize max_{i=1,...,m} (a_i^T x + b_i)

equivalent to an LP

    minimize   t
    subject to a_i^T x + b_i ≤ t,  i = 1, ..., m
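An illustration (not in the slides; random data): the piecewise-linear problem and its LP reformulation give the same optimal value in CVXPY.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
m, n = 20, 4
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# direct form: minimize max_i (a_i^T x + b_i)
x = cp.Variable(n)
direct = cp.Problem(cp.Minimize(cp.max(A @ x + b)))
direct.solve()

# LP form with epigraph variable t
x2, t = cp.Variable(n), cp.Variable()
lp = cp.Problem(cp.Minimize(t), [A @ x2 + b <= t])
lp.solve()

print(np.isclose(direct.value, lp.value))  # True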

    Chebyshev center of a polyhedron

    Chebyshev center of


    P = {x | a_i^T x ≤ b_i, i = 1, ..., m}

is center of largest inscribed ball

    B = {x_c + u | ‖u‖_2 ≤ r}

(figure: polyhedron P with the largest inscribed ball centered at x_cheb)

a_i^T x ≤ b_i for all x ∈ B if and only if

    sup{a_i^T (x_c + u) | ‖u‖_2 ≤ r} = a_i^T x_c + r‖a_i‖_2 ≤ b_i

hence, x_c, r can be determined by solving the LP

    maximize   r
    subject to a_i^T x_c + r‖a_i‖_2 ≤ b_i,  i = 1, ..., m
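A compact CVXPY version of this LP (not from the slides; the polyhedron is an arbitrary example):

import cvxpy as cp
import numpy as np

# an example polyhedron a_i^T x <= b_i (unit box cut by a diagonal)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0],
              [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])

xc, r = cp.Variable(2), cp.Variable()
norms = np.linalg.norm(A, axis=1)              # the ||a_i||_2 terms
prob = cp.Problem(cp.Maximize(r), [A @ xc + r * norms <= b])
prob.solve()
print(xc.value.round(4), round(r.value, 4))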

    (Generalized) linear-fractional program


    minimize   f_0(x)
    subject to Gx ⪯ h
               Ax = b

linear-fractional program

    f_0(x) = (c^T x + d) / (e^T x + f),    dom f_0 = {x | e^T x + f > 0}

a quasiconvex optimization problem; can be solved by bisection

also equivalent to the LP (variables y, z)

    minimize   c^T y + dz
    subject to Gy ⪯ hz
               Ay = bz
               e^T y + fz = 1
               z ≥ 0

    generalized linear-fractional program


    f_0(x) = max_{i=1,...,r} (c_i^T x + d_i) / (e_i^T x + f_i),
    dom f_0 = {x | e_i^T x + f_i > 0, i = 1, ..., r}

a quasiconvex optimization problem; can be solved by bisection

example: Von Neumann model of a growing economy

    maximize (over x, x⁺) min_{i=1,...,n} x_i⁺ / x_i
    subject to            x⁺ ⪰ 0,  Bx⁺ ⪯ Ax

    x, x⁺ ∈ R^n: activity levels of n sectors, in current and next period

    (Ax)_i, (Bx⁺)_i: produced, resp. consumed, amounts of good i

    x_i⁺ / x_i: growth rate of sector i

allocate activity to maximize growth rate of slowest growing sector

    Quadratic program (QP)


    minimize   (1/2) x^T P x + q^T x + r
    subject to Gx ⪯ h
               Ax = b

P ∈ S^n_+, so objective is convex quadratic

minimize a convex quadratic function over a polyhedron

(figure: polyhedron P with level curves of f_0 and minimizer x*)

    Examples


least-squares

    minimize ‖Ax − b‖_2²

analytical solution x* = A†b (A† is pseudo-inverse)

can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

    minimize   c̄^T x + γ x^T Σ x = E c^T x + γ var(c^T x)
    subject to Gx ⪯ h,  Ax = b

c is random vector with mean c̄ and covariance Σ

hence, c^T x is random variable with mean c̄^T x and variance x^T Σ x

γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk)

    Quadratically constrained quadratic program (QCQP)


    minimize   (1/2) x^T P_0 x + q_0^T x + r_0
    subject to (1/2) x^T P_i x + q_i^T x + r_i ≤ 0,  i = 1, ..., m
               Ax = b

P_i ∈ S^n_+; objective and constraints are convex quadratic

if P_1, ..., P_m ∈ S^n_{++}, feasible region is intersection of m ellipsoids and an affine set

    Second-order cone programming


    minimize   f^T x
    subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,  i = 1, ..., m
               Fx = g

(A_i ∈ R^{n_i × n}, F ∈ R^{p × n})

inequalities are called second-order cone (SOC) constraints:

    (A_i x + b_i, c_i^T x + d_i) ∈ second-order cone in R^{n_i + 1}

for n_i = 0, reduces to an LP; if c_i = 0, reduces to a QCQP

more general than QCQP and LP

    Robust linear programming

    the parameters in optimization problems are often uncertain, e.g. , in an LP


    minimize   c^T x
    subject to a_i^T x ≤ b_i,  i = 1, ..., m,

there can be uncertainty in c, a_i, b_i

two common approaches to handling uncertainty (in a_i, for simplicity)

    deterministic model: constraints must hold for all a_i ∈ E_i

        minimize   c^T x
        subject to a_i^T x ≤ b_i for all a_i ∈ E_i,  i = 1, ..., m,

    stochastic model: a_i is random variable; constraints must hold with probability η

        minimize   c^T x
        subject to prob(a_i^T x ≤ b_i) ≥ η,  i = 1, ..., m

    deterministic approach via SOCP

choose an ellipsoid as E_i:

    E_i = {ā_i + P_i u | ‖u‖_2 ≤ 1}    (ā_i ∈ R^n, P_i ∈ R^{n×n})

center is ā_i, semi-axes determined by singular values/vectors of P_i

robust LP

    minimize   c^T x
    subject to a_i^T x ≤ b_i for all a_i ∈ E_i,  i = 1, ..., m

is equivalent to the SOCP

    minimize   c^T x
    subject to ā_i^T x + ‖P_i^T x‖_2 ≤ b_i,  i = 1, ..., m

(follows from sup_{‖u‖_2 ≤ 1} (ā_i + P_i u)^T x = ā_i^T x + ‖P_i^T x‖_2)
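A CVXPY rendering of this SOCP (not from the slides; the nominal data and ellipsoid shapes are random placeholders, and a box bound is added to keep the problem bounded):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
n, m = 4, 6
c = rng.standard_normal(n)
a_bar = rng.standard_normal((m, n))            # nominal a_i
P = [0.1 * rng.standard_normal((n, n)) for _ in range(m)]
b = a_bar @ np.ones(n) + 1.0                   # keeps x = 1 strictly feasible

x = cp.Variable(n)
cons = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
prob = cp.Problem(cp.Minimize(c @ x), cons + [cp.norm(x, "inf") <= 10])
prob.solve()
print(round(prob.value, 4))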


    Geometric programming

    monomial function


    f(x) = c x_1^{a_1} x_2^{a_2} ··· x_n^{a_n},    dom f = R^n_{++}

with c > 0; exponents a_i can be any real numbers

posynomial function: sum of monomials

    f(x) = ∑_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} ··· x_n^{a_{nk}},    dom f = R^n_{++}

geometric program (GP)

    minimize   f_0(x)
    subject to f_i(x) ≤ 1,  i = 1, ..., m
               h_i(x) = 1,  i = 1, ..., p

with f_i posynomial, h_i monomial

    Geometric program in convex form

    change variables to yi = log x i , and take logarithm of cost, constraints


monomial f(x) = c x_1^{a_1} ··· x_n^{a_n} transforms to

    log f(e^{y_1}, ..., e^{y_n}) = a^T y + b    (b = log c)

posynomial f(x) = ∑_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} ··· x_n^{a_{nk}} transforms to

    log f(e^{y_1}, ..., e^{y_n}) = log ∑_{k=1}^K e^{a_k^T y + b_k}    (b_k = log c_k)

geometric program transforms to convex problem

    minimize   log ∑_{k=1}^K exp(a_{0k}^T y + b_{0k})
    subject to log ∑_{k=1}^K exp(a_{ik}^T y + b_{ik}) ≤ 0,  i = 1, ..., m
               Gy + d = 0
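CVXPY can also solve GPs directly in posynomial form via its gp=True mode; a toy example (not from the slides; the problem data are made up):

import cvxpy as cp

# GP: minimize x + y subject to x*y >= 1, written as 1/(x*y) <= 1
x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
prob = cp.Problem(cp.Minimize(x + y), [1 / (x * y) <= 1])
prob.solve(gp=True)
print(round(x.value, 3), round(y.value, 3), round(prob.value, 3))  # 1.0 1.0 2.0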

Design of cantilever beam

(figure: cantilever of four unit-length segments, with vertical force F applied at the right end)

N segments with unit lengths, rectangular cross-sections of size w_i × h_i

given vertical force F applied at the right end

design problem

    minimize   total weight
    subject to upper & lower bounds on w_i, h_i
               upper & lower bounds on aspect ratios h_i / w_i
               upper bound on stress in each segment
               upper bound on vertical deflection at the end of the beam

variables: w_i, h_i for i = 1, ..., N

    objective and constraint functions

total weight w_1 h_1 + ··· + w_N h_N is posynomial

aspect ratio h_i / w_i and inverse aspect ratio w_i / h_i are monomials

maximum stress in segment i is given by 6iF / (w_i h_i²), a monomial

the vertical deflection y_i and slope v_i of central axis at the right end of segment i are defined recursively as

    v_i = 12(i − 1/2) F / (E w_i h_i³) + v_{i+1}

    y_i = 6(i − 1/3) F / (E w_i h_i³) + v_{i+1} + y_{i+1}

for i = N, N−1, ..., 1, with v_{N+1} = y_{N+1} = 0 (E is Young's modulus)

v_i and y_i are posynomial functions of w, h

    formulation as a GP

    minimize   w_1 h_1 + ··· + w_N h_N
    subject to w_max^{-1} w_i ≤ 1,  w_min w_i^{-1} ≤ 1,  i = 1, ..., N

               h_max^{-1} h_i ≤ 1,  h_min h_i^{-1} ≤ 1,  i = 1, ..., N
               S_max^{-1} w_i^{-1} h_i ≤ 1,  S_min w_i h_i^{-1} ≤ 1,  i = 1, ..., N
               6iF σ_max^{-1} w_i^{-1} h_i^{-2} ≤ 1,  i = 1, ..., N
               y_max^{-1} y_1 ≤ 1

note

    we write w_min ≤ w_i ≤ w_max and h_min ≤ h_i ≤ h_max as

        w_min / w_i ≤ 1,  w_i / w_max ≤ 1,  h_min / h_i ≤ 1,  h_i / h_max ≤ 1

    we write S_min ≤ h_i / w_i ≤ S_max as

        S_min w_i / h_i ≤ 1,  h_i / (w_i S_max) ≤ 1

    Minimizing spectral radius of nonnegative matrix

Perron-Frobenius eigenvalue λ_pf(A)

    exists for (elementwise) positive A ∈ R^{n×n}

    a real, positive eigenvalue of A, equal to spectral radius max_i |λ_i(A)|

    determines asymptotic growth (decay) rate of A^k:  A^k ~ λ_pf^k as k → ∞

    alternative characterization: λ_pf(A) = inf{λ | Av ⪯ λv for some v ≻ 0}

minimizing spectral radius of matrix of posynomials

    minimize λ_pf(A(x)), where the elements A(x)_ij are posynomials of x

    equivalent geometric program:

        minimize   λ
        subject to ∑_{j=1}^n A(x)_ij v_j / (λ v_i) ≤ 1,  i = 1, ..., n

    variables λ, v, x

    Generalized inequality constraints

    convex problem with generalized inequality constraints


    minimize   f_0(x)
    subject to f_i(x) ⪯_{K_i} 0,  i = 1, ..., m
               Ax = b

f_0 : R^n → R convex; f_i : R^n → R^{k_i} K_i-convex w.r.t. proper cone K_i

same properties as standard convex problem (convex feasible set, local optimum is global, etc.)

conic form problem: special case with affine objective and constraints

    minimize   c^T x
    subject to Fx + g ⪯_K 0
               Ax = b

extends linear programming (K = R^m_+) to nonpolyhedral cones

Semidefinite program (SDP)

    minimize   c^T x

    subject to x_1 F_1 + x_2 F_2 + ··· + x_n F_n + G ⪯ 0
               Ax = b

with F_i, G ∈ S^k

inequality constraint is called linear matrix inequality (LMI)

includes problems with multiple LMI constraints: for example,

    x_1 F̂_1 + ··· + x_n F̂_n + Ĝ ⪯ 0,    x_1 F̃_1 + ··· + x_n F̃_n + G̃ ⪯ 0

is equivalent to the single LMI

    x_1 diag(F̂_1, F̃_1) + ··· + x_n diag(F̂_n, F̃_n) + diag(Ĝ, G̃) ⪯ 0

    LP and SOCP as SDP

    LP and equivalent SDP


    LP:   minimize   c^T x
          subject to Ax ⪯ b

    SDP:  minimize   c^T x
          subject to diag(Ax − b) ⪯ 0

(note different interpretation of generalized inequality ⪯)

SOCP and equivalent SDP

    SOCP: minimize   f^T x
          subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,  i = 1, ..., m

    SDP:  minimize   f^T x
          subject to [ (c_i^T x + d_i) I    A_i x + b_i
                       (A_i x + b_i)^T      c_i^T x + d_i ] ⪰ 0,  i = 1, ..., m


    Matrix norm minimization

    minimize ‖A(x)‖_2 = (λ_max(A(x)^T A(x)))^{1/2}

where A(x) = A_0 + x_1 A_1 + ··· + x_n A_n (with given A_i ∈ R^{p×q})

equivalent SDP

    minimize   t
    subject to [ tI        A(x)
                 A(x)^T    tI   ] ⪰ 0

variables x ∈ R^n, t ∈ R

constraint follows from

    ‖A‖_2 ≤ t  ⟺  A^T A ⪯ t² I, t ≥ 0
               ⟺  [ tI  A; A^T  tI ] ⪰ 0
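In CVXPY this problem can be posed directly with the spectral-norm atom, which the solver handles via the SDP above; a random-data sketch (not from the slides):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
p, q, n = 5, 4, 3
A = [rng.standard_normal((p, q)) for _ in range(n + 1)]  # A_0, ..., A_n

x = cp.Variable(n)
Ax = A[0] + sum(x[i] * A[i + 1] for i in range(n))
prob = cp.Problem(cp.Minimize(cp.norm(Ax, 2)))           # spectral norm atom
prob.solve()
print(round(prob.value, 4))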

    Vector optimization

    general vector optimization problem


    minimize (w.r.t. K) f_0(x)
    subject to          f_i(x) ≤ 0,  i = 1, ..., m
                        h_i(x) = 0,  i = 1, ..., p

vector objective f_0 : R^n → R^q, minimized w.r.t. proper cone K ⊆ R^q

convex vector optimization problem

    minimize (w.r.t. K) f_0(x)
    subject to          f_i(x) ≤ 0,  i = 1, ..., m
                        Ax = b

with f_0 K-convex, f_1, ..., f_m convex

    Optimal and Pareto optimal points

    set of achievable objective values


    O = {f_0(x) | x feasible}

feasible x is optimal if f_0(x) is a minimum value of O

feasible x is Pareto optimal if f_0(x) is a minimal value of O

(figures: a set O with minimum value f_0(x*), where x* is optimal, and a set O with minimal value f_0(x_po), where x_po is Pareto optimal)

    Multicriterion optimization

vector optimization problem with K = R^q_+

    f_0(x) = (F_1(x), ..., F_q(x))

q different objectives F_i; roughly speaking we want all F_i's to be small

feasible x* is optimal if

    y feasible  ⟹  f_0(x*) ⪯ f_0(y)

if there exists an optimal point, the objectives are noncompeting

feasible x_po is Pareto optimal if

    y feasible, f_0(y) ⪯ f_0(x_po)  ⟹  f_0(x_po) = f_0(y)

if there are multiple Pareto optimal values, there is a trade-off between the objectives

    Regularized least-squares

    minimize (w.r.t. R²_+) (‖Ax − b‖_2², ‖x‖_2²)

(figure: trade-off curve of F_2(x) = ‖x‖_2² versus F_1(x) = ‖Ax − b‖_2² for an example with A ∈ R^{100×10}; the heavy line is formed by Pareto optimal points)

    Risk return trade-off in portfolio optimization

    minimize (w.r.t. R²_+) (−p̄^T x, x^T Σ x)
    subject to             1^T x = 1,  x ⪰ 0

x ∈ R^n is investment portfolio; x_i is fraction invested in asset i

p ∈ R^n is vector of relative asset price changes; modeled as a random variable with mean p̄, covariance Σ

p̄^T x = E r is expected return; x^T Σ x = var r is return variance

example

(figures: mean return versus standard deviation of return along the trade-off curve, and the corresponding allocations x^(1), ..., x^(4))

    Scalarization

to find Pareto optimal points: choose λ ≻_{K*} 0 and solve scalar problem

    minimize   λ^T f_0(x)

    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

if x is optimal for scalar problem, then it is Pareto-optimal for vector optimization problem

(figure: set O with Pareto optimal points f_0(x_1), f_0(x_2), f_0(x_3) and scalarizing hyperplanes with normals λ_1, λ_2)

for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ≻_{K*} 0

    Scalarization for multicriterion problems

to find Pareto optimal points, minimize positive weighted sum

    λ^T f_0(x) = λ_1 F_1(x) + ··· + λ_q F_q(x)

examples

regularized least-squares problem of page 4-43

    take λ = (1, γ) with γ > 0

    minimize ‖Ax − b‖_2² + γ‖x‖_2²

for fixed γ, a LS problem

(figure: trade-off curve of ‖x‖_2² versus ‖Ax − b‖_2², with the γ = 1 point marked)

risk-return trade-off of page 4-44

    minimize   −p̄^T x + γ x^T Σ x
    subject to 1^T x = 1,  x ⪰ 0

for fixed γ > 0, a quadratic program
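A synthetic-data sketch of this scalarized portfolio QP (not from the slides; p̄ and Σ are invented), sweeping the risk-aversion weight to trace the trade-off curve:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(9)
n = 4
p_bar = rng.uniform(0.0, 0.12, n)            # assumed mean returns
M = rng.standard_normal((n, n))
Sigma = 0.01 * (M @ M.T + n * np.eye(n))     # assumed return covariance

x = cp.Variable(n)
gamma = cp.Parameter(nonneg=True)
ret, risk = p_bar @ x, cp.quad_form(x, Sigma)
prob = cp.Problem(cp.Minimize(-ret + gamma * risk),
                  [cp.sum(x) == 1, x >= 0])

for g in [0.1, 1.0, 10.0]:                   # sweep the trade-off weight
    gamma.value = g
    prob.solve()
    print(g, round(ret.value, 4), round(risk.value, 5))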

Convex Optimization Boyd & Vandenberghe

5. Duality

    Lagrangian

    standard form problem (not necessarily convex)

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

    L(x, λ, ν) = f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x)

    weighted sum of objective and constraint functions

    λ_i is Lagrange multiplier associated with f_i(x) ≤ 0

    ν_i is Lagrange multiplier associated with h_i(x) = 0

    Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

    g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)

            = inf_{x ∈ D} ( f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

    f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)


    Standard form LP

    minimize   c^T x
    subject to Ax = b,  x ⪰ 0

    dual function

Lagrangian is

    L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
               = −b^T ν + (c + A^T ν − λ)^T x

L is affine in x, hence

    g(λ, ν) = inf_x L(x, λ, ν) = −b^T ν  if A^T ν − λ + c = 0,  −∞ otherwise

g is linear on affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p* ≥ −b^T ν if A^T ν + c ⪰ 0

    Equality constrained norm minimization

    minimize   ‖x‖
    subject to Ax = b

    dual function

    g(ν) = inf_x (‖x‖ − ν^T Ax + b^T ν) = b^T ν  if ‖A^T ν‖_* ≤ 1,  −∞ otherwise

where ‖v‖_* = sup_{‖u‖ ≤ 1} u^T v is dual norm of ‖·‖

proof: follows from inf_x (‖x‖ − y^T x) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

    if ‖y‖_* ≤ 1, then ‖x‖ − y^T x ≥ 0 for all x, with equality if x = 0

    if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1, u^T y = ‖y‖_* > 1:

        ‖x‖ − y^T x = t(‖u‖ − ‖y‖_*) → −∞  as t → ∞

lower bound property: p* ≥ b^T ν if ‖A^T ν‖_* ≤ 1

    Two-way partitioning

    minimize   x^T W x
    subject to x_i² = 1,  i = 1, ..., n

    a nonconvex problem; feasible set contains 2n discrete points

    interpretation: partition {1, . . . , n }in two sets; W ij is cost of assigningi, j to the same set; W ij is cost of assigning to different setsdual function

    g( ) = inf x (xT W x +i

    i (x2i 1)) = inf x xT (W + diag ( ))x 1 T = 1 T W + diag ( ) 0

    otherwise

    lower bound property : p 1 T if W + diag ( ) 0example: = min (W )1 gives bound p n min (W )Duality 57
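A small numpy sketch of this bound, with a random symmetric W as placeholder data; for tiny n the true p^\star can be found by enumerating all 2^n sign vectors:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                     # symmetric cost matrix

# dual bound: nu = -lambda_min(W) * 1 gives p* >= n * lambda_min(W)
bound = n * np.linalg.eigvalsh(W)[0]

# brute force over all 2^n partitions (only feasible for tiny n)
p_star = min(np.array(s) @ W @ np.array(s)
             for s in itertools.product([-1.0, 1.0], repeat=n))
print(bound, "<=", p_star)
```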

Lagrange dual and conjugate function

    minimize    f_0(x)
    subject to  Ax \preceq b,  Cx = d

dual function

    g(\lambda, \nu) = \inf_{x \in dom f_0} ( f_0(x) + (A^T \lambda + C^T \nu)^T x - b^T \lambda - d^T \nu )
                    = -f_0^*(-A^T \lambda - C^T \nu) - b^T \lambda - d^T \nu

recall definition of conjugate f^*(y) = \sup_{x \in dom f} ( y^T x - f(x) )

simplifies derivation of dual if conjugate of f_0 is known

example: entropy maximization

    f_0(x) = \sum_{i=1}^n x_i \log x_i,    f_0^*(y) = \sum_{i=1}^n e^{y_i - 1}

Duality 5–8

The dual problem

Lagrange dual problem

    maximize    g(\lambda, \nu)
    subject to  \lambda \succeq 0

finds best lower bound on p^\star, obtained from Lagrange dual function
a convex optimization problem; optimal value denoted d^\star
\lambda, \nu are dual feasible if \lambda \succeq 0, (\lambda, \nu) \in dom g
often simplified by making implicit constraint (\lambda, \nu) \in dom g explicit

example: standard form LP and its dual (page 5–5)

    minimize    c^T x                        maximize    -b^T \nu
    subject to  Ax = b,  x \succeq 0         subject to  A^T \nu + c \succeq 0

Duality 5–9

Weak and strong duality

weak duality: d^\star \le p^\star

always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems

for example, solving the SDP

    maximize    -1^T \nu
    subject to  W + diag(\nu) \succeq 0

gives a lower bound for the two-way partitioning problem on page 5–7

strong duality: d^\star = p^\star

does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications

Duality 5–10

Slater's constraint qualification

strong duality holds for a convex problem

    minimize    f_0(x)
    subject to  f_i(x) \le 0,   i = 1, ..., m
                Ax = b

if it is strictly feasible, i.e.,

    \exists x \in int D:  f_i(x) < 0,  i = 1, ..., m,   Ax = b

also guarantees that the dual optimum is attained (if p^\star > -\infty)
can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, ...
there exist many other types of constraint qualifications

Duality 5–11

Inequality form LP

primal problem

    minimize    c^T x
    subject to  Ax \preceq b

dual function

    g(\lambda) = \inf_x ( (c + A^T \lambda)^T x - b^T \lambda ) = { -b^T \lambda   if A^T \lambda + c = 0
                                                                  { -\infty        otherwise

dual problem

    maximize    -b^T \lambda
    subject to  A^T \lambda + c = 0,  \lambda \succeq 0

from Slater's condition: p^\star = d^\star if Ax \prec b for some x
in fact, p^\star = d^\star except when primal and dual are infeasible

Duality 5–12

Quadratic program

primal problem (assume P \in S^n_{++})

    minimize    x^T P x
    subject to  Ax \preceq b

dual function

    g(\lambda) = \inf_x ( x^T P x + \lambda^T (Ax - b) ) = -\frac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda

dual problem

    maximize    -(1/4) \lambda^T A P^{-1} A^T \lambda - b^T \lambda
    subject to  \lambda \succeq 0

from Slater's condition: p^\star = d^\star if Ax \prec b for some x
in fact, p^\star = d^\star always

Duality 5–13
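A minimal cvxpy sketch (random placeholder data) checking p^\star = d^\star for this primal/dual pair; the dual quadratic term is rewritten as a sum of squares via a Cholesky factor of P^{-1}:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)           # P in S^n_++
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 1.0  # a strictly feasible point exists

# primal QP: minimize x^T P x  subject to  Ax <= b
x = cp.Variable(n)
p_star = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [A @ x <= b]).solve()

# dual: maximize -(1/4) lam^T A P^{-1} A^T lam - b^T lam, lam >= 0;
# write the quadratic term as -(1/4) ||L^T A^T lam||^2 with P^{-1} = L L^T
L = np.linalg.cholesky(np.linalg.inv(P))
lam = cp.Variable(m)
d_star = cp.Problem(
    cp.Maximize(-cp.sum_squares(L.T @ A.T @ lam) / 4 - b @ lam),
    [lam >= 0]).solve()

print(p_star, d_star)                 # equal: strong duality
```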

A nonconvex problem with strong duality

    minimize    x^T A x + 2 b^T x
    subject to  x^T x \le 1

A \not\succeq 0, hence nonconvex

dual function: g(\lambda) = \inf_x ( x^T (A + \lambda I) x + 2 b^T x - \lambda )

unbounded below if A + \lambda I \not\succeq 0 or if A + \lambda I \succeq 0 and b \notin R(A + \lambda I)
minimized by x = -(A + \lambda I)^\dagger b otherwise: g(\lambda) = -b^T (A + \lambda I)^\dagger b - \lambda

dual problem

    maximize    -b^T (A + \lambda I)^\dagger b - \lambda
    subject to  A + \lambda I \succeq 0,  b \in R(A + \lambda I)

and equivalent SDP

    maximize    -t - \lambda
    subject to  [ A + \lambda I    b ]
                [ b^T              t ]  \succeq 0

strong duality although primal problem is not convex (not easy to show)

Duality 5–14

Geometric interpretation

for simplicity, consider problem with one constraint f_1(x) \le 0

interpretation of dual function:

    g(\lambda) = \inf_{(u,t) \in G} ( t + \lambda u ),   where   G = {(f_1(x), f_0(x)) | x \in D}

[figure: two views of the set G in the (u, t)-plane, showing p^\star, d^\star, g(\lambda), and the line \lambda u + t = g(\lambda)]

\lambda u + t = g(\lambda) is (non-vertical) supporting hyperplane to G
hyperplane intersects t-axis at t = g(\lambda)

Duality 5–15

epigraph variation: same interpretation if G is replaced with

    A = {(u, t) | f_1(x) \le u, f_0(x) \le t for some x \in D}

[figure: the set A with supporting hyperplane \lambda u + t = g(\lambda) through (0, p^\star)]

strong duality

holds if there is a non-vertical supporting hyperplane to A at (0, p^\star)
for convex problem, A is convex, hence has supp. hyperplane at (0, p^\star)
Slater's condition: if there exist (\tilde u, \tilde t) \in A with \tilde u < 0, then supporting hyperplanes at (0, p^\star) must be non-vertical

Duality 5–16

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable f_i, h_i):

1. primal constraints: f_i(x) \le 0, i = 1, ..., m, h_i(x) = 0, i = 1, ..., p
2. dual constraints: \lambda \succeq 0
3. complementary slackness: \lambda_i f_i(x) = 0, i = 1, ..., m
4. gradient of Lagrangian with respect to x vanishes:

    \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0

from page 5–17: if strong duality holds and x, \lambda, \nu are optimal, then they must satisfy the KKT conditions

Duality 5–18

KKT conditions for convex problem

if \tilde x, \tilde\lambda, \tilde\nu satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f_0(\tilde x) = L(\tilde x, \tilde\lambda, \tilde\nu)
from 4th condition (and convexity): g(\tilde\lambda, \tilde\nu) = L(\tilde x, \tilde\lambda, \tilde\nu)

hence, f_0(\tilde x) = g(\tilde\lambda, \tilde\nu)

if Slater's condition is satisfied:

    x is optimal if and only if there exist \lambda, \nu that satisfy KKT conditions

recall that Slater implies strong duality, and dual optimum is attained
generalizes optimality condition \nabla f_0(x) = 0 for unconstrained problem

Duality 5–19

example: water-filling (assume \alpha_i > 0)

    minimize    -\sum_{i=1}^n \log(x_i + \alpha_i)
    subject to  x \succeq 0,  1^T x = 1

x is optimal iff x \succeq 0, 1^T x = 1, and there exist \lambda \in R^n, \nu \in R such that

    \lambda \succeq 0,    \lambda_i x_i = 0,    \frac{1}{x_i + \alpha_i} + \lambda_i = \nu

if \nu < 1/\alpha_i: \lambda_i = 0 and x_i = 1/\nu - \alpha_i
if \nu \ge 1/\alpha_i: \lambda_i = \nu - 1/\alpha_i and x_i = 0
determine \nu from 1^T x = \sum_{i=1}^n \max\{0, 1/\nu - \alpha_i\} = 1

interpretation

n patches; level of patch i is at height \alpha_i
flood area with unit amount of water
resulting level is 1/\nu^\star

[figure: water-filling picture showing patch heights \alpha_i, allocations x_i, and water level 1/\nu^\star]

Duality 5–20
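A short numpy sketch of the water-filling solution: since 1^T x is decreasing in \nu, \nu^\star can be found by bisection; the \alpha_i below are placeholder values:

```python
import numpy as np

alpha = np.array([0.7, 1.3, 2.0, 0.4])        # placeholder patch heights

def total_water(nu):
    # 1^T x as a function of nu; strictly decreasing where positive
    return np.maximum(0.0, 1.0 / nu - alpha).sum()

# bisection for total_water(nu) = 1
lo, hi = 1e-6, 1.0 / alpha.min() + 1.0        # brackets: too much / no water
for _ in range(100):
    nu = 0.5 * (lo + hi)
    lo, hi = (nu, hi) if total_water(nu) > 1 else (lo, nu)

x = np.maximum(0.0, 1.0 / nu - alpha)
print(x, x.sum())                             # optimal allocation, sums to 1
```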

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    minimize    f_0(x)                          maximize    g(\lambda, \nu)
    subject to  f_i(x) \le 0,  i = 1, ..., m    subject to  \lambda \succeq 0
                h_i(x) = 0,  i = 1, ..., p

perturbed problem and its dual

    min.  f_0(x)                                max.  g(\lambda, \nu) - u^T \lambda - v^T \nu
    s.t.  f_i(x) \le u_i,  i = 1, ..., m        s.t.  \lambda \succeq 0
          h_i(x) = v_i,  i = 1, ..., p

x is primal variable; u, v are parameters
p^\star(u, v) is optimal value as a function of u, v
we are interested in information about p^\star(u, v) that we can obtain from the solution of the unperturbed problem and its dual

Duality 5–21

global sensitivity result

assume strong duality holds for unperturbed problem, and that \lambda^\star, \nu^\star are dual optimal for unperturbed problem

apply weak duality to perturbed problem:

    p^\star(u, v) \ge g(\lambda^\star, \nu^\star) - u^T \lambda^\star - v^T \nu^\star
                  = p^\star(0, 0) - u^T \lambda^\star - v^T \nu^\star

sensitivity interpretation

if \lambda_i^\star large: p^\star increases greatly if we tighten constraint i (u_i < 0)
if \lambda_i^\star small: p^\star does not decrease much if we loosen constraint i (u_i > 0)
if \nu_i^\star large and positive: p^\star increases greatly if we take v_i < 0; if \nu_i^\star large and negative: p^\star increases greatly if we take v_i > 0
if \nu_i^\star small and positive: p^\star does not decrease much if we take v_i > 0; if \nu_i^\star small and negative: p^\star does not decrease much if we take v_i < 0

Duality 5–22

local sensitivity: if (in addition) p^\star(u, v) is differentiable at (0, 0), then

    \lambda_i^\star = -\frac{\partial p^\star(0,0)}{\partial u_i},    \nu_i^\star = -\frac{\partial p^\star(0,0)}{\partial v_i}

proof (for \lambda_i^\star): from global sensitivity result,

    \frac{\partial p^\star(0,0)}{\partial u_i} = \lim_{t \searrow 0} \frac{p^\star(t e_i, 0) - p^\star(0,0)}{t} \ge -\lambda_i^\star

    \frac{\partial p^\star(0,0)}{\partial u_i} = \lim_{t \nearrow 0} \frac{p^\star(t e_i, 0) - p^\star(0,0)}{t} \le -\lambda_i^\star

hence, equality

[figure: p^\star(u) for a problem with one (inequality) constraint; the line p^\star(0) - \lambda^\star u is tangent at u = 0]

Duality 5–23
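A tentative numerical check of this identity on a tiny QP (placeholder data; relies on cvxpy reporting the constraint's dual variable): the dual value should match the negative finite-difference slope of p^\star(u):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
P = np.diag(rng.random(3) + 1.0)              # positive definite diagonal P
a = rng.standard_normal(3)

def solve_perturbed(u):
    # perturbed problem: minimize x^T P x  subject to  a^T x <= -1 + u
    x = cp.Variable(3)
    con = a @ x <= -1.0 + u
    prob = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [con])
    prob.solve()
    return prob.value, con.dual_value

p0, lam = solve_perturbed(0.0)
t = 1e-4
slope = (solve_perturbed(t)[0] - p0) / t      # finite difference of p*(u)
print(lam, -slope)                            # lambda* = -dp*/du (approx.)
```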

Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals
reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions

    e.g., replace f_0(x) by \phi(f_0(x)) with \phi convex, increasing

Duality 5–24

norm approximation problem: minimize \|Ax - b\|; equivalent formulation:

    minimize    \|y\|
    subject to  y = Ax - b

can look up conjugate of \|\cdot\|, or derive dual directly

    g(\nu) = \inf_{x,y} ( \|y\| + \nu^T y - \nu^T A x + b^T \nu )
           = { b^T \nu + \inf_y ( \|y\| + \nu^T y )    if A^T \nu = 0
             { -\infty                                 otherwise
           = { b^T \nu    if A^T \nu = 0,  \|\nu\|_* \le 1
             { -\infty    otherwise

(see page 5–4)

dual of norm approximation problem

    maximize    b^T \nu
    subject to  A^T \nu = 0,  \|\nu\|_* \le 1

Duality 5–26

Implicit constraints

LP with box constraints: primal and dual problem

    minimize    c^T x                  maximize    -b^T \nu - 1^T \lambda_1 - 1^T \lambda_2
    subject to  Ax = b                 subject to  c + A^T \nu + \lambda_1 - \lambda_2 = 0
                -1 \preceq x \preceq 1             \lambda_1 \succeq 0,  \lambda_2 \succeq 0

reformulation with box constraints made implicit

    minimize    f_0(x) = { c^T x     if -1 \preceq x \preceq 1
                         { \infty    otherwise
    subject to  Ax = b

dual function

    g(\nu) = \inf_{-1 \preceq x \preceq 1} ( c^T x + \nu^T (Ax - b) ) = -b^T \nu - \|A^T \nu + c\|_1

dual problem: maximize -b^T \nu - \|A^T \nu + c\|_1

Duality 5–27

Problems with generalized inequalities

    minimize    f_0(x)
    subject to  f_i(x) \preceq_{K_i} 0,   i = 1, ..., m
                h_i(x) = 0,   i = 1, ..., p

\preceq_{K_i} is generalized inequality on R^{k_i}

definitions are parallel to scalar case:

Lagrange multiplier for f_i(x) \preceq_{K_i} 0 is vector \lambda_i \in R^{k_i}
Lagrangian L : R^n \times R^{k_1} \times \cdots \times R^{k_m} \times R^p \to R, is defined as

    L(x, \lambda_1, ..., \lambda_m, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i^T f_i(x) + \sum_{i=1}^p \nu_i h_i(x)

dual function g : R^{k_1} \times \cdots \times R^{k_m} \times R^p \to R, is defined as

    g(\lambda_1, ..., \lambda_m, \nu) = \inf_{x \in D} L(x, \lambda_1, ..., \lambda_m, \nu)

Duality 5–28

Semidefinite program

primal SDP (F_i, G \in S^k)

    minimize    c^T x
    subject to  x_1 F_1 + \cdots + x_n F_n \preceq G

Lagrange multiplier is matrix Z \in S^k
Lagrangian L(x, Z) = c^T x + tr( Z (x_1 F_1 + \cdots + x_n F_n - G) )
dual function

    g(Z) = \inf_x L(x, Z) = { -tr(GZ)    if tr(F_i Z) + c_i = 0, i = 1, ..., n
                            { -\infty    otherwise

dual SDP

    maximize    -tr(GZ)
    subject to  Z \succeq 0,  tr(F_i Z) + c_i = 0,  i = 1, ..., n

p^\star = d^\star if primal SDP is strictly feasible (\exists x with x_1 F_1 + \cdots + x_n F_n \prec G)

Duality 5–30

Convex Optimization Boyd & Vandenberghe

6. Approximation and fitting

norm approximation
least-norm problems
regularized approximation
robust approximation

6–1

examples

least-squares approximation (\|\cdot\|_2): solution satisfies normal equations

    A^T A x = A^T b

(x^\star = (A^T A)^{-1} A^T b if rank A = n)

Chebyshev approximation (\|\cdot\|_\infty): can be solved as an LP

    minimize    t
    subject to  -t 1 \preceq Ax - b \preceq t 1

sum of absolute residuals approximation (\|\cdot\|_1): can be solved as an LP

    minimize    1^T y
    subject to  -y \preceq Ax - b \preceq y

Approximation and fitting 6–3
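A compact cvxpy sketch solving one approximation problem in all three norms (random placeholder data, with the same dimensions as the example on the next slide); cvxpy performs the LP reformulations internally:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
A, b = rng.standard_normal((100, 30)), rng.standard_normal(100)

# same data, three norms: l1, l2, and Chebyshev (l-infinity)
for p in [1, 2, "inf"]:
    x = cp.Variable(30)
    prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, p)))
    prob.solve()
    print(f"p = {p}:  optimal residual norm = {prob.value:.4f}")
```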

example (m = 100, n = 30): histogram of residuals for penalties

    \phi(u) = |u|,    \phi(u) = u^2,    \phi(u) = \max\{0, |u| - a\},    \phi(u) = -\log(1 - u^2)

[figure: four histograms of the residual distribution r, one per penalty: p = 1, p = 2, deadzone-linear, and log barrier]

shape of penalty function has large effect on distribution of residuals

Approximation and fitting 6–5

Huber penalty function (with parameter M)

    \phi_{hub}(u) = { u^2             if |u| \le M
                    { M (2|u| - M)    if |u| > M

linear growth for large u makes approximation less sensitive to outliers

[figure: left, Huber penalty \phi_{hub}(u) for M = 1; right, affine function f(t) = \alpha + \beta t fitted to 42 points (t_i, y_i) (circles) using quadratic (dashed) and Huber (solid) penalty]

Approximation and fitting 6–6
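A minimal cvxpy sketch of robust affine fitting with the Huber penalty; t, y are synthetic data with injected outliers, and a plain least-squares fit is shown for contrast:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(-10, 10, 42)
y = 1.0 + 0.5 * t + 0.3 * rng.standard_normal(42)   # affine model + noise
y[::10] += 8.0                                      # inject a few outliers

alpha, beta = cp.Variable(), cp.Variable()
resid = y - (alpha + beta * t)

# Huber penalty with M = 1; much less sensitive to the outliers
cp.Problem(cp.Minimize(cp.sum(cp.huber(resid, M=1)))).solve()
print("huber fit:        ", alpha.value, beta.value)

ls = np.polyfit(t, y, 1)                            # quadratic-penalty fit
print("least-squares fit:", ls[1], ls[0])
```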

Least-norm problems

    minimize    \|x\|
    subject to  Ax = b

(A \in R^{m \times n} with m \le n, \|\cdot\| is a norm on R^n)

interpretations of solution x^\star = argmin_{Ax = b} \|x\|:

geometric: x^\star is point in affine set {x | Ax = b} with minimum distance to 0
estimation: b = Ax are (perfect) measurements of x; x^\star is smallest (most plausible) estimate consistent with measurements
design: x are design variables (inputs); b are required results (outputs); x^\star is smallest (most efficient) design that satisfies requirements

Approximation and fitting 6–7

examples

least-squares solution of linear equations (\|\cdot\|_2): can be solved via optimality conditions

    2x + A^T \nu = 0,    Ax = b

minimum sum of absolute values (\|\cdot\|_1): can be solved as an LP

    minimize    1^T y
    subject to  -y \preceq x \preceq y,  Ax = b

tends to produce sparse solution x^\star

extension: least-penalty problem

    minimize    \phi(x_1) + \cdots + \phi(x_n)
    subject to  Ax = b

\phi : R \to R is convex penalty function

Approximation and fitting 6–8

Regularized approximation

    minimize (w.r.t. R^2_+)    ( \|Ax - b\|, \|x\| )

A \in R^{m \times n}, norms on R^m and R^n can be different

interpretation: find good approximation Ax \approx b with small x

estimation: linear measurement model y = Ax + v, with prior knowledge that \|x\| is small
optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
robust approximation: good approximation Ax \approx b with small x is less sensitive to errors in A than good approximation with large x

Approximation and fitting 6–9

Scalarized problem

    minimize    \|Ax - b\| + \gamma \|x\|

solution for \gamma > 0 traces out optimal trade-off curve
other common method: minimize \|Ax - b\|^2 + \delta \|x\|^2 with \delta > 0

Tikhonov regularization

    minimize    \|Ax - b\|_2^2 + \delta \|x\|_2^2

can be solved as a least-squares problem

    minimize    \| [A; \sqrt{\delta} I] x - [b; 0] \|_2^2

solution x^\star = (A^T A + \delta I)^{-1} A^T b

Approximation and fitting 6–10
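A numpy sketch verifying that the stacked least-squares formulation reproduces the closed-form Tikhonov solution (random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(7)
A, b = rng.standard_normal((40, 15)), rng.standard_normal(40)
delta = 0.5

# stacked LS: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stk = np.vstack([A, np.sqrt(delta) * np.eye(15)])
b_stk = np.concatenate([b, np.zeros(15)])
x_stacked = np.linalg.lstsq(A_stk, b_stk, rcond=None)[0]

# closed form: x = (A^T A + delta I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(15), A.T @ b)
print(np.allclose(x_stacked, x_closed))       # True
```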

Optimal input design

linear dynamical system with impulse response h:

    y(t) = \sum_{\tau=0}^t h(\tau) u(t - \tau),    t = 0, 1, ..., N

input design problem: multicriterion problem with 3 objectives

1. tracking error with desired output y_des:  J_track = \sum_{t=0}^N (y(t) - y_des(t))^2
2. input magnitude:  J_mag = \sum_{t=0}^N u(t)^2
3. input variation:  J_der = \sum_{t=0}^{N-1} (u(t+1) - u(t))^2

track desired output using a small and slowly varying input signal

regularized least-squares formulation

    minimize    J_track + \delta J_der + \eta J_mag

for fixed \delta, \eta, a least-squares problem in u(0), ..., u(N)

Approximation and fitting 6–11

example: 3 solutions on optimal trade-off curve

(top) \delta = 0, small \eta; (middle) \delta = 0, larger \eta; (bottom) large \delta

[figure: three pairs of plots of the input u(t) and output y(t) for t = 0, ..., 200, one pair per trade-off point]

Approximation and fitting 6–12

Signal reconstruction

    minimize (w.r.t. R^2_+)    ( \|\hat x - x_cor\|_2, \phi(\hat x) )

x \in R^n is unknown signal
x_cor = x + v is (known) corrupted version of x, with additive noise v
variable \hat x (reconstructed signal) is estimate of x
\phi : R^n \to R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

    \phi_quad(\hat x) = \sum_{i=1}^{n-1} (\hat x_{i+1} - \hat x_i)^2,    \phi_tv(\hat x) = \sum_{i=1}^{n-1} |\hat x_{i+1} - \hat x_i|

Approximation and fitting 6–13
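A compact cvxpy sketch comparing the two smoothers on a synthetic piecewise-constant signal (all data made up; the weights are arbitrary trade-off choices):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = np.repeat([0.0, 1.5, -1.0, 0.5], n // 4)  # piecewise-constant signal
x_cor = x + 0.2 * rng.standard_normal(n)      # corrupted observation

def reconstruct(phi, weight):
    # scalarized problem: ||xh - x_cor||^2 + weight * phi(xh)
    xh = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.sum_squares(xh - x_cor)
                           + weight * phi(xh))).solve()
    return xh.value

x_quad = reconstruct(lambda z: cp.sum_squares(cp.diff(z)), 10.0)
x_tv = reconstruct(cp.tv, 1.0)                # total variation smoothing
print(np.abs(x_quad - x).max(), np.abs(x_tv - x).max())
```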

quadratic smoothing example

[figure: left, original signal x and noisy signal x_cor (n = 4000); right, three solutions on the trade-off curve \|\hat x - x_cor\|_2 versus \phi_quad(\hat x)]

Approximation and fitting 6–14

total variation reconstruction example

[figure: left, original signal x and noisy signal x_cor (n = 2000); right, three solutions on the trade-off curve \|\hat x - x_cor\|_2 versus \phi_quad(\hat x)]

quadratic smoothing smooths out noise and sharp transitions in signal

Approximation and fitting 6–15

[figure: left, original signal x and noisy signal x_cor; right, three solutions on the trade-off curve \|\hat x - x_cor\|_2 versus \phi_tv(\hat x)]

total variation smoothing preserves sharp transitions in signal

Approximation and fitting 6–16

Robust approximation

    minimize    \|Ax - b\|    with uncertain A

two approaches:

stochastic: assume A is random, minimize E \|Ax - b\|
worst-case: set A of possible values of A, minimize sup_{A \in A} \|Ax - b\|

tractable only in special cases (certain norms \|\cdot\|, distributions, sets A)

example: A(u) = A_0 + u A_1

x_nom minimizes \|A_0 x - b\|_2^2
x_stoch minimizes E \|A(u) x - b\|_2^2 with u uniform on [-1, 1]
x_wc minimizes sup_{-1 \le u \le 1} \|A(u) x - b\|_2^2

[figure: r(u) = \|A(u) x - b\|_2 as a function of u for x_nom, x_stoch, x_wc]

Approximation and fitting 6–17

stochastic robust LS with A = \bar A + U, U random, E U = 0, E U^T U = P

    minimize    E \|(\bar A + U) x - b\|_2^2

explicit expression for objective:

    E \|Ax - b\|_2^2 = E \|\bar A x - b + U x\|_2^2
                     = \|\bar A x - b\|_2^2 + E x^T U^T U x
                     = \|\bar A x - b\|_2^2 + x^T P x

hence, robust LS problem is equivalent to LS problem

    minimize    \|\bar A x - b\|_2^2 + \|P^{1/2} x\|_2^2

for P = \delta I, get Tikhonov regularized problem

    minimize    \|\bar A x - b\|_2^2 + \delta \|x\|_2^2

Approximation and fitting 6–18
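A numpy Monte Carlo sketch of this identity, for the special case of U with IID N(0, \sigma^2) entries (so E U^T U = m \sigma^2 I = P); all data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 20, 5
A_bar, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.standard_normal(n)
sigma = 0.3                                   # entries of U are N(0, sigma^2)
P = m * sigma**2 * np.eye(n)                  # E U^T U for IID entries

# sample average of ||(A_bar + U)x - b||^2 over random draws of U
samples = [np.sum(((A_bar + sigma * rng.standard_normal((m, n))) @ x - b)**2)
           for _ in range(20000)]
mc = np.mean(samples)

closed = np.sum((A_bar @ x - b)**2) + x @ P @ x
print(mc, closed)                             # approximately equal
```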

worst-case robust LS with A = {\bar A + u_1 A_1 + \cdots + u_p A_p | \|u\|_2 \le 1}

    minimize    sup_{A \in A} \|Ax - b\|_2^2 = sup_{\|u\|_2 \le 1} \|P(x) u + q(x)\|_2^2

where P(x) = [A_1 x   A_2 x   \cdots   A_p x], q(x) = \bar A x - b

from page 5–14, strong duality holds between the following problems

    maximize    \|P u + q\|_2^2          minimize    t + \lambda
    subject to  \|u\|_2^2 \le 1          subject to  [ I          P    q ]
                                                     [ P^T  \lambda I  0 ]  \succeq 0
                                                     [ q^T        0    t ]

hence, robust LS problem is equivalent to SDP

    minimize    t + \lambda
    subject to  [ I          P(x)    q(x) ]
                [ P(x)^T  \lambda I   0   ]  \succeq 0
                [ q(x)^T      0       t   ]

Approximation and fitting 6–19

example: histogram of residuals

    r(u) = \|(A_0 + u_1 A_1 + u_2 A_2) x - b\|_2

with u uniformly distributed on unit disk, for three values of x

[figure: frequency histograms of r(u) for x_ls, x_tik, x_rls]

x_ls minimizes \|A_0 x - b\|_2
x_tik minimizes \|A_0 x - b\|_2^2 + \delta \|x\|_2^2 (Tikhonov solution)
x_rls minimizes the worst case sup_{\|u\|_2 \le 1} \|A(u) x - b\|_2^2

Approximation and fitting 6–20

Convex Optimization Boyd & Vandenberghe

7. Statistical estimation

maximum likelihood estimation
optimal detector design
experiment design

7–1

Parametric distribution estimation

distribution estimation problem: estimate probability density p(y) of a random variable from observed values
parametric distribution estimation: choose from a family of densities p_x(y), indexed by a parameter x

maximum likelihood estimation

    maximize (over x)    log p_x(y)

y is observed value
l(x) = log p_x(y) is called log-likelihood function
can add constraints x \in C explicitly, or define p_x(y) = 0 for x \notin C
a convex optimization problem if log p_x(y) is concave in x for fixed y

Statistical estimation 7–2

Linear measurements with IID noise

linear measurement model

    y_i = a_i^T x + v_i,    i = 1, ..., m

x \in R^n is vector of unknown parameters
v_i is IID measurement noise, with density p(z)
y_i is measurement: y \in R^m has density p_x(y) = \prod_{i=1}^m p(y_i - a_i^T x)

maximum likelihood estimate: any solution x of

    maximize    l(x) = \sum_{i=1}^m \log p(y_i - a_i^T x)

(y is observed value)

Statistical estimation 7–3

examples

Gaussian noise N(0, \sigma^2): p(z) = (2\pi\sigma^2)^{-1/2} e^{-z^2/(2\sigma^2)},

    l(x) = -\frac{m}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^m (a_i^T x - y_i)^2

ML estimate is LS solution

Laplacian noise: p(z) = (1/(2a)) e^{-|z|/a},

    l(x) = -m \log(2a) - \frac{1}{a} \sum_{i=1}^m |a_i^T x - y_i|

ML estimate is l1-norm solution

uniform noise on [-a, a]:

    l(x) = { -m \log(2a)    if |a_i^T x - y_i| \le a, i = 1, ..., m
           { -\infty        otherwise

ML estimate is any x with |a_i^T x - y_i| \le a

Statistical estimation 7–4

Logistic regression

random variable y \in {0, 1} with distribution

    p = prob(y = 1) = \frac{\exp(a^T u + b)}{1 + \exp(a^T u + b)}

a, b are parameters; u \in R^n are (observable) explanatory variables

estimation problem: estimate a, b from m observations (u_i, y_i)

log-likelihood function (for y_1 = \cdots = y_k = 1, y_{k+1} = \cdots = y_m = 0):

    l(a, b) = \log ( \prod_{i=1}^k \frac{\exp(a^T u_i + b)}{1 + \exp(a^T u_i + b)} \prod_{i=k+1}^m \frac{1}{1 + \exp(a^T u_i + b)} )
            = \sum_{i=1}^k (a^T u_i + b) - \sum_{i=1}^m \log(1 + \exp(a^T u_i + b))

concave in a, b

Statistical estimation 7–5
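A minimal cvxpy sketch maximizing this concave log-likelihood on synthetic (u_i, y_i); cp.logistic(z) is log(1 + e^z), so the objective below matches l(a, b):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(10)
m = 50
u = rng.uniform(0, 10, m)
# synthetic labels drawn from a logistic model centered at u = 5
y = (rng.uniform(size=m) < 1 / (1 + np.exp(-(u - 5)))).astype(float)

a, b = cp.Variable(), cp.Variable()
z = a * u + b
# l(a, b) = sum_{i: y_i = 1} (a u_i + b) - sum_i log(1 + exp(a u_i + b))
loglik = y @ z - cp.sum(cp.logistic(z))
cp.Problem(cp.Maximize(loglik)).solve()
print(a.value, b.value)                       # ML estimates of a, b
```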

example (n = 1, m = 50 measurements)

[figure: circles show 50 points (u_i, y_i); solid curve is ML estimate of p = \exp(au + b)/(1 + \exp(au + b))]

Statistical estimation 7–6

(Binary) hypothesis testing

detection (hypothesis testing) problem

given observation of a random variable X \in {1, ..., n}, choose between:

hypothesis 1: X was generated by distribution p = (p_1, ..., p_n)
hypothesis 2: X was generated by distribution q = (q_1, ..., q_n)

randomized detector

a nonnegative matrix T \in R^{2 \times n}, with 1^T T = 1^T
if we observe X = k, we choose hypothesis 1 with probability t_{1k}, hypothesis 2 with probability t_{2k}
if all elements of T are 0 or 1, it is called a deterministic detector

Statistical estimation 7–7

detection probability matrix:

    D = [T p   T q] = [ 1 - P_fp    P_fn     ]
                      [ P_fp        1 - P_fn ]

P_fp is probability of selecting hypothesis 2 if X is generated by distribution 1 (false positive)
P_fn is probability of selecting hypothesis 1 if X is generated by distribution 2 (false negative)

multicriterion formulation of detector design

    minimize (w.r.t. R^2_+)    (P_fp, P_fn) = ((T p)_2, (T q)_1)
    subject to                 t_{1k} + t_{2k} = 1,  k = 1, ..., n
                               t_{ik} \ge 0,  i = 1, 2,  k = 1, ..., n

variable T \in R^{2 \times n}

Statistical estimation 7–8

scalarization (with weight \lambda > 0)

    minimize    (T p)_2 + \lambda (T q)_1
    subject to  t_{1k} + t_{2k} = 1,  t_{ik} \ge 0,  i = 1, 2,  k = 1, ..., n

an LP with a simple analytical solution

    (t_{1k}, t_{2k}) = { (1, 0)    if p_k \ge \lambda q_k
                       { (0, 1)    if p_k < \lambda q_k

a deterministic detector, given by a likelihood ratio test
if p_k = \lambda q_k for some k, any value 0 \le t_{1k} \le 1, t_{2k} = 1 - t_{1k} is optimal (i.e., Pareto-optimal detectors include non-deterministic detectors)

minimax detector

    minimize    max{P_fp, P_fn} = max{(T p)_2, (T q)_1}
    subject to  t_{1k} + t_{2k} = 1,  t_{ik} \ge 0,  i = 1, 2,  k = 1, ..., n

an LP; solution is usually not deterministic

Statistical estimation 7–9

example

    P = [p  q] = [ 0.70  0.10 ]
                 [ 0.20  0.10 ]
                 [ 0.05  0.70 ]
                 [ 0.05  0.10 ]

[figure: trade-off curve of P_fn versus P_fp; solutions 1, 2, 3 (and endpoints) are deterministic; 4 is minimax detector]

Statistical estimation 7–10

Experiment design

m linear measurements y_i = a_i^T x + w_i, i = 1, ..., m of unknown x \in R^n

measurement errors w_i are IID N(0, 1)
ML (least-squares) estimate is

    \hat x = ( \sum_{i=1}^m a_i a_i^T )^{-1} \sum_{i=1}^m y_i a_i

error e = \hat x - x has zero mean and covariance

    E = E e e^T = ( \sum_{i=1}^m a_i a_i^T )^{-1}

confidence ellipsoids are given by {x | (x - \hat x)^T E^{-1} (x - \hat x) \le \beta}

experiment design: choose a_i \in {v_1, ..., v_p} (a set of possible test vectors) to make E small

Statistical estimation 7–11

vector optimization formulation

    minimize (w.r.t. S^n_+