

    Convex Optimization Boyd & Vandenberghe

    1. Introduction

    mathematical optimization

    least-squares and linear programming

    convex optimization

    example

    course goals and topics

    nonlinear optimization

    brief history of convex optimization


    Mathematical optimization

(mathematical) optimization problem

    minimize   f_0(x)
    subject to f_i(x) ≤ b_i,  i = 1, ..., m

x = (x_1, ..., x_n): optimization variables

f_0 : R^n → R: objective function

f_i : R^n → R, i = 1, ..., m: constraint functions

optimal solution x* has smallest value of f_0 among all vectors that satisfy the constraints


    Examples

portfolio optimization

    variables: amounts invested in different assets
    constraints: budget, max./min. investment per asset, minimum return
    objective: overall risk or return variance

device sizing in electronic circuits

    variables: device widths and lengths
    constraints: manufacturing limits, timing requirements, maximum area
    objective: power consumption

data fitting

    variables: model parameters
    constraints: prior information, parameter limits
    objective: measure of misfit or prediction error


    Solving optimization problems

general optimization problem

    very difficult to solve

    methods involve some compromise, e.g., very long computation time, or not always finding the solution

exceptions: certain problem classes can be solved efficiently and reliably

    least-squares problems

    linear programming problems

    convex optimization problems


    Least-squares

    minimize ‖Ax − b‖_2^2

solving least-squares problems

    analytical solution: x* = (A^T A)^{-1} A^T b

    reliable and efficient algorithms and software

    computation time proportional to n^2 k (A ∈ R^{k×n}); less if structured

    a mature technology

using least-squares

    least-squares problems are easy to recognize

    a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
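A quick numerical illustration (not part of the original slides; the data A, b are random stand-ins): the analytical formula agrees with a library least-squares solver.

import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 5
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# analytical solution x* = (A^T A)^{-1} A^T b via the normal equations
x_formula = np.linalg.solve(A.T @ A, A.T @ b)

# numerically preferred QR/SVD-based solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_formula, x_lstsq))  # True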


    Linear programming

    minimize   c^T x
    subject to a_i^T x ≤ b_i,  i = 1, ..., m

solving linear programs

    no analytical formula for solution

    reliable and efficient algorithms and software

    computation time proportional to n^2 m if m ≥ n; less with structure

    a mature technology

using linear programming

    not as easy to recognize as least-squares problems

    a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ_1- or ℓ_∞-norms, piecewise-linear functions)


    Convex optimization problem

    minimize   f_0(x)
    subject to f_i(x) ≤ b_i,  i = 1, ..., m

objective and constraint functions are convex:

    f_i(αx + βy) ≤ αf_i(x) + βf_i(y)    if α + β = 1, α ≥ 0, β ≥ 0

    includes least-squares problems and linear programs as special cases


    solving convex optimization problems

    no analytical solution

    reliable and efficient algorithms

    computation time (roughly) proportional to max{n^3, n^2 m, F}, where F is cost of evaluating the f_i's and their first and second derivatives

    almost a technology

using convex optimization

    often difficult to recognize

    many tricks for transforming problems into convex form

    surprisingly many problems can be solved via convex optimization


    Example

m lamps illuminating n (small, flat) patches

(figure: lamp j at distance r_kj and angle θ_kj from patch k)

intensity I_k at patch k depends linearly on lamp powers p_j:

    I_k = ∑_{j=1}^m a_kj p_j,    a_kj = r_kj^{-2} max{cos θ_kj, 0}

problem: achieve desired illumination I_des with bounded lamp powers

    minimize   max_{k=1,...,n} |log I_k − log I_des|
    subject to 0 ≤ p_j ≤ p_max,  j = 1, ..., m


    how to solve?

1. use uniform power: p_j = p, vary p

2. use least-squares:

       minimize ∑_{k=1}^n (I_k − I_des)^2

   round p_j if p_j > p_max or p_j < 0

3. use weighted least-squares:

       minimize ∑_{k=1}^n (I_k − I_des)^2 + ∑_{j=1}^m w_j (p_j − p_max/2)^2

   iteratively adjust weights w_j until 0 ≤ p_j ≤ p_max

4. use linear programming:

       minimize   max_{k=1,...,n} |I_k − I_des|
       subject to 0 ≤ p_j ≤ p_max,  j = 1, ..., m

   which can be solved via linear programming

of course these are approximate (suboptimal) solutions


5. use convex optimization: problem is equivalent to

       minimize   f_0(p) = max_{k=1,...,n} h(I_k / I_des)
       subject to 0 ≤ p_j ≤ p_max,  j = 1, ..., m

   with h(u) = max{u, 1/u}

   (figure: plot of h(u) over 0 ≤ u ≤ 4; h equals 1 at u = 1 and grows in both directions)

f_0 is convex because maximum of convex functions is convex

exact solution obtained with effort ≈ modest factor × least-squares effort
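A minimal CVXPY sketch of this convex formulation (not from the slides; CVXPY is an assumed tool and the coefficients a_kj below are invented):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 30                       # lamps, patches (arbitrary sizes)
A = rng.uniform(0.1, 1.0, (n, m))   # assumed illumination coefficients a_kj
I_des, p_max = 1.0, 1.0

p = cp.Variable(m)
I = A @ p
# max_k h(I_k/I_des) with h(u) = max{u, 1/u}; inv_pos keeps it convex
objective = cp.Minimize(cp.maximum(cp.max(I / I_des),
                                   cp.max(I_des * cp.inv_pos(I))))
prob = cp.Problem(objective, [p >= 0, p <= p_max])
prob.solve()
print(round(prob.value, 4))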


additional constraints: does adding (1) or (2) below complicate the problem?

1. no more than half of total power is in any 10 lamps

2. no more than half of the lamps are on (p_j > 0)

answer: with (1), still easy to solve; with (2), extremely difficult

moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult problems


    Course goals and topics

    goals

1. recognize/formulate problems (such as the illumination problem) as convex optimization problems

2. develop code for problems of moderate size (1000 lamps, 5000 patches)

3. characterize optimal solution (optimal power distribution), give limits of performance, etc.

    topics

    1. convex sets, functions, optimization problems

    2. examples and applications

    3. algorithms


    Nonlinear optimization

    traditional techniques for general nonconvex problems involve compromises

local optimization methods (nonlinear programming)

    find a point that minimizes f_0 among feasible points near it

    fast, can handle large problems

    require initial guess

    provide no information about distance to (global) optimum

global optimization methods

    find the (global) solution

    worst-case complexity grows exponentially with problem size

    these algorithms are often based on solving convex subproblems


    Brief history of convex optimization

theory (convex analysis): ca. 1900–1970

algorithms

    1947: simplex algorithm for linear programming (Dantzig)

    1960s: early interior-point methods (Fiacco & McCormick, Dikin, ...)

    1970s: ellipsoid method and other subgradient methods

    1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)

    late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

applications

    before 1990: mostly in operations research; few in engineering

    since 1990: many new applications in engineering (control, signal processing, communications, circuit design, ...); new problem classes (semidefinite and second-order cone programming, robust optimization)


    Convex Optimization Boyd & Vandenberghe

    2. Convex sets

    affine and convex sets

    some important examples

    operations that preserve convexity

    generalized inequalities

    separating and supporting hyperplanes

    dual cones and generalized inequalities


    Affine set

line through x_1, x_2: all points

    x = θx_1 + (1 − θ)x_2    (θ ∈ R)

(figure: points on the line through x_1 and x_2 for θ = 1.2, 1, 0.6, 0, −0.2)

affine set: contains the line through any two distinct points in the set

example: solution set of linear equations {x | Ax = b}

(conversely, every affine set can be expressed as solution set of a system of linear equations)


    Convex set

line segment between x_1 and x_2: all points

    x = θx_1 + (1 − θ)x_2

with 0 ≤ θ ≤ 1

convex set: contains line segment between any two points in the set

    x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1 − θ)x_2 ∈ C

examples (one convex, two nonconvex sets)


    Convex combination and convex hull

convex combination of x_1, ..., x_k: any point x of the form

    x = θ_1 x_1 + θ_2 x_2 + ··· + θ_k x_k

with θ_1 + ··· + θ_k = 1, θ_i ≥ 0

convex hull conv S: set of all convex combinations of points in S


    Convex cone

conic (nonnegative) combination of x_1 and x_2: any point of the form

    x = θ_1 x_1 + θ_2 x_2

with θ_1 ≥ 0, θ_2 ≥ 0

convex cone: set that contains all conic combinations of points in the set


    Hyperplanes and halfspaces

hyperplane: set of the form {x | a^T x = b} (a ≠ 0)

halfspace: set of the form {x | a^T x ≤ b} (a ≠ 0)

(figure: a hyperplane through x_0 with normal a, and the two halfspaces a^T x ≥ b and a^T x ≤ b that it bounds)

    a is the normal vector

    hyperplanes are affine and convex; halfspaces are convex


    Euclidean balls and ellipsoids

(Euclidean) ball with center x_c and radius r:

    B(x_c, r) = {x | ‖x − x_c‖_2 ≤ r} = {x_c + ru | ‖u‖_2 ≤ 1}

ellipsoid: set of the form

    {x | (x − x_c)^T P^{-1} (x − x_c) ≤ 1}

with P ∈ S^n_{++} (i.e., P symmetric positive definite)

other representation: {x_c + Au | ‖u‖_2 ≤ 1} with A square and nonsingular


    Norm balls and norm cones

norm: a function ‖·‖ that satisfies

    ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0

    ‖tx‖ = |t| ‖x‖ for t ∈ R

    ‖x + y‖ ≤ ‖x‖ + ‖y‖

notation: ‖·‖ is general (unspecified) norm; ‖·‖_symb is particular norm

norm ball with center x_c and radius r: {x | ‖x − x_c‖ ≤ r}

norm cone: {(x, t) | ‖x‖ ≤ t}

Euclidean norm cone is called second-order cone

(figure: the second-order cone in R^3, plotted over (x_1, x_2, t))

norm balls and cones are convex


    Polyhedra

solution set of finitely many linear inequalities and equalities

    Ax ⪯ b,  Cx = d

(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)

(figure: polyhedron P bounded by hyperplanes with normals a_1, ..., a_5)

polyhedron is intersection of finite number of halfspaces and hyperplanes


    Positive semidenite cone

notation:

    S^n is set of symmetric n × n matrices

    S^n_+ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n × n matrices

        X ∈ S^n_+  ⟺  z^T X z ≥ 0 for all z

    S^n_+ is a convex cone

    S^n_{++} = {X ∈ S^n | X ≻ 0}: positive definite n × n matrices

example: [x  y; y  z] ∈ S^2_+

(figure: boundary surface of S^2_+, plotted over (x, y, z))


    Operations that preserve convexity

    practical methods for establishing convexity of a set C

1. apply definition

       x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1 − θ)x_2 ∈ C

2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity

    intersection

    affine functions

    perspective function

    linear-fractional functions


    Intersection

    the intersection of (any number of) convex sets is convex

    example:

    S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}

where p(t) = x_1 cos t + x_2 cos 2t + ··· + x_m cos mt

(figures, for m = 2: the band |p(t)| ≤ 1 over 0 ≤ t ≤ π/3, and the resulting region S in the (x_1, x_2)-plane)


    Affine function

suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)

the image of a convex set under f is convex

    S ⊆ R^n convex  ⟹  f(S) = {f(x) | x ∈ S} convex

the inverse image f^{-1}(C) of a convex set under f is convex

    C ⊆ R^m convex  ⟹  f^{-1}(C) = {x ∈ R^n | f(x) ∈ C} convex

examples

    scaling, translation, projection

    solution set of linear matrix inequality {x | x_1 A_1 + ··· + x_m A_m ⪯ B} (with A_i, B ∈ S^p)

    hyperbolic cone {x | x^T P x ≤ (c^T x)^2, c^T x ≥ 0} (with P ∈ S^n_+)


    Perspective and linear-fractional function

perspective function P : R^{n+1} → R^n:

    P(x, t) = x/t,    dom P = {(x, t) | t > 0}

images and inverse images of convex sets under perspective are convex

linear-fractional function f : R^n → R^m:

    f(x) = (Ax + b) / (c^T x + d),    dom f = {x | c^T x + d > 0}

images and inverse images of convex sets under linear-fractional functions are convex


    example of a linear-fractional function

    f(x) = x / (x_1 + x_2 + 1)

(figures: a convex set C in the (x_1, x_2)-plane and its image f(C))


    Generalized inequalities

a convex cone K ⊆ R^n is a proper cone if

    K is closed (contains its boundary)

    K is solid (has nonempty interior)

    K is pointed (contains no line)

examples

    nonnegative orthant K = R^n_+ = {x ∈ R^n | x_i ≥ 0, i = 1, ..., n}

    positive semidefinite cone K = S^n_+

    nonnegative polynomials on [0, 1]:

        K = {x ∈ R^n | x_1 + x_2 t + x_3 t^2 + ··· + x_n t^{n−1} ≥ 0 for t ∈ [0, 1]}


generalized inequality defined by a proper cone K:

    x ⪯_K y  ⟺  y − x ∈ K,    x ≺_K y  ⟺  y − x ∈ int K

examples

    componentwise inequality (K = R^n_+)

        x ⪯_{R^n_+} y  ⟺  x_i ≤ y_i,  i = 1, ..., n

    matrix inequality (K = S^n_+)

        X ⪯_{S^n_+} Y  ⟺  Y − X positive semidefinite

these two types are so common that we drop the subscript in ⪯_K

properties: many properties of ⪯_K are similar to ≤ on R, e.g.,

    x ⪯_K y, u ⪯_K v  ⟹  x + u ⪯_K y + v


    Minimum and minimal elements

⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x

x ∈ S is the minimum element of S with respect to ⪯_K if

    y ∈ S  ⟹  x ⪯_K y

x ∈ S is a minimal element of S with respect to ⪯_K if

    y ∈ S, y ⪯_K x  ⟹  y = x

example (K = R^2_+)

    x_1 is the minimum element of S_1

    x_2 is a minimal element of S_2

(figure: sets S_1 and S_2 with the points x_1 and x_2 marked)


    Separating hyperplane theorem

if C and D are disjoint convex sets, then there exists a ≠ 0, b such that

    a^T x ≤ b for x ∈ C,    a^T x ≥ b for x ∈ D

(figure: disjoint sets C and D separated by a hyperplane with normal a)

the hyperplane {x | a^T x = b} separates C and D

strict separation requires additional assumptions (e.g., C is closed, D is a singleton)


    Supporting hyperplane theorem

supporting hyperplane to set C at boundary point x_0:

    {x | a^T x = a^T x_0}

where a ≠ 0 and a^T x ≤ a^T x_0 for all x ∈ C

(figure: set C with supporting hyperplane and outward normal a at x_0)

supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C


    Dual cones and generalized inequalities

dual cone of a cone K:

    K* = {y | y^T x ≥ 0 for all x ∈ K}

examples

    K = R^n_+:  K* = R^n_+

    K = S^n_+:  K* = S^n_+

    K = {(x, t) | ‖x‖_2 ≤ t}:  K* = {(x, t) | ‖x‖_2 ≤ t}

    K = {(x, t) | ‖x‖_1 ≤ t}:  K* = {(x, t) | ‖x‖_∞ ≤ t}

first three examples are self-dual cones

dual cones of proper cones are proper, hence define generalized inequalities:

    y ⪰_{K*} 0  ⟺  y^T x ≥ 0 for all x ⪰_K 0


    Minimum and minimal elements via dual inequalities

minimum element w.r.t. ⪯_K

    x is minimum element of S iff for all λ ≻_{K*} 0, x is the unique minimizer of λ^T z over S

(figure: set S supported at its minimum element x by a hyperplane with normal λ)

minimal element w.r.t. ⪯_K

    if x minimizes λ^T z over S for some λ ≻_{K*} 0, then x is minimal

    if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λ^T z over S

(figure: set S with minimal elements x_1, x_2 and supporting normals λ_1, λ_2)


    optimal production frontier

different production methods use different amounts of resources x ∈ R^n

production set P: resource vectors x for all possible production methods

efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R^n_+

example (n = 2): x_1, x_2, x_3 are efficient; x_4, x_5 are not

(figure: production set P plotted with axes labor and fuel, with x_1, ..., x_5 marked)


    Convex Optimization Boyd & Vandenberghe

    3. Convex functions

    basic properties and examples

    operations that preserve convexity

    the conjugate function

    quasiconvex functions

    log-concave and log-convex functions

    convexity with respect to generalized inequalities


    Denition

f : R^n → R is convex if dom f is a convex set and

    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

(figure: graph of a convex f, with the chord from (x, f(x)) to (y, f(y)) lying above the graph)

f is concave if −f is convex

f is strictly convex if dom f is convex and

    f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, x ≠ y, 0 < θ < 1


    Examples on R

convex:

    affine: ax + b on R, for any a, b ∈ R

    exponential: e^{ax}, for any a ∈ R

    powers: x^α on R_{++}, for α ≥ 1 or α ≤ 0

    powers of absolute value: |x|^p on R, for p ≥ 1

    negative entropy: x log x on R_{++}

concave:

    affine: ax + b on R, for any a, b ∈ R

    powers: x^α on R_{++}, for 0 ≤ α ≤ 1

    logarithm: log x on R_{++}


Examples on R^n and R^{m×n}

affine functions are convex and concave; all norms are convex

examples on R^n

    affine function f(x) = a^T x + b

    norms: ‖x‖_p = (∑_{i=1}^n |x_i|^p)^{1/p} for p ≥ 1; ‖x‖_∞ = max_k |x_k|

examples on R^{m×n} (m × n matrices)

    affine function

        f(X) = tr(A^T X) + b = ∑_{i=1}^m ∑_{j=1}^n A_ij X_ij + b

    spectral (maximum singular value) norm

        f(X) = ‖X‖_2 = σ_max(X) = (λ_max(X^T X))^{1/2}


    Restriction of a convex function to a line

f : R^n → R is convex if and only if the function g : R → R,

    g(t) = f(x + tv),    dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ R^n

can check convexity of f by checking convexity of functions of one variable

example. f : S^n → R with f(X) = log det X, dom f = S^n_{++}

    g(t) = log det(X + tV) = log det X + log det(I + t X^{-1/2} V X^{-1/2})

         = log det X + ∑_{i=1}^n log(1 + t λ_i)

where λ_i are the eigenvalues of X^{-1/2} V X^{-1/2}

g is concave in t (for any choice of X ≻ 0, V); hence f is concave
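A numerical sanity check of this example (not in the slides; X and V below are random, with X positive definite): sample g(t) = log det(X + tV) on a grid and verify the second differences are nonpositive.

import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)          # random positive definite X
V = rng.standard_normal((n, n))
V = (V + V.T) / 2                    # random symmetric direction

ts = np.linspace(-0.1, 0.1, 201)     # small interval keeps X + tV positive definite
g = np.array([np.linalg.slogdet(X + t * V)[1] for t in ts])
second_diff = g[:-2] - 2 * g[1:-1] + g[2:]
print((second_diff <= 1e-9).all())   # True: g is concave along the line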


    Extended-value extension

extended-value extension f̃ of f is

    f̃(x) = f(x),  x ∈ dom f;    f̃(x) = ∞,  x ∉ dom f

often simplifies notation; for example, the condition

    0 ≤ θ ≤ 1  ⟹  f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)

(as an inequality in R ∪ {∞}) means the same as the two conditions

    dom f is convex

    for x, y ∈ dom f,

        0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)


    First-order condition

f is differentiable if dom f is open and the gradient

    ∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_n)

exists at each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex iff

    f(y) ≥ f(x) + ∇f(x)^T (y − x)    for all x, y ∈ dom f

(figure: graph of f and its tangent f(x) + ∇f(x)^T (y − x) at (x, f(x)))

first-order approximation of f is global underestimator


    Second-order conditions

f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

    ∇²f(x)_ij = ∂²f(x) / ∂x_i ∂x_j,    i, j = 1, ..., n,

exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain

    f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f

    if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex


    Examples

quadratic function: f(x) = (1/2) x^T P x + q^T x + r (with P ∈ S^n)

    ∇f(x) = Px + q,    ∇²f(x) = P

convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖_2²

    ∇f(x) = 2A^T (Ax − b),    ∇²f(x) = 2A^T A

convex (for any A)

quadratic-over-linear: f(x, y) = x²/y

    ∇²f(x, y) = (2/y³) [y, −x] [y, −x]^T ⪰ 0

convex for y > 0

(figure: surface plot of f(x, y) = x²/y)


log-sum-exp: f(x) = log ∑_{k=1}^n exp x_k is convex

    ∇²f(x) = (1 / 1^T z) diag(z) − (1 / (1^T z)²) z z^T    (z_k = exp x_k)

to show ∇²f(x) ⪰ 0, we must verify that v^T ∇²f(x) v ≥ 0 for all v:

    v^T ∇²f(x) v = [ (∑_k z_k v_k²)(∑_k z_k) − (∑_k v_k z_k)² ] / (∑_k z_k)² ≥ 0

since (∑_k v_k z_k)² ≤ (∑_k z_k v_k²)(∑_k z_k) (from the Cauchy-Schwarz inequality)

geometric mean: f(x) = (∏_{k=1}^n x_k)^{1/n} on R^n_{++} is concave

(similar proof as for log-sum-exp)
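A small numerical check (not in the slides) that the log-sum-exp Hessian above is positive semidefinite at random points:

import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(6)
    z = np.exp(x)
    s = z.sum()
    H = np.diag(z) / s - np.outer(z, z) / s**2   # the Hessian formula above
    assert np.linalg.eigvalsh(H).min() >= -1e-12  # PSD up to round-off
print("Hessian PSD at all sampled points")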


    Epigraph and sublevel set

α-sublevel set of f : R^n → R:

    C_α = {x ∈ dom f | f(x) ≤ α}

sublevel sets of convex functions are convex (converse is false)

epigraph of f : R^n → R:

    epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

(figure: graph of f with the region epi f above it shaded)

f is convex if and only if epi f is a convex set


Jensen's inequality

basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,

    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

extension: if f is convex, then

    f(E z) ≤ E f(z)

for any random variable z

basic inequality is special case with discrete distribution

    prob(z = x) = θ,    prob(z = y) = 1 − θ


    Operations that preserve convexity

    practical methods for establishing convexity of a function

1. verify definition (often simplified by restricting to a line)

2. for twice differentiable functions, show ∇²f(x) ⪰ 0

3. show that f is obtained from simple convex functions by operations that preserve convexity

    nonnegative weighted sum

    composition with affine function

    pointwise maximum and supremum

    composition

    minimization

    perspective


    Positive weighted sum & composition with affine function

nonnegative multiple: αf is convex if f is convex, α ≥ 0

sum: f_1 + f_2 convex if f_1, f_2 convex (extends to infinite sums, integrals)

composition with affine function: f(Ax + b) is convex if f is convex

examples

    log barrier for linear inequalities

        f(x) = −∑_{i=1}^m log(b_i − a_i^T x),    dom f = {x | a_i^T x < b_i, i = 1, ..., m}

    (any) norm of affine function: f(x) = ‖Ax + b‖


    Pointwise maximum


if f_1, ..., f_m are convex, then f(x) = max{f_1(x), ..., f_m(x)} is convex

examples

    piecewise-linear function: f(x) = max_{i=1,...,m} (a_i^T x + b_i) is convex

    sum of r largest components of x ∈ R^n:

        f(x) = x_[1] + x_[2] + ··· + x_[r]

    is convex (x_[i] is ith largest component of x)

    proof:

        f(x) = max{x_{i_1} + x_{i_2} + ··· + x_{i_r} | 1 ≤ i_1 < i_2 < ··· < i_r ≤ n}


    Pointwise supremum


if f(x, y) is convex in x for each y ∈ A, then

    g(x) = sup_{y ∈ A} f(x, y)

is convex

examples

    support function of a set C: S_C(x) = sup_{y ∈ C} y^T x is convex

    distance to farthest point in a set C:

        f(x) = sup_{y ∈ C} ‖x − y‖

    maximum eigenvalue of symmetric matrix: for X ∈ S^n,

        λ_max(X) = sup_{‖y‖_2 = 1} y^T X y


    Composition with scalar functions


composition of g : R^n → R and h : R → R:

    f(x) = h(g(x))

f is convex if

    g convex, h convex, h̃ nondecreasing

    g concave, h convex, h̃ nonincreasing

proof (for n = 1, differentiable g, h):

    f''(x) = h''(g(x)) g'(x)² + h'(g(x)) g''(x)

note: monotonicity must hold for extended-value extension h̃

examples

    exp g(x) is convex if g is convex

    1/g(x) is convex if g is concave and positive


    Vector composition


composition of g : R^n → R^k and h : R^k → R:

    f(x) = h(g(x)) = h(g_1(x), g_2(x), ..., g_k(x))

f is convex if

    g_i convex, h convex, h̃ nondecreasing in each argument

    g_i concave, h convex, h̃ nonincreasing in each argument

proof (for n = 1, differentiable g, h):

    f''(x) = g'(x)^T ∇²h(g(x)) g'(x) + ∇h(g(x))^T g''(x)

examples

    ∑_{i=1}^m log g_i(x) is concave if g_i are concave and positive

    log ∑_{i=1}^m exp g_i(x) is convex if g_i are convex


    Minimization


if f(x, y) is convex in (x, y) and C is a convex set, then

    g(x) = inf_{y ∈ C} f(x, y)

is convex

examples

    f(x, y) = x^T A x + 2 x^T B y + y^T C y with

        [A  B; B^T  C] ⪰ 0,    C ≻ 0

    minimizing over y gives g(x) = inf_y f(x, y) = x^T (A − B C^{-1} B^T) x

    g is convex, hence Schur complement A − B C^{-1} B^T ⪰ 0

    distance to a set: dist(x, S) = inf_{y ∈ S} ‖x − y‖ is convex if S is convex


    Perspective


the perspective of a function f : R^n → R is the function g : R^n × R → R,

    g(x, t) = t f(x/t),    dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex

examples

    f(x) = x^T x is convex; hence g(x, t) = x^T x / t is convex for t > 0

    negative logarithm f(x) = −log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R²_{++}

    if f is convex, then

        g(x) = (c^T x + d) f( (Ax + b) / (c^T x + d) )

    is convex on {x | c^T x + d > 0, (Ax + b)/(c^T x + d) ∈ dom f}

    The conjugate function


the conjugate of a function f is

    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

(figure: graph of f with a supporting line of slope y; the intercept is (0, −f*(y)))

f* is convex (even if f is not)

will be useful in chapter 5


    examples


negative logarithm f(x) = −log x

    f*(y) = sup_{x > 0} (xy + log x)

          = −1 − log(−y)  if y < 0,    ∞ otherwise

strictly convex quadratic f(x) = (1/2) x^T Q x with Q ∈ S^n_{++}

    f*(y) = sup_x (y^T x − (1/2) x^T Q x)

          = (1/2) y^T Q^{-1} y
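A numerical check of the quadratic conjugate formula (not in the slides; Q and y are random stand-ins), computing the supremum with a generic optimizer:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)            # random Q in S^n_{++}
y = rng.standard_normal(n)

# f*(y) = sup_x (y^T x - (1/2) x^T Q x), computed numerically
res = minimize(lambda x: 0.5 * x @ Q @ x - y @ x, np.zeros(n))
closed_form = 0.5 * y @ np.linalg.solve(Q, y)  # (1/2) y^T Q^{-1} y
print(np.isclose(-res.fun, closed_form))       # True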

Quasiconvex functions

f : R^n → R is quasiconvex if dom f is convex and the sublevel sets

    S_α = {x ∈ dom f | f(x) ≤ α}

are convex for all α

f is quasiconcave if −f is quasiconvex; f is quasilinear if it is quasiconvex and quasiconcave

Examples

    √|x| is quasiconvex on R

    ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear

    log x is quasilinear on R_{++}

    f(x_1, x_2) = x_1 x_2 is quasiconcave on R²_{++}

    linear-fractional function

        f(x) = (a^T x + b) / (c^T x + d),    dom f = {x | c^T x + d > 0}

    is quasilinear

    distance ratio

        f(x) = ‖x − a‖_2 / ‖x − b‖_2,    dom f = {x | ‖x − a‖_2 ≤ ‖x − b‖_2}

    is quasiconvex


    internal rate of return


cash flow x = (x_0, ..., x_n); x_i is payment in period i (to us if x_i > 0)

we assume x_0 < 0 and x_0 + x_1 + ··· + x_n > 0

present value of cash flow x, for interest rate r:

    PV(x, r) = ∑_{i=0}^n (1 + r)^{−i} x_i

internal rate of return is smallest interest rate for which PV(x, r) = 0:

    IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}

IRR is quasiconcave: superlevel set is intersection of halfspaces

    IRR(x) ≥ R  ⟺  ∑_{i=0}^n (1 + r)^{−i} x_i ≥ 0 for 0 ≤ r ≤ R
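Since PV(x, r) is decreasing in r for such cash flows, IRR(x) can be computed by bisection; a small sketch (not from the slides; the cash flow is invented):

def pv(x, r):
    """Present value of cash flow x at interest rate r."""
    return sum(xi / (1 + r) ** i for i, xi in enumerate(x))

def irr(x, lo=0.0, hi=10.0, tol=1e-9):
    """Bisection for the smallest r >= 0 with PV(x, r) = 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pv(x, mid) > 0:   # PV still positive: the rate must be higher
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(irr([-1.0, 0.6, 0.6]), 4))  # ~0.1307 per period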


    Properties


modified Jensen inequality: for quasiconvex f

    0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

first-order condition: differentiable f with convex domain is quasiconvex iff

    f(y) ≤ f(x)  ⟹  ∇f(x)^T (y − x) ≤ 0

(figure: a sublevel set through x, with ∇f(x) normal to a supporting hyperplane)

sums of quasiconvex functions are not necessarily quasiconvex


    Log-concave and log-convex functions


a positive function f is log-concave if log f is concave:

    f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ}    for 0 ≤ θ ≤ 1

f is log-convex if log f is convex

powers: x^a on R_{++} is log-convex for a ≤ 0, log-concave for a ≥ 0

many common probability densities are log-concave, e.g., normal:

    f(x) = (1 / √((2π)^n det Σ)) e^{−(1/2)(x − x̄)^T Σ^{-1} (x − x̄)}

cumulative Gaussian distribution function Φ is log-concave

    Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du

    Properties of log-concave functions


twice differentiable f with convex domain is log-concave if and only if

    f(x) ∇²f(x) ⪯ ∇f(x) ∇f(x)^T    for all x ∈ dom f

product of log-concave functions is log-concave

sum of log-concave functions is not always log-concave

integration: if f : R^n × R^m → R is log-concave, then

    g(x) = ∫ f(x, y) dy

is log-concave (not easy to show)

    consequences of integration property


convolution f ∗ g of log-concave functions f, g is log-concave

    (f ∗ g)(x) = ∫ f(x − y) g(y) dy

if C ⊆ R^n convex and y is a random variable with log-concave pdf, then

    f(x) = prob(x + y ∈ C)

is log-concave

proof: write f(x) as integral of product of log-concave functions

    f(x) = ∫ g(x + y) p(y) dy,    g(u) = 1 if u ∈ C, 0 if u ∉ C;  p is pdf of y

    example: yield function


    Y(x) = prob(x + w ∈ S)

x ∈ R^n: nominal parameter values for product

w ∈ R^n: random variations of parameters in manufactured product

S: set of acceptable values

if S is convex and w has a log-concave pdf, then

    Y is log-concave

    yield regions {x | Y(x) ≥ α} are convex

    Convexity with respect to generalized inequalities


f : R^n → R^m is K-convex if dom f is convex and

    f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, 0 ≤ θ ≤ 1

example f : S^m → S^m, f(X) = X² is S^m_+-convex

proof: for fixed z ∈ R^m, z^T X² z = ‖Xz‖_2² is convex in X, i.e.,

    z^T (θX + (1 − θ)Y)² z ≤ θ z^T X² z + (1 − θ) z^T Y² z

for X, Y ∈ S^m, 0 ≤ θ ≤ 1

therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²

Convex Optimization Boyd & Vandenberghe


    4. Convex optimization problems

    optimization problem in standard form

    convex optimization problems

    quasiconvex optimization

    linear optimization

    quadratic optimization

    geometric programming

    generalized inequality constraints

    semidefinite programming

    vector optimization

    Optimization problem in standard form


    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

x ∈ R^n is the optimization variable

f_0 : R^n → R is the objective or cost function

f_i : R^n → R, i = 1, ..., m, are the inequality constraint functions

h_i : R^n → R are the equality constraint functions

optimal value:

    p* = inf{f_0(x) | f_i(x) ≤ 0, i = 1, ..., m, h_i(x) = 0, i = 1, ..., p}

    p* = ∞ if problem is infeasible (no x satisfies the constraints)

    p* = −∞ if problem is unbounded below

    Optimal and locally optimal points


x is feasible if x ∈ dom f_0 and it satisfies the constraints

a feasible x is optimal if f_0(x) = p*; X_opt is the set of optimal points

x is locally optimal if there is an R > 0 such that x is optimal for

    minimize (over z) f_0(z)
    subject to        f_i(z) ≤ 0, i = 1, ..., m,  h_i(z) = 0, i = 1, ..., p
                      ‖z − x‖_2 ≤ R

examples (with n = 1, m = p = 0)

    f_0(x) = 1/x, dom f_0 = R_{++}: p* = 0, no optimal point

    f_0(x) = −log x, dom f_0 = R_{++}: p* = −∞

    f_0(x) = x log x, dom f_0 = R_{++}: p* = −1/e, x = 1/e is optimal

    f_0(x) = x³ − 3x: p* = −∞, local optimum at x = 1

    Implicit constraints


the standard form optimization problem has an implicit constraint

    x ∈ D = ∩_{i=0}^m dom f_i  ∩  ∩_{i=1}^p dom h_i

we call D the domain of the problem

the constraints f_i(x) ≤ 0, h_i(x) = 0 are the explicit constraints

a problem is unconstrained if it has no explicit constraints (m = p = 0)

example:

    minimize f_0(x) = −∑_{i=1}^k log(b_i − a_i^T x)

is an unconstrained problem with implicit constraints a_i^T x < b_i

    Feasibility problem


    find       x
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

can be considered a special case of the general problem with f_0(x) = 0:

    minimize   0
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

p* = 0 if constraints are feasible; any feasible x is optimal

p* = ∞ if constraints are infeasible

    Convex optimization problem


standard form convex optimization problem

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               a_i^T x = b_i,  i = 1, ..., p

f_0, f_1, ..., f_m are convex; equality constraints are affine

problem is quasiconvex if f_0 is quasiconvex (and f_1, ..., f_m convex)

often written as

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               Ax = b

important property: feasible set of a convex optimization problem is convex

    example


    minimize   f_0(x) = x_1² + x_2²
    subject to f_1(x) = x_1 / (1 + x_2²) ≤ 0
               h_1(x) = (x_1 + x_2)² = 0

f_0 is convex; feasible set {(x_1, x_2) | x_1 = −x_2 ≤ 0} is convex

not a convex problem (according to our definition): f_1 is not convex, h_1 is not affine

equivalent (but not identical) to the convex problem

    minimize   x_1² + x_2²
    subject to x_1 ≤ 0
               x_1 + x_2 = 0

    Local and global optima


any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and y is optimal with f_0(y) < f_0(x)

x locally optimal means there is an R > 0 such that

    z feasible, ‖z − x‖_2 ≤ R  ⟹  f_0(z) ≥ f_0(x)

consider z = θy + (1 − θ)x with θ = R / (2‖y − x‖_2)

    ‖y − x‖_2 > R, so 0 < θ < 1/2

    z is a convex combination of two feasible points, hence also feasible

    ‖z − x‖_2 = R/2 and

        f_0(z) ≤ θf_0(y) + (1 − θ)f_0(x) < f_0(x)

which contradicts our assumption that x is locally optimal

Optimality criterion for differentiable f_0

x is optimal if and only if it is feasible and

    ∇f_0(x)^T (y − x) ≥ 0    for all feasible y

(figure: feasible set X with −∇f_0(x) as outward normal at x)

if nonzero, ∇f_0(x) defines a supporting hyperplane to feasible set X at x

unconstrained problem: x is optimal if and only if

    x ∈ dom f_0,    ∇f_0(x) = 0

equality constrained problem

    minimize f_0(x) subject to Ax = b

x is optimal if and only if there exists a ν such that

    x ∈ dom f_0,    Ax = b,    ∇f_0(x) + A^T ν = 0

minimization over nonnegative orthant

    minimize f_0(x) subject to x ⪰ 0

x is optimal if and only if

    x ∈ dom f_0,  x ⪰ 0,  ∇f_0(x)_i ≥ 0 if x_i = 0;  ∇f_0(x)_i = 0 if x_i > 0

    Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa

    some common transformations that preserve convexity:

eliminating equality constraints

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               Ax = b

is equivalent to

    minimize (over z) f_0(Fz + x_0)
    subject to        f_i(Fz + x_0) ≤ 0,  i = 1, ..., m

where F and x_0 are such that

    Ax = b  ⟺  x = Fz + x_0 for some z

introducing equality constraints

    minimize f_0(A_0 x + b_0)

    subject to f_i(A_i x + b_i) ≤ 0,  i = 1, ..., m

is equivalent to

    minimize (over x, y_i) f_0(y_0)
    subject to             f_i(y_i) ≤ 0,  i = 1, ..., m
                           y_i = A_i x + b_i,  i = 0, 1, ..., m

introducing slack variables for linear inequalities

    minimize   f_0(x)
    subject to a_i^T x ≤ b_i,  i = 1, ..., m

is equivalent to

    minimize (over x, s) f_0(x)
    subject to           a_i^T x + s_i = b_i,  i = 1, ..., m
                         s_i ≥ 0,  i = 1, ..., m

epigraph form: standard form convex problem is equivalent to

    minimize (over x, t) t
    subject to           f_0(x) − t ≤ 0
                         f_i(x) ≤ 0,  i = 1, ..., m
                         Ax = b

minimizing over some variables

    minimize   f_0(x_1, x_2)
    subject to f_i(x_1) ≤ 0,  i = 1, ..., m

is equivalent to

    minimize   f̃_0(x_1)
    subject to f_i(x_1) ≤ 0,  i = 1, ..., m

where f̃_0(x_1) = inf_{x_2} f_0(x_1, x_2)

    Quasiconvex optimization


    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               Ax = b

with f_0 : R^n → R quasiconvex, f_1, ..., f_m convex

can have locally optimal points that are not (globally) optimal

(figure: a quasiconvex f_0 with a locally optimal point (x, f_0(x)) that is not globally optimal)

convex representation of sublevel sets of f_0

if f_0 is quasiconvex, there exists a family of functions φ_t such that:

    φ_t(x) is convex in x for fixed t

    t-sublevel set of f_0 is 0-sublevel set of φ_t, i.e.,

        f_0(x) ≤ t  ⟺  φ_t(x) ≤ 0

example

    f_0(x) = p(x) / q(x)

with p convex, q concave, and p(x) ≥ 0, q(x) > 0 on dom f_0

can take φ_t(x) = p(x) − t q(x):

    for t ≥ 0, φ_t convex in x

    p(x)/q(x) ≤ t if and only if φ_t(x) ≤ 0

quasiconvex optimization via convex feasibility problems

    φ_t(x) ≤ 0,  f_i(x) ≤ 0,  i = 1, ..., m,  Ax = b    (1)

    for fixed t, a convex feasibility problem in x

    if feasible, we can conclude that t ≥ p*; if infeasible, t ≤ p*

Bisection method for quasiconvex optimization

    given l ≤ p*, u ≥ p*, tolerance ε > 0.
    repeat
        1. t := (l + u)/2.
        2. Solve the convex feasibility problem (1).
        3. if (1) is feasible, u := t; else l := t.
    until u − l ≤ ε.

requires exactly ⌈log₂((u − l)/ε)⌉ iterations (where u, l are initial values)
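A sketch of this bisection loop (not from the slides) for a linear-fractional objective f_0(x) = (c^T x + d)/(e^T x + f), with φ_t(x) = (c^T x + d) − t(e^T x + f), using CVXPY feasibility problems on made-up data; the initial l, u are assumed to bracket p*:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 8
G, h = rng.standard_normal((m, n)), rng.uniform(1.0, 2.0, m)
c, d = rng.standard_normal(n), 5.0
e, f = rng.standard_normal(n), 10.0

x = cp.Variable(n)
l, u, eps = -10.0, 10.0, 1e-6
while u - l > eps:
    t = (l + u) / 2
    # feasibility of phi_t(x) <= 0 over the feasible set (q(x) > 0 assumed)
    feas = cp.Problem(cp.Minimize(0),
                      [c @ x + d - t * (e @ x + f) <= 0,
                       e @ x + f >= 1e-6, G @ x <= h])
    feas.solve()
    if feas.status == "optimal":
        u = t
    else:
        l = t
print(round((l + u) / 2, 4))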

    Linear program (LP)


    minimize   c^T x + d
    subject to Gx ⪯ h
               Ax = b

convex problem with affine objective and constraint functions

feasible set is a polyhedron

(figure: polyhedron P with objective direction −c and optimal vertex x*)

    Examples

diet problem: choose quantities x_1, ..., x_n of n foods

    one unit of food j costs c_j, contains amount a_ij of nutrient i

    healthy diet requires nutrient i in quantity at least b_i

to find cheapest healthy diet,

    minimize   c^T x
    subject to Ax ⪰ b,  x ⪰ 0

piecewise-linear minimization

    minimize max_{i=1,...,m} (a_i^T x + b_i)

equivalent to an LP

    minimize   t
    subject to a_i^T x + b_i ≤ t,  i = 1, ..., m
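An illustration (not in the slides; random data): the piecewise-linear problem and its LP reformulation give the same optimal value in CVXPY.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
m, n = 20, 4
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# direct form: minimize max_i (a_i^T x + b_i)
x = cp.Variable(n)
direct = cp.Problem(cp.Minimize(cp.max(A @ x + b)))
direct.solve()

# LP form with epigraph variable t
x2, t = cp.Variable(n), cp.Variable()
lp = cp.Problem(cp.Minimize(t), [A @ x2 + b <= t])
lp.solve()

print(np.isclose(direct.value, lp.value))  # True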

    Chebyshev center of a polyhedron

    Chebyshev center of


    P = {x | a_i^T x ≤ b_i, i = 1, ..., m}

is center of largest inscribed ball

    B = {x_c + u | ‖u‖_2 ≤ r}

(figure: polyhedron P with the largest inscribed ball centered at x_cheb)

a_i^T x ≤ b_i for all x ∈ B if and only if

    sup{a_i^T (x_c + u) | ‖u‖_2 ≤ r} = a_i^T x_c + r‖a_i‖_2 ≤ b_i

hence, x_c, r can be determined by solving the LP

    maximize   r
    subject to a_i^T x_c + r‖a_i‖_2 ≤ b_i,  i = 1, ..., m
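A compact CVXPY version of this LP (not from the slides; the polyhedron is an arbitrary example):

import cvxpy as cp
import numpy as np

# an example polyhedron a_i^T x <= b_i (unit box cut by a diagonal)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0],
              [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])

xc, r = cp.Variable(2), cp.Variable()
norms = np.linalg.norm(A, axis=1)              # the ||a_i||_2 terms
prob = cp.Problem(cp.Maximize(r), [A @ xc + r * norms <= b])
prob.solve()
print(xc.value.round(4), round(r.value, 4))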

    (Generalized) linear-fractional program


    minimize   f_0(x)
    subject to Gx ⪯ h
               Ax = b

linear-fractional program

    f_0(x) = (c^T x + d) / (e^T x + f),    dom f_0 = {x | e^T x + f > 0}

a quasiconvex optimization problem; can be solved by bisection

also equivalent to the LP (variables y, z)

    minimize   c^T y + dz
    subject to Gy ⪯ hz
               Ay = bz
               e^T y + fz = 1
               z ≥ 0

    generalized linear-fractional program


    f_0(x) = max_{i=1,...,r} (c_i^T x + d_i) / (e_i^T x + f_i),
    dom f_0 = {x | e_i^T x + f_i > 0, i = 1, ..., r}

a quasiconvex optimization problem; can be solved by bisection

example: Von Neumann model of a growing economy

    maximize (over x, x⁺) min_{i=1,...,n} x_i⁺ / x_i
    subject to            x⁺ ⪰ 0,  Bx⁺ ⪯ Ax

    x, x⁺ ∈ R^n: activity levels of n sectors, in current and next period

    (Ax)_i, (Bx⁺)_i: produced, resp. consumed, amounts of good i

    x_i⁺ / x_i: growth rate of sector i

allocate activity to maximize growth rate of slowest growing sector

    Quadratic program (QP)


    minimize   (1/2) x^T P x + q^T x + r
    subject to Gx ⪯ h
               Ax = b

P ∈ S^n_+, so objective is convex quadratic

minimize a convex quadratic function over a polyhedron

(figure: polyhedron P with level curves of f_0 and minimizer x*)

    Examples


least-squares

    minimize ‖Ax − b‖_2²

analytical solution x* = A†b (A† is pseudo-inverse)

can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

    minimize   c̄^T x + γ x^T Σ x = E c^T x + γ var(c^T x)
    subject to Gx ⪯ h,  Ax = b

c is random vector with mean c̄ and covariance Σ

hence, c^T x is random variable with mean c̄^T x and variance x^T Σ x

γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk)

    Quadratically constrained quadratic program (QCQP)


    minimize   (1/2) x^T P_0 x + q_0^T x + r_0
    subject to (1/2) x^T P_i x + q_i^T x + r_i ≤ 0,  i = 1, ..., m
               Ax = b

P_i ∈ S^n_+; objective and constraints are convex quadratic

if P_1, ..., P_m ∈ S^n_{++}, feasible region is intersection of m ellipsoids and an affine set

    Second-order cone programming


    minimize   f^T x
    subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,  i = 1, ..., m
               Fx = g

(A_i ∈ R^{n_i × n}, F ∈ R^{p × n})

inequalities are called second-order cone (SOC) constraints:

    (A_i x + b_i, c_i^T x + d_i) ∈ second-order cone in R^{n_i + 1}

for n_i = 0, reduces to an LP; if c_i = 0, reduces to a QCQP

more general than QCQP and LP

    Robust linear programming

    the parameters in optimization problems are often uncertain, e.g. , in an LP


    minimize   c^T x
    subject to a_i^T x ≤ b_i,  i = 1, ..., m,

there can be uncertainty in c, a_i, b_i

two common approaches to handling uncertainty (in a_i, for simplicity)

    deterministic model: constraints must hold for all a_i ∈ E_i

        minimize   c^T x
        subject to a_i^T x ≤ b_i for all a_i ∈ E_i,  i = 1, ..., m,

    stochastic model: a_i is random variable; constraints must hold with probability η

        minimize   c^T x
        subject to prob(a_i^T x ≤ b_i) ≥ η,  i = 1, ..., m

    deterministic approach via SOCP

choose an ellipsoid as E_i:

    E_i = {ā_i + P_i u | ‖u‖_2 ≤ 1}    (ā_i ∈ R^n, P_i ∈ R^{n×n})

center is ā_i, semi-axes determined by singular values/vectors of P_i

robust LP

    minimize   c^T x
    subject to a_i^T x ≤ b_i for all a_i ∈ E_i,  i = 1, ..., m

is equivalent to the SOCP

    minimize   c^T x
    subject to ā_i^T x + ‖P_i^T x‖_2 ≤ b_i,  i = 1, ..., m

(follows from sup_{‖u‖_2 ≤ 1} (ā_i + P_i u)^T x = ā_i^T x + ‖P_i^T x‖_2)
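A CVXPY rendering of this SOCP (not from the slides; the nominal data and ellipsoid shapes are random placeholders, and a box bound is added to keep the problem bounded):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
n, m = 4, 6
c = rng.standard_normal(n)
a_bar = rng.standard_normal((m, n))            # nominal a_i
P = [0.1 * rng.standard_normal((n, n)) for _ in range(m)]
b = a_bar @ np.ones(n) + 1.0                   # keeps x = 1 strictly feasible

x = cp.Variable(n)
cons = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
prob = cp.Problem(cp.Minimize(c @ x), cons + [cp.norm(x, "inf") <= 10])
prob.solve()
print(round(prob.value, 4))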


    Geometric programming

    monomial function


    f(x) = c x_1^{a_1} x_2^{a_2} ··· x_n^{a_n},    dom f = R^n_{++}

with c > 0; exponents a_i can be any real numbers

posynomial function: sum of monomials

    f(x) = ∑_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} ··· x_n^{a_{nk}},    dom f = R^n_{++}

geometric program (GP)

    minimize   f_0(x)
    subject to f_i(x) ≤ 1,  i = 1, ..., m
               h_i(x) = 1,  i = 1, ..., p

with f_i posynomial, h_i monomial

    Geometric program in convex form

    change variables to yi = log x i , and take logarithm of cost, constraints


monomial f(x) = c x_1^{a_1} ··· x_n^{a_n} transforms to

    log f(e^{y_1}, ..., e^{y_n}) = a^T y + b    (b = log c)

posynomial f(x) = ∑_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} ··· x_n^{a_{nk}} transforms to

    log f(e^{y_1}, ..., e^{y_n}) = log ∑_{k=1}^K e^{a_k^T y + b_k}    (b_k = log c_k)

geometric program transforms to convex problem

    minimize   log ∑_{k=1}^K exp(a_{0k}^T y + b_{0k})
    subject to log ∑_{k=1}^K exp(a_{ik}^T y + b_{ik}) ≤ 0,  i = 1, ..., m
               Gy + d = 0
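CVXPY can also solve GPs directly in posynomial form via its gp=True mode; a toy example (not from the slides; the problem data are made up):

import cvxpy as cp

# GP: minimize x + y subject to x*y >= 1, written as 1/(x*y) <= 1
x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
prob = cp.Problem(cp.Minimize(x + y), [1 / (x * y) <= 1])
prob.solve(gp=True)
print(round(x.value, 3), round(y.value, 3), round(prob.value, 3))  # 1.0 1.0 2.0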

Design of cantilever beam

(figure: cantilever of four unit-length segments, with vertical force F applied at the right end)

N segments with unit lengths, rectangular cross-sections of size w_i × h_i

given vertical force F applied at the right end

design problem

    minimize   total weight
    subject to upper & lower bounds on w_i, h_i
               upper & lower bounds on aspect ratios h_i / w_i
               upper bound on stress in each segment
               upper bound on vertical deflection at the end of the beam

variables: w_i, h_i for i = 1, ..., N

    objective and constraint functions

total weight w_1 h_1 + ··· + w_N h_N is posynomial

aspect ratio h_i / w_i and inverse aspect ratio w_i / h_i are monomials

maximum stress in segment i is given by 6iF / (w_i h_i²), a monomial

the vertical deflection y_i and slope v_i of central axis at the right end of segment i are defined recursively as

    v_i = 12(i − 1/2) F / (E w_i h_i³) + v_{i+1}

    y_i = 6(i − 1/3) F / (E w_i h_i³) + v_{i+1} + y_{i+1}

for i = N, N−1, ..., 1, with v_{N+1} = y_{N+1} = 0 (E is Young's modulus)

v_i and y_i are posynomial functions of w, h

    formulation as a GP

    minimize   w_1 h_1 + ··· + w_N h_N
    subject to w_max^{-1} w_i ≤ 1,  w_min w_i^{-1} ≤ 1,  i = 1, ..., N

               h_max^{-1} h_i ≤ 1,  h_min h_i^{-1} ≤ 1,  i = 1, ..., N
               S_max^{-1} w_i^{-1} h_i ≤ 1,  S_min w_i h_i^{-1} ≤ 1,  i = 1, ..., N
               6iF σ_max^{-1} w_i^{-1} h_i^{-2} ≤ 1,  i = 1, ..., N
               y_max^{-1} y_1 ≤ 1

note

    we write w_min ≤ w_i ≤ w_max and h_min ≤ h_i ≤ h_max as

        w_min / w_i ≤ 1,  w_i / w_max ≤ 1,  h_min / h_i ≤ 1,  h_i / h_max ≤ 1

    we write S_min ≤ h_i / w_i ≤ S_max as

        S_min w_i / h_i ≤ 1,  h_i / (w_i S_max) ≤ 1

    Minimizing spectral radius of nonnegative matrix

Perron-Frobenius eigenvalue λ_pf(A)

    exists for (elementwise) positive A ∈ R^{n×n}

    a real, positive eigenvalue of A, equal to spectral radius max_i |λ_i(A)|

    determines asymptotic growth (decay) rate of A^k:  A^k ~ λ_pf^k as k → ∞

    alternative characterization: λ_pf(A) = inf{λ | Av ⪯ λv for some v ≻ 0}

minimizing spectral radius of matrix of posynomials

    minimize λ_pf(A(x)), where the elements A(x)_ij are posynomials of x

    equivalent geometric program:

        minimize   λ
        subject to ∑_{j=1}^n A(x)_ij v_j / (λ v_i) ≤ 1,  i = 1, ..., n

    variables λ, v, x

    Generalized inequality constraints

    convex problem with generalized inequality constraints


    minimize   f_0(x)
    subject to f_i(x) ⪯_{K_i} 0,  i = 1, ..., m
               Ax = b

f_0 : R^n → R convex; f_i : R^n → R^{k_i} K_i-convex w.r.t. proper cone K_i

same properties as standard convex problem (convex feasible set, local optimum is global, etc.)

conic form problem: special case with affine objective and constraints

    minimize   c^T x
    subject to Fx + g ⪯_K 0
               Ax = b

extends linear programming (K = R^m_+) to nonpolyhedral cones

Semidefinite program (SDP)

    minimize   c^T x

    subject to x_1 F_1 + x_2 F_2 + ··· + x_n F_n + G ⪯ 0
               Ax = b

with F_i, G ∈ S^k

inequality constraint is called linear matrix inequality (LMI)

includes problems with multiple LMI constraints: for example,

    x_1 F̂_1 + ··· + x_n F̂_n + Ĝ ⪯ 0,    x_1 F̃_1 + ··· + x_n F̃_n + G̃ ⪯ 0

is equivalent to the single LMI

    x_1 diag(F̂_1, F̃_1) + ··· + x_n diag(F̂_n, F̃_n) + diag(Ĝ, G̃) ⪯ 0

    LP and SOCP as SDP

    LP and equivalent SDP


    LP:   minimize   c^T x
          subject to Ax ⪯ b

    SDP:  minimize   c^T x
          subject to diag(Ax − b) ⪯ 0

(note different interpretation of generalized inequality ⪯)

SOCP and equivalent SDP

    SOCP: minimize   f^T x
          subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,  i = 1, ..., m

    SDP:  minimize   f^T x
          subject to [ (c_i^T x + d_i) I    A_i x + b_i
                       (A_i x + b_i)^T      c_i^T x + d_i ] ⪰ 0,  i = 1, ..., m


    Matrix norm minimization

    minimize ‖A(x)‖_2 = (λ_max(A(x)^T A(x)))^{1/2}

where A(x) = A_0 + x_1 A_1 + ··· + x_n A_n (with given A_i ∈ R^{p×q})

equivalent SDP

    minimize   t
    subject to [ tI        A(x)
                 A(x)^T    tI   ] ⪰ 0

variables x ∈ R^n, t ∈ R

constraint follows from

    ‖A‖_2 ≤ t  ⟺  A^T A ⪯ t² I, t ≥ 0
               ⟺  [ tI  A; A^T  tI ] ⪰ 0
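In CVXPY this problem can be posed directly with the spectral-norm atom, which the solver handles via the SDP above; a random-data sketch (not from the slides):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
p, q, n = 5, 4, 3
A = [rng.standard_normal((p, q)) for _ in range(n + 1)]  # A_0, ..., A_n

x = cp.Variable(n)
Ax = A[0] + sum(x[i] * A[i + 1] for i in range(n))
prob = cp.Problem(cp.Minimize(cp.norm(Ax, 2)))           # spectral norm atom
prob.solve()
print(round(prob.value, 4))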

    Vector optimization

    general vector optimization problem


    minimize (w.r.t. K) f_0(x)
    subject to          f_i(x) ≤ 0,  i = 1, ..., m
                        h_i(x) = 0,  i = 1, ..., p

vector objective f_0 : R^n → R^q, minimized w.r.t. proper cone K ⊆ R^q

convex vector optimization problem

    minimize (w.r.t. K) f_0(x)
    subject to          f_i(x) ≤ 0,  i = 1, ..., m
                        Ax = b

with f_0 K-convex, f_1, ..., f_m convex

    Optimal and Pareto optimal points

    set of achievable objective values


    O = {f_0(x) | x feasible}

feasible x is optimal if f_0(x) is a minimum value of O

feasible x is Pareto optimal if f_0(x) is a minimal value of O

(figures: a set O with minimum value f_0(x*), where x* is optimal, and a set O with minimal value f_0(x_po), where x_po is Pareto optimal)

    Multicriterion optimization

vector optimization problem with K = R^q_+

    f_0(x) = (F_1(x), ..., F_q(x))

q different objectives F_i; roughly speaking we want all F_i's to be small

feasible x* is optimal if

    y feasible  ⟹  f_0(x*) ⪯ f_0(y)

if there exists an optimal point, the objectives are noncompeting

feasible x_po is Pareto optimal if

    y feasible, f_0(y) ⪯ f_0(x_po)  ⟹  f_0(x_po) = f_0(y)

if there are multiple Pareto optimal values, there is a trade-off between the objectives

    Regularized least-squares

    minimize (w.r.t. R²_+) (‖Ax − b‖_2², ‖x‖_2²)

(figure: trade-off curve of F_2(x) = ‖x‖_2² versus F_1(x) = ‖Ax − b‖_2² for an example with A ∈ R^{100×10}; the heavy line is formed by Pareto optimal points)

    Risk return trade-off in portfolio optimization

    minimize (w.r.t. R²_+) (−p̄^T x, x^T Σ x)
    subject to             1^T x = 1,  x ⪰ 0

x ∈ R^n is investment portfolio; x_i is fraction invested in asset i

p ∈ R^n is vector of relative asset price changes; modeled as a random variable with mean p̄, covariance Σ

p̄^T x = E r is expected return; x^T Σ x = var r is return variance

example

(figures: mean return versus standard deviation of return along the trade-off curve, and the corresponding allocations x^(1), ..., x^(4))

    Scalarization

to find Pareto optimal points: choose λ ≻_{K*} 0 and solve scalar problem

    minimize   λ^T f_0(x)

    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

if x is optimal for scalar problem, then it is Pareto-optimal for vector optimization problem

(figure: set O with Pareto optimal points f_0(x_1), f_0(x_2), f_0(x_3) and scalarizing hyperplanes with normals λ_1, λ_2)

for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ≻_{K*} 0

    Scalarization for multicriterion problems

to find Pareto optimal points, minimize positive weighted sum

    λ^T f_0(x) = λ_1 F_1(x) + ··· + λ_q F_q(x)

examples

regularized least-squares problem of page 4-43

    take λ = (1, γ) with γ > 0

    minimize ‖Ax − b‖_2² + γ‖x‖_2²

for fixed γ, a LS problem

(figure: trade-off curve of ‖x‖_2² versus ‖Ax − b‖_2², with the γ = 1 point marked)

risk-return trade-off of page 4-44

    minimize   −p̄^T x + γ x^T Σ x
    subject to 1^T x = 1,  x ⪰ 0

for fixed γ > 0, a quadratic program
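A synthetic-data sketch of this scalarized portfolio QP (not from the slides; p̄ and Σ are invented), sweeping the risk-aversion weight to trace the trade-off curve:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(9)
n = 4
p_bar = rng.uniform(0.0, 0.12, n)            # assumed mean returns
M = rng.standard_normal((n, n))
Sigma = 0.01 * (M @ M.T + n * np.eye(n))     # assumed return covariance

x = cp.Variable(n)
gamma = cp.Parameter(nonneg=True)
ret, risk = p_bar @ x, cp.quad_form(x, Sigma)
prob = cp.Problem(cp.Minimize(-ret + gamma * risk),
                  [cp.sum(x) == 1, x >= 0])

for g in [0.1, 1.0, 10.0]:                   # sweep the trade-off weight
    gamma.value = g
    prob.solve()
    print(g, round(ret.value, 4), round(risk.value, 5))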

Convex Optimization Boyd & Vandenberghe

5. Duality

    Lagrangian

    standard form problem (not necessarily convex)

    minimize   f_0(x)
    subject to f_i(x) ≤ 0,  i = 1, ..., m
               h_i(x) = 0,  i = 1, ..., p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

    L(x, λ, ν) = f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x)

    weighted sum of objective and constraint functions

    λ_i is Lagrange multiplier associated with f_i(x) ≤ 0

    ν_i is Lagrange multiplier associated with h_i(x) = 0

    Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

    g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)

            = inf_{x ∈ D} ( f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

    f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)


    Standard form LP

    minimize   c^T x
    subject to Ax = b,  x ⪰ 0

    dual function

Lagrangian is

    L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
               = −b^T ν + (c + A^T ν − λ)^T x

L is affine in x, hence

    g(λ, ν) = inf_x L(x, λ, ν) = −b^T ν  if A^T ν − λ + c = 0,  −∞ otherwise

g is linear on affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p* ≥ −b^T ν if A^T ν + c ⪰ 0

    Equality constrained norm minimization

    minimize   ‖x‖
    subject to Ax = b

    dual function

    g(ν) = inf_x (‖x‖ − ν^T Ax + b^T ν) = b^T ν  if ‖A^T ν‖_* ≤ 1,  −∞ otherwise

where ‖v‖_* = sup_{‖u‖ ≤ 1} u^T v is dual norm of ‖·‖

proof: follows from inf_x (‖x‖ − y^T x) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

    if ‖y‖_* ≤ 1, then ‖x‖ − y^T x ≥ 0 for all x, with equality if x = 0

    if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1, u^T y = ‖y‖_* > 1:

        ‖x‖ − y^T x = t(‖u‖ − ‖y‖_*) → −∞  as t → ∞

lower bound property: p* ≥ b^T ν if ‖A^T ν‖_* ≤ 1

    Two-way partitioning

    minimize   x^T W x
    subject to x_i² = 1,  i = 1, ..., n

    a nonconvex problem; feasible set contains 2n discrete points

    interpretation: partition {1, . . . , n }in two sets; W ij is cost of assigningi, j to the same set; W ij is cost of assigning to different setsdual function

    g( ) = inf x (xT W x +i

    i (x2i 1)) = inf x xT (W + diag ( ))x 1 T = 1 T W + diag ( ) 0

    otherwise

    lower bound property : p 1 T if W + diag ( ) 0example: = min (W )1 gives bound p n min (W )Duality 57
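A small numpy sketch of this bound, with a random symmetric W as placeholder data; for tiny n the true p^\star can be found by enumerating all 2^n sign vectors:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                     # symmetric cost matrix

# dual bound: nu = -lambda_min(W) * 1 gives p* >= n * lambda_min(W)
bound = n * np.linalg.eigvalsh(W)[0]

# brute force over all 2^n partitions (only feasible for tiny n)
p_star = min(np.array(s) @ W @ np.array(s)
             for s in itertools.product([-1.0, 1.0], repeat=n))
print(bound, "<=", p_star)
```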

Lagrange dual and conjugate function

    minimize    f_0(x)
    subject to  Ax \preceq b,  Cx = d

dual function

    g(\lambda, \nu) = \inf_{x \in dom f_0} ( f_0(x) + (A^T \lambda + C^T \nu)^T x - b^T \lambda - d^T \nu )
                    = -f_0^*(-A^T \lambda - C^T \nu) - b^T \lambda - d^T \nu

recall definition of conjugate f^*(y) = \sup_{x \in dom f} ( y^T x - f(x) )

simplifies derivation of dual if conjugate of f_0 is known

example: entropy maximization

    f_0(x) = \sum_{i=1}^n x_i \log x_i,    f_0^*(y) = \sum_{i=1}^n e^{y_i - 1}

Duality 5–8

The dual problem

Lagrange dual problem

    maximize    g(\lambda, \nu)
    subject to  \lambda \succeq 0

finds best lower bound on p^\star, obtained from Lagrange dual function
a convex optimization problem; optimal value denoted d^\star
\lambda, \nu are dual feasible if \lambda \succeq 0, (\lambda, \nu) \in dom g
often simplified by making implicit constraint (\lambda, \nu) \in dom g explicit

example: standard form LP and its dual (page 5–5)

    minimize    c^T x                        maximize    -b^T \nu
    subject to  Ax = b,  x \succeq 0         subject to  A^T \nu + c \succeq 0

Duality 5–9

Weak and strong duality

weak duality: d^\star \le p^\star

always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems

for example, solving the SDP

    maximize    -1^T \nu
    subject to  W + diag(\nu) \succeq 0

gives a lower bound for the two-way partitioning problem on page 5–7

strong duality: d^\star = p^\star

does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications

Duality 5–10

Slater's constraint qualification

strong duality holds for a convex problem

    minimize    f_0(x)
    subject to  f_i(x) \le 0,   i = 1, ..., m
                Ax = b

if it is strictly feasible, i.e.,

    \exists x \in int D:  f_i(x) < 0,  i = 1, ..., m,   Ax = b

also guarantees that the dual optimum is attained (if p^\star > -\infty)
can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, ...
there exist many other types of constraint qualifications

Duality 5–11

Inequality form LP

primal problem

    minimize    c^T x
    subject to  Ax \preceq b

dual function

    g(\lambda) = \inf_x ( (c + A^T \lambda)^T x - b^T \lambda ) = { -b^T \lambda   if A^T \lambda + c = 0
                                                                  { -\infty        otherwise

dual problem

    maximize    -b^T \lambda
    subject to  A^T \lambda + c = 0,  \lambda \succeq 0

from Slater's condition: p^\star = d^\star if Ax \prec b for some x
in fact, p^\star = d^\star except when primal and dual are infeasible

Duality 5–12

Quadratic program

primal problem (assume P \in S^n_{++})

    minimize    x^T P x
    subject to  Ax \preceq b

dual function

    g(\lambda) = \inf_x ( x^T P x + \lambda^T (Ax - b) ) = -\frac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda

dual problem

    maximize    -(1/4) \lambda^T A P^{-1} A^T \lambda - b^T \lambda
    subject to  \lambda \succeq 0

from Slater's condition: p^\star = d^\star if Ax \prec b for some x
in fact, p^\star = d^\star always

Duality 5–13
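A minimal cvxpy sketch (random placeholder data) checking p^\star = d^\star for this primal/dual pair; the dual quadratic term is rewritten as a sum of squares via a Cholesky factor of P^{-1}:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)           # P in S^n_++
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 1.0  # a strictly feasible point exists

# primal QP: minimize x^T P x  subject to  Ax <= b
x = cp.Variable(n)
p_star = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [A @ x <= b]).solve()

# dual: maximize -(1/4) lam^T A P^{-1} A^T lam - b^T lam, lam >= 0;
# write the quadratic term as -(1/4) ||L^T A^T lam||^2 with P^{-1} = L L^T
L = np.linalg.cholesky(np.linalg.inv(P))
lam = cp.Variable(m)
d_star = cp.Problem(
    cp.Maximize(-cp.sum_squares(L.T @ A.T @ lam) / 4 - b @ lam),
    [lam >= 0]).solve()

print(p_star, d_star)                 # equal: strong duality
```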

A nonconvex problem with strong duality

    minimize    x^T A x + 2 b^T x
    subject to  x^T x \le 1

A \not\succeq 0, hence nonconvex

dual function: g(\lambda) = \inf_x ( x^T (A + \lambda I) x + 2 b^T x - \lambda )

unbounded below if A + \lambda I \not\succeq 0 or if A + \lambda I \succeq 0 and b \notin R(A + \lambda I)
minimized by x = -(A + \lambda I)^\dagger b otherwise: g(\lambda) = -b^T (A + \lambda I)^\dagger b - \lambda

dual problem

    maximize    -b^T (A + \lambda I)^\dagger b - \lambda
    subject to  A + \lambda I \succeq 0,  b \in R(A + \lambda I)

and equivalent SDP

    maximize    -t - \lambda
    subject to  [ A + \lambda I    b ]
                [ b^T              t ]  \succeq 0

strong duality although primal problem is not convex (not easy to show)

Duality 5–14

Geometric interpretation

for simplicity, consider problem with one constraint f_1(x) \le 0

interpretation of dual function:

    g(\lambda) = \inf_{(u,t) \in G} ( t + \lambda u ),   where   G = {(f_1(x), f_0(x)) | x \in D}

[figure: two views of the set G in the (u, t)-plane, showing p^\star, d^\star, g(\lambda), and the line \lambda u + t = g(\lambda)]

\lambda u + t = g(\lambda) is (non-vertical) supporting hyperplane to G
hyperplane intersects t-axis at t = g(\lambda)

Duality 5–15

epigraph variation: same interpretation if G is replaced with

    A = {(u, t) | f_1(x) \le u, f_0(x) \le t for some x \in D}

[figure: the set A with supporting hyperplane \lambda u + t = g(\lambda) through (0, p^\star)]

strong duality

holds if there is a non-vertical supporting hyperplane to A at (0, p^\star)
for convex problem, A is convex, hence has supp. hyperplane at (0, p^\star)
Slater's condition: if there exist (\tilde u, \tilde t) \in A with \tilde u < 0, then supporting hyperplanes at (0, p^\star) must be non-vertical

Duality 5–16

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable f_i, h_i):

1. primal constraints: f_i(x) \le 0, i = 1, ..., m, h_i(x) = 0, i = 1, ..., p
2. dual constraints: \lambda \succeq 0
3. complementary slackness: \lambda_i f_i(x) = 0, i = 1, ..., m
4. gradient of Lagrangian with respect to x vanishes:

    \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0

from page 5–17: if strong duality holds and x, \lambda, \nu are optimal, then they must satisfy the KKT conditions

Duality 5–18

KKT conditions for convex problem

if \tilde x, \tilde\lambda, \tilde\nu satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f_0(\tilde x) = L(\tilde x, \tilde\lambda, \tilde\nu)
from 4th condition (and convexity): g(\tilde\lambda, \tilde\nu) = L(\tilde x, \tilde\lambda, \tilde\nu)

hence, f_0(\tilde x) = g(\tilde\lambda, \tilde\nu)

if Slater's condition is satisfied:

    x is optimal if and only if there exist \lambda, \nu that satisfy KKT conditions

recall that Slater implies strong duality, and dual optimum is attained
generalizes optimality condition \nabla f_0(x) = 0 for unconstrained problem

Duality 5–19

example: water-filling (assume \alpha_i > 0)

    minimize    -\sum_{i=1}^n \log(x_i + \alpha_i)
    subject to  x \succeq 0,  1^T x = 1

x is optimal iff x \succeq 0, 1^T x = 1, and there exist \lambda \in R^n, \nu \in R such that

    \lambda \succeq 0,    \lambda_i x_i = 0,    \frac{1}{x_i + \alpha_i} + \lambda_i = \nu

if \nu < 1/\alpha_i: \lambda_i = 0 and x_i = 1/\nu - \alpha_i
if \nu \ge 1/\alpha_i: \lambda_i = \nu - 1/\alpha_i and x_i = 0
determine \nu from 1^T x = \sum_{i=1}^n \max\{0, 1/\nu - \alpha_i\} = 1

interpretation

n patches; level of patch i is at height \alpha_i
flood area with unit amount of water
resulting level is 1/\nu^\star

[figure: water-filling picture showing patch heights \alpha_i, allocations x_i, and water level 1/\nu^\star]

Duality 5–20
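A short numpy sketch of the water-filling solution: since 1^T x is decreasing in \nu, \nu^\star can be found by bisection; the \alpha_i below are placeholder values:

```python
import numpy as np

alpha = np.array([0.7, 1.3, 2.0, 0.4])        # placeholder patch heights

def total_water(nu):
    # 1^T x as a function of nu; strictly decreasing where positive
    return np.maximum(0.0, 1.0 / nu - alpha).sum()

# bisection for total_water(nu) = 1
lo, hi = 1e-6, 1.0 / alpha.min() + 1.0        # brackets: too much / no water
for _ in range(100):
    nu = 0.5 * (lo + hi)
    lo, hi = (nu, hi) if total_water(nu) > 1 else (lo, nu)

x = np.maximum(0.0, 1.0 / nu - alpha)
print(x, x.sum())                             # optimal allocation, sums to 1
```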

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    minimize    f_0(x)                          maximize    g(\lambda, \nu)
    subject to  f_i(x) \le 0,  i = 1, ..., m    subject to  \lambda \succeq 0
                h_i(x) = 0,  i = 1, ..., p

perturbed problem and its dual

    min.  f_0(x)                                max.  g(\lambda, \nu) - u^T \lambda - v^T \nu
    s.t.  f_i(x) \le u_i,  i = 1, ..., m        s.t.  \lambda \succeq 0
          h_i(x) = v_i,  i = 1, ..., p

x is primal variable; u, v are parameters
p^\star(u, v) is optimal value as a function of u, v
we are interested in information about p^\star(u, v) that we can obtain from the solution of the unperturbed problem and its dual

Duality 5–21

global sensitivity result

assume strong duality holds for unperturbed problem, and that \lambda^\star, \nu^\star are dual optimal for unperturbed problem

apply weak duality to perturbed problem:

    p^\star(u, v) \ge g(\lambda^\star, \nu^\star) - u^T \lambda^\star - v^T \nu^\star
                  = p^\star(0, 0) - u^T \lambda^\star - v^T \nu^\star

sensitivity interpretation

if \lambda_i^\star large: p^\star increases greatly if we tighten constraint i (u_i < 0)
if \lambda_i^\star small: p^\star does not decrease much if we loosen constraint i (u_i > 0)
if \nu_i^\star large and positive: p^\star increases greatly if we take v_i < 0; if \nu_i^\star large and negative: p^\star increases greatly if we take v_i > 0
if \nu_i^\star small and positive: p^\star does not decrease much if we take v_i > 0; if \nu_i^\star small and negative: p^\star does not decrease much if we take v_i < 0

Duality 5–22

local sensitivity: if (in addition) p^\star(u, v) is differentiable at (0, 0), then

    \lambda_i^\star = -\frac{\partial p^\star(0,0)}{\partial u_i},    \nu_i^\star = -\frac{\partial p^\star(0,0)}{\partial v_i}

proof (for \lambda_i^\star): from global sensitivity result,

    \frac{\partial p^\star(0,0)}{\partial u_i} = \lim_{t \searrow 0} \frac{p^\star(t e_i, 0) - p^\star(0,0)}{t} \ge -\lambda_i^\star

    \frac{\partial p^\star(0,0)}{\partial u_i} = \lim_{t \nearrow 0} \frac{p^\star(t e_i, 0) - p^\star(0,0)}{t} \le -\lambda_i^\star

hence, equality

[figure: p^\star(u) for a problem with one (inequality) constraint; the line p^\star(0) - \lambda^\star u is tangent at u = 0]

Duality 5–23
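A tentative numerical check of this identity on a tiny QP (placeholder data; relies on cvxpy reporting the constraint's dual variable): the dual value should match the negative finite-difference slope of p^\star(u):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
P = np.diag(rng.random(3) + 1.0)              # positive definite diagonal P
a = rng.standard_normal(3)

def solve_perturbed(u):
    # perturbed problem: minimize x^T P x  subject to  a^T x <= -1 + u
    x = cp.Variable(3)
    con = a @ x <= -1.0 + u
    prob = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [con])
    prob.solve()
    return prob.value, con.dual_value

p0, lam = solve_perturbed(0.0)
t = 1e-4
slope = (solve_perturbed(t)[0] - p0) / t      # finite difference of p*(u)
print(lam, -slope)                            # lambda* = -dp*/du (approx.)
```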

Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals
reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions

    e.g., replace f_0(x) by \phi(f_0(x)) with \phi convex, increasing

Duality 5–24

norm approximation problem: minimize \|Ax - b\|; equivalent formulation:

    minimize    \|y\|
    subject to  y = Ax - b

can look up conjugate of \|\cdot\|, or derive dual directly

    g(\nu) = \inf_{x,y} ( \|y\| + \nu^T y - \nu^T A x + b^T \nu )
           = { b^T \nu + \inf_y ( \|y\| + \nu^T y )    if A^T \nu = 0
             { -\infty                                 otherwise
           = { b^T \nu    if A^T \nu = 0,  \|\nu\|_* \le 1
             { -\infty    otherwise

(see page 5–4)

dual of norm approximation problem

    maximize    b^T \nu
    subject to  A^T \nu = 0,  \|\nu\|_* \le 1

Duality 5–26

Implicit constraints

LP with box constraints: primal and dual problem

    minimize    c^T x                  maximize    -b^T \nu - 1^T \lambda_1 - 1^T \lambda_2
    subject to  Ax = b                 subject to  c + A^T \nu + \lambda_1 - \lambda_2 = 0
                -1 \preceq x \preceq 1             \lambda_1 \succeq 0,  \lambda_2 \succeq 0

reformulation with box constraints made implicit

    minimize    f_0(x) = { c^T x     if -1 \preceq x \preceq 1
                         { \infty    otherwise
    subject to  Ax = b

dual function

    g(\nu) = \inf_{-1 \preceq x \preceq 1} ( c^T x + \nu^T (Ax - b) ) = -b^T \nu - \|A^T \nu + c\|_1

dual problem: maximize -b^T \nu - \|A^T \nu + c\|_1

Duality 5–27

Problems with generalized inequalities

    minimize    f_0(x)
    subject to  f_i(x) \preceq_{K_i} 0,   i = 1, ..., m
                h_i(x) = 0,   i = 1, ..., p

\preceq_{K_i} is generalized inequality on R^{k_i}

definitions are parallel to scalar case:

Lagrange multiplier for f_i(x) \preceq_{K_i} 0 is vector \lambda_i \in R^{k_i}
Lagrangian L : R^n \times R^{k_1} \times \cdots \times R^{k_m} \times R^p \to R, is defined as

    L(x, \lambda_1, ..., \lambda_m, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i^T f_i(x) + \sum_{i=1}^p \nu_i h_i(x)

dual function g : R^{k_1} \times \cdots \times R^{k_m} \times R^p \to R, is defined as

    g(\lambda_1, ..., \lambda_m, \nu) = \inf_{x \in D} L(x, \lambda_1, ..., \lambda_m, \nu)

Duality 5–28

Semidefinite program

primal SDP (F_i, G \in S^k)

    minimize    c^T x
    subject to  x_1 F_1 + \cdots + x_n F_n \preceq G

Lagrange multiplier is matrix Z \in S^k
Lagrangian L(x, Z) = c^T x + tr( Z (x_1 F_1 + \cdots + x_n F_n - G) )
dual function

    g(Z) = \inf_x L(x, Z) = { -tr(GZ)    if tr(F_i Z) + c_i = 0, i = 1, ..., n
                            { -\infty    otherwise

dual SDP

    maximize    -tr(GZ)
    subject to  Z \succeq 0,  tr(F_i Z) + c_i = 0,  i = 1, ..., n

p^\star = d^\star if primal SDP is strictly feasible (\exists x with x_1 F_1 + \cdots + x_n F_n \prec G)

Duality 5–30

Convex Optimization Boyd & Vandenberghe

6. Approximation and fitting

norm approximation
least-norm problems
regularized approximation
robust approximation

6–1

examples

least-squares approximation (\|\cdot\|_2): solution satisfies normal equations

    A^T A x = A^T b

(x^\star = (A^T A)^{-1} A^T b if rank A = n)

Chebyshev approximation (\|\cdot\|_\infty): can be solved as an LP

    minimize    t
    subject to  -t 1 \preceq Ax - b \preceq t 1

sum of absolute residuals approximation (\|\cdot\|_1): can be solved as an LP

    minimize    1^T y
    subject to  -y \preceq Ax - b \preceq y

Approximation and fitting 6–3
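A compact cvxpy sketch solving one approximation problem in all three norms (random placeholder data, with the same dimensions as the example on the next slide); cvxpy performs the LP reformulations internally:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
A, b = rng.standard_normal((100, 30)), rng.standard_normal(100)

# same data, three norms: l1, l2, and Chebyshev (l-infinity)
for p in [1, 2, "inf"]:
    x = cp.Variable(30)
    prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, p)))
    prob.solve()
    print(f"p = {p}:  optimal residual norm = {prob.value:.4f}")
```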

example (m = 100, n = 30): histogram of residuals for penalties

    \phi(u) = |u|,    \phi(u) = u^2,    \phi(u) = \max\{0, |u| - a\},    \phi(u) = -\log(1 - u^2)

[figure: four histograms of the residual distribution r, one per penalty: p = 1, p = 2, deadzone-linear, and log barrier]

shape of penalty function has large effect on distribution of residuals

Approximation and fitting 6–5

Huber penalty function (with parameter M)

    \phi_{hub}(u) = { u^2             if |u| \le M
                    { M (2|u| - M)    if |u| > M

linear growth for large u makes approximation less sensitive to outliers

[figure: left, Huber penalty \phi_{hub}(u) for M = 1; right, affine function f(t) = \alpha + \beta t fitted to 42 points (t_i, y_i) (circles) using quadratic (dashed) and Huber (solid) penalty]

Approximation and fitting 6–6
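A minimal cvxpy sketch of robust affine fitting with the Huber penalty; t, y are synthetic data with injected outliers, and a plain least-squares fit is shown for contrast:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(-10, 10, 42)
y = 1.0 + 0.5 * t + 0.3 * rng.standard_normal(42)   # affine model + noise
y[::10] += 8.0                                      # inject a few outliers

alpha, beta = cp.Variable(), cp.Variable()
resid = y - (alpha + beta * t)

# Huber penalty with M = 1; much less sensitive to the outliers
cp.Problem(cp.Minimize(cp.sum(cp.huber(resid, M=1)))).solve()
print("huber fit:        ", alpha.value, beta.value)

ls = np.polyfit(t, y, 1)                            # quadratic-penalty fit
print("least-squares fit:", ls[1], ls[0])
```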

Least-norm problems

    minimize    \|x\|
    subject to  Ax = b

(A \in R^{m \times n} with m \le n, \|\cdot\| is a norm on R^n)

interpretations of solution x^\star = argmin_{Ax = b} \|x\|:

geometric: x^\star is point in affine set {x | Ax = b} with minimum distance to 0
estimation: b = Ax are (perfect) measurements of x; x^\star is smallest (most plausible) estimate consistent with measurements
design: x are design variables (inputs); b are required results (outputs); x^\star is smallest (most efficient) design that satisfies requirements

Approximation and fitting 6–7

examples

least-squares solution of linear equations (\|\cdot\|_2): can be solved via optimality conditions

    2x + A^T \nu = 0,    Ax = b

minimum sum of absolute values (\|\cdot\|_1): can be solved as an LP

    minimize    1^T y
    subject to  -y \preceq x \preceq y,  Ax = b

tends to produce sparse solution x^\star

extension: least-penalty problem

    minimize    \phi(x_1) + \cdots + \phi(x_n)
    subject to  Ax = b

\phi : R \to R is convex penalty function

Approximation and fitting 6–8

Regularized approximation

    minimize (w.r.t. R^2_+)    ( \|Ax - b\|, \|x\| )

A \in R^{m \times n}, norms on R^m and R^n can be different

interpretation: find good approximation Ax \approx b with small x

estimation: linear measurement model y = Ax + v, with prior knowledge that \|x\| is small
optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
robust approximation: good approximation Ax \approx b with small x is less sensitive to errors in A than good approximation with large x

Approximation and fitting 6–9

Scalarized problem

    minimize    \|Ax - b\| + \gamma \|x\|

solution for \gamma > 0 traces out optimal trade-off curve
other common method: minimize \|Ax - b\|^2 + \delta \|x\|^2 with \delta > 0

Tikhonov regularization

    minimize    \|Ax - b\|_2^2 + \delta \|x\|_2^2

can be solved as a least-squares problem

    minimize    \| [A; \sqrt{\delta} I] x - [b; 0] \|_2^2

solution x^\star = (A^T A + \delta I)^{-1} A^T b

Approximation and fitting 6–10
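A numpy sketch verifying that the stacked least-squares formulation reproduces the closed-form Tikhonov solution (random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(7)
A, b = rng.standard_normal((40, 15)), rng.standard_normal(40)
delta = 0.5

# stacked LS: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stk = np.vstack([A, np.sqrt(delta) * np.eye(15)])
b_stk = np.concatenate([b, np.zeros(15)])
x_stacked = np.linalg.lstsq(A_stk, b_stk, rcond=None)[0]

# closed form: x = (A^T A + delta I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(15), A.T @ b)
print(np.allclose(x_stacked, x_closed))       # True
```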

Optimal input design

linear dynamical system with impulse response h:

    y(t) = \sum_{\tau=0}^t h(\tau) u(t - \tau),    t = 0, 1, ..., N

input design problem: multicriterion problem with 3 objectives

1. tracking error with desired output y_des:  J_track = \sum_{t=0}^N (y(t) - y_des(t))^2
2. input magnitude:  J_mag = \sum_{t=0}^N u(t)^2
3. input variation:  J_der = \sum_{t=0}^{N-1} (u(t+1) - u(t))^2

track desired output using a small and slowly varying input signal

regularized least-squares formulation

    minimize    J_track + \delta J_der + \eta J_mag

for fixed \delta, \eta, a least-squares problem in u(0), ..., u(N)

Approximation and fitting 6–11

example: 3 solutions on optimal trade-off curve

(top) \delta = 0, small \eta; (middle) \delta = 0, larger \eta; (bottom) large \delta

[figure: three pairs of plots of the input u(t) and output y(t) for t = 0, ..., 200, one pair per trade-off point]

Approximation and fitting 6–12

Signal reconstruction

    minimize (w.r.t. R^2_+)    ( \|\hat x - x_cor\|_2, \phi(\hat x) )

x \in R^n is unknown signal
x_cor = x + v is (known) corrupted version of x, with additive noise v
variable \hat x (reconstructed signal) is estimate of x
\phi : R^n \to R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

    \phi_quad(\hat x) = \sum_{i=1}^{n-1} (\hat x_{i+1} - \hat x_i)^2,    \phi_tv(\hat x) = \sum_{i=1}^{n-1} |\hat x_{i+1} - \hat x_i|

Approximation and fitting 6–13
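A compact cvxpy sketch comparing the two smoothers on a synthetic piecewise-constant signal (all data made up; the weights are arbitrary trade-off choices):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = np.repeat([0.0, 1.5, -1.0, 0.5], n // 4)  # piecewise-constant signal
x_cor = x + 0.2 * rng.standard_normal(n)      # corrupted observation

def reconstruct(phi, weight):
    # scalarized problem: ||xh - x_cor||^2 + weight * phi(xh)
    xh = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.sum_squares(xh - x_cor)
                           + weight * phi(xh))).solve()
    return xh.value

x_quad = reconstruct(lambda z: cp.sum_squares(cp.diff(z)), 10.0)
x_tv = reconstruct(cp.tv, 1.0)                # total variation smoothing
print(np.abs(x_quad - x).max(), np.abs(x_tv - x).max())
```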

quadratic smoothing example

[figure: left, original signal x and noisy signal x_cor (n = 4000); right, three solutions on the trade-off curve \|\hat x - x_cor\|_2 versus \phi_quad(\hat x)]

Approximation and fitting 6–14

total variation reconstruction example

[figure: left, original signal x and noisy signal x_cor (n = 2000); right, three solutions on the trade-off curve \|\hat x - x_cor\|_2 versus \phi_quad(\hat x)]

quadratic smoothing smooths out noise and sharp transitions in signal

Approximation and fitting 6–15

[figure: left, original signal x and noisy signal x_cor; right, three solutions on the trade-off curve \|\hat x - x_cor\|_2 versus \phi_tv(\hat x)]

total variation smoothing preserves sharp transitions in signal

Approximation and fitting 6–16

Robust approximation

    minimize    \|Ax - b\|    with uncertain A

two approaches:

stochastic: assume A is random, minimize E \|Ax - b\|
worst-case: set A of possible values of A, minimize sup_{A \in A} \|Ax - b\|

tractable only in special cases (certain norms \|\cdot\|, distributions, sets A)

example: A(u) = A_0 + u A_1

x_nom minimizes \|A_0 x - b\|_2^2
x_stoch minimizes E \|A(u) x - b\|_2^2 with u uniform on [-1, 1]
x_wc minimizes sup_{-1 \le u \le 1} \|A(u) x - b\|_2^2

[figure: r(u) = \|A(u) x - b\|_2 as a function of u for x_nom, x_stoch, x_wc]

Approximation and fitting 6–17

stochastic robust LS with A = \bar A + U, U random, E U = 0, E U^T U = P

    minimize    E \|(\bar A + U) x - b\|_2^2

explicit expression for objective:

    E \|Ax - b\|_2^2 = E \|\bar A x - b + U x\|_2^2
                     = \|\bar A x - b\|_2^2 + E x^T U^T U x
                     = \|\bar A x - b\|_2^2 + x^T P x

hence, robust LS problem is equivalent to LS problem

    minimize    \|\bar A x - b\|_2^2 + \|P^{1/2} x\|_2^2

for P = \delta I, get Tikhonov regularized problem

    minimize    \|\bar A x - b\|_2^2 + \delta \|x\|_2^2

Approximation and fitting 6–18
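A numpy Monte Carlo sketch of this identity, for the special case of U with IID N(0, \sigma^2) entries (so E U^T U = m \sigma^2 I = P); all data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 20, 5
A_bar, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.standard_normal(n)
sigma = 0.3                                   # entries of U are N(0, sigma^2)
P = m * sigma**2 * np.eye(n)                  # E U^T U for IID entries

# sample average of ||(A_bar + U)x - b||^2 over random draws of U
samples = [np.sum(((A_bar + sigma * rng.standard_normal((m, n))) @ x - b)**2)
           for _ in range(20000)]
mc = np.mean(samples)

closed = np.sum((A_bar @ x - b)**2) + x @ P @ x
print(mc, closed)                             # approximately equal
```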

worst-case robust LS with A = {\bar A + u_1 A_1 + \cdots + u_p A_p | \|u\|_2 \le 1}

    minimize    sup_{A \in A} \|Ax - b\|_2^2 = sup_{\|u\|_2 \le 1} \|P(x) u + q(x)\|_2^2

where P(x) = [A_1 x   A_2 x   \cdots   A_p x], q(x) = \bar A x - b

from page 5–14, strong duality holds between the following problems

    maximize    \|P u + q\|_2^2          minimize    t + \lambda
    subject to  \|u\|_2^2 \le 1          subject to  [ I          P    q ]
                                                     [ P^T  \lambda I  0 ]  \succeq 0
                                                     [ q^T        0    t ]

hence, robust LS problem is equivalent to SDP

    minimize    t + \lambda
    subject to  [ I          P(x)    q(x) ]
                [ P(x)^T  \lambda I   0   ]  \succeq 0
                [ q(x)^T      0       t   ]

Approximation and fitting 6–19

example: histogram of residuals

    r(u) = \|(A_0 + u_1 A_1 + u_2 A_2) x - b\|_2

with u uniformly distributed on unit disk, for three values of x

[figure: frequency histograms of r(u) for x_ls, x_tik, x_rls]

x_ls minimizes \|A_0 x - b\|_2
x_tik minimizes \|A_0 x - b\|_2^2 + \delta \|x\|_2^2 (Tikhonov solution)
x_rls minimizes the worst case sup_{\|u\|_2 \le 1} \|A(u) x - b\|_2^2

Approximation and fitting 6–20

Convex Optimization Boyd & Vandenberghe

7. Statistical estimation

maximum likelihood estimation
optimal detector design
experiment design

7–1

Parametric distribution estimation

distribution estimation problem: estimate probability density p(y) of a random variable from observed values
parametric distribution estimation: choose from a family of densities p_x(y), indexed by a parameter x

maximum likelihood estimation

    maximize (over x)    log p_x(y)

y is observed value
l(x) = log p_x(y) is called log-likelihood function
can add constraints x \in C explicitly, or define p_x(y) = 0 for x \notin C
a convex optimization problem if log p_x(y) is concave in x for fixed y

Statistical estimation 7–2

Linear measurements with IID noise

linear measurement model

    y_i = a_i^T x + v_i,    i = 1, ..., m

x \in R^n is vector of unknown parameters
v_i is IID measurement noise, with density p(z)
y_i is measurement: y \in R^m has density p_x(y) = \prod_{i=1}^m p(y_i - a_i^T x)

maximum likelihood estimate: any solution x of

    maximize    l(x) = \sum_{i=1}^m \log p(y_i - a_i^T x)

(y is observed value)

Statistical estimation 7–3

examples

Gaussian noise N(0, \sigma^2): p(z) = (2\pi\sigma^2)^{-1/2} e^{-z^2/(2\sigma^2)},

    l(x) = -\frac{m}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^m (a_i^T x - y_i)^2

ML estimate is LS solution

Laplacian noise: p(z) = (1/(2a)) e^{-|z|/a},

    l(x) = -m \log(2a) - \frac{1}{a} \sum_{i=1}^m |a_i^T x - y_i|

ML estimate is l1-norm solution

uniform noise on [-a, a]:

    l(x) = { -m \log(2a)    if |a_i^T x - y_i| \le a, i = 1, ..., m
           { -\infty        otherwise

ML estimate is any x with |a_i^T x - y_i| \le a

Statistical estimation 7–4

Logistic regression

random variable y \in {0, 1} with distribution

    p = prob(y = 1) = \frac{\exp(a^T u + b)}{1 + \exp(a^T u + b)}

a, b are parameters; u \in R^n are (observable) explanatory variables

estimation problem: estimate a, b from m observations (u_i, y_i)

log-likelihood function (for y_1 = \cdots = y_k = 1, y_{k+1} = \cdots = y_m = 0):

    l(a, b) = \log ( \prod_{i=1}^k \frac{\exp(a^T u_i + b)}{1 + \exp(a^T u_i + b)} \prod_{i=k+1}^m \frac{1}{1 + \exp(a^T u_i + b)} )
            = \sum_{i=1}^k (a^T u_i + b) - \sum_{i=1}^m \log(1 + \exp(a^T u_i + b))

concave in a, b

Statistical estimation 7–5
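A minimal cvxpy sketch maximizing this concave log-likelihood on synthetic (u_i, y_i); cp.logistic(z) is log(1 + e^z), so the objective below matches l(a, b):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(10)
m = 50
u = rng.uniform(0, 10, m)
# synthetic labels drawn from a logistic model centered at u = 5
y = (rng.uniform(size=m) < 1 / (1 + np.exp(-(u - 5)))).astype(float)

a, b = cp.Variable(), cp.Variable()
z = a * u + b
# l(a, b) = sum_{i: y_i = 1} (a u_i + b) - sum_i log(1 + exp(a u_i + b))
loglik = y @ z - cp.sum(cp.logistic(z))
cp.Problem(cp.Maximize(loglik)).solve()
print(a.value, b.value)                       # ML estimates of a, b
```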

example (n = 1, m = 50 measurements)

[figure: circles show 50 points (u_i, y_i); solid curve is ML estimate of p = \exp(au + b)/(1 + \exp(au + b))]

Statistical estimation 7–6

(Binary) hypothesis testing

detection (hypothesis testing) problem

given observation of a random variable X \in {1, ..., n}, choose between:

hypothesis 1: X was generated by distribution p = (p_1, ..., p_n)
hypothesis 2: X was generated by distribution q = (q_1, ..., q_n)

randomized detector

a nonnegative matrix T \in R^{2 \times n}, with 1^T T = 1^T
if we observe X = k, we choose hypothesis 1 with probability t_{1k}, hypothesis 2 with probability t_{2k}
if all elements of T are 0 or 1, it is called a deterministic detector

Statistical estimation 7–7

detection probability matrix:

    D = [T p   T q] = [ 1 - P_fp    P_fn     ]
                      [ P_fp        1 - P_fn ]

P_fp is probability of selecting hypothesis 2 if X is generated by distribution 1 (false positive)
P_fn is probability of selecting hypothesis 1 if X is generated by distribution 2 (false negative)

multicriterion formulation of detector design

    minimize (w.r.t. R^2_+)    (P_fp, P_fn) = ((T p)_2, (T q)_1)
    subject to                 t_{1k} + t_{2k} = 1,  k = 1, ..., n
                               t_{ik} \ge 0,  i = 1, 2,  k = 1, ..., n

variable T \in R^{2 \times n}

Statistical estimation 7–8

scalarization (with weight \lambda > 0)

    minimize    (T p)_2 + \lambda (T q)_1
    subject to  t_{1k} + t_{2k} = 1,  t_{ik} \ge 0,  i = 1, 2,  k = 1, ..., n

an LP with a simple analytical solution

    (t_{1k}, t_{2k}) = { (1, 0)    if p_k \ge \lambda q_k
                       { (0, 1)    if p_k < \lambda q_k

a deterministic detector, given by a likelihood ratio test
if p_k = \lambda q_k for some k, any value 0 \le t_{1k} \le 1, t_{2k} = 1 - t_{1k} is optimal (i.e., Pareto-optimal detectors include non-deterministic detectors)

minimax detector

    minimize    max{P_fp, P_fn} = max{(T p)_2, (T q)_1}
    subject to  t_{1k} + t_{2k} = 1,  t_{ik} \ge 0,  i = 1, 2,  k = 1, ..., n

an LP; solution is usually not deterministic

Statistical estimation 7–9

example

    P = [p  q] = [ 0.70  0.10 ]
                 [ 0.20  0.10 ]
                 [ 0.05  0.70 ]
                 [ 0.05  0.10 ]

[figure: trade-off curve of P_fn versus P_fp; solutions 1, 2, 3 (and endpoints) are deterministic; 4 is minimax detector]

Statistical estimation 7–10

Experiment design

m linear measurements y_i = a_i^T x + w_i, i = 1, ..., m of unknown x \in R^n

measurement errors w_i are IID N(0, 1)
ML (least-squares) estimate is

    \hat x = ( \sum_{i=1}^m a_i a_i^T )^{-1} \sum_{i=1}^m y_i a_i

error e = \hat x - x has zero mean and covariance

    E = E e e^T = ( \sum_{i=1}^m a_i a_i^T )^{-1}

confidence ellipsoids are given by {x | (x - \hat x)^T E^{-1} (x - \hat x) \le \beta}

experiment design: choose a_i \in {v_1, ..., v_p} (a set of possible test vectors) to make E small

Statistical estimation 7–11

vector optimization formulation

    minimize (w.r.t. S^n_+