TRANSCRIPT
Convex Programs
COMPSCI 371D — Machine Learning
Support Vector Machines (SVMs) and Convex Programs
• SVMs are linear predictors in their original form
• Defined for both regression and classification
• Multi-class versions exist
• We will cover only binary SVM classification
• Why do we need another linear classifier?
• We’ll need some new math: Convex Programs
• Optimization of convex functions with affine constraints
Outline
1 Logistic Regression → Support Vector Machines
2 Local Convex Minimization → Convex Programs
3 Shape of the Solution Set
4 The Karush-Kuhn-Tucker Conditions
Logistic Regression → SVMs
• A logistic-regression classifier places the decision boundary somewhere (and approximately) between the two classes
• Loss is never zero → the exact location of the boundary can be determined by samples that are very distant from the boundary (even on the correct side of it)
• SVMs place the boundary “exactly half-way” between the two classes (with exceptions to allow for nonlinearly-separable classes)
• Only samples close to the boundary matter: these are the support vectors (see the sketch below)
• A “kernel trick” allows going beyond linear classifiers
• We only look at the binary case
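A minimal sketch of the contrast, assuming scikit-learn is available; the blob dataset, the large C value, and all other parameter choices here are illustrative, not part of the lecture:

```python
# Compare a logistic-regression boundary with a (nearly) hard-margin
# linear SVM on two well-separated classes.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two classes in the plane, well separated for this seed.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

logreg = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

# Both classifiers are linear, but the SVM boundary depends only on the
# samples nearest to it (the support vectors), not on distant ones.
print("logistic-regression normal:", logreg.coef_.ravel())
print("SVM normal:                ", svm.coef_.ravel())
print("number of support vectors: ", len(svm.support_vectors_))
```

Typically only a handful of the 100 samples end up as support vectors; moving any of the others (while keeping them on the correct side) leaves the SVM boundary unchanged, but would shift the logistic-regression boundary.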
Roadmap for SVMs
• SVM training minimizes a convex function with constraints
• Convex: unique minimum risk
• Constraints: define a convex program as minimizing a convex function subject to affine constraints
• Representer theorem: the SVM hyperplane normal vector is a linear combination of a subset of the training samples (xₙ, yₙ). The xₙ are the support vectors.
• The proof of the representer theorem is based on a characterization of the solutions of a convex program
• Characterization for an unconstrained problem: ∇f(u) = 0
• Characterization for a convex program: the Karush-Kuhn-Tucker (KKT) conditions
• The representer theorem leads to the kernel trick, through which SVMs can be turned into nonlinear classifiers
• The decision boundary is then no longer necessarily a hyperplane
Roadmap Summary
Convex program → SVM formulation
KKT conditions → representer theorem → kernel trick
Local Convex Minimization → Convex Programs
• Convex function f(u) : ℝᵐ → ℝ
• f differentiable, with continuous first derivatives
• Unconstrained minimization: u∗ ∈ argmin_{u ∈ ℝᵐ} f(u)
• Constrained minimization: u∗ ∈ argmin_{u ∈ C} f(u)
• C = {u ∈ ℝᵐ : Au + b ≥ 0}
• f is a convex function
• C is a convex set: if u, v ∈ C, then tu + (1−t)v ∈ C for t ∈ [0, 1]. This holds here because A(tu + (1−t)v) + b = t(Au + b) + (1−t)(Av + b) ≥ 0 (see the numeric check below)
• The specific C is bounded by hyperplanes
• This is a convex program
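A small numeric illustration of this convexity, assuming NumPy; the particular box-shaped C is an arbitrary example:

```python
# C = {u : A u + b >= 0} is convex: the segment between any two
# feasible points stays feasible. Here C is the box [-1, 1]^2.
import numpy as np

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

def in_C(u, tol=1e-12):
    return np.all(A @ u + b >= -tol)

u = np.array([-0.9, 0.2])
v = np.array([0.7, -0.8])
assert in_C(u) and in_C(v)
for t in np.linspace(0.0, 1.0, 11):
    # A(tu + (1-t)v) + b = t(Au + b) + (1-t)(Av + b) >= 0
    assert in_C(t * u + (1 - t) * v)
print("every sampled convex combination of u and v lies in C")
```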
Convex Program
u∗ ∈ argmin_{u ∈ C} f(u)

where C ≝ {u ∈ ℝᵐ : c(u) ≥ 0}.

• f differentiable, with continuous gradient, and convex
• The k inequalities in C are affine:

c(u) = Au + b ≥ 0.
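A minimal sketch of solving such a program numerically, assuming SciPy; the quadratic f and the specific A and b are illustrative stand-ins:

```python
# Solve  min ||u - u0||^2  subject to  A u + b >= 0  with a generic
# convex solver. The unconstrained minimizer u0 is infeasible here,
# so the solution lands on the boundary of C.
import numpy as np
from scipy.optimize import LinearConstraint, minimize

u0 = np.array([2.0, 2.0])  # unconstrained minimizer

def f(u):
    return np.sum((u - u0) ** 2)

def grad_f(u):
    return 2.0 * (u - u0)

A = np.array([[-1.0, 0.0], [0.0, -1.0]])  # encodes u1 <= 1, u2 <= 1
b = np.ones(2)

# A u + b >= 0  is equivalent to  A u >= -b.
con = LinearConstraint(A, -b, np.inf)
res = minimize(f, x0=np.zeros(2), jac=grad_f,
               constraints=[con], method="trust-constr")
print(res.x)          # ~ [1, 1]: u0 projected onto the feasible set
print(grad_f(res.x))  # nonzero: grad f need not vanish at a constrained minimum
```

The second print foreshadows the KKT discussion below: at a constrained minimum, ∇f(u∗) need not be zero.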
Shape of the Solution Set
• Just as for the unconstrained problem:
  • There is one f∗ but there can be multiple u∗ (a flat valley; see the sketch below)
  • The set of solution points u∗ is convex
  • If f is strictly convex at u∗, then u∗ is the unique solution point
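A minimal numeric illustration of a flat valley, assuming NumPy; the particular f is an arbitrary convex example:

```python
# f(u) = (u1 + u2 - 1)^2 is convex and attains its minimum f* = 0 on
# the entire line u1 + u2 = 1, so the set of minimizers is a (convex)
# flat valley rather than a single point.
import numpy as np

def f(u):
    return (u[0] + u[1] - 1.0) ** 2

u_star = np.array([1.0, 0.0])   # one minimizer
v_star = np.array([-3.0, 4.0])  # another minimizer
assert f(u_star) == 0.0 and f(v_star) == 0.0
for t in np.linspace(0.0, 1.0, 11):
    # every convex combination of minimizers is again a minimizer
    assert f(t * u_star + (1 - t) * v_star) < 1e-12
print("the segment between the two minimizers lies in the solution set")
```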
Zero Gradient → KKT Conditions
• For the unconstrained problem, the solution is characterized by ∇f(u) = 0
• Constraints can generate new minima and maxima
• Example: f(u) = eᵘ (plots below)
[Figure: three plots of f(u) = eᵘ for u between 0 and 1.]
• What is the new characterization? (see the numeric example below)
• The Karush-Kuhn-Tucker conditions, necessary and sufficient
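A minimal check of the eᵘ example, assuming SciPy; the bounded 1-D solver is just one convenient way to impose u ∈ [0, 1]:

```python
# f(u) = e^u has no stationary point at all, yet restricting u to [0, 1]
# creates a minimum at u = 0 (and a maximum at u = 1), neither of which
# satisfies f'(u) = 0.
import numpy as np
from scipy.optimize import minimize_scalar

res = minimize_scalar(np.exp, bounds=(0.0, 1.0), method="bounded")
print(res.x)          # ~ 0: the minimum sits on the constraint boundary
print(np.exp(res.x))  # f'(u) = e^u ~ 1 != 0 there, so "grad f = 0" fails
```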
Regular Points
[Figure: a regular point u on the constraint boundary, with labels ∇f, s, H⁺, H⁻.]
Corner Points
[Figure: two panels showing a corner point u of the feasible set C where constraints c₁ and c₂ meet, with labels ∇f, s, H⁺, H⁻.]
The Convex Cone of the Constraint Gradients
[Figure: the feasible set near a corner point u with constraints c₁ and c₂, with labels ∇f, H⁺, H⁻.]
Inactive Constraints Do Not Matter
[Figure: two panels of the feasible set C near u, with constraints c₁, c₂, c₃, a vector v, and labels ∇f, H⁺, H⁻.]
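The slide’s claim is easy to verify numerically. A minimal sketch, assuming SciPy; the problem instance and the deliberately slack extra constraint are illustrative:

```python
# Adding a constraint that is inactive at the solution (satisfied with
# strict inequality) leaves the minimizer unchanged.
import numpy as np
from scipy.optimize import LinearConstraint, minimize

u0 = np.array([2.0, 2.0])

def f(u):
    return np.sum((u - u0) ** 2)

box = LinearConstraint(np.eye(2), -np.inf, 1.0)                   # u1 <= 1, u2 <= 1
slack = LinearConstraint(np.array([[1.0, 1.0]]), -100.0, np.inf)  # u1 + u2 >= -100

with_slack = minimize(f, np.zeros(2), constraints=[box, slack],
                      method="trust-constr").x
without = minimize(f, np.zeros(2), constraints=[box],
                   method="trust-constr").x
print(with_slack, without)  # both ~ [1, 1]: the inactive constraint has no effect
```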
Conic Combinations
[Figure: vectors a₁ and a₂ and the convex cone they span, containing a vector v; labels ∇f, u, c₁, c₂, n₁, n₂, H⁺, H⁻.]

{v : v = α₁a₁ + α₂a₂ with α₁, α₂ ≥ 0}
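Membership in such a cone can be tested numerically. A minimal sketch, assuming SciPy; the vectors a₁, a₂ and the two test points are arbitrary examples:

```python
# Is v a conic (nonnegative) combination of a1 and a2? Solve the
# nonnegative least-squares problem  min ||[a1 a2] alpha - v||, alpha >= 0,
# and check whether the residual is zero.
import numpy as np
from scipy.optimize import nnls

a1 = np.array([1.0, 0.0])
a2 = np.array([1.0, 1.0])
cols = np.column_stack([a1, a2])

for v in (np.array([2.0, 0.5]), np.array([-1.0, 0.5])):
    alpha, residual = nnls(cols, v)
    verdict = "in the cone" if residual < 1e-10 else "not in the cone"
    print(v, verdict, "alpha =", alpha)
```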
The KKT Conditions
u ∈ C is a solution to a convex program iff there exist αᵢ such that

∇f(u) = Σ_{i ∈ A(u)} αᵢ ∇cᵢ(u) with αᵢ ≥ 0

where A(u) = {i : cᵢ(u) = 0} is the active set at u.
[Figure: the gradient ∇f at a solution u with active constraints c₁ and c₂; labels H⁺, H⁻.]
Convention: Σ_{i ∈ ∅} = 0 (so the condition also holds in the interior of C, where it reduces to ∇f(u) = 0)
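Tying this back to the earlier box example, the conditions can be checked mechanically. A minimal sketch, assuming SciPy; the problem instance repeats the illustrative one used above:

```python
# Verify the KKT conditions at u* = (1, 1) for
#   min ||u - (2, 2)||^2  s.t.  c1(u) = 1 - u1 >= 0,  c2(u) = 1 - u2 >= 0.
# Both constraints are active at u*, so we solve
#   grad f(u*) = alpha1 * grad c1 + alpha2 * grad c2,  alpha >= 0
# with nonnegative least squares.
import numpy as np
from scipy.optimize import nnls

u_star = np.array([1.0, 1.0])
grad_f = 2.0 * (u_star - np.array([2.0, 2.0]))        # = (-2, -2)
grad_c = np.column_stack([[-1.0, 0.0], [0.0, -1.0]])  # columns: grad c1, grad c2

alpha, residual = nnls(grad_c, grad_f)
print(alpha)     # [2, 2]: nonnegative multipliers exist
print(residual)  # ~ 0: grad f lies in the cone of active-constraint gradients
```

Since the αᵢ are nonnegative and the residual vanishes, u∗ = (1, 1) satisfies the KKT conditions and is therefore a solution.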