TRANSCRIPT
Convex Programs
COMPSCI 371D — Machine Learning
Support Vector Machines (SVMs) and Convex Programs
• SVMs are linear predictors in their original form
• Defined for both regression and classification
• Multi-class versions exist
• We will cover only binary SVM classification
• Why do we need another linear classifier?
• We’ll need some new math: Convex Programs
• Optimization of convex functions with affine constraints
Outline
1 Logistic Regression → Support Vector Machines
2 Local Convex Minimization → Convex Programs
3 Shape of the Solution Set
4 The Karush-Kuhn-Tucker Conditions
Logistic Regression → SVMs
• A logistic-regression classifier places the decision boundary somewhere (and approximately) between the two classes
• Loss is never zero → the exact location of the boundary can be determined by samples that are very distant from the boundary (even on the correct side of it)
• SVMs place the boundary “exactly half-way” between the two classes (with exceptions to allow for nonlinearly-separable classes)
• Only samples close to the boundary matter: these are the support vectors (see the sketch below)
• A “kernel trick” allows going beyond linear classifiers
• We only look at the binary case
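A minimal sketch of the contrast, assuming scikit-learn is available; the blob dataset, the large C value, and all other parameter choices here are illustrative, not part of the lecture:

```python
# Compare a logistic-regression boundary with a (nearly) hard-margin
# linear SVM on two well-separated classes.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two classes in the plane, well separated for this seed.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

logreg = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

# Both classifiers are linear, but the SVM boundary depends only on the
# samples nearest to it (the support vectors), not on distant ones.
print("logistic-regression normal:", logreg.coef_.ravel())
print("SVM normal:                ", svm.coef_.ravel())
print("number of support vectors: ", len(svm.support_vectors_))
```

Typically only a handful of the 100 samples end up as support vectors; moving any of the others (while keeping them on the correct side) leaves the SVM boundary unchanged, but would shift the logistic-regression boundary.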
Roadmap for SVMs
• SVM training minimizes a convex function with constraints
• Convex: unique minimum risk
• Constraints: define a convex program as minimizing a convex function subject to affine constraints
• Representer theorem: the SVM hyperplane normal vector is a linear combination of a subset of the training samples (xₙ, yₙ). The xₙ are the support vectors.
• The proof of the representer theorem is based on a characterization of the solutions of a convex program
• Characterization for an unconstrained problem: ∇f(u) = 0
• Characterization for a convex program: the Karush-Kuhn-Tucker (KKT) conditions
• The representer theorem leads to the kernel trick, through which SVMs can be turned into nonlinear classifiers
• The decision boundary is then no longer necessarily a hyperplane
Roadmap Summary
Convex program → SVM formulation
KKT conditions → representer theorem → kernel trick
Local Convex Minimization → Convex Programs
• Convex function f(u) : ℝᵐ → ℝ
• f differentiable, with continuous first derivatives
• Unconstrained minimization: u∗ ∈ argmin_{u ∈ ℝᵐ} f(u)
• Constrained minimization: u∗ ∈ argmin_{u ∈ C} f(u)
• C = {u ∈ ℝᵐ : Au + b ≥ 0}
• f is a convex function
• C is a convex set: if u, v ∈ C, then tu + (1−t)v ∈ C for t ∈ [0, 1]. This holds here because A(tu + (1−t)v) + b = t(Au + b) + (1−t)(Av + b) ≥ 0 (see the numeric check below)
• The specific C is bounded by hyperplanes
• This is a convex program
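A small numeric illustration of this convexity, assuming NumPy; the particular box-shaped C is an arbitrary example:

```python
# C = {u : A u + b >= 0} is convex: the segment between any two
# feasible points stays feasible. Here C is the box [-1, 1]^2.
import numpy as np

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

def in_C(u, tol=1e-12):
    return np.all(A @ u + b >= -tol)

u = np.array([-0.9, 0.2])
v = np.array([0.7, -0.8])
assert in_C(u) and in_C(v)
for t in np.linspace(0.0, 1.0, 11):
    # A(tu + (1-t)v) + b = t(Au + b) + (1-t)(Av + b) >= 0
    assert in_C(t * u + (1 - t) * v)
print("every sampled convex combination of u and v lies in C")
```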
Convex Program
u∗ ∈ argmin_{u ∈ C} f(u)

where C ≝ {u ∈ ℝᵐ : c(u) ≥ 0}.

• f differentiable, with continuous gradient, and convex
• The k inequalities in C are affine:

c(u) = Au + b ≥ 0.
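A minimal sketch of solving such a program numerically, assuming SciPy; the quadratic f and the specific A and b are illustrative stand-ins:

```python
# Solve  min ||u - u0||^2  subject to  A u + b >= 0  with a generic
# convex solver. The unconstrained minimizer u0 is infeasible here,
# so the solution lands on the boundary of C.
import numpy as np
from scipy.optimize import LinearConstraint, minimize

u0 = np.array([2.0, 2.0])  # unconstrained minimizer

def f(u):
    return np.sum((u - u0) ** 2)

def grad_f(u):
    return 2.0 * (u - u0)

A = np.array([[-1.0, 0.0], [0.0, -1.0]])  # encodes u1 <= 1, u2 <= 1
b = np.ones(2)

# A u + b >= 0  is equivalent to  A u >= -b.
con = LinearConstraint(A, -b, np.inf)
res = minimize(f, x0=np.zeros(2), jac=grad_f,
               constraints=[con], method="trust-constr")
print(res.x)          # ~ [1, 1]: u0 projected onto the feasible set
print(grad_f(res.x))  # nonzero: grad f need not vanish at a constrained minimum
```

The second print foreshadows the KKT discussion below: at a constrained minimum, ∇f(u∗) need not be zero.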
Shape of the Solution Set
• Just as for the unconstrained problem:
  • There is one f∗ but there can be multiple u∗ (a flat valley; see the sketch below)
  • The set of solution points u∗ is convex
  • If f is strictly convex at u∗, then u∗ is the unique solution point
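A minimal numeric illustration of a flat valley, assuming NumPy; the particular f is an arbitrary convex example:

```python
# f(u) = (u1 + u2 - 1)^2 is convex and attains its minimum f* = 0 on
# the entire line u1 + u2 = 1, so the set of minimizers is a (convex)
# flat valley rather than a single point.
import numpy as np

def f(u):
    return (u[0] + u[1] - 1.0) ** 2

u_star = np.array([1.0, 0.0])   # one minimizer
v_star = np.array([-3.0, 4.0])  # another minimizer
assert f(u_star) == 0.0 and f(v_star) == 0.0
for t in np.linspace(0.0, 1.0, 11):
    # every convex combination of minimizers is again a minimizer
    assert f(t * u_star + (1 - t) * v_star) < 1e-12
print("the segment between the two minimizers lies in the solution set")
```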
Zero Gradient → KKT Conditions
• For the unconstrained problem, the solution is characterized by ∇f(u) = 0
• Constraints can generate new minima and maxima
• Example: f(u) = eᵘ (plots below)
[Figure: three plots of f(u) = eᵘ for u between 0 and 1.]
• What is the new characterization? (see the numeric example below)
• The Karush-Kuhn-Tucker conditions, necessary and sufficient
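A minimal check of the eᵘ example, assuming SciPy; the bounded 1-D solver is just one convenient way to impose u ∈ [0, 1]:

```python
# f(u) = e^u has no stationary point at all, yet restricting u to [0, 1]
# creates a minimum at u = 0 (and a maximum at u = 1), neither of which
# satisfies f'(u) = 0.
import numpy as np
from scipy.optimize import minimize_scalar

res = minimize_scalar(np.exp, bounds=(0.0, 1.0), method="bounded")
print(res.x)          # ~ 0: the minimum sits on the constraint boundary
print(np.exp(res.x))  # f'(u) = e^u ~ 1 != 0 there, so "grad f = 0" fails
```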
Regular Points
[Figure: a regular point u on the constraint boundary, with labels ∇f, s, H⁺, H⁻.]
Corner Points
[Figure: two panels showing a corner point u of the feasible set C where constraints c₁ and c₂ meet, with labels ∇f, s, H⁺, H⁻.]
The Convex Cone of the Constraint Gradients
[Figure: the feasible set near a corner point u with constraints c₁ and c₂, with labels ∇f, H⁺, H⁻.]
Inactive Constraints Do Not Matter
[Figure: two panels of the feasible set C near u, with constraints c₁, c₂, c₃, a vector v, and labels ∇f, H⁺, H⁻.]
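The slide’s claim is easy to verify numerically. A minimal sketch, assuming SciPy; the problem instance and the deliberately slack extra constraint are illustrative:

```python
# Adding a constraint that is inactive at the solution (satisfied with
# strict inequality) leaves the minimizer unchanged.
import numpy as np
from scipy.optimize import LinearConstraint, minimize

u0 = np.array([2.0, 2.0])

def f(u):
    return np.sum((u - u0) ** 2)

box = LinearConstraint(np.eye(2), -np.inf, 1.0)                   # u1 <= 1, u2 <= 1
slack = LinearConstraint(np.array([[1.0, 1.0]]), -100.0, np.inf)  # u1 + u2 >= -100

with_slack = minimize(f, np.zeros(2), constraints=[box, slack],
                      method="trust-constr").x
without = minimize(f, np.zeros(2), constraints=[box],
                   method="trust-constr").x
print(with_slack, without)  # both ~ [1, 1]: the inactive constraint has no effect
```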
Conic Combinations
[Figure: vectors a₁ and a₂ and the convex cone they span, containing a vector v; labels ∇f, u, c₁, c₂, n₁, n₂, H⁺, H⁻.]

{v : v = α₁a₁ + α₂a₂ with α₁, α₂ ≥ 0}
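Membership in such a cone can be tested numerically. A minimal sketch, assuming SciPy; the vectors a₁, a₂ and the two test points are arbitrary examples:

```python
# Is v a conic (nonnegative) combination of a1 and a2? Solve the
# nonnegative least-squares problem  min ||[a1 a2] alpha - v||, alpha >= 0,
# and check whether the residual is zero.
import numpy as np
from scipy.optimize import nnls

a1 = np.array([1.0, 0.0])
a2 = np.array([1.0, 1.0])
cols = np.column_stack([a1, a2])

for v in (np.array([2.0, 0.5]), np.array([-1.0, 0.5])):
    alpha, residual = nnls(cols, v)
    verdict = "in the cone" if residual < 1e-10 else "not in the cone"
    print(v, verdict, "alpha =", alpha)
```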
The KKT Conditions
u ∈ C is a solution to a convex program iff there exist αᵢ such that

∇f(u) = Σ_{i ∈ A(u)} αᵢ ∇cᵢ(u) with αᵢ ≥ 0

where A(u) = {i : cᵢ(u) = 0} is the active set at u.
[Figure: the gradient ∇f at a solution u with active constraints c₁ and c₂; labels H⁺, H⁻.]
Convention: Σ_{i ∈ ∅} = 0 (so the condition also holds in the interior of C, where it reduces to ∇f(u) = 0)
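Tying this back to the earlier box example, the conditions can be checked mechanically. A minimal sketch, assuming SciPy; the problem instance repeats the illustrative one used above:

```python
# Verify the KKT conditions at u* = (1, 1) for
#   min ||u - (2, 2)||^2  s.t.  c1(u) = 1 - u1 >= 0,  c2(u) = 1 - u2 >= 0.
# Both constraints are active at u*, so we solve
#   grad f(u*) = alpha1 * grad c1 + alpha2 * grad c2,  alpha >= 0
# with nonnegative least squares.
import numpy as np
from scipy.optimize import nnls

u_star = np.array([1.0, 1.0])
grad_f = 2.0 * (u_star - np.array([2.0, 2.0]))        # = (-2, -2)
grad_c = np.column_stack([[-1.0, 0.0], [0.0, -1.0]])  # columns: grad c1, grad c2

alpha, residual = nnls(grad_c, grad_f)
print(alpha)     # [2, 2]: nonnegative multipliers exist
print(residual)  # ~ 0: grad f lies in the cone of active-constraint gradients
```

Since the αᵢ are nonnegative and the residual vanishes, u∗ = (1, 1) satisfies the KKT conditions and is therefore a solution.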