Convex Programs - Duke University


Page 1: Convex Programs - Duke University

Convex Programs

COMPSCI 371D — Machine Learning

COMPSCI 371D — Machine Learning Convex Programs 1 / 16


Logistic Regression → Support Vector Machines

Support Vector Machines (SVMs) and Convex Programs

• SVMs are linear predictors in their original form
• Defined for both regression and classification
• Multi-class versions exist
• We will cover only binary SVM classification
• Why do we need another linear classifier?
• We’ll need some new math: Convex Programs
• Optimization of convex functions with affine constraints



Outline

1 Logistic Regression → Support Vector Machines

2 Local Convex Minimization → Convex Programs

3 Shape of the Solution Set

4 The Karush-Kuhn-Tucker Conditions



Logistic Regression → SVMs

• A logistic-regression classifier places the decision boundary somewhere (and approximately) between the two classes
• Loss is never zero → Exact location of the boundary can be determined by samples that are very distant from the boundary (even on the correct side of it)
• SVMs place the boundary “exactly half-way” between the two classes (with exceptions to allow for nonlinearly-separable classes)
• Only samples close to the boundary matter: These are the support vectors
• A “kernel trick” allows going beyond linear classifiers
• We only look at the binary case

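As a concrete illustration (not from the slides), compare the logistic loss with the hinge loss commonly used by SVMs. The logistic loss is positive for every finite margin, so even far-away samples keep pulling on the boundary; the hinge loss is exactly zero once a sample’s margin reaches 1, so distant, well-classified samples stop mattering:

```python
import math

def logistic_loss(margin):
    # log(1 + e^{-m}): strictly positive for every finite margin
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    # max(0, 1 - m): exactly zero once the margin reaches 1
    return max(0.0, 1.0 - margin)

for m in [0.5, 1.0, 5.0]:
    print(m, logistic_loss(m), hinge_loss(m))
```

For m = 5 the hinge loss is 0 while the logistic loss is still positive, which is why logistic regression lets distant samples move the boundary and SVMs do not.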


Roadmap for SVMs

• SVM training minimizes a convex function with constraints
• Convex: Unique minimum risk
• Constraints: Define a convex program as minimizing a convex function subject to affine constraints
• Representer theorem: The SVM hyperplane normal vector is a linear combination of a subset of training samples (x_n, y_n). The x_n are the support vectors.
• The proof of the representer theorem is based on a characterization of the solutions of a convex program
• Characterization for an unconstrained problem: ∇f(u) = 0
• Characterization for a convex program: The Karush-Kuhn-Tucker (KKT) conditions
• The representer theorem leads to the kernel trick, through which SVMs can be turned into nonlinear classifiers
• Decision boundary is no longer necessarily a hyperplane



Roadmap Summary

Convex program → SVM formulation

KKT conditions → representer theorem → kernel trick



Local Convex Minimization → Convex Programs

• Convex function f(u): R^m → R
• f differentiable, with continuous first derivatives
• Unconstrained minimization: u* ∈ argmin_{u ∈ R^m} f(u)
• Constrained minimization: u* ∈ argmin_{u ∈ C} f(u)
• C = {u ∈ R^m : Au + b ≥ 0}
• f is a convex function
• C is a convex set: If u, v ∈ C, then for t ∈ [0, 1], tu + (1 − t)v ∈ C
• The specific C is bounded by hyperplanes
• This is a convex program

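The convexity of C can be checked numerically. The sketch below (with made-up example values for A and b, not from the slides) verifies that every convex combination tu + (1 − t)v of two feasible points stays feasible:

```python
# Feasibility check for C = {u : Au + b >= 0}, illustrating that a convex
# combination of two feasible points is again feasible, i.e. C is convex.
# A and b below are made-up example values.

def feasible(A, b, u, tol=1e-12):
    # u is in C iff every affine constraint row_i . u + b_i is nonnegative
    return all(sum(ai * ui for ai, ui in zip(row, u)) + bi >= -tol
               for row, bi in zip(A, b))

A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # u1 >= 0, u2 >= 0, u1 + u2 <= 3
b = [0.0, 0.0, 3.0]

u, v = [0.5, 0.5], [2.0, 1.0]                # two points in C
assert feasible(A, b, u) and feasible(A, b, v)

for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    w = [t * ui + (1 - t) * vi for ui, vi in zip(u, v)]
    assert feasible(A, b, w)                 # every point on the segment is in C
```

The same check fails for non-convex sets, which is exactly the property that makes local minimization sufficient here.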


Convex Program

u* ∈ argmin_{u ∈ C} f(u)

where C = {u ∈ R^m : c(u) ≥ 0}.

• f differentiable, with continuous gradient, and convex
• The k inequalities in C are affine:

c(u) = Au + b ≥ 0.

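A minimal sketch of solving such a convex program numerically (the objective, constraints, and step size are made up for illustration, and projected gradient descent is just one possible method): minimize a convex quadratic subject to the affine constraints u ≥ 0, i.e. c(u) = Au + b with A = I, b = 0, where projection onto C is simple clamping.

```python
# Toy convex program:
#   minimize f(u) = (u1 - 1)^2 + (u2 + 2)^2   subject to u >= 0
# The constraints u >= 0 are affine (A = I, b = 0), and projection onto
# this C is coordinate-wise clamping, so projected gradient descent applies.

def grad(u):
    return [2 * (u[0] - 1.0), 2 * (u[1] + 2.0)]

u = [5.0, 5.0]
for _ in range(200):
    g = grad(u)
    u = [max(0.0, ui - 0.1 * gi) for ui, gi in zip(u, g)]  # step, then project onto C

# The unconstrained minimum (1, -2) is infeasible; the constrained solution
# clips the second coordinate to the boundary of C: u* = (1, 0).
print(u)
```

Note that at the solution the gradient of f is not zero, which is the situation the KKT conditions later characterize.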


Shape of the Solution Set

• Just as for the unconstrained problem:
• There is one f* but there can be multiple u* (a flat valley)
• The set of solution points u* is convex
• If f is strictly convex at u*, then u* is the unique solution point

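A quick illustration of a “flat valley” (the function below is a made-up example, not from the slides): a convex f whose minimizers form a whole interval, which is itself a convex set.

```python
# f(u) = max(0, |u| - 1)^2 is convex, f* = 0, and every u in [-1, 1]
# attains the minimum: the solution set is the convex interval [-1, 1],
# not a single point, because f is not strictly convex there.
def f(u):
    return max(0.0, abs(u) - 1.0) ** 2

# scan a grid and collect the minimizers
minimizers = [u / 10 for u in range(-30, 31) if f(u / 10) == 0.0]
print(min(minimizers), max(minimizers))
```

Making f strictly convex (e.g. adding a small quadratic term) collapses this valley to a single solution point, matching the last bullet above.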


Zero Gradient → KKT Conditions

• For the unconstrained problem, the solution is characterized by ∇f(u) = 0
• Constraints can generate new minima and maxima
• Example: f(u) = e^u

[Figure: three plots of f(u) = e^u over u ∈ [0, 1], illustrating the minima and maxima created by the constraints]

• What is the new characterization?
• The Karush-Kuhn-Tucker conditions, necessary and sufficient

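The example above can be checked numerically (a small sketch, using a grid over the feasible set): f(u) = e^u restricted to [0, 1] has its minimum at the boundary u = 0, where the derivative is 1, not 0, so the zero-gradient characterization fails under constraints.

```python
import math

# f(u) = e^u restricted to the feasible set [0, 1]
f = math.exp
us = [i / 1000 for i in range(1001)]   # grid over [0, 1]
u_star = min(us, key=f)                # grid minimizer

# minimum at the boundary u = 0 with f(0) = 1, yet f'(0) = e^0 = 1 != 0
print(u_star, f(u_star))
```

This is exactly why a new characterization (the KKT conditions) is needed for constrained problems.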


The Karush-Kuhn-Tucker Conditions

Regular Points

[Figure: a regular point u on the constraint boundary, with gradient ∇f, a direction s, and the half-spaces H+ and H−]



Corner Points

[Figure: two configurations of a corner point u of the feasible set C where constraints c1 and c2 are both active, with gradient ∇f and the half-spaces H+ and H−]



The Convex Cone of the Constraint Gradients

[Figure: the point u with active constraints c1 and c2, gradient ∇f, the half-spaces H+ and H−, and the convex cone spanned by the constraint gradients]



Inactive Constraints Do Not Matter

[Figure: two panels of the feasible set C with active constraints c1 and c2 at u, an inactive constraint c3 away from u, gradient ∇f, a vector v, and the half-spaces H+ and H−]



Conic Combinations

[Figure: the convex cone at u spanned by the constraint gradients a1 and a2 (with normals n1, n2), gradient ∇f, a vector v, and the half-spaces H+ and H−]

{v : v = α1 a1 + α2 a2 with α1, α2 ≥ 0}

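Membership in such a cone is easy to test in two dimensions (a sketch with made-up example vectors): solve the 2×2 linear system [a1 a2] α = v and check that both coefficients are nonnegative.

```python
# Test whether v lies in the convex cone {α1 a1 + α2 a2 : α1, α2 >= 0}
# spanned by two linearly independent 2-D vectors a1 and a2,
# by solving [a1 a2] α = v with Cramer's rule and checking α >= 0.

def in_cone(a1, a2, v, tol=1e-12):
    det = a1[0] * a2[1] - a1[1] * a2[0]            # nonzero if a1, a2 independent
    alpha1 = (v[0] * a2[1] - v[1] * a2[0]) / det   # Cramer's rule, column 1
    alpha2 = (a1[0] * v[1] - a1[1] * v[0]) / det   # Cramer's rule, column 2
    return alpha1 >= -tol and alpha2 >= -tol

a1, a2 = [1.0, 0.0], [0.0, 1.0]   # cone = the first quadrant
print(in_cone(a1, a2, [2.0, 3.0]))    # in the cone
print(in_cone(a1, a2, [-1.0, 3.0]))   # outside the cone
```

The KKT conditions on the next slide ask exactly this question about ∇f and the gradients of the active constraints.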


The KKT Conditions

u ∈ C is a solution to a convex program iff there exist αi s.t.

∇f(u) = Σ_{i ∈ A(u)} αi ∇ci(u)   with αi ≥ 0

where A(u) = {i : ci(u) = 0} is the active set at u

[Figure: gradient ∇f at a point u where constraints c1 and c2 are active, with the half-spaces H+ and H−]

Convention: Σ_{i ∈ ∅} = 0 (so the condition also holds in the interior of C)
i∈∅ = 0 (so condition also holds in interior of C)COMPSCI 371D — Machine Learning Convex Programs 16 / 16