an introduction to support vector machine (svm). famous examples that helped svm become popular

28
An Introduction to Support Vector Machine (SVM)

Upload: doreen-jordan

Post on 17-Jan-2016

220 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

An Introduction to Support Vector Machine (SVM)

Page 2: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Famous Examples that helped SVM become popular

Page 3: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular
Page 4: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Classification

Everyday, all the time we classify things.

Eg crossing the street: Is there a car coming?At what speed?How far is it to the other side?Classification: Safe to walk or not!!!

Page 5: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Discriminant Function It can be arbitrary functions of x, such as:

Nearest Neighbor

Decision Tree

LinearFunctions

( ) Tg b x w x

NonlinearFunctions

Page 6: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Background – Classification Problem

Applications:Personal IdentificationCredit RatingMedical DiagnosisText CategorizationDenial of Service DetectionCharacter recognitionBiometricsImage classification

Page 7: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Classification FormulationGiven

an input spacea set of classes ={ }

the Classification Problem is to define a mapping f: where each x in is

assigned to one classThis mapping function is called a Decision

Function

c ,...,, 21

Page 8: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Decision Function

The basic problem in classification problem is to find c decision functions

with the property that, if a pattern x belongs to class i, then

is some similarity measure between x and class i, such as distance or probability concept

xdxdxd c,...,, 21

xdxd ji ijcji ;,...2,1,

xdi

Page 9: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Decision Function

Example

d3=d2

d1=d3

d1=d2

d1,d3<d2

Class 1

Class 3

Class 2

d2,d3<d1

d1,d2<d3

Page 10: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Single Classifier

Most popular single classifiers:Minimum Distance ClassifierBayes ClassifierK-Nearest NeighborDecision TreeNeural NetworkSupport Vector Machine

Page 11: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Support Vector Machines (SVM) (Separable case)

The one with largest margin!!

Which is the best separation hyperplane?

SVM

Page 12: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Linearly Separable Classes

Page 13: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Support Vector Machine

Basically a 2-class classifier developed by Vapnik and Chervonenkis (1992)

Which line is optimal?

Page 14: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

2

2

1min w

niby ii ,...,2,1,01])[(s.t. xw

}1,1{,,,...,1,),( yRniy pii xx

11

11

ii

ii

yforb

yforb

xw

xw

bf )()( xwx

Maximizing Margin:

Correct Separation:

Support Vector Machines (SVM)

large margin provides better generalization ability

Page 15: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Why named “Support Vector Machine”?

Support VectorsSupport Vectors

SVsof#

1

** )(sgn)(i

iii byf xxx

Page 16: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Support Vector Machine

Training vectors : xi , i=1….n

Consider a simple case with two classes :Define a vector y yi = 1 if xi in class 1

= -1 if xi in class 2

A hyperplane which separates all data

r

ρ

Separating plane

Margin

Class 1

Class 2

Support Vector (Class 1)

Support Vector (Class 2)

Page 17: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

i

i

2.8 SVM

Page 18: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Linear Separable SVM

Label the training data

Suppose we have some hyperplanes which separates the “+” from “-” examples (a separating hyperplane)

x which lie on the hyperplane, satisfy w is noraml to hyperplane, |b|/||w|| is the

perpendicular distance from hyperplane to origin

Page 19: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Linear Separable SVM

Define two support hyperplane as

H1:wTx = b +δ and H2:wTx = b –δTo solve over-parameterized problem, set δ=1Define the distance as

Margin = distance between H1 and H2 = 2/||w||

Page 20: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

The Primal problem of SVM

Goal: Find a separating hyperplane with largest margin. A SVM is to find w and b that satisfy

(1) minimize ||w||/2 = wTw/2

(2) yi(xi·w+b)-1 ≥ 0

Switch the above problem to a Lagrangian formulation for two reason

(1) easier to handle by transforming into quadratic eq.(2) training data only appear in form of dot products

between vectors => can be generalized to nonlinear case

Page 21: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Langrange Muliplier Method

a method to find the extremum of a multivariate function f(x1,x2,…xn) subject to the constraint g(x1,x2,…xn) = 0

For an extremum of f to exist on g, the gradient of f must line up with the gradient of g .

for all k = 1, ...,n , where the constant λis called the Lagrange multiplier

The Lagrangian transformation of the problem is

Page 22: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Langrange Muliplier Method

To have , we need to find the gradient of L with respect to w and b.

(1)

(2) Substitute them into Lagrangian form, we have a

dual problem

Inner product form => Can be generalize to nonlinear case by applying kernel

Page 23: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

KKT Conditions

Since the problems for SVM is convex, the KKT conditions are necessary and sufficient for w, b and α to be a solution.

w is determined by training procedure. b is easily found by using KKT complementary conditions,

by choosing any i for which αi≠ 0

Complementary slackness

Page 24: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

2.8 SVM

What about non-linear boundary?

Page 25: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Non-Linear Separable SVM : Kernal

To extend to non-linear case, we need to the data to some other Euclidean space.

Page 26: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Kernal

Φ is a mapping function. Since the training algorithm only depend on data

thru dot products. We can use a “kernal function” K such that

One commonly used example is radial based function (RBF)

A RBF is a real-valued function whose value depends only on the distance from the origin, so that Φ(x)= Φ(||x||) ; or alternatively on the distance from some other point c, called a center, so that Φ(x,c)= Φ(||x-c||).

Page 27: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Non-separable SVM

Real world application usually have no OSH. We need to add an error term ζ.

=>

To give penalty to error term, define New Lagrangian form is

Page 28: An Introduction to Support Vector Machine (SVM). Famous Examples that helped SVM become popular

Non-separable SVM New KKT Conditions