TRANSCRIPT
1
SUPPORT VECTOR MACHINES
İsmail GÜNEŞ
2
What is SVM?
A new generation learning system, based on recent advances in statistical learning theory.
It uses:
a hypothesis space of linear functions,
a high-dimensional feature space,
optimisation theory,
statistical learning theory.
3
Features of SVM
Invented by Vapnik.
Simple, geometric, and always trained to find the global optimum.
Used for pattern recognition, regression, and linear operator inversion.
Considered too slow at the beginning; for most applications this problem has now been overcome.
4
Features of SVM (Cont’d)
Based on a simple idea.
High performance in practical applications.
Can deal with complex nonlinear problems, yet works with a simple linear algorithm.
5
The main idea of SVMs:
Find the optimal hyperplane for linearly separable patterns!
Extend to patterns that are not linearly separable!
6
Separating Line (or Hyperplane)
Goal: Find the best line (or hyperplane) to separate the training data. How to formalize?
[Figure: Class 1 and Class -1 points separated by a line.]
In two dimensions, the equation of the line is given by:
w1 x + w2 y = b
Better notation for n dimensions:
Σ_i w_i x_i = b, i.e. w · x = b
7
Simple Classifier
The simple classifier:
Points that fall on the right are classified as “1”.
Points that fall on the left are classified as “-1”.
Using the training set, find a hyperplane (line) so that
w · x_i ≥ b for class 1
w · x_i < b for class -1
where w is the weight vector, x is the input vector, and b is the bias.
How can we improve this simple classifier?
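As a sketch, the decision rule above takes only a few lines of Python; the weight vector, bias, and sample points below are made-up values for illustration:

```python
# A minimal sketch of the simple classifier above.
# The weight vector w, bias b, and the sample points are made-up values.

def classify(w, x, b):
    """Return 1 if w . x >= b (right of the plane), else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= b else -1

w = [1.0, 1.0]   # assumed weight vector
b = 1.0          # assumed bias

print(classify(w, [2.0, 2.0], b))   # -> 1  (falls on the right)
print(classify(w, [0.0, 0.0], b))   # -> -1 (falls on the left)
```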
8
Finding the Best Plane
Which of the following two planes is better?
[Figure: Class 1 and Class -1 points with two candidate separating planes.]
The green plane is the better choice, since it is more likely to do well on future test data.
9
Separating the Planes
Construct the bounding planes:
Draw two planes parallel to the classification plane.
Push them as far apart as possible, until they hit data points.
The classification plane whose bounding planes are furthest apart is the best one.
Classification plane: w · x = b
Bounding planes: w · x = b + 1 and w · x = b − 1
[Figure: Class 1 and Class -1 points with the classification plane and its two bounding planes.]
10
Finding the Best Plane (Cont’d)
All points in class 1 should be to the right of bounding plane 1:
w · x_i ≥ b + 1
All points in class -1 should be to the left of bounding plane -1:
w · x_i ≤ b − 1
y_i is +1 or -1 depending on the classification, so the above two inequalities can be written as one:
y_i (w · x_i − b) ≥ 1
The distance between the bounding planes should be maximized. That distance is
2 / ||w|| = 2 / √(w_1² + w_2² + … + w_n²)
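These two formulas can be checked numerically; w, b, and the labeled points below are assumed toy values, not a trained model:

```python
import math

# Illustrative check of the margin formula and the combined constraint.
# w, b, and the labeled points are assumed toy values, not a trained model.

def margin_width(w):
    """Distance between the two bounding planes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def satisfies_constraint(w, b, x, y):
    """Check y * (w . x - b) >= 1 for one labeled point."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) - b) >= 1

w, b = [0.5, 0.5], 0.0
points = [([2.0, 2.0], 1), ([-2.0, -2.0], -1)]

print(margin_width(w))                                            # 2/||w|| = 2*sqrt(2)
print(all(satisfies_constraint(w, b, x, y) for x, y in points))   # True
```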
11
The Optimization Problem
Mathematical techniques find the hyperplane that optimizes this measure (maximizes the distance).
This is a mathematical program: an optimization problem subject to constraints.
More specifically, it is a quadratic program.
There are high-powered software tools for solving this kind of problem (both commercial and academic).
12
Data Which Is Not Linearly Separable
What if a separating plane does not exist?
[Figure: Class 1 and Class -1 points, with one point on the wrong side of the plane marked “error”.]
Take the original inequality and add a slack variable ξ_i to measure the error:
y_i (w · x_i − b) ≥ 1 − ξ_i
Find the plane that maximizes the margin and minimizes the errors on the training points.
13
The Support Vector Machine
Push the planes apart and minimize the error at the same time:
min_{w,b,ξ} (1/2) ||w||² + C Σ_{i=1}^{m} ξ_i
such that y_i (w · x_i − b) ≥ 1 − ξ_i
C is a positive number chosen to balance these two goals.
This problem is called a Support Vector Machine, or SVM.
The SVM is one of many techniques for doing supervised machine learning.
Others: neural networks, decision trees, k-nearest neighbor.
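Real solvers treat this as a quadratic program, as the next slide notes; still, the same objective can be attacked with plain subgradient descent on the hinge-loss form. The toy data, C, learning rate, and epoch count below are all assumptions for illustration:

```python
# Rough subgradient-descent sketch of the soft-margin objective above:
#   minimize (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i - b)).
# The toy data, C, learning rate, and epoch count are assumed values.

def train_svm(points, C=1.0, lr=0.01, epochs=500):
    n = len(points[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in points:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
            if margin < 1:
                # hinge term active: grad_w = w - C*y*x, grad_b = C*y
                w = [wi - lr * (wi - C * y * xi) for wi, xi in zip(w, x)]
                b -= lr * C * y
            else:
                # only the regularizer (1/2)||w||^2 contributes: grad_w = w
                w = [wi - lr * wi for wi in w]
    return w, b

points = [([2.0, 2.0], 1), ([3.0, 1.0], 1), ([-2.0, -2.0], -1), ([-1.0, -3.0], -1)]
w, b = train_svm(points)
predictions = [1 if sum(wi * xi for wi, xi in zip(w, x)) - b >= 0 else -1
               for x, _ in points]
print(predictions)   # should match the labels [1, 1, -1, -1]
```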
14
Terminology
Points that touch a bounding plane, or lie on the wrong side of it, are called support vectors.
If all points except the support vectors were removed, the solution would be the same.
The support vectors are the points that are most difficult to classify.
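The definition can be sketched directly in code; the plane (w, b) and the points below are illustrative values, not a trained model:

```python
# Sketch of the definition above: support vectors are the points with
# y_i (w . x_i - b) <= 1 (touching a bounding plane or on the wrong side).
# The plane (w, b) and the points are illustrative values.

def support_vectors(points, w, b, tol=1e-9):
    sv = []
    for x, y in points:
        if y * (sum(wi * xi for wi, xi in zip(w, x)) - b) <= 1 + tol:
            sv.append(x)
    return sv

w, b = [1.0, 0.0], 0.0             # plane x1 = 0, bounding planes x1 = +/- 1
points = [([1.0, 0.0], 1),         # on the bounding plane        -> support vector
          ([3.0, 0.0], 1),         # far inside its side          -> not a support vector
          ([-1.0, 2.0], -1)]       # on the other bounding plane  -> support vector

print(support_vectors(points, w, b))   # -> [[1.0, 0.0], [-1.0, 2.0]]
```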
15
What About Nonlinear Surfaces?
Some datasets may not be best separated by a plane.
First idea (simple and effective): map each data point into a higher-dimensional space, and find a linear fit there, e.g. a quadratic solution.
Problem: if the dimensionality of that space is high, this means lots of calculations.
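A minimal sketch of such a mapping, using a standard degree-2 feature map for 2-D inputs; the specific map and input are illustrative choices, not something fixed by the slides:

```python
import math

# Sketch of the "map to a higher-dimensional space" idea using a standard
# degree-2 feature map for 2-D inputs (an illustrative choice):
#   phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2)
# A dataset separable only by a circle in 2-D becomes linearly separable in
# this 3-D space, but the feature count grows quickly with dimension/degree.

def phi(x):
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

print(phi([1.0, 2.0]))   # -> [1.0, 2.828..., 4.0]
```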
16
Solution
Nonlinear surfaces can be used without these problems through the use of a kernel function.
The kernel function specifies a similarity measure between two vectors.
17
Solution (Cont’d)
The only way in which the data appears in the training problem is in the form of dot products x_i · x_j.
First map the data to some other (possibly infinite-dimensional) space H using a mapping Φ.
The training algorithm then depends on the data only through dot products in H: Φ(x_i) · Φ(x_j).
If there is a kernel function K such that K(x_i, x_j) = Φ(x_i) · Φ(x_j), we would only need to use K in the training algorithm and would never need to know Φ explicitly.
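This identity can be checked numerically for one concrete case, the degree-2 polynomial kernel K(x, z) = (x · z)², whose explicit feature map in two dimensions is φ(x) = (x1², √2·x1·x2, x2²). The kernel choice and test vectors are illustrative, not taken from the slides:

```python
import math

# Check of K(x_i, x_j) = phi(x_i) . phi(x_j) for the degree-2 polynomial
# kernel K(x, z) = (x . z)^2 in two dimensions, whose explicit feature map
# is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). The vectors are arbitrary examples.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    return dot(x, z) ** 2          # computed in the original 2-D space

def phi(x):
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, -1.0]
print(poly2_kernel(x, z))          # -> 1.0, without ever forming phi
print(dot(phi(x), phi(z)))         # same value, up to floating-point rounding
```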
18
SVM Applications
Pattern recognition:
handwriting recognition
3D object recognition
speaker identification
face detection
text categorization
bio-informatics
Regression estimation.
Density estimation.
More…
19
Conclusions
SVMs give good performance in a variety of applications such as pattern recognition, regression estimation, time series prediction, etc.
Some open issues:
Considered too slow at the beginning; this problem is now solved.
The choice of kernel function: there are no guidelines.
In most cases, SVM generalizes better than other competing methods (it has held the record for the lowest handwriting recognition error rate, 0.56%).
20
References
Cristianini, N. and Shawe-Taylor, J., “An Introduction to Support Vector Machines and Other Kernel-based Learning Methods”, 2000. www.support-vector.net
Burges, C. J. C., “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, 1998.