Support Vector Machines: Theory and Implementation in Python by Nachi


Page 1: Support Vector Machines

Support Vector Machines

Theory and Implementation in Python

by Nachi

Page 2: Support Vector Machines

Definition

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis.

- Wikipedia

Page 3: Support Vector Machines

Properties of an SVM

Non-probabilistic binary linear classifier

Support for non-linear classification using the 'kernel trick'

Page 4: Support Vector Machines

Linear separability

Two sets of points in p-dimensional space are said to be linearly separable if they can be separated by a (p-1)-dimensional hyperplane.

Example - The two sets of 2D data in the image are separated by a single straight line (a 1-dimensional hyperplane), and hence are linearly separable.

Page 5: Support Vector Machines

Linear Discriminant

The hyperplane that separates the two sets of data is called the linear discriminant.

Equation: Wᵀ X = C

W = [w1, w2, ..., wn] and X = [x1, x2, ..., xn] for n dimensions, i.e. w1·x1 + w2·x2 + ... + wn·xn = C

Page 6: Support Vector Machines

Selecting the hyperplane

For any linearly separable data set, there exist infinitely many separating hyperplanes. Hence, we must choose the most suitable one for classification.

Page 7: Support Vector Machines

Maximal Margin Hyperplane

We can compute the (perpendicular) distance from each observation in the data set to a given separating hyperplane; the smallest such distance is the minimal distance from the observations to the hyperplane, and is known as the margin. The maximal margin hyperplane is the separating hyperplane for which the margin is largest.

Page 8: Support Vector Machines

Example - maximal margin hyperplane

Page 9: Support Vector Machines

Finding the shortest distance (margin)

Find Xp such that ||Xp - X|| is minimum and Wᵀ Xp = C (as Xp lies on the decision boundary).

[Wᵀ denotes W transpose]

Page 10: Support Vector Machines

Maximizing the margin

Maximize D such that

D = (Wᵀ X - C) / ||W||

where X is the support vector (the observation closest to the hyperplane).
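As a rough numerical check of this formula (the vectors below are made-up values for illustration, not taken from the slides), D can be computed directly and confirmed against the projection of X onto the hyperplane; |Wᵀ X - C| is used in general so the distance is non-negative:

import numpy as np

W = np.array([1.0, 1.0])                   # hypothetical weight vector
C = 3.0                                    # hypothetical threshold
X = np.array([4.0, 2.0])                   # hypothetical support vector

D = (W @ X - C) / np.linalg.norm(W)        # slide's formula, ~2.12 here
Xp = X - ((W @ X - C) / (W @ W)) * W       # projection of X onto the hyperplane
print(D)
print(W @ Xp)                              # ~3.0 -> Xp lies on Wᵀ X = C
print(np.linalg.norm(X - Xp))              # equals D, as expected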

Page 11: Support Vector Machines

Why the maximal margin hyperplane?

● Suppose we have a maximal margin hyperplane for a data set and want to predict the class of a new observation: we compute its distance from the hyperplane.

● The greater the distance from the hyperplane, the more confident we are that the sample belongs to that class.

● Thus, the hyperplane whose smallest distance to the training observations is largest would be the most suitable.

Page 12: Support Vector Machines

Classifying a new sample

Consider a new sample x' = [x1, x2, ..., xn]. To predict the class to which the sample belongs, we simply compute Wᵀ x' and compare it with C.

If Wᵀ x' > C, the sample lies on one side (the positive half-space) of the hyperplane; if Wᵀ x' < C, it lies on the other side (the negative half-space). The sample belongs to the class that corresponds to that half-space.
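As a minimal sketch of this rule (the weights W, threshold C, and samples below are made-up values for illustration, not from the slides), the half-space test takes a few lines of NumPy:

import numpy as np

def classify(x_new, W, C):
    # Assign a label based on which half-space x_new falls into
    return 1 if W @ x_new > C else 0

W = np.array([1.0, 1.0])   # hypothetical weight vector
C = 3.0                    # hypothetical threshold
print(classify(np.array([4.0, 2.0]), W, C))  # Wᵀx' = 6 > 3 -> class 1
print(classify(np.array([0.5, 1.0]), W, C))  # Wᵀx' = 1.5 < 3 -> class 0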

Page 13: Support Vector Machines

SVM - A linear discriminant

An SVM is simply a linear discriminant that builds a hyperplane with as large a margin as possible.

It classifies a new sample by simply computing the distance from the hyperplane.

Page 14: Support Vector Machines

Support Vectors

● Observations (represented as vectors) that lie exactly at the margin distance from the hyperplane are called support vectors.

● These are important as shifting them even slightly might change the position of the hyperplane to a great extent.

Page 15: Support Vector Machines

Example - Support vectors

The vectors lying on the green lines in the image are the support vectors.

Page 16: Support Vector Machines

Soft margin

Insisting on perfectly separating the training data can overfit it, i.e. make the classifier overly sensitive to individual observations. To avoid this, we may allow some amount of misclassification, in exchange for greater robustness to individual observations and better classification of most of the observations.

Page 17: Support Vector Machines

Achieving the soft margin

Each observation has a 'slack variable' that allows it to be on the wrong side of the margin or of the hyperplane.

Sum of slack variables <= C

where C is a nonnegative tuning parameter: our budget for the total amount by which the margin can be violated by all the observations.

Page 18: Support Vector Machines

Tuning parameter C & Support vectors relation

Observations that lie directly on the margin, or on the wrong side of the margin for their class, are known as support vectors. These observations do affect the support vector classifier. When the tuning parameter C is large, the margin is wide, many observations violate the margin, and so there are many support vectors.
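Note that scikit-learn's SVC also exposes a regularization parameter named C, but it acts as a penalty on margin violations, so its effect runs opposite to the slack 'budget' described above: a small sklearn C gives a wide, soft margin with many support vectors, while a large sklearn C gives a narrow, hard margin with fewer. A minimal sketch, assuming scikit-learn is installed and using made-up toy data:

from sklearn import svm

X = [[0, 0], [1, 1], [1, 0], [2, 2], [3, 3], [3, 2]]   # illustrative points
Y = [0, 0, 0, 1, 1, 1]

for c in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel='linear', C=c).fit(X, Y)
    # Smaller C typically yields more support vectors (softer margin)
    print(c, len(clf.support_vectors_))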

Page 19: Support Vector Machines

Non linearly separable

In this case, an SVM would not be able to linearly classify the data. Hence, the SVM uses what is known as the 'kernel trick'. In this 'trick', the feature space is enlarged; the idea is that a boundary that is linear in the enlarged feature space need not be linear in the original feature space. The enlargement can be done using various kernel functions.
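A minimal sketch of the kernel trick with scikit-learn, using made-up XOR-style data (no straight line can separate these four points in the original 2D space, but an RBF kernel implicitly enlarges the feature space):

from sklearn import svm

X = [[0, 0], [1, 1], [0, 1], [1, 0]]   # XOR pattern: not linearly separable
Y = [0, 0, 1, 1]

linear_clf = svm.SVC(kernel='linear').fit(X, Y)
rbf_clf = svm.SVC(kernel='rbf', gamma=2.0).fit(X, Y)

print(linear_clf.score(X, Y))   # below 1.0: no separating line exists
print(rbf_clf.score(X, Y))      # 1.0 expected: separable in the enlarged space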

Page 20: Support Vector Machines

Enlarged feature space

Page 21: Support Vector Machines

Multi-Category Classification

● One-Versus-One Classification

● One-Versus-All Classification (both strategies are sketched below)
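A minimal sketch of both strategies with scikit-learn (the three-class toy data below is illustrative): SVC implements the one-versus-one scheme internally, while OneVsRestClassifier wraps a binary SVC for one-versus-all:

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]   # toy data
Y = [0, 0, 1, 1, 2, 2]                                    # three classes

ovo_clf = svm.SVC(kernel='linear', decision_function_shape='ovo').fit(X, Y)
ova_clf = OneVsRestClassifier(svm.SVC(kernel='linear')).fit(X, Y)

print(ovo_clf.predict([[9, 1]]))   # expected: class 2
print(ova_clf.predict([[1, 0]]))   # expected: class 0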

Page 22: Support Vector Machines

Sample Data

X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]

Y = [0, 0, 0, 1, 1]

Page 23: Support Vector Machines

SVM in sklearn

from sklearn import svm
clfy = svm.SVC()

Default: class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)

Page 24: Support Vector Machines

‘Fit’ the model

clfy.fit(X, Y)

Fit the SVM model, i.e. compute and build the separating hyperplane.

Page 25: Support Vector Machines

Features of sklearn

clfy.support_vectors_  Retrieves all the support vectors of the model

clfy.predict([[3, 3]])  Predicts the class of the given sample

Page 26: Support Vector Machines

Features of sklearn

clfy.score(X, Y)  Returns the mean accuracy on the given test data and labels.

clfy.decision_function([[2.5, 2.5]])  Returns the distance of the given samples to the separating hyperplane.
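Putting the snippets from Pages 22-26 together, a minimal end-to-end sketch (a linear kernel is chosen here for illustration, whereas the slide's default is 'rbf'; newer scikit-learn versions expect 2D arrays for predict/decision_function and default gamma to 'scale' rather than 0.0):

from sklearn import svm

X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]   # sample data from Page 22
Y = [0, 0, 0, 1, 1]

clfy = svm.SVC(kernel='linear')
clfy.fit(X, Y)                                  # build the hyperplane

print(clfy.support_vectors_)                    # support vectors found while fitting
print(clfy.predict([[3, 3]]))                   # expected: [1]
print(clfy.score(X, Y))                         # mean accuracy on the training data
print(clfy.decision_function([[2.5, 2.5]]))     # signed value relative to the hyperplane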

Page 27: Support Vector Machines

Conclusion

Parameter and kernel selection is crucial in an SVM model.