Machine Learning: Support Vector Machines
Lecture slides, 05/01/2014 (ds24/lehre/ml_ws_2013/ml_10_svm.pdf, 14 pages)


Page 1: Machine Learning: Support Vector Machines (05/01/2014)

Page 2: Linear Classifiers (recap)

A building block for almost all classifiers: a mapping $f: \mathbb{R}^n \to \{-1, +1\}$, i.e. a partitioning of the input space into half-spaces that correspond to the classes.

Decision rule: $f(x) = \mathrm{sgn}(\langle w, x \rangle + b)$, where $w$ is the normal to the hyperplane and $b$ is the offset.

(Synonyms: neuron model, perceptron, etc.)
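To make the rule concrete, here is a minimal numpy sketch; the hyperplane and test points below are made-up illustrative values, not taken from the slides:

```python
import numpy as np

def linear_decision(w, b, x):
    """Linear decision rule f(x) = sgn(<w, x> + b)."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Illustrative hyperplane x1 + x2 = 1 (assumed values)
w, b = np.array([1.0, 1.0]), -1.0
print(linear_decision(w, b, np.array([2.0, 0.5])))  # -> 1
print(linear_decision(w, b, np.array([0.1, 0.2])))  # -> -1
```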

Page 3: Two learning tasks

Let a training dataset $\{(x_i, y_i)\}_{i=1}^{\ell}$ be given with

(i) data $x_i \in \mathbb{R}^n$ and (ii) classes $y_i \in \{-1, +1\}$.

The goal is to find a hyperplane that separates the data correctly.

________________________________________________________

Now: the goal is to find a "corridor" (stripe) of maximal width that separates the data correctly.

Page 4: Linear SVM

Remember that the solution $(w, b)$ is defined only up to a common scale → use the canonical (with respect to the learning data) form in order to avoid ambiguity: $y_i(\langle w, x_i \rangle + b) \geq 1$ for all $i$, with equality for the points closest to the hyperplane.

The margin (width of the corridor): $2 / \|w\|$.

The optimization problem: maximize the margin, i.e.

$\min_{w, b} \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \geq 1 \ \text{for all } i$
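A minimal sketch of this problem in code, assuming scikit-learn is available; a linear SVC with a very large C approximates the hard-margin solution (the toy data are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (illustrative values)
X = np.array([[2.0, 2.0], [2.5, 1.5], [0.0, 0.5], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # huge C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))
```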

Page 5: Linear SVM

The Lagrangian of the problem ($\alpha_i \geq 0$ are the Lagrange multipliers):

$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \big[ y_i(\langle w, x_i \rangle + b) - 1 \big]$

The meaning of the dual variables $\alpha_i$:

a) $y_i(\langle w, x_i \rangle + b) < 1$ (the constraint is broken) → maximization w.r.t. $\alpha_i$ gives $\alpha_i \to \infty$, $L \to \infty$ (surely not a minimum)

b) $y_i(\langle w, x_i \rangle + b) > 1$ → maximization w.r.t. $\alpha_i$ gives $\alpha_i = 0$ → no influence on the Lagrangian

c) $y_i(\langle w, x_i \rangle + b) = 1$ → $\alpha_i$ does not matter, the vector is located "on the wall of the corridor": a support vector
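These three cases can be checked numerically on a fitted model; a sketch assuming scikit-learn and the same illustrative toy data as above:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 1.5], [0.0, 0.5], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# y_i * (<w, x_i> + b): ~1 on the corridor walls, > 1 elsewhere
margins = y * (X @ clf.coef_[0] + clf.intercept_[0])
print(np.round(margins, 3))
print("support vector indices:", clf.support_)
```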

Page 6: Linear SVM

Lagrangian:

$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \big[ y_i(\langle w, x_i \rangle + b) - 1 \big]$

Derivatives:

$\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i$

$\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$

The solution $w$ is a linear combination of the data points.
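A sketch of this identity on a fitted model; in scikit-learn, dual_coef_ stores the products $\alpha_i y_i$ for the support vectors (assumed library, illustrative data):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 1.5], [0.0, 0.5], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors
w = clf.dual_coef_[0] @ clf.support_vectors_   # sum_i alpha_i y_i x_i
print(np.allclose(w, clf.coef_[0]))            # -> True
```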

Page 7: Linear SVM

Substitute $w = \sum_i \alpha_i y_i x_i$ into the decision rule and obtain

$f(x) = \mathrm{sgn}\Big( \sum_i \alpha_i y_i \langle x_i, x \rangle + b \Big)$

→ the vector $w$ is not needed explicitly!

The decision rule can be expressed as a linear combination of scalar products with the support vectors.

Only the strictly positive $\alpha_i$ (i.e. those corresponding to the support vectors) are necessary for that.
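A sketch of this support-vector form of the decision rule in plain numpy; the support vectors, coefficients $\alpha_i y_i$, and offset below are illustrative placeholders, not a trained model:

```python
import numpy as np

sv = np.array([[2.0, 2.0], [0.5, 0.0]])   # support vectors x_i (assumed)
alpha_y = np.array([0.4, -0.4])           # alpha_i * y_i; sum alpha_i y_i = 0
b = -1.0

def f(x):
    # f(x) = sgn(sum_i alpha_i y_i <x_i, x> + b)
    return int(np.sign(alpha_y @ (sv @ x) + b))

print(f(np.array([3.0, 3.0])), f(np.array([0.0, 0.0])))  # -> 1 -1
```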

Page 8: Linear SVM

Substitute $w = \sum_i \alpha_i y_i x_i$ into the Lagrangian and obtain the dual task:

$\max_{\alpha} \ \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad \alpha_i \geq 0, \ \sum_i \alpha_i y_i = 0$

→ the dual can also be expressed in terms of scalar products only; the data points themselves are not explicitly necessary.
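A sketch evaluating the dual objective from the Gram matrix of scalar products only (illustrative data; the chosen $\alpha$ merely satisfies the constraints, it is not the optimum):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>."""
    G = X @ X.T                 # all scalar products (Gram matrix)
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ G @ ay

X = np.array([[2.0, 2.0], [2.5, 1.5], [0.0, 0.5], [0.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.5, 0.0, 0.25, 0.25])   # alpha >= 0, sum alpha_i y_i = 0
print(dual_objective(alpha, X, y))
```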

Page 9: Feature spaces

1. The input space is mapped onto a feature space by a non-linear transformation $\phi: x \mapsto \phi(x)$.

2. The data are separated (classified) by a linear decision rule in the feature space.

Example: a quadratic classifier in $\mathbb{R}^2$.

The transformation is $\phi(x) = \big( x_1^2, \ \sqrt{2}\, x_1 x_2, \ x_2^2 \big)$

(the images are linearly separable in the feature space)

Page 10: Feature spaces

The images $\phi(x_i)$ are not explicitly necessary in order to find the separating hyperplane in the feature space; only their scalar products $\langle \phi(x), \phi(x') \rangle$ are.

For the example above:

$\langle \phi(x), \phi(x') \rangle = x_1^2 x_1'^2 + 2 x_1 x_2 x_1' x_2' + x_2^2 x_2'^2 = (\langle x, x' \rangle)^2$

→ the scalar product can be computed in the input space; it is not necessary to map the data points into the feature space explicitly.

Such functions are called kernels.
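A quick numerical check of this identity for the quadratic map (illustrative points):

```python
import numpy as np

def phi(x):
    # the quadratic feature map (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(xp))   # scalar product in the feature space
print((x @ xp) ** 2)      # the same value, computed in the input space
```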

Page 11: Kernels

A kernel is a function that computes the scalar product in a feature space: $k(x, x') = \langle \phi(x), \phi(x') \rangle$.

Neither the corresponding space nor the mapping needs to be specified explicitly → a "black box".

Alternative definition: if a function $k$ is a kernel, then there exists a mapping $\phi$ such that $k(x, x') = \langle \phi(x), \phi(x') \rangle$. The corresponding feature space is called the Hilbert space induced by the kernel $k$.

Let a function $k$ be given. Is it a kernel? → Mercer's theorem: $k$ is a kernel if it is symmetric and every Gram matrix $K_{ij} = k(x_i, x_j)$ is positive semi-definite.
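A sample-based sanity check of this condition in numpy: a necessary (not sufficient) test that the Gram matrix on a random sample is positive semi-definite:

```python
import numpy as np

def gram(k, X):
    return np.array([[k(a, b) for b in X] for a in X])

k = lambda a, b: (a @ b) ** 2   # the quadratic kernel from above
X_sample = np.random.default_rng(0).normal(size=(20, 2))
eigs = np.linalg.eigvalsh(gram(k, X_sample))
print(eigs.min() >= -1e-9)      # -> True (PSD up to rounding)
```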

Page 12: Kernels

Let $k_1$ and $k_2$ be two kernels.

Then $k_1 + k_2$, $k_1 \cdot k_2$ and $a \cdot k_1$ (for $a > 0$) are kernels as well (there are also other possibilities to build kernels from kernels).

Popular kernels (sketched in code below):

• Polynomial: $k(x, x') = (\langle x, x' \rangle + 1)^d$

• Sigmoid: $k(x, x') = \tanh(\kappa \langle x, x' \rangle + \theta)$

• Gaussian: $k(x, x') = \exp\big(-\|x - x'\|^2 / 2\sigma^2\big)$ (interesting: the induced feature space is infinite-dimensional)
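The three kernels as plain functions; the parameter names d, kappa, theta, sigma and their default values are illustrative assumptions:

```python
import numpy as np

def polynomial(x, xp, d=2):
    return (x @ xp + 1.0) ** d

def sigmoid(x, xp, kappa=1.0, theta=0.0):
    return np.tanh(kappa * (x @ xp) + theta)

def gaussian(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma**2))

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(polynomial(x, xp), sigmoid(x, xp), gaussian(x, xp))
```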

Page 13: An example

The decision rule with a Gaussian kernel:

$f(x) = \mathrm{sgn}\Big( \sum_i \alpha_i y_i \exp\big(-\|x - x_i\|^2 / 2\sigma^2\big) + b \Big)$
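An end-to-end sketch with the Gaussian (RBF) kernel, assuming scikit-learn, on toy data that is not linearly separable in the input space (in sklearn's parametrization, gamma corresponds to $1/2\sigma^2$):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Class +1: a blob at the origin; class -1: a ring around it
X_in = rng.normal(scale=0.5, size=(30, 2))
ang = rng.uniform(0.0, 2.0 * np.pi, size=30)
X_out = np.c_[3.0 * np.cos(ang), 3.0 * np.sin(ang)] + rng.normal(scale=0.2, size=(30, 2))
X = np.vstack([X_in, X_out])
y = np.array([1] * 30 + [-1] * 30)

clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)   # gamma = 1/(2 sigma^2)
print(clf.predict([[0.0, 0.0], [3.0, 0.0]]))   # expected: [ 1 -1 ]
print("number of support vectors:", len(clf.support_vectors_))
```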

Page 14: Conclusion

• SVM is a representative of discriminative learning, with all the corresponding advantages (power) and drawbacks (overfitting); remember e.g. the Gaussian kernel with $\sigma \to 0$.

• The building block: linear classifiers. All formalisms can be expressed in terms of scalar products; the data points are not needed explicitly.

• Feature spaces make non-linear decision rules in the input space possible.

• Kernels compute the scalar product in feature spaces; the latter need not be defined explicitly.

Literature (names):

• Bernhard Schölkopf, Alex Smola ...

• Nello Cristianini, John Shawe-Taylor ...