Support Vector Machine (SVM)
Based on Nello Cristianini's presentation: http://www.support-vector.net/tutorial.html
Basic Idea
• Use a Linear Learning Machine (LLM).
• Overcome the linearity constraint: map the data non-linearly to a higher dimension.
• Select between hyperplanes: use the margin as the criterion.
• Generalization depends on the margin.
General idea
Original Problem Transformed Problem
Kernel Based Algorithms
• Two separate components:
• Learning algorithm: operates in an embedded space
• Kernel function: performs the embedding
Basic Example: Kernel Perceptron
• Hyperplane classification: f(x) = <w,x> + b = <w',x'>, with w' = (w,b) and x' = (x,1); h(x) = sign(f(x))
• Perceptron algorithm: given a sample (xi, ti), ti ∈ {-1,+1}:
  IF ti <wk, xi> < 0 THEN /* error */
    wk+1 = wk + ti xi
    k = k + 1
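The update rule above can be sketched directly in NumPy. The toy data set, the `<= 0` mistake test (so the all-zero initial weight vector also triggers an update), and the trick of folding b into w are illustrative choices, not part of the slides.

```python
import numpy as np

def perceptron(X, t, max_epochs=100):
    """Primal perceptron: on a mistake (t_i <w, x_i> <= 0), set w <- w + t_i x_i.
    The bias b is folded into w by appending a constant 1 feature."""
    Xa = np.hstack([X, np.ones((len(X), 1))])  # x' = (x, 1), w' = (w, b)
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, ti in zip(Xa, t):
            if ti * np.dot(w, xi) <= 0:   # error (or exactly on the boundary)
                w = w + ti * xi            # w_{k+1} = w_k + t_i x_i
                mistakes += 1
        if mistakes == 0:                  # converged: every point classified
            break
    return w

# Toy linearly separable data in the plane.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])
w = perceptron(X, t)
```

After convergence, `w[:2]` is the weight vector and `w[2]` the bias, and every training point satisfies ti f(xi) > 0.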
Recall
• Margin of hyperplane w:
  D(w) = min_{(xi,ti) ∈ S} ti <w, xi> / ||w||
• Mistake bound:
  M ≤ ( max_i ||xi|| / D(w*) )²
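The bound can be checked numerically: for a unit-norm separator w*, the number of perceptron updates should not exceed (max_i ||xi|| / D(w*))². The data set and the choice of w* below are illustrative.

```python
import numpy as np

# Toy data separable through the origin by w* = (1, 1)/sqrt(2).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])

w_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
D = min(ti * np.dot(w_star, xi) for xi, ti in zip(X, t))  # margin D(w*)
R = max(np.linalg.norm(xi) for xi in X)                    # max_i ||x_i||
bound = (R / D) ** 2

# Run the (bias-free) perceptron and count updates.
w, mistakes = np.zeros(2), 0
for _ in range(100):
    errs = 0
    for xi, ti in zip(X, t):
        if ti * np.dot(w, xi) <= 0:
            w, errs = w + ti * xi, errs + 1
    mistakes += errs
    if errs == 0:
        break
```

On this data a single update suffices, comfortably inside the bound of (√5 / (3/√2))² = 10/9.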
Observations
• Solution is a linear combination of inputs: w = Σi ai ti xi, where ai ≥ 0
• Mistake driven: only points on which we make a mistake have influence!
• Support vectors: the points with non-zero ai
Dual representation
• Rewrite the basic function: f(x) = <w,x> + b = Σi ai ti <xi, x> + b
  since w = Σi ai ti xi
• Change the update rule: IF tj ( Σi ai ti <xi, xj> + b ) < 0
  THEN aj = aj + 1
• Observation: the data appear only inside inner products!
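A sketch of this dual perceptron, assuming a linear kernel and a toy data set; the bias update `b += t[j]` is a common companion to the aj update and is an added detail, not on the slide.

```python
import numpy as np

def dual_perceptron(X, t, kernel, max_epochs=100):
    """Dual perceptron: keep one coefficient a_j per training point and
    update a_j <- a_j + 1 whenever point j is misclassified.
    The data appear only inside kernel evaluations K(x_i, x_j)."""
    n = len(X)
    a, b = np.zeros(n), 0.0
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # Gram matrix
    for _ in range(max_epochs):
        errs = 0
        for j in range(n):
            if t[j] * (np.sum(a * t * K[:, j]) + b) <= 0:
                a[j] += 1          # a_j = a_j + 1
                b += t[j]          # bias update in the dual form
                errs += 1
        if errs == 0:
            break
    return a, b

linear = lambda x, z: np.dot(x, z)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])
a, b = dual_perceptron(X, t, linear)
f = lambda x: sum(ai * ti * linear(xi, x) for ai, ti, xi in zip(a, t, X)) + b
```

Swapping `linear` for any other kernel changes the learned separator without touching the algorithm, which is the whole point of the dual representation.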
Limitation of Perceptron
• Only linear separations
• Only converges for linearly separable data
• Only defined on vectorial data
The idea of a Kernel
• Embed the data into a different space
• Possibly higher dimension
• Linearly separable in the new space.
Original Problem Transformed Problem
Kernel Mapping
• Need only to compute inner products.
• Mapping: M(x)
• Kernel: K(x,y) = <M(x), M(y)>
• Dimensionality of M(x): unimportant!
• Need only to compute K(x,y).
• To use it in the embedded space: replace <x,y> by K(x,y).
Example
x = (x1, x2); z = (z1, z2); K(x,z) = (<x,z>)²

K(x,z) = (<x,z>)² = (x1 z1 + x2 z2)²
       = x1² z1² + 2 x1 z1 x2 z2 + x2² z2²
       = <[x1², x2², √2 x1 x2], [z1², z2², √2 z1 z2]>
       = <M(x), M(z)>
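The identity above can be verified numerically for concrete x and z (the sample points are arbitrary):

```python
import numpy as np

def K(x, z):                      # kernel computed directly in input space
    return np.dot(x, z) ** 2

def M(x):                         # explicit embedding into R^3
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

x = np.array([3.0, -1.0])
z = np.array([0.5, 2.0])
lhs = K(x, z)                     # (<x,z>)^2 = (1.5 - 2.0)^2 = 0.25
rhs = np.dot(M(x), M(z))          # <M(x), M(z)>
```

Both sides agree, yet K never materializes the 3-dimensional embedding.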
Polynomial Kernel
Original Problem Transformed Problem
Kernel Matrix
K = [ K(1,1) K(1,2) K(1,3) K(1,4)
      K(2,1) K(2,2) K(2,3) K(2,4)
      K(3,1) K(3,2) K(3,3) K(3,4)
      K(4,1) K(4,2) K(4,3) K(4,4) ]
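For example, this matrix can be computed in one line for a handful of points (chosen arbitrarily here); a valid kernel matrix is always symmetric and positive semi-definite.

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])  # 4 sample points
K = X @ X.T          # Gram matrix: K[i, j] = <x_i, x_j> (linear kernel)

# A valid kernel matrix is symmetric positive semi-definite.
sym = np.allclose(K, K.T)
eigs = np.linalg.eigvalsh(K)
```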
Example of Basic Kernels
• Polynomial: K(x,z) = (<x,z>)^d
• Gaussian: K(x,z) = exp{ -||x-z||² / 2σ² }
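Both kernels are a few lines each; the degree d and width σ defaults below are arbitrary choices.

```python
import numpy as np

def poly_kernel(x, z, d=3):
    """Polynomial kernel (<x, z>)^d."""
    return np.dot(x, z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2))."""
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
p = poly_kernel(x, z)         # (1*2 + 2*0)^3 = 8
g = gaussian_kernel(x, x)     # a point has similarity 1 with itself
```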
Kernel: Closure Properties
• K(x,z) = K1(x,z) + c, for c ≥ 0
• K(x,z) = c·K1(x,z), for c > 0
• K(x,z) = K1(x,z) · K2(x,z)
• K(x,z) = K1(x,z) + K2(x,z)
• Create new kernels from basic ones!
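These closure rules can be spot-checked numerically: each combination of two valid Gram matrices should stay positive semi-definite. This is a spot check on random points, not a proof.

```python
import numpy as np

# Build two Gram matrices K1 (linear) and K2 (Gaussian) on the same points.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K1 = X @ X.T
sq = np.sum(X * X, axis=1)
K2 = np.exp(-(sq[:, None] + sq[None, :] - 2 * K1) / 2.0)  # ||xi-xj||^2 expansion

def is_psd(K):
    """All eigenvalues non-negative (up to numerical tolerance)."""
    return np.linalg.eigvalsh((K + K.T) / 2).min() >= -1e-9

# Sum, positive scaling, element-wise (Schur) product, and adding c >= 0.
combos = [K1 + K2, 3.0 * K1, K1 * K2, K1 + 2.0]
```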
Support Vector Machines
• Linear Learning Machines (LLM)
• Use the dual representation
• Work in the kernel-induced feature space: f(x) = Σi ai ti K(xi, x) + b
• Which hyperplane to select?
Generalization of SVM
• PAC theory: error = O( VCdim / m )
• Problem: VCdim >> m
• No preference between consistent hyperplanes
Margin based bounds
• H: basic hypothesis class
• conv(H): finite convex combinations of H
• D: distribution over X × {+1,-1}
• S: sample of size m drawn from D
Margin based bounds
• THEOREM: for every f in conv(H),
  Pr_D[ y f(x) ≤ 0 ] ≤ Pr_S[ y f(x) ≤ θ ] + L
  where
  L = O( sqrt( ( (log m · log|H|) / θ² + log(1/δ) ) / m ) )
Maximal Margin Classifier
• Maximizes the margin
• Minimizes the overfitting due to margin selection.
• Increases the margin rather than reducing dimensionality
SVM: Support Vectors
Margins
• Functional margin: mini ti f(xi)
• Geometric margin: mini ti f(xi) / ||w||
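Both margins take one line each to compute; the hyperplane and points below are arbitrary. Rescaling (w, b) changes the functional margin but not the geometric one.

```python
import numpy as np

# Functional vs. geometric margin for a fixed hyperplane (w, b).
w, b = np.array([3.0, 4.0]), -1.0          # ||w|| = 5
X = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0]])
t = np.array([1, 1, -1])

f = X @ w + b                               # f(x_i) = <w, x_i> + b
functional = np.min(t * f)                  # min_i t_i f(x_i)
geometric = functional / np.linalg.norm(w)  # scale-invariant distance
```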
Main trick in SVM
• Insist on a functional margin of at least 1; support vectors have functional margin exactly 1.
• Geometric margin = 1 / ||w||, since the geometric margin is the functional margin divided by ||w||.
SVM criteria
• Find a hyperplane (w,b)
• That minimizes: ||w||² = <w,w>
• Subject to: ti (<w,xi> + b) ≥ 1 for all i
Quadratic Programming
• Quadratic objective function.
• Linear constraints.
• Unique optimum.
• Polynomial-time algorithms.
Dual Problem
• Maximize: W(a) = Σi ai − ½ Σi,j ai aj ti tj K(xi, xj)
• Subject to: Σi ai ti = 0 and ai ≥ 0 for all i
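One minimal way to see the dual at work is projected gradient ascent on W(a), clipping to keep ai ≥ 0. For simplicity this sketch fixes b = 0, which drops the equality constraint Σi ai ti = 0; it is a toy illustration, not a production QP solver.

```python
import numpy as np

def dual_svm_no_bias(X, t, lr=0.01, steps=2000):
    """Projected gradient ascent on W(a) = sum(a) - 1/2 sum a_i a_j t_i t_j K_ij,
    with a_i >= 0 kept by clipping. The bias is fixed at 0, which removes the
    equality constraint sum_i a_i t_i = 0 (an illustrative simplification)."""
    K = X @ X.T                                  # linear kernel Gram matrix
    Q = (t[:, None] * t[None, :]) * K            # Q_ij = t_i t_j K(x_i, x_j)
    a = np.zeros(len(X))
    for _ in range(steps):
        grad = 1.0 - Q @ a                       # dW/da_i = 1 - sum_j Q_ij a_j
        a = np.maximum(0.0, a + lr * grad)       # ascent step + projection
    return a

X = np.array([[2.0, 2.0], [1.0, 2.0], [-2.0, -2.0], [-1.0, -2.0]])
t = np.array([1, 1, -1, -1])
a = dual_svm_no_bias(X, t)
w = (a * t) @ X                                  # recover w = sum_i a_i t_i x_i
```

Only the two points closest to the separator end up with non-zero ai; the outer points get ai = 0, exactly the support-vector picture from the slides.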
Applications: Text
• Classify a text into given categories: sports, news, business, science, …
• Feature space: bag of words, a huge sparse vector!
Applications: Text
• Practicalities: Mw(x) = tfw · log(idfw) / K
  tfw = term frequency of w in the text
  idfw = inverse document frequency of w
  idfw = # documents / # documents containing w
• Inner product <M(x),M(z)>: fast to compute on sparse vectors
• SVM: finds a hyperplane in "document space"
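A sketch of these practicalities: sparse TF-IDF vectors stored as dicts, with the 1/K factor taken as length normalization (an interpretation; the corpus statistics below are made up).

```python
import math

def tfidf_vector(doc_words, doc_freq, n_docs):
    """Sparse TF-IDF vector as a dict {word: weight}: M_w(x) = tf_w * log(idf_w),
    normalized to unit length (reading the slide's 1/K as length normalization)."""
    tf = {}
    for w in doc_words:
        tf[w] = tf.get(w, 0) + 1
    vec = {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items()
           if doc_freq[w] < n_docs}        # idf = 1 gives log 0 weight; drop it
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def sparse_dot(u, v):
    """Inner product of two sparse vectors: iterate over the smaller dict."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(w, 0.0) for w, val in u.items())

# Tiny corpus statistics (illustrative numbers).
doc_freq = {"ball": 2, "game": 3, "market": 1, "stock": 1, "the": 4}
n = 4
d1 = tfidf_vector(["the", "ball", "game", "game"], doc_freq, n)
d2 = tfidf_vector(["the", "stock", "market"], doc_freq, n)
```

The inner product touches only the words the two documents share, so it stays cheap even when the vocabulary (and hence the nominal dimensionality) is huge.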