Support Vector Machine (SVM)
Based on Nello Cristianini's presentation: http://www.support-vector.net/tutorial.html
Basic Idea
• Use a Linear Learning Machine (LLM).
• Overcome the linearity constraint: map the data non-linearly to a higher dimension.
• Select between hyperplanes: use the margin as the criterion.
• Generalization depends on the margin.
General idea
Original Problem Transformed Problem
Kernel Based Algorithms
• Two separate components:
• Learning algorithm: operates in an embedded space
• Kernel function: performs the embedding
Basic Example: Kernel Perceptron
• Hyperplane classification: f(x) = <w,x> + b = <w',x'>, with w' = (w,b) and x' = (x,1); h(x) = sign(f(x))
• Perceptron algorithm: given a sample (xi, ti), ti ∈ {-1,+1}:
  IF ti <wk, xi> < 0 THEN /* error */
    wk+1 = wk + ti xi
    k = k + 1
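The update rule above can be sketched directly in NumPy. The toy data set, the `<= 0` mistake test (so the all-zero initial weight vector also triggers an update), and the trick of folding b into w are illustrative choices, not part of the slides.

```python
import numpy as np

def perceptron(X, t, max_epochs=100):
    """Primal perceptron: on a mistake (t_i <w, x_i> <= 0), set w <- w + t_i x_i.
    The bias b is folded into w by appending a constant 1 feature."""
    Xa = np.hstack([X, np.ones((len(X), 1))])  # x' = (x, 1), w' = (w, b)
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, ti in zip(Xa, t):
            if ti * np.dot(w, xi) <= 0:   # error (or exactly on the boundary)
                w = w + ti * xi            # w_{k+1} = w_k + t_i x_i
                mistakes += 1
        if mistakes == 0:                  # converged: every point classified
            break
    return w

# Toy linearly separable data in the plane.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])
w = perceptron(X, t)
```

After convergence, `w[:2]` is the weight vector and `w[2]` the bias, and every training point satisfies ti f(xi) > 0.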
Recall
• Margin of hyperplane w:
  D(w) = min_{(xi,ti) ∈ S} ti <w, xi> / ||w||
• Mistake bound:
  M ≤ ( max_i ||xi|| / D(w*) )²
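The bound can be checked numerically: for a unit-norm separator w*, the number of perceptron updates should not exceed (max_i ||xi|| / D(w*))². The data set and the choice of w* below are illustrative.

```python
import numpy as np

# Toy data separable through the origin by w* = (1, 1)/sqrt(2).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])

w_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
D = min(ti * np.dot(w_star, xi) for xi, ti in zip(X, t))  # margin D(w*)
R = max(np.linalg.norm(xi) for xi in X)                    # max_i ||x_i||
bound = (R / D) ** 2

# Run the (bias-free) perceptron and count updates.
w, mistakes = np.zeros(2), 0
for _ in range(100):
    errs = 0
    for xi, ti in zip(X, t):
        if ti * np.dot(w, xi) <= 0:
            w, errs = w + ti * xi, errs + 1
    mistakes += errs
    if errs == 0:
        break
```

On this data a single update suffices, comfortably inside the bound of (√5 / (3/√2))² = 10/9.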
Observations
• Solution is a linear combination of inputs: w = Σi ai ti xi, where ai ≥ 0
• Mistake driven: only points on which we make a mistake have influence!
• Support vectors: the points with non-zero ai
Dual representation
• Rewrite the basic function: f(x) = <w,x> + b = Σi ai ti <xi, x> + b
  since w = Σi ai ti xi
• Change the update rule: IF tj ( Σi ai ti <xi, xj> + b ) < 0
  THEN aj = aj + 1
• Observation: the data appear only inside inner products!
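A sketch of this dual perceptron, assuming a linear kernel and a toy data set; the bias update `b += t[j]` is a common companion to the aj update and is an added detail, not on the slide.

```python
import numpy as np

def dual_perceptron(X, t, kernel, max_epochs=100):
    """Dual perceptron: keep one coefficient a_j per training point and
    update a_j <- a_j + 1 whenever point j is misclassified.
    The data appear only inside kernel evaluations K(x_i, x_j)."""
    n = len(X)
    a, b = np.zeros(n), 0.0
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # Gram matrix
    for _ in range(max_epochs):
        errs = 0
        for j in range(n):
            if t[j] * (np.sum(a * t * K[:, j]) + b) <= 0:
                a[j] += 1          # a_j = a_j + 1
                b += t[j]          # bias update in the dual form
                errs += 1
        if errs == 0:
            break
    return a, b

linear = lambda x, z: np.dot(x, z)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])
a, b = dual_perceptron(X, t, linear)
f = lambda x: sum(ai * ti * linear(xi, x) for ai, ti, xi in zip(a, t, X)) + b
```

Swapping `linear` for any other kernel changes the learned separator without touching the algorithm, which is the whole point of the dual representation.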
Limitation of Perceptron
• Only linear separations
• Only converges for linearly separable data
• Only defined on vectorial data
The idea of a Kernel
• Embed the data into a different space
• Possibly higher dimension
• Linearly separable in the new space.
Original Problem Transformed Problem
Kernel Mapping
• Need only to compute inner products.
• Mapping: M(x)
• Kernel: K(x,y) = <M(x), M(y)>
• Dimensionality of M(x): unimportant!
• Need only to compute K(x,y).
• To use it in the embedded space: replace <x,y> by K(x,y).
Example
x = (x1, x2); z = (z1, z2); K(x,z) = (<x,z>)²

K(x,z) = (<x,z>)² = (x1 z1 + x2 z2)²
       = x1² z1² + 2 x1 z1 x2 z2 + x2² z2²
       = <[x1², x2², √2 x1 x2], [z1², z2², √2 z1 z2]>
       = <M(x), M(z)>
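The identity above can be verified numerically for concrete x and z (the sample points are arbitrary):

```python
import numpy as np

def K(x, z):                      # kernel computed directly in input space
    return np.dot(x, z) ** 2

def M(x):                         # explicit embedding into R^3
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

x = np.array([3.0, -1.0])
z = np.array([0.5, 2.0])
lhs = K(x, z)                     # (<x,z>)^2 = (1.5 - 2.0)^2 = 0.25
rhs = np.dot(M(x), M(z))          # <M(x), M(z)>
```

Both sides agree, yet K never materializes the 3-dimensional embedding.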
Polynomial Kernel
Original Problem Transformed Problem
Kernel Matrix
K = [ K(1,1) K(1,2) K(1,3) K(1,4)
      K(2,1) K(2,2) K(2,3) K(2,4)
      K(3,1) K(3,2) K(3,3) K(3,4)
      K(4,1) K(4,2) K(4,3) K(4,4) ]
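For example, this matrix can be computed in one line for a handful of points (chosen arbitrarily here); a valid kernel matrix is always symmetric and positive semi-definite.

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])  # 4 sample points
K = X @ X.T          # Gram matrix: K[i, j] = <x_i, x_j> (linear kernel)

# A valid kernel matrix is symmetric positive semi-definite.
sym = np.allclose(K, K.T)
eigs = np.linalg.eigvalsh(K)
```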
Example of Basic Kernels
• Polynomial: K(x,z) = (<x,z>)^d
• Gaussian: K(x,z) = exp{ -||x-z||² / 2σ² }
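Both kernels are a few lines each; the degree d and width σ defaults below are arbitrary choices.

```python
import numpy as np

def poly_kernel(x, z, d=3):
    """Polynomial kernel (<x, z>)^d."""
    return np.dot(x, z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2))."""
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
p = poly_kernel(x, z)         # (1*2 + 2*0)^3 = 8
g = gaussian_kernel(x, x)     # a point has similarity 1 with itself
```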
Kernel: Closure Properties
• K(x,z) = K1(x,z) + c, for c ≥ 0
• K(x,z) = c·K1(x,z), for c > 0
• K(x,z) = K1(x,z) · K2(x,z)
• K(x,z) = K1(x,z) + K2(x,z)
• Create new kernels from basic ones!
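These closure rules can be spot-checked numerically: each combination of two valid Gram matrices should stay positive semi-definite. This is a spot check on random points, not a proof.

```python
import numpy as np

# Build two Gram matrices K1 (linear) and K2 (Gaussian) on the same points.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K1 = X @ X.T
sq = np.sum(X * X, axis=1)
K2 = np.exp(-(sq[:, None] + sq[None, :] - 2 * K1) / 2.0)  # ||xi-xj||^2 expansion

def is_psd(K):
    """All eigenvalues non-negative (up to numerical tolerance)."""
    return np.linalg.eigvalsh((K + K.T) / 2).min() >= -1e-9

# Sum, positive scaling, element-wise (Schur) product, and adding c >= 0.
combos = [K1 + K2, 3.0 * K1, K1 * K2, K1 + 2.0]
```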
Support Vector Machines
• Linear Learning Machines (LLM)
• Use the dual representation
• Work in the kernel-induced feature space: f(x) = Σi ai ti K(xi, x) + b
• Which hyperplane to select?
Generalization of SVM
• PAC theory: error = O( VCdim / m )
• Problem: VCdim >> m
• No preference between consistent hyperplanes
Margin based bounds
• H: basic hypothesis class
• conv(H): finite convex combinations of H
• D: distribution over X × {+1,-1}
• S: sample of size m drawn from D
Margin based bounds
• THEOREM: for every f in conv(H),
  Pr_D[ y f(x) ≤ 0 ] ≤ Pr_S[ y f(x) ≤ θ ] + L
  where
  L = O( sqrt( ( (log m · log|H|) / θ² + log(1/δ) ) / m ) )
Maximal Margin Classifier
• Maximizes the margin
• Minimizes the overfitting due to margin selection.
• Increases the margin rather than reducing dimensionality
SVM: Support Vectors
Margins
• Functional margin: mini ti f(xi)
• Geometric margin: mini ti f(xi) / ||w||
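Both margins take one line each to compute; the hyperplane and points below are arbitrary. Rescaling (w, b) changes the functional margin but not the geometric one.

```python
import numpy as np

# Functional vs. geometric margin for a fixed hyperplane (w, b).
w, b = np.array([3.0, 4.0]), -1.0          # ||w|| = 5
X = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0]])
t = np.array([1, 1, -1])

f = X @ w + b                               # f(x_i) = <w, x_i> + b
functional = np.min(t * f)                  # min_i t_i f(x_i)
geometric = functional / np.linalg.norm(w)  # scale-invariant distance
```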
Main trick in SVM
• Insist on a functional margin of at least 1; support vectors have functional margin exactly 1.
• Geometric margin = 1 / ||w||, since the geometric margin is the functional margin divided by ||w||.
SVM criteria
• Find a hyperplane (w,b)
• That minimizes: ||w||² = <w,w>
• Subject to: ti (<w,xi> + b) ≥ 1 for all i
Quadratic Programming
• Quadratic objective function.
• Linear constraints.
• Unique optimum.
• Polynomial-time algorithms.
Dual Problem
• Maximize: W(a) = Σi ai − ½ Σi,j ai aj ti tj K(xi, xj)
• Subject to: Σi ai ti = 0 and ai ≥ 0 for all i
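One minimal way to see the dual at work is projected gradient ascent on W(a), clipping to keep ai ≥ 0. For simplicity this sketch fixes b = 0, which drops the equality constraint Σi ai ti = 0; it is a toy illustration, not a production QP solver.

```python
import numpy as np

def dual_svm_no_bias(X, t, lr=0.01, steps=2000):
    """Projected gradient ascent on W(a) = sum(a) - 1/2 sum a_i a_j t_i t_j K_ij,
    with a_i >= 0 kept by clipping. The bias is fixed at 0, which removes the
    equality constraint sum_i a_i t_i = 0 (an illustrative simplification)."""
    K = X @ X.T                                  # linear kernel Gram matrix
    Q = (t[:, None] * t[None, :]) * K            # Q_ij = t_i t_j K(x_i, x_j)
    a = np.zeros(len(X))
    for _ in range(steps):
        grad = 1.0 - Q @ a                       # dW/da_i = 1 - sum_j Q_ij a_j
        a = np.maximum(0.0, a + lr * grad)       # ascent step + projection
    return a

X = np.array([[2.0, 2.0], [1.0, 2.0], [-2.0, -2.0], [-1.0, -2.0]])
t = np.array([1, 1, -1, -1])
a = dual_svm_no_bias(X, t)
w = (a * t) @ X                                  # recover w = sum_i a_i t_i x_i
```

Only the two points closest to the separator end up with non-zero ai; the outer points get ai = 0, exactly the support-vector picture from the slides.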
Applications: Text
• Classify a text into given categories: sports, news, business, science, …
• Feature space: bag of words, a huge sparse vector!
Applications: Text
• Practicalities: Mw(x) = tfw · log(idfw) / K
  tfw = term frequency of w in the text
  idfw = inverse document frequency of w
  idfw = # documents / # documents containing w
• Inner product <M(x),M(z)>: fast to compute on sparse vectors
• SVM: finds a hyperplane in "document space"
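A sketch of these practicalities: sparse TF-IDF vectors stored as dicts, with the 1/K factor taken as length normalization (an interpretation; the corpus statistics below are made up).

```python
import math

def tfidf_vector(doc_words, doc_freq, n_docs):
    """Sparse TF-IDF vector as a dict {word: weight}: M_w(x) = tf_w * log(idf_w),
    normalized to unit length (reading the slide's 1/K as length normalization)."""
    tf = {}
    for w in doc_words:
        tf[w] = tf.get(w, 0) + 1
    vec = {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items()
           if doc_freq[w] < n_docs}        # idf = 1 gives log 0 weight; drop it
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def sparse_dot(u, v):
    """Inner product of two sparse vectors: iterate over the smaller dict."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(w, 0.0) for w, val in u.items())

# Tiny corpus statistics (illustrative numbers).
doc_freq = {"ball": 2, "game": 3, "market": 1, "stock": 1, "the": 4}
n = 4
d1 = tfidf_vector(["the", "ball", "game", "game"], doc_freq, n)
d2 = tfidf_vector(["the", "stock", "market"], doc_freq, n)
```

The inner product touches only the words the two documents share, so it stays cheap even when the vocabulary (and hence the nominal dimensionality) is huge.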