CSE 5526: Introduction to Neural Networks
Support Vector Machines (SVMs)
Part 3: Kernels
Kernels are generalizations of inner products
• A kernel is a function of two data points such that $K(x, x') = \phi^T(x)\,\phi(x')$ for some function $\phi(x)$
• It is therefore symmetric: $K(x, x') = K(x', x)$
• Can compute $K(x, x')$ from an explicit $\phi(x)$
• Or prove that $K(x, x')$ corresponds to some $\phi(x)$ (both routes are sketched below)
• Never need to actually compute $\phi(x)$
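To make the two routes concrete, here is a minimal NumPy sketch (illustrative, not from the slides) using the homogeneous quadratic kernel $K(x, x') = (x^T x')^2$, whose explicit feature map in 2D is $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)^T$:

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, x') = (x . x')^2 in 2D."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def kernel(x, xp):
    """Same inner product, computed without ever forming phi."""
    return (x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(xp))   # 1.0, via the feature space
print(kernel(x, xp))      # 1.0, identical (and symmetric in x, x')
```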
SVM as a kernel machine
• Cover's theorem: a complex classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in the low-dimensional input space
• SVM for pattern classification:
  1. Nonlinear mapping of the input space onto a high-dimensional feature space
  2. Constructing the optimal hyperplane for the feature space
Kernel machine illustration
Kernelized SVM looks a lot like an RBF net
Kernel matrix
• The matrix
$$K = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}$$
is called the kernel matrix, or the Gram matrix
• $K$ is positive semidefinite
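A small NumPy sketch (illustrative names, not from the slides) that builds the Gram matrix for a few random points and checks symmetry and positive semidefiniteness via eigenvalues:

```python
import numpy as np

def rbf_kernel(x, xp, sigma=1.0):
    # Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))           # five points in R^3

# Gram matrix: K[i, j] = K(x_i, x_j)
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))            # True: symmetric
print(np.linalg.eigvalsh(K).min())    # nonnegative up to roundoff: PSD
```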
Mercer's theorem relates kernel functions and inner product spaces
• Suppose that for all finite sets of points $\{\mathbf{x}_i\}_{i=1}^N$ and real numbers $\{a_i\}_{i=1}^N$,
$$\sum_{i,j} a_i a_j K(\mathbf{x}_i, \mathbf{x}_j) \ge 0$$
• Then $K$ is called a positive semidefinite kernel
• And can be written as $K(\mathbf{x}, \mathbf{x}') = \phi^T(\mathbf{x})\,\phi(\mathbf{x}')$ for some vector-valued function $\phi(\mathbf{x})$
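The defining condition can be spot-checked numerically; a sketch (illustrative, using the Gaussian kernel) that evaluates the quadratic form $\sum_{i,j} a_i a_j K(\mathbf{x}_i, \mathbf{x}_j)$ for random coefficient vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))                          # arbitrary points
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
K = np.exp(-sq / 2.0)                                # Gaussian Gram matrix

for _ in range(3):
    a = rng.normal(size=6)           # arbitrary real coefficients
    print(a @ K @ a >= -1e-12)       # True: the quadratic form is nonnegative
```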
Kernels can be applied in many situations
• Kernel trick: when predictions are based on inner products of data points, replace the inner product with a kernel function
• Some algorithms where this is possible (see the ridge-regression sketch after this list):
  • Linear / ridge regression
  • Principal components analysis
  • Canonical correlation analysis
  • Perceptron classifier
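As one instance, ridge regression kernelizes cleanly: the standard dual form solves $\alpha = (K + \lambda I)^{-1}\mathbf{y}$ and predicts with $f(x) = \sum_i \alpha_i K(\mathbf{x}_i, x)$, so the data enter only through kernel evaluations. A minimal sketch under those standard formulas (these are not from the slides, and the variable names are illustrative):

```python
import numpy as np

def k(x, xp):                      # polynomial kernel of degree 2
    return (x @ xp + 1.0) ** 2

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))
y = X[:, 0] * X[:, 1]              # a target that is quadratic in the inputs

lam = 1e-3
K = np.array([[k(xi, xj) for xj in X] for xi in X])
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)    # dual weights

x_new = np.array([0.5, -2.0])
f = alpha @ np.array([k(xi, x_new) for xi in X])        # kernel-only prediction
print(f, x_new[0] * x_new[1])      # should be close to the true value
```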
Some popular kernels
• Polynomial kernel, parameters $c$ and $d$:
$$K(\mathbf{x}, \mathbf{x}') = \left(\mathbf{x}^T\mathbf{x}' + c\right)^d$$
  • Finite-dimensional $\phi(\mathbf{x})$ can be explicitly computed
• Gaussian or RBF kernel, parameter $\sigma$:
$$K(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{1}{2\sigma^2}\left\|\mathbf{x} - \mathbf{x}'\right\|^2\right)$$
  • Infinite-dimensional $\phi(\mathbf{x})$
  • Equivalent to an RBF network, but a more principled way of finding centers
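Both kernels are one-liners in NumPy; a sketch with the parameterizations above (function names illustrative):

```python
import numpy as np

def poly_kernel(x, xp, c=1.0, d=2):
    """Polynomial kernel (x^T x' + c)^d."""
    return (x @ xp + c) ** d

def rbf_kernel(x, xp, sigma=1.0):
    """Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

x, xp = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(poly_kernel(x, xp))   # (0 + 1)^2 = 1.0
print(rbf_kernel(x, xp))    # exp(-2/2) = exp(-1), about 0.3679
```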
Some popular kernels
• Hyperbolic tangent kernel, parameters $\beta_1$ and $\beta_2$:
$$K(\mathbf{x}, \mathbf{x}') = \tanh\left(\beta_1 \mathbf{x}^T\mathbf{x}' + \beta_2\right)$$
  • Only positive semidefinite for some values of $\beta_1$ and $\beta_2$ (demonstrated in the sketch after this list)
  • Inspired by neural networks, but a more principled way of selecting the number of hidden units
• String kernels or other structure kernels
  • Can prove that they are positive definite
  • Computed between non-numeric items
  • Avoid converting to fixed-length feature vectors
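The PSD caveat is easy to exhibit numerically: with $\beta_2 < 0$, short vectors give $K(\mathbf{x}, \mathbf{x}) = \tanh(\beta_1\|\mathbf{x}\|^2 + \beta_2) < 0$, and no PSD matrix can have a negative diagonal entry. A sketch (parameter values chosen only for illustration):

```python
import numpy as np

def tanh_kernel(x, xp, b1=1.0, b2=-1.0):
    return np.tanh(b1 * (x @ xp) + b2)

# With beta2 = -1, short vectors put tanh of a negative number on the
# diagonal: K(x, x) < 0, which no PSD matrix allows.
X = np.array([[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]])
K = np.array([[tanh_kernel(xi, xj) for xj in X] for xi in X])
print(np.diag(K))                    # first two entries are negative
print(np.linalg.eigvalsh(K).min())   # negative: not PSD for these parameters
```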
Example: polynomial kernel
• Polynomial kernel in 2D, $c = 1$, $d = 2$:
$$K(\mathbf{x}, \mathbf{x}') = \left(\mathbf{x}^T\mathbf{x}' + 1\right)^2 = \left(x_1 x_1' + x_2 x_2' + 1\right)^2$$
$$= x_1^2 x_1'^2 + x_2^2 x_2'^2 + 2 x_1 x_1' x_2 x_2' + 2 x_1 x_1' + 2 x_2 x_2' + 1$$
• If we define
$$\phi(\mathbf{x}) = \left(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; 1\right)^T$$
• Then $K(\mathbf{x}, \mathbf{x}') = \phi^T(\mathbf{x})\,\phi(\mathbf{x}')$
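A quick numerical check of this identity (a sketch; `phi` transcribes the definition above):

```python
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

x, xp = np.array([0.7, -1.3]), np.array([2.0, 0.4])
print((x @ xp + 1.0) ** 2)    # kernel value, computed directly
print(phi(x) @ phi(xp))       # same number via the explicit feature map
```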
Example: XOR problem again
• Consider (once again) the XOR problem
• The SVM can solve it using a polynomial kernel with $d = 2$ and $c = 1$
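For reference in the sketches that follow (each repeats these definitions so it runs standalone), the XOR training set in NumPy:

```python
import numpy as np

# XOR inputs x_1 ... x_4 (rows) and desired outputs d
X = np.array([[-1, -1],
              [-1, +1],
              [+1, -1],
              [+1, +1]], dtype=float)
d = np.array([-1, +1, +1, -1], dtype=float)
```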
XOR: first compute the kernel matrix
• In general, $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j) = \left(1 + \mathbf{x}_i^T\mathbf{x}_j\right)^2$
• For example,
$$K_{11} = K\left(\begin{bmatrix}-1\\-1\end{bmatrix}, \begin{bmatrix}-1\\-1\end{bmatrix}\right) = (1 + 2)^2 = 9$$
$$K_{12} = K\left(\begin{bmatrix}-1\\-1\end{bmatrix}, \begin{bmatrix}-1\\+1\end{bmatrix}\right) = (1 + 0)^2 = 1$$
• So
$$K = \begin{bmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{bmatrix}$$
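The full matrix follows in one vectorized NumPy line (a sketch):

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

K = (1.0 + X @ X.T) ** 2    # K_ij = (1 + x_i . x_j)^2, squared elementwise
print(K)                    # 9 on the diagonal, 1 everywhere else
```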
XOR: first compute the kernel matrix
• Or compute the $\phi(\mathbf{x}_i)$ and their inner products, e.g.,
• Remember, $\phi(\mathbf{x}) = \left(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; 1\right)^T$
  • Since $\phi(\mathbf{x})$ includes the constant 1, no need for a separate $b$ later
$$\phi(\mathbf{x}_1) = \phi\left(\begin{bmatrix}-1\\-1\end{bmatrix}\right) = \left(1,\; 1,\; \sqrt{2},\; -\sqrt{2},\; -\sqrt{2},\; 1\right)^T$$
$$\phi(\mathbf{x}_2) = \phi\left(\begin{bmatrix}-1\\+1\end{bmatrix}\right) = \left(1,\; 1,\; -\sqrt{2},\; -\sqrt{2},\; \sqrt{2},\; 1\right)^T$$
• Then
$$K_{11} = \phi^T(\mathbf{x}_1)\,\phi(\mathbf{x}_1) = 1 + 1 + 2 + 2 + 2 + 1 = 9$$
$$K_{12} = \phi^T(\mathbf{x}_1)\,\phi(\mathbf{x}_2) = 1 + 1 - 2 + 2 - 2 + 1 = 1$$
• Results in the same $K$ matrix, but more computation
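Equivalently, stacking the $\phi(\mathbf{x}_i)$ as rows of a matrix $\Phi$ gives $K = \Phi\Phi^T$; a sketch verifying that the two routes agree:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

Phi = np.array([phi(x) for x in X])            # 4 x 6 matrix of feature vectors
K = Phi @ Phi.T                                # same Gram matrix as before
print(np.allclose(K, (1.0 + X @ X.T) ** 2))    # True
```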
XOR: Combine class labels into $\widetilde{K}$
• Define matrix $\widetilde{K}$ such that $\widetilde{K}_{ij} = K_{ij}\, d_i d_j$
• Recall $\mathbf{d} = \left(-1, +1, +1, -1\right)^T$
$$\widetilde{K} = \begin{bmatrix} +9 & -1 & -1 & +1 \\ -1 & +9 & +1 & -1 \\ -1 & +1 & +9 & -1 \\ +1 & -1 & -1 & +9 \end{bmatrix}$$
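Elementwise this is $\widetilde{K} = (\mathbf{d}\mathbf{d}^T) \odot K$, an outer product times the Gram matrix; a sketch:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, 1, 1, -1], dtype=float)

K = (1.0 + X @ X.T) ** 2
K_tilde = np.outer(d, d) * K    # K~_ij = d_i d_j K_ij
print(K_tilde)
```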
XOR: Solve dual Lagrangian for $\alpha$
• Find fixed points of
$$\widetilde{L}(\alpha) = \alpha^T\mathbf{1} - \frac{1}{2}\alpha^T\widetilde{K}\alpha$$
• Set the gradient to zero:
$$\nabla\widetilde{L} = \mathbf{1} - \widetilde{K}\alpha = \mathbf{0} \;\Rightarrow\; \alpha = \widetilde{K}^{-1}\mathbf{1} = \left(\tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}\right)^T$$
• Satisfies all conditions: $\alpha_i \ge 0\ \forall i$ and $\sum_i \alpha_i d_i = 0$
• So this is the solution
• All points are support vectors
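Solving the linear system reproduces this $\alpha$ and confirms the constraints; a sketch:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, 1, 1, -1], dtype=float)
K_tilde = np.outer(d, d) * (1.0 + X @ X.T) ** 2

alpha = np.linalg.solve(K_tilde, np.ones(4))   # gradient condition K~ alpha = 1
print(alpha)                                   # [0.125 0.125 0.125 0.125]
print(alpha.min() >= 0, np.isclose(alpha @ d, 0))   # both constraints hold
```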
XOR: Compute $\mathbf{w}$ (including $b$) from $\alpha$
$$\mathbf{w} = \sum_i \alpha_i d_i \phi(\mathbf{x}_i) = -\frac{1}{8}\phi(\mathbf{x}_1) + \frac{1}{8}\phi(\mathbf{x}_2) + \frac{1}{8}\phi(\mathbf{x}_3) - \frac{1}{8}\phi(\mathbf{x}_4)$$
$$= \frac{1}{8}\left[ -\begin{pmatrix}1\\1\\\sqrt{2}\\-\sqrt{2}\\-\sqrt{2}\\1\end{pmatrix} + \begin{pmatrix}1\\1\\-\sqrt{2}\\-\sqrt{2}\\\sqrt{2}\\1\end{pmatrix} + \begin{pmatrix}1\\1\\-\sqrt{2}\\\sqrt{2}\\-\sqrt{2}\\1\end{pmatrix} - \begin{pmatrix}1\\1\\\sqrt{2}\\\sqrt{2}\\\sqrt{2}\\1\end{pmatrix} \right] = \left(0,\; 0,\; -\tfrac{1}{\sqrt{2}},\; 0,\; 0,\; 0\right)^T$$
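The same arithmetic in NumPy (a sketch, reusing `phi` from above):

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, 1, 1, -1], dtype=float)
alpha = np.full(4, 1/8)

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

Phi = np.array([phi(x) for x in X])
w = Phi.T @ (alpha * d)    # w = sum_i alpha_i d_i phi(x_i)
print(w)                   # [0, 0, -0.7071..., 0, 0, 0], i.e. -1/sqrt(2)
```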
XOR: Examine prediction function
• Prediction function:
$$y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) = \left(0,\; 0,\; -\tfrac{1}{\sqrt{2}},\; 0,\; 0,\; 0\right)\left(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; 1\right)^T = -x_1 x_2$$
• Predictions are based on the product of the dimensions:
$$y(\mathbf{x}_1) = -(-1)(-1) = -1 \qquad y(\mathbf{x}_2) = -(-1)(+1) = +1$$
$$y(\mathbf{x}_3) = -(+1)(-1) = +1 \qquad y(\mathbf{x}_4) = -(+1)(+1) = -1$$
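Note that predictions never need $\mathbf{w}$ or $\phi$ explicitly: since $\mathbf{w} = \sum_i \alpha_i d_i \phi(\mathbf{x}_i)$, we have $y(\mathbf{x}) = \sum_i \alpha_i d_i K(\mathbf{x}_i, \mathbf{x})$. A sketch evaluating all four training points this way:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, 1, 1, -1], dtype=float)
alpha = np.full(4, 1/8)

def y(x):
    # y(x) = sum_i alpha_i d_i K(x_i, x), with K(x, x') = (1 + x . x')^2
    return (alpha * d) @ (1.0 + X @ x) ** 2

print([y(x) for x in X])   # [-1, 1, 1, -1]: matches -x1*x2 at every corner
```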
XOR: Decision boundaries
• Decision boundary at $y(\mathbf{x}) = -x_1 x_2 = 0$
• Support vectors at $y(\mathbf{x}) = -x_1 x_2 = \pm 1$
[Figure: XOR decision regions, with the region where $y(\mathbf{x}) > 0$ indicated]