Support Vector Machines
S.V.M.
Special session
Bernhard Schölkopf & Stéphane Canu
GMD-FIRST I.N.S.A. - P.S.I.
http://svm.first.gmd.de/ http://psichaud.insa-rouen.fr/~scanu/
ESANN'99 : Special session 7 on Support Vector Machines, Thursday 22nd April 1999
radial SVM
[Figure: decision boundary of a radial (Gaussian kernel) SVM on 2-d data.]
Road map
• linear discrimination: the separable case
• linear discrimination: the non-separable case
• quadratic discrimination
• radial SVM
  – principle
  – 3 regularization hyperparameters
  – some benchmark results (glass data)
• SVM for regression
What's new with SVM

Artificial Neural Networks → Support Vector Machines

From biology to machine learning:
– it works! (for some reason)
– formalization of learning: statistical learning theory, learning from data

From maths to machine learning = minimization:
– universality (learn everything): the kernel trick
– complexity control (but not anything): the margin

minimization + constraints
Functional space

Mercer's theorem. Let \(K(x, y)\) be a positive definite bi-function:
\[
\forall f \in L^2, \quad \iint K(x, y)\, f(x)\, f(y)\, dx\, dy \ge 0 .
\]
Then there exist functions \(\varphi_k\) and scalars \(\lambda_k \ge 0\) such that
\[
\int K(x, y)\, \varphi_k(x)\, dx = \lambda_k\, \varphi_k(y)
\quad \text{and} \quad
K(x, y) = \sum_{k=1}^{\infty} \lambda_k\, \varphi_k(x)\, \varphi_k(y)
\]
(orthogonal case). \(F\) is the reproducing kernel Hilbert space of functions \(f = \sum_k f_k \varphi_k\) with
\[
\langle f, f \rangle_F = \sum_{k=1}^{\infty} \frac{f_k^2}{\lambda_k},
\]
and the kernel reproduces:
\[
\langle K(x, \cdot), f \rangle_F
= \sum_k \frac{\lambda_k\, \varphi_k(x)\, f_k}{\lambda_k}
= \sum_k f_k\, \varphi_k(x)
= f(x) .
\]

Kernel's trick
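As a quick numerical illustration of Mercer's condition (not part of the slides; NumPy, with illustrative data and bandwidth), the Gram matrix of a Gaussian kernel should have no negative eigenvalues:

```python
import numpy as np

# Illustrative data: 20 random points in R^2 (any sample works).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# Gram matrix K_ij = K(x_i, x_j).
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Mercer: positive definiteness shows up as non-negative eigenvalues
# of the Gram matrix (up to numerical round-off) for every sample.
eigvals = np.linalg.eigvalsh(K)
print("smallest eigenvalue:", eigvals.min())
```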
Minimization with constraints

Without constraints, \(\min_x f(x)\) means: find \(x^*\) such that \(f'(x^*) = 0\).

With constraints (convex case):
\[
\min_x f(x) \quad \text{under the constraint } g(x) \le 0
\]
becomes
\[
\max_{\lambda}\ \min_x\ L(x, \lambda) = f(x) + \lambda\, g(x),
\qquad \lambda \ge 0,\ g(x) \le 0,
\]
i.e. find the couple \((x^*, \lambda^*)\) such that
\[
\nabla_x L(x^*, \lambda^*) = 0,
\qquad
\nabla_\lambda L(x^*, \lambda^*) \le 0,
\]
and either \(\lambda^* = 0\) or \(g(x^*) = 0\).

\(L(x, \lambda)\): the Lagrangian (Lagrange, 1788)
Minimization with constraints: dual formulation

\[
\min_x f(x) \ \text{ with the constraint } g(x) \le 0 \quad (\text{convex})
\]
\[
\max_{\lambda}\ \min_x\ L(x, \lambda) = f(x) + \lambda\, g(x),
\qquad \lambda \ge 0,\ g(x) \le 0,
\quad \text{either } \lambda^* = 0 \text{ or } g(x^*) = 0 .
\]

Phase 1: stationarity in \(x\) expresses \(x\) as a function of \(\lambda\):
\[
f'(x) + \lambda\, g'(x) = 0 \;\Rightarrow\; x = x(\lambda) .
\]

Phase 2: maximize the dual function:
\[
\max_{\lambda}\ f\big(x(\lambda)\big) + \lambda\, g\big(x(\lambda)\big),
\qquad \lambda \ge 0,\ g(x) \le 0 .
\]
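A small worked example of the two phases may help (illustrative, not from the slides): minimize \(f(x) = x^2\) under \(g(x) = 1 - x \le 0\).

```latex
% Worked example: min x^2 subject to 1 - x <= 0.
\[
L(x,\lambda) = x^2 + \lambda (1 - x), \qquad \lambda \ge 0
\]
% Phase 1: stationarity in x gives x as a function of lambda.
\[
\frac{\partial L}{\partial x} = 2x - \lambda = 0
\;\Rightarrow\; x(\lambda) = \tfrac{\lambda}{2}
\]
% Phase 2: maximize the dual W(lambda) = L(x(lambda), lambda).
\[
W(\lambda) = \tfrac{\lambda^2}{4} + \lambda\left(1 - \tfrac{\lambda}{2}\right)
           = \lambda - \tfrac{\lambda^2}{4},
\qquad
W'(\lambda) = 0 \;\Rightarrow\; \lambda^* = 2,\ x^* = 1
\]
% The constraint is active: g(x^*) = 0, consistent with
% "either lambda = 0 or g(x^*) = 0".
```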
Linear discrimination: the separable case

[Figure: two classes of points ("+" and "o") separated by the hyperplane \(w \cdot x + b = 0\).]

Correctly classify all examples.
Linear discrimination: the separable case

[Figure: the same data; the separating hyperplane \(w \cdot x + b = 0\) is drawn together with its margin on both sides.]

Correctly classify all examples, with the largest margin.
Linear discrimination: the separable case

[Figure: 1-d illustration; targets \(y = \pm 1\) plotted against \(x\).]
Linear discrimination: the separable case

[Figure: the linear function \(y = w \cdot x\) crosses the levels \(\pm 1\) at distance \(1/\|w\|\) from the decision boundary: the margin on each side is \(1/\|w\|\).]
Linear discrimination: the separable case

[Figure: separating hyperplane \(w \cdot x + b = 0\) with the maximal margin.]

Correctly classify all examples, with the largest margin:
\[
\max_w \frac{1}{\|w\|^2}
\quad\Longleftrightarrow\quad
\min_w \|w\|^2 .
\]
Linear classification: the separable case

\[
d(x) = \operatorname{sign}(w' x + b), \qquad
(x_i, y_i)_{i=1,\dots,n},\ x \in \mathbb{R}^d,\ y \in \{-1, 1\}
\]

Correctly classify the whole learning set and minimize \(\|w\|^2\):
\[
\min_w \|w\|^2 \ \text{ with the constraints } \ y_i (w' x_i + b) \ge 1,\ i = 1, \dots, n,
\]
equivalently
\[
\min_w \|w\|^2 \ \text{ with the constraints } \ y_i (w' x_i + b) - 1 \ge 0,\ i = 1, \dots, n .
\]
Kuhn and Tucker:
\[
\max_{\alpha \ge 0}\ \min_{w, b}\ L(w, b, \alpha)
= \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \big( y_i (w' x_i + b) - 1 \big).
\]
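A minimal sketch of this primal problem on toy separable data, solved with a general-purpose constrained optimizer (SciPy's SLSQP; the data and variable packing are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data: two clusters in R^2 with labels +/-1.
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
              [-2.0, -2.0], [-2.5, -1.5], [-3.0, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# Variables packed as theta = (w1, w2, b).
def objective(theta):
    w = theta[:2]
    return w @ w                      # minimize ||w||^2

def margin_constraints(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w + b) - 1        # each entry must be >= 0

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
print("margin on each side =", 1 / np.linalg.norm(w))
```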
Equality constraint integration

\[
\min_\alpha \max_\beta \ L(\alpha, \beta)
= \tfrac{1}{2}\, \alpha' H \alpha - c'\alpha + \beta\, y'\alpha
\]
\[
\nabla_\alpha L(\alpha, \beta) = 0 \;\Rightarrow\; H\alpha + \beta y = c,
\qquad
\nabla_\beta L(\alpha, \beta) = 0 \;\Rightarrow\; y'\alpha = 0,
\]
that is, the linear system
\[
\begin{pmatrix} H & y \\ y' & 0 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} c \\ 0 \end{pmatrix}.
\]
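This block system can be solved directly; a minimal sketch assuming NumPy, with a small synthetic positive definite \(H\) standing in for the SVM's kernel matrix:

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
A = rng.normal(size=(n, n))
H = A @ A.T + n * np.eye(n)       # synthetic positive definite H
c = np.ones(n)                    # c = (1, ..., 1) in the SVM dual
y = np.array([1.0, 1.0, -1.0, -1.0])

# Assemble the KKT block system  [H  y; y' 0] [alpha; beta] = [c; 0].
M = np.block([[H, y[:, None]],
              [y[None, :], np.zeros((1, 1))]])
rhs = np.append(c, 0.0)

sol = np.linalg.solve(M, rhs)
alpha, beta = sol[:n], sol[n]
print("alpha =", alpha)
print("check y'alpha = 0:", y @ alpha)   # ~0 up to round-off
```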
Inequality constraint integration

\[
\min_\alpha \max_{\beta, \mu}\ L(\alpha, \beta, \mu)
= \tfrac{1}{2}\, \alpha' H \alpha - c'\alpha + \beta\, y'\alpha - \mu'\alpha
\]
Optimality conditions: \(\mu_i\, \alpha_i = 0\) (complementary slackness on the completed system solution) and \(\mu \ge 0\) (the multipliers have to be positive).

Active set iteration:
While \((\alpha, \mu)\) does not verify the optimality conditions:
– solve \(\alpha = M^{-1} b\) and \(\mu = -H\alpha + c + \beta y\)
– if \(\alpha_i < 0\), a constraint is blocked (set \(\alpha_i = 0\): an active variable is eliminated); else if \(\mu_i < 0\), a constraint is relaxed.

Each iteration solves a linear system in \(O(n^3)\): this is the classical active-set treatment of the QP.
Linear classification: the non-separable case

Relax the constraints with error variables \(\xi_i\):
\[
y_i (w' x_i + b) \ge 1 - \xi_i,\ i = 1, \dots, n,
\qquad \xi_i \ge 0
\quad (\xi_i > 1:\ \text{classification error})
\]
\[
\min_{w, \xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{with the constraints } y_i (w' x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0 .
\]
Dual:
\[
\min_\alpha\ \tfrac{1}{2}\, \alpha' H \alpha - c'\alpha
\quad \text{with } 0 \le \alpha_i \le C \text{ and } y'\alpha = 0 .
\]
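A sketch of the soft-margin trade-off using scikit-learn's SVC (a modern stand-in, not the tool used in the talk; the data and C values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes: some points violate the margin (xi_i > 0).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(+1.0, 1.0, size=(30, 2)),
               rng.normal(-1.0, 1.0, size=(30, 2))])
y = np.array([1] * 30 + [-1] * 30)

# Small C tolerates margin violations; large C penalizes them heavily.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
```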
quadratic SVM
[Figure: decision boundary of a quadratic (polynomial) SVM on 2-d data.]
polynomial classification
\[
d(x) = \operatorname{sign}\big( w'\, (x_1,\ x_2,\ x_1^2,\ x_2^2,\ x_1 x_2)' + b \big),
\qquad (x_i, y_i)_{i=1,\dots,n},\ x \in \mathbb{R}^d,\ y \in \{-1, 1\}
\]
Correctly classify the training set and minimize \(\|w\|^2\):
\[
\min_w \|w\|^2 \ \text{ under the constraints } \
y_i \big( w' \Phi(x_{1i}, x_{2i}) + b \big) \ge 1,\ i = 1, \dots, n,
\]
with \(\Phi(x_1, x_2) = (x_1,\ x_2,\ x_1^2,\ x_2^2,\ x_1 x_2)'\) and
\[
H = \operatorname{diag}(y)\, \Phi \Phi'\, \operatorname{diag}(y),
\qquad \operatorname{rank}(H) = 5 \ \Rightarrow\ \text{regularization needed}.
\]
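The 5-dimensional feature map can be written out explicitly; a minimal sketch (NumPy, illustrative inputs) showing that a dot product in feature space is a quadratic function of the original inputs:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map R^2 -> R^5."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

a = np.array([1.0, 2.0])
b = np.array([0.5, -1.0])

# The dot product in feature space is a polynomial in a and b:
print(phi(a) @ phi(b))
# A linear classifier on phi(x) is a quadratic classifier on x,
# which is why H = diag(y) Phi Phi' diag(y) has rank at most 5.
```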
Gaussian Kernel based S.V.M.

\[
\hat y(x) = r(x) = \operatorname{sign}\big( w' \Phi(x) + b \big),
\qquad
H_{ij} = \sum_{k=1}^{m} \varphi_k(x_i)\, \varphi_k(x_j)
\]
with \(w = \sum_{s \in S} \alpha_s y_s \Phi(x_s)\) (\(S\): the support vectors):
\[
\hat y(x) = \operatorname{sign}\Big( \sum_{s \in S} \alpha_s y_s
\sum_{k=1}^{m} \varphi_k(x_s)\, \varphi_k(x) + b \Big).
\]
Mercer's theorem:
\[
\sum_{k=1}^{m} \varphi_k(x_s)\, \varphi_k(x) = K(x_s, x),
\]
so forget about the \(\varphi_k\):
\[
\hat y(x) = r(x) = \operatorname{sign}\Big( \sum_{s \in S} \alpha_s y_s\, K(x_s, x) + b \Big).
\]
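The final decision rule needs only kernel evaluations; a minimal sketch where \(\alpha\), \(y\), \(b\) and the support vectors are placeholders that would come out of the QP solution:

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def svm_predict(x, support_vectors, alpha, y_sv, b, sigma=1.0):
    """y_hat(x) = sign(sum_s alpha_s y_s K(x_s, x) + b)."""
    s = sum(a * ys * rbf(xs, x, sigma)
            for a, ys, xs in zip(alpha, y_sv, support_vectors))
    return np.sign(s + b)

# Placeholder values standing in for a trained machine:
sv = np.array([[1.0, 1.0], [-1.0, -1.0]])
alpha = np.array([0.8, 0.8])
y_sv = np.array([1.0, -1.0])
print(svm_predict(np.array([0.5, 0.9]), sv, alpha, y_sv, b=0.0))
```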
1-d example

[Figure: SVM in 1-d; the training set is shown in green, and the circled points are the support vectors. Class 1: mixture of 2 Gaussians; class 2: a single Gaussian. The plot shows the training set, the output of the SVM on the test set, the margin, and the support vectors.]
3 regularization parameters
• C: the upper bound on the multipliers \(\alpha_i\)
• \(\sigma\): the kernel bandwidth: \(K(x, y) = \exp\big( -\|x - y\|^2 / (2\sigma^2) \big)\)
• \(\lambda\): the linear system regularization: \(H\alpha = b \Rightarrow (H + \lambda I)\alpha = b\)

Generalization is controlled by the radius-margin quantity \(R^2 \|w\|^2 / n\), where \(R\) is the radius of the smallest sphere containing the data in feature space:
\[
R^2 = \min_a \max_{x_i} \big( K(x_i, x_i) + K(a, a) - 2 K(x_i, a) \big).
\]
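A sketch of the roles of \(\sigma\) and C (scikit-learn as a stand-in, with gamma = 1/(2σ²); data and values are illustrative), in the spirit of the next three figures:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = np.sign(X[:, 0] * X[:, 1] + 0.3 * rng.normal(size=60))

# gamma = 1 / (2 sigma^2): small sigma <=> large gamma.
for sigma, C in [(0.2, 100.0), (2.0, 100.0), (2.0, 0.1)]:
    clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2), C=C).fit(X, y)
    print(f"sigma={sigma}, C={C}: "
          f"{clf.n_support_.sum()} SVs, train acc {clf.score(X, y):.2f}")
```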
Small bandwidth and large C
[Figure: decision boundary of the Gaussian SVM with small bandwidth and large C.]
Large bandwidth and large C
[Figure: decision boundary of the Gaussian SVM with large bandwidth and large C.]
Large bandwidth and small C
[Figure: decision boundary of the Gaussian SVM with large bandwidth and small C.]
SVM for regression

\(\varepsilon\)-insensitive cost:
\[
C(x, y, f) = |y - f(x)|_\varepsilon =
\begin{cases}
0 & \text{if } |y - f(x)| \le \varepsilon \\
|y - f(x)| - \varepsilon & \text{else}
\end{cases}
\]
Primal problem:
\[
\min_{w, \xi, \xi^*}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)
\]
such that
\[
y_i - (w' x_i + b) \le \varepsilon + \xi_i, \qquad
(w' x_i + b) - y_i \le \varepsilon + \xi_i^*, \qquad
\xi_i, \xi_i^* \ge 0 .
\]
Dual Lagrangian:
\[
L = -\tfrac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, \langle x_i, x_j \rangle
- \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*)
+ \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)
\]
with \(0 \le \alpha_i, \alpha_i^* \le C\) and \(\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0\); then
\[
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, \langle x_i, x \rangle + b .
\]
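A sketch of ε-insensitive regression with scikit-learn's SVR (a modern stand-in; data and hyperparameters are illustrative). Only points outside the ε-tube become support vectors, as the two figures below suggest:

```python
import numpy as np
from sklearn.svm import SVR

# Noisy 1-d regression data, similar in spirit to the figures.
rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 3.5, size=50))[:, None]
y = 10 + np.sin(2 * x.ravel()) + 0.1 * rng.normal(size=50)

# Shrinking epsilon pulls more points out of the tube.
for eps in (0.3, 0.05):
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(x, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```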
Example...
[Figure: Support Vector Machine Regression on noisy 1-d data.]
\(\varepsilon\) small, and \(\sigma\) also

[Figure: Support Vector Machine Regression with the smaller parameter values.]
Geostatistics
Another way to see things (Girosi, 97)

\[
\min_{f \in F}\ \tfrac{1}{2} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda\, \langle f, f \rangle_F
\]
with \(F = \big\{ f : f(x) = \sum_{i=1}^{n} c_i K(x, x_i) \big\}\) and
\(\langle f, f \rangle_F = \sum_{i,j=1}^{n} c_i c_j K(x_i, x_j)\), this becomes
\[
\min_{c}\ \tfrac{1}{2} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{n} c_j K(x_i, x_j) \Big)^2
+ \lambda \sum_{i,j=1}^{n} c_i c_j K(x_i, x_j),
\]
whose solution is
\[
r(x) = \sum_{i=1}^{n} c_i K(x, x_i)
\qquad \text{with} \qquad (K + \lambda I)\, c = y .
\]
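This view reduces training to a linear system; a minimal sketch (NumPy, illustrative data) of the regularization-network solution \((K + \lambda I)c = y\):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 3, size=30))
y = np.sin(2 * x) + 0.1 * rng.normal(size=30)

sigma, lam = 0.5, 1e-2
# Gram matrix of the Gaussian kernel on the training inputs.
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))

# Regularized least squares in the RKHS: (K + lambda I) c = y.
c = np.linalg.solve(K + lam * np.eye(len(x)), y)

def r(t):
    """r(t) = sum_i c_i K(t, x_i)."""
    return c @ np.exp(-(x - t) ** 2 / (2 * sigma ** 2))

print(r(1.0), np.sin(2.0))  # fitted value vs. the noiseless target
```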
SVM history and trends

The pioneers:
• Vapnik, V.; Lerner, A., 1963: statistical learning theory
• Mangasarian, O., 1965, 1968: optimization
• Kimeldorf, G.; Wahba, G., 1971: non-parametric regression (splines)

The 2nd start: ANN, learning & computers...
• Boser, B.; Guyon, I.; Vapnik, V., 1992
• Bennett, K.; Mangasarian, O., 1992
• Learning theory: Cortes, C., 1995: soft margin classifier, effective VC dimensions, other formalisms, ...

Trends...
• Applications: on-line handwritten character recognition, face recognition, text mining, ...
• Optimization: Vapnik; Osuna, E. & Girosi; John C. Platt; Linda Kaufman; Thorsten Joachims
Optimization issues: QP with constraints

\[
\min_\alpha\ \tfrac{1}{2}\, \alpha' H \alpha - c'\alpha
\quad \text{with } 0 \le \alpha_i \le C \text{ and } y'\alpha = 0
\]

• Box constraints
• H is positive semidefinite (beware of commercial solvers)
• Size of H! But a lot of the \(\alpha_i\) are 0 or C:
  – active constraint set, starting with \(\alpha = 0\)
  – do not compute (store) the whole H
  – chunk
• multiclass issue!
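The sparsity claim (most \(\alpha_i\) end up at 0 or C) can be checked empirically; a sketch with scikit-learn, where dual_coef_ holds \(\alpha_i y_i\) for the support vectors only (data illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=200))

C = 1.0
clf = SVC(kernel="rbf", C=C).fit(X, y)
alpha = np.abs(clf.dual_coef_.ravel())   # alpha_i of the support vectors

n_sv = len(alpha)
n_at_C = int(np.sum(np.isclose(alpha, C)))
print(f"{n_sv}/{len(X)} nonzero alphas; {n_at_C} of them at the bound C")
# Non-support-vector points all have alpha_i = 0 and never enter H.
```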
Optimization issues: solve the whole problem
• commercial: LOQO (primal-dual approach), MINOS, Matlab!
• Vapnik: Moré and Toraldo (1991)

Decompose the problem:
• chunking (Vapnik, 82, 92)
• Osuna & Girosi (implemented in SVMlight by Thorsten Joachims, 98)
• Sequential Minimal Optimization (SMO), John C. Platt, 98

No H: start from 0, active set technique (Linda Kaufman, 98)
• minimize the cost function:
  – 2nd order: Newton
  – conjugate gradient, projected conjugate gradient (PCG), Burges, 98
• select the relevant constraints

Interior point methods: Moré, 91; Z. Dostál, 97; and others...
Some benchmark considerations (Platt 98)
• Osuna’s decomposition technique permits the solution of SVMs via fixed-size QP subproblems
• Using two-variable QP subproblems (SMO) does not require a QP library
• SMO trades off QP time for kernel evaluation time
• Optimizations can dramatically reduce kernel time:
  – linear SVMs (useful for text categorization)
  – sparse dot products
  – kernel caching (good for smaller problems, Thorsten Joachims, 98)
• SMO can be much faster than other techniques for some problems
• what about active set and interior point techniques?
open issues
• VC entropy for margin classifiers: learning bounds
• other margin classifiers: boosting
• non-"L2" (quadratic) cost functions: sparse coding (Drezet & Harrison)
• curse of dimensionality: local vs. global
• kernel influence (Tsuda)
• applications:
  – classification (Weston & Watkins)
  – ... to regression (Pontil et al.)
  – face detection (Fernandez & Viennet)
• algorithms (Cristianini & Campbell)
• making bridges to other formalisms:
  – Bayesian (Kwok)
  – statistical mechanics (Buhot & Gordon)
  – logic (Sebag), ...
Books in Support Vector Research
• V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995; Statistical Learning Theory. Wiley, 1998.
• Introductory SVM chapters in:
  – S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
  – V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.
• C. J. C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, No. 2.
• Schölkopf, B., 1997. Support Vector Learning. PhD thesis. Published by R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.
• Smola, A. J., 1998. Learning with Kernels. PhD thesis. Published by GMD, Birlinghoven, 1999.
• NIPS'97 workshop book: B. Schölkopf, C. Burges, A. Smola. Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, December 1998.
• NIPS'98 workshop book on large margin classifiers... is coming.
Events in Support Vector Research
• ACAI'99 workshop: Support Vector Machine Theory and Applications
• Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
• EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany
Conclusion

SVM selects relevant patterns in a robust way.

Multi-class problems. Small error.

svm.cs.rhbnc.ac.uk (Matlab code available upon request)