SVM and SVR as Convex Optimization Techniques

Mohammed Nasser, Department of Statistics, Rajshahi University, Rajshahi 6205
Acknowledgement

Andrew W. Moore, Professor, School of Computer Science, Carnegie Mellon University

Kenji Fukumizu, Institute of Statistical Mathematics, ROIS; Department of Statistical Science, Graduate University for Advanced Studies

Georgi Nalbantov, Econometric Institute, School of Economics, Erasmus University Rotterdam
Contents
Glimpses of Historical Development
Optimal Separating Hyperplane
Soft Margin Support Vector Machine
Support Vector Regression
Convex Optimization
Use of Lagrange and Duality Theory
Example
Conclusion
Early History

In 1900 Karl Pearson published his famous article on goodness of fit, judged one of the twelve best scientific articles of the twentieth century.

In 1902 Jacques Hadamard pointed out that mathematical models of physical phenomena should have the properties that:

- A solution exists
- The solution is unique
- The solution depends continuously on the data, in some reasonable topology

(a well-posed problem)
Early History

In 1940 Fréchet, a PhD student of Hadamard, sharply criticized the mean and standard deviation as measures of location and scale, respectively. But he expressed his belief in the development of a better statistics without proposing any alternative.

During the sixties and seventies Tukey, Huber and Hampel tried to develop robust statistics in order to remove the ill-posedness of classical statistics.

Robustness means insensitivity to minor changes in both model and sample, high tolerance to major changes, and good performance at the model.

The data-mining onslaught and the problems of non-linearity and non-vectorial data have made robust statistics somewhat unattractive.

Let us see what kernel methods (KM) present…
Recent History

Support Vector Machines (SVM), introduced at COLT-92 (the Conference on Learning Theory), have developed greatly since then.

Result: a class of algorithms for pattern recognition (kernel machines).

Now: a large and diverse community, from machine learning, optimization, statistics, neural networks, functional analysis, etc.

Centralized website: www.kernel-machines.org

First textbook (2000): see www.support-vector.net

Now (2012): at least twenty books of different tastes are available in the international market. The book "The Elements of Statistical Learning" (2001) by Hastie, Tibshirani and Friedman went into a second edition within seven years.
Kernel Methods: Heuristic View

What is the common characteristic (structure) among the following statistical methods?

1. Principal components analysis
2. (Ridge) regression
3. Fisher discriminant analysis
4. Canonical correlation analysis
5. Singular value decomposition
6. Independent component analysis

We consider linear combinations of the input vector: $f(x) = w^T x$. We make use of the concepts of length and dot product available in Euclidean space.
Kernel Methods: Heuristic View

Linear learning typically has nice properties:
- Unique optimal solutions, fast learning algorithms
- Better statistical analysis

But one big problem:
- Insufficient capacity; that is, in many data sets it fails to detect nonlinear relationships among the variables

The other demerit:
- It cannot handle non-vectorial data
Kernel Methods

| Classical | Kernel |
| --- | --- |
| PCA | KPCA |
| CCA | KCCA |
| FLDA | KFLDA |
| ICA | KICA |
| Regression | SVR |
| Classification | SVM |

More: tests of independence, tests of equality of distributions, outlier detection, data depth functions, …
Kernel Methods: Heuristic View

In classical multivariate analysis we consider linear combinations of the input vector:

$$f(x) = w^T x$$

making use of the concepts of length and dot product/inner product available in Euclidean/non-Euclidean space.

In modern multivariate analysis we consider linear combinations of the feature vector:

$$f(x) = w^T \varphi(x) = \sum_{i=1}^{n} \alpha_i \langle \varphi(x_i), \varphi(x)\rangle = \sum_{i=1}^{n} \alpha_i\, k(x_i, x), \qquad w = \sum_{i=1}^{n} \alpha_i \varphi(x_i),$$

again making use of the concepts of length and dot product, now in the feature space.
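A minimal numpy sketch of this expansion (data and coefficients are made up, with an RBF kernel standing in for $k$): the value $f(x) = \sum_i \alpha_i k(x_i, x)$ is computed without ever forming $\varphi(x)$ explicitly.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), an assumed choice of kernel."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))    # five training inputs x_1, ..., x_5
alpha = rng.normal(size=5)     # illustrative expansion coefficients alpha_i
x_new = rng.normal(size=3)

# f(x_new) = sum_i alpha_i * k(x_i, x_new); phi is never constructed
f_new = sum(alpha[i] * rbf_kernel(X[i], x_new) for i in range(len(X)))
print(f_new)
```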
Some Review of College Geometry

The line $y + x - 1 = 0$ separates the plane into the regions $y + x - 1 > 0$ and $y + x - 1 < 0$; the normal vector $(1, 1)$ is at 90° to the line. Multiplying through by $k$ gives $ky + kx - k = 0$, the same line, but $k$ has a different effect on the two signed regions: it rescales them and, if negative, swaps their signs.
Some Review of College Geometry, in General Form

The line $w \cdot x + b = 0$ separates the plane into $w \cdot x + b > 0$ and $w \cdot x + b < 0$; the normal vector $w$ is at 90° to the line. Scaling gives $kw \cdot x + kb = 0$, the same line, with the same different effect of $k$ on the two signed regions.
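A tiny numpy check (illustrative numbers) that rescaling $(w, b)$ to $(kw, kb)$ with $k > 0$ changes the value of $w \cdot x + b$ but not its sign, so the two regions, and any classification based on them, are unchanged:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), -1.0
points = np.array([[2.0, 2.0], [0.0, 0.0], [0.25, 0.25]])

for k in (1.0, 3.0, 0.1):
    signs = np.sign(points @ (k * w) + k * b)
    print(k, signs)   # the signs are identical for every positive k
```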
Some Review of College Geometry, in General Form

The line $w \cdot x + b = 0$, with normal vector $w$ at 90° to it. Effect of a change in $w$: the orientation of the line changes. Effect of a change in $b$: the line shifts parallel to itself.
Linear Kernel

$k(x, x') = \langle x, x'\rangle$. Its RKHS $\mathcal{H}$ is the space of linear functions $f(x) = \langle w, x\rangle$. It can be shown that $\|f\|_{\mathcal{H}} = \|w\|$: let $f = \langle w, \cdot\rangle$ and $g = \langle v, \cdot\rangle$; then $\langle f, g\rangle_{\mathcal{H}} = \langle w, v\rangle$.
Linearly Separable Classes
Linear Classifiers

f(x, w, b) = sign(w · x + b), where the labels denote +1 and −1; the hyperplane w · x + b = 0 separates the region w · x + b > 0 from the region w · x + b < 0.

How would you classify this data? Any of a number of separating lines would be fine... but which is the best? Note that some candidate lines misclassify points, e.g. to the +1 class.
Classifier Margin

f(x, w, b) = sign(w · x + b). Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Geometric margin versus functional margin
Maximum Margin

f(x, w, b) = sign(w · x + b)

The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM): the Linear SVM. Support vectors are those datapoints that the margin pushes up against.

1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very, very well.
Linear SVM Mathematically

Our goal:

1) Correctly classify all training data: $w \cdot x_i + b \ge 1$ if $y_i = +1$, and $w \cdot x_i + b \le -1$ if $y_i = -1$; equivalently, $y_i(w \cdot x_i + b) \ge 1$ for all $i$.

2) Maximize the margin $M = 2/\|w\|$, which is the same as minimizing $\frac{1}{2} w^T w$.
Linear SVM Mathematically

We can formulate a quadratic optimization problem and solve for $w$ and $b$:

Minimize $\Phi(w) = \frac{1}{2} w^T w$ subject to $y_i(w \cdot x_i + b) \ge 1$ for all $i$.

This is a strictly convex quadratic objective with linear inequality constraints.
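Since this is exactly a convex QP, it can be handed to a generic solver. A minimal sketch with cvxpy on a small, made-up, linearly separable data set (not the deck's own code):

```python
import cvxpy as cp
import numpy as np

# Hard-margin primal: minimize (1/2) w'w  s.t.  y_i (w.x_i + b) >= 1
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = cp.Variable(2), cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()
print(w.value, b.value)
```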
Soft Margin Classification

Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy examples. The margin hyperplanes are $w \cdot x + b = 1$, $w \cdot x + b = 0$ and $w \cdot x + b = -1$; in the figure, points such as $\xi_2$, $\xi_7$ and $\xi_{11}$ violate their margin.

What should our quadratic optimization criterion be? Minimize

$$\frac{1}{2} w^T w + C \sum_{k=1}^{R} \xi_k$$
Hard Margin vs. Soft Margin

The old formulation: find $w$ and $b$ such that $\Phi(w) = \frac{1}{2} w^T w$ is minimized, and for all $\{(x_i, y_i)\}$: $y_i(w^T x_i + b) \ge 1$.

The new formulation, incorporating slack variables: find $w$ and $b$ such that $\Phi(w) = \frac{1}{2} w^T w + C \sum \xi_i$ is minimized, and for all $\{(x_i, y_i)\}$: $y_i(w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$.

Parameter C can be viewed as a way to control overfitting.
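A minimal cvxpy sketch of the new soft-margin formulation, again on illustrative data (the last point is deliberately hard to separate, so its slack comes out positive):

```python
import cvxpy as cp
import numpy as np

# Soft-margin primal: minimize (1/2) w'w + C * sum(xi)
#   s.t.  y_i (w.x_i + b) >= 1 - xi_i,  xi_i >= 0
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0], [-0.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])   # last label conflicts with its position
C = 1.0

w, b = cp.Variable(2), cp.Variable()
xi = cp.Variable(len(y), nonneg=True)
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
           constraints).solve()
print(w.value, b.value, xi.value)
```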
Linear Support Vector Regression: a Marketing Problem

Given variables:
- person's age
- income group
- season
- holiday duration
- location
- number of children
- etc. (12 variables)

Predict: the level of holiday Expenditures.

[Figure: scatter plot of Expenditures against Age.]
Linear Support Vector Regression

[Figure: three fits of Expenditures against Age: the "suspiciously smart case" (overfitting), the "lazy case" (underfitting), and the "compromise case", SVR (good generalizability).]
Linear Support Vector Regression: the epsilon-insensitive loss function

[Figure: errors within ±ε of the regression line incur no penalty; beyond that, the penalty grows linearly (at 45°) with the size of the error.]
Linear Support Vector Regression

The thinner the "tube", the more complex the model: the overfitting case has a small tube area, the underfitting case the biggest area, and the SVR compromise a middle-sized area. The "support vectors" are the observations on or outside the tube.

[Figure: the three fits of Expenditures against Age, each with its tube.]
Non-linear Support Vector Regression

Map the data into a higher-dimensional space:

[Figure: a non-linear fit of Expenditures against Age.]
Non-linear Support Vector Regression

Finding the value of a new point:

[Figure: evaluating the fitted curve at a new Age value.]
Linear SVR: Derivation

Given training data $\{(x_i, y_i)\},\ i = 1, \dots, l$.

Find $w$ and $b$ such that $f(x) = \langle w, x\rangle + b$ optimally describes the data. (1)
First Formulation (2)
Regularized Error Function

In linear regression, we minimize the error function:

$$\frac{1}{2}\sum_{i=1}^{l}(f(x_i) - y_i)^2 + \frac{\lambda}{2}\|w\|^2$$

Replace the quadratic error function by the ε-insensitive error function $E_\varepsilon$:

$$C\sum_{i=1}^{l} E_\varepsilon(f(x_i) - y_i) + \frac{1}{2}\|w\|^2$$

An example of an ε-insensitive error function: $E_\varepsilon(z) = 0$ if $|z| \le \varepsilon$, and $|z| - \varepsilon$ otherwise.
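A one-line numpy version of this example loss, $E_\varepsilon(z) = \max(0, |z| - \varepsilon)$: errors inside the tube cost nothing, errors outside grow linearly.

```python
import numpy as np

def eps_insensitive(z, eps=0.15):
    """E_eps(z) = 0 if |z| <= eps, else |z| - eps."""
    return np.maximum(0.0, np.abs(z) - eps)

residuals = np.array([-0.4, -0.1, 0.0, 0.1, 0.3])
print(eps_insensitive(residuals))   # [0.25 0.   0.   0.   0.15]
```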
Linear SVR: Derivation
Meaning of equation 3
Linear SVR: Derivation

The objective trades off complexity vs. the sum of errors. Case I: a wide, flat tube has low "tube" complexity but a larger sum of errors. Case II: a narrow tube has high "tube" complexity but a smaller sum of errors.
Linear SVR: Derivation. The Role of C

C balances "tube" complexity against the sum of errors (Case I vs. Case II above). When C is small, errors are penalized lightly and the fit stays flat and simple; when C is big, the errors dominate and the fit follows the data more closely.

[Figure: fits of Expenditures against Age for small and big C.]
Linear SVR: Derivation

Minimize

$$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*)$$

Subject to:

$$y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0.$$
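This primal is again a convex QP. A minimal cvxpy sketch on synthetic data (all data and names illustrative, not the deck's experiment):

```python
import cvxpy as cp
import numpy as np

# eps-SVR primal: min (1/2)||w||^2 + C * sum(xi + xi*)
#   s.t.  y - (Xw + b) <= eps + xi,  (Xw + b) - y <= eps + xi*,  xi, xi* >= 0
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 1))
y = 2.0 * X[:, 0] + 0.5 + 0.05 * rng.normal(size=30)
C, eps = 1.0, 0.1

w, b = cp.Variable(1), cp.Variable()
xi = cp.Variable(30, nonneg=True)
xi_star = cp.Variable(30, nonneg=True)
resid = y - (X @ w + b)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star)),
           [resid <= eps + xi, -resid <= eps + xi_star]).solve()
print(w.value, b.value)   # roughly [2.0] and 0.5
```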
Review of Convex Optimization

Weak duality: the optimal value of the dual problem is always a lower bound on the optimal value of the primal problem.

Strong duality: for convex problems satisfying a constraint qualification (e.g. Slater's condition), the two optimal values coincide, which is what justifies solving the SVM/SVR dual in place of the primal.
Support Vector Regression: the Lagrangian

Minimize (with respect to the primal variables $w, b, \xi, \xi^*$):

$$L = \frac{1}{2}\|w\|^2 + C\sum_{n=1}^{l}(\xi_n + \xi_n^*) - \sum_{n=1}^{l}\alpha_n(\varepsilon + \xi_n - y_n + \langle w, x_n\rangle + b) - \sum_{n=1}^{l}\alpha_n^*(\varepsilon + \xi_n^* + y_n - \langle w, x_n\rangle - b) - \sum_{n=1}^{l}(\mu_n \xi_n + \mu_n^* \xi_n^*)$$

with dual variables $\alpha_n, \alpha_n^*, \mu_n, \mu_n^* \ge 0$. Setting the partial derivatives to zero:

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{n=1}^{l}(\alpha_n - \alpha_n^*)\,x_n, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{l}(\alpha_n - \alpha_n^*) = 0,$$

$$\frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; \alpha_n + \mu_n = C, \qquad \frac{\partial L}{\partial \xi_n^*} = 0 \;\Rightarrow\; \alpha_n^* + \mu_n^* = C,$$

so that

$$f(x) = \langle w, x\rangle = \sum_{n=1}^{l}(\alpha_n - \alpha_n^*)\,\langle x_n, x\rangle.$$
Dual Form of the Lagrangian

Maximize

$$W(\alpha, \alpha^*) = -\frac{1}{2}\sum_{n=1}^{l}\sum_{m=1}^{l}(\alpha_n - \alpha_n^*)(\alpha_m - \alpha_m^*)\,\langle x_n, x_m\rangle \;-\; \varepsilon\sum_{n=1}^{l}(\alpha_n + \alpha_n^*) \;+\; \sum_{n=1}^{l} y_n(\alpha_n - \alpha_n^*)$$

subject to

$$0 \le \alpha_n \le C, \qquad 0 \le \alpha_n^* \le C, \qquad \sum_{n=1}^{l}(\alpha_n - \alpha_n^*) = 0.$$

Prediction can be made using:

$$f(x) = \sum_{n=1}^{l}(\alpha_n - \alpha_n^*)\,\langle x_n, x\rangle + b$$

But where does $b$ come from???
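A small numpy sketch of the dual objective and the dual-form predictor, with arbitrary illustrative values for the dual variables and for b (the next slide shows how b is actually determined):

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.1, 1.0, 2.1])
eps, b = 0.1, 0.05
a, a_star = np.array([0.0, 0.3, 0.5]), np.array([0.2, 0.0, 0.0])  # candidates

beta = a - a_star                  # combined coefficients (alpha_n - alpha_n*)
K = X @ X.T                        # Gram matrix of inner products <x_n, x_m>
W = (-0.5 * beta @ K @ beta        # quadratic term
     - eps * np.sum(a + a_star)    # epsilon-tube term
     + y @ beta)                   # data term

f = lambda x: beta @ (X @ x) + b   # f(x) = sum_n beta_n <x_n, x> + b
print(W, f(np.array([1.5])))
```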
How to Determine b?

The Karush-Kuhn-Tucker (KKT) conditions imply, at the optimal solution:

$$\alpha_n(\varepsilon + \xi_n - y_n + \langle w, x_n\rangle + b) = 0, \qquad \alpha_n^*(\varepsilon + \xi_n^* + y_n - \langle w, x_n\rangle - b) = 0,$$

$$(C - \alpha_n)\,\xi_n = 0, \qquad (C - \alpha_n^*)\,\xi_n^* = 0.$$

Support vectors are points that lie on the boundary of the tube or outside it. These equations imply many important things.
Important Interpretations

$\alpha_i\,\alpha_i^* = 0$, i.e. at least one of $\alpha_i, \alpha_i^*$ is always zero (why??).

For $\alpha_n \in (0, C)$ we have $\xi_n = 0$, so $\varepsilon - y_n + \langle w, x_n\rangle + b = 0$ and hence

$$b = y_n - \langle w, x_n\rangle - \varepsilon;$$

similarly, $\alpha_n^* \in (0, C)$ gives $b = y_n - \langle w, x_n\rangle + \varepsilon$.
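A minimal numpy sketch (illustrative values) of recovering b from a "free" support vector, one with 0 < α_n < C:

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.1, 1.0, 2.1])
eps, C = 0.1, 1.0
a, a_star = np.array([0.0, 0.4, 0.0]), np.array([0.2, 0.0, 0.3])  # illustrative

w = (a - a_star) @ X                    # w = sum_n (alpha_n - alpha_n*) x_n
free = np.where((a > 0) & (a < C))[0]   # indices with 0 < alpha_n < C
n = free[0]
b = y[n] - w @ X[n] - eps               # b = y_n - <w, x_n> - eps
print(w, b)
```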
Support Vectors: The Sparsity of the SV Expansion

From the KKT conditions,

$$\alpha_i > 0 \;\Rightarrow\; \varepsilon + \xi_i - y_i + f(x_i) = 0, \qquad \alpha_i^* > 0 \;\Rightarrow\; \varepsilon + \xi_i^* + y_i - f(x_i) = 0,$$

so every point strictly inside the tube, with $|y_i - f(x_i)| < \varepsilon$, has $\alpha_i = \alpha_i^* = 0$ and drops out of the expansion: only points on or outside the tube contribute, which makes the SV expansion sparse.
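This sparsity is easy to observe in practice; a sketch with scikit-learn's SVR on synthetic data:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.2).fit(X, y)
# Only the points on or outside the tube enter the expansion:
print(len(svr.support_), "support vectors out of", len(X))
```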
Dual Form of the Lagrangian (Nonlinear Case)

Maximize

$$W(\alpha, \alpha^*) = -\frac{1}{2}\sum_{n=1}^{l}\sum_{m=1}^{l}(\alpha_n - \alpha_n^*)(\alpha_m - \alpha_m^*)\,k(x_n, x_m) \;-\; \varepsilon\sum_{n=1}^{l}(\alpha_n + \alpha_n^*) \;+\; \sum_{n=1}^{l} y_n(\alpha_n - \alpha_n^*)$$

subject to

$$0 \le \alpha_n \le C, \qquad 0 \le \alpha_n^* \le C, \qquad \sum_{n=1}^{l}(\alpha_n - \alpha_n^*) = 0.$$

Prediction can be made using:

$$f(x) = \sum_{n=1}^{l}(\alpha_n - \alpha_n^*)\,k(x_n, x) + b$$
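scikit-learn's SVR solves this kernelized dual; in the fitted model, dual_coef_ holds the nonzero (α_n − α_n*) and intercept_ holds b, so the prediction formula above can be reproduced by hand (a sketch on synthetic data):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 4.0, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

svr = SVR(kernel="rbf", gamma=1.0, C=1.0, epsilon=0.1).fit(X, y)

def rbf(A, x, gamma=1.0):
    """k(a, x) = exp(-gamma * ||a - x||^2) for each row a of A."""
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=-1))

x_new = np.array([[1.5]])
k = rbf(svr.support_vectors_, x_new[0])
f_manual = svr.dual_coef_[0] @ k + svr.intercept_[0]
print(f_manual, svr.predict(x_new)[0])   # the two values agree
```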
Non-linear SVR: Derivation

The objective and constraints are as in the linear case, with the inner products replaced by kernel evaluations $k(x_n, x_m)$. A saddle point of the Lagrangian $L$ has to be found:

- min with respect to the primal variables $w, b, \xi, \xi^*$
- max with respect to the dual variables $\alpha, \alpha^*, \mu, \mu^*$
Non-linear SVR: Derivation

...
Strengths and Weaknesses of SVR

Strengths of SVR:
- No local minima
- It scales relatively well to high-dimensional data
- The trade-off between model complexity and error can be controlled explicitly via C and epsilon
- Overfitting is avoided (for any fixed C and epsilon)
- Robustness of the results
- The "curse of dimensionality" is avoided
- "Huber (1964) demonstrated that the best cost function over the worst model over any pdf of y given x is the linear cost function. Therefore, if the pdf p(y|x) is unknown the best cost function is the linear penalization over the errors" (Perez-Cruz et al., 2003)

Weaknesses of SVR:
- What is the best trade-off parameter C, and what is the best epsilon?
- What is a good transformation of the original space?
Experiments and Results

The vacation problem (again). Given training data of input-output pairs, where the output is "Expenditures" and the inputs are "Age", "Duration" of holiday, "Income group", "Number of children", etc., predict Expenditures for new inputs. The training set consists of 600 observations, and the test set of 108.
Experiments and Results

The SVR function: $f(x) = \sum_{n=1}^{l}(\alpha_n - \alpha_n^*)\,k(x_n, x) + b$, subject to the dual constraints above.

To find the unknown parameters of the SVR function, solve the dual problem. How to choose $C$, $\varepsilon$ and the kernel $k$? Take $k$ = the RBF kernel, and find $C$, $\gamma$ and $\varepsilon$ from a cross-validation procedure.
Experiments and Results

Do 5-fold cross-validation to find $C$ and $\gamma$ for several fixed values of $\varepsilon$.

[Figure: contour and surface plots of the cross-validated MSE over C (0 to 15) and gamma (0.006 to 0.02) at epsilon = 0.15; the CV MSE ranges from about 0.058 to 0.064.]
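A sketch of such a procedure with scikit-learn's GridSearchCV, on a synthetic stand-in for the holiday data (grid values and data are illustrative):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, 4))        # stand-in inputs (Age, Duration, ...)
y = np.sin(3.0 * X[:, 0]) + 0.2 * rng.normal(size=600)

# 5-fold CV over C and gamma at a fixed epsilon, scored by (negative) MSE
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.15),
    param_grid={"C": [1, 5, 10, 15], "gamma": [0.006, 0.01, 0.014, 0.02]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)     # chosen (C, gamma) and its CV MSE
```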
Experiments and Results

The effect of changes in $\varepsilon$: as it increases, the functional relationship gets flatter in the higher-dimensional space, but also in the original space.

[Figure: Expenditure against observation index on the training set, with the epsilon-insensitive tube at ε = 0.45 versus ε = 0.15.]
Experiments and Results

Performance on the test set:

[Figure: test-set Expenditures against observation index. SVR with ε = 0.15: MSE = 0.04. OLS solution: MSE = 0.23.]