http://datamining.xmu.edu.cn Page: 1 of 38
Support Vector Machine
李旭斌 (LI Xubin) [email protected]
@Data Mining Lab, 6/19/2012
Page: 2 of 38
No theory, just use.
Theory is so complicated…
Structural risk minimization
VC dimension
Hyperplane
Maximum margin classifier
Kernel function
…
Paper: What is a support vector machine?
Page: 3 of 38
What can it do?
Main usage:
Classification: C-SVC, nu-SVC
Regression: epsilon-SVR, nu-SVR
Distribution estimation: one-class SVM
Other: clustering
Page: 4 of 38
But we have many software packages with friendly interfaces.
Page: 5 of 38
Who implements SVM?
libSVM: Java, C, R, MATLAB, Python, Perl, C#... CUDA! Hadoop (Mahout)!
WEKA / Weka-Parallel
MATLAB SVM Toolbox
Spider
SVM in R
GPU-accelerated LIBSVM
Page: 6 of 38
Examples for Machine Learning Algorithms
Page: 7 of 38
Classification: SVM
Page: 8 of 38
Regression: SVR
Page: 9 of 38
Clustering: K-means
Screenshots from MLDemos.
Page: 10 of 38
Let’s get back to libSVM.
Page: 11 of 38
Format of input
The format of a training and testing data file is:
<label> <index1>:<value1> <index2>:<value2> ...
Each line contains an instance and is ended by a '\n' character. For classification, <label> is an integer indicating the class label (multi-class is supported). For regression, <label> is the target value, which can be any real number. For one-class SVM, it is not used, so it can be any number. The pair <index>:<value> gives a feature (attribute) value: <index> is an integer starting from 1 and <value> is a real number.
Example:
1 0:1 1:4 2:6 3:1
1 0:2 1:6 2:8 3:0
0 0:3 1:1 2:0 3:1
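A minimal sketch of reading this sparse format in plain Python (the parse function is illustrative, not part of libSVM):

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, features)."""
    parts = line.strip().split()
    label = float(parts[0])
    # Each remaining token is "<index>:<value>"; store them sparsely in a dict.
    features = {}
    for token in parts[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features

# One of the example lines from the slide:
label, feats = parse_libsvm_line("1 0:1 1:4 2:6 3:1")
print(label, feats)  # 1.0 {0: 1.0, 1: 4.0, 2: 6.0, 3: 1.0}
```

The dict-per-instance layout mirrors the sparse intent of the format: absent indices are simply missing keys.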
Page: 12 of 38
Parameters
Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
    0 -- C-SVC
    1 -- nu-SVC
    2 -- one-class SVM
    3 -- epsilon-SVR
    4 -- nu-SVR
-t kernel_type : set type of kernel function (default 2)
    0 -- linear: u'*v
    1 -- polynomial: (gamma*u'*v + coef0)^degree
    2 -- radial basis function: exp(-gamma*|u-v|^2)
    3 -- sigmoid: tanh(gamma*u'*v + coef0)
    4 -- precomputed kernel (kernel values in training_set_file)
Attention: the parameters appear in the kernel formulas.
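The four built-in kernels can be written down directly from the formulas above; here is a sketch in plain Python (the helper functions are mine, not libSVM code):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    return dot(u, v)                                  # u'*v

def polynomial(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree      # (gamma*u'*v + coef0)^degree

def rbf(u, v, gamma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(u, v))      # |u-v|^2
    return math.exp(-gamma * sq)                      # exp(-gamma*|u-v|^2)

def sigmoid(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)       # tanh(gamma*u'*v + coef0)

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear(u, v))   # 11.0
print(rbf(u, u))      # 1.0 (a point has zero distance to itself)
```

Note how gamma, coef0, and degree each appear in only some kernels, which is why the -g, -r, and -d options only matter for the corresponding -t choices.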
Page: 13 of 38
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n : n-fold cross validation mode
-q : quiet mode (no outputs)
Page: 14 of 38
nu-SVC & C-SVC
“Basically they are the same thing but with different parameters. The range of C is from zero to infinity but nu is always between [0,1]. A nice property of nu is that it is related to the ratio of support vectors and the ratio of the training error. ”
Page: 15 of 38
one-class SVM
Fault diagnosis / anomaly detection:
The train set is always made up of normal instances. Label: 1 (no -1).
The test set contains instances of unknown status.
Output label: 1 or -1.
1: normal
-1: anomalous
Page: 16 of 38
epsilon-SVR & nu-SVR
Paper:
LIBSVM: A Library for Support Vector Machines
Page: 17 of 38
Comparison of epsilon and nu (figure from the paper).
Page: 18 of 38
Related experience
Usage and grid search
Code Analysis
Chinese version of libSVM FAQ
Page: 19 of 38
libSVM Guide
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Page: 20 of 38
Flowchart of task:
train → svm-scale → train.scale → svm-train → model
test → svm-scale → test.scale → svm-predict → result
The train set and the test set should both be scaled. But before that, do you really need to scale them?
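What svm-scale does can be sketched as min-max scaling; a toy Python version (the target range [-1, 1] matches svm-scale's default, the function names are mine):

```python
def fit_scale_params(column, lower=-1.0, upper=1.0):
    """Learn the min/max of one feature column on the TRAIN set only."""
    return min(column), max(column), lower, upper

def apply_scale(x, params):
    """Map x into [lower, upper] using the train-set min/max."""
    fmin, fmax, lower, upper = params
    if fmax == fmin:          # constant feature: nothing to scale
        return lower
    return lower + (upper - lower) * (x - fmin) / (fmax - fmin)

train_col = [1.0, 2.0, 6.0]
params = fit_scale_params(train_col)         # fit on the train set...
scaled = [apply_scale(x, params) for x in train_col]
test_scaled = apply_scale(8.0, params)       # ...then reuse on the test set
print(scaled)        # [-1.0, -0.6, 1.0]
print(test_scaled)   # 1.8 (test values may fall outside the range)
```

The important point the slide makes is the split: the ranges are fitted on the train set and the same parameters are applied to the test set, never refitted.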
Page: 21 of 38
Parameters are important!
Good parameters will build a good model.
How to get the ‘good’ parameters?
Features are important!
Model is also important! (The figure shows a “stupid” separating line.)
Page: 22 of 38
Example
Train set:
C=2, g=100: positive 83%, negative 85%
C=50, g=100: positive 86%, negative 91%
ROC?
Page: 24 of 38
Parameter Selection
Grid search
Particle swarm optimization
Other algorithms
Manual trial…
Random? My God!!

Now, our work:
Type: classification.
Goal: find the best (C, G).
Page: 25 of 38
Grid Search
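Grid search over (C, gamma) is usually done on exponential grids; a sketch in Python, where cv_accuracy stands in for the real cross-validation call (e.g. svm-train -v), and fake_cv is a hypothetical score surface just to exercise the loop:

```python
def grid_search(cv_accuracy, c_exps=range(-5, 16, 2), g_exps=range(-15, 4, 2)):
    """Try every (C, gamma) = (2^i, 2^j) pair and keep the best score."""
    best = (None, None, -1.0)
    for i in c_exps:
        for j in g_exps:
            acc = cv_accuracy(2.0 ** i, 2.0 ** j)
            if acc > best[2]:
                best = (2.0 ** i, 2.0 ** j, acc)
    return best

# Hypothetical cross-validation score, peaked at C=2, gamma=0.5:
def fake_cv(c, g):
    return 1.0 / (1.0 + abs(c - 2.0) + abs(g - 0.5))

C, G, acc = grid_search(fake_cv)
print(C, G)  # 2.0 0.5
```

The default exponent ranges above follow the coarse grid suggested in the libSVM guide; since every (C, gamma) cell is independent, this loop parallelizes trivially, which is exactly what grid.py and the Hadoop version on the next slide exploit.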
Page: 26 of 38
Parallel Grid Search
SSH command + grid.py
Hadoop-based: train the SVM model with MapReduce
Page: 27 of 38
Particle Swarm Optimization (PSO)
Demo
Page: 29 of 38
Similar Algorithms
Hill-climbing algorithm
Genetic algorithm
Ant colony optimization
Simulated annealing algorithm
Page: 30 of 38
Let’s get back to PSO.
Paper: Development of Particle Swarm Optimization Algorithm
Page: 31 of 38
Particle Swarm Optimization
Analogy: birds hunting for food.
(Figure: the C-G plane, showing the distance from a particle to (Cbest, Gbest).)
Page: 32 of 38
PSO and Parameter Selection
PSO: find a point (C, G) that makes the distance between (C, G) and (Cbest, Gbest) shortest.
Parameter selection: find a pair (C, G) that makes the error rate lowest.
Estimate function
Page: 33 of 38
Position of particle i: X_i = (x_i1, x_i2, ..., x_iN)^T
Speed: V_i = (v_i1, v_i2, ..., v_iN)^T
Particle i best: P_i = (p_i1, p_i2, ..., p_iN)^T
Global best: P_g = (p_g1, p_g2, ..., p_gN)^T

Update rules:
Update speed: v_id^(k+1) = w * v_id^k + c1 * rand() * (p_id - x_id^k) + c2 * rand() * (p_gd - x_id^k)
Update position: x_id^(k+1) = x_id^k + v_id^(k+1)
Update weight: w = w_max - (w_max - w_min) / iter_max * iter

(Figure: vectors V^k, X^k, X_pBest, and X_gBest combining into X^(k+1) and V^(k+1).)
Page: 34 of 38
Stop criterion:
•Max iterations (20)
•Threshold (0.03)
•Max dead-stop times (10)

Algorithm const variables:
•Dimension (M = 2)
•Number of particles (N = 20-50)
•Space scope (0 < X[i] < 1024, 0 < i < M)
•Max speed: v_d^max = k * x_d^max, k = 0.1-0.2
•Speedup factors: c1 = c2 = 2
Page: 35 of 38
Flowchart of the PSO parameter search:
1. Begin: initialize the swarm, let k = 0, i = 0.
2. Calculate the score for particle i.
3. If Score(i) > Pi, set Pi = Score(i) and update the global best Pg.
4. Set i = i + 1. If i < N, go to step 2.
5. Update Wi, update Xi+1 and Vi+1, and set k = k + 1.
6. If the stop criteria are satisfied, output Pg and end; otherwise go to step 2.
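The PSO flow above can be sketched in Python using the slides' update rules; this is a generic maximizer, with a toy objective standing in for the SVM cross-validation score (all names and the toy objective are illustrative):

```python
import random

def pso(score, dim=2, n_particles=20, max_iter=20,
        lo=0.0, hi=1024.0, w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
    """Maximize score(x) with the speed/position/weight updates from the slides."""
    random.seed(0)
    v_max = 0.2 * (hi - lo)                          # clamp speed to k * range
    xs = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                      # each particle's best position
    p_score = [score(x) for x in xs]
    g = max(range(n_particles), key=lambda i: p_score[i])
    g_best, g_score = p_best[g][:], p_score[g]       # global best

    for k in range(max_iter):
        w = w_max - (w_max - w_min) * k / max_iter   # update weight
        for i in range(n_particles):
            for d in range(dim):
                # v = w*v + c1*rand()*(pBest - x) + c2*rand()*(gBest - x)
                vs[i][d] = (w * vs[i][d]
                            + c1 * random.random() * (p_best[i][d] - xs[i][d])
                            + c2 * random.random() * (g_best[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, vs[i][d]))
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            s = score(xs[i])
            if s > p_score[i]:
                p_best[i], p_score[i] = xs[i][:], s
                if s > g_score:
                    g_best, g_score = xs[i][:], s
    return g_best, g_score

# Toy objective: higher when (C, G) is closer to a hypothetical optimum (700, 300).
best, sc = pso(lambda x: -abs(x[0] - 700.0) - abs(x[1] - 300.0))
print(best)  # the best (C, G) found by the swarm
```

For parameter selection, score(x) would instead run cross-validation with C = x[0] and gamma = x[1] and return the accuracy, making each evaluation as costly as one grid-search cell.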
Page: 36 of 38
Example
There is a problem.
Page: 37 of 38
Discussion
Page: 38 of 38
Thank you for your attention!