http://datamining.xmu.edu.cn Page: 1 of 38
Support Vector Machine
李旭斌 (LI Xubin) [email protected]
@Data Mining Lab, 6/19/2012
Page: 2 of 38
No theory, just use.
Theory is so complicated…
Structural risk minimization
VC dimension
Hyperplane
Maximum margin classifier
Kernel function
…
Paper: What is a support vector machine?
Page: 3 of 38
What can it do?
Main usage:
Classification: C-SVC, nu-SVC
Regression: epsilon-SVR, nu-SVR
Distribution estimation: one-class SVM
Other: clustering
Page: 4 of 38
But we have many software packages with friendly interfaces.
Page: 5 of 38
Who implements SVM?
libSVM: Java, C, R, MATLAB, Python, Perl, C#... CUDA! Hadoop (Mahout)!
WEKA / Weka-Parallel
MATLAB SVM Toolbox
Spider
SVM in R
GPU-accelerated LIBSVM
Page: 6 of 38
Examples for Machine Learning Algorithms
Page: 7 of 38
Classification: SVM
Page: 8 of 38
Regression: SVR
Page: 9 of 38
Clustering: K-means
Screenshots from MLDemos.
Page: 10 of 38
Let’s get back to libSVM.
Page: 11 of 38
Format of input
The format of a training and testing data file is:
<label> <index1>:<value1> <index2>:<value2> ...
Each line contains an instance and is ended by a '\n' character. For classification, <label> is an integer indicating the class label (multi-class is supported). For regression, <label> is the target value, which can be any real number. For one-class SVM, it is not used, so it can be any number. The pair <index>:<value> gives a feature (attribute) value: <index> is an integer starting from 1 and <value> is a real number.
Example:
1 0:1 1:4 2:6 3:1
1 0:2 1:6 2:8 3:0
0 0:3 1:1 2:0 3:1
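A minimal sketch of reading this sparse format in plain Python (the parse function is illustrative, not part of libSVM):

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, features)."""
    parts = line.strip().split()
    label = float(parts[0])
    # Each remaining token is "<index>:<value>"; store them sparsely in a dict.
    features = {}
    for token in parts[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features

# One of the example lines from the slide:
label, feats = parse_libsvm_line("1 0:1 1:4 2:6 3:1")
print(label, feats)  # 1.0 {0: 1.0, 1: 4.0, 2: 6.0, 3: 1.0}
```

The dict-per-instance layout mirrors the sparse intent of the format: absent indices are simply missing keys.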
Page: 12 of 38
Parameters
Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
    0 -- C-SVC
    1 -- nu-SVC
    2 -- one-class SVM
    3 -- epsilon-SVR
    4 -- nu-SVR
-t kernel_type : set type of kernel function (default 2)
    0 -- linear: u'*v
    1 -- polynomial: (gamma*u'*v + coef0)^degree
    2 -- radial basis function: exp(-gamma*|u-v|^2)
    3 -- sigmoid: tanh(gamma*u'*v + coef0)
    4 -- precomputed kernel (kernel values in training_set_file)
Attention: the parameters appear in the kernel formulas.
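The four built-in kernels can be written down directly from the formulas above; here is a sketch in plain Python (the helper functions are mine, not libSVM code):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    return dot(u, v)                                  # u'*v

def polynomial(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree      # (gamma*u'*v + coef0)^degree

def rbf(u, v, gamma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(u, v))      # |u-v|^2
    return math.exp(-gamma * sq)                      # exp(-gamma*|u-v|^2)

def sigmoid(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)       # tanh(gamma*u'*v + coef0)

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear(u, v))   # 11.0
print(rbf(u, u))      # 1.0 (a point has zero distance to itself)
```

Note how gamma, coef0, and degree each appear in only some kernels, which is why the -g, -r, and -d options only matter for the corresponding -t choices.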
Page: 13 of 38
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n : n-fold cross validation mode
-q : quiet mode (no outputs)
Page: 14 of 38
nu-SVC & C-SVC
“Basically they are the same thing but with different parameters. The range of C is from zero to infinity but nu is always between [0,1]. A nice property of nu is that it is related to the ratio of support vectors and the ratio of the training error. ”
Page: 15 of 38
one-class SVM
Fault diagnosis / anomaly detection:
The train set is always made up of normal instances. Label: 1 (no -1).
The test set contains instances of unknown status.
Output label: 1 or -1.
1: normal
-1: anomalous
Page: 16 of 38
epsilon-SVR & nu-SVR
Paper:
LIBSVM: A Library for Support Vector Machines
Page: 17 of 38
Comparison of epsilon and nu (figure from the paper).
Page: 18 of 38
Related experience
Usage and grid search
Code Analysis
Chinese version of libSVM FAQ
Page: 19 of 38
libSVM Guide
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Page: 20 of 38
Flowchart of task:
train → svm-scale → train.scale → svm-train → model
test → svm-scale → test.scale → svm-predict → result
The train set and the test set should both be scaled. But before that, do you really need to scale them?
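What svm-scale does can be sketched as min-max scaling; a toy Python version (the target range [-1, 1] matches svm-scale's default, the function names are mine):

```python
def fit_scale_params(column, lower=-1.0, upper=1.0):
    """Learn the min/max of one feature column on the TRAIN set only."""
    return min(column), max(column), lower, upper

def apply_scale(x, params):
    """Map x into [lower, upper] using the train-set min/max."""
    fmin, fmax, lower, upper = params
    if fmax == fmin:          # constant feature: nothing to scale
        return lower
    return lower + (upper - lower) * (x - fmin) / (fmax - fmin)

train_col = [1.0, 2.0, 6.0]
params = fit_scale_params(train_col)         # fit on the train set...
scaled = [apply_scale(x, params) for x in train_col]
test_scaled = apply_scale(8.0, params)       # ...then reuse on the test set
print(scaled)        # [-1.0, -0.6, 1.0]
print(test_scaled)   # 1.8 (test values may fall outside the range)
```

The important point the slide makes is the split: the ranges are fitted on the train set and the same parameters are applied to the test set, never refitted.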
Page: 21 of 38
Parameters are important!
Good parameters will build a good model.
How to get the ‘good’ parameters?
Features are important!
Model is also important! (The figure shows a “stupid” separating line.)
Page: 22 of 38
Example
Train set:
C=2, g=100: positive 83%, negative 85%
C=50, g=100: positive 86%, negative 91%
ROC?
Page: 24 of 38
Parameter Selection
Grid search
Particle swarm optimization
Other algorithms
Manual trial…
Random? My God!!

Now, our work:
Type: classification.
Goal: find the best (C, G).
Page: 25 of 38
Grid Search
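Grid search over (C, gamma) is usually done on exponential grids; a sketch in Python, where cv_accuracy stands in for the real cross-validation call (e.g. svm-train -v), and fake_cv is a hypothetical score surface just to exercise the loop:

```python
def grid_search(cv_accuracy, c_exps=range(-5, 16, 2), g_exps=range(-15, 4, 2)):
    """Try every (C, gamma) = (2^i, 2^j) pair and keep the best score."""
    best = (None, None, -1.0)
    for i in c_exps:
        for j in g_exps:
            acc = cv_accuracy(2.0 ** i, 2.0 ** j)
            if acc > best[2]:
                best = (2.0 ** i, 2.0 ** j, acc)
    return best

# Hypothetical cross-validation score, peaked at C=2, gamma=0.5:
def fake_cv(c, g):
    return 1.0 / (1.0 + abs(c - 2.0) + abs(g - 0.5))

C, G, acc = grid_search(fake_cv)
print(C, G)  # 2.0 0.5
```

The default exponent ranges above follow the coarse grid suggested in the libSVM guide; since every (C, gamma) cell is independent, this loop parallelizes trivially, which is exactly what grid.py and the Hadoop version on the next slide exploit.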
Page: 26 of 38
Parallel Grid Search
SSH command + grid.py
Hadoop-based: train the SVM model with MapReduce
Page: 27 of 38
Particle Swarm Optimization (PSO)
Demo
Page: 29 of 38
Similar Algorithms
Hill-climbing algorithm
Genetic algorithm
Ant colony optimization
Simulated annealing algorithm
Page: 30 of 38
Let’s get back to PSO.
Paper: Development of Particle Swarm Optimization Algorithm
Page: 31 of 38
Particle Swarm Optimization
Analogy: birds hunting for food.
(Figure: the C-G plane, showing the distance from a particle to (Cbest, Gbest).)
Page: 32 of 38
PSO and Parameter Selection
PSO: find a point (C, G) that makes the distance between (C, G) and (Cbest, Gbest) shortest.
Parameter selection: find a pair (C, G) that makes the error rate lowest.
Estimate function
Page: 33 of 38
Position of particle i: X_i = (x_i1, x_i2, ..., x_iN)^T
Speed: V_i = (v_i1, v_i2, ..., v_iN)^T
Particle i best: P_i = (p_i1, p_i2, ..., p_iN)^T
Global best: P_g = (p_g1, p_g2, ..., p_gN)^T

Update rules:
Update speed: v_id^(k+1) = w * v_id^k + c1 * rand() * (p_id - x_id^k) + c2 * rand() * (p_gd - x_id^k)
Update position: x_id^(k+1) = x_id^k + v_id^(k+1)
Update weight: w = w_max - (w_max - w_min) / iter_max * iter

(Figure: vectors V^k, X^k, X_pBest, and X_gBest combining into X^(k+1) and V^(k+1).)
Page: 34 of 38
Stop criterion:
•Max iterations (20)
•Threshold (0.03)
•Max dead-stop times (10)

Algorithm const variables:
•Dimension (M = 2)
•Number of particles (N = 20-50)
•Space scope (0 < X[i] < 1024, 0 < i < M)
•Max speed: v_d^max = k * x_d^max, k = 0.1-0.2
•Speedup factors: c1 = c2 = 2
Page: 35 of 38
Flowchart of the PSO parameter search:
1. Begin: initialize the swarm, let k = 0, i = 0.
2. Calculate the score for particle i.
3. If Score(i) > Pi, set Pi = Score(i) and update the global best Pg.
4. Set i = i + 1. If i < N, go to step 2.
5. Update Wi, update Xi+1 and Vi+1, and set k = k + 1.
6. If the stop criteria are satisfied, output Pg and end; otherwise go to step 2.
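The PSO flow above can be sketched in Python using the slides' update rules; this is a generic maximizer, with a toy objective standing in for the SVM cross-validation score (all names and the toy objective are illustrative):

```python
import random

def pso(score, dim=2, n_particles=20, max_iter=20,
        lo=0.0, hi=1024.0, w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
    """Maximize score(x) with the speed/position/weight updates from the slides."""
    random.seed(0)
    v_max = 0.2 * (hi - lo)                          # clamp speed to k * range
    xs = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                      # each particle's best position
    p_score = [score(x) for x in xs]
    g = max(range(n_particles), key=lambda i: p_score[i])
    g_best, g_score = p_best[g][:], p_score[g]       # global best

    for k in range(max_iter):
        w = w_max - (w_max - w_min) * k / max_iter   # update weight
        for i in range(n_particles):
            for d in range(dim):
                # v = w*v + c1*rand()*(pBest - x) + c2*rand()*(gBest - x)
                vs[i][d] = (w * vs[i][d]
                            + c1 * random.random() * (p_best[i][d] - xs[i][d])
                            + c2 * random.random() * (g_best[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, vs[i][d]))
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            s = score(xs[i])
            if s > p_score[i]:
                p_best[i], p_score[i] = xs[i][:], s
                if s > g_score:
                    g_best, g_score = xs[i][:], s
    return g_best, g_score

# Toy objective: higher when (C, G) is closer to a hypothetical optimum (700, 300).
best, sc = pso(lambda x: -abs(x[0] - 700.0) - abs(x[1] - 300.0))
print(best)  # the best (C, G) found by the swarm
```

For parameter selection, score(x) would instead run cross-validation with C = x[0] and gamma = x[1] and return the accuracy, making each evaluation as costly as one grid-search cell.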
Page: 36 of 38
Example
There is a problem.
Page: 37 of 38
Discussion
Page: 38 of 38
Thank you for your attention!