Towards Scalable Support Vector Machines Using Squashing
• Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth
• Information and Computer Science, University of California, Irvine
• Advisor: Dr. Hsu
• Reporter: Hung Ching-Wen
Outline
• 1. Motivation
• 2. Objective
• 3. Introduction
• 4. SVM
• 5. Squashing for SVM
• 6. Experiments
• 7. Conclusion
Motivation
• SVMs provide a classification model with a strong theoretical foundation and excellent empirical performance.
• Their major drawback, however, is the need to solve a large-scale quadratic programming (QP) problem during training.
Objective
• This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.
Introduction
• The applicability of SVMs to large datasets is limited because of their high computational cost.
• Speed-up training algorithms: chunking, Osuna's decomposition method, SMO.
• These can accelerate training, but they do not scale well with the size of the training data.
Introduction
• Reducing the computational cost :
• Sampling
• Boosting
• Squashing (DuMouchel et al., Madigan et al.)
• The authors propose squash-SMO to address the high computational cost of SVM training.
SVM
• Training data: D = {(xi, yi) : i = 1, …, N}, where each xi is a feature vector and yi ∈ {+1, −1}.
• A linear SVM uses the separating classifier y = sign(<w, x> + b), where
• w is the normal vector of the hyperplane
• b is its intercept
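As a minimal illustration of this decision rule (the weight vector and bias below are arbitrary, not learned):

```python
import numpy as np

def linear_svm_predict(w, b, X):
    """Classify each row of X with the linear rule sign(<w, x> + b)."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Arbitrary hyperplane x1 + x2 = 0, chosen for illustration only.
w = np.array([1.0, 1.0])
b = 0.0
X = np.array([[2.0, 1.0], [-1.0, -2.0]])
print(linear_svm_predict(w, b, X).tolist())  # [1, -1]
```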
SVM (non-separable case)
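The non-separable formulation appeared only as an image in the original slides; it is presumably the standard soft-margin QP with slack variables ξi:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\qquad \text{s.t.}\quad y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,N
```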
SVM (a prior on w)
Squashing for SVM
• (1). Select a probabilistic model P((X, Y) | θ).
• (2). The objective is to find the maximum-likelihood estimate θML.
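One common probabilistic reading of the SVM loss (the paper's exact model may differ in normalization) takes the likelihood of a label to decay exponentially with the hinge loss:

```latex
P\bigl(y \mid x, w, b\bigr) \;\propto\; \exp\Bigl(-C\,\max\bigl(0,\; 1 - y(\langle w, x\rangle + b)\bigr)\Bigr)
```

so that maximizing the log-likelihood over D is equivalent to minimizing the total hinge loss.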
Squashing for SVM
• (3). The training data D = {(xi, yi) : i = 1, …, N} are grouped into Nc clusters.
• (Xc, Yc)sq: the squashed data point placed at cluster c
• βc: the weight assigned to cluster c
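A rough numpy-only sketch of the idea: cluster each class, keep one weighted representative per cluster, and train a weighted soft-margin SVM on the Nc squashed points by subgradient descent. The plain k-means clustering and the optimizer below are simplifications for illustration, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: N original points, to be squashed into 10 weighted points.
N = 1000
X = np.vstack([rng.normal(-2, 1, (N // 2, 2)), rng.normal(2, 1, (N // 2, 2))])
y = np.repeat([-1.0, 1.0], N // 2)

def squash(Xc, k, iters=20):
    """Plain k-means: return k cluster centers and their sizes (the weights)."""
    centers = Xc[rng.choice(len(Xc), k, replace=False)]
    for _ in range(iters):
        d = ((Xc[:, None, :] - centers[None]) ** 2).sum(-1)
        lbl = d.argmin(1)
        for j in range(k):
            if (lbl == j).any():
                centers[j] = Xc[lbl == j].mean(0)
    return centers, np.bincount(lbl, minlength=k)

Xsq, ysq, beta = [], [], []
for label in (-1.0, 1.0):
    c, w_ = squash(X[y == label], 5)
    Xsq.append(c); ysq.append(np.full(5, label)); beta.append(w_)
Xsq, ysq, beta = np.vstack(Xsq), np.concatenate(ysq), np.concatenate(beta)

# Weighted soft-margin SVM on the 10 squashed points: each slack term
# enters the subgradient multiplied by its cluster weight beta_c.
w, b, C = np.zeros(2), 0.0, 1.0
for t in range(1, 2001):
    margins = ysq * (Xsq @ w + b)
    viol = margins < 1
    grad_w = w - C * ((beta * ysq)[viol] @ Xsq[viol])
    grad_b = -C * (beta * ysq)[viol].sum()
    lr = 1.0 / t
    w -= lr * grad_w; b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(round(acc, 2))  # training accuracy on all N original points; close to 1.0 here
```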
Squashing for SVM
• Take the prior on w to be P(w) ∝ exp(−||w||²).
Squashing for SVM
• (4). The optimization model for the squashed data:
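The optimization model appeared only as an image in the original slides; plausibly it is the soft-margin QP with one slack variable per squashed point, weighted by βc:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{c=1}^{N_c}\beta_c\,\xi_c
\qquad \text{s.t.}\quad y_c\bigl(\langle w, x_c\rangle + b\bigr) \ge 1 - \xi_c,\;\; \xi_c \ge 0
```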
Squashing for SVM
• Important design issues for the squashing algorithm:
• (1). the choice of the number and location of the squashing points
• (2). sampling values of w from the prior p(w)
• (3). b can be obtained from the optimization model
• (4). with w and b fixed, evaluate the likelihood of each training point; repeat the selection procedure L times (L is the number of draws)
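Steps (2) and (4) can be sketched as follows, assuming a Gaussian prior for w and the hinge-loss likelihood; the grouping rule here (bucketing points by which draws violate their margin) is a crude stand-in for the paper's actual clustering of likelihood profiles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and settings: N points, L sampled (w, b) pairs from the prior.
N, L, d = 200, 8, 2
X = rng.normal(0, 1, (N, d))
y = rng.choice([-1.0, 1.0], N)

# (2) Sample w from the prior p(w) ~ exp(-||w||^2), i.e. a Gaussian; b uniform.
W = rng.normal(0, 1.0, (L, d))
B = rng.uniform(-1, 1, L)

# (4) For each fixed (w, b), evaluate each point's log-likelihood
#     (hinge-loss reading: log P(y|x) = -max(0, 1 - y(<w,x>+b)) up to a constant).
margins = y[:, None] * (X @ W.T + B)         # shape (N, L)
loglik = -np.maximum(0.0, 1.0 - margins)     # likelihood profile of each point

# Points with similar profiles across the L draws can share one squashed
# representative; here we simply bucket by the binary violation pattern.
profile_keys = [tuple(row) for row in (loglik < 0).astype(int)]
n_groups = len(set(profile_keys))
print(n_groups)  # number of distinct likelihood-profile buckets (at most N)
```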
EXPERIMENTS
• Experimental datasets:
• Synthetic data
• UCI Machine Learning repository
• UCI KDD repository
EXPERIMENTS
• Evaluated methods:
• full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO
• Results averaged over 100 runs
• Performance measures:
• misclassification rate, learning time, and memory requirements
EXPERIMENTS (Results on Synthetic data)
• (Wf, bf): parameters estimated by full-SMO
• (Ws, bs): parameters estimated from squashed or sampled data
EXPERIMENTS (Results on Benchmark data)
• (Results were shown as figures in the original slides.)
Conclusion
• 1. We describe how squashing makes SVM training applicable to large datasets.
• 2. Comparisons with full-SMO show that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements.
• 3. srs-SMO has a higher misclassification rate.
• 4. squash-SMO and boost-SMO allow parameter tuning by cross-validation, which is infeasible for full-SMO.
Conclusion
• 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems.
• 6. However, squash-SMO offers better model interpretability and can be expected to run faster on datasets that do not fit in main memory.
Opinion
• It is a good idea that the authors describe how squashing makes SVM training applicable to large datasets.
• The prior distribution of w could also be chosen according to the nature of the data: for example, an exponential distribution, a log-normal distribution, or a nonparametric approach.