Towards Scalable Support Vector Machines Using Squashing
• Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth
• Information and Computer Science, University of California, Irvine
• Advisor: Dr. Hsu
• Reporter: Hung Ching-Wen
Outline
• 1. Motivation
• 2. Objective
• 3. Introduction
• 4. SVM
• 5. Squashing for SVM
• 6. Experiments
• 7. Conclusion
Motivation
• SVMs provide a classification model with a strong theoretical foundation and excellent empirical performance.
• Their major drawback, however, is the need to solve a large-scale quadratic programming (QP) problem during training.
Objective
• This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.
Introduction
• The applicability of SVMs to large datasets is limited because of their high computational cost.
• Speed-up training algorithms: chunking, Osuna's decomposition method, SMO.
• These can accelerate training, but they do not scale well with the size of the training data.
Introduction
• Reducing the computational cost :
• Sampling
• Boosting
• Squashing (DuMouchel et al., Madigan et al.)
• The authors propose squash-SMO to address the high computational cost of SVM training.
SVM
• Training data: D = {(xi, yi) : i = 1, …, N}, where each xi is a feature vector and yi ∈ {+1, −1}.
• A linear SVM uses the separating classifier y = sign(<w, x> + b), where
• w is the normal vector of the hyperplane
• b is its intercept
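As a minimal illustration of this decision rule (the weight vector and bias below are arbitrary, not learned):

```python
import numpy as np

def linear_svm_predict(w, b, X):
    """Classify each row of X with the linear rule sign(<w, x> + b)."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Arbitrary hyperplane x1 + x2 = 0, chosen for illustration only.
w = np.array([1.0, 1.0])
b = 0.0
X = np.array([[2.0, 1.0], [-1.0, -2.0]])
print(linear_svm_predict(w, b, X).tolist())  # [1, -1]
```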
SVM (non-separable case)
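The non-separable formulation appeared only as an image in the original slides; it is presumably the standard soft-margin QP with slack variables ξi:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\qquad \text{s.t.}\quad y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,N
```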
SVM (a prior on w)
Squashing for SVM
• (1). Select a probabilistic model P((X, Y) | θ).
• (2). The objective is to find the maximum-likelihood estimate θML.
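One common probabilistic reading of the SVM loss (the paper's exact model may differ in normalization) takes the likelihood of a label to decay exponentially with the hinge loss:

```latex
P\bigl(y \mid x, w, b\bigr) \;\propto\; \exp\Bigl(-C\,\max\bigl(0,\; 1 - y(\langle w, x\rangle + b)\bigr)\Bigr)
```

so that maximizing the log-likelihood over D is equivalent to minimizing the total hinge loss.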
Squashing for SVM
• (3). The training data D = {(xi, yi) : i = 1, …, N} are grouped into Nc clusters.
• (Xc, Yc)sq: the squashed data point placed at cluster c
• βc: the weight assigned to cluster c
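A rough numpy-only sketch of the idea: cluster each class, keep one weighted representative per cluster, and train a weighted soft-margin SVM on the Nc squashed points by subgradient descent. The plain k-means clustering and the optimizer below are simplifications for illustration, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: N original points, to be squashed into 10 weighted points.
N = 1000
X = np.vstack([rng.normal(-2, 1, (N // 2, 2)), rng.normal(2, 1, (N // 2, 2))])
y = np.repeat([-1.0, 1.0], N // 2)

def squash(Xc, k, iters=20):
    """Plain k-means: return k cluster centers and their sizes (the weights)."""
    centers = Xc[rng.choice(len(Xc), k, replace=False)]
    for _ in range(iters):
        d = ((Xc[:, None, :] - centers[None]) ** 2).sum(-1)
        lbl = d.argmin(1)
        for j in range(k):
            if (lbl == j).any():
                centers[j] = Xc[lbl == j].mean(0)
    return centers, np.bincount(lbl, minlength=k)

Xsq, ysq, beta = [], [], []
for label in (-1.0, 1.0):
    c, w_ = squash(X[y == label], 5)
    Xsq.append(c); ysq.append(np.full(5, label)); beta.append(w_)
Xsq, ysq, beta = np.vstack(Xsq), np.concatenate(ysq), np.concatenate(beta)

# Weighted soft-margin SVM on the 10 squashed points: each slack term
# enters the subgradient multiplied by its cluster weight beta_c.
w, b, C = np.zeros(2), 0.0, 1.0
for t in range(1, 2001):
    margins = ysq * (Xsq @ w + b)
    viol = margins < 1
    grad_w = w - C * ((beta * ysq)[viol] @ Xsq[viol])
    grad_b = -C * (beta * ysq)[viol].sum()
    lr = 1.0 / t
    w -= lr * grad_w; b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(round(acc, 2))  # training accuracy on all N original points; close to 1.0 here
```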
Squashing for SVM
• Take the prior on w to be P(w) ∝ exp(−||w||²).
Squashing for SVM
• (4). The optimization model for the squashed data:
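The optimization model appeared only as an image in the original slides; plausibly it is the soft-margin QP with one slack variable per squashed point, weighted by βc:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{c=1}^{N_c}\beta_c\,\xi_c
\qquad \text{s.t.}\quad y_c\bigl(\langle w, x_c\rangle + b\bigr) \ge 1 - \xi_c,\;\; \xi_c \ge 0
```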
Squashing for SVM
• Important design issues for the squashing algorithm:
• (1). the choice of the number and location of the squashing points
• (2). sampling values of w from the prior p(w)
• (3). b can be obtained from the optimization model
• (4). with w and b fixed, evaluate the likelihood of each training point; repeat the selection procedure L times (L is the number of draws)
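Steps (2) and (4) can be sketched as follows, assuming a Gaussian prior for w and the hinge-loss likelihood; the grouping rule here (bucketing points by which draws violate their margin) is a crude stand-in for the paper's actual clustering of likelihood profiles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and settings: N points, L sampled (w, b) pairs from the prior.
N, L, d = 200, 8, 2
X = rng.normal(0, 1, (N, d))
y = rng.choice([-1.0, 1.0], N)

# (2) Sample w from the prior p(w) ~ exp(-||w||^2), i.e. a Gaussian; b uniform.
W = rng.normal(0, 1.0, (L, d))
B = rng.uniform(-1, 1, L)

# (4) For each fixed (w, b), evaluate each point's log-likelihood
#     (hinge-loss reading: log P(y|x) = -max(0, 1 - y(<w,x>+b)) up to a constant).
margins = y[:, None] * (X @ W.T + B)         # shape (N, L)
loglik = -np.maximum(0.0, 1.0 - margins)     # likelihood profile of each point

# Points with similar profiles across the L draws can share one squashed
# representative; here we simply bucket by the binary violation pattern.
profile_keys = [tuple(row) for row in (loglik < 0).astype(int)]
n_groups = len(set(profile_keys))
print(n_groups)  # number of distinct likelihood-profile buckets (at most N)
```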
EXPERIMENTS
• Experimental datasets:
• Synthetic data
• UCI Machine Learning repository
• UCI KDD repository
EXPERIMENTS
• Evaluated methods:
• full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO
• Results averaged over 100 runs
• Performance measures:
• misclassification rate, learning time, and memory requirements
EXPERIMENTS (Results on Synthetic data)
• (Wf, bf): parameters estimated by full-SMO
• (Ws, bs): parameters estimated from squashed or sampled data
EXPERIMENTS (Results on Benchmark data)
• (Results were shown as figures in the original slides.)
Conclusion
• 1. We describe how squashing makes SVM training applicable to large datasets.
• 2. Comparisons with full-SMO show that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements.
• 3. srs-SMO has a higher misclassification rate.
• 4. squash-SMO and boost-SMO allow parameter tuning by cross-validation, which is infeasible for full-SMO.
Conclusion
• 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems.
• 6. However, squash-SMO offers better model interpretability and can be expected to run faster on datasets that do not fit in main memory.
Opinion
• It is a good idea that the authors describe how squashing makes SVM training applicable to large datasets.
• The prior distribution of w could also be chosen according to the nature of the data: for example, an exponential distribution, a log-normal distribution, or a nonparametric approach.