Max-Margin Additive Classifiers for Detection
Subhransu Maji & Alexander Berg
University of California at Berkeley, Columbia University
ICCV 2009, Kyoto, Japan
Accuracy vs. Evaluation Time for SVM Classifiers
[Plot: evaluation time vs. accuracy for linear kernel, additive kernel, and non-linear kernel SVMs; annotation "Our CVPR 08" on the additive kernel]
• Our CVPR 08: made it possible to use SVMs with additive kernels for detection.
Additive Classifiers
• Much work already uses them!
– SVMs with additive kernels are additive classifiers
• Histogram based kernels
– Histogram intersection, chi-squared kernel
– Pyramid Match Kernel (Grauman & Darrell, ICCV’05)
– Spatial Pyramid Match Kernel (Lazebnik et al., CVPR’06)
– …
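As a minimal sketch of what "additive" means for the kernels listed above: each one can be written as a sum of a per-dimension function, K(x, y) = sum_i k_i(x_i, y_i). The function names are illustrative and not taken from the authors' code, and the chi-squared form used here (2ab/(a+b) per bin) is one common variant, assumed for the example.

```python
def intersection_kernel(x, y):
    """Histogram intersection (min) kernel: sum of per-bin minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def chi_squared_kernel(x, y):
    """One common additive chi-squared kernel: sum of 2ab/(a+b) per bin.

    Bins where both histograms are empty contribute nothing.
    """
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


# Two toy 4-bin histograms (made-up values for illustration).
h1 = [0.5, 0.25, 0.25, 0.0]
h2 = [0.25, 0.25, 0.0, 0.5]

k_int = intersection_kernel(h1, h2)
k_chi = chi_squared_kernel(h1, h2)
```

Because each term depends on a single dimension, a classifier built from such a kernel decomposes into a sum of one-dimensional functions.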
Accuracy vs. Training Time for SVM Classifiers
[Plot: training time vs. accuracy for linear, additive, and non-linear SVMs]
• Linear: 1990s and earlier vs. today (e.g. cutting plane, stochastic gradient descent, dual coordinate descent)
• Our CVPR 08: ✗ for training additive classifiers
• This paper: makes it possible to train additive classifiers very fast.
Summary
• Additive classifiers are widely used and can provide better accuracy than linear
• Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim) -- same as linear.
• This work: additive classifiers can be trained directly as efficiently (up to a small constant) as the best approaches for training linear classifiers.
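The O(#dim) evaluation claim can be illustrated with a small sketch. For a min-kernel SVM, the decision function splits into one function per dimension, h_i(t) = sum_j c_j min(t, s_ji), which is piecewise linear in t and can be tabulated once and interpolated at test time. The code below is an assumption-laden illustration (uniform grid, inputs inside the grid range, hypothetical names such as `build_tables`), not the authors' implementation.

```python
def build_tables(support_vectors, coeffs, grid):
    """Tabulate h_i(t) = sum_j coeffs[j] * min(t, sv_j[i]) at each grid point."""
    n_dims = len(support_vectors[0])
    return [
        [sum(c * min(t, sv[i]) for c, sv in zip(coeffs, support_vectors))
         for t in grid]
        for i in range(n_dims)
    ]


def evaluate_fast(x, tables, grid, bias=0.0):
    """O(#dim) evaluation: one lookup plus linear interpolation per dimension.

    Assumes a uniform grid and every x[i] within [grid[0], grid[-1]].
    """
    step = grid[1] - grid[0]
    total = bias
    for xi, table in zip(x, tables):
        pos = (xi - grid[0]) / step
        k = min(int(pos), len(grid) - 2)
        frac = pos - k
        total += (1.0 - frac) * table[k] + frac * table[k + 1]
    return total


def evaluate_exact(x, support_vectors, coeffs, bias=0.0):
    """Direct min-kernel evaluation: O(#support vectors * #dim)."""
    return bias + sum(
        c * sum(min(a, b) for a, b in zip(x, sv))
        for c, sv in zip(coeffs, support_vectors)
    )


# Toy model: coefficients play the role of alpha_j * y_j (made-up values).
grid = [k / 4 for k in range(5)]
svs = [[0.25, 0.75], [0.5, 0.5], [1.0, 0.0]]
coeffs = [0.7, -0.4, 0.2]
tables = build_tables(svs, coeffs, grid)

x = [0.6, 0.3]
fast = evaluate_fast(x, tables, grid)
exact = evaluate_exact(x, svs, coeffs)
```

In this toy case the support vector values lie on the grid, so interpolation reproduces the exact score; with arbitrary values the table is an approximation whose error shrinks as the grid gets finer.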
An example
            Additive Kernel SVM      Our Additive Classifier   Linear SVM
Time        Train 1000, Test 1000    Train 10, Test 1          Train 10, Test 1
Accuracy    95 %                     94 %                      82 %
Support Vector Machines
Kernel Function
• Inner Product in the embedded space
• Can learn non-linear boundaries in input space
Classification Function
Kernel Trick
Input Space Embedded Space
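The slide's equations did not survive in this transcript, so the following is only a sketch of the standard kernel SVM classification function, f(x) = sum_j alpha_j y_j K(x, x_j) + b, with the kernel standing in for an inner product in the embedded space. All names and numbers are illustrative assumptions.

```python
def min_kernel(x, y):
    """Additive min (intersection) kernel."""
    return sum(min(a, b) for a, b in zip(x, y))


def decision_function(x, support_vectors, alphas, labels, bias, kernel):
    """Standard kernel SVM score: bias + sum_j alpha_j * y_j * K(x, sv_j)."""
    return bias + sum(
        a * y * kernel(x, sv)
        for a, y, sv in zip(alphas, labels, support_vectors)
    )


# Toy model with two support vectors (made-up values).
svs = [[1.0, 0.0], [0.0, 1.0]]
alphas = [0.5, 0.5]
labels = [1, -1]

score = decision_function([0.8, 0.2], svs, alphas, labels, 0.0, min_kernel)
predicted = 1 if score > 0 else -1
```

Swapping `min_kernel` for a plain dot product gives a linear classifier; the kernel is the only place where the embedding enters.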
Embeddings…
• These embeddings can be high dimensional (even infinite)
• Our approach is based on embeddings that approximate kernels.
• We’d like this to be as accurate as possible
• We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important.
Key Idea: Embedding an Additive Kernel
• Additive Kernels are easy to embed, just embed each dimension independently
• Linear Embedding for min Kernel for integers
• For non-integers, can approximate by quantizing
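A minimal sketch of the idea, assuming the usual unary construction (the slide's own formula is not preserved here): an integer k is embedded as k ones followed by zeros, so the dot product of two embeddings equals min of the two integers. Non-integer values in [0, 1] are rounded to a grid of `n` levels first. Function names are hypothetical.

```python
def unary_embed(k, n):
    """Embed integer k in {0, ..., n} as k ones followed by n - k zeros."""
    return [1.0] * k + [0.0] * (n - k)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quantized_embed(x, n):
    """Approximate embedding for x in [0, 1]: round x * n to an integer.

    Scaled by 1/sqrt(n) so that dot products approximate min(x, y).
    """
    k = int(round(x * n))
    scale = n ** -0.5
    return [scale * v for v in unary_embed(k, n)]


def embed_vector(xs, n):
    """Additive kernel: embed each dimension independently and concatenate."""
    out = []
    for x in xs:
        out.extend(quantized_embed(x, n))
    return out


exact_int = dot(unary_embed(3, 5), unary_embed(4, 5))            # min(3, 4)
approx = dot(quantized_embed(0.32, 10), quantized_embed(0.71, 10))
approx_vec = dot(embed_vector([0.32, 0.5], 10), embed_vector([0.71, 0.2], 10))
```

With 10 levels, `approx` comes out as 0.3 where the true min is 0.32, which is the kind of quantization error the next slide refers to.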
Issues: Embedding Error
• Quantization leads to large errors
• Better encoding
Issues: Sparsity
• Represent with sparse values
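The exact encodings on these slides are not preserved in the transcript, so the sketch below rests on two assumptions: the "better encoding" keeps a fractional entry instead of rounding, and the sparse form stores only two interpolation weights on neighbouring bin edges. Under those assumptions the two give the same score once the weights are accumulated.

```python
def dense_encode(x, n):
    """Unary-style encoding of x in [0, 1]: ones, then a fraction, then zeros."""
    s = x * n
    return [min(max(s - j, 0.0), 1.0) for j in range(n)]


def sparse_encode(x, n):
    """At most two non-zeros: interpolation weights on the two nearest bin edges.

    Returned as {edge_index: value} over n + 1 bin edges.
    """
    s = x * n
    k = min(int(s), n - 1)
    frac = s - k
    return {k: 1.0 - frac, k + 1: frac}


def cumulative(w):
    """Edge values g_k = w_0 + ... + w_{k-1} for a dense weight vector w."""
    g = [0.0]
    for wj in w:
        g.append(g[-1] + wj)
    return g


n = 4
w = [0.5, -1.0, 2.0, 0.25]   # made-up dense weights
x = 0.6

dense_score = sum(wj * fj for wj, fj in zip(w, dense_encode(x, n)))
g = cumulative(w)
sparse = sparse_encode(x, n)
sparse_score = sum(g[idx] * val for idx, val in sparse.items())
```

The sparse form touches two entries per dimension regardless of the number of bins, which matters when the encoded vectors are fed to a linear solver.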
Linear vs. Encoded SVMs
• Linear SVM objective (solve with LIBLINEAR)
• Encoded SVM objective (not practical)
• Encoded SVM, modified (custom solver)
– Encourages smooth functions
– Closely approximates min kernel SVM
– Custom solver: PWLSGD (see paper)
• Encoded SVM objective (solve with LIBLINEAR)
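The objectives themselves are not preserved in this transcript. As a hedged illustration of what "encourages smooth functions" can mean, the sketch below contrasts a plain squared-norm (identity) penalty with a penalty on differences between weights of adjacent bins. The assumption that the modified objective uses a first-difference penalty of this kind should be checked against the paper.

```python
def identity_penalty(w):
    """Standard linear SVM regularizer: squared norm of the weights."""
    return sum(v * v for v in w)


def difference_penalty(w):
    """Smoothness regularizer (assumed form): squared first differences."""
    return sum((b - a) ** 2 for a, b in zip(w, w[1:]))


# Same values in a different order: identical norm, different smoothness.
smooth = [0.1, 0.2, 0.3, 0.4]
jagged = [0.4, 0.1, 0.3, 0.2]

norm_smooth, norm_jagged = identity_penalty(smooth), identity_penalty(jagged)
diff_smooth, diff_jagged = difference_penalty(smooth), difference_penalty(jagged)
```

The identity penalty cannot tell the two weight vectors apart, while the difference penalty prefers the one that varies gradually across neighbouring bins.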
Additive Classifier Choices
[Table: classifier choices by regularization and encoding; labels: linear, piecewise linear, IKSVM, I]
• Evaluation times are similar
• Few lines of code + standard solver, e.g. LIBLINEAR
• Standard solver, e.g. LIBSVM
• Custom solver
• Classifier notations
Experiments
• “Small” Scale: Caltech 101 (Fei-Fei et al.)
• “Medium” Scale: DC Pedestrians (Munder & Gavrila)
• “Large” Scale: INRIA Pedestrians (Dalal & Triggs)
Experiment: DC Pedestrians
• 20,000 features, 656 dimensional
• 100 bins for encoding
• 6-fold cross validation
[Plot: (time, accuracy) per classifier: (1.89s, 72.98%), (2.98s, 85.71%), (1.86s, 88.80%), (3.18s, 89.25%), (363s, 89.05%)]
• 100x faster
• training time ~ linear SVM
• accuracy ~ kernel SVM
Experiment: Caltech 101
• 30 training examples per category
• 100 bins for encoding
• Pyramid HOG + Spatial Pyramid Match Kernel
[Plot: (time, accuracy) per classifier: (41s, 46.15%), (2687s, 56.49%), (291s, 55.35%), (102s, 54.8%), (90s, 51.64%)]
• 10x faster
• Small loss in accuracy
Experiment: INRIA Pedestrians
• SPHOG: 39,000 features, 2268 dimensional
• 100 bins for encoding
• Cross validation plots
[Plot: (time, score) per classifier: (20s, 0.82), (27s, 0.88), (140 mins, 0.95), (76s, 0.94), (122s, 0.85)]
• 300x faster
• training time ~ linear SVM
• accuracy ~ kernel SVM
• trains the detector in < 2 mins
Take Home Messages
• Additive models are practical for large scale data
• Can be trained discriminatively:
– Poor man’s version: encode + linear SVM solver
– Middle man’s version: encode + custom solver
– Rich man’s version: min kernel SVM
• Embedding only approximates kernels, which leads to a small loss in accuracy but up to 100x speedup in training time
• Everyone should use: see code on our websites
– Fast IKSVM from CVPR’08, Encoded SVMs, etc.
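The "poor man's version" (encode, then run a linear solver) can be sketched on a toy one-dimensional problem. To stay self-contained, a plain perceptron stands in for a linear SVM solver such as LIBLINEAR; it is not a max-margin method, and the encoding, data, and function names are assumptions made for illustration.

```python
def encode(x, n):
    """Unary-style encoding of x in [0, 1] with a fractional entry, plus a bias."""
    s = x * n
    return [min(max(s - j, 0.0), 1.0) for j in range(n)] + [1.0]


def train_perceptron(features, labels, max_epochs=1000):
    """Plain perceptron, used here as a stand-in for a linear SVM solver."""
    w = [0.0] * len(features[0])
    for _ in range(max_epochs):
        mistakes = 0
        for f, y in zip(features, labels):
            if y * sum(wi * fi for wi, fi in zip(w, f)) <= 0:
                w = [wi + y * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def count_errors(w, features, labels):
    """Number of training points on the wrong side of (or on) the boundary."""
    return sum(
        1 for f, y in zip(features, labels)
        if y * sum(wi * fi for wi, fi in zip(w, f)) <= 0
    )


# Toy data: positives sit in the middle of [0, 1], so no single threshold works.
xs = [0.05 + 0.1 * i for i in range(10)]
ys = [1 if 0.3 < x < 0.7 else -1 for x in xs]

raw = [[x, 1.0] for x in xs]          # raw feature plus bias
enc = [encode(x, 10) for x in xs]     # encoded feature plus bias

errors_raw = count_errors(train_perceptron(raw, ys), raw, ys)
errors_enc = count_errors(train_perceptron(enc, ys), enc, ys)
```

On the raw feature the linear classifier cannot fit the middle band, while on the encoded feature the same linear training loop learns a piecewise linear function of x that separates the data.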
Thank You