Sparse Coding and Its Extensions for Visual Recognition
Kai Yu
Media Analytics DepartmentNEC Labs America, Cupertino, CA
Visual Recognition is HOT in Computer Vision
Caltech 101
PASCAL VOC
80 Million Tiny Images
ImageNet
The pipeline of machine visual perception
Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition
• Most critical for accuracy
• Accounts for most of the computation
• Most time-consuming in the development cycle
• Often hand-crafted in practice
Most Efforts in Machine Learning
Computer vision features
SIFT, Spin image, HoG, RIFT, GLOH
Slide credit: Andrew Ng
Learning everything from data
Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition
(Machine learning applied to the whole pipeline)
BoW + SPM Kernel
• Combining multiple features, this method was the state of the art on Caltech-101, PASCAL VOC, 15 Scene Categories, …
Figure credit: Fei-Fei Li, Svetlana Lazebnik
Bag-of-visual-words representation (BoW) based on vector quantization (VQ)
Spatial pyramid matching (SPM) kernel
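To make the SPM kernel concrete, here is a minimal sketch. It assumes three pyramid levels with the standard weights 1/4, 1/4, 1/2, and, for simplicity, treats each level as one concatenated histogram; the function names are hypothetical, not from the talk.

```python
def intersection_kernel(h1, h2):
    # Histogram intersection: sum of element-wise minima
    return sum(min(a, b) for a, b in zip(h1, h2))

def spm_kernel(pyr1, pyr2, weights=(0.25, 0.25, 0.5)):
    # SPM kernel: weighted sum of intersection kernels over pyramid
    # levels (level 0 = whole image; finer levels get larger weight)
    return sum(w * intersection_kernel(h1, h2)
               for w, (h1, h2) in zip(weights, zip(pyr1, pyr2)))
```

For identical normalized pyramids the kernel evaluates to 1.0 under these weights, which is why it plugs directly into a kernel SVM as a similarity score.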
Winning Method in PASCAL VOC before 2009
Multiple Feature Sampling Methods → Multiple Visual Descriptors → VQ Coding, Histogram, SPM → Nonlinear SVM
Convolutional Neural Networks
• The architectures of some successful methods are not very different from CNNs
Conv. Filtering → Pooling → Conv. Filtering → Pooling
BoW+SPM: the same architecture
Local Gradients → Pooling (e.g., SIFT, HOG) → VQ Coding → Average Pooling (obtain histogram) → Nonlinear SVM
Observations:
• Nonlinear SVM is not scalable
• VQ coding may be too coarse
• Average pooling is not optimal
• Why not learn the whole thing?
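The "VQ coding + average pooling" stage that these observations criticize can be sketched in a few lines; this is a toy version with hypothetical helper names, using squared Euclidean distance for the nearest-codeword assignment.

```python
def vq_code(x, codebook):
    # Hard VQ: one-hot assignment of descriptor x to its nearest codeword
    d = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook]
    k = d.index(min(d))
    return [1.0 if i == k else 0.0 for i in range(len(codebook))]

def average_pool(codes):
    # Average pooling of one-hot codes = the normalized BoW histogram
    n = len(codes)
    return [sum(c[j] for c in codes) / n for j in range(len(codes[0]))]
```

The one-hot assignment is exactly where the "too coarse" criticism bites: every descriptor in a Voronoi cell gets the same code regardless of how close it is to the codeword.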
Develop better methods
Better Coding → Better Pooling → Scalable Linear Classifier
Sparse Coding
Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).
Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].
Coding: for a data vector x, solve the LASSO to find the sparse coefficient vector a.
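The coding step can be sketched with a simple iterative shrinkage-thresholding (ISTA) loop for the LASSO objective min_a ||x − Da||² + λ||a||₁. This is one standard solver, not necessarily the one used in the talk, and the step size is assumed small enough for convergence.

```python
def soft(v, t):
    # Soft-thresholding: the proximal operator of t * |.|
    return max(v - t, 0.0) - max(-v - t, 0.0)

def ista_code(x, D, lam=0.1, step=0.1, iters=1000):
    # ISTA for the LASSO: a gradient step on ||x - D a||^2,
    # followed by soft-thresholding to enforce sparsity.
    # D is a list of basis vectors (the dictionary columns Phi_j).
    a = [0.0] * len(D)
    for _ in range(iters):
        # residual r = D a - x
        r = [sum(D[j][i] * a[j] for j in range(len(D))) - x[i]
             for i in range(len(x))]
        # gradient of the quadratic term w.r.t. a_j is 2 * Phi_j . r
        a = [soft(a[j] - step * 2.0 * sum(D[j][i] * r[i]
                                          for i in range(len(x))),
                  step * lam)
             for j in range(len(D))]
    return a
```

With a redundant dictionary, the L1 penalty drives most coefficients to exactly zero, which is what makes the resulting code sparse rather than merely small.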
Sparse Coding Example: natural images → learned bases (Φ1, …, Φ64): “edges”
x ≈ 0.8 × Φ36 + 0.3 × Φ42 + 0.5 × Φ63
[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0] (feature representation)
Test example
Compact & easily interpretable
Slide credit: Andrew Ng
Self-taught Learning [Raina, Lee, Battle, Packer & Ng, ICML 07]
Motorcycles / Not motorcycles
Unlabeled images …
Testing: What is this?
Slide credit: Andrew Ng
Classification Result on Caltech 101
SIFT VQ + Nonlinear SVM: 64%
Pixel Sparse Coding + Linear SVM: 50%
9K images, 101 classes
Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR 09]
Local Gradients → Pooling (e.g., SIFT, HOG) → Sparse Coding → Max Pooling → Scalable Linear Classifier
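The max-pooling step that replaces histogram averaging in this pipeline is tiny; a minimal sketch (hypothetical function name):

```python
def max_pool(codes):
    # Max pooling: for each dictionary dimension, keep the largest
    # response among all sparse codes falling in the pooling region
    return [max(c[j] for c in codes) for j in range(len(codes[0]))]
```

Unlike average pooling, the result records whether any descriptor in the region responded strongly to a basis, which tends to be more robust to clutter and to the number of descriptors.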
SIFT VQ + Nonlinear SVM: 64%
SIFT Sparse Coding + Linear SVM: 73%
Caltech-101
Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]
What have we learned?
Local Gradients → Pooling (e.g., SIFT, HOG) → Sparse Coding → Max Pooling → Scalable Linear Classifier
1. Sparse coding is useful (why?)
2. A hierarchical architecture is needed
MNIST Experiments
• When SC achieves the best classification accuracy, the learned bases look like digits: each basis has a clear local class association.
(Errors across the three settings: 4.54%, 3.75%, 2.64%)
Distribution of coefficient (SIFT, Caltech101)
Neighbor bases tend to get nonzero coefficients
Interpretation 1: Discover subspaces
• Each basis is a “direction”
• Sparsity: each datum is a linear combination of only several bases
• Related to topic models
Interpretation 2: Geometry of the data manifold
• Each basis is an “anchor point”
• Sparsity is induced by locality: each datum is a linear combination of its neighboring anchors
A Function Approximation View to Coding
• Setting: f(x) is a nonlinear feature extraction function on image patches x
• Coding: a nonlinear mapping x → a; typically a is high-dimensional and sparse
• Nonlinear learning: f(x) = <w, a>
A coding scheme is good if it helps learning f(x)
A Function Approximation View to Coding – The General Formulation
Function approximation error ≤ an unsupervised learning objective
Local Coordinate Coding (LCC)
• Dictionary Learning: k-means (or hierarchical k-means)
• Coding for x, to obtain its sparse representation a
Step 1 – ensure locality: find the K nearest bases
Step 2 – ensure low coding error:
Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv & Huang, CVPR 10
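The two LCC steps above can be sketched as follows. This toy version (hypothetical names) assumes 2-D data and K = 2 so the local least-squares fit has a closed form via Cramer's rule; real implementations solve a small regularized least-squares system for any K.

```python
def lcc_code(x, bases, K=2):
    # Step 1 - ensure locality: keep only the K nearest bases
    order = sorted(range(len(bases)),
                   key=lambda j: sum((xi - bi) ** 2
                                     for xi, bi in zip(x, bases[j])))
    i, j = order[:K]
    b1, b2 = bases[i], bases[j]
    # Step 2 - ensure low coding error: least squares on the two
    # selected bases (2x2 system; assumes b1, b2 are not parallel)
    det = b1[0] * b2[1] - b2[0] * b1[1]
    ai = (x[0] * b2[1] - b2[0] * x[1]) / det
    aj = (b1[0] * x[1] - x[0] * b1[1]) / det
    a = [0.0] * len(bases)
    a[i], a[j] = ai, aj
    return a
```

The returned code is sparse by construction: only the K local coefficients can be nonzero, which is exactly the "sparsity induced by locality" interpretation.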
Super-Vector Coding (SVC)
• Dictionary Learning: k-means (or hierarchical k-means)
• Coding for x, to obtain its sparse representation a
Step 1 – find the nearest basis of x, obtain its VQ coding
e.g. [0, 0, 1, 0, …]
Step 2 – form super vector coding:
e.g. [0, 0, 1, 0, …, 0, 0, (x-m3), 0 ,… ]
Zhou, Yu, Zhang, and Huang, ECCV 10
Zero-order (VQ) part + local tangent (first-order) part
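The two SVC steps can be sketched as a short function (hypothetical name; `s` is the constant weighting the zero-order VQ part against the first-order tangent part):

```python
def svc_code(x, centers, s=1.0):
    # Step 1: nearest k-means center m_k -> one-hot VQ coding
    d = [sum((xi - mi) ** 2 for xi, mi in zip(x, m)) for m in centers]
    k = d.index(min(d))
    onehot = [s if j == k else 0.0 for j in range(len(centers))]
    # Step 2: super vector = [s * VQ part | per-center blocks, where
    # only the nearest center's block holds the residual (x - m_k)]
    blocks = []
    for j, m in enumerate(centers):
        if j == k:
            blocks.extend(xi - mi for xi, mi in zip(x, m))
        else:
            blocks.extend(0.0 for _ in m)
    return onehot + blocks
```

The code is high-dimensional but very sparse: one scalar plus one residual block are nonzero, matching the e.g. [0, 0, 1, 0, …, 0, 0, (x-m3), 0, …] pattern on the slide.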
Function Approximation based on LCC
(Figure: data points and bases; the target function is approximated as locally linear)
Yu, Zhang & Gong, NIPS 09
Function Approximation based on SVC
(Figure: data points and cluster centers; piecewise linear (first-order) approximation via local tangents)
Zhou, Yu, Zhang, and Huang, ECCV 10
PASCAL VOC Challenge 2009
(Table: per-class results for Ours vs. the best of other teams, with the difference)
No. 1 in 18 of 20 categories
We used only the HOG feature on grayscale images
ImageNet Challenge 2010
1.4 million images, 1000 classes, top-5 hit rate:
VQ + Intersection Kernel: ~40%
Various Coding Methods + Linear SVM: 64% ~ 73%
Hierarchical sparse coding
Conv. Filtering → Pooling → Conv. Filtering → Pooling
Learning from unlabeled data
Yu, Lin, & Lafferty, CVPR 11
A two-layer sparse coding formulation
MNIST Results -- classification
HSC vs. CNN: HSC provides even better performance than CNN; more amazingly, HSC learns its features in an unsupervised manner!
MNIST results -- effect of hierarchical learning
Comparing the Fisher scores of HSC and SC: discriminative power is significantly improved by HSC, even though HSC is unsupervised coding.
MNIST results -- learned codebook
One dimension in the second layer: invariance to translation, rotation, and deformation
Caltech101 results -- classification
Learned descriptor: performs slightly better than SIFT + SC
Conclusion and Future Work
A “function approximation” view helps derive novel sparse coding methods.
Locality is one way to achieve sparsity, and it is really useful. But we need a deeper understanding of feature learning methods.
Interesting directions:
– Hierarchical coding
– Deep learning (many papers now!)
– Faster methods for sparse coding (e.g., from LeCun’s group)
– Learning features from a richer structure of data, e.g., video (learning invariance to out-of-plane rotation)
References
• Learning Image Representations from Pixel Level via Hierarchical Sparse Coding, Kai Yu, Yuanqing Lin, John Lafferty. In CVPR 2011.
• Large-scale Image Classification: Fast Feature Extraction and SVM Training, Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. In CVPR 2011.
• ECCV 2010 Tutorial, Kai Yu, Andrew Ng (with links to some source code).
• Deep Coding Networks, Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. In NIPS 2010.
• Image Classification using Super-Vector Coding of Local Image Descriptors, Xi Zhou, Kai Yu, Tong Zhang, and Thomas Huang. In ECCV 2010.
• Efficient Highly Over-Complete Sparse Coding using a Mixture Model, Jianchao Yang, Kai Yu, and Thomas Huang. In ECCV 2010.
• Improved Local Coordinate Coding using Local Tangents, Kai Yu and Tong Zhang. In ICML 2010.
• Supervised translation-invariant sparse coding, Jianchao Yang, Kai Yu, and Thomas Huang. In CVPR 2010.
• Learning locality-constrained linear coding for image classification, Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. In CVPR 2010.
• Nonlinear learning using local coordinate coding, Kai Yu, Tong Zhang, and Yihong Gong. In NIPS 2009.
• Linear spatial pyramid matching using sparse coding for image classification, Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. In CVPR 2009.