Visual Recognition And Search, Columbia University, Spring 2014
EECS 6890 – Topics in Information Processing
Spring 2014, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch2014
Class 4 Feature Coding and Pooling
Liangliang Cao, Feb 20, 2014
General Scene/Object Recognition
Problem
Search engines, traditional companies, mobile apps
Problem
http://www.vision.caltech.edu
Examples of Object Recognition Dataset
http://groups.csail.mit.edu/vision/SUN/
Problem
Outline
• Histogram of local features
• Bag of words model
• Sparse coding and related soft assignment
• Fisher vector and supervector
Histogram of Local Features
Bag of Words Models
• Powerful local features
– DoG
– Hessian, Harris
– Dense-sampling
Recall of Last Class
Non-fixed number of local regions per image!
Bag of Words Models
• Histograms can provide a fixed size representation of images
• Spatial pyramid/gridding can enhance the histogram representation with spatial information
Recall of Last Class (2)
Bag of Words Models
Histogram of Local Features
[Figure: histogram of codeword frequency; x-axis: codewords, y-axis: frequency]
dim = #codewords
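The histogram above can be sketched in a few lines. This is a minimal illustration, assuming descriptors and codewords are plain lists of floats and that each descriptor votes for its nearest codeword under squared Euclidean distance; the function names and toy data are hypothetical, not taken from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (hard assignment)."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook):
    """Normalized codeword-frequency histogram; its length is #codewords."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data (assumed): 3 codewords, 4 local descriptors in 2-D.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
hist = bow_histogram(descriptors, codebook)
```

However many local regions an image yields, the output length stays equal to the codebook size, which is what makes the representation fixed-size.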
Bag of Words Models
Histogram of Local Features (2)
dim = #codewords x #grids
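To make the dimensionality concrete, here is a small sketch of a gridded histogram, assuming a single regular grid (not a full multi-level pyramid) and keypoints given as hypothetical `(x, y, descriptor)` tuples. One histogram is built per grid cell and the cells are concatenated.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the closest codeword under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(descriptor, codebook[k])))


def gridded_histogram(keypoints, codebook, width, height, grid=2):
    """Concatenate one codeword histogram per cell of a grid x grid layout."""
    m = len(codebook)
    hist = [0] * (m * grid * grid)  # dim = #codewords x #grids
    for x, y, desc in keypoints:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cell = row * grid + col
        hist[cell * m + nearest_codeword(desc, codebook)] += 1
    return hist


# Toy data (assumed): 2 codewords, a 100 x 100 image, a 2 x 2 grid.
codebook = [[0.0], [1.0]]
keypoints = [(10, 10, [0.1]), (90, 10, [0.9]), (90, 90, [0.2]), (95, 99, [1.1])]
hist = gridded_histogram(keypoints, codebook, width=100, height=100, grid=2)
```

With 2 codewords and 4 cells the vector has 8 entries, so two images with the same visual words in different places now get different representations.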
Local Feature Quantization
Bag of Words Models
Slide courtesy of Fei-Fei Li
Local Feature Quantization
Bag of Words Models
Local Feature Quantization
Bag of Words Models
- Vector quantization
- Dictionary learning
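A common way to learn the dictionary is k-means over a sample of local descriptors; the cluster centers become the codewords used for vector quantization. The sketch below assumes plain Lloyd iterations with a naive initialization (the first k descriptors), chosen only to keep the example short and deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def kmeans(descriptors, k, iterations=20):
    """Learn k codewords with Lloyd's algorithm."""
    # Assumption: initialize with the first k descriptors (not a robust choice).
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            j = min(range(k), key=lambda c: squared_distance(d, centers[c]))
            clusters[j].append(d)
        # Keep the old center when a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy descriptors (assumed): two well-separated groups in 2-D.
descriptors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
               [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
codebook = kmeans(descriptors, k=2)
```

In practice the initialization and the number of codewords matter a great deal; this sketch only shows the alternation between assignment and center update.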
Dictionary for Codewords
Histogram of Local Features
Picture courtesy of Fei-Fei Li
Bag of Words Models
Most slides in this section are courtesy of Fei-Fei Li
Object → Bag of ‘words’
Bag of Words Models
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex, eye, cell, optical
nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
Underlying Assumptions - Text
Bag of Words Models
Underlying Assumptions - Image
[Figure: learning and recognition pipeline]
learning: feature detection & representation, K-means, image representation, category models (and/or) classifiers
recognition: category decision
Bag of Words Models
Borrowing Techniques from Text Classification
• PLSA
• Naïve Bayesian Model
• w_n: each patch in an image
– w_n = [0, 0, …, 1, …, 0, 0]^T
• w: a collection of all N patches in an image
– w = [w_1, w_2, …, w_N]
• d_j: the j-th image in an image collection
• c: category of the image
• z: theme or topic of the patch
Notations
Probabilistic Latent Semantic Analysis (pLSA), Hofmann, 2001
[Figure: graphical model over d, z, w with plates N and D]
Latent Dirichlet Allocation (LDA), Blei et al., 2001
[Figure: graphical model over c, π, z, w with plates N and D]
Bag of Words Models
[Figure: pLSA graphical model over d, z, w with plates N and D]
Bag of Words Models
Probabilistic Latent Semantic Analysis (pLSA)
“face”
Sivic et al. ICCV 2005
[Figure: pLSA graphical model over d, z, w with plates N and D]
Observed codeword distributions = codeword distributions per theme (topic) × theme distributions per image
$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$
Slide credit: Josef Sivic
Bag of Words Models
Parameters are estimated by EM or Gibbs sampling
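The EM route can be sketched as follows. This is a minimal illustration, assuming the input is a document-by-codeword count matrix (one row per image) and a random initialization; the function name `plsa`, the seed, and the toy counts are hypothetical. The E-step computes the posterior over topics for each (image, codeword) pair, and the M-step re-normalizes the expected counts into p(w|z) and p(z|d).

```python
import random


def normalized(row):
    """Scale a non-negative vector to sum to 1 (uniform if it sums to 0)."""
    s = sum(row)
    return [v / s for v in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit p(w|z) and p(z|d) to a count matrix counts[d][w] with EM."""
    rng = random.Random(seed)
    D, W, K = len(counts), len(counts[0]), num_topics
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(W)]) for _ in range(K)]
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(K)]) for _ in range(D)]
    for _ in range(iterations):
        new_w_z = [[0.0] * W for _ in range(K)]
        new_z_d = [[0.0] * K for _ in range(D)]
        for d in range(D):
            for w in range(W):
                if counts[d][w] == 0:
                    continue
                # E-step: p(z | d, w) is proportional to p(w|z) p(z|d).
                joint = [p_w_z[k][w] * p_z_d[d][k] for k in range(K)]
                total = sum(joint)
                for k in range(K):
                    r = counts[d][w] * joint[k] / total
                    new_w_z[k][w] += r
                    new_z_d[d][k] += r
        # M-step: re-normalize the expected counts.
        p_w_z = [normalized(row) for row in new_w_z]
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy counts (assumed): 4 images, 4 codewords, 2 topics.
counts = [[4, 4, 0, 0], [5, 3, 0, 0], [0, 0, 4, 4], [0, 0, 3, 5]]
p_w_z, p_z_d = plsa(counts, num_topics=2)
# Mixture p(w_i | d_j) = sum_k p(w_i | z_k) p(z_k | d_j) for the first image.
p_w_d0 = [sum(p_w_z[k][w] * p_z_d[0][k] for k in range(2)) for w in range(4)]
```

EM only finds a local optimum, so results depend on the initialization; running several seeds and keeping the best likelihood is a common precaution.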
$z^{*} = \arg\max_{z}\, p(z \mid d)$
Slide credit: Josef Sivic
Bag of Words Models
Recognition using pLSA
[Figure: LDA graphical model over c, π, z, w with plates N and D]
Latent Dirichlet Allocation (LDA)
Fei-Fei et al. ICCV 2005
“beach”
Bag of Words Models
Scene Recognition using LDA
Bag of Words Models
Spatial-Coherent Latent Topic Model
Cao and Fei-Fei, ICCV 2007
Bag of Words Models
Simultaneous Segmentation and Recognition
Bag of Words Models
Pros and Cons
Bag of Words Models are good at
- Modeling prior knowledge
- Providing intuitive interpretation
But these models suffer from
- Loss of information in quantization of “visual words”
- Loss of spatial information
Images differ from texts!
Better coding
Better pooling
Sparse Coding and Soft Quantization
Soft Quantization
Hard Quantization
Soft Quantization
Hard quantization
A more general but hard to solve representation
In practice we consider
Sparse Coding-Based Quantization
s.t.
Sparse coding
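The formulas for this slide did not survive extraction. For orientation, the three objectives it names are usually written as below. This is a sketch in assumed notation (x is a local descriptor, D the codebook with codewords as columns, α the code, L a sparsity budget, λ a regularization weight), following the standard formulation in the sparse coding literature rather than the slide's exact symbols.

```latex
% Hard quantization: exactly one active codeword, with unit weight
\[
\min_{\alpha}\; \lVert x - D\alpha \rVert_2^2
\quad \text{s.t.} \quad \lVert \alpha \rVert_0 = 1,\;
\lVert \alpha \rVert_1 = 1,\; \alpha \ge 0
\]

% A more general but hard-to-solve (combinatorial) form: at most L active codewords
\[
\min_{\alpha}\; \lVert x - D\alpha \rVert_2^2
\quad \text{s.t.} \quad \lVert \alpha \rVert_0 \le L
\]

% In practice: the l1-relaxed sparse coding objective
\[
\min_{\alpha}\; \lVert x - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1
\]
```

Read this way, hard quantization is the extreme case of sparse coding in which the code is allowed a single nonzero entry.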
A Famous Paper on Sparse Coding
Yang et al. obtained good recognition accuracy by combining sparse coding with spatial pyramid matching (ScSPM).
Yang et al., Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009
[Figure: sparse coding + spatial pyramid]
Sparse Coding
Clarify One Ambiguity
Hard quantization: the sparsest solution!
Sparse coding based representation: less sparse!
1. Hard quantization (VQ) is the sparsest representation. Sparse coding is less sparse.
Sparse Coding
Both hard quantization and sparse coding are applied to every patch.
To get an image-level representation, hard quantization uses “sum” (also called average pooling), while sparse coding uses “max” (also called max pooling).
When there are many patches and the codebook is small, the final image-level representation will be non-sparse.
Clarify Another Ambiguity
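The two pooling operators are easy to compare side by side. The sketch below assumes non-negative local codes stored as equal-length lists (signed sparse codes would need an absolute value before max pooling); the toy codes are hypothetical.

```python
def average_pooling(codes):
    """Mean of each code dimension over all patches (normalized sum)."""
    n = len(codes)
    return [sum(c[k] for c in codes) / n for k in range(len(codes[0]))]


def max_pooling(codes):
    """Largest response of each codeword over all patches."""
    return [max(c[k] for c in codes) for k in range(len(codes[0]))]


# Toy local codes (assumed): 3 patches, 4 codewords, at most 2 nonzeros each.
codes = [[0.0, 0.5, 0.0, 0.5],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.25, 0.75]]
avg = average_pooling(codes)
mx = max_pooling(codes)
```

Every local code here is sparse, yet both pooled vectors have no zero entries, which is the point the slide makes about image-level density.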
Sparse Coding
Reflection
If “sparsity” is not the key factor:
– Sparse coding is less sparse than hard quantization
– The image-level representation may be dense after pooling local sparse codes
then how do we explain the success of ScSPM?
– Sparse coding keeps more information than hard quantization
– Sometimes sparsity is less important than locality.
Sparse Coding
Sparse coding
Locality-constrained linear coding (LLC)
Can you guess which method is faster?
Locality-constrained Linear Coding
Wang, Yang, Yu et al, Locality-constrained Linear Coding for Image Classification, CVPR 2010
From Sparse Coding to LLC
• LLC is much faster than sparse coding
– Suppose the codebook size is M (e.g., M = 1K)
– Find the K nearest neighbors (e.g., K = 5)
– The computational complexity
• Sparse coding on M codewords: O(1000a)
• Sparse coding on K codewords: O(5a)
Faster and higher accuracy!
Locality-constrained Linear Coding
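The speed-up comes from solving a tiny K x K problem instead of one over all M codewords. Below is a minimal pure-Python sketch of the approximated LLC encoding described in the CVPR 2010 paper: pick the K nearest codewords, then solve a regularized least squares problem whose weights sum to one. The regularizer `beta * trace(C)`, the helper `solve`, and the toy codebook are assumptions made for this illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def llc_code(x, codebook, knn=2, beta=1e-4):
    """Approximated LLC: code x using only its knn nearest codewords."""
    dist = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    idx = sorted(range(len(codebook)), key=lambda k: dist[k])[:knn]
    # Shift the selected codewords so that x becomes the origin.
    Z = [[c - a for c, a in zip(codebook[k], x)] for k in idx]
    # Local covariance (Gram matrix) of the shifted codewords.
    C = [[sum(p * q for p, q in zip(Z[i], Z[j])) for j in range(knn)]
         for i in range(knn)]
    trace = sum(C[i][i] for i in range(knn))
    reg = beta * trace if trace > 0 else beta  # keeps C invertible
    for i in range(knn):
        C[i][i] += reg
    w = solve(C, [1.0] * knn)
    s = sum(w)
    code = [0.0] * len(codebook)
    for k, wk in zip(idx, w):
        code[k] = wk / s  # enforce the sum-to-one constraint
    return code


# Toy data (assumed): 4 codewords in 2-D, one descriptor between the first two.
codebook = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
code = llc_code([1.0, 0.0], codebook, knn=2)
```

The descriptor sits halfway between the first two codewords, so they share the weight equally and all other entries stay exactly zero.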
From Sparse Coding to LLC
Hands-on Experience of LLC
• Could you improve Jianchao Yang’s code? (http://www.ifp.illinois.edu/~jyang29/LLC.htm)
You may know Matlab’s way of searching for top-k is terrible...
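The underlying idea is language-independent: selecting the k smallest distances does not require sorting all M of them. As a sketch in Python rather than Matlab, the standard-library `heapq.nsmallest` performs a partial selection; the distance values are hypothetical.

```python
import heapq


def top_k_nearest(distances, k):
    """Indices of the k smallest distances, closest first, without a full sort."""
    return heapq.nsmallest(k, range(len(distances)),
                           key=distances.__getitem__)


# Toy distances (assumed) from one descriptor to 6 codewords.
distances = [9.0, 1.5, 7.2, 0.3, 4.4, 2.8]
nearest = top_k_nearest(distances, k=3)
```

For a codebook of size M this costs roughly O(M log k) instead of the O(M log M) of a full sort, which adds up when it runs once per local descriptor.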
From Sparse Coding to LLC
Hands-on Experience of LLC
• Could you improve Jianchao Yang’s code? (http://www.ifp.illinois.edu/~jyang29/LLC.htm)
This is regularized least squares regression.
Do you want to try
• standard sparse coding,
• regression after PCA,
• or more general subspace learning?
Another Soft Quantization
Model the uncertainty across multiple codewords
Uncertainty-Based Quantization
Gemert et al, Visual Word Ambiguity, PAMI 2009
Soft Quantization
Intuition of UNC
Hard quantization
Soft quantization based on uncertainty
Gemert et al, Visual Word Ambiguity, PAMI 2009
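A minimal sketch of the uncertainty-based histogram, assuming a Gaussian kernel on Euclidean distance with a hand-picked bandwidth `sigma`: each descriptor spreads one unit of mass over all codewords in proportion to its kernel response, instead of giving the whole vote to the nearest one. The function name and toy data are hypothetical.

```python
import math


def unc_histogram(descriptors, codebook, sigma=1.0):
    """Soft-assignment histogram: each descriptor distributes one vote."""
    hist = [0.0] * len(codebook)
    for r in descriptors:
        sq = [sum((a - b) ** 2 for a, b in zip(r, v)) for v in codebook]
        m = min(sq)
        # Subtracting the smallest distance avoids underflow; the constant
        # factor cancels in the normalization below.
        weights = [math.exp(-(s - m) / (2.0 * sigma ** 2)) for s in sq]
        total = sum(weights)
        for k, wgt in enumerate(weights):
            hist[k] += wgt / total
    n = len(descriptors)
    return [h / n for h in hist]


# Toy data (assumed): 2 codewords in 1-D; the first descriptor is ambiguous.
codebook = [[0.0], [2.0]]
descriptors = [[1.0], [0.0]]
hist = unc_histogram(descriptors, codebook, sigma=1.0)
```

The descriptor at 1.0 is equidistant from both codewords and contributes half a vote to each, whereas hard quantization would have to break the tie arbitrarily.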
Soft Quantization
Improvement of UNC
Fisher Vector and Supervector
One of the most powerful image/video classification techniques
Thanks to Zhen Li and Qiang Chen for their constructive suggestions on this section
Fisher Vector and Supervector
Winning Systems
[Figure: challenge logos with winning years. Classification task: 2009, 2010, 2011, 2012. Large Scale Visual Recognition Challenge: 2010, 2011.]
Fisher Vector and Supervector
Literature
These papers are not very easy to read.
Let me take a simplified perspective via the coding & pooling framework.
Supervector
[1] Yan et al., Regression from patch-kernel. CVPR 2008
[2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008
[3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009: 1971-1977
[4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010
Fisher Vector
[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010
[6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011
Fisher Vector and Supervector
• Coding with hard assignment
• Coding with soft assignment
• How to keep all the information?
Coding without Information Loss
Fisher Vector and Supervector
An Intuitive Illustration
[Figure: Coding assigns local features to Component 1, Component 2, Component 3; Pooling then adds (+) the coded features within each component]
Fisher Vector and Supervector
Implementation of Supervector
In speech (speaker identification), a supervector refers to the stacked means of adapted GMMs.
Supervector = $[\mu_1^\top, \mu_2^\top, \dots, \mu_K^\top]^\top$, where $\mu_k$ is the adapted mean of the k-th GMM component
Fisher Vector and Supervector
Normalization of Supervector
In practice, a normalization process using the covariance matrix often improves the performance.
Moreover, we can subtract the original mean vector for ease of normalization.
The representation is also called Hierarchical Gaussianization (HG).
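Fisher Vector and Supervector
Fisher Vector
Let $p(X \mid \lambda)$ be the probability density function with parameters $\lambda$.
[Jaakkola and Haussler, NIPS 98] suggested that X can be described by the derivative with respect to $\lambda$: $G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda)$
Now we can define the Fisher kernel $K(X, Y) = (G_\lambda^X)^\top F_\lambda^{-1} G_\lambda^Y$,
where $F_\lambda = E_X\big[G_\lambda^X (G_\lambda^X)^\top\big]$ is called the Fisher information matrix.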
Fisher Vector and Supervector
Fisher Vector with GMM
Let
Consider the Gaussian Mixture Model
We consider
With GMM, Fisher vectors can be obtained:
The Fisher vector
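The equations on this slide were lost in extraction. As a sketch of the idea, the code below computes only the mean-gradient part of a Fisher vector for a diagonal-covariance GMM, using the normalized form from the improved Fisher kernel literature: for component k, the gradient is the sum over descriptors of γ_t(k) (x_t − μ_k) / σ_k, divided by T √w_k. The GMM parameters are assumed to be already trained; all names and toy values are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    """Soft assignment gamma(k) of x to each component, computed in log space."""
    logs = [math.log(w) + log_gaussian(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_means(X, weights, means, variances):
    """Mean-gradient block of the Fisher vector; length is K * dim."""
    T, K, dim = len(X), len(weights), len(means[0])
    grads = [[0.0] * dim for _ in range(K)]
    for x in X:
        gamma = posteriors(x, weights, means, variances)  # coding step
        for k in range(K):
            for d in range(dim):  # pooling step: accumulate over descriptors
                grads[k][d] += (gamma[k] * (x[d] - means[k][d])
                                / math.sqrt(variances[k][d]))
    return [g / (T * math.sqrt(weights[k]))
            for k in range(K) for g in grads[k]]


# Toy GMM (assumed): 2 components in 1-D, and 2 descriptors between them.
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
fv = fisher_vector_means([[1.0], [3.0]], weights, means, variances)
```

The accumulated sum for component k equals the soft count times the difference between the posterior-weighted mean of the descriptors and μ_k, which is why this block closely resembles a normalized supervector of adapted means.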
Fisher Vector and Supervector
• Supervector
• Fisher vector
Comparison
Fisher Vector and Supervector
Comparison
Diagonal covariance matrix
Diagonal covariance with same derivation
Posterior estimation of
The two representations are almost the same even with different motivations.
Fisher Vector and Supervector
• Learn from existing code
http://www.vlfeat.org/api/fisher.html
• Learn from public GMM code
• Be careful with pitfalls
– Probabilities can be as small as the machine’s rounding error: compute log P instead of P
– Try different normalization strategies
– Try to make the code efficient
How to Code Your Own
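The log P pitfall is worth seeing once. In high dimensions the joint density of a descriptor under each GMM component can underflow to zero in double precision, so posteriors computed from raw probabilities become 0/0. A minimal sketch of the usual remedy, the log-sum-exp trick, is below; the log values are hypothetical.

```python
import math


def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) for a list of log-domain values."""
    top = max(log_values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in log_values))


# Hypothetical values of log(w_k * N(x | mu_k, Sigma_k)) for 3 components.
log_joint = [-1200.0, -1201.0, -1210.0]

# Naive route: every probability underflows to exactly 0.0.
naive = [math.exp(v) for v in log_joint]

# Stable route: normalize in the log domain, exponentiate only at the end.
log_norm = log_sum_exp(log_joint)
posterior = [math.exp(v - log_norm) for v in log_joint]
```

The naive probabilities are all zero and cannot be normalized, while the log-domain version yields a valid posterior that still ranks the components correctly.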
Review of Classification Model
A Unified Perspective
• Coding: to map local features into a compact representation
– Hard quantization (VQ)
– Sparse coding
– LLC
– Mapping using GMM
• Pooling: to aggregate local codes into an image-level representation
– Average pooling (histogram aggregation)
– Max pooling
– Supervector and Fisher vector
Review of Classification Model
Method                         | Coding                     | Pooling
Histogram of SIFT              | Vector quantization        | Histogram aggregation
Uncertainty-Based Quantization | Soft quantization          | Histogram aggregation
Sparse Coding                  | Soft quantization          | Max pooling
Fisher vector / Supervector    | GMM probability estimation | GMM adaptation
Deadlines
1) Project proposal: slides on Tuesday, Feb 25, and presentation in class on Feb 27
Required content: http://rogerioferis.com/VisualRecognitionAndSearch2014/material/ProjectProposal.pdf
2) Fisher vector (ECCV 2010) paper review is out: March 11
Required content: http://rogerioferis.com/VisualRecognitionAndSearch2014/PaperReviews.html
3) Check the student presentation dates:
http://rogerioferis.com/VisualRecognitionAndSearch2014/material/Deadlines.pdf