coding and pooling llcao -...

Visual Recognition And Search, Columbia University, Spring 2014

EECS 6890 – Topics in Information Processing

Spring 2014, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch2014

Class 4 Feature Coding and Pooling

Liangliang Cao, Feb 20, 2014


General Scene/Object Recognition

Problem

Search engines, traditional companies, mobile apps


Problem

http://www.vision.caltech.edu

Examples of Object Recognition Dataset


http://groups.csail.mit.edu/vision/SUN/

Problem


Outline

• Histogram of local features

• Bag of words model

• Sparse coding and related soft assignment

• Fisher vector and supervector



Histogram of Local Features


Bag of Words Models

• Powerful local features

– DoG

– Hessian, Harris

– Dense-sampling

Recall of Last Class

The number of local regions is not fixed: it varies from image to image!


Bag of Words Models

• Histograms can provide a fixed size representation of images

• Spatial pyramids/grids can enhance the histogram representation with spatial information

Recall of Last Class (2)


Bag of Words Models


[Figure: histogram of codeword frequencies; dim = #codewords]


Bag of Words Models

Histogram of Local Features (2)

dim = #codewords x #grids
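A minimal sketch of the gridded histogram, assuming a codebook is already available and each local feature comes with a pixel position. The function name `bow_histogram` and its arguments are illustrative only, not from the slides.

```python
import numpy as np

def bow_histogram(descriptors, positions, codebook, image_size, grid=(2, 2)):
    """Concatenate one codeword histogram per spatial cell.

    descriptors: (n, d) local features; positions: (n, 2) as (x, y) pixels;
    codebook: (m, d) codewords; image_size: (width, height).
    Returns a vector of length m * grid[0] * grid[1].
    """
    m = codebook.shape[0]
    gx, gy = grid
    # Hard assignment: index of the nearest codeword for every descriptor.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    # Cell index of every descriptor, clipped so border points stay in range.
    cx = np.minimum((positions[:, 0] * gx / image_size[0]).astype(int), gx - 1)
    cy = np.minimum((positions[:, 1] * gy / image_size[1]).astype(int), gy - 1)
    cell = cy * gx + cx
    # Count codeword occurrences separately in each cell.
    hist = np.zeros((gx * gy, m))
    np.add.at(hist, (cell, words), 1.0)
    return hist.ravel()
```

With a 2 x 2 grid and m codewords the output has 4m entries, which matches dim = #codewords x #grids.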


Local Feature Quantization

Bag of Words Models

Slide courtesy of Fei-Fei Li



Bag of Words Models




Bag of Words Models

- Vector quantization

- Dictionary learning


Dictionary for Codewords
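One common way to learn the dictionary is k-means over local features pooled from many training images. The sketch below is a plain Lloyd iteration in NumPy. The name `learn_codebook`, the random initialisation, and the fixed iteration count are assumptions for illustration.

```python
import numpy as np

def learn_codebook(features, m, n_iter=20, seed=0):
    """features: (n, d) local descriptors. Returns an (m, d) codebook."""
    rng = np.random.default_rng(seed)
    # Initialise with m distinct training features.
    start = rng.choice(len(features), size=m, replace=False)
    codebook = features[start].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest codeword for every feature.
        d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each codeword moves to the mean of its members.
        for k in range(m):
            members = features[labels == k]
            if len(members) > 0:  # keep the old codeword if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook
```

The (n, m, d) distance tensor is fine for a toy example. For real SIFT collections a chunked distance computation or a library implementation would be the more practical choice.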


Picture courtesy of Fei-Fei Li


Bag of Words Models

Most slides in this section are courtesy of Fei-Fei Li


Object / Bag of ‘words’


Bag of Words Models

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Underlying Assumptions - Text


Bag of Words Models

Underlying Assumptions - Image


[Figure: bag-of-words pipeline with learning and recognition paths. Stages shown: feature detection & representation, K-means, image representation, category models (and/or) classifiers, category decision]


Bag of Words Models

Borrowing Techniques from Text Classification

• PLSA

• Naïve Bayesian Model

Notations

• $w_n$: each patch in an image

– $w_n = [0, 0, \dots, 1, \dots, 0, 0]^T$

• $\mathbf{w}$: a collection of all N patches in an image

– $\mathbf{w} = [w_1, w_2, \dots, w_N]$

• $d_j$: the jth image in an image collection

• c: category of the image

• z: theme or topic of the patch


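Probabilistic Latent Semantic Analysis (pLSA), Hofmann, 2001

[Figure: pLSA graphical model over d, z, w, with plates N and D]

Latent Dirichlet Allocation (LDA), Blei et al., 2001

[Figure: LDA graphical model over c, π, z, w, with plates N and D]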

Bag of Words Models


[Figure: pLSA graphical model over d, z, w, with plates N and D]

Bag of Words Models

Probabilistic Latent Semantic Analysis (pLSA)

“face”

Sivic et al. ICCV 2005


[Figure: pLSA graphical model over d, z, w, with plates N and D]

Observed codeword distributions

Codeword distributions per theme (topic)

Theme distributions per image

Slide credit: Josef Sivic

$$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$$

Bag of Words Models

Parameters are estimated by EM or Gibbs sampling


$$z^{*} = \arg\max_{z}\, p(z \mid d)$$

Slide credit: Josef Sivic

Bag of Words Models

Recognition using pLSA
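The EM updates for pLSA can be sketched in a few lines. This is an illustrative version under stated assumptions: the input is a dense image-by-codeword count matrix, initialisation is random, and the function name `plsa` is hypothetical. Recognition then picks the topic with the largest p(z|d) for each image.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: (D, W) matrix, counts[j, i] = occurrences of codeword i in image j.
    Returns p_w_z (K, W) with rows p(w | z_k) and p_z_d (D, K) with rows p(z | d_j)."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_w_z = rng.random((n_topics, W))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((D, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: p(z_k | d_j, w_i) is proportional to p(w_i | z_k) p(z_k | d_j).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]        # (D, K, W)
        post /= post.sum(axis=1, keepdims=True) + eps
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post                # (D, K, W)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_w_z, p_z_d

# Recognition: the most likely topic per image.
# topics = p_z_d.argmax(axis=1)
```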


[Figure: LDA graphical model over c, π, z, w, with plates N and D]

Latent Dirichlet Allocation (LDA)

Fei-Fei et al. ICCV 2005

“beach”

Bag of Words Models

Scene Recognition using LDA


Bag of Words Models

Spatial-Coherent Latent Topic Model

Cao and Fei-Fei, ICCV 2007


Bag of Words Models

Simultaneous Segmentation and Recognition


But these models suffer from

- Loss of information in quantization of “visual words”

- Loss of spatial information

Bag of Words Models

Pros and Cons

Images differ from texts!

Better coding

Better pooling

Bag of Words Models are good at

- Modeling prior knowledge

- Providing intuitive interpretation


Sparse Coding and Soft Quantization


Soft Quantization

Hard Quantization


Soft Quantization

Hard quantization

A more general but hard to solve representation

In practice we consider

Sparse Coding-Based Quantization

s.t.

Sparse coding
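The equations on this slide did not survive extraction. As a hedged reminder, the standard textbook forms of the three problems are given below, assuming x is a local descriptor, B a codebook whose columns are the M codewords, and c the code. These symbols and the exact constraint set are assumptions and may differ from the original slide.

```latex
% Hard quantization (VQ): exactly one active codeword, with weight 1
\min_{c}\ \|x - Bc\|_2^2
  \quad \text{s.t.}\quad \|c\|_0 = 1,\ \|c\|_1 = 1,\ c \ge 0

% A more general but hard-to-solve representation: at most k active codewords
\min_{c}\ \|x - Bc\|_2^2
  \quad \text{s.t.}\quad \|c\|_0 \le k

% Sparse coding: the \ell_1 relaxation considered in practice
\min_{c}\ \|x - Bc\|_2^2 + \lambda \|c\|_1
```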


A Famous Paper on Sparse Coding

Yang et al. obtained good recognition accuracy by combining sparse coding with spatial pyramid matching (ScSPM).

Yang et al, Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

Sparse coding + spatial pyramid


Sparse Coding

Clarify One Ambiguity

Hard quantization: sparsest solution!

Sparse coding based representation: less sparse!

1. Hard quantization (VQ) is the sparsest representation. Sparse coding is less sparse.


Sparse Coding

Both hard quantization and sparse coding are applied to every patch.

To get an image-level representation, hard quantization uses “sum” (also called average pooling), while sparse coding uses “max” (also called max pooling).

When there are many patches and the codebook is small, the final image-level representation will be non-sparse.

Clarify Another Ambiguity
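A toy illustration of the two pooling operators and of how pooling can remove sparsity. The codes below are made up, and taking the absolute value before the max is an assumption that follows common practice with sparse codes.

```python
import numpy as np

def average_pool(codes):
    """codes: (n_patches, n_codewords). Mean over patches (histogram-style)."""
    return codes.mean(axis=0)

def max_pool(codes):
    """Largest absolute response per codeword over all patches."""
    return np.abs(codes).max(axis=0)

# Three patches, four codewords; every patch code has a single non-zero entry.
codes = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 0.0]])
pooled = max_pool(codes)
density_local = (codes != 0).mean()    # fraction of non-zeros in patch codes
density_pooled = (pooled != 0).mean()  # fraction of non-zeros after pooling
```

Each patch code uses one codeword out of four, yet the pooled vector uses three out of four: local sparsity does not carry over to the image level.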


Sparse Coding

If “sparsity” is not the key factor,

– Sparse coding is less sparse than hard quantization

– The image-level representation may be dense after pooling local sparse codes

Then how do we explain the success of ScSPM?

– Sparse coding keeps more information than hard quantization

– Sometimes sparsity is less important than locality.

Reflection


Sparse Coding

Sparse coding

Locality-constrained linear coding (LLC)

Can you guess which method is faster?

Locality-constrained Linear Coding

Wang, Yang, Yu et al, Locality-constrained Linear Coding for Image Classification, CVPR 2010


From Sparse Coding to LLC

• LLC is much faster than sparse coding

– Suppose the codebook size is M (e.g., M = 1K)

– Find the K nearest neighbors (e.g., K = 5)

– The computational complexity

• Sparse coding on M codewords: O(1000a)

• Sparse coding on K codewords: O(5a)

Faster and higher accuracy!

Locality-constrained Linear Coding
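A minimal sketch in the spirit of the approximated LLC of Wang et al.: restrict the code to the K nearest codewords, then solve a small regularised least-squares problem with a sum-to-one constraint. The function name `llc_code` and the default `beta` are assumptions for illustration, not values from the slides.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4):
    """x: (d,) descriptor; codebook: (m, d) with m >= k.
    Returns an (m,) code with k non-zero entries that sum to 1."""
    m = codebook.shape[0]
    d2 = ((codebook - x) ** 2).sum(axis=1)
    idx = np.argpartition(d2, k - 1)[:k]      # k nearest codewords, unordered
    z = codebook[idx] - x                     # shift so that x is the origin
    C = z @ z.T                               # local covariance, (k, k)
    C += beta * np.trace(C) * np.eye(k)       # regularisation for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                              # enforce the sum-to-one constraint
    code = np.zeros(m)
    code[idx] = w
    return code
```

The linear system is only K x K, independent of the codebook size M, which is where the speed-up over sparse coding on all M codewords comes from.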



Hands-on Experience of LLC

• Could you improve Jianchao Yang’s code? (http://www.ifp.illinois.edu/~jyang29/LLC.htm)

You may know Matlab’s way of searching for top-k is terrible...
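The slide does not say which part of the top-k search is slow. One plausible reading is that a full sort of all M distances is wasteful when only K are needed. A hedged NumPy sketch of partial selection, with the hypothetical helper name `top_k_nearest`:

```python
import numpy as np

def top_k_nearest(d2, k):
    """d2: (n, m) squared distances. Returns (n, k) indices of the k smallest
    entries in each row, ordered from nearest to farthest."""
    # Partial selection: linear in m per row instead of a full sort.
    part = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Sort only the k selected candidates.
    order = np.take_along_axis(d2, part, axis=1).argsort(axis=1)
    return np.take_along_axis(part, order, axis=1)
```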



Hands-on Experience of LLC

• Could you improve Jianchao Yang’s code? (http://www.ifp.illinois.edu/~jyang29/LLC.htm)

This is regularized least-squares regression. Do you want to try

• standard sparse coding,

• regression after PCA,

• or more general subspace learning?


Another Soft Quantization

Model the uncertainty across multiple codewords

Uncertainty-Based Quantization

Gemert et al, Visual Word Ambiguity, PAMI 2009
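A sketch of uncertainty-based soft assignment, assuming a Gaussian kernel on Euclidean distance: each patch spreads a total mass of 1 over the codewords in proportion to the kernel response, and the image histogram is the average over patches. The name `unc_histogram` and the default `sigma` are illustrative assumptions.

```python
import numpy as np

def unc_histogram(features, codebook, sigma=1.0):
    """features: (n, d); codebook: (m, d). Returns an (m,) soft histogram."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel in the log domain; subtracting the row maximum
    # avoids underflow when sigma is small.
    logk = -d2 / (2.0 * sigma ** 2)
    logk -= logk.max(axis=1, keepdims=True)
    k = np.exp(logk)
    weights = k / k.sum(axis=1, keepdims=True)  # each patch distributes mass 1
    return weights.mean(axis=0)
```

As sigma shrinks, the weights approach a one-hot vector and the result approaches the hard-quantization histogram.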


Soft Quantization

Intuition of UNC

Hard quantization

Soft quantization based on uncertainty

Gemert et al, Visual Word Ambiguity, PAMI 2009


Soft Quantization

Improvement of UNC


Fisher Vector and Supervector

One of the most powerful image/video classification techniques

Thanks to Zhen Li and Qiang Chen for their constructive suggestions on this section



Winning Systems

2009 2010

Classification task

2011 2012

2010 2011

Large Scale Visual Recognition Challenge



Literature

These papers are not very easy to read.

Let me take a simplified perspective via the coding & pooling framework

Supervector

[1] Yan et al., Regression from patch-kernel, CVPR 2008

[2] Zhou et al., SIFT-Bag kernel for video event analysis, ACM Multimedia 2008

[3] Zhou et al., Hierarchical Gaussianization for image classification, ICCV 2009: 1971-1977

[4] Zhou et al., Image classification using super-vector coding of local image descriptors, ECCV 2010

Fisher Vector

[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification, ECCV 2010

[6] Jégou et al., Aggregating local image descriptors into compact codes, PAMI 2011



• Coding with hard assignment

• Coding with soft assignment

• How to keep all the information?

Coding without Information Loss



An Intuitive Illustration

[Figure: coding splits the local features across Component 1, Component 2 and Component 3; pooling sums (+) the coded features within each component]



Implementation of Supervector

In speech (speaker identification), a supervector refers to the stacked means of adapted GMMs.

Supervector = $[\hat{\mu}_1^T, \hat{\mu}_2^T, \dots, \hat{\mu}_K^T]^T$

In practice, normalizing with the covariance matrix often improves performance.

Moreover, we can subtract the original mean vector for ease of normalization.


Normalization of Supervector

The representation is also called Hierarchical Gaussianization (HG).
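The normalization formula itself is missing from the extracted slide. The sketch below shows one common form under stated assumptions: diagonal covariances, posteriors `gamma` computed beforehand, and image-specific component weights used for scaling. The function name `supervector` and these choices are illustrative and may differ from the papers.

```python
import numpy as np

def supervector(x, gamma, mu, var):
    """x: (n, d) local features; gamma: (n, K) posteriors p(k | x_n);
    mu, var: (K, d) means and diagonal variances of the background GMM.
    Returns a vector of length K * d."""
    nk = gamma.sum(axis=0)                               # soft counts, (K,)
    # Adapted means; components with no support fall back to the prior mean.
    mu_hat = np.where(nk[:, None] > 0,
                      (gamma.T @ x) / np.maximum(nk, 1e-12)[:, None],
                      mu)
    pi_hat = nk / len(x)                                 # image-specific weights
    # Subtract the original mean, whiten by the standard deviation,
    # and weight each component by sqrt(pi_hat).
    sv = np.sqrt(pi_hat)[:, None] * (mu_hat - mu) / np.sqrt(var)
    return sv.ravel()
```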



Fisher Vector

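Let $p(X \mid \lambda)$ be the probability density function with parameters $\lambda$.

[Jaakkola and Haussler, NIPS 98] suggested that X can be described by the derivative of the log-likelihood with respect to $\lambda$:

$$G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda)$$

Now we can define the Fisher kernel

$$K(X, Y) = (G_\lambda^X)^T F_\lambda^{-1} G_\lambda^Y$$

where $F_\lambda$ is called the Fisher information matrix.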



Fisher Vector with GMM

Let

Consider the Gaussian Mixture Model

We consider

With GMM, Fisher vectors can be obtained:

The Fisher vector
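A hedged sketch of the Fisher vector for a diagonal-covariance GMM, following the gradient formulas with respect to means and standard deviations given in Perronnin et al. (ECCV 2010), plus the power and L2 normalisation proposed there. Posteriors `gamma` are assumed to be precomputed. The function name and argument layout are assumptions.

```python
import numpy as np

def fisher_vector(x, gamma, w, mu, var, improved=True):
    """x: (n, d) features; gamma: (n, K) posteriors; w: (K,) mixture weights;
    mu, var: (K, d) means and diagonal variances. Returns a (2 * K * d,) vector."""
    n = len(x)
    sigma = np.sqrt(var)
    # Normalised deviation of every feature from every component, (n, K, d).
    u = (x[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    g = gamma[:, :, None]
    # Gradient with respect to the means.
    g_mu = (g * u).sum(axis=0) / (n * np.sqrt(w)[:, None])
    # Gradient with respect to the standard deviations.
    g_sigma = (g * (u ** 2 - 1.0)).sum(axis=0) / (n * np.sqrt(2.0 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    if improved:
        fv = np.sign(fv) * np.sqrt(np.abs(fv))   # power normalisation
        norm = np.linalg.norm(fv)
        if norm > 0:
            fv = fv / norm                       # L2 normalisation
    return fv
```

The mean-gradient block has the same structure as a normalised supervector: posterior-weighted deviations from the component means, scaled by the standard deviation and a weight-dependent factor.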



• Supervector

• Fisher vector

Comparison



Comparison

Diagonal covariance matrix

Diagonal covariance with same derivation

Posterior estimation of

The two representations are almost the same even with different motivations.



• Learn from existing code

http://www.vlfeat.org/api/fisher.html

• Learn from public GMM code

• Be careful with pitfalls

– Probabilities can be as small as the machine’s rounding error: compute log P instead of P

– Try different normalization strategies

– Try to make the code efficient
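The first pitfall can be made concrete. For high-dimensional features such as 128-D SIFT, Gaussian densities underflow to zero in double precision, so the posteriors are better computed from log-probabilities with the log-sum-exp trick. A minimal sketch for a diagonal-covariance GMM; the function name `gmm_posteriors` is hypothetical.

```python
import numpy as np

def gmm_posteriors(x, w, mu, var):
    """Posteriors p(k | x_n) for a diagonal-covariance GMM, computed from
    log-probabilities. x: (n, d); w: (K,); mu, var: (K, d). Returns (n, K)."""
    diff2 = (x[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    # log(w_k * N(x_n | mu_k, var_k)) for every feature and component.
    log_p = (np.log(w)[None, :]
             - 0.5 * np.log(2.0 * np.pi * var).sum(axis=1)[None, :]
             - 0.5 * diff2.sum(axis=2))
    # Log-sum-exp: subtract the row maximum before exponentiating,
    # so the largest term becomes exp(0) = 1 and nothing underflows to 0/0.
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```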

How to Code Your Own


• Coding: to map local features into a compact representation

– Hard quantization (VQ)

– Sparse coding

– LLC

– Mapping using GMM

• Pooling: to aggregate local codes into an image-level representation

– Average pooling (histogram aggregation)

– Max pooling

– Supervector and Fisher vector

Review of Classification Model

A Unified Perspective


Review of Classification Model

Method | Coding | Pooling

Histogram of SIFT | Vector quantization | Histogram aggregation

Uncertainty-Based Quantization | Soft quantization | Histogram aggregation

Sparse Coding | Soft quantization | Max pooling

Fisher vector / Supervector | GMM probability estimation | GMM adaptation


1) Project proposal: slides on Tuesday, Feb 25, and presentation in class on Feb 27

Required content: http://rogerioferis.com/VisualRecognitionAndSearch2014/material/ProjectProposal.pdf

2) Fisher vector (ECCV 2010) paper review is out: March 11

Required content: http://rogerioferis.com/VisualRecognitionAndSearch2014/PaperReviews.html

3) Check the student presentation dates:

http://rogerioferis.com/VisualRecognitionAndSearch2014/material/Deadlines.pdf

Deadlines
