Visual Recognition And Search, Columbia University, Spring 2014

EECS 6890 – Topics in Information Processing

Spring 2014, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch2014

Class 4 Feature Coding and Pooling

Liangliang Cao, Feb 20, 2014

General Scene/Object Recognition

Problem

Search engines, traditional companies, mobile apps

Problem

http://www.vision.caltech.edu

Examples of Object Recognition Dataset

http://groups.csail.mit.edu/vision/SUN/

Problem

Outline

• Histogram of local features

• Bag of words model

• Sparse coding and related soft assignment

• Fisher vector and supervector

Histogram of Local Features

Bag of Words Models

• Powerful local features

– DoG

– Hessian, Harris

– Dense-sampling

Recall of Last Class

Non-fixed number of local regions per image!

Bag of Words Models

• Histograms can provide a fixed size representation of images

• Spatial pyramid/gridding can enhance the histogram representation with spatial information

Recall of Last Class (2)

Bag of Words Models

Histogram of Local Features

[Figure: histogram of local features; x-axis: codewords, y-axis: frequency]

dim = #codewords
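The histogram on this slide can be sketched in a few lines. This is a minimal illustration under assumptions, not the lecture's own code: the function name `bow_histogram` and the toy codebook and descriptors are made up, and nearest-codeword assignment with Euclidean distance is assumed.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword and count."""
    # Squared Euclidean distance between every descriptor and every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest codeword per descriptor
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    # Normalize so images with different numbers of patches are comparable.
    return hist / hist.sum()

# Toy data (hypothetical): 3 codewords in 2-D, 4 local descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])
hist = bow_histogram(descriptors, codebook)  # one bin per codeword
```

The output length equals the number of codewords regardless of how many patches the image has, which is what gives the fixed-size representation.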

Bag of Words Models

Histogram of Local Features (2)

[Figure: one histogram per grid cell, concatenated]

dim = #codewords × #grids
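A minimal sketch of the gridded histogram, assuming patches come with (x, y) locations and precomputed codeword indices. The function name `grid_histogram`, the 2 × 2 grid, and the toy inputs are assumptions for illustration only.

```python
import numpy as np

def grid_histogram(word_ids, xy, image_size, n_words, grid=(2, 2)):
    """Concatenate one codeword histogram per grid cell."""
    width, height = image_size
    cols, rows = grid
    # Map each patch location to a cell; clip so border points stay inside.
    cx = np.minimum((xy[:, 0] * cols / width).astype(int), cols - 1)
    cy = np.minimum((xy[:, 1] * rows / height).astype(int), rows - 1)
    cell = cy * cols + cx
    hist = np.zeros((rows * cols, n_words))
    np.add.at(hist, (cell, word_ids), 1.0)  # count codewords inside each cell
    return hist.ravel()  # length = n_words * rows * cols

# Toy input (hypothetical): 4 patches, 3 codewords, a 100 x 100 image.
word_ids = np.array([0, 2, 1, 2])
xy = np.array([[10.0, 10.0], [90.0, 10.0], [10.0, 90.0], [95.0, 95.0]])
feature = grid_histogram(word_ids, xy, image_size=(100, 100), n_words=3)
```

A full spatial pyramid would repeat this at several grid resolutions and concatenate the results.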

Local Feature Quantization

Bag of Words Models

Slide courtesy of Fei-Fei Li

Local Feature Quantization

Bag of Words Models

Local Feature Quantization

Bag of Words Models

- Vector quantization

- Dictionary learning
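Both steps can be sketched together: learn a dictionary with k-means, then quantize descriptors against it. This is a plain Lloyd's-algorithm sketch under assumptions (function names, iteration count, and the synthetic two-cluster data are all made up); a real system would typically use a library implementation.

```python
import numpy as np

def kmeans_codebook(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns k codewords (cluster centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def quantize(X, centers):
    """Vector quantization: index of the nearest codeword for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Synthetic descriptors (hypothetical): two tight clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
codebook = kmeans_codebook(X, k=2)
words = quantize(X, codebook)
```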

Dictionary for Codewords

Histogram of Local Features

Picture courtesy of Fei-Fei Li

Bag of Words Models

Most slides in this section are courtesy of Fei-Fei Li

Object → Bag of ‘words’

Bag of Words Models

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Underlying Assumptions - Text
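The text assumption can be made concrete: a document is reduced to counts of vocabulary words, and word order is discarded. A minimal sketch, assuming a hand-picked five-word vocabulary (hypothetical) and one sentence from the news example above:

```python
import re
from collections import Counter

# Hypothetical vocabulary, chosen from the keyword list on the slide.
vocabulary = ["china", "trade", "yuan", "us", "value"]

text = ("China increased the value of the yuan against the dollar by 2.1% "
        "in July and permitted it to trade within a narrow band, but the US "
        "wants the yuan to be allowed to trade freely.")

# Lowercase and split into alphabetic tokens; word order is thrown away.
tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(tokens)
bag = [counts[word] for word in vocabulary]  # fixed-length count vector
```

Any permutation of the sentence's words yields the same vector, which is exactly the information the model ignores.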

Bag of Words Models

Underlying Assumptions - Image

[Figure: learning and recognition pipeline. Labels: feature detection & representation; K-means; image representation; category models (and/or) classifiers; category decision]

Bag of Words Models

Borrowing Techniques from Text Classification

• PLSA

• Naïve Bayesian Model

Notations

• w_n: each patch in an image

– w_n = [0,0,…,1,…,0,0]^T

• w: a collection of all N patches in an image

– w = [w_1,w_2,…,w_N]

• d_j: the j-th image in an image collection

• c: category of the image

• z: theme or topic of the patch

Probabilistic Latent Semantic Analysis (pLSA), Hofmann, 2001

[Graphical model: nodes d, z, w; plates N and D]

Latent Dirichlet Allocation (LDA), Blei et al., 2001

[Graphical model: nodes c, π, z, w; plates N and D]

Bag of Words Models

[Graphical model: nodes d, z, w; plates N and D]

Bag of Words Models

Probabilistic Latent Semantic Analysis (pLSA)

“face”

Sivic et al. ICCV 2005

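[Graphical model: nodes d, z, w; plates N and D]

$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

• $p(w_i \mid d_j)$: observed codeword distributions

• $p(w_i \mid z_k)$: codeword distributions per theme (topic)

• $p(z_k \mid d_j)$: theme distributions per image

Slide credit: Josef Sivic

Bag of Words Models

Parameters are estimated by EM or Gibbs sampling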
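The EM route can be sketched as follows. This is a minimal illustration under assumptions, not the code behind the cited results: the function name `plsa`, the random initialization, the iteration count, and the toy count matrix are all made up. It fits the mixture $p(w \mid d) = \sum_k p(w \mid z_k)\, p(z_k \mid d)$ to a codewords × images count matrix.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """EM for pLSA on a (codewords x images) count matrix.

    Returns p(w|z) with shape (W, K) and p(z|d) with shape (K, D).
    """
    rng = np.random.default_rng(seed)
    W, D = counts.shape
    p_w_z = rng.random((W, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, D))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior p(z | w, d), shape (W, K, D).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy counts (hypothetical): 4 codewords x 4 images with two obvious themes.
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 0, 1],
                   [0, 0, 6, 5],
                   [0, 1, 5, 6]], dtype=float)
p_w_z, p_z_d = plsa(counts, n_topics=2)
p_w_d = p_w_z @ p_z_d                   # reconstructed p(w|d), one column per image
topic_per_image = p_z_d.argmax(axis=0)  # most probable theme for each image
```

The E-step tensor has W × K × D entries, so this dense form only suits small problems.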

$z^{*} = \arg\max_{z}\, p(z \mid d)$

Slide credit: Josef Sivic

Bag of Words Models

Recognition using pLSA

[Graphical model: nodes c, π, z, w; plates N and D]

Latent Dirichlet Allocation (LDA)

Fei-Fei et al. ICCV 2005

“beach”

Bag of Words Models

Scene Recognition using LDA

Bag of Words Models

Spatial-Coherent Latent Topic Model

Cao and Fei-Fei, ICCV 2007

Bag of Words Models

Simultaneous Segmentation and Recognition

Bag of Words Models

Pros and Cons

Bag of Words Models are good at

- Modeling prior knowledge

- Providing intuitive interpretation

But these models suffer from

- Loss of information in quantization of “visual words”

- Loss of spatial information

Images differ from texts!

Better coding

Better pooling

Sparse Coding and Soft Quantization

Soft Quantization

Hard Quantization

Soft Quantization

Hard quantization

A more general but hard to solve representation

In practice we consider

Sparse Coding-Based Quantization

s.t.

Sparse coding
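The equations on this slide did not survive extraction. As a hedged reconstruction, the three formulations are commonly written as below, in the notation of Yang et al. (CVPR 2009): $x_m$ is a local descriptor, $V$ the codebook with codewords $v_k$, and $u_m$ the code of $x_m$. The intermediate cardinality-bounded form and the bound $L$ are assumptions about what the slide showed.

```latex
% Hard quantization (VQ): exactly one active codeword per descriptor
\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2
\quad \text{s.t.} \quad \mathrm{Card}(u_m) = 1,\; \| u_m \|_1 = 1,\; u_m \ge 0 \;\; \forall m

% More general but hard to solve: allow up to L active codewords (assumed form)
\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2
\quad \text{s.t.} \quad \mathrm{Card}(u_m) \le L \;\; \forall m

% Sparse coding: replace the cardinality constraint by an L1 penalty
\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2 + \lambda \| u_m \|_1
\quad \text{s.t.} \quad \| v_k \| \le 1 \;\; \forall k
```

The last problem is convex in $U$ for fixed $V$ and convex in $V$ for fixed $U$, which is why it is the one used in practice.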

A Famous Paper on Sparse Coding

Yang et al. obtained good recognition accuracy by combining sparse coding with spatial pyramid matching (ScSPM).

Yang et al, Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

Sparse coding + spatial pyramid

Sparse Coding

Clarify One Ambiguity

Hard quantization: sparsest solution!

Sparse coding based representation: less sparse!

1. Hard quantization (VQ) is the sparsest representation. Sparse coding is less sparse.
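To see the difference on a single patch, here is a minimal sketch that codes one descriptor both ways. The solver is iterative soft-thresholding (ISTA), chosen for brevity and not necessarily the solver used in ScSPM; the function names, the three-codeword dictionary, and the value of `lam` are assumptions.

```python
import numpy as np

def objective(x, B, a, lam):
    """Sparse coding objective: 0.5 * ||x - B a||^2 + lam * ||a||_1."""
    return 0.5 * np.sum((x - B @ a) ** 2) + lam * np.sum(np.abs(a))

def ista_sparse_code(x, B, lam=0.05, n_iter=200):
    """Minimize the objective above by iterative soft-thresholding."""
    L = np.linalg.norm(B, 2) ** 2  # Lipschitz constant of the smooth part
    a = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = a - B.T @ (B @ a - x) / L  # gradient step on the reconstruction term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# Toy dictionary (hypothetical): columns are unit-norm codewords in 2-D.
B = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])
x = np.array([0.5, 0.6])

# Hard quantization: a one-hot code on the nearest codeword.
nearest = ((B - x[:, None]) ** 2).sum(axis=0).argmin()
hard_code = np.zeros(B.shape[1])
hard_code[nearest] = 1.0

# Sparse coding: real-valued weights, possibly on several codewords.
sparse_code = ista_sparse_code(x, B)
```

The one-hot code always has exactly one non-zero entry, while the sparse code is free to use more than one codeword with real-valued weights.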

Sparse Coding

Both hard quantization and sparse coding are applied to every patch.

To get an image-level representation, hard quantization uses “sum” (also called average pooling), while sparse coding uses “max” (also called max pooling).

When there are many patches and the codebook is small, the final image-level representation will be non-sparse.

Clarify Another Ambiguity
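The two pooling rules differ by a single line. A minimal sketch with made-up per-patch codes (rows are patches, columns are codewords):

```python
import numpy as np

# One-hot VQ codes for three patches (toy values).
hard_codes = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 1, 0]], dtype=float)

# Soft codes with at most two non-zeros per patch (toy values).
sparse_codes = np.array([[0.9, 0.1, 0.0],
                         [0.0, 0.7, 0.3],
                         [0.2, 0.8, 0.0]])

avg_pooled = hard_codes.mean(axis=0)           # sum/average pooling: a histogram
max_pooled = np.abs(sparse_codes).max(axis=0)  # max pooling over patches
```

Each patch code above has at most two non-zero entries, yet the max-pooled vector has none that are zero, which is the point of the third statement on the slide.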

Sparse Coding

If “sparsity” is not the key factor,

– Sparse coding is less sparse than hard quantization

– Image-level representation may be dense after pooling local sparse codes

then how do we understand the success of ScSPM?

– Sparse coding keeps more information than hard quantization

– Sometimes sparsity is less important than locality.

Reflection

Sparse Coding

Sparse coding

Locality-constrained linear coding (LLC)

Can you guess which method is faster?

Locality-constrained Linear Coding

Wang, Yang, Yu et al, Locality-constrained Linear Coding for Image Classification, CVPR 2010

From Sparse Coding to LLC

• LLC is much faster than sparse coding

– Suppose the codebook size is M (e.g., M = 1K)

– Find K nearest neighbors (e.g., K = 5)

– The computational complexity

• Sparse coding on M codewords: O(1000a)

• Sparse coding on K codewords: O(5a)

Faster and higher accuracy!

Locality-constrained Linear Coding
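The speed-up comes from solving a tiny least-squares problem on the K nearest codewords instead of coding against all M. A minimal sketch of this approximated LLC step, assuming Euclidean nearest neighbors and a sum-to-one constraint on the weights; the function name `llc_code`, the value of `beta`, and the random codebook are assumptions, and this is not a port of the released Matlab code.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximate LLC: code x using only its k nearest codewords."""
    d = ((codebook - x) ** 2).sum(axis=1)
    idx = np.argpartition(d, k - 1)[:k]   # the k nearest codewords (unordered)
    z = codebook[idx] - x                 # shift the local bases to the origin
    C = z @ z.T                           # k x k local covariance
    C += np.eye(k) * beta * max(np.trace(C), 1e-12)  # regularize for stability
    w = np.linalg.solve(C, np.ones(k))    # small k x k linear system
    w /= w.sum()                          # enforce the sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w
    return code

# Toy sizes (hypothetical): M = 1000 codewords in 16-D, one descriptor.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(1000, 16))
x = rng.normal(size=16)
code = llc_code(x, codebook, k=5)
```

Only a 5 × 5 system is solved per descriptor; the remaining cost is the nearest-neighbor search over the codebook.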

From Sparse Coding to LLC

Hands-on Experience of LLC

• Could you improve Jianchao Yang’s code? (http://www.ifp.illinois.edu/~jyang29/LLC.htm)

You may know Matlab’s way of searching for top-k is terrible...
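One possible improvement is to avoid a full sort when only the top k are needed. The sketch below is in Python with NumPy rather than Matlab, purely as an illustration of the idea; the function name `top_k_smallest` and the toy distances are made up.

```python
import numpy as np

def top_k_smallest(values, k):
    """Indices of the k smallest entries, ordered by value."""
    part = np.argpartition(values, k - 1)[:k]  # selection, no full sort
    return part[np.argsort(values[part])]      # sort only the k survivors

# Toy distances from one descriptor to six codewords.
dists = np.array([0.9, 0.1, 0.5, 0.3, 0.7, 0.2])
nearest = top_k_smallest(dists, 3)
```

With M = 1K codewords and K = 5, selection touches every distance once and then sorts only five values.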

This is regularized least-squares regression. Do you want to try

• standard sparse coding,

• regression after PCA,

• or more general subspace learning?

Another Soft Quantization

Model the uncertainty across multiple codewords

Uncertainty-Based Quantization

Gemert et al, Visual Word Ambiguity, PAMI 2009

Soft Quantization

Intuition of UNC

Hard quantization

Soft quantization based on uncertainty

Gemert et al, Visual Word Ambiguity, PAMI 2009
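A minimal sketch of the uncertainty-based idea: each patch spreads a total mass of 1 over all codewords through a Gaussian kernel on distance, instead of voting for one codeword. The function name, the kernel width `sigma`, and the 1-D toy data are assumptions.

```python
import numpy as np

def kernel_codebook_histogram(descriptors, codebook, sigma=1.0):
    """Soft-assignment histogram with per-patch normalized Gaussian weights."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Subtract the row minimum before exponentiating to avoid underflow.
    k = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    weights = k / k.sum(axis=1, keepdims=True)  # each patch distributes mass 1
    return weights.mean(axis=0)

# Toy data (hypothetical): two 1-D codewords; one ambiguous patch, one clear patch.
codebook = np.array([[0.0], [2.0]])
descriptors = np.array([[1.0], [0.0]])
hist = kernel_codebook_histogram(descriptors, codebook)
```

The patch at 1.0 is equidistant from both codewords and contributes half a vote to each, where hard quantization would have to pick one arbitrarily.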

Soft Quantization

Improvement of UNC

Fisher Vector and Supervector

One of the most powerful image/video classification techniques

Thanks to Zhen Li and Qiang Chen for their constructive suggestions on this section

Fisher Vector and Supervector

Winning Systems

Classification task: 2009, 2010, 2011, 2012

Large Scale Visual Recognition Challenge: 2010, 2011

Fisher Vector and Supervector

Literature

Supervector

[1] Yan et al., Regression from patch-kernel. CVPR 2008

[2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008

[3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009: 1971-1977

[4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010

Fisher Vector

[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010

[6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011

These papers are not very easy to read.

Let me take a simplified perspective via the coding & pooling framework

Fisher Vector and Supervector

• Coding with hard assignment

• Coding with soft assignment

• How to keep all the information?

Coding without Information Loss

Fisher Vector and Supervector

An Intuitive Illustration

Coding

Fisher Vector and Supervector

An Intuitive Illustration

Coding: Component 1, Component 2, Component 3

Fisher Vector and Supervector

An Intuitive Illustration

[Figure: Pooling. The codes belonging to Component 1, Component 2, and Component 3 are summed (+) within each component]

Fisher Vector and Supervector

Implementation of Supervector

In speech (speaker identification), a supervector refers to the stacked means of adapted GMMs.

Supervector =
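The formula on this slide did not survive extraction. As a hedged sketch of the idea stated above, the code below adapts the means of a diagonal-covariance GMM to one image's descriptors and stacks them. The function names, the `relevance` parameter (0 gives the plain posterior-weighted mean; larger values pull toward the prior mean, as in MAP adaptation), and the toy GMM are assumptions.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Responsibilities of a diagonal-covariance GMM, computed in the log domain."""
    diff = X[:, None, :] - means[None, :, :]
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)

def supervector(X, weights, means, variances, relevance=0.0):
    """Stack the adapted component means into one long vector."""
    post = gmm_posteriors(X, weights, means, variances)  # (N, K)
    n_k = post.sum(axis=0)                               # soft counts per component
    first = post.T @ X                                   # (K, D) weighted sums
    adapted = (first + relevance * means) / (n_k[:, None] + relevance + 1e-12)
    return adapted.ravel()                               # length K * D

# Toy GMM (hypothetical): two 1-D components far apart, four descriptors.
weights = np.array([0.5, 0.5])
means = np.array([[0.0], [10.0]])
variances = np.ones((2, 1))
X = np.array([[-0.5], [0.5], [9.0], [11.0]])
sv = supervector(X, weights, means, variances)
```

Coding here is the posterior computation, and pooling is the weighted mean per component.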

Fisher Vector and Supervector

Normalization of Supervector

In practice, a normalization process using the covariance matrix often improves the performance.

Moreover, we can subtract the original mean vector for ease of normalization.

The representation is also called Hierarchical Gaussianization (HG).

Fisher Vector and Supervector

Fisher Vector

Let the data be modeled by a probability density function with parameters.

[Jaakkola and Haussler, NIPS 98] suggested that X can be described by the derivative of the log-likelihood with respect to those parameters.

Now we can define the Fisher kernel, in which the normalizing matrix is called the Fisher information matrix.
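The symbols on this slide were lost in extraction. In the standard formulation, the three objects it names can be written as below; the notation ($\theta$ for the parameters, $U_X$ for the score, $F_\theta$ for the Fisher information matrix) is an assumption and may differ from the slide's.

```latex
% Score: gradient of the log-likelihood of X with respect to the parameters
U_X = \nabla_{\theta} \log p(X \mid \theta)

% Fisher information matrix
F_{\theta} = \mathbb{E}_{X}\!\left[ U_X U_X^{\top} \right]

% Fisher kernel between two samples X and Y
K(X, Y) = U_X^{\top} F_{\theta}^{-1} U_Y
```

Since $F_\theta$ is symmetric positive definite, $F_\theta^{-1} = L^{\top} L$ for some $L$, so the kernel is a plain dot product between the normalized vectors $L\,U_X$ and $L\,U_Y$; these normalized vectors are what is called Fisher vectors.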

Fisher Vector and Supervector

Fisher Vector with GMM

Let

Consider the Gaussian Mixture Model

We consider

With GMM, Fisher vectors can be obtained:

The Fisher vector
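The equations on this slide were lost in extraction. As a hedged sketch, the code below follows the commonly used form from Perronnin et al. (ECCV 2010): gradients with respect to the means and standard deviations of a diagonal-covariance GMM, followed by power and L2 normalization. The function name and the toy GMM are assumptions, and the slide's exact notation may differ.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Fisher vector of descriptors X under a diagonal-covariance GMM."""
    N = len(X)
    sigma = np.sqrt(variances)
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]  # (N, K, D)
    # Posteriors gamma_ik, computed in the log domain for stability.
    log_p = ((np.log(weights) - 0.5 * np.log(2 * np.pi * variances).sum(axis=1))[None, :]
             - 0.5 * (diff ** 2).sum(axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    # Gradient with respect to the means.
    g_mu = (post[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(weights)[:, None])
    # Gradient with respect to the standard deviations.
    g_sigma = ((post[:, :, None] * (diff ** 2 - 1)).sum(axis=0)
               / (N * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])  # length 2 * K * D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)              # L2 normalization

# Toy GMM (hypothetical): K = 2 components in D = 3 dimensions.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]])
variances = np.ones((2, 3))
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
fv = fisher_vector(X, weights, means, variances)
```

The mean-gradient part is a posterior-weighted, variance-normalized sum of deviations from each component mean, which is close to a normalized supervector.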

Fisher Vector and Supervector

• Supervector

• Fisher vector

Comparison

Fisher Vector and Supervector

Comparison

Diagonal covariance matrix

Diagonal covariance with same derivation

Posterior estimation of

The two representations are almost the same despite their different motivations.

Fisher Vector and Supervector

• Learn from existing code

http://www.vlfeat.org/api/fisher.html

• Learn from public GMM code

• Be careful with pitfalls

– Probabilities can be as small as the machine’s rounding error: compute log P instead of P

– Try different normalization strategies

– Try to make the code efficient

How to Code Your Own
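The first pitfall can be shown with the standard library alone. A minimal sketch, assuming three made-up log joint probabilities; the helper names are hypothetical.

```python
import math

def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) by factoring out the maximum."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

def posteriors_from_logs(log_joint):
    """Normalize log joint probabilities into posteriors without underflow."""
    norm = log_sum_exp(log_joint)
    return [math.exp(v - norm) for v in log_joint]

# Log joint probabilities of one descriptor under three GMM components (toy values).
log_joint = [-1000.0, -1001.0, -1005.0]

naive = [math.exp(v) for v in log_joint]  # every entry underflows to 0.0
post = posteriors_from_logs(log_joint)    # well-defined posteriors
```

Normalizing `naive` would divide zero by zero, while the log-domain version recovers the posteriors from the differences between the log values.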

• Coding: to map local features into a compact representation

– Hard quantization (VQ)

– Sparse coding

– LLC

– Mapping using GMM

• Pooling: to aggregate local codes into an image-level representation

– Average pooling (histogram aggregation)

– Max pooling

– Supervector and Fisher vector

Review of Classification Model

A Unified Perspective

Review of Classification Model

Method | Coding | Pooling

Histogram of SIFT | Vector quantization | Histogram aggregation

Uncertainty-Based Quantization | Soft quantization | Histogram aggregation

Sparse Coding | Soft quantization | Max pooling

Fisher vector / Supervector | GMM probability estimation | GMM adaptation

1) Project proposal: slides on Tuesday, Feb 25 and presentation in class on Feb 27

Required content: http://rogerioferis.com/VisualRecognitionAndSearch2014/material/ProjectProposal.pdf

2) Fisher vector (ECCV 2010) paper review is out: March 11

Required content: http://rogerioferis.com/VisualRecognitionAndSearch2014/PaperReviews.html

3) Check the student presentation dates:

http://rogerioferis.com/VisualRecognitionAndSearch2014/material/Deadlines.pdf

Deadlines