
Page 1: Fisher Vector Encoding

University of Central Florida

Gonzalo Vaca-Castano

Page 2: Papers used for this presentation

• Florent Perronnin and Christopher Dance. Fisher Kernels on Visual Vocabularies for Image Categorization. CVPR 2007.

• Florent Perronnin, Jorge Sánchez, and Thomas Mensink. Improving the Fisher Kernel for Large-Scale Image Classification. ECCV 2010.

• Jorge Sánchez, Florent Perronnin, Thomas Mensink, and Jakob Verbeek. Image Classification with the Fisher Vector: Theory and Practice. IJCV 2013.


Page 3: Motivation

• Bag of Words (BoW) is the most common image representation method.

Page 4: Fisher Vector

• Define the number of Gaussians.

• Estimate the GMM parameters by maximum likelihood (MLE).

• Encode the image descriptors with the Fisher vector.

• Classify with an SVM (a sketch of this pipeline is given below).
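A minimal Python sketch of this pipeline, assuming local descriptors (e.g., SIFT) have already been extracted; it uses scikit-learn's GaussianMixture and LinearSVC purely for illustration (not the original authors' code), and names such as train_descriptor_sets and train_labels are placeholders:

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import LinearSVC

    def fit_gmm(descriptors, n_gaussians=128):
        # MLE estimate of a diagonal-covariance GMM on a pool of local descriptors.
        gmm = GaussianMixture(n_components=n_gaussians, covariance_type='diag')
        return gmm.fit(descriptors)

    def fisher_vector(descriptors, gmm):
        # descriptors: (T, D) local features of one image.
        T = descriptors.shape[0]
        w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_   # (K,), (K, D), (K, D)
        gamma = gmm.predict_proba(descriptors)                    # soft assignments, (T, K)
        diff = (descriptors[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
        # Gradients w.r.t. the means and standard deviations, accumulated over descriptors.
        g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(w)[:, None])
        g_var = np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1.0) / (T * np.sqrt(2.0 * w)[:, None])
        fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalization
        return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalization

    # Usage: one Fisher vector per image, then a linear SVM on top.
    # gmm = fit_gmm(np.vstack(train_descriptor_sets))
    # X_train = np.array([fisher_vector(d, gmm) for d in train_descriptor_sets])
    # clf = LinearSVC().fit(X_train, train_labels)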

Page 5: Fisher Vector

Page 6: Problem

• In retrieval:

– the larger the dataset, the higher the probability of finding another image that is similar but irrelevant to a given query.

• In classification:

– the larger the number of classes, the higher the probability of finding a class that is similar to any given class.

Page 7: Motivation

• In retrieval:

– the larger the dataset, the higher the probability of finding another image that is similar but irrelevant to a given query.

• In classification:

– the larger the number of classes, the higher the probability of finding a class that is similar to any given class.

We need an image representation that contains fine-grained information!

Page 8: Motivation

• The BoW answer:

– increase the visual vocabulary size.

• How can we increase the amount of information without increasing the visual vocabulary size?

– BoV only counts visual-word occurrences.

– Include higher-order statistics (mean, covariance) in the representation.

Page 9: Motivation

• Mean

Page 10: Motivation

• Variance

Page 11: Fisher Vector Idea

• Characterize a sample by its deviation from the generative model (a GMM).

• The deviation is measured by computing the gradient of the sample log-likelihood with respect to the model parameters (w, μ, σ).

Page 12: Fisher Vector

• Let $X = \{x_t,\ t = 1, \ldots, T\}$ be a set of T samples.

• Let $\lambda = [\lambda_1, \ldots, \lambda_M]^\top$ be the vector of M model parameters.

• The likelihood of the data under the model is $p(X|\lambda)$.

• In statistics, the score function (informant) is the gradient of the log-likelihood with respect to the parameters: $G_\lambda^X = \nabla_\lambda \log p(X|\lambda)$.

• Intuition: it is the direction in which the parameters λ of the model should be modified to better fit the data.

Page 13: Fisher Vector

• The score function is a representation of the data using higher-order statistics.

• The gradient of the log-likelihood describes the direction in which the parameters should be modified to best fit the data.

• Its dimension depends on the number of parameters M, not on the number of samples T.

• It is important to normalize the input vectors, since most discriminative classifiers use an inner-product term.

Page 14: How to Normalize?

• Fisher information matrix (FIM): $F_\lambda = E\big[G\,G^\top\big]$, where $G$ is the score.

• The FIM is the variance of the score G:

• $\mathrm{Var}[G] = E[G^2] - (E[G])^2$.

• But $E[G] = 0$ (see next slide),

• so $\mathrm{Var}[G] = E[G^2]$.

Page 15: Score mean is zero

• $E_x\!\left[\frac{\partial}{\partial\lambda}\log p(x|\lambda)\right] = E_x\!\left[\frac{\frac{\partial}{\partial\lambda} p(x|\lambda)}{p(x|\lambda)}\right] = \int \frac{\frac{\partial}{\partial\lambda} p(x|\lambda)}{p(x|\lambda)}\, p(x|\lambda)\, dx = \frac{\partial}{\partial\lambda}\int p(x|\lambda)\, dx = \frac{\partial}{\partial\lambda} 1 = 0$

Page 16: How to measure distances?

• Use the FIM $F_\lambda$ to normalize distances between gradient vectors.

• Fisher Kernel: $K(X, Y) = (G_\lambda^X)^\top F_\lambda^{-1}\, G_\lambda^Y$.

• $F_\lambda$ is symmetric and positive semi-definite (it is the covariance of the score),

• so $F_\lambda^{-1}$ has a Cholesky decomposition $F_\lambda^{-1} = L_\lambda^\top L_\lambda$.

• The Fisher Kernel becomes $K(X, Y) = (\mathcal{G}_\lambda^X)^\top \mathcal{G}_\lambda^Y$,

• where $\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X$ is the Fisher Vector (a numeric check of this identity follows below).
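A small numerical sketch, assuming a synthetic positive-definite matrix in place of the true FIM, showing that whitening the score vectors by the Cholesky factor of $F_\lambda^{-1}$ turns the Fisher kernel into a plain dot product:

    import numpy as np

    rng = np.random.default_rng(0)
    M = 5                                             # number of model parameters (toy example)
    A = rng.normal(size=(M, M))
    F = A @ A.T + M * np.eye(M)                       # synthetic symmetric positive-definite "FIM"
    gx, gy = rng.normal(size=M), rng.normal(size=M)   # toy score vectors G_lambda^X, G_lambda^Y

    # Fisher kernel computed directly: K(X, Y) = gx' F^{-1} gy
    k_direct = gx @ np.linalg.solve(F, gy)

    # Cholesky factor L with F^{-1} = L' L, so K(X, Y) = (L gx)' (L gy)
    L = np.linalg.cholesky(np.linalg.inv(F)).T
    fv_x, fv_y = L @ gx, L @ gy                       # "Fisher vectors"
    k_linear = fv_x @ fv_y

    assert np.isclose(k_direct, k_linear)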

Page 17: Important Observation

• The Fisher Kernel is non-linear,

• but it reduces to a linear kernel when the Fisher vector is used as the feature vector.

• Consequence: linear classifiers can be learned very efficiently.

Page 18: Fisher Vector on Images

• The Fisher vector is given by $\mathcal{G}_\lambda^X = L_\lambda \nabla_\lambda \log p(X|\lambda)$.

• Assuming that the samples (SIFT descriptors) are independent, $p(x_1, x_2, \ldots, x_T) = p(x_1)\, p(x_2) \cdots p(x_T)$, so $\log p(X|\lambda) = \sum_{t=1}^{T} \log p(x_t|\lambda)$.

• The FV is therefore a sum of normalized gradient statistics computed for each descriptor: $\mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log p(x_t|\lambda)$.

Page 19: GMM case

• The model is a GMM: $p(x|\lambda) = \sum_{k=1}^{K} w_k\, p_k(x|\mu_k, \Sigma_k)$.

• Assume the covariance matrices are diagonal (uncorrelated dimensions): $\Sigma_k = \mathrm{diag}(\sigma_k^2)$.

Page 20: Score Function for GMM

• Soft assignment (posterior probability of Gaussian k for descriptor $x_t$): $\gamma_t(k) = \dfrac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$. The corresponding gradients are given below.
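For a diagonal GMM, the gradients of the per-descriptor log-likelihood (the score function) take the following form, as derived in the cited papers, with all operations element-wise over the D dimensions:

$\dfrac{\partial \log p(x_t|\lambda)}{\partial \mu_k} = \gamma_t(k)\, \dfrac{x_t - \mu_k}{\sigma_k^2}, \qquad \dfrac{\partial \log p(x_t|\lambda)}{\partial \sigma_k} = \gamma_t(k)\left[\dfrac{(x_t - \mu_k)^2}{\sigma_k^3} - \dfrac{1}{\sigma_k}\right]$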

Page 21: Fisher Normalization

• See Appendix A of the referenced paper for the closed-form approximation of the FIM used for normalization.

• The resulting normalized Fisher vector components are shown below.
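Under the diagonal-FIM approximation, the normalized Fisher vector components, as given in the cited papers, are:

$\mathcal{G}_{\mu,k}^X = \dfrac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \dfrac{x_t - \mu_k}{\sigma_k}, \qquad \mathcal{G}_{\sigma,k}^X = \dfrac{1}{T\sqrt{2 w_k}} \sum_{t=1}^{T} \gamma_t(k)\left[\dfrac{(x_t - \mu_k)^2}{\sigma_k^2} - 1\right]$

The final Fisher vector concatenates $\mathcal{G}_{\mu,k}^X$ and $\mathcal{G}_{\sigma,k}^X$ over all K Gaussians.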

Page 22: Fisher Vector

Page 23: Fisher Vector

Closely related to BoW (soft assignment).

Closely related to VLAD (which can be seen as a hard-assignment, first-order-only simplification of the FV).

Page 24: Fisher Vector: Comparison with BoW

Advantages

• BoV is a particular case of the FV in which the gradient computation is restricted to the mixture-weight parameters of the GMM.

• The FV can be computed from much smaller vocabularies, and therefore at a lower computational cost.

• It performs well even with simple linear classifiers.

Disadvantages

• Requires more storage: $(2D + 1)N - 1$ dimensions (a worked example follows below), where

D = feature dimension

N = number of codewords (Gaussians)
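As a worked example with the settings used later in the deck (N = 128 Gaussians) and a hypothetical descriptor dimension of D = 64 after PCA, the FV has $(2 \cdot 64 + 1)\cdot 128 - 1 = 16{,}511$ dimensions, versus only 128 dimensions for a BoW histogram over the same vocabulary.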

Page 25: Specific Dictionary or Global Dictionary

• Case 1 (global dictionary):

– Train the GMM in an unsupervised manner with the low-level feature vectors from all categories, or even on a separate dataset.

• Case 2 (class-specific dictionaries):

– Train a vocabulary for each class.

– For one image, a representation is generated for each class.

Page 26: Experiments

Page 27: Fisher Vector

Page 28: Experimental setup

• Datasets: an in-house dataset and PASCAL VOC 2006.

• Best results:

– Number of Gaussians = 128.

– Gradients with respect to the mean and variance, concatenated.

– Dimensionality reduction of the descriptors using PCA.

– L2 normalization.

– Power normalization (see the note below).
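Power normalization, as described in the ECCV 2010 paper, applies $f(z) = \mathrm{sign}(z)\,|z|^{\alpha}$ element-wise to the FV (typically $\alpha = 0.5$, i.e., a signed square root) before L2 normalization; it reduces the peakiness/sparsity of the FV and improves performance with linear classifiers.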

Page 29: Additional notes (PASCAL VOC 2007)

• Effect of PCA

Page 30: Additional notes (PASCAL VOC 2007)

• Effect of Normalization

Page 31: Additional notes (PASCAL VOC 2007)

Effect of Lp normalization

Page 32: Additional notes