eccv2010: feature learning for image classification, part 1


Part 1: Classical Image Classification Methods

Kai Yu

Dept. of Media Analytics, NEC Laboratories America

Andrew Ng

Computer Science Dept., Stanford University

Outline of Part 2

04/10/23 2

• Local Features, Sampling, Visual Words

• Discriminative Methods
  - Bag-of-Words (BoW) representation
  - Spatial pyramid matching (SPM)

• Generative Methods
  - Part-based methods
  - Topic models











Local features


• Distinctive descriptors of local image patches
• Invariant to local translation, scale, …
• and sometimes rotation or general affine transformations
• The most famous choice is the SIFT feature

Sampling local features from images


A set of points

Image credits: F-F. Li, E. Nowak, J. Sivic

Visual words


• Similar points are grouped into one visual word
• Algorithms: k-means, agglomerative clustering, …
• Points from different images are then more easily compared.

Slide credit: Kristen Grauman







Bag-of-words (BoW) representation


Analogy to documents

Adapted from tutorial slides by Fei-Fei et al.

BoW for object categorization


• Works pretty well for whole-image classification

Slide credit: Svetlana Lazebnik

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

Unsupervised Dictionary Learning


image database

• Sample local features from images
• Run k-means or another clustering algorithm to get the dictionary
• The dictionary is also called a “codebook”

Figure: SIFT space partitioned into cluster regions R1, R2, R3
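
The dictionary-learning step above can be sketched with plain k-means (Lloyd's algorithm). This is a minimal illustration, not the tutorial's implementation: the function names, the fixed iteration count, and the toy 2-D points standing in for 128-D SIFT descriptors are all assumptions.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))

def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the codebook)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct sampled descriptors.
    centroids = [list(p) for p in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in descriptors:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Toy 2-D "descriptors" (hypothetical data) forming two obvious groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = kmeans(descriptors, k=2)
```

Each returned centroid plays the role of one visual word; in practice the codebook is learned from a large sample of descriptors and has far more than two entries.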

Compute BoW histogram for each image


• Assign SIFT features to clusters (R1, R2, R3)
• Compute the frequency of each cluster within an image

Figure: BoW histogram representations over clusters R1, R2, R3
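
The two steps above (assign each feature to its nearest cluster, then count) can be sketched as follows. The codebook values, the toy 2-D descriptors, and the choice to normalise counts to frequencies are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(descriptor, codebook):
    """Vector quantization: index of the nearest codebook entry."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))

def bow_histogram(descriptors, codebook, normalize=True):
    """Count how often each visual word occurs among the descriptors."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[assign(d, codebook)] += 1
    if not normalize:
        return counts
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical 3-word codebook standing in for regions R1, R2, R3.
codebook = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
image_a = [(0.1, 0.2), (4.8, 5.1), (5.2, 4.9), (0.3, 4.7)]  # 4 features
image_b = [(0.2, 0.1), (0.0, 0.3)]                          # 2 features

hist_a = bow_histogram(image_a, codebook)  # [0.25, 0.5, 0.25]
hist_b = bow_histogram(image_b, codebook)  # [1.0, 0.0, 0.0]
```

Both images yield a vector of length 3 even though they contain different numbers of features, which is what lets a standard classifier consume them.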

Interpretation of the BoW histogram


• Summarize the entire image based on its distribution of visual word occurrences

• Turn bags of different sizes into a fixed-length vector

• Analogous to the bag-of-words representation commonly used for text categorization.

Image classification based on BoW histogram


Figure: BoW histogram vector space, with “dog” and “bird” examples separated by a decision boundary

• Learn a classification model to determine the decision boundary
• Nonlinear SVMs are commonly applied.
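
The slide does not say which kernel makes the SVM nonlinear. As one illustration, the histogram intersection kernel is a common choice for histogram features; the sketch below assumes that kernel and uses hypothetical histograms, and it only builds the kernel (Gram) matrix rather than training a classifier.

```python
def intersection_kernel(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def gram_matrix(histograms, kernel):
    """Pairwise kernel values between all histograms."""
    return [[kernel(hi, hj) for hj in histograms] for hi in histograms]

# Hypothetical normalised BoW histograms for three images.
histograms = [
    [0.25, 0.50, 0.25],
    [1.00, 0.00, 0.00],
    [0.50, 0.50, 0.00],
]
K = gram_matrix(histograms, intersection_kernel)
```

A matrix like `K` can then be handed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.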

Issues


• Sampling strategy

• Learning the codebook: size? supervised? …

• Classification: which method? scalability?

• Scalability: how to handle millions of data points?

• How to use spatial information?

Spatial information


• BoW discards spatial layout.

• This increases invariance to scale, translation, and deformation,

• but sacrifices discriminative power, especially when the spatial layout is important.

Slide adapted from Bill Freeman

Spatial pyramid matching


• Compute BoW for image regions at different locations and at various scales

Figure credit: Svetlana Lazebnik
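
A simplified sketch of the idea: split the image into 1x1, 2x2, ... grids, compute one word-count histogram per cell, and concatenate. It assumes features are given as hypothetical `(x, y, word_index)` triples, and it omits the per-level weighting used in the full spatial pyramid matching kernel.

```python
def spatial_pyramid(features, width, height, vocab_size, levels=1):
    """Concatenate per-cell BoW counts for grid levels 0..levels.

    features: iterable of (x, y, word_index) triples.
    Level l splits the image into 2**l by 2**l cells.
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            # Clamp so points on the right/bottom border stay in the last cell.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][word] += 1
        for h in hists:
            pyramid.extend(h)
    return pyramid

# Two toy 100x100 images with the same words in swapped positions.
image_a = [(10, 10, 0), (90, 10, 1), (10, 90, 1), (90, 90, 0)]
image_b = [(10, 10, 1), (90, 10, 0), (10, 90, 0), (90, 90, 1)]

pyr_a = spatial_pyramid(image_a, 100, 100, vocab_size=2)
pyr_b = spatial_pyramid(image_b, 100, 100, vocab_size=2)
```

The first two entries (the whole-image histogram) are identical for both toy images, while the 2x2 cell histograms differ, so the pyramid recovers layout information that a single BoW histogram cannot distinguish.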

A common pipeline for discriminative image classification using BoW


Dictionary Learning: Dense/Sparse SIFT → K-means → dictionary

Image Classification: Dense/Sparse SIFT → VQ Coding → Spatial Pyramid Pooling → Nonlinear SVM

Combining multiple descriptors


Multiple Feature Detectors → Multiple Descriptors: SIFT, shape, color, … → VQ Coding and Spatial Pooling → Nonlinear SVM

Diagram from SurreyUVA_SRKDA, the winning team in PASCAL VOC 2008









Topic models for images

Figure: graphical model (plate notation) with nodes c, z, w and plates N, D

Latent Dirichlet Allocation (LDA)

Fei-Fei et al. ICCV 2005

“beach”

Slide credit: Fei-Fei Li
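
For orientation, the generative process of an LDA-style model with a class variable can be written as below. This is the standard LDA formulation, not a transcription of the slide: the symbols θ (per-image topic proportions), α (Dirichlet parameter) and β (per-topic word distributions) do not appear on the slide, and conditioning α on the class c is an assumed reading of the diagram.

```latex
% For each image d = 1..D with class c, and each patch n = 1..N:
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha_c)
  && \text{topic proportions of image } d \\
z_{dn} &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic of the } n\text{-th patch} \\
w_{dn} &\sim \mathrm{Multinomial}(\beta_{z_{dn}})
  && \text{visual word of the } n\text{-th patch}
\end{align*}
```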

Part-based Model


Fischler & Elschlager, 1973

Rob Fergus, ICCV09 Tutorial

For comprehensive coverage of object categorization models, please visit

Recognizing and Learning Object Categories

Li Fei-Fei (Stanford), Rob Fergus (NYU), Antonio Torralba (MIT)

http://people.csail.mit.edu/torralba/shortCourseRLOC/
