Part 1: Classical Image Classification Methods
Kai Yu
Dept. of Media Analytics, NEC Laboratories America
Andrew Ng
Computer Science Dept., Stanford University
Outline of Part 2
04/18/23 2
• Local Features, Sampling, Visual Words
• Discriminative Methods
  - Bag-of-Words (BoW) representation
  - Spatial pyramid matching (SPM)
• Generative Methods
  - Part-based methods
  - Topic models
Local features
• Distinctive descriptors of local image patches
• Invariant to local translation, scale, …
• and sometimes rotation or general affine transformations
• The most famous choice is the SIFT feature
Sampling local features from images
A set of points
Image credits: F-F. Li, E. Nowak, J. Sivic
Visual words
• Similar points are grouped into one visual word
• Algorithms: k-means, agglomerative clustering, …
• Points from different images are then more easily compared.
Slide credit: Kristen Grauman
Outline of Part 2
• Local Features, Sampling, Visual Words, …
• Discriminative Methods
  - Bag-of-Words (BoW) representation
  - Spatial pyramid matching (SPM)
• Generative Methods
  - Part-based methods
  - Topic models
Bag-of-words (BoW) representation
Analogy to documents
Adapted from tutorial slides by Fei-Fei et al.
BoW for object categorization
• Works pretty well for whole-image classification
Slide credit: Svetlana Lazebnik
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Unsupervised Dictionary Learning
• Sample local features from images
• Run k-means or another clustering algorithm to get the dictionary
• The dictionary is also called a “codebook”
[Figure: local features sampled from an image database, clustered in SIFT space into regions R1, R2, R3]
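The dictionary-learning step above can be sketched as follows. This is a minimal illustration using plain Lloyd's k-means in standard-library Python, not the presenters' implementation. The function names (`learn_codebook`, `squared_distance`), the iteration count, and the toy 2-D points standing in for 128-D SIFT descriptors are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learn_codebook(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the visual words)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct randomly chosen descriptors.
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: attach each descriptor to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k),
                          key=lambda j: squared_distance(d, centroids[j]))
            clusters[nearest].append(d)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[i] for m in members) / len(members)
                                for i in range(dim)]
    return centroids


# Toy stand-in for SIFT descriptors: three well-separated 2-D blobs.
toy_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
toy_descriptors = [
    [cx + toy_rng.uniform(-1, 1), cy + toy_rng.uniform(-1, 1)]
    for cx, cy in blob_centers
    for _ in range(30)
]
codebook = learn_codebook(toy_descriptors, k=3)
```

In practice the codebook is usually learned from a large random sample of descriptors drawn across the whole image database, and an optimized library implementation of k-means would replace this loop.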
Compute BoW histogram for each image
• Assign SIFT features to clusters (regions R1, R2, R3 of the dictionary)
• Compute the frequency of each cluster within an image
• The result is the BoW histogram representation
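As a rough sketch of the two steps on this slide, the code below assigns each descriptor to its nearest codeword and then counts how often each codeword occurs. The three-word codebook, the 2-D descriptors, the function names, and the choice to normalize counts to frequencies that sum to 1 are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, codebook):
    """Index of the nearest codeword (hard vector quantization)."""
    return min(range(len(codebook)),
               key=lambda j: squared_distance(descriptor, codebook[j]))


def bow_histogram(descriptors, codebook):
    """Fixed-length vector of codeword frequencies for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[quantize(d, codebook)] += 1
    total = sum(counts)
    if total == 0:  # an image with no features gets an all-zero histogram
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical codebook with three visual words (R1, R2, R3).
codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
image_descriptors = [[0.5, -0.2], [9.0, 1.0], [0.3, 0.4], [1.0, 9.5]]
histogram = bow_histogram(image_descriptors, codebook)
# Two descriptors fall in R1, one in R2, one in R3.
```

Whatever the number of descriptors in an image, the output always has one entry per codeword, which is what lets images with different feature counts be compared directly.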
Implications of the BoW histogram
• Summarizes the entire image by its distribution of visual word occurrences
• Turns bags of different sizes into a fixed-length vector
• Analogous to the bag-of-words representation commonly used for text categorization.
Image classification based on BoW histogram
[Figure: dog and bird examples separated by a decision boundary in the BoW histogram vector space]
• Learn a classification model to determine the decision boundary
• Nonlinear SVMs are commonly applied.
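The slide does not say which kernel makes the SVM nonlinear. The sketch below assumes the histogram intersection kernel, one common choice for histogram features, and only computes the kernel (Gram) matrix that a kernel SVM would consume. The function names and the three toy histograms are hypothetical.

```python
def intersection_kernel(h1, h2):
    """Histogram intersection: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms, kernel=intersection_kernel):
    """Pairwise kernel values between all training histograms."""
    return [[kernel(hi, hj) for hj in histograms] for hi in histograms]


# Hypothetical normalized BoW histograms over a 3-word codebook.
dog_like = [0.5, 0.25, 0.25]
dog_like_2 = [0.5, 0.5, 0.0]
bird_like = [0.0, 0.25, 0.75]
K = gram_matrix([dog_like, dog_like_2, bird_like])
# The two dog-like histograms overlap more with each other (0.75)
# than either does with the bird-like histogram (0.5 and 0.25).
```

A matrix like `K` can be passed to an SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; training itself is left out here to keep the example dependency-free.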
Issues
• Sampling strategy
• Learning the codebook: size? supervised? …
• Classification: which method? scalability?
• Scalability: how to handle millions of data points?
• How to use spatial information?
Spatial information
• BoW discards spatial layout.
• This increases invariance to scale, translation, and deformation,
• but sacrifices discriminative power, especially when the spatial layout is important.
Slide adapted from Bill Freeman
Spatial pyramid matching
• Compute BoW histograms for image regions at different locations and at various scales
Figure credit: Svetlana Lazebnik
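One way to picture the spatial pyramid is as a stack of BoW histograms computed over progressively finer grids and concatenated. The sketch below assumes each feature is already quantized to a visual word index and has coordinates normalized to [0, 1). It is a simplification: the per-level weights used in the original spatial pyramid matching kernel are omitted, and the function name and toy features are hypothetical.

```python
def spatial_pyramid(features, vocabulary_size, levels=2):
    """Concatenate per-cell BoW counts over 1x1, 2x2, 4x4, ... grids.

    features: iterable of (x, y, word) with x, y normalized to [0, 1)
    and word an integer index into the codebook.
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid at this level is cells x cells
        histograms = [[0] * vocabulary_size for _ in range(cells * cells)]
        for x, y, word in features:
            # Clamp so that a coordinate of exactly 1.0 stays in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            histograms[row * cells + col][word] += 1
        for h in histograms:
            pyramid.extend(h)
    return pyramid


# Four quantized features, one in each image quadrant.
features = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.1, 0.9, 1), (0.9, 0.9, 2)]
vector = spatial_pyramid(features, vocabulary_size=3, levels=1)
# Level 0 contributes 3 bins; level 1 contributes 4 cells x 3 bins.
```

The level-0 block is the ordinary BoW histogram, so the pyramid extends BoW rather than replacing it. The vector length grows as vocabulary_size × (1 + 4 + 16 + …), which is why only a few levels are typically used.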
A common pipeline for discriminative image classification using BoW
Dictionary Learning: Dense/Sparse SIFT → K-means → dictionary
Image Classification: Dense/Sparse SIFT → VQ Coding → Spatial Pyramid Pooling → Nonlinear SVM
Combining multiple descriptors
Multiple Feature Detectors → Multiple Descriptors (SIFT, shape, color, …) → VQ Coding and Spatial Pooling → Nonlinear SVM
Diagram from SurreyUVA_SRKDA, the winning team in PASCAL VOC 2008
Outline of Part 2
• Local Features, Sampling, Visual Words, …
• Discriminative Methods
  - Bag-of-Words (BoW) representation
  - Spatial pyramid matching (SPM)
• Generative Methods
  - Part-based methods
  - Topic models
Topic models for images
Latent Dirichlet Allocation (LDA)
[Figure: plate diagram with nodes c, z, and w and plates N and D; example label “beach”]
Fei-Fei et al. ICCV 2005
Slide credit Fei-Fei Li
Part-based Model
Fischler & Elschlager 1973
Rob Fergus ICCV09 Tutorial
For comprehensive coverage of object categorization models, please visit
Recognizing and Learning Object Categories
Li Fei-Fei (Stanford), Rob Fergus (NYU), Antonio Torralba (MIT)
http://people.csail.mit.edu/torralba/shortCourseRLOC/