Beyond bags of features: Adding spatial information
Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Adding spatial information
• Forming vocabularies from pairs of nearby features – “doublets” or “bigrams”
• Computing bags of features on sub-windows of the whole image
• Using codebooks to vote for object position
• Generative part-based models
From single features to “doublets”
1. Run pLSA on a regular visual vocabulary
2. Identify a small number of top visual words for each topic
3. Form a “doublet” vocabulary from these top visual words
4. Run pLSA again on the augmented vocabulary
J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Discovering Objects and their Location in Images, ICCV 2005
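
Step 3 can be illustrated with a minimal sketch. It assumes a doublet is an unordered pair of top visual words whose features lie within a fixed pixel radius of each other; that neighborhood rule, the function name `doublet_counts`, and the `radius` parameter are illustrative assumptions, not the exact criterion used in the paper.

```python
from collections import Counter
from itertools import combinations
import math

def doublet_counts(features, top_words, radius):
    """Count unordered pairs of top visual words that occur close together.

    features:  list of (x, y, word_id) tuples for one image.
    top_words: set of word ids kept after the first pLSA pass (step 2).
    radius:    assumed spatial neighborhood, in pixels (hypothetical rule).
    """
    kept = [f for f in features if f[2] in top_words]
    counts = Counter()
    for (x1, y1, w1), (x2, y2, w2) in combinations(kept, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            # Sort the pair so (3, 7) and (7, 3) are the same doublet.
            counts[tuple(sorted((w1, w2)))] += 1
    return counts

features = [(10, 10, 3), (12, 11, 7), (80, 80, 3), (11, 9, 5)]
counts = doublet_counts(features, top_words={3, 7}, radius=5.0)
```

The resulting counts would be appended to the single-word histogram to form the augmented vocabulary on which pLSA is run again in step 4.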
From single features to “doublets”
[Figure panels: ground truth; all features; “face” features initially found by pLSA; one doublet; another doublet; “face” doublets]
J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Discovering Objects and their Location in Images, ICCV 2005
Spatial pyramid representation
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
[Figure panels: level 0; level 1; level 2]
Lazebnik, Schmid & Ponce (CVPR 2006)
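
As a rough sketch of the representation, the code below concatenates visual-word histograms computed over a 1x1, 2x2, and 4x4 grid (levels 0 to 2). The function name `spatial_pyramid` and the input layout are assumptions for illustration, and the per-level weights used when pyramids are compared are deliberately left out.

```python
def spatial_pyramid(features, width, height, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms for levels 0..levels.

    features: list of (x, y, word_id) with 0 <= x < width, 0 <= y < height.
    Level l splits the image into 2**l by 2**l cells (unweighted sketch).
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [0] * (cells * cells * vocab_size)
        for x, y, word in features:
            # Cell coordinates, clipped so border points stay inside the grid.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hist[(cy * cells + cx) * vocab_size + word] += 1
        pyramid.extend(hist)
    return pyramid

features = [(1, 1, 0), (9, 1, 1), (9, 9, 1)]
pyr = spatial_pyramid(features, width=10, height=10, vocab_size=2, levels=1)
# pyr == [1, 2, 1, 0, 0, 1, 0, 0, 0, 1]
```

The first `vocab_size` entries are the ordinary bag of features (level 0); each finer level adds the same counts split by image cell, which is what makes the representation locally orderless.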
Scene category dataset
Multi-class classification results (100 training images per class)
Caltech101 dataset
http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
Multi-class classification results (30 training images per class)
Implicit shape models
• Visual codebook is used to index votes for object position
[Figure panels: training image; visual codeword with displacement vectors; test image]
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004
Implicit shape models: Training
1. Build codebook of patches around extracted interest points using clustering
2. Map the patch around each interest point to closest codebook entry
3. For each codebook entry, store all positions where it was found, relative to object center
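
Steps 2 and 3 might look like the following sketch. It assumes the codebook from step 1 is already available as a list of descriptor vectors, and it uses hard nearest-neighbor assignment with squared Euclidean distance; the names `nearest_entry` and `train_offsets` and the data layout are hypothetical.

```python
from collections import defaultdict

def nearest_entry(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(descriptor, codebook[i]))

def train_offsets(training_images, codebook):
    """For each codebook entry, store displacements from feature to object center.

    training_images: list of (patches, center); patches is a list of
    (x, y, descriptor) and center is the annotated object center (cx, cy).
    """
    offsets = defaultdict(list)
    for patches, (cx, cy) in training_images:
        for x, y, desc in patches:
            entry = nearest_entry(desc, codebook)      # step 2
            offsets[entry].append((cx - x, cy - y))    # step 3
    return offsets

codebook = [(0.0, 0.0), (1.0, 1.0)]  # assumed output of step 1 (clustering)
training_images = [([(10, 20, (0.1, 0.0)), (30, 20, (0.9, 1.0))], (20, 20))]
offsets = train_offsets(training_images, codebook)
# offsets[0] == [(10, 0)], offsets[1] == [(-10, 0)]
```

Storing the displacement rather than the absolute position is what lets the same table be reused at any location in a test image.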
Implicit shape models: Testing
1. Given test image, extract patches, match to codebook entry
2. Cast votes for possible positions of object center
3. Search for maxima in voting space
4. Extract weighted segmentation mask based on stored masks for the codebook occurrences
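
Steps 2 and 3 can be sketched with a coarse voting grid. The binned accumulator, the `bin_size` parameter, and the equal split of each feature's vote across its stored displacements are simplifying assumptions made here for illustration; segmentation (step 4) is not covered.

```python
from collections import Counter

def vote_for_center(matches, offsets, bin_size=10):
    """Accumulate votes for the object center and return the strongest bin.

    matches: list of (x, y, entry_id) for patches matched to the codebook.
    offsets: entry_id -> list of (dx, dy) displacements stored at training time.
    """
    votes = Counter()
    for x, y, entry in matches:
        occurrences = offsets.get(entry, [])
        for dx, dy in occurrences:
            cell = ((x + dx) // bin_size, (y + dy) // bin_size)
            # Each matched patch spreads one unit of vote over its occurrences.
            votes[cell] += 1.0 / len(occurrences)
    if not votes:
        return None, votes
    return max(votes, key=votes.get), votes

offsets = {0: [(10, 0)], 1: [(-10, 0), (0, 15)]}
matches = [(40, 50, 0), (60, 50, 1), (5, 5, 1)]
best_cell, votes = vote_for_center(matches, offsets)
# best_cell == (5, 5) with 1.5 votes: two patches agree on a center near (50, 50)
```

Patches that belong to the same object cast consistent votes, so the peak bin marks a candidate object center while stray matches scatter their votes.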
Generative part-based models
R. Fergus, P. Perona and A. Zisserman, Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003
Probabilistic model
$$P(\text{image} \mid \text{object}) = P(\text{appearance}, \text{shape} \mid \text{object}) = \max_h P(\text{appearance} \mid h, \text{object})\; p(\text{shape} \mid h, \text{object})\; p(h \mid \text{object})$$
h: assignment of features to parts
• P(appearance | h, object): part descriptors
• p(shape | h, object): part locations
• p(h | object): candidate parts
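
The maximization over h can be made concrete with a toy exhaustive search. Everything specific in this sketch is an assumption for illustration: features are 1-D positions with 1-D descriptors, each part has an independent Gaussian appearance and shape term with fixed standard deviations, and p(h | object) is uniform. It is not the model of the cited paper, only the structure of the max over assignments.

```python
import math
from itertools import permutations

def log_gauss(value, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((value - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def best_hypothesis(features, parts):
    """Maximize log P(appearance | h) + log p(shape | h) over assignments h.

    features: list of (x, descriptor) candidates (toy 1-D values).
    parts:    list of dicts with 'desc_mean' and 'offset_mean', the expected
              position relative to part 0 (hypothetical parameterization).
    p(h | object) is assumed uniform, so it does not affect the argmax.
    """
    best_score, best_h = -math.inf, None
    # h assigns a distinct candidate feature to every part.
    for h in permutations(range(len(features)), len(parts)):
        x0 = features[h[0]][0]  # anchor on part 0 for translation invariance
        score = 0.0
        for part, idx in zip(parts, h):
            x, desc = features[idx]
            score += log_gauss(desc, part["desc_mean"], 1.0)      # appearance term
            score += log_gauss(x - x0, part["offset_mean"], 5.0)  # shape term
        if score > best_score:
            best_score, best_h = score, h
    return best_h, best_score

parts = [{"desc_mean": 0.0, "offset_mean": 0.0},
         {"desc_mean": 3.0, "offset_mean": 20.0}]
features = [(100, 3.1), (80, 0.2), (50, 1.5)]
best_h, best_score = best_hypothesis(features, parts)
# best_h == (1, 0): feature 1 plays part 0 and feature 0 plays part 1
```

With N candidate features and P parts the loop visits N!/(N-P)! assignments, which is why the search becomes expensive as either number grows.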
Probabilistic model
[Figure: example assignments h of features to Part 1, Part 2, and Part 3]
Probabilistic model
• Appearance term P(appearance | h, object): distribution over patch descriptors in a high-dimensional appearance space
• Shape term p(shape | h, object): distribution over joint part positions in 2D image space
Results: Faces
[Figure panels: face shape model; patch appearance model; recognition results]
Results: Motorbikes and airplanes
Summary: Adding spatial information
• Doublet vocabularies
  • Pro: takes co-occurrences into account, some geometric invariance is preserved
  • Con: too many doublet probabilities to estimate
• Spatial pyramids
  • Pro: simple extension of a bag of features, works very well
  • Con: no geometric invariance
• Implicit shape models
  • Pro: can localize object, maintain translation and possibly scale invariance
  • Con: need supervised training data (known object positions and possibly segmentation masks)
• Generative part-based models
  • Pro: very nice conceptually
  • Con: combinatorial hypothesis search problem