24 february 2020cs.brown.edu/courses/cs143/lectures/2020spring_12_large... · 2020-02-24 ·...
![Page 1: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/1.jpg)
24 February 2020
![Page 2: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/2.jpg)
Thanks to Iuliu Balibanu
![Page 3: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/3.jpg)
Alt-text: “Crowdsourced steering” doesn’t sound quite as
appealing as “self driving”.
![Page 4: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/4.jpg)
Large-scale category recognition and
Advanced feature encoding
Computer Vision
Many slides from James Hays
![Page 5: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/5.jpg)
Scene Categorization
Oliva and Torralba, 2001: Highway, Forest, Coast, Inside City, Tall Building, Street, Open Country, Mountain
+ Fei-Fei and Perona, 2005: Living Room, Kitchen, Bedroom, Office, Suburb
+ Lazebnik, Schmid, and Ponce, 2006: Store, Industrial
15 Scene Database
![Page 6: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/6.jpg)
15 Scene Recognition Rate
![Page 7: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/7.jpg)
How many object categories are there?
Biederman 1987
OK, but how many places?
![Page 8: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/8.jpg)
abbey
![Page 9: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/9.jpg)
airplane cabin
![Page 10: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/10.jpg)
airport terminal
![Page 11: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/11.jpg)
…
![Page 12: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/12.jpg)
apple orchard
![Page 13: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/13.jpg)
assembly hall
![Page 14: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/14.jpg)
bakery
![Page 15: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/15.jpg)
…
![Page 16: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/16.jpg)
car factory
![Page 17: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/17.jpg)
cockpit
![Page 18: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/18.jpg)
construction site
![Page 19: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/19.jpg)
…
![Page 20: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/20.jpg)
food court
![Page 21: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/21.jpg)
interior car
![Page 22: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/22.jpg)
lounge
![Page 23: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/23.jpg)
…
![Page 24: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/24.jpg)
stadium
![Page 25: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/25.jpg)
stream
![Page 26: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/26.jpg)
train station
![Page 27: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/27.jpg)
…
![Page 28: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/28.jpg)
130k images, 899 categories
SUN Database – Xiao et al. CVPR 2010
![Page 29: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/29.jpg)
397 Well-sampled Categories
…at least 100 unique images each.
![Page 30: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/30.jpg)
?
Accuracy 98% 90% 68%
Evaluating Human Scene Classification
![Page 31: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/31.jpg)
![Page 32: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/32.jpg)
Scene category: most confusing categories
Inn (0%): Restaurant patio (44%), Chalet (19%)
Bayou (0%): River (67%), Coast (8%)
Basilica (0%): Cathedral (29%), Courthouse (21%)
![Page 33: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/33.jpg)
Conclusion: humans can do it
• The SUN database is reasonably consistent and categories can be told apart by humans.
• With many very specific categories, humans get it right 2/3rds of the time from experience and from exploring the label space.
So, how do humans classify scenes?
![Page 34: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/34.jpg)
How do we classify scenes?
Different objects, different spatial layout
[Figure: indoor scenes with object labels, including floor, door, light, wall, ceiling, painting, fireplace, armchair, coffee table, lamp, mirror, bed, side-table, phone, alarm, and carpet]
![Page 35: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/35.jpg)
Which are the important elements?
Similar objects, and similar spatial layout
[Figure: scenes with region labels: seat (repeated many times), window, ceiling, cabinets, screen, wall, column]
Different lighting, different materials, different “stuff”
![Page 36: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/36.jpg)
Scene emergent features
Suggestive edges and junctions Simple geometric forms
Blobs Textures
Biederman, 1981
Bruner and Potter, 1969
Oliva and Torralba, 2001
Biederman, 1981
“Recognition via features that are not those of individual objects but “emerge” as objects are brought into relation to each other to form a scene.” – Biederman 81
![Page 37: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/37.jpg)
Global Image Descriptors
• Tiny images (Torralba et al, 2008)
• Color histograms
• Self-similarity (Shechtman and Irani, 2007)
• Geometric class layout (Hoiem et al, 2005)
• Geometry-specific histograms (Lalonde et al, 2007)
• Dense and Sparse SIFT histograms
• Berkeley texton histograms (Martin et al, 2001)
• HoG 2x2 spatial pyramids
• Gist scene descriptor (Oliva and Torralba, 2008)
Texture Features
![Page 38: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/38.jpg)
Global Texture Descriptors
Sivic et al., ICCV 2005; Fei-Fei and Perona, CVPR 2005
Bag of words Spatially organized textures
Non-localized textons
S. Lazebnik, et al, CVPR 2006
Walker, Malik. Vision Research 2004 …
M. Gorkani, R. Picard, ICPR 1994; A. Oliva, A. Torralba, IJCV 2001
… R. Datta, D. Joshi, J. Li, and J. Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age, ACM Computing Surveys, vol. 40, no. 2, pp. 5:1-60, 2008.
![Page 39: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/39.jpg)
Textons
Malik, Belongie, Shi, Leung, 1999
Filter bank
Vector of filter responses
at each pixel
K-means over a set of vectors on a collection of images
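The texton pipeline above (filter responses per pixel, then K-means) can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the `responses` values are made up, and `kmeans`, `nearest`, and `texton_histogram` are hypothetical helper names. Initial centers are simply the first `k` vectors, which is an assumption made to keep the example deterministic.

```python
def nearest(vec, centers):
    """Index of the center closest to vec (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centers[k])))

def kmeans(vectors, k, iters=20):
    """Plain Lloyd's algorithm; initial centers are the first k vectors."""
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def texton_histogram(vectors, centers):
    """Normalized count of pixels assigned to each texton (cluster center)."""
    hist = [0.0] * len(centers)
    for v in vectors:
        hist[nearest(v, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

# Hypothetical 2-D "filter response" vectors for six pixels (made-up numbers).
responses = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
textons = kmeans(responses, k=2)
hist = texton_histogram(responses, textons)
```

In a real system each vector would have one entry per filter in the bank, and the clustering would run over pixels sampled from many images.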
![Page 40: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/40.jpg)
Textons
Filter bank
K-means (100 clusters)
Walker, Malik, 2004
Malik, Belongie, Shi, Leung, 1999
![Page 41: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/41.jpg)
Gabor filter
• Sinusoid modulated by a Gaussian kernel
Orientation
Frequency (Scale)
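As a rough sketch of "a sinusoid modulated by a Gaussian kernel", the function below builds one Gabor kernel for a given orientation and wavelength. The isotropic Gaussian envelope, the cosine (even) phase, and the parameter names `sigma`, `theta`, and `wavelength` are assumptions chosen for illustration; practical filter banks often add an aspect ratio and a phase offset.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength):
    """Return a size x size Gabor kernel (cosine phase) as nested lists.

    size is assumed to be odd so the kernel has a well-defined center.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# A small illustrative bank: 2 scales x 8 orientations.
bank = [gabor_kernel(9, sigma=2.0 * s, theta=o * math.pi / 8, wavelength=4.0 * s)
        for s in (1, 2) for o in range(8)]
```

Changing `theta` rotates the stripes (orientation); changing `wavelength` together with `sigma` changes the frequency, i.e. the scale.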
![Page 42: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/42.jpg)
Global scene descriptors: GIST
• The “gist” of a scene: Oliva & Torralba (2001)
http://people.csail.mit.edu/torralba/code/spatialenvelope/
![Page 43: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/43.jpg)
Gist descriptor
Apply oriented Gabor filters over different scales.
Average filter energy per bin.
8 orientations x 4 scales x 16 bins = 512 dimensions
Similar to SIFT (Lowe 1999) applied to the entire image.
M. Gorkani, R. Picard, ICPR 1994; Walker, Malik. Vision Research 2004; Vogel et al. 2004;
Fei-Fei and Perona, CVPR 2005; S. Lazebnik, et al, CVPR 2006; …
Oliva and Torralba, 2001
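The 512-dimension count on this slide follows from pooling each filter's energy map over a 4 x 4 grid (16 bins) and concatenating across 8 orientations and 4 scales. The sketch below shows only that pooling step, under the assumption that the filter energy maps are already computed; `grid_average` and `gist_like` are hypothetical names, and the constant maps are stand-in data, not real filter outputs.

```python
def grid_average(energy, grid=4):
    """Average a 2-D energy map over a grid x grid layout of cells."""
    rows, cols = len(energy), len(energy[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * rows // grid, (gy + 1) * rows // grid
            x0, x1 = gx * cols // grid, (gx + 1) * cols // grid
            block = [energy[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells

def gist_like(energy_maps, grid=4):
    """Concatenate the per-cell averages of every filter's energy map."""
    descriptor = []
    for energy in energy_maps:
        descriptor.extend(grid_average(energy, grid))
    return descriptor

# Stand-in energy maps: 8 orientations x 4 scales, each a constant 8x8 map.
maps = [[[float(i)] * 8 for _ in range(8)] for i in range(8 * 4)]
descriptor = gist_like(maps)  # 32 filters x 16 bins = 512 values
```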
![Page 44: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/44.jpg)
Example visual gists
Global features (I) ~ global features (I’) Oliva & Torralba (2001)
![Page 45: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/45.jpg)
Bag of words &
spatial pyramid matching
S. Lazebnik, et al, CVPR 2006
Sivic, Zisserman, 2003. Visual words = K-means of SIFT descriptors
But is there any way to improve the quantization approach itself?
![Page 46: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/46.jpg)
Better Bags of Visual Features
• More advanced quantization / encoding methods that are near the state-of-the-art in image classification and image retrieval.
– Mixtures of Gaussians
– Soft assignment (a.k.a. Kernel Codebook)
– VLAD – Vectors of Locally-Aggregated Descriptors
• Deep learning has taken attention away from these methods…
![Page 47: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/47.jpg)
Standard K-means Bag of Words
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
![Page 48: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/48.jpg)
![Page 49: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/49.jpg)
![Page 50: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/50.jpg)
Motivation
Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region
Why not include other statistics? For instance:
• mean of local descriptors
• (co)variance of local descriptors
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
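To make the contrast concrete, the sketch below computes, for each Voronoi cell, the count that bag of visual words keeps alongside the mean and per-dimension variance that it discards. The function name `cell_statistics` and the toy data are hypothetical; the variance here is diagonal only, which is a simplifying assumption.

```python
def nearest(vec, centers):
    """Index of the center closest to vec (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centers[k])))

def cell_statistics(descriptors, centers):
    """Per Voronoi cell: (count, mean, per-dimension variance)."""
    dim = len(centers[0])
    cells = [[] for _ in centers]
    for d in descriptors:
        cells[nearest(d, centers)].append(d)
    stats = []
    for members in cells:
        n = len(members)
        if n == 0:
            stats.append((0, [0.0] * dim, [0.0] * dim))
            continue
        mean = [sum(col) / n for col in zip(*members)]
        var = [sum((v - m) ** 2 for v in col) / n
               for col, m in zip(zip(*members), mean)]
        stats.append((n, mean, var))
    return stats

# Made-up codebook and descriptors for illustration.
centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [-1.0, 0.0], [10.0, 12.0]]
stats = cell_statistics(descriptors, centers)
```

A plain bag-of-words histogram would keep only the first element of each tuple.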
![Page 51: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/51.jpg)
Gaussian Mixture Model (GMM)
• GMM can be thought of as “soft” k-means.
• Each component has a mean and a standard deviation along each direction (or full covariance)
• Can easily represent non-circular distributions
[Figure labels: 0.5, 0.4, 0.05, 0.05]
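One way to read "soft k-means": instead of a single nearest center, each descriptor gets a posterior probability for every mixture component. The sketch below evaluates those posteriors for a diagonal-covariance GMM whose parameters are given; fitting the parameters (e.g. with EM) is not shown. All parameter values and function names are made up for illustration.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - (xi - mi) ** 2 / (2.0 * vi)
    return math.exp(log_p)

def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    joint = [w * gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-component mixture in 2-D.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
post_mid = responsibilities([2.0, 0.0], weights, means, variances)
post_near = responsibilities([0.0, 0.0], weights, means, variances)
```

A point halfway between the two means is shared equally, while a point sitting on one mean is assigned almost entirely to that component. Giving a component different variances per dimension is what lets it represent non-circular clusters.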
![Page 52: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/52.jpg)
![Page 53: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/53.jpg)
Simple case: Soft Assignment
• “Kernel codebook encoding” by Chatfield et al. 2011.
• Cast a set of proportional votes (weights) to n most similar clusters, rather than a single ‘hard’ vote.
• This is fast and easy to implement, but it makes an inverted file index less sparse.
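A minimal sketch of soft assignment, assuming Gaussian-kernel weights on the `n` nearest centers that are normalized to sum to one per descriptor. The kernel choice, the bandwidth `sigma`, and the function names are assumptions for illustration and may differ from the exact formulation in Chatfield et al.

```python
import math

def soft_assign(descriptor, centers, n=2, sigma=1.0):
    """Gaussian-kernel weights on the n nearest centers, normalized to sum to 1."""
    dist2 = [sum((a - b) ** 2 for a, b in zip(descriptor, c)) for c in centers]
    closest = sorted(range(len(centers)), key=lambda k: dist2[k])[:n]
    kernel = {k: math.exp(-dist2[k] / (2.0 * sigma ** 2)) for k in closest}
    total = sum(kernel.values())
    return {k: w / total for k, w in kernel.items()}

def soft_histogram(descriptors, centers, n=2, sigma=1.0):
    """Accumulate the proportional votes of every descriptor."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        for k, w in soft_assign(d, centers, n, sigma).items():
            hist[k] += w
    return hist

# Made-up 1-D codebook and two descriptors.
centers = [[0.0], [2.0], [10.0]]
hist = soft_histogram([[1.0], [0.0]], centers)
```

Each descriptor now touches `n` bins instead of one, which is why the inverted file index becomes less sparse.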
![Page 54: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/54.jpg)
VLAD – Vectors of Locally-Aggregated Descriptors
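Given a codebook $\{\mu_i\}$, e.g. learned with K-means, and a set of local descriptors $\{x\}$:
① assign each descriptor to its nearest codeword: $\mathrm{NN}(x) = \arg\min_{\mu_i} \lVert x - \mu_i \rVert$
② compute the residual $x - \mu_i$
③ sum the residuals in each cell $i$: $v_i = \sum_{x : \mathrm{NN}(x) = \mu_i} (x - \mu_i)$
• concatenate the $v_i$'s + normalize
[Figure: descriptors x1 to x4 in Voronoi cells 1 to 5, with residual sums v1 to v5]
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.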
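The three steps on this slide (assign, compute residuals, sum per cell) followed by concatenation and normalization can be sketched as below. L2 normalization is assumed here since the slide only says "normalize"; the codebook and descriptors are made-up values, and `vlad` is a hypothetical function name.

```python
import math

def vlad(descriptors, codebook):
    """Sum residuals per nearest codeword, concatenate, then L2-normalize."""
    dim = len(codebook[0])
    v = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        # Step 1: assign x to its nearest codeword.
        i = min(range(len(codebook)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))
        # Steps 2 and 3: accumulate the residual x - codeword in cell i.
        for j in range(dim):
            v[i][j] += x[j] - codebook[i][j]
    flat = [val for vi in v for val in vi]
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat

# Made-up 2-D codebook with two codewords, and three descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [2.0, 0.0], [10.0, 14.0]]
encoding = vlad(descriptors, codebook)
```

The output has one block of `dim` values per codeword, so its length is the codebook size times the descriptor dimension regardless of how many descriptors the image has.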
![Page 55: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/55.jpg)
![Page 56: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/56.jpg)
![Page 57: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/57.jpg)
A first example: the VLAD
A graphical representation of
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.
![Page 58: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/58.jpg)
Why can’t we train good recognition systems?
• Training Data
– Huge issue, but not always a variable we control.
• Representation
– Are the local features themselves lossy?
– What about feature quantization?
![Page 59: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/59.jpg)
What about skipping quantization completely?
In Defense of Nearest-Neighbor Based Image Classification. Boiman, Shechtman, Irani
Quantization inherently averages the parts which are most discriminative!!!
Quantization error of densely computed image descriptors (SIFT) using a large codebook (size 6,000) of Caltech-101. Red = high error; blue = low error. The most informative descriptors (eye, nose, etc.) have the highest quantization error.
![Page 60: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/60.jpg)
What about NN image-to-image matching?
In Defense of Nearest-Neighbor Based Image Classification. Boiman, Shechtman, Irani
Image to class features NN:
Image to image features NN
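A sketch of the image-to-class idea: pool the unquantized descriptors of all training images of a class, then score a query by summing, over its descriptors, the distance to the nearest descriptor in that pool. Squared Euclidean distance and brute-force search are assumptions made for brevity; the class names and numbers are made up, and `nbnn_classify` is a hypothetical name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nbnn_classify(query_descriptors, class_descriptors):
    """Pick the class minimizing the summed descriptor-to-class NN distance."""
    best_label, best_cost = None, float("inf")
    for label, pool in class_descriptors.items():
        cost = sum(min(sq_dist(d, p) for p in pool) for d in query_descriptors)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label

# Descriptors pooled over all training images of each class (toy values).
class_descriptors = {
    "forest": [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    "street": [[5.0, 5.0], [6.0, 5.0]],
}
label = nbnn_classify([[0.2, 0.1], [0.9, 0.1]], class_descriptors)
```

An image-to-image variant would instead compare the query against each training image separately, so a query that mixes parts seen in different training images of the same class would score worse.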
![Page 61: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/61.jpg)
Caltech 101 (2004) – 100 object classes; mean images
![Page 62: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/62.jpg)
If I do both of these, NN can be a pretty good classifier!
In Defense of Nearest-Neighbor Based Image Classification. Boiman, Shechtman, Irani
= SIFT
![Page 63: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/63.jpg)
Summary
• Methods to better characterize the distribution of visual words in an image:
– Soft assignment (a.k.a. Kernel Codebook)
– VLAD
– No quantization
![Page 64: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/64.jpg)
Learning Scene Categorization
Forest path vs. all
Living room vs. all
![Page 65: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/65.jpg)
Feature Accuracy
Classifier: 1-vs-all SVM with histogram intersection, chi squared, or RBF kernel.
Humans [68.5]
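For reference, the three kernels named on this slide can be written as plain functions over feature histograms. These are common textbook forms, given as an assumption about what the slide refers to; in particular the exponential chi-squared form and the `gamma` parameter are illustrative choices, and the SVM training itself is not shown.

```python
import math

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def chi_squared_kernel(h1, h2, gamma=1.0):
    """exp(-gamma * sum((a-b)^2 / (a+b))), skipping bins where both are zero."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * dist)

def rbf_kernel(h1, h2, gamma=1.0):
    """exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(h1, h2)))

# Two made-up normalized histograms.
h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
similarity = histogram_intersection(h1, h2)
```

In a 1-vs-all setup, one binary SVM is trained per scene category on a kernel matrix built from such a function, and the category with the highest decision value wins.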
![Page 66: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/66.jpg)
A look into the results
Airplane cabin (64%)
Art gallery (38%)
…
Discotheque, Toyshop, Van interior
Iceberg, Kitchenette, Hotel room
All the results available on the web
![Page 67: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/67.jpg)
Humans good, Comp. good
Humans good, Comp. bad
Humans bad, Comp. good
Humans bad, Comp. bad
![Page 68: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/68.jpg)
How do we do better than 40%?
• Features from deep learning based on ImageNet allow us to reach 42%...
Not much better…
![Page 69: 24 February 2020cs.brown.edu/courses/cs143/lectures/2020Spring_12_Large... · 2020-02-24 · (Torralba et al, 2008) • Color histograms • Self-similarity (Shechtman and Irani,](https://reader030.vdocuments.us/reader030/viewer/2022040922/5e9bc6854b9e7359d121a72e/html5/thumbnails/69.jpg)
B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. “Learning Deep Features for Scene Recognition using Places Database.” Advances in Neural Information Processing Systems 27 (NIPS), 2014