![Page 1: 6.819 / 6.869: Advances in Computer Vision6.869.csail.mit.edu/fa15/lecture/6.869-ImageRetrieval.pdf · • Search must be both fast, accurate and scalable to large data set • Fast](https://reader035.vdocuments.us/reader035/viewer/2022063003/5f71447e727b92025c5d440e/html5/thumbnails/1.jpg)
6.819 / 6.869: Advances in Computer Vision
Image Retrieval: Information, images, objects, large-scale
Website: http://6.869.csail.mit.edu/fa15/
Instructor: Yusuf Aytar
Lecture TR 9:30AM – 11:00AM (Room 34-101)
What is Image Retrieval?
[Figure: a user issues a query (text such as "Fall in Boston", an image, or speech) and receives retrieval results.]
Applications
• Art Retrieval
• Medical Image Retrieval
• Product Image Retrieval (reviews, other prices, etc.)
• Fashion Image Retrieval
Overview
• Information Retrieval: Bag of Words, TF-IDF, Cosine Similarity, Inverted Index
• Object Instance Retrieval: Bag of Visual Words, Video Google, Object Instance Retrieval
• Fast Object Detection/Retrieval: Fast detection, Part representations, Generalization from exemplar
• Large Scale Image Search: KD-trees, Locality Sensitive Hashing, Semantic Hashing, Compact Codes
Information Retrieval
Bag of Words (BOW)
A widely used document representation method

Doc-1: The last duel. After quarrelling over a bank loan, two men took part in the last fatal duel staged on Scottish soil. BBC News's James Landale retraces the steps of his ancestor, who made that final challenge.

Doc-2: West Bank water row. Palestinians have accused Israel of diverting water away from their towns in order to keep Jewish settlements in the occupied territories fully supplied. Israel denies the charge saying Palestinian farmers are to blame for using illegal connections to irrigate their fields.

| Term | Doc-1 | Doc-2 |
|---|---|---|
| Bank | 1 | 1 |
| Loan | 1 | 0 |
| Water | 0 | 2 |
| Farmer | 0 | 0 |
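Term Frequency (TF)

Raw counts (lexicon rows, document columns):

| Lexicon | Doc-1 | Doc-2 | Doc-3 | Doc-4 |
|---|---|---|---|---|
| Bank | 1 | 1 | 0 | 0 |
| Loan | 1 | 0 | 0 | 3 |
| Water | 0 | 2 | 1 | 1 |
| Farmer | 0 | 0 | 0 | 1 |

Normalization (each document column divided by its total count):

| Lexicon | Doc-1 | Doc-2 | Doc-3 | Doc-4 |
|---|---|---|---|---|
| Bank | 0.5 | 0.33 | 0 | 0 |
| Loan | 0.5 | 0 | 0 | 0.6 |
| Water | 0 | 0.66 | 1 | 0.2 |
| Farmer | 0 | 0 | 0 | 0.2 |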
Inverse Document Frequency (IDF)
IDF of the $i$-th word: $\mathrm{idf}_i = \log \frac{n}{\mathrm{df}_i}$
tf x idf
[Figure: Doc-1 ("The last duel ...") represented as a tf x idf weighted vector over the lexicon (Bank, Loan, Water, Farmer, ...).]
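The TF and IDF steps can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: it uses the count matrix from the Term Frequency slide, takes n as the number of documents and df_i as the number of documents containing word i, and uses the natural logarithm (the slide does not state a base). The function names are hypothetical.

```python
import math

LEXICON = ["bank", "loan", "water", "farmer"]

# Raw term counts per document, taken from the Term Frequency slide.
COUNTS = {
    "Doc-1": [1, 1, 0, 0],
    "Doc-2": [1, 0, 2, 0],
    "Doc-3": [0, 0, 1, 0],
    "Doc-4": [0, 3, 1, 1],
}


def term_frequencies(counts):
    """Normalize raw counts so that they sum to 1 for the document."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def inverse_document_frequencies(docs):
    """idf_i = log(n / df_i); words that occur in no document get 0."""
    n = len(docs)
    idf = []
    for i in range(len(LEXICON)):
        df = sum(1 for counts in docs.values() if counts[i] > 0)
        idf.append(math.log(n / df) if df else 0.0)
    return idf


def tf_idf(docs):
    """Weight each normalized term frequency by the word's idf."""
    idf = inverse_document_frequencies(docs)
    return {
        name: [tf * w for tf, w in zip(term_frequencies(counts), idf)]
        for name, counts in docs.items()
    }


weights = tf_idf(COUNTS)
```

Under this weighting, "farmer" (present in one document out of four) receives the largest idf, while "water" (present in three) receives the smallest.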
Cosine Similarity
Query: fall in Boston
Cosine Similarity Score = $\frac{q^\top d}{\|q\| \, \|d\|}$
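As a rough sketch of how the score ranks documents, the snippet below computes the cosine of the angle between a query vector and each document vector and sorts by it. The vectors and the names `cosine_similarity` and `rank` are illustrative assumptions; in practice the vectors would be tf x idf weights over the lexicon.

```python
import math


def cosine_similarity(q, d):
    """Dot product of q and d divided by the product of their L2 norms."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_q * norm_d)


def rank(query, documents):
    """Return document names sorted from most to least similar."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in documents.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical 3-word lexicon (fall, in, boston) and toy document vectors.
documents = {"doc_a": [1, 1, 1], "doc_b": [1, 0, 0], "doc_c": [0, 0, 0]}
ranking = rank([1, 1, 1], documents)
```

Because the score is normalized by vector length, a long document is not favored merely for containing more words.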
Inverted Index
Allows quick lookup of document ids with a particular word

| Lexicon/dictionary | Postings list |
|---|---|
| bank | PL(bank): 3 8 10 13 16 20 |
| loan | PL(loan): 1 2 3 9 16 18 |
| water | PL(water): 4 5 8 10 13 19 20 22 |
| … | |
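A minimal sketch of an inverted index follows: each word maps to a sorted postings list of document ids, and a two-word query is answered by intersecting two lists with a linear merge. The toy documents and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, words in documents.items():
        for word in words:
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def intersect(postings_a, postings_b):
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    common = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            common.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return common


# Hypothetical toy collection: document id -> list of words.
documents = {
    1: ["bank", "loan"],
    2: ["bank", "water", "water"],
    3: ["water"],
    4: ["loan", "water", "farmer"],
}
index = build_inverted_index(documents)
both = intersect(index["bank"], index["water"])
```

Only the postings lists of the query words are touched, so the cost depends on how common those words are rather than on the size of the whole collection.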
Bag of Words & Object Instance Retrieval
Slide Credit - F. Perronnin, LSVR tutorial at CVPR’13
Feature Detectors
Feature Descriptors
Slide Credit - F. Perronnin, LSVR tutorial at CVPR’13
Video Google Feature Detectors / Descriptors
• Harris-Affine & Hessian Affine as the feature detectors
• SIFT as the feature descriptor
Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003
Video Google Bag of Visual Words
Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003
Pipeline: Images → Affine Invariant Feature Detectors (Harris-Affine & MSER) → SIFT → ~300K Feature Descriptors → k-means → 2,237 Visual Words
• viewpoint invariance
• illumination invariance
Example visual words:
• A wheel of an airplane
• A motorbike handle
• Back of a motorbike or tip of the wings (Polysemy)
Discovering objects and their location in images, Sivic et al., ICCV 2005
http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf
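The quantization step of the pipeline can be sketched as follows: once k-means has produced the visual-word centroids, each descriptor is assigned to its nearest centroid and the image becomes a histogram of visual-word counts. The 2-D "descriptors" and three centroids below are toy stand-ins (SIFT descriptors are 128-dimensional, and the slide's vocabulary has 2,237 words), and the function names are hypothetical.

```python
def nearest_word(descriptor, centroids):
    """Index of the centroid closest to the descriptor (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_visual_words(descriptors, centroids):
    """Histogram counting how many descriptors fall on each visual word."""
    histogram = [0] * len(centroids)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, centroids)] += 1
    return histogram


# Toy data: centroids assumed to come from a previous k-means run.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 0.5), (9.0, 11.0), (0.2, 9.5), (10.5, 9.0)]
histogram = bag_of_visual_words(descriptors, centroids)
```

The resulting histogram plays the role of the word-count vector in text retrieval, so TF-IDF weighting and an inverted index can be applied to images unchanged.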
Video Google Large scale object instance retrieval
• Find all instances of the query object in a large-scale dataset
• Do it instantly (< 1 sec), and be robust to scale, viewpoint, lighting, and partial occlusion
Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003
http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf
Video Google Particular object retrieval - Bag of visual words
Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic and Andrew Zisserman, ICCV 2003
http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf
Object retrieval with large vocabularies and fast spatial matching, Philbin, Chum, Isard, Sivic, Zisserman, CVPR 2007
Video Google BOW + Inverted File Indexing
Slide Credit - Chum, LSVR tutorial at CVPR’13
Spatial Verification
1. Image Query
2. Initial Retrieval Set
3. Spatial Verification
http://www.robots.ox.ac.uk/~vgg/publications/2012/Arandjelovic12/presentation.pdf
Object retrieval with large vocabularies and fast spatial matching, Philbin, Chum, Isard, Sivic, Zisserman, CVPR 2007
Query Expansion
Slide Credit - Chum, LSVR tutorial at CVPR’13
http://www.robots.ox.ac.uk/~vgg/demo/
Video Google - Object Instance Retrieval
Immediate, scalable object category detection
Motivation: Object Detection
• Running a detector fast on a single image (Cascades, PQ, etc.) [Felzenszwalb-CVPR10, Vedaldi-CVPR12, Sadeghi-NIPS13].
• Running multiple detectors fast on a single image (Sparselets, etc.) [Song-ECCV12, Dean-CVPR13].
• Running a detector fast (~1 sec) on a large-scale image dataset, similar to Video Google [Sivic03] but for category detection.
Large Scale Object Instance Retrieval
• Retrieve instantly (< 1sec)
• Robust to: scale, viewpoint, lighting, partial occlusion
Large Scale Object Category Detection
• Retrieve instantly (~ 1sec)
• Robust to: scale, viewpoint, lighting, partial occlusion, and intra-class variance
[Figure: query versus retrieval results for the two tasks [Arandjelovic-CVPR12].]
Overview
Uses the three stages of Video Google revamped for object category detection:
1 Indexing and inverted file
2 Shortlisting
3 Reranking (a) Spatial Reranking (b) HOG Scoring
[Figure: Query HOG Template, Classifier Part (CP) Dictionary, Reconstructed Template]
Query Representation
• Query (HOG template) is first represented as a sparse combination of CPs.
• 𝛂 and the spatial layouts of CPs define the reconstructed template.
[Figure: Query HOG Template, CP Dictionary, Reconstructed Template]
Image Representation
Shortlisting
Spatial Reranking
Reranking (Spatial + Original Template)
• Spatial Reranking: Shortlisted images are reranked via fast Hough-like voting of bounding box candidates suggested by each CP.
• HOG Scoring: Retrieved bounding box candidates are re-scored using the original HOG template with fast and memory-efficient PQ compression.
Dictionary & Dataset
10K dictionaries of sizes 3x3 – 7x7 HOG cells are extracted from DPMs trained from 1000 ImageNet categories.
Tests are performed on PASCAL VOC07 test set (5K images) and validation sets (100K images) of ImageNet 2011 and 2012 challenges.
. . .
Detection Results
[Figure: for each of the Person, TV/Monitor, Cow, and Motorbike categories, the template, the reconstructed template, and the top 3 retrievals.]
Immediate, scalable object category detection, Y. Aytar, A. Zisserman, CVPR 2014
Exemplar SVM Results
Large Scale Image Search
• Find similar images in a large database
Slide Credit - Kristen Grauman et al
Large Scale Image Search
Fast & Accurate
Large Scale Image Search
The Challenge: search the Internet
• The Internet contains billions of images
• Needs to scale to the Internet (How?)
• Need a way of measuring similarity between images (distance metric learning)
Requirements for image search
• Search must be fast, accurate, and scalable to large data sets
• Fast
– Kd-trees: tree data structure to improve search speed
– Locality Sensitive Hashing: hash tables to improve search speed
– Small code: binary small code (010101101)
• Scalable
– Require very little memory, enabling their use on standard hardware or even on handheld devices
• Accurate
– Learned distance metric
Categorization of existing large scale image search algorithms
• Tree Based Structure
– Spatial partitions (e.g., kd-trees) and recursive hyperplane decomposition provide an efficient means to search low-dimensional vector data exactly.
• Hashing
– Locality-sensitive hashing offers sub-linear time search by hashing highly similar examples together.
• Binary Small Code
– Compact binary codes, with a few hundred bits per image
Tree Based Structure
• Kd-tree
– The kd-tree is a binary tree in which every node is a k-dimensional point
• (No theoretical guarantee!) Kd-trees are known to break down in practice for high-dimensional data, and cannot provide better than a worst-case linear query time guarantee.
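To make the data structure concrete, here is a small kd-tree sketch with exact nearest-neighbor search. It is an illustrative implementation under simple assumptions (median split, cycling through the axes, dictionary nodes), not the variant used in any particular system; the names `build_kdtree` and `nearest` are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Recursively split the points at the median along one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Exact nearest neighbor: descend to the query's side first, then
    visit the other side only if the splitting plane is closer than
    the best point found so far."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

The pruning test on the splitting plane is what saves work in low dimensions. In high dimensions the plane is rarely farther than the current best match, so most branches are visited and the search degrades toward a linear scan.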
Locality Sensitive Hashing
• Take random projections of data
• Quantize each projection with a few bits
• No learning involved
[Figure: a feature vector is projected and quantized into a binary code, e.g. 101.]
Slide Credit - Fergus et al
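A minimal sketch of the two steps, assuming Gaussian random directions and the simplest quantizer (one bit per projection, taken from the sign of the dot product). Other LSH families quantize differently; the function names and the seed parameter are illustrative choices.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random directions; nothing is learned from the data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(vec, projections):
    """One bit per projection: 1 if the dot product is non-negative."""
    bits = []
    for direction in projections:
        dot = sum(v * d for v, d in zip(vec, direction))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


projections = make_projections(dim=3, n_bits=8)
code = lsh_code([1.0, 2.0, 3.0], projections)
scaled_code = lsh_code([2.0, 4.0, 6.0], projections)
```

With sign quantization, vectors separated by a small angle agree on most bits, and positive rescaling of a vector leaves its code unchanged.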
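How to search from hash table?
[Figure: a set of N data points Xi is mapped by the hash function h_{r1…rk} into a hash table with binary keys (e.g. 110101, 110111, 111101); a new query Q is hashed with the same function, and the hash table is searched for a small set of images (<< N) that are returned as results.]
Slide Credit - Kristen Grauman et al.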
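The lookup can be sketched as below: every database item is stored in the bucket named by its binary code, and a query only inspects the bucket it hashes to. The three fixed directions, the toy 2-D items, and all function names are assumptions chosen so the example is deterministic; a real system would draw the directions at random and typically use several tables.

```python
from collections import defaultdict


def sign_code(vec, directions):
    """Binary key: one sign bit per projection direction."""
    return "".join(
        "1" if sum(v * d for v, d in zip(vec, direction)) >= 0 else "0"
        for direction in directions
    )


def build_table(items, directions):
    """Group item ids into buckets keyed by their binary code."""
    table = defaultdict(list)
    for item_id, vec in items.items():
        table[sign_code(vec, directions)].append(item_id)
    return table


def query_table(table, query, directions):
    """Return only the items that share the query's bucket."""
    return table.get(sign_code(query, directions), [])


# Hypothetical fixed directions and toy 2-D feature vectors.
directions = [[1, 0], [0, 1], [1, -1]]
items = {"a": (1, 2), "b": (2, 3), "c": (-1, -2), "d": (3, 1)}
table = build_table(items, directions)
candidates = query_table(table, (1.5, 2.5), directions)
```

The candidates would then be ranked with the full distance measure, which is cheap because the bucket holds far fewer than N items.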
Binary codes for images
• Want images with similar content to have similar binary codes
• Use Hamming distance between codes
– Number of bit flips
– E.g.:
Ham_Dist(10001010,10001110)=1
Ham_Dist(10001010,11101110)=3
• Semantic Hashing [Salakhutdinov & Hinton, 2007]
– Text documents
Slide Credit - Fergus et al
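The Hamming distance between two codes stored as integers is the number of set bits in their XOR. A minimal sketch that reproduces the two examples on the slide (the function name is an illustrative choice):

```python
def hamming_distance(code_a, code_b):
    """Number of bit positions in which the two codes differ."""
    return bin(code_a ^ code_b).count("1")


d1 = hamming_distance(0b10001010, 0b10001110)  # 1
d2 = hamming_distance(0b10001010, 0b11101110)  # 3
```

XOR followed by a bit count maps to a couple of machine instructions on most hardware, which is one reason binary codes are attractive for large databases.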
Semantic Hashing
[Figure: Query → Semantic Hash Function → Binary code → Query address in the address space; semantically similar images in the database lie at nearby addresses.]
Quite different to a (conventional) randomizing hash
– Find neighbors by exploring Hamming ball around query address
– Lookup time depends on radius of ball, NOT on # data points
Slide Credit - Fergus et al
Semantic Hashing, Ruslan Salakhutdinov and Geoffrey Hinton, International Journal of Approximate Reasoning, 2009
Small Codes and Large Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR 2008
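Exploring the Hamming ball can be sketched by enumerating every address within a given number of bit flips of the query address and probing a table at each one. The table layout (address to list of image ids) and the function names are assumptions for illustration.

```python
from itertools import combinations


def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-wide address within `radius` bit flips of code."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def lookup(table, code, n_bits, radius):
    """Collect the images stored at any address inside the ball."""
    hits = []
    for address in hamming_ball(code, n_bits, radius):
        hits.extend(table.get(address, []))
    return hits


# Hypothetical 4-bit address space with three stored images.
table = {0b1010: ["img1"], 0b1011: ["img2"], 0b0101: ["img3"]}
neighbors = lookup(table, 0b1010, n_bits=4, radius=1)
```

The number of probes is the sum of C(n_bits, r) for r from 0 to the radius, which does not involve the database size. It does grow quickly with the radius and code length, so small radii are the practical case.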
Compact Binary Codes
• Google has a few billion images ($10^9$)
• PC has ~10 Gbytes ($10^{11}$ bits)
• Codes must fit in memory (disk too slow)
Budget of $10^2$ bits/image
• 1 Megapixel image is $10^7$ bits
• 32x32 color image is $10^4$ bits
Semantic hash function must also reduce dimensionality
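The orders of magnitude on this slide can be checked with a short calculation. The figures below assume 8 bits per byte, 3 color channels at 8 bits each, and "a few billion" rounded to $10^9$; these are assumptions used to reproduce the slide's round numbers, not values stated in the lecture.

$$
\begin{aligned}
\text{memory} &\approx 10\ \text{GB} \times 8\ \tfrac{\text{bits}}{\text{byte}} = 8 \times 10^{10} \approx 10^{11}\ \text{bits} \\
\text{budget} &\approx \frac{10^{11}\ \text{bits}}{10^{9}\ \text{images}} = 10^{2}\ \text{bits/image} \\
\text{1 megapixel image} &: 10^{6} \times 3 \times 8 = 2.4 \times 10^{7} \approx 10^{7}\ \text{bits} \\
\text{32x32 color image} &: 32 \times 32 \times 3 \times 8 = 24{,}576 \approx 10^{4}\ \text{bits}
\end{aligned}
$$

Even the tiny 32x32 image is about two orders of magnitude over the budget, which is why the hash function has to reduce dimensionality as well as preserve similarity.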
Slide Credit - Fergus et al
RBM architecture
Hidden units: h
Visible units: v
Symmetric weights w
Learn weights and biases using Contrastive Divergence
Parameters: Weights w Biases b
• Network of binary stochastic units
• Hinton & Salakhutdinov, Science 2006
Convenient conditional distributions:
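The equations from the slide did not survive extraction. For reference, the standard textbook form of the conditionals for an RBM with binary units is given below; the index convention ($i$ for visible units, $j$ for hidden units) and the use of $b$ for both bias vectors are assumptions, not a transcription of the slide.

$$
p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i w_{ij} v_i\Big), \qquad
p(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j w_{ij} h_j\Big), \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

Because there are no connections within a layer, the hidden units are conditionally independent given the visible units and vice versa, which is what makes sampling in Contrastive Divergence inexpensive.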
Slide Credit - Fergus et al
12 closest neighbors under different distance metrics
Examples of LabelMe retrieval
Slide Credit - Fergus et al