Content-Based Image Retrieval
Rong Jin
Content-Based Image Retrieval: Retrieval by Text
• Label database images with text tags
• Treat image retrieval as text retrieval: find images for textual queries using standard text search engines
• Example: Flickr.com
• Con: requires manual labeling
Image Labeling by Human Computing
• ESP game: http://www.gwap.com/gwap/gamesPreview/espgame
• Collect annotations for web images via a game
Content-Based Image Retrieval: Retrieval Based on Visual Content
• Represent images by their visual content
• Each query is an image
• Search for images whose visual content is similar to that of the query image
Content-Based Image Retrieval
Given a query image, find visually similar images in an image database.
[Figure: a query image is compared against the image database to produce the answer]
Example: www.like.com
CBIR Challenges
• How to represent the visual content of images
  - What are “visual contents”? Colors, shapes, textures, objects, or metadata (e.g., tags) derived from images
  - Which type of “visual content” should be used to represent an image? It is difficult to understand a user's information needs from a query image
• How to retrieve images efficiently
  - Should avoid a linear scan of the entire database
Image Representation
Approaches in increasing degree of difficulty:
• Similar color distribution: histogram matching
• Similar texture pattern: texture analysis
• Similar shape/pattern: image segmentation, pattern recognition
• Similar real content: life-time goal :-)
Vector-Based Image Representation
• Represent an image by a vector with a fixed number of elements
• Color histogram: discretize the color space; count the pixels in each discretized color bin
• Texture: Gabor filter texture features
• …
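The color histogram described above can be sketched in C as follows. This is a minimal illustration, assuming 8-bit interleaved RGB pixels and a uniform grid of `bins` levels per channel; the function name `color_histogram` and its argument layout are hypothetical, not part of any library provided with the project.

```c
#include <stddef.h>

/*
 * Computes a normalized RGB color histogram (hypothetical helper).
 * rgb:  interleaved 8-bit pixels (R, G, B, R, G, B, ...), npixels pixels long.
 * bins: number of levels per channel, so hist must hold bins*bins*bins doubles.
 * Each bin ends up holding the fraction of pixels whose color falls into it.
 */
void color_histogram(const unsigned char *rgb, size_t npixels, int bins, double *hist) {
    size_t total = (size_t)bins * (size_t)bins * (size_t)bins;
    for (size_t i = 0; i < total; i++) {
        hist[i] = 0.0;
    }
    if (npixels == 0) {
        return; /* empty image: leave the histogram all zeros */
    }
    for (size_t p = 0; p < npixels; p++) {
        /* Discretize each channel: a value in [0, 255] maps to a bin in [0, bins - 1]. */
        int r = rgb[3 * p] * bins / 256;
        int g = rgb[3 * p + 1] * bins / 256;
        int b = rgb[3 * p + 2] * bins / 256;
        hist[(r * bins + g) * bins + b] += 1.0;
    }
    /* Normalize counts to fractions so that images of different sizes are comparable. */
    for (size_t i = 0; i < total; i++) {
        hist[i] /= (double)npixels;
    }
}
```

With `bins = 4`, for example, every image becomes a 64-element vector regardless of its resolution, which is what makes the vectors directly comparable.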
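Vector-Based Image Representation
Example with three color components (R, G, B):
Vq = (0.3, 0.5, 0.2)   (query)
V1 = (0.4, 0.5, 0.1)
V2 = (0.5, 0.1, 0.4)
|V1 - Vq| < |V2 - Vq|, so V1 is more similar to the query than V2.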
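Ranking images by |V - Vq| needs only a distance function between vectors. The sketch below assumes the Euclidean (L2) distance, which the slide does not name explicitly, and uses a hypothetical helper `sq_distance`. It returns the squared distance: squaring preserves the ranking and avoids calling `sqrt()`.

```c
#include <stddef.h>

/*
 * Squared Euclidean distance between two n-dimensional feature vectors
 * (hypothetical helper). Because squaring is monotone on non-negative values,
 * ranking by this value gives the same order as ranking by |a - b| itself.
 */
double sq_distance(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}
```

For the example vectors, the squared distances are about 0.02 for V1 and 0.24 for V2, so V1 is ranked ahead of V2.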
[Example slides: images with similar colors; images with similar shapes; images with similar content]
Challenges in CBIR
Imagine that:
• You get drunk, REALLY drunk
• You are hit over the head
• You are kidnapped to another city in a country on the other side of the world
When you wake up, you try to figure out what city you are in and what is going on.
That’s what it’s like to be a CBIR system!
Near-Duplicate Image Retrieval
Given a query image, identify gallery images with high visual similarity.
Appearance-Based Image Matching
• Parts-based image representation
  - Parts (appearance) + shape (spatial relations)
  - Parts: local features detected by an interesting point operator
  - Shape: graphical models or neighborhood relationships
Interesting Point Detection
• Local features have been shown to be effective for representing images
• They are image patterns that differ from their immediate neighborhood
• They can be points, edges, or small patches
• We call local features the key points or interesting points of an image
Interesting Point Detection
An example image with key points detected by a corner detector.
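The slide does not say which corner detector was used. As an assumption, the sketch below uses the Harris detector, one common choice: it sums products of image gradients over a small window and scores each pixel by R = det(M) - k * trace(M)^2. The function `harris_response`, the 3x3 unweighted window, and the central-difference gradients are simplifications chosen for illustration.

```c
/*
 * Harris corner response at pixel (x, y) of a grayscale image stored
 * row-major (img[y * w + x]). Hypothetical helper for illustration.
 * Gradients use central differences; the structure tensor M is summed over
 * a 3x3 window without Gaussian weighting. k is typically around 0.04-0.06.
 * Large positive response: corner; negative: edge; near zero: flat region.
 * Returns 0.0 for pixels too close to the border to evaluate.
 */
double harris_response(const double *img, int w, int h, int x, int y, double k) {
    if (x < 2 || x > w - 3 || y < 2 || y > h - 3) {
        return 0.0;
    }
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (int v = y - 1; v <= y + 1; v++) {
        for (int u = x - 1; u <= x + 1; u++) {
            /* Horizontal and vertical intensity gradients at (u, v). */
            double ix = (img[v * w + (u + 1)] - img[v * w + (u - 1)]) / 2.0;
            double iy = (img[(v + 1) * w + u] - img[(v - 1) * w + u]) / 2.0;
            sxx += ix * ix;
            syy += iy * iy;
            sxy += ix * iy;
        }
    }
    double det = sxx * syy - sxy * sxy;
    double trace = sxx + syy;
    return det - k * trace * trace;
}
```

A detector would evaluate this response at every pixel and keep local maxima above a threshold as key points.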
Interesting Point Detection
The detection of interesting points needs to be robust to various geometric transformations.
[Figure panels: original; scaling + rotation + translation; projection]
Interesting Point Detection
The detection of interesting points needs to be robust to imaging conditions, e.g., lighting and blurring.
Descriptor
• Represents each detected key point
• Takes measurements from a region centered on an interesting point, e.g., texture, shape, …
• Each descriptor is a fixed-length vector; e.g., a SIFT descriptor is a 128-dimensional vector
Descriptor
The descriptor should also be robust under different image transformations: the same key point seen in transformed versions of an image should have similar descriptors.
Image Representation
Bag-of-features representation: an example (each descriptor is 5-dimensional)
[Figure panels: original image; detected key points; descriptors of the key points]
Descriptors of the key points (one row per key point):
 22    0   19   23    1
 66  103   45    6   38
232   44    0   11   48
 29   55  129    0    1
 11   78  110    1   32
220   30   11   34   21
Retrieval
How do we measure the similarity between two images? Count the number of matches!
• Compare each descriptor of the query image (e.g., 22 0 19 23 1; 66 103 45 6 38; 232 44 0 11 48; 29 55 129 0 1; ...) with the descriptors of a database image
• If the distance between two descriptor vectors is smaller than a threshold, we count one match
[Figure: two database images with 1 and 5 matched points, respectively]
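Match counting can be sketched as below. The slides only say that a pair of descriptors closer than a threshold counts as a match; this sketch additionally assumes each query key point is counted at most once, and the helper name `count_matches` is hypothetical.

```c
/*
 * Counts how many query descriptors have at least one database descriptor
 * closer than `threshold` (Euclidean distance). Hypothetical helper.
 * Descriptors are stored row-major: q has nq rows, d has nd rows, each of
 * length dim. Comparing squared distances with threshold^2 avoids sqrt().
 */
int count_matches(const float *q, int nq, const float *d, int nd, int dim, float threshold) {
    float t2 = threshold * threshold;
    int matches = 0;
    for (int i = 0; i < nq; i++) {
        for (int j = 0; j < nd; j++) {
            float s = 0.0f;
            for (int c = 0; c < dim; c++) {
                float diff = q[i * dim + c] - d[j * dim + c];
                s += diff * diff;
            }
            if (s < t2) {
                matches++; /* query key point i has a match */
                break;     /* count each query key point at most once */
            }
        }
    }
    return matches;
}
```

Note the cost: every pair of images requires nq × nd descriptor comparisons, and every database image must be visited.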
Problems
• Computationally expensive
• Requires a linear scan of the entire database
• Example: match a query image against a database of 1 million images. If computing the match between two images takes 0.1 second, a single query takes 1,000,000 × 0.1 s = 100,000 s (about 28 hours), i.e., more than one day
Bag-of-Words Model
Compare with the bag-of-words representation in text retrieval:
• A document: a collection of the words in the document
• An image: a collection of the key points of the image
What is the difference?
• The same word appears in many documents
• There is no “same key point” across images, but “similar key points” appear in many images that have similar visual content
• Group “similar key points” from different images into “visual words”
Bag-of-Words Model
[Figure: key points b1, b2, b3, … from different images are grouped into visual words]
• Group key points into visual words
• Represent images by histograms of visual words
Bag-of-Words
• The “grouping” is usually done by clustering
• Cluster the key points of all images into a number of cluster centers (e.g., 100,000 clusters)
• Each cluster center is called a “visual word”; the collection of all cluster centers is called the “visual vocabulary”
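The slides say only that grouping is done "by clustering". k-means is a common choice and is assumed in the sketch below; the project itself uses FLANN's clustering routine instead. This plain Lloyd's iteration (hypothetical `kmeans` helper) initializes centers with the first k points for simplicity and costs O(n · k · dim) per iteration, which is why a library implementation is preferable at the scale of 10,000 or 100,000 clusters.

```c
#include <stdlib.h>
#include <string.h>

/* Squared Euclidean distance between two dim-dimensional vectors. */
static float sq_dist(const float *a, const float *b, int dim) {
    float s = 0.0f;
    for (int j = 0; j < dim; j++) {
        float d = a[j] - b[j];
        s += d * d;
    }
    return s;
}

/* Index of the center closest to point p. */
static int nearest_center(const float *p, const float *centers, int k, int dim) {
    int best = 0;
    float best_d = sq_dist(p, centers, dim);
    for (int c = 1; c < k; c++) {
        float dd = sq_dist(p, centers + c * dim, dim);
        if (dd < best_d) {
            best_d = dd;
            best = c;
        }
    }
    return best;
}

/*
 * Plain Lloyd's k-means (hypothetical helper). data: n points of length dim,
 * row-major, with n >= k. centers: output, k * dim floats (the visual words).
 * An empty cluster keeps its previous center. Returns 0 on success, -1 if
 * memory allocation fails.
 */
int kmeans(const float *data, int n, int dim, int k, int iters, float *centers) {
    size_t csize = (size_t)k * (size_t)dim;
    float *sums = malloc(sizeof(float) * csize);
    int *counts = malloc(sizeof(int) * (size_t)k);
    if (sums == NULL || counts == NULL) {
        free(sums);
        free(counts);
        return -1;
    }
    memcpy(centers, data, sizeof(float) * csize); /* init: first k points */
    for (int it = 0; it < iters; it++) {
        memset(sums, 0, sizeof(float) * csize);
        memset(counts, 0, sizeof(int) * (size_t)k);
        /* Assignment step: attach every point to its nearest center. */
        for (int i = 0; i < n; i++) {
            int c = nearest_center(data + i * dim, centers, k, dim);
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c * dim + j] += data[i * dim + j];
            }
        }
        /* Update step: move each center to the mean of its assigned points. */
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                continue;
            }
            for (int j = 0; j < dim; j++) {
                centers[c * dim + j] = sums[c * dim + j] / (float)counts[c];
            }
        }
    }
    free(sums);
    free(counts);
    return 0;
}
```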
Retrieval by Bag-of-Words Model
• Generate a “visual vocabulary”
• Represent each key point by its nearest “visual word”
• Represent an image by “a bag of visual words”
• Text retrieval techniques can then be applied directly
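Once every key point is a visual-word ID, an image is just a word-count histogram, exactly like a term-frequency vector in text retrieval. The sketch below (hypothetical helpers `bow_histogram` and `bow_dot`) uses a plain term-frequency dot product as the similarity; real text retrieval models, such as the tf-idf model used with Lemur later on, also weight rare words more heavily and normalize for length.

```c
/*
 * Builds a term-frequency histogram over a vocabulary of vocab_size visual
 * words from the 0-based word IDs of an image's key points (hypothetical
 * helper). IDs outside [0, vocab_size) are ignored.
 */
void bow_histogram(const int *word_ids, int n, int vocab_size, int *hist) {
    for (int w = 0; w < vocab_size; w++) {
        hist[w] = 0;
    }
    for (int i = 0; i < n; i++) {
        if (word_ids[i] >= 0 && word_ids[i] < vocab_size) {
            hist[word_ids[i]]++;
        }
    }
}

/* Unnormalized term-frequency similarity: dot product of two histograms. */
long bow_dot(const int *h1, const int *h2, int vocab_size) {
    long s = 0;
    for (int w = 0; w < vocab_size; w++) {
        s += (long)h1[w] * h2[w];
    }
    return s;
}
```

Mapping w1, w2, w3 to IDs 0, 1, 2, an image "w3 w3 w2" becomes the histogram (0, 1, 2), and its score against an image "w3 w2", histogram (0, 1, 1), is 3.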
Project
Build a system for near-duplicate image retrieval:
• A database with 10,000 images
• Construct a bag-of-words model for each image (offline)
• Construct a bag-of-words model for a query image
• Retrieve the 10 most visually “similar” images from the database for the given query
Step 1: Dataset
• 10,000 color images under the folder ‘./img’
• The key points of each image have already been extracted
• The key points of all images are saved in a single file ‘./feature/esp.feature’
  - Each line corresponds to a key point with 128 attributes
  - Attributes in each line are separated by tabs
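Reading ‘./feature/esp.feature’ comes down to parsing one line of tab-separated numbers per key point. A minimal sketch, assuming each attribute is a plain decimal number; the helper name `parse_keypoint_line` is hypothetical.

```c
#include <stdlib.h>

/*
 * Parses one line of esp.feature: `dim` tab-separated numbers (128 in this
 * project) into out (hypothetical helper). Returns the number of values
 * parsed; a value smaller than dim means the line was short or malformed.
 */
int parse_keypoint_line(const char *line, float *out, int dim) {
    const char *p = line;
    for (int i = 0; i < dim; i++) {
        char *end;
        float v = strtof(p, &end); /* strtof skips leading whitespace, including tabs */
        if (end == p) {
            return i; /* no number found at this position */
        }
        out[i] = v;
        p = end;
    }
    return dim;
}
```

In the project, each line read from the file (e.g., with `fgets` into a sufficiently large buffer) would be passed to this function with `dim = 128`.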
Step 1: Dataset
To locate the key points of individual images, two other files are needed:
• ‘./imglist.txt’: the order in which the images' key points were saved
• ‘./feature/esp.size’: the number of key points each image has
Step 1: Dataset
Example: three images imgA, imgB, imgC; imgA has 2 key points, imgB has 3, and imgC has 2.
imglist.txt   esp.size   esp.feature
imgB.jpg      3          imgB-key point 1
imgC.jpg      2          imgB-key point 2
imgA.jpg      2          imgB-key point 3
                         imgC-key point 1
                         imgC-key point 2
                         imgA-key point 1
                         imgA-key point 2
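The key points of image i start right after those of images 0 to i-1, so a running sum over esp.size gives each image's first line in esp.feature. A small sketch with a hypothetical helper `keypoint_offsets`:

```c
/*
 * Given the per-image key point counts from esp.size (in the order of
 * imglist.txt), stores in first[i] the 0-based line index in esp.feature of
 * image i's first key point (hypothetical helper). Returns the total number
 * of key points.
 */
long keypoint_offsets(const int *sizes, int nimages, long *first) {
    long total = 0;
    for (int i = 0; i < nimages; i++) {
        first[i] = total; /* image i occupies lines first[i] .. first[i] + sizes[i] - 1 */
        total += sizes[i];
    }
    return total;
}
```

For the example above (sizes 3, 2, 2), imgB occupies lines 0 to 2, imgC lines 3 and 4, and imgA lines 5 and 6, for 7 key points in total.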
Step 2: Key Point Quantization
Represent each image by a bag of visual words:
• Construct the visual vocabulary
  - Cluster all the key points into 10,000 clusters
  - Each cluster center is a visual word
• Map each key point to a visual word
  - Find the nearest cluster center for each key point (nearest neighbor search)
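The mapping step can be sketched as an exact, brute-force nearest neighbor search (hypothetical helper `quantize`). This is a stand-in for illustration only: with 10,000 centers and a large number of 128-dimensional key points, the O(n · k · dim) cost is exactly what the FLANN approximate search provided for the project is meant to avoid.

```c
/*
 * Maps each key point to its nearest cluster center (visual word) by brute
 * force (hypothetical helper). points: n x dim, centers: k x dim, both
 * row-major; word_ids receives one 0-based center index per key point.
 */
void quantize(const float *points, int n, const float *centers, int k, int dim, int *word_ids) {
    for (int i = 0; i < n; i++) {
        int best = 0;
        float best_d = -1.0f; /* -1 marks "no center examined yet" */
        for (int c = 0; c < k; c++) {
            float s = 0.0f;
            for (int j = 0; j < dim; j++) {
                float diff = points[i * dim + j] - centers[c * dim + j];
                s += diff * diff;
            }
            if (best_d < 0.0f || s < best_d) {
                best_d = s;
                best = c;
            }
        }
        word_ids[i] = best;
    }
}
```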
Step 2: Key Point Quantization
Example: cluster the 7 key points of imgA, imgB, and imgC from the Step 1 example into 3 clusters
• The cluster centers are cnt1, cnt2, cnt3; each center is a visual word: w1, w2, w3
• Find the nearest center for each key point
Step 2: Key Point Quantization
imgA.jpg: 1st key point → w2; 2nd key point → w1
imgB.jpg: 1st key point → w3; 2nd key point → w3; 3rd key point → w2
imgC.jpg: 1st key point → w3; 2nd key point → w2
Bag-of-words representation:
imgA.jpg: w2 w1
imgB.jpg: w3 w3 w2
imgC.jpg: w3 w2
Step 2: Key Point Quantization
We provide the FLANN library for clustering and nearest neighbor search. For clustering, use:
    flann_compute_cluster_centers(
        float* dataset,                        // your key points
        int rows,                              // number of key points
        int cols,                              // 128, dimension of a key point
        int clusters,                          // number of clusters
        float* result,                         // cluster centers
        struct IndexParameters* index_params,
        struct FLANNParameters* flann_params);
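Step 2: Key Point Quantization
For nearest neighbor search:
1. Build an index for the cluster centers (keep the index handle it returns):
    flann_build_index(
        float* dataset,                        // your cluster centers
        int rows,                              // number of cluster centers
        int cols,                              // 128
        float* speedup,                        // output: estimated speedup over linear search
        struct IndexParameters* index_params,
        struct FLANNParameters* flann_params);
2. For each key point, search for the nearest cluster center:
    flann_find_nearest_neighbors_index(
        FLANN_INDEX index_id,                  // your index from step 1
        float* testset,                        // your key points
        int trows,                             // number of key points
        int* result,                           // output: nearest center index for each key point
        int nn,                                // number of neighbors to return (1 here)
        int checks,                            // search effort: more checks, more accurate but slower
        struct FLANNParameters* flann_params);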
Step 2: Key Point Quantization
In this step, you need to save:
• The cluster centers, to a file; you will use them later to quantize the key points of query images
• The bag-of-words representation of each image, in “trec” format:
<DOC>
<DOCNO>imgB</DOCNO>
<TEXT>
w3 w3 w2
</TEXT>
</DOC>
<DOC>
<DOCNO>imgA</DOCNO>
<TEXT>
w2 w1
</TEXT>
</DOC>
<DOC>
<DOCNO>imgC</DOCNO>
<TEXT>
w3 w2
</TEXT>
</DOC>
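Each image's entry in the TREC file follows the fixed pattern shown above, so it can be generated mechanically. A sketch that formats one entry into a buffer; the helper `format_trec_doc` is hypothetical, and the token spelling "w<id>" simply mirrors the example (any spelling works as long as queries use the same one).

```c
#include <stdio.h>
#include <string.h>

/*
 * Formats one TREC document for an image (hypothetical helper):
 *   <DOC>\n<DOCNO>docno</DOCNO>\n<TEXT>\nw3 w3 w2\n</TEXT>\n</DOC>\n
 * Visual words are written as "w<id>" (assumed spelling). Returns the string
 * length, or -1 if the buffer (capacity cap) is too small.
 */
int format_trec_doc(char *buf, size_t cap, const char *docno, const int *word_ids, int n) {
    int w = snprintf(buf, cap, "<DOC>\n<DOCNO>%s</DOCNO>\n<TEXT>\n", docno);
    if (w < 0 || (size_t)w >= cap) {
        return -1;
    }
    size_t len = (size_t)w;
    for (int i = 0; i < n; i++) {
        /* Space-separated tokens on a single line, as in the example. */
        w = snprintf(buf + len, cap - len, i == 0 ? "w%d" : " w%d", word_ids[i]);
        if (w < 0 || (size_t)w >= cap - len) {
            return -1;
        }
        len += (size_t)w;
    }
    w = snprintf(buf + len, cap - len, "\n</TEXT>\n</DOC>\n");
    if (w < 0 || (size_t)w >= cap - len) {
        return -1;
    }
    len += (size_t)w;
    return (int)len;
}
```

Appending the formatted entries of all 10,000 images to one file yields the collection indexed in Step 3.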
Step 3: Build an index using Lemur
• The same as in the previous homework
• Use the “KeyfileIncIndex” index
• No stemming
• No stop words
Step 4: Extract key points for a query
• Three sample query images are under ‘./sample query/’
• The query images are in .pgm format
• The extraction tool is under ‘./sift tool/’
  - For Windows, use “siftW32.exe”; for Linux, use “sift”
  - Example command:
    sift < input.pgm > output.keypoints
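The slides do not document the layout of output.keypoints. The sketch below assumes the tool writes the text format of David Lowe's SIFT demo program: a header with the number of key points and the descriptor length, then, per key point, four floats (row, column, scale, orientation) followed by the descriptor values. If the provided tool differs, only the parsing loop needs to change. The helper `parse_sift_keys` is hypothetical.

```c
#include <stdlib.h>

/*
 * Parses key points in the (assumed) Lowe SIFT demo text format:
 *   <num_keypoints> <descriptor_length>
 *   then per key point: row col scale orientation, then descriptor_length values.
 * Whitespace and line breaks are interchangeable. Returns a malloc'd row-major
 * array of descriptors (caller frees), or NULL on malformed input.
 */
float *parse_sift_keys(const char *text, int *n_out, int *dim_out) {
    char *end;
    long n = strtol(text, &end, 10);
    if (end == text || n < 0) {
        return NULL;
    }
    text = end;
    long dim = strtol(text, &end, 10);
    if (end == text || dim <= 0) {
        return NULL;
    }
    text = end;
    /* Allocate at least one element so that n == 0 is not mistaken for failure. */
    float *desc = malloc(sizeof(float) * (size_t)(n > 0 ? n * dim : 1));
    if (desc == NULL) {
        return NULL;
    }
    for (long i = 0; i < n; i++) {
        /* Skip the 4 location values: row, column, scale, orientation. */
        for (int j = 0; j < 4; j++) {
            (void)strtof(text, &end);
            if (end == text) {
                free(desc);
                return NULL;
            }
            text = end;
        }
        /* Read the descriptor itself. */
        for (long j = 0; j < dim; j++) {
            float v = strtof(text, &end);
            if (end == text) {
                free(desc);
                return NULL;
            }
            desc[i * dim + j] = v;
            text = end;
        }
    }
    *n_out = (int)n;
    *dim_out = (int)dim;
    return desc;
}
```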
Step 5: Generate a bag-of-words model for a query
Map each key point of a given query to a visual word:
• Use the cluster center file generated in Step 2
• Build an index for the cluster centers using flann_build_index()
• For each key point, search for the nearest cluster center using flann_find_nearest_neighbors_index()
Step 5: Generate a bag-of-words model for a query
Write the bag-of-words model for a query image in the Lemur format:
<DOC 1>
The mapped cluster ID for the 1st key point
The mapped cluster ID for the 2nd key point
…
The mapped cluster ID for the last key point
</DOC>
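The query file can be generated the same way as the database documents. In this sketch (hypothetical helper `format_lemur_query`), each mapped cluster ID is written on its own line as "w<id>"; that spelling is an assumption, and what matters is that it matches the tokens used when the database was indexed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Formats a query in the layout above: "<DOC n>", one token per key point
 * per line, then "</DOC>" (hypothetical helper). Tokens are written as
 * "w<id>" (assumed spelling; must match the indexed documents).
 * Returns the string length, or -1 if the buffer (capacity cap) is too small.
 */
int format_lemur_query(char *buf, size_t cap, int query_no, const int *word_ids, int n) {
    int w = snprintf(buf, cap, "<DOC %d>\n", query_no);
    if (w < 0 || (size_t)w >= cap) {
        return -1;
    }
    size_t len = (size_t)w;
    for (int i = 0; i < n; i++) {
        w = snprintf(buf + len, cap - len, "w%d\n", word_ids[i]);
        if (w < 0 || (size_t)w >= cap - len) {
            return -1;
        }
        len += (size_t)w;
    }
    w = snprintf(buf + len, cap - len, "</DOC>\n");
    if (w < 0 || (size_t)w >= cap - len) {
        return -1;
    }
    len += (size_t)w;
    return (int)len;
}
```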
Step 6: Image Retrieval by Lemur
Use the Lemur command ‘RetEval’ as follows:
RetEval <parameter_file>
An example parameter file:
<parameters>
<index>/home/user1/myindex/myindex.key</index>
<retModel>tfidf</retModel>
<textQuery>/home/user1/query/q1.query</textQuery>
<resultFile>/home/user1/result/ret.result</resultFile>
<TRECResultFormat>1</TRECResultFormat>
<resultCount>10</resultCount>
</parameters>
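With TRECResultFormat set to 1, the result file is assumed to contain one line per retrieved document in the standard TREC run layout, `<query_id> Q0 <doc_id> <rank> <score> <run_id>`. The sketch below parses such a line and builds the image path for display; both helpers are hypothetical, and the ‘./img/<docno>.jpg’ naming is an assumption based on the dataset folder and file names in Step 1.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parses one TREC-format result line (hypothetical helper):
 *   <query_id> Q0 <doc_id> <rank> <score> <run_id>
 * docno must hold at least 128 bytes. Returns 1 on success, 0 otherwise.
 */
int parse_trec_result_line(const char *line, char *docno, int *rank, double *score) {
    char qid[64];
    /* %*s skips the "Q0" column without storing it. */
    int got = sscanf(line, "%63s %*s %127s %d %lf", qid, docno, rank, score);
    return got == 4;
}

/* Builds "./img/<docno>.jpg" (assumed naming). Returns 1 on success. */
int image_path(char *buf, size_t cap, const char *docno) {
    int w = snprintf(buf, cap, "./img/%s.jpg", docno);
    return w >= 0 && (size_t)w < cap;
}
```

The GUI in Step 7 could read the first ten lines of the result file this way and display the corresponding images in rank order.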
Step 7: Graphical User Interface
Build a GUI for the image retrieval system:
• Browse the image database
• Select an image from the database to query the database, and display the top 10 retrieved results
  - Extract the bag-of-words representation of the query
  - Write it into a file in the format specified in Step 5
  - Run the “RetEval” command for retrieval
• Load an external query image, search the images in the database, and display the top 10 retrieved results
Step 8: Evaluation
• Demo your system in class during the last week
• We will provide a number of test query images
• Run your GUI, load each test query image, and display the ten images in the database most similar to it