Image Retrieval with Geometry-Preserving Visual Phrases
Yimeng Zhang, Zhaoyin Jia and Tsuhan Chen, Cornell University
Similar Image Retrieval
Given a query image, search an image database and return a ranked list of relevant images.
Bag-of-Visual-Words (BoW)
Images are represented as a histogram of visual words; the vector length equals the dictionary size.
Similarity of two images: cosine similarity of their histograms.
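As a concrete sketch of this similarity, assuming features have already been quantized to word IDs (the function name is illustrative, not from the slides):

```python
from collections import Counter
import math

def bow_cosine(words_a, words_b):
    """Cosine similarity of two visual-word histograms."""
    ha, hb = Counter(words_a), Counter(words_b)
    dot = sum(ha[w] * hb[w] for w in ha)
    norm_a = math.sqrt(sum(c * c for c in ha.values()))
    norm_b = math.sqrt(sum(c * c for c in hb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```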
Geometry-Preserving Visual Phrases
A length-k phrase is k visual words in a certain spatial layout.
Bag of Phrases (BoP): the histogram over phrases (e.g. all length-2 phrases).
Phrases vs. Words
[Figure: retrieval results ranked by word, length-2, and length-3 phrase matches; longer phrases better separate relevant from irrelevant images.]
Previous Works
Geometry verification: a searching step with BoW, then post-processing (geometry verification) only on the top-ranked images.
Encoding spatial information by modeling relationships between words:
Co-occurrences in the entire image [L. Torresani et al., CVPR 2009]: no spatial information.
Phrases in local neighborhoods [J. Yuan et al., CVPR 2007] [Z. Wu et al., CVPR 2010] [C. L. Zitnick, Tech. Report 2007]: no long-range interactions, weak geometry.
Selecting a subset of phrases [J. Yuan et al., CVPR 2007]: discards a large portion of phrases.
The dimension of the full phrase histogram (e.g. over all length-2 phrases) is exponential in the number of words per phrase, so previous works reduce the number of phrases. Our work keeps all phrases while computing the similarity in linear time.
Approach
Overview: go from BoW to BoP in both parts of the pipeline:
1. Similarity measure [Zhang and Chen, 09]
2. Large-scale retrieval with inverted files and min-hash (this paper)
Co-occurring Phrases
[Figure: two images with matched words A-F; phrases that appear in both images with the same spatial layout are co-occurring phrases.]
Following [Zhang and Chen, 09], only the translation difference between the two images is considered.
Co-occurring Phrase Algorithm
Each pair of matched words votes into an offset space at the location difference (x - x', y - y') [Zhang and Chen, 09]. Matched words that fall into the same offset cell agree on a common translation, so every pair of them forms a co-occurring length-2 phrase.
[Figure: offset space with cells containing {B, C, A}, {D, F}, {A}, and {E, F}.]
Number of co-occurring length-2 phrases: C(3,2) + C(2,2) + C(2,2) = 3 + 1 + 1 = 5.
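A minimal sketch of this counting step; the cell size and the rounding-based quantization are my assumptions, not from the slides:

```python
from collections import defaultdict
from math import comb

def count_length2_phrases(offsets, cell=1.0):
    """offsets: one (x - x', y - y') offset per pair of features that
    share a visual word across the two images.  Pairs landing in the
    same quantized cell agree on a common translation, and every two
    of them form a co-occurring length-2 phrase."""
    cells = defaultdict(int)
    for dx, dy in offsets:
        cells[(round(dx / cell), round(dy / cell))] += 1
    return sum(comb(n, 2) for n in cells.values())

# Cells with 3, 2, 1, and 2 votes, as in the slide's example:
offsets = [(0, 0)] * 3 + [(2, 1)] * 2 + [(5, 5)] + [(3, 3)] * 2
print(count_length2_phrases(offsets))  # C(3,2) + C(2,2) + C(2,2) = 5
```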
Relation with the feature vector
The inner product of the BoP feature vectors, ⟨φ_k(X), φ_k(Y)⟩, equals the number of co-occurring length-k phrases. Computing it naively over the exponential-dimensional vectors is intractable, but the offset-space algorithm costs O(M), where M is the number of corresponding word pairs; in practice M is linear in the number of local features, the same as BoW.
Inverted Index with BoW
An inverted index avoids comparing the query with every image: for each query word, every image in that word's posting list gets +1 in a score table.
Inverted Index with Word Location
Store the word's location (x, y) with each image ID in the posting lists. Assuming the same word occurs only once per image, this uses the same memory as the BoW index.
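A sketch of such an index with plain Python dictionaries (the data layout is illustrative):

```python
from collections import defaultdict

def build_location_index(database):
    """database: {image_id: [(word, x, y), ...]}.
    Maps each word to a posting list of (image_id, x, y).  Under the
    slide's assumption that a word occurs at most once per image, this
    stores one entry per (word, image) pair -- the same count as a
    plain BoW inverted index."""
    index = defaultdict(list)
    for image_id, features in database.items():
        for word, x, y in features:
            index[word].append((image_id, x, y))
    return index
```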
Inverted Files with Phrases
Computing the number of co-occurring phrases via the score table: with BoW, each posting for a query word w_i simply adds +1 to that image's score. With BoP, each posting instead votes into an offset space kept per candidate image, at the cell given by the location difference between the query word and the indexed word.
[Figure: inverted index postings voting into per-image offset spaces.]
Final score: for each image, accumulate C(n,2) over the cells of its offset space to obtain the final similarity score.
[Figure: score table with the final similarity scores per image.]
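Putting the pieces together, a hedged sketch of query scoring with per-image offset spaces; the cell size and data layout are assumptions, and only length-2 phrases are counted:

```python
from collections import defaultdict
from math import comb

def bop_scores(query_features, index, cell=1.0):
    """query_features: [(word, x, y)]; index: word -> [(image_id, x', y')].
    Each posting votes into that image's offset space; an image's final
    score is its number of co-occurring length-2 phrases with the query."""
    votes = defaultdict(int)  # (image_id, cell_x, cell_y) -> vote count
    for word, x, y in query_features:
        for image_id, xq, yq in index.get(word, ()):
            key = (image_id, round((x - xq) / cell), round((y - yq) / cell))
            votes[key] += 1
    scores = defaultdict(int)
    for (image_id, _, _), n in votes.items():
        scores[image_id] += comb(n, 2)
    return dict(scores)
```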
Overview (recap)
From BoW to BoP with inverted files and min-hash; min-hash gives lower storage and time complexity.
Min-hash with BoW
The probability of a min-hash collision (same word) equals the image similarity (set overlap of the visual words of images I and I').
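A sketch of the min-hash estimate; the xor-mask hash family is an illustrative stand-in for whatever hash functions an implementation actually uses:

```python
import random

def minhash_similarity(words_a, words_b, n_hashes=200, seed=0):
    """Estimates the set-overlap (Jaccard) similarity of two visual-word
    sets as the fraction of hash functions under which both sets attain
    the same minimum, i.e. a min-hash collision."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(32) for _ in range(n_hashes)]
    sa, sb = set(words_a), set(words_b)
    collisions = sum(
        min(hash(w) ^ m for w in sa) == min(hash(w) ^ m for w in sb)
        for m in masks
    )
    return collisions / n_hashes
```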
Min-hash with Phrases
The similarity is estimated from the probability of k min-hash collisions with consistent geometry (details are in the paper).
[Figure: min-hash sketches f_{m_i}, f_{m_j} of images I and I', and the offset space of the colliding features.]
Other Invariances
Scale invariance: normalize the offsets between matched features p of image I and p' of image I' by the scale ratio ŝ = s/s', i.e. x̂ = x - ŝ·x' and ŷ = y - ŝ·y', and add a log(ŝ) dimension to the offset space. Adding dimensions to the offset space increases the memory usage.
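A sketch of a scale-normalized offset vote under my reading of the slide's formulas; the cell sizes are assumptions:

```python
import math

def scale_invariant_cell(p, p_ref, cell=1.0, scale_cell=0.5):
    """p = (x, y, s): feature location and characteristic scale.
    Normalizing the offset by the scale ratio s/s', and adding a
    log-scale-ratio axis, makes the vote invariant to a common
    scaling -- at the cost of a larger (3-D) offset space."""
    x, y, s = p
    xr, yr, sr = p_ref
    ratio = s / sr
    dx = x - ratio * xr
    dy = y - ratio * yr
    return (round(dx / cell), round(dy / cell),
            round(math.log(ratio) / scale_cell))
```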
Variant Matching
Local histogram matching [Zhang and Chen, 10]
Evaluation
1. BoW + inverted index vs. BoP + inverted index
2. BoW + min-hash vs. BoP + min-hash
Post-processing methods are complementary to our work.
Experiments - Inverted Index
5K Oxford dataset (55 queries) [Philbin, J. et al. 07], with 1M Flickr distractors.
[Figure: example precision-recall curves for BoW and BoP; BoP achieves higher precision at lower recall.]
Comparison
Mean average precision (mAP): mean of the AP over the 55 queries, plotted against vocabulary size.
[Figure: mAP vs. vocabulary size (K) for BoW, BoP, BoW+RANSAC, and BoP+RANSAC.]
BoP outperforms BoW at similar computation, and outperforms BoW+RANSAC (which is 10 times slower, re-ranking the 150 top images). The improvement is larger at smaller vocabulary sizes.
+ Flickr 1M Dataset
Computational complexity:
Method | Memory | Quantization | Search
BoW | 8.1G | 0.89s | 0.137s
BoP | 8.5G | 0.89s | 0.215s
BoW+RANSAC | - | 0.89s | 4.137s
RANSAC: 4s on the top 300 images.
[Figure: mAP vs. number of images for BoW and BoP.]
Experiment - Min-hash
University of Kentucky dataset; min-hash with BoW follows [O. Chum et al., BMVC08].
[Figure: retrieval score vs. number of min-hash functions (200, 500, 800) for BoW and BoP.]
Conclusion
Encodes more spatial information into the BoW representation.
Can be applied to all images in the database at the searching step.
Same computational complexity as BoW.
Better retrieval precision than BoW+RANSAC.