Image Retrieval with Geometry-Preserving Visual Phrases
Yimeng Zhang, Zhaoyin Jia and Tsuhan Chen, Cornell University
Similar Image Retrieval
Given a query image, search an image database and return a ranked list of relevant images.
Bag-of-Visual-Words (BoW)
Images are represented as a histogram of visual words; the vector length equals the dictionary size.
Similarity of two images: cosine similarity of their histograms.
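As a concrete sketch of this similarity, assuming features have already been quantized to word IDs (the function name is illustrative, not from the slides):

```python
from collections import Counter
import math

def bow_cosine(words_a, words_b):
    """Cosine similarity of two visual-word histograms."""
    ha, hb = Counter(words_a), Counter(words_b)
    dot = sum(ha[w] * hb[w] for w in ha)
    norm_a = math.sqrt(sum(c * c for c in ha.values()))
    norm_b = math.sqrt(sum(c * c for c in hb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```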
Geometry-Preserving Visual Phrases
A length-k phrase is k visual words in a certain spatial layout.
Bag of Phrases (BoP): the histogram over phrases (e.g. all length-2 phrases).
Phrases vs. Words
[Figure: retrieval results ranked by word, length-2, and length-3 phrase matches; longer phrases better separate relevant from irrelevant images.]
Previous Works
Geometry verification: a searching step with BoW, then post-processing (geometry verification) only on the top-ranked images.
Encoding spatial information by modeling relationships between words:
Co-occurrences in the entire image [L. Torresani et al., CVPR 2009]: no spatial information.
Phrases in local neighborhoods [J. Yuan et al., CVPR 2007] [Z. Wu et al., CVPR 2010] [C. L. Zitnick, Tech. Report 2007]: no long-range interactions, weak geometry.
Selecting a subset of phrases [J. Yuan et al., CVPR 2007]: discards a large portion of phrases.
The dimension of the full phrase histogram (e.g. over all length-2 phrases) is exponential in the number of words per phrase, so previous works reduce the number of phrases. Our work keeps all phrases while computing the similarity in linear time.
Approach
Overview: go from BoW to BoP in both parts of the pipeline:
1. Similarity measure [Zhang and Chen, 09]
2. Large-scale retrieval with inverted files and min-hash (this paper)
Co-occurring Phrases
[Figure: two images with matched words A-F; phrases that appear in both images with the same spatial layout are co-occurring phrases.]
Following [Zhang and Chen, 09], only the translation difference between the two images is considered.
Co-occurring Phrase Algorithm
Each pair of matched words votes into an offset space at the location difference (x - x', y - y') [Zhang and Chen, 09]. Matched words that fall into the same offset cell agree on a common translation, so every pair of them forms a co-occurring length-2 phrase.
[Figure: offset space with cells containing {B, C, A}, {D, F}, {A}, and {E, F}.]
Number of co-occurring length-2 phrases: C(3,2) + C(2,2) + C(2,2) = 3 + 1 + 1 = 5.
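A minimal sketch of this counting step; the cell size and the rounding-based quantization are my assumptions, not from the slides:

```python
from collections import defaultdict
from math import comb

def count_length2_phrases(offsets, cell=1.0):
    """offsets: one (x - x', y - y') offset per pair of features that
    share a visual word across the two images.  Pairs landing in the
    same quantized cell agree on a common translation, and every two
    of them form a co-occurring length-2 phrase."""
    cells = defaultdict(int)
    for dx, dy in offsets:
        cells[(round(dx / cell), round(dy / cell))] += 1
    return sum(comb(n, 2) for n in cells.values())

# Cells with 3, 2, 1, and 2 votes, as in the slide's example:
offsets = [(0, 0)] * 3 + [(2, 1)] * 2 + [(5, 5)] + [(3, 3)] * 2
print(count_length2_phrases(offsets))  # C(3,2) + C(2,2) + C(2,2) = 5
```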
Relation with the feature vector
The inner product of the BoP feature vectors, ⟨φ_k(X), φ_k(Y)⟩, equals the number of co-occurring length-k phrases. Computing it naively over the exponential-dimensional vectors is intractable, but the offset-space algorithm costs O(M), where M is the number of corresponding word pairs; in practice M is linear in the number of local features, the same as BoW.
Inverted Index with BoW
An inverted index avoids comparing the query with every image: for each query word, every image in that word's posting list gets +1 in a score table.
Inverted Index with Word Location
Store the word's location (x, y) with each image ID in the posting lists. Assuming the same word occurs only once per image, this uses the same memory as the BoW index.
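A sketch of such an index with plain Python dictionaries (the data layout is illustrative):

```python
from collections import defaultdict

def build_location_index(database):
    """database: {image_id: [(word, x, y), ...]}.
    Maps each word to a posting list of (image_id, x, y).  Under the
    slide's assumption that a word occurs at most once per image, this
    stores one entry per (word, image) pair -- the same count as a
    plain BoW inverted index."""
    index = defaultdict(list)
    for image_id, features in database.items():
        for word, x, y in features:
            index[word].append((image_id, x, y))
    return index
```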
Inverted Files with Phrases
Computing the number of co-occurring phrases via the score table: with BoW, each posting for a query word w_i simply adds +1 to that image's score. With BoP, each posting instead votes into an offset space kept per candidate image, at the cell given by the location difference between the query word and the indexed word.
[Figure: inverted index postings voting into per-image offset spaces.]
Final score: for each image, accumulate C(n,2) over the cells of its offset space to obtain the final similarity score.
[Figure: score table with the final similarity scores per image.]
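Putting the pieces together, a hedged sketch of query scoring with per-image offset spaces; the cell size and data layout are assumptions, and only length-2 phrases are counted:

```python
from collections import defaultdict
from math import comb

def bop_scores(query_features, index, cell=1.0):
    """query_features: [(word, x, y)]; index: word -> [(image_id, x', y')].
    Each posting votes into that image's offset space; an image's final
    score is its number of co-occurring length-2 phrases with the query."""
    votes = defaultdict(int)  # (image_id, cell_x, cell_y) -> vote count
    for word, x, y in query_features:
        for image_id, xq, yq in index.get(word, ()):
            key = (image_id, round((x - xq) / cell), round((y - yq) / cell))
            votes[key] += 1
    scores = defaultdict(int)
    for (image_id, _, _), n in votes.items():
        scores[image_id] += comb(n, 2)
    return dict(scores)
```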
Overview (recap)
From BoW to BoP with inverted files and min-hash; min-hash gives lower storage and time complexity.
Min-hash with BoW
The probability of a min-hash collision (same word) equals the image similarity (set overlap of the visual words of images I and I').
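A sketch of the min-hash estimate; the xor-mask hash family is an illustrative stand-in for whatever hash functions an implementation actually uses:

```python
import random

def minhash_similarity(words_a, words_b, n_hashes=200, seed=0):
    """Estimates the set-overlap (Jaccard) similarity of two visual-word
    sets as the fraction of hash functions under which both sets attain
    the same minimum, i.e. a min-hash collision."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(32) for _ in range(n_hashes)]
    sa, sb = set(words_a), set(words_b)
    collisions = sum(
        min(hash(w) ^ m for w in sa) == min(hash(w) ^ m for w in sb)
        for m in masks
    )
    return collisions / n_hashes
```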
Min-hash with Phrases
The similarity is estimated from the probability of k min-hash collisions with consistent geometry (details are in the paper).
[Figure: min-hash sketches f_{m_i}, f_{m_j} of images I and I', and the offset space of the colliding features.]
Other Invariances
Scale invariance: normalize the offsets between matched features p of image I and p' of image I' by the scale ratio ŝ = s/s', i.e. x̂ = x - ŝ·x' and ŷ = y - ŝ·y', and add a log(ŝ) dimension to the offset space. Adding dimensions to the offset space increases the memory usage.
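A sketch of a scale-normalized offset vote under my reading of the slide's formulas; the cell sizes are assumptions:

```python
import math

def scale_invariant_cell(p, p_ref, cell=1.0, scale_cell=0.5):
    """p = (x, y, s): feature location and characteristic scale.
    Normalizing the offset by the scale ratio s/s', and adding a
    log-scale-ratio axis, makes the vote invariant to a common
    scaling -- at the cost of a larger (3-D) offset space."""
    x, y, s = p
    xr, yr, sr = p_ref
    ratio = s / sr
    dx = x - ratio * xr
    dy = y - ratio * yr
    return (round(dx / cell), round(dy / cell),
            round(math.log(ratio) / scale_cell))
```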
Variant Matching
Local histogram matching [Zhang and Chen, 10]
Evaluation
1. BoW + inverted index vs. BoP + inverted index
2. BoW + min-hash vs. BoP + min-hash
Post-processing methods are complementary to our work.
Experiments - Inverted Index
5K Oxford dataset (55 queries) [Philbin, J. et al. 07], with 1M Flickr distractors.
[Figure: example precision-recall curves for BoW and BoP; BoP achieves higher precision at lower recall.]
Comparison
Mean average precision (mAP): mean of the AP over the 55 queries, plotted against vocabulary size.
[Figure: mAP vs. vocabulary size (K) for BoW, BoP, BoW+RANSAC, and BoP+RANSAC.]
BoP outperforms BoW at similar computation, and outperforms BoW+RANSAC (which is 10 times slower, re-ranking the 150 top images). The improvement is larger at smaller vocabulary sizes.
+ Flickr 1M Dataset
Computational complexity:
Method | Memory | Quantization | Search
BoW | 8.1G | 0.89s | 0.137s
BoP | 8.5G | 0.89s | 0.215s
BoW+RANSAC | - | 0.89s | 4.137s
RANSAC: 4s on the top 300 images.
[Figure: mAP vs. number of images for BoW and BoP.]
Experiment - Min-hash
University of Kentucky dataset; min-hash with BoW follows [O. Chum et al., BMVC08].
[Figure: retrieval score vs. number of min-hash functions (200, 500, 800) for BoW and BoP.]
Conclusion
Encodes more spatial information into the BoW representation.
Can be applied to all images in the database at the searching step.
Same computational complexity as BoW.
Better retrieval precision than BoW+RANSAC.