Deep Image Retrieval: Learning global representations for image search
Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus
Original slides by Albert Jiménez, Computer Vision Reading Group [arXiv]


Page 2

1. Introduction


Page 3

Instance Retrieval + Ranking

[Figure: instance retrieval pipeline — a query image is matched against a database and the retrieved images are ranked]

Slide credit: Amaia Salvador

Page 4

Related Work: CNN-based retrieval

● CNNs trained for classification tasks

● Features are very robust to intra-class variability

● Lack of robustness to scaling, cropping and image clutter

[Figure: lamp images — different instances of the same class]

We are interested in distinguishing between particular objects from the same class!


Page 5

Related Work: R-MAC

● Regional Maximum Activation of Convolutions

● Compact feature vectors encode image regions

Giorgos Tolias, Ronan Sicre, Hervé Jégou, Particular object retrieval with integral max-pooling of CNN activations (Submitted to ICLR 2016)


Page 6

Related Work: R-MAC

● Regions selected using a rigid grid

● Compute a feature vector per region

● Combine all region feature vectors (see the sketch below)
  ○ Dimension → 256 / 512

Giorgos Tolias, Ronan Sicre, Hervé Jégou, Particular object retrieval with integral max-pooling of CNN activations (Submitted to ICLR 2016)

[Figure: R-MAC pipeline — the ConvNet's last layer produces K feature maps of size W x H; region grids at different scales are pooled by taking the maximum activation per region]
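As a rough illustration of this pooling, here is a minimal sketch of R-MAC-style aggregation in PyTorch. It assumes the K x H x W activation tensor of the last convolutional layer is already available and takes regions as (x, y, width, height) tuples in feature-map coordinates; the per-region PCA-whitening step of the full method is omitted, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def rmac_descriptor(feature_map, regions):
    """feature_map: (K, H, W) activations of the last conv layer.
    regions: list of (x, y, w, h) boxes in feature-map coordinates.
    Returns one L2-normalized global descriptor of dimension K."""
    region_vectors = []
    for (x, y, w, h) in regions:
        crop = feature_map[:, y:y + h, x:x + w]        # activations inside the region
        v = crop.amax(dim=(1, 2))                      # max activation per feature map (MAC)
        region_vectors.append(F.normalize(v, dim=0))   # L2-normalize each region vector
    # Sum the region vectors and L2-normalize again to get the global descriptor
    return F.normalize(torch.stack(region_vectors).sum(dim=0), dim=0)

# Example: 512 feature maps of size 30 x 40 and two hypothetical regions
fm = torch.randn(512, 30, 40)
desc = rmac_descriptor(fm, [(0, 0, 20, 20), (10, 5, 30, 25)])
print(desc.shape)  # torch.Size([512])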

6

Page 7

2. Methodology


Page 8

1st Contribution

● Three-stream siamese network

● PCA implemented as a shift + fully connected layer

● Optimize the weights (CNN + PCA) of the R-MAC representation with a triplet loss function (see the PCA sketch below)
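A minimal sketch of how a learned PCA can be expressed as a shift plus a fully connected layer, so that it can then be fine-tuned end to end. It assumes `mean` (a (D,) tensor) and `components` (a (d, D) tensor) come from a PCA fitted beforehand; the class name and shapes are illustrative.

```python
import torch
import torch.nn as nn

class PCAasFC(nn.Module):
    """PCA projection expressed as a bias (shift) + fully connected layer,
    so it can be trained by backpropagation like any other layer."""
    def __init__(self, mean, components):
        # mean: (D,) data mean; components: (d, D) PCA projection matrix
        super().__init__()
        d, D = components.shape
        self.fc = nn.Linear(D, d)
        with torch.no_grad():
            self.fc.weight.copy_(components)         # projection = PCA eigenvectors
            self.fc.bias.copy_(-components @ mean)   # shift absorbs the mean subtraction
    def forward(self, x):
        x = self.fc(x)
        return nn.functional.normalize(x, dim=-1)    # L2-normalize the final descriptor

# Toy example with stand-ins for a fitted PCA (shapes only)
pca_fc = PCAasFC(mean=torch.zeros(512), components=torch.eye(512)[:256])
print(pca_fc(torch.randn(4, 512)).shape)  # torch.Size([4, 256])
```

In the three-stream siamese setup, the same CNN + PCA weights are shared across the query, positive and negative streams.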


Page 9

1st Contribution: Ranking Loss Function

L(q, d+, d-) = ½ · max(0, m + ‖q − d+‖² − ‖q − d-‖²)

where:

● m is a scalar that controls the margin

● q, d+, d- are the descriptors for the query, positive and negative images
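A minimal PyTorch sketch of this triplet ranking loss, assuming q, d_pos and d_neg are batches of L2-normalized descriptors; the margin value shown is only illustrative.

```python
import torch

def triplet_ranking_loss(q, d_pos, d_neg, margin=0.1):
    """Ranking loss over (query, relevant, non-relevant) triplets: penalize
    triplets where the negative is not at least `margin` farther than the positive."""
    dist_pos = (q - d_pos).pow(2).sum(dim=-1)   # squared distance to the positive
    dist_neg = (q - d_neg).pow(2).sum(dim=-1)   # squared distance to the negative
    return 0.5 * torch.clamp(margin + dist_pos - dist_neg, min=0).mean()

# Example with a batch of 8 random 512-D descriptors per stream
q, d_pos, d_neg = (torch.nn.functional.normalize(torch.randn(8, 512), dim=1) for _ in range(3))
print(triplet_ranking_loss(q, d_pos, d_neg))
```

At training time the three descriptors come from the three streams of the siamese network, which share their weights.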


Page 10

2nd Contribution

● Localize regions of interest (ROIs)

● Train a Region Proposal Network with bounding boxes (similar to Faster R-CNN, [arXiv])

● Replace the rigid grid used in R-MAC with the Region Proposal Network


Page 11

2nd Contribution: RPN in a nutshell


● Predict, for a set of candidate boxes of various sizes and aspect ratios, and at all possible image locations, a score describing how likely each box is to contain an object of interest (see the sketch below).

● Simultaneously, for each candidate box, perform a regression to refine its location.
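A minimal sketch of such an RPN head in PyTorch, in the spirit of Faster R-CNN: it assumes a convolutional feature map as input and k anchor boxes per location; the channel sizes and anchor count are illustrative, and proposal decoding and non-maximum suppression are omitted.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """For k anchor boxes at every feature-map location, predict an objectness
    score and four box-regression offsets."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(512, num_anchors, kernel_size=1)      # score per anchor
        self.box_deltas = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # (dx, dy, dw, dh) per anchor
    def forward(self, feats):
        x = torch.relu(self.conv(feats))
        return self.objectness(x), self.box_deltas(x)

# Example: feature map from the last conv layer for one image
scores, deltas = RPNHead()(torch.randn(1, 512, 30, 40))
print(scores.shape, deltas.shape)  # torch.Size([1, 9, 30, 40]) torch.Size([1, 36, 30, 40])
```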

Page 12

Summary


● Able to encode one image into a compact feature vector in a single forward pass

● Images can be compared using the dot product (see the sketch below)

● Very efficient at test time
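As an illustration of why this is efficient, here is a minimal NumPy sketch of ranking a database by dot-product similarity; the array names and sizes are illustrative, and the descriptors are assumed to be L2-normalized.

```python
import numpy as np

def rank_database(query, database):
    """Rank database images by dot-product similarity to the query.
    query: (D,) L2-normalized descriptor; database: (N, D) L2-normalized descriptors."""
    scores = database @ query        # one matrix-vector product for the whole database
    ranking = np.argsort(-scores)    # database indices, best match first
    return ranking, scores[ranking]

# Example with random descriptors of the paper's dimensionality (512)
db = np.random.randn(1000, 512); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[42] + 0.1 * np.random.randn(512); q /= np.linalg.norm(q)
print(rank_database(q, db)[0][:5])   # the true match (index 42) should rank first
```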

Page 13

3. Experiments


Page 14

Datasets


● Training: Landmarks dataset, 214k images from 672 landmark sites

● Testing: Oxford 5k, Paris 6k, Oxford 105k, Paris 106k, INRIA Holidays

● Remove all images also contained in the Oxford 5k and Paris 6k datasets
  ○ Landmarks-full: 200k images from 592 landmarks

● Clean the Landmarks dataset (select the most relevant images, discard incorrect ones; see the sketch below)
  ○ SIFT + Hessian-affine keypoint detection → construct a graph of similar images
  ○ Landmarks-clean: 52k images from 592 landmarks
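A minimal sketch of the graph-based grouping behind this cleaning step, under simplifying assumptions: each image pair comes with a count of spatially verified keypoint matches, an edge is kept when enough matches survive, and connected components give candidate clusters of the same landmark. The threshold and all names are illustrative.

```python
def cleaning_graph(pair_matches, min_matches=20):
    """pair_matches: dict {(i, j): number of verified keypoint matches between images i and j}.
    Returns groups of images connected through reliable matches (union-find)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    for (i, j), n in pair_matches.items():
        if n >= min_matches:                # keep only well-matched pairs as graph edges
            parent[find(i)] = find(j)
    components = {}
    for x in list(parent):
        components.setdefault(find(x), []).append(x)
    return list(components.values())

# Example: images 0-1-2 are mutually matched, image 3 is isolated
print(cleaning_graph({(0, 1): 50, (1, 2): 30, (2, 3): 5}))
```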

Page 15

Bounding Box Estimation


● RPN trained using automatically estimated bounding box annotations

1. Define the initial bounding box as the minimum rectangle that encloses all matched keypoints

2. For a pair (i, j), predict the bounding box Bj from Bi and an affine transform Aij

3. Update each box by merging the predictions with a geometric mean

4. Iterate until convergence (a minimal sketch of this procedure follows below)

[Figures: bounding box projections between matched images; initial vs. final estimations]
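A minimal NumPy sketch of this iterative estimation under simplifying assumptions: boxes are (x1, y1, x2, y2) arrays, each pair (i, j) comes with a 2x3 affine matrix Aij mapping image i to image j, and the update takes the element-wise geometric mean of the current box and all projected predictions. The names, fixed iteration count and merge details are illustrative, not the paper's exact procedure.

```python
import numpy as np

def project_box(box, A):
    """Project box (x1, y1, x2, y2) with a 2x3 affine matrix A and return the
    axis-aligned box enclosing the four projected corners."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1], [x2, y1], [x1, y2], [x2, y2]], dtype=float)
    p = corners @ A[:, :2].T + A[:, 2]
    return np.array([p[:, 0].min(), p[:, 1].min(), p[:, 0].max(), p[:, 1].max()])

def refine_boxes(boxes, pairs, n_iters=20):
    """boxes: dict image_id -> box; pairs: list of (i, j, A_ij) matched pairs.
    Iteratively merge each box with the predictions projected from its neighbours."""
    for _ in range(n_iters):
        updated = {}
        for j, box in boxes.items():
            preds = [box] + [project_box(boxes[i], A) for (i, jj, A) in pairs if jj == j]
            preds = np.maximum(np.array(preds), 1e-6)        # keep coordinates positive
            updated[j] = np.exp(np.log(preds).mean(axis=0))  # element-wise geometric mean
        boxes = updated
    return boxes
```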

Page 16

Experimental Details


● VGG-16 network pre-trained on ImageNet

● Fine-tune with Landmarks dataset

● Select triplets in an efficient manner (see the sketch below)
  ○ Forward pass to obtain image representations
  ○ Select hard negatives (large loss)

● Dimension of the feature vector = 512

● Evaluation: mean Average Precision (mAP)
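A minimal NumPy sketch of that hard-negative selection, assuming the descriptors of the query, its positive, and a pool of candidate negatives were produced by a prior forward pass; the function name, margin and k are illustrative.

```python
import numpy as np

def select_hard_negatives(q, d_pos, negatives, margin=0.1, k=10):
    """Pick the k negatives that currently produce the largest triplet loss.
    q, d_pos: (D,) descriptors; negatives: (N, D) descriptors from a forward pass."""
    dist_pos = np.sum((q - d_pos) ** 2)
    dist_neg = np.sum((negatives - q) ** 2, axis=1)
    losses = np.maximum(0.0, margin + dist_pos - dist_neg)   # per-negative triplet loss
    hard = np.argsort(-losses)[:k]                            # indices of the hardest negatives
    return hard, losses[hard]
```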


Page 17

1st Experiment


Comparison between the original R-MAC and their implementations

C: Classification network
R: Ranking (trained with triplets)

Page 18

2nd Experiment


Comparison between the fixed rigid grid and varying numbers of region proposals

16-32 proposals already outperform the rigid grid!

Page 19

2nd Experiment


[Plots: mAP vs. number of triplets; recall vs. number of region proposals]

Page 20

2nd Experiment


Heatmap vs Bounding Box Estimation

Page 21

Comparison with state of the art


Page 22

Comparison with state of the art


Page 23

Top Retrieval Results


Page 24

4. Conclusions


Page 25

Conclusions


● They propose an effective and scalable method for image retrieval that encodes images into compact global signatures that can be compared with the dot product.

● They propose a siamese network architecture trained for the specific task of image retrieval using a ranking (triplet) loss function.

● They demonstrate the benefit of predicting the ROI of the images during encoding by using a Region Proposal Network.

Page 26

Thank You! Questions?
