Transcript
Page 1: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

29th Mar 2016

Original slides by Eva Mohedano, Insight Centre for Data Analytics (Dublin City University)

Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid

Page 2: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Overview

Published at ICCV 2015 (a.k.a. Local Convolutional Features with Unsupervised Training for Image Retrieval)

Deep Convolutional Architecture to produce patch-level descriptors

• Unsupervised framework

• Comparison in patch and retrieval datasets

• “RomePatches” dataset

Page 3: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Shallow patch descriptors

• Deep learning for image retrieval

• Deep patch descriptors

Page 4: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Shallow patch descriptors

SIFT – Scale-Invariant Feature Transform

- stereo matching

- retrieval

- classification

SURF, BRIEF, LIOP, (…)

Hand crafted → Relatively small number of parameters.

Note: A patch is a region extracted from an image.

Page 5: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Deep learning for image retrieval

A CNN trained on a sufficiently large labeled dataset (ImageNet) produces intermediate-layer activations that can be used as image descriptors.

Those descriptors work for a wide variety of tasks, including image retrieval.

Page 6: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Deep learning for image retrieval

source image: http://pubs.sciepub.com/ajme/2/7/9/

Page 7: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Deep learning for image retrieval

source image: http://pubs.sciepub.com/ajme/2/7/9/

Fully connected layers → Global Image Descriptors

● Compact representation

● Lack of geometric invariance

Below the state of the art in image retrieval

Compute at different scales (Babenko, Razavian)

Page 8: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Deep learning for image retrieval

source image: http://pubs.sciepub.com/ajme/2/7/9/

Convolutional layers

Page 9: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Related Work

• Deep patch descriptors

Three different kinds of supervision:

1. Category labels of ImageNet. [Long et al, 2014]

2. Surrogate patch labels: Each class is a given patch under different transformations [Fischer et al, 2014]

3. Matching/non-matching pairs. [Simo-Serra et al, 2015]

These works focused on patch-level metrics, not image retrieval.

All of these approaches required some kind of supervision.
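The second kind of supervision (surrogate patch labels) can be illustrated with a small sketch. It assumes square grayscale patches stored as NumPy arrays and uses a deliberately simple, hypothetical set of transformations (90-degree rotations, flips, small cyclic shifts); the function names are illustrative and are not taken from the cited work.

```python
import numpy as np

def random_transform(patch, rng):
    """Apply a random rotation, flip and small shift (hypothetical choices)."""
    out = np.rot90(patch, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    dy, dx = (int(s) for s in rng.integers(-2, 3, size=2))
    return np.roll(out, shift=(dy, dx), axis=(0, 1))

def build_surrogate_dataset(seed_patches, n_views, seed=0):
    """Each seed patch defines one class; its transformed views share that label."""
    rng = np.random.default_rng(seed)
    samples, labels = [], []
    for class_id, patch in enumerate(seed_patches):
        for _ in range(n_views):
            samples.append(random_transform(patch, rng))
            labels.append(class_id)
    return np.stack(samples), np.array(labels)
```

A classifier trained on such a dataset needs no human annotation, but the choice of seed patches and transformations is still a form of supervision by design.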

Page 10: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Image Retrieval Pipeline

• Interest point detection

Hessian-Affine detector.

Rotation invariance.

• Interest point description

Feature representation in a Euclidean space

• Patch Matching

VLAD encoding.

Power normalization with exponent 0.5 + L2-norm.
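The encoding step can be sketched as follows. This is a minimal NumPy illustration, assuming the visual-word centroids are already available (for example from k-means, which is not shown); the function name vlad_encode and the argument names are hypothetical.

```python
import numpy as np

def vlad_encode(descriptors, centroids, alpha=0.5):
    """Aggregate local descriptors (n, d) into one VLAD vector of size k * d."""
    # Assign each descriptor to its nearest centroid.
    sq_dist = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dist.argmin(axis=1)

    k, d = centroids.shape
    vlad = np.zeros((k, d))
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members):
            # Sum of residuals between descriptors and their centroid.
            vlad[j] = (members - centroids[j]).sum(axis=0)

    vlad = vlad.ravel()
    # Power normalization with exponent alpha (0.5 in the slides), then L2.
    vlad = np.sign(vlad) * np.abs(vlad) ** alpha
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

Because the output is L2-normalized, two images can then be compared with a plain dot product of their VLAD vectors.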

Page 12: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Convolutional Descriptors

Patch size = 51x51 (optimal for SIFT on the Oxford dataset).

CNN extended to retrieval by:

• Encoding local descriptors with a model trained on an unrelated classification task

• Devising a surrogate classification problem that is as related as possible to image retrieval

• Using unsupervised learning: Convolutional Kernel Network

Page 13: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Convolutional Descriptors

• Using unsupervised learning: Convolutional Kernel Network

Feature representation based on a kernel (feature) map (data independent)

Page 14: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Convolutional Descriptors

• Using unsupervised learning: Convolutional Kernel Network

Projection in Hilbert space

An explicit kernel map can be computed to approximate this projection, for computational efficiency.

- Sub-sample of patches

- Stochastic Gradient Optimization
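A small numerical sketch of what an explicit kernel map means, assuming a Gaussian kernel on 1-D inputs (the slides do not name the kernel). The slides say the map is fitted on a sub-sample of patches with stochastic gradient optimization; the sketch swaps that for a fixed grid of centers with quadrature weights, purely to show that a finite-dimensional map can reproduce the kernel value. The names explicit_map and grid_parameters are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def explicit_map(x, W, eta, sigma):
    """Finite map psi(x)_l = sqrt(eta_l) * exp(-||x - w_l||^2 / sigma^2)."""
    sq_dist = np.sum((W - x) ** 2, axis=1)
    return np.sqrt(eta) * np.exp(-sq_dist / sigma ** 2)

def grid_parameters(low, high, n_points, sigma):
    """1-D stand-in for learned parameters: grid centers, quadrature weights."""
    w = np.linspace(low, high, n_points)
    dw = w[1] - w[0]
    eta = np.full(n_points, dw * np.sqrt(2.0 / (np.pi * sigma ** 2)))
    return w[:, None], eta

W, eta = grid_parameters(-6.0, 6.0, 400, sigma=1.0)
x, y = np.array([0.3]), np.array([-0.2])
approx = explicit_map(x, W, eta, 1.0) @ explicit_map(y, W, eta, 1.0)
exact = gaussian_kernel(x, y, 1.0)
```

A grid does not scale beyond a few dimensions, which is one reason to learn a small set of centers and weights from data instead.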

Page 15: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Convolutional Descriptors

• Using unsupervised learning: Convolutional Kernel Network

4 possible inputs

From left to right: CKN-raw, CKN-mean subs, CKN-white (mean subs + PCA-whitening), CKN-grad (fully invariant to color)

Only CKN-raw, CKN-white and CKN-grad are evaluated.
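The preprocessing behind the mean-subtracted and whitened inputs can be sketched as below. This is an illustration under assumptions: patches are flattened to rows of a NumPy array, mean subtraction is taken to be per patch, and the regularizer eps is a hypothetical choice; the gradient-based input is not covered.

```python
import numpy as np

def mean_subtract(patches):
    """Remove each patch's own mean intensity; patches has shape (n, d)."""
    return patches - patches.mean(axis=1, keepdims=True)

def fit_pca_whitening(patches, eps=1e-5):
    """Estimate the PCA-whitening transform from a sample of patches."""
    centered = mean_subtract(patches)
    data_mean = centered.mean(axis=0)
    cov = np.cov(centered - data_mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    # Scale each principal direction to (approximately) unit variance.
    transform = eigvecs / np.sqrt(eigvals + eps)
    return data_mean, transform

def whiten(patches, data_mean, transform):
    """Apply mean subtraction followed by the fitted whitening transform."""
    return (mean_subtract(patches) - data_mean) @ transform
```

Per-patch mean subtraction removes one degree of freedom, so one whitened direction carries almost no variance; eps keeps that direction from blowing up.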

Page 16: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Experiments

Datasets:

1. RomePatches-Image

2. Oxford

3. UKbench and Holidays

CKN trained on 1M sub-patches for 300K iterations, with a mini-batch size of 1000.

Page 17: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Experiments

Page 18: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Conclusions

• CKNs offer similar, and sometimes better, performance than CNNs in the context of patch description.

• Good patch retrieval translates into good image retrieval.

• CKNs are orders of magnitude faster to train than CNNs (10 min vs 2-3 days on a modern GPU)

• Fully unsupervised – no labels.

Page 19: Convolutional Patch Representations for Image Retrieval An unsupervised approach

Resources

RomePatches + Code (although the code is not accessible!)

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

- Code with augmentations in matlab

- Code for training models.

- Models already trained :-)

Triplet’s net + Code !!

- Greyscale local patches of 32x32. Tested on matching datasets

