CANONICAL IMAGE SELECTION FROM THE WEB

ACM International Conference on Image and Video Retrieval, 2007

Yushi Jing, Shumeet Baluja, Henry Rowley

Outline

Introduction

Computation of Image Feature

SIFT

Canonical Image Selection

Experiments & Results

Analysis

Conclusions and Future Work

Introduction

Image search has become a popular feature.

Most search engines use only text-based search; image searches use very little image information.

Reasons: the success of text-based search of web pages, and the difficulty and expense of using image-based signals.

Most search engines, such as Yahoo, MSN, and Google, examine the text of the pages from which the images are linked.

Example: a search for “Taipei 101” is answered from the surrounding text rather than by examining the visual content.

Picture from: http://zh-yue.wikipedia.org/wiki/TAIPEI_101

Picture from: http://jerome.anyday.com.tw

↑ Search results for “cayman” (snapshot from Google).

Search results for “coca-cola” →

Why does text-based search yield these results?

Difficulty in associating images with keywords

Large variation in image quality

User-perceived semantic content

Approach: Visual similarities among the images

Rather than assuming that any single user provides a good image result, the approach relies on the combined preferences of many users.

Find the common “visual theme”: the images that best capture the visual themes returned to the user.

Content-based image retrieval is an actively explored area.

Related work analyzes the “coherence” of the top results from a traditional image search engine:

G. Park, Y. Baek, and H. Lee. Majority based ranking approach in web image retrieval. 2003

R. Fergus, P. Perona, and A. Zisserman. A visual category filter for google images. 2004

The approach is a logical extension of their work.

Global features, such as color histograms and curvature, capture little information and are not distinctive.

Example: given 1000 images from a Google search for “starbucks”, using only the color histogram.

Local features are more robust against image deformation, variations, and noise.
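As a rough illustration of why a global feature such as a histogram is not distinctive, the sketch below builds two toy grayscale "images" with the same pixel values in different positions. The function names, bin count, and pixel data are made up for this example and do not come from the slides.

```python
from collections import Counter


def histogram(image, bins=4, max_value=256):
    """Global intensity histogram: counts per bin, ignoring pixel positions."""
    width = max_value // bins
    counts = Counter(min(p // width, bins - 1) for row in image for p in row)
    return [counts.get(b, 0) for b in range(bins)]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: shared mass divided by the mass of h1."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)


# Toy data: same pixel values, different spatial arrangement.
left_bright = [[250, 10], [250, 10]]
top_bright = [[250, 250], [10, 10]]

score = histogram_intersection(histogram(left_bright), histogram(top_bright))
```

The two layouts are visibly different, yet their histograms are identical, so a purely global comparison cannot tell them apart.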

The earlier studies do not check whether an image-based system can improve the quality of search results when applied to a large set of queries.

This work attempts to find the single most representative image for a popular product using only image features.

Experiment: Human evaluators

The experiments use product searches (e.g. “ipod”, “Coca Cola”, “polo shirt”) for two reasons:

This is an extremely popular category of searches.

It provides a good set of queries from which to quantitatively evaluate our performance.

Examining the single most representative image: the task is important and widely applicable.

Sites from Froogle, NextTag.com, and Shopping.com to Amazon.com show a single image next to a product listing.

Computation of Image Feature

Query on “golden gate” or “Starbucks”

The task requires the ability to identify similar sub-images.

Global features are too restrictive for our task.

Use local features, which capture local information content: Harris corners, Scale Invariant Feature Transform (SIFT), Shape Context, Spin Images, etc.

K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. 2005

Demonstrated experimentally that SIFT gives the best matching results.

SIFT (Scale Invariant Feature Transform)

Advantage: its ability to generate highly distinctive features that are invariant to image transformations (translation, rotation, scaling) and robust to illumination variation.

The four main stages of the SIFT algorithm:

1. Scale-space extrema detection

2. Accurate keypoint localization

3. Orientation assignment

4. Keypoint descriptor

convolution operation

octave = s layer
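The two labels above appear to refer to how SIFT builds its scale space: the image is convolved with Gaussians of increasing width, and each octave (a doubling of the blur) is divided into s intervals. The sketch below follows Lowe's standard formulation, in which the blur grows by a factor k = 2^(1/s) per layer and s + 3 blurred images are produced per octave. The function names and the base blur of 1.6 are assumptions made for this example, not values taken from the slides.

```python
def octave_sigmas(s, sigma0=1.6):
    """Blur levels for one octave: s + 3 Gaussian images, ratio k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


def dog_scale_pairs(sigmas):
    """Adjacent blur levels whose difference forms each difference-of-Gaussian image."""
    return list(zip(sigmas, sigmas[1:]))


sigmas = octave_sigmas(s=3)
pairs = dog_scale_pairs(sigmas)
```

With s = 3 this gives six blurred images and five difference-of-Gaussian images, and the blur at index s is exactly double the base blur, which is where the next octave starts.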

Accurate keypoint localization

Canonical Image Selection

Local Coherence-based Image Selection Algorithm

1. Given a text query, retrieve the top 1000 images from Google image search and generate SIFT features for these images.

2. Identify matching features with the Spill Tree algorithm.

3. Identify common regions shared between images by clustering the matched feature points.

4. Construct a similarity graph. If there is more than one cluster, select the best cluster based on its size and average similarity among the images.

5. From the chosen cluster, select the image that is most highly connected to the other images.

The 1000 images are resized to have a maximum dimension of 400 pixels.
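A minimal sketch of the size computation for this resizing step. The function name is hypothetical, and the choice to leave images that are already small enough untouched is an assumption, since the slides do not say whether small images are enlarged.

```python
def resized_dimensions(width, height, max_dim=400):
    """Scale (width, height) so the longer side is at most max_dim pixels."""
    longest = max(width, height)
    if longest <= max_dim:
        # Assumption: images that already fit are not upscaled.
        return width, height
    scale = max_dim / longest
    # Keep the aspect ratio; never let a side collapse to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1600 x 1200 image would be reduced to 400 x 300, while a 300 x 200 image would pass through unchanged.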

Each resized image contains 300 to 800 SIFT features.

Algorithm: find the images with the most matching features.

Finding the nearest matches among roughly half a million high-dimensional features can be computationally expensive.

Spill tree: an approximation to the metric tree.

If the Euclidean distance between two features is less than a threshold, the pair is a potential match.
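The thresholded matching rule can be sketched as follows. This is not a spill tree: it uses an exhaustive nearest-neighbour search as a stand-in, which gives the exact answer that a spill tree approximates more cheaply. The function names, the threshold value, and the 2-dimensional toy descriptors are assumptions for illustration (real SIFT descriptors have 128 dimensions).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def potential_matches(features_a, features_b, threshold):
    """Pairs (i, j): j is the nearest feature in B to feature i in A,
    kept only when their distance is below the threshold."""
    matches = []
    if not features_b:
        return matches
    for i, fa in enumerate(features_a):
        j, dist = min(
            ((j, euclidean(fa, fb)) for j, fb in enumerate(features_b)),
            key=lambda pair: pair[1],
        )
        if dist < threshold:
            matches.append((i, j))
    return matches


# Toy descriptors (made up).
image_a = [(0.0, 0.0), (10.0, 10.0)]
image_b = [(0.0, 1.0), (50.0, 50.0)]
found = potential_matches(image_a, image_b, threshold=2.0)
```

The brute-force loop costs one distance computation per feature pair, which is the expense the slides cite as the reason for using an approximate tree structure over roughly half a million features.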

Common Object Verification

Similar local features can originate from different objects, so matches are checked by clustering and geometric verification.

Group the matched points according to their corresponding image pairs.

Hough transform for object verification: a 4-dimensional histogram is used to store the “votes” in the pose space (translation, scaling, and rotation).

Finally, we select the histogram entry with the most votes as the most consistent interpretation.
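The voting step can be sketched as below. Each matched keypoint pair votes for one bin of a 4-dimensional histogram over x translation, y translation, log scale ratio, and rotation, and the fullest bin wins. The keypoint layout `(x, y, scale, orientation_degrees)`, the bin widths, and the use of raw coordinate differences for translation are simplifying assumptions for this example, not details given in the slides.

```python
import math
from collections import defaultdict


def pose_bin(kp_a, kp_b, xy_bin=20.0, scale_bin=1.0, angle_bin=30.0):
    """4-D histogram index (dx, dy, log2 scale ratio, rotation) for one match."""
    xa, ya, sa, oa = kp_a
    xb, yb, sb, ob = kp_b
    rotation = (ob - oa) % 360.0
    return (
        math.floor((xb - xa) / xy_bin),
        math.floor((yb - ya) / xy_bin),
        math.floor(math.log2(sb / sa) / scale_bin),
        math.floor(rotation / angle_bin),
    )


def most_consistent_pose(matched_keypoints):
    """Return the bin with the most votes and the matches that voted for it."""
    votes = defaultdict(list)
    for kp_a, kp_b in matched_keypoints:
        votes[pose_bin(kp_a, kp_b)].append((kp_a, kp_b))
    if not votes:
        return None, []
    best = max(votes, key=lambda b: len(votes[b]))
    return best, votes[best]


# Toy matches (made up): three agree on a shift of about (+100, +50), one does not.
matches = [
    ((10, 10, 2.0, 0.0), (110, 60, 2.0, 0.0)),
    ((30, 40, 1.0, 90.0), (131, 91, 1.0, 90.0)),
    ((5, 80, 4.0, 10.0), (108, 128, 4.0, 12.0)),
    ((50, 50, 1.0, 0.0), (20, 300, 4.0, 180.0)),
]
best_bin, supporters = most_consistent_pose(matches)
```

Hard bin edges can split votes that are nearly identical, so a fuller implementation would likely vote into neighbouring bins as well; this sketch keeps a single vote per match for brevity.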

Image Selection

Similarity score between two images: the number of matching points divided by their total number of interest points.

Similarity graph: images are nodes, and similarities are weighted edges.

Outliers are identified and removed.

With multiple themes, the resulting graph usually contains several distinct clusters of images.

How is the image selected?

If the similarity graph does not have a cluster, select the first image returned by Google as the best image.
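The selection logic on this and the preceding slides can be sketched as follows, under several stated assumptions: the similarity score divides by the summed interest points of both images (one reading of "their total number"), clusters are taken to be connected components, the best cluster is chosen by size with average similarity as a tie-breaker, and the chosen image is the one with the largest summed edge weight. The function names and the `min_similarity` parameter are hypothetical.

```python
def similarity(num_matches, points_a, points_b):
    """Matching points divided by the total interest points of both images."""
    return num_matches / (points_a + points_b)


def select_canonical(ranked_images, edges, min_similarity=0.0):
    """ranked_images: ids in search-engine order; edges: {(a, b): similarity}."""
    # Drop weak edges so unconnected outlier images never enter a cluster.
    neighbors = {}
    for (a, b), weight in edges.items():
        if weight > min_similarity:
            neighbors.setdefault(a, {})[b] = weight
            neighbors.setdefault(b, {})[a] = weight

    # Clusters are taken to be connected components of the remaining graph.
    clusters, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(component)

    # No cluster: fall back to the first image returned by the search engine.
    if not clusters:
        return ranked_images[0]

    def average_similarity(cluster):
        weights = [w for node in cluster for w in neighbors[node].values()]
        return sum(weights) / len(weights)

    best = max(clusters, key=lambda c: (len(c), average_similarity(c)))
    # Pick the image whose edges carry the most total similarity.
    return max(best, key=lambda node: sum(neighbors[node].values()))


images = ["g0", "a", "b", "c", "x", "y"]
graph = {("a", "b"): 0.3, ("a", "c"): 0.2, ("b", "c"): 0.1,
         ("x", "y"): 0.5, ("g0", "a"): 0.0}
choice = select_canonical(images, graph)
```

In the toy graph the three-image cluster beats the two-image cluster on size, and image "a" is returned because its edges sum to the largest weight; with an empty graph the function returns the top-ranked image instead.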

Why might there be no cluster? For example:

The object lacks visually distinctive features.

The object category is too vague or broad.

Experiment & Results

Experiment

Environment

130 product queries

Images (up to 1000) extracted from Yahoo, MSN, and Google

105 human evaluators

50 randomly selected sets of images, with randomly adjusted positions

Images resized to a maximum dimension of 130 pixels

“Which of the following image best describes”

Cases where LC fails to find a “common theme” among the images: 53 of the 130 queries.

Each position received approximately 24% to 26% of the votes.

Analysis

Analysis

LC (Local Coherence) significantly outperforms Google, Yahoo, and MSN. Analysis: Table 3.

Some images selected by the search engines are relevant and appropriate, but better choices are available. Example: “Batman returns” screenshots.

The LC algorithm improves image selection by identifying the common “theme” in the initial image set and selecting the images that contain the most visually distinctive representation of that theme.

There are three reasons behind this result:

1. People usually strive to take the best photos they can.

2. Popular images spread on the web: relevant, good-quality photos tend to be used repeatedly. Example: Starbucks.

3. Images that contain a dominant view of the object usually have more matches. This is crucial for selecting images that are not only relevant but also of high quality. Example: Mona Lisa.

Conclusions & Future Work

Conclusions

Presented a method for selecting the best image among a group of images returned by a conventional text-based image search engine

The method is computationally expensive: similarity measurements can only be generated offline over a list of queries.

Methods to explore for improving efficiency:

Limiting the size of the images

Limiting the number of interest points

Reducing the dimensionality of the local features

Discriminatively selecting the features that are most related to the query we are interested in

Future work

Expanding the range of queries:

Further domains might require the use of other image features.

Face recognition methods may provide a useful similarity measure when a large portion of the image results contain faces.

For queries where the results are an object category (e.g. “chair”), features typically used for content-based retrieval (such as color distributions) may be more fruitful.

The spanning trees illustrated in Figures 8 and 9 contain a great deal of information to be exploited.

The edges may be usable in the same way the web link structure is used to improve web page ranking.
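One way to read this suggestion is to run a link-analysis style ranking over the image similarity graph. The sketch below applies a PageRank-like power iteration to weighted, undirected similarity edges. It is only an illustration of the idea, not a method described in the slides; the function name, damping factor, and iteration count are assumptions.

```python
def rank_images(edges, damping=0.85, iterations=50):
    """PageRank-style scores over a weighted, undirected similarity graph.

    edges: {(a, b): similarity}; only positive weights are used.
    """
    neighbors = {}
    for (a, b), weight in edges.items():
        if weight > 0:
            neighbors.setdefault(a, {})[b] = weight
            neighbors.setdefault(b, {})[a] = weight
    nodes = sorted(neighbors)
    n = len(nodes)
    if n == 0:
        return {}
    # Total edge weight per node, used to normalise outgoing score.
    strength = {v: sum(neighbors[v].values()) for v in nodes}
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        updated = {}
        for v in nodes:
            # Each neighbour passes on score in proportion to the edge weight.
            incoming = sum(score[u] * w / strength[u]
                           for u, w in neighbors[v].items())
            updated[v] = (1.0 - damping) / n + damping * incoming
        score = updated
    return score


# Toy graph (made up): one image similar to three others.
scores = rank_images({("hub", "a"): 1.0, ("hub", "b"): 1.0, ("hub", "c"): 1.0})
top = max(scores, key=scores.get)
```

In the toy graph the image that is similar to all the others receives the highest score, which mirrors how a heavily linked web page rises in a link-based ranking.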