CANONICAL IMAGE SELECTION FROM THE WEB
ACM International Conference on Image and Video Retrieval, 2007
Yushi Jing, Shumeet Baluja, Henry Rowley
Outline
Introduction
Computation of Image Features: SIFT
Canonical Image Selection
Experiments & Results Analysis
Conclusions and Future Work
Image search has become a popular feature.
Most search engines use only text-based search; image searches use very little image information.
Reasons: the success of text-based search of web pages, and the difficulty and expense of using image-based signals.
Most search engines, such as Yahoo, MSN, and Google, examine the text of the pages from which the images are linked.
Example: a search for “Taipei 101” is answered from text, rather than by examining visual content.
Picture from: http://zh-yue.wikipedia.org/wiki/TAIPEI_101
Picture from: http://jerome.anyday.com.tw
Why does text-based search yield such results?
Difficulty in associating images with keywords
Large variation in image quality
User-perceived semantic content
Approach: Visual similarities among the images
Rather than assuming that every user gets a good image result, the approach relies on the combined preference of many users.
Common “visual theme”: select the images that best capture the visual themes among the results returned to the user.
Content-based image retrieval is an actively explored area.
Related work analyzes the “coherence” of the top results from a traditional image search engine:
G. Park, Y. Baek, and H. Lee. Majority based ranking approach in web image retrieval. 2003.
R. Fergus, P. Perona, and A. Zisserman. A visual category filter for Google images. 2004.
The approach is a logical extension of their work.
Global features such as color histograms and curvature capture little information and are not distinctive.
Example: 1000 images from a Google search for “starbucks”, compared using color histograms only.
Local features are more robust against image deformation, variations, and noise.
Previous work does not check whether an image-based system can improve the quality of search results when applied to a large set of queries.
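To illustrate why a global feature alone is weak, here is a minimal sketch of a color-histogram comparison. It assumes an image is given as a list of (r, g, b) tuples; the function names and the quantization into 4 bins per channel are illustrative choices, not something stated on the slides.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of quantized (r, g, b) pixels (values 0..255)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the histogram mass the two images share."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

# Two "images" with the same colors in a different spatial arrangement
# are indistinguishable to this global feature.
green_logo = [(0, 112, 74)] * 6 + [(255, 255, 255)] * 4
green_lawn = [(255, 255, 255)] * 4 + [(0, 112, 74)] * 6
```

Because the histogram discards where the colors occur, any two images with the same color proportions score as identical, which is the weakness the slide points at.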
This work attempts to find the single most representative image for a popular product using only image features.
Experiment: human evaluators.
Product searches (e.g. “ipod”, “Coca Cola”, “polo shirt”) were chosen for two reasons:
This is an extremely popular category of searches.
It provides a good set of queries from which to quantitatively evaluate our performance.
Examining the single most representative image: importance and wide applicability of this task.
Sites from Froogle, NextTag.com, and Shopping.com to Amazon.com show a single image next to a product listing.
The task requires the ability to identify similar sub-images.
Global features are too restrictive for our task.
Use local features, which capture local information content: Harris corners, Scale Invariant Feature Transform (SIFT), Shape Context, Spin Images, etc.
K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. 2005.
They demonstrated experimentally that SIFT gives the best matching results.
SIFT (Scale Invariant Feature Transform)
Advantage: its ability to generate highly distinctive features that are invariant to image transformations (translation, rotation, scaling) and robust to illumination variation.
The SIFT algorithm’s four main stages:
1. Scale-space extrema detection
2. Accurate keypoint localization
3. Orientation assignment
4. Keypoint descriptor
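As an illustration of one of the stages listed above, orientation assignment, the following is a simplified sketch rather than the full SIFT procedure. It assumes a grayscale patch given as a 2-D list and uses a 36-bin orientation histogram; the Gaussian weighting and peak interpolation of a real implementation are left out, and the function name is hypothetical.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return the center angle (degrees) of the strongest gradient-orientation bin.

    patch: 2-D list of grayscale values. Gradients use central differences
    on interior pixels; each pixel votes with its gradient magnitude.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    best = max(range(num_bins), key=hist.__getitem__)
    return best * bin_width + bin_width / 2
```

Describing each keypoint relative to its dominant orientation is what makes the resulting descriptor invariant to image rotation.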
Local Coherence-based Image Selection Algorithm
1. Given a text query, retrieve the top 1000 images from Google image search and generate SIFT features for these images.
2. Identify matching features with the Spill Tree algorithm.
3. Identify common regions shared between images by clustering the matched feature points.
4. Construct a similarity graph. If there is more than one cluster, select the best cluster based on its size and average similarity among the images.
5. From the chosen cluster, select the image with the most edges and the strongest connections to other images.
The images (up to 1000) are resized to a maximum dimension of 400 pixels.
Each resized image contains 300 to 800 SIFT features.
Algorithm: find the most closely matching features.
Finding nearest matches among roughly half a million high-dimensional features can be computationally expensive.
Spill tree: an approximation to the metric tree.
If the Euclidean distance between two features is less than a threshold, they are a potential match.
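The thresholded matching rule can be sketched as follows. Note that this sketch swaps in an exhaustive nearest-neighbor search for the spill tree, so it shows only the matching criterion, not the speed-up; the function name, the descriptor format (tuples of numbers), and the threshold value are assumptions for illustration.

```python
import math

def match_features(desc_a, desc_b, max_distance):
    """Return (i, j, distance) for each descriptor in desc_a whose nearest
    neighbor in desc_b lies closer than max_distance (Euclidean)."""
    matches = []
    for i, a in enumerate(desc_a):
        best_j, best_d = None, float("inf")
        for j, b in enumerate(desc_b):
            d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d < max_distance:
            matches.append((i, best_j, best_d))
    return matches
```

The exhaustive loop costs time proportional to the product of the two descriptor counts, which is why an approximate structure such as the spill tree matters at the scale of half a million features.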
Common Object Verification
Similar local features can originate from different objects, so matches are checked by clustering and geometric verification.
Group the matched points according to their corresponding image pairs.
Hough transform for object verification: a 4-dimensional histogram stores the “votes” in the pose space (translation, scaling, and rotation).
Finally, we select the histogram entry with the most votes as the most consistent interpretation.
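A minimal sketch of the voting step for one image pair, under stated assumptions: each keypoint is a tuple (x, y, scale, angle in degrees), translation is taken as the raw coordinate difference (a simplification), and the bin sizes are arbitrary illustrative values. None of these parameters come from the slides.

```python
import math
from collections import Counter

def hough_pose_votes(matches, xy_bin=32.0, scale_base=2.0, angle_bin=30.0):
    """matches: list of (kp_a, kp_b), each keypoint = (x, y, scale, angle_deg).
    Returns (best_bin, votes) for the 4-D pose bin with the most votes."""
    votes = Counter()
    for (xa, ya, sa, ta), (xb, yb, sb, tb) in matches:
        bin_ = (
            math.floor((xb - xa) / xy_bin),               # x translation
            math.floor((yb - ya) / xy_bin),               # y translation
            round(math.log(sb / sa, scale_base)),         # scale change
            math.floor(((tb - ta) % 360.0) / angle_bin),  # rotation
        )
        votes[bin_] += 1
    return votes.most_common(1)[0] if votes else (None, 0)
```

Matches that agree on the same translation, scale change, and rotation pile up in one bin, while accidental matches from different objects scatter across the histogram.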
Image Selection
Similarity score between two images: the number of matching points divided by their total number of interest points.
Similarity graph: images as nodes, similarity as weighted edges.
Outliers are identified and removed.
With multiple themes, the resulting graph usually contains several distinctive clusters of images.
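The score and graph described above can be sketched as follows. Two points are assumptions: "total number of interest points" is read here as the sum over both images, and an image with no edge above the threshold is simply left out of the graph as an outlier. The function names and the `min_similarity` parameter are hypothetical.

```python
def similarity(num_matches, num_points_a, num_points_b):
    """Matched points divided by the total interest points of both images."""
    total = num_points_a + num_points_b
    return num_matches / total if total else 0.0

def build_similarity_graph(num_points, match_counts, min_similarity=0.0):
    """num_points: {image_id: interest point count}
    match_counts: {(id_a, id_b): verified match count}
    Returns {image_id: {neighbor_id: weight}}, omitting isolated images."""
    graph = {}
    for (a, b), n in match_counts.items():
        w = similarity(n, num_points[a], num_points[b])
        if w > min_similarity:
            graph.setdefault(a, {})[b] = w
            graph.setdefault(b, {})[a] = w
    return graph
```

Storing each edge in both directions keeps the graph symmetric, which makes later steps such as summing a node's edge weights straightforward.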
How to select the image?
If the similarity graph does not have a cluster, select the first image returned by Google as the best image.
Why might there be no cluster? For example:
The object lacks visually distinctive features.
The object category is too vague or broad.
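The selection rule, including the fallback, might look like the sketch below. It makes two assumptions not stated on the slides: clusters are taken to be the connected components of the similarity graph, and a cluster is scored as its size times its mean edge weight. The graph is a symmetric dictionary of the form {image: {neighbor: weight}}, and `ranked_images` is the search engine's original ordering.

```python
def connected_components(graph):
    """Group nodes of an undirected graph {node: {neighbor: weight}} into clusters."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, cluster = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            cluster.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return clusters

def select_canonical_image(graph, ranked_images):
    """Pick the best-connected image of the best cluster, else the top-ranked image."""
    clusters = [c for c in connected_components(graph) if len(c) > 1]
    if not clusters:
        return ranked_images[0]  # fallback: first image from the search engine

    def cluster_score(cluster):
        weights = [w for n in cluster for w in graph[n].values()]
        return len(cluster) * (sum(weights) / len(weights))  # size x mean similarity

    best = max(clusters, key=cluster_score)
    return max(best, key=lambda n: sum(graph[n].values()))  # highest weighted degree
```

Other ways of combining cluster size and average similarity are equally plausible; the product is used here only because it is the simplest score that rewards both.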
Experiment
Environment
130 product queries
Images (up to 1000) extracted from Yahoo, MSN, and Google
105 human evaluators
50 randomly selected sets of images, with randomly adjusted order
Images resized to a maximum dimension of 130 pixels
“Which of the following image best describes”
If the algorithm fails to find a “common theme” among the images: 53/130
Each position receives approximately 24% to 26%
Some images selected by search engines are relevant and appropriate, but better choices are available. Example: “Batman returns” screenshots.
The LC algorithm is able to improve image selection by identifying the common “theme” in the initial image set and selecting the images that contain the most visually distinctive representation of that theme.
There are three reasons behind this result:
People usually strive to take the best photos they can.
Popular images on the web: relevant, good-quality photos tend to be used repeatedly. Example: Starbucks.
Images that contain a dominant view of the object usually have more matches. This is crucial in selecting not only relevant but also high-quality images. Example: Mona Lisa.
Conclusions
Presented a method for selecting the best image among a group of images returned by a conventional text-based image search engine
Computationally expensive: similarity measurements can only be generated off-line over a list of queries.
Methods to explore for improving efficiency:
Limiting the size of the image
Limiting the number of interest points
Reducing the dimensions of local features
Discriminatively selecting the features that are most related to the query we are interested in
Future work
Expanding the range of queries
Further domains might require the use of other image features.
Face recognition methods may provide a useful similarity measure when a large portion of the image results contain faces.
For queries where the results are an object category (e.g. “chair”), features typically used for content-based retrieval (color distributions) may be more fruitful.
The spanning trees illustrated in Figures 8 and 9 contain a great deal of information to be exploited. The edges may be usable in the same way the web link
structure is used to improve web page ranking.
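One way to read this suggestion is a link-analysis style ranking applied to the similarity edges. The sketch below is a generic damped power iteration on a weighted, symmetric graph of the form {image: {neighbor: weight}}; it is an illustration of the idea, not a method presented in the talk, and the damping factor and iteration count are conventional assumed values.

```python
def rank_images(graph, damping=0.85, iterations=50):
    """Power iteration on a weighted undirected graph {node: {neighbor: weight}}.
    Every node must appear as a key and have at least one edge."""
    nodes = list(graph)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_score = {}
        for node in nodes:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(
                score[nb] * graph[nb][node] / out_weight[nb] for nb in graph[node]
            )
            new_score[node] = (1.0 - damping) / n + damping * incoming
        score = new_score
    return score
```

Under this scheme an image scores highly when it is similar to many images that are themselves well connected, mirroring how links confer rank on web pages.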