PageRank for Product Image Search

Kevin Jing (Google Inc.; GVU, College of Computing, Georgia Institute of Technology)

Shumeet Baluja (Google Inc.)

WWW 2008

Shimin Chen

Big Data Reading Group



Motivation

• Image search is an important part of commercial search engines

• Ranking is based on the text of the pages from which the images are linked
– Anchor text
– Quality of the anchor page
– Etc.

Why?

• Text-based search is well studied.

• General object detection/recognition in images remains an open problem.

• Image processing is much more expensive than text processing

Discussions (by Shimin)

Search Result (Eiffel Tower)

Search Result (d80)

Search Result (McDonalds)

Search Result (coca cola)

Image Search (Integrated Results)

Search quality is more important

Contribution

• Extending PageRank to image search

• Visual-hyperlinks estimated from local feature patches

• Most comprehensive experiment to date
– Limited and noisy real-world images
– Large number of user evaluations
– 2000 queries

Limitations of prior works:

• Visual Category Recognition Filters (Fergus et al. ECCV 2004)

– Probabilistic graphical model with hidden layers
» Susceptible to data noise
» Large number of parameters to estimate
» High dimensionality in feature space
» Limited training data

– Limited experiment
» 11 hand-selected, hand-labeled queries (bottles, etc.)

– Cannot handle multiple visual concepts

– Computationally expensive

Our observation

• Due to the high dimensionality of feature space, learning feature correlations can be difficult with limited and noisy data

• Estimating image similarities is a slightly easier task.

• Visual Image Ranking != Object Category model
– Modeling the relationship among images, instead of the features
– As most users rarely look beyond the first page of results,

Outline

• VisualRank
– Robust estimation of image similarities (Visual-Hyperlinks).
– Random-walk on visual-hyperlinks to find “visual authority.”

• Experiments
– 2000 product queries
– 150 user evaluations
– Click analysis

Idea

• Extract local features of an image

• Construct a graph with images as nodes, similarity as edge weights

• Use PageRank to generate the ranking

• Visual-hyperlinks
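The graph-plus-PageRank idea above can be sketched as a power iteration over an image-similarity matrix. This is a minimal illustration, not the authors' implementation: the function name `visual_rank`, the damping factor of 0.85, the fixed iteration count, and the toy similarity values are all assumptions made for the example.

```python
def visual_rank(similarity, damping=0.85, iterations=100):
    """Rank images by a random walk over a similarity graph.

    similarity[i][j] is the edge weight between images i and j
    (assumed non-negative; the diagonal is expected to be 0).
    The damping factor and iteration count are illustrative defaults.
    """
    n = len(similarity)
    # Column sums, used to turn edge weights into transition probabilities.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            walk = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    walk += similarity[i][j] / col_sums[j] * rank[j]
                else:
                    # Image with no visual-hyperlinks: spread its score uniformly.
                    walk += rank[j] / n
            # Mix the random walk with a uniform jump to any image.
            new_rank.append(damping * walk + (1.0 - damping) / n)
        rank = new_rank
    return rank


# Toy graph: images 0-2 resemble each other, image 3 is nearly isolated.
toy_similarity = [
    [0.0, 0.8, 0.7, 0.1],
    [0.8, 0.0, 0.9, 0.0],
    [0.7, 0.9, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
toy_ranks = visual_rank(toy_similarity)
```

In the toy graph the nearly isolated image ends up with the lowest score, which is the behavior the slides rely on to push off-topic images down the ranking.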

Discussions

Visual-hyperlinks

Step 1) Generate visual-hyperlinks via robust image similarity estimation

• Interest point selection + descriptor representation (SIFT: 128-dimensional vectors)

• Find similar patches (L2 distance)

• Geometric verification (affine transformation)

• Similarity = (# similar patches) / (average # patches)
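The similarity formula above can be sketched as follows. This is a simplified illustration under stated assumptions: descriptors are plain lists of floats (SIFT would give 128 values per patch; the toy data uses 2), the matching threshold `max_dist` is a made-up value, matching is greedy one-to-one, and the geometric verification step is left out entirely.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def image_similarity(desc_a, desc_b, max_dist=0.5):
    """Similarity = (# similar patches) / (average # patches).

    desc_a and desc_b are lists of descriptor vectors, one per patch.
    max_dist is a hypothetical threshold for calling two patches similar.
    Geometric verification is NOT performed in this sketch.
    """
    if not desc_a or not desc_b:
        return 0.0
    used = set()      # patches of image B that are already matched
    matches = 0
    for d in desc_a:
        best_j, best_dist = None, max_dist
        for j, e in enumerate(desc_b):
            if j in used:
                continue
            dist = l2_distance(d, e)
            if dist <= best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            used.add(best_j)
            matches += 1
    average_patches = (len(desc_a) + len(desc_b)) / 2.0
    return matches / average_patches


# Toy 2-D descriptors: two of the three patches in A have a close match in B.
patches_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
patches_b = [[0.0, 0.1], [1.0, 1.1]]
toy_similarity_ab = image_similarity(patches_a, patches_b)  # 2 / 2.5
```

Dividing by the average patch count keeps the score in [0, 1] for one-to-one matching and avoids favoring images that simply contain more patches.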


Query Dependent Ranking

• Too expensive to construct a graph for all images

• Construct a graph for images returned from a (text-based) search

• In other words, the purpose is to better rank images returned from a text-based search
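A rough count of image pairs shows why the graph is built per query rather than over the whole index. The sketch below assumes every pair of images would need one similarity estimate; the index size of one billion images is an assumption chosen only for illustration, while 1000 matches the result-set size used in the mona-lisa example.

```python
def num_image_pairs(n):
    """Number of unordered image pairs among n images: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Per-query graph over the top 1000 text-search results.
pairs_per_query = num_image_pairs(1000)       # 499500

# Hypothetical global graph; the index size is an assumed round number.
pairs_whole_index = num_image_pairs(10 ** 9)  # 499999999500000000
```

The pair count grows quadratically, so restricting the graph to the images returned by a text query keeps the number of similarity estimates manageable.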

Discussions

Visual-hyperlinks

Generated from the top 1000 results of “mona-lisa”

SPAM!

Visual-hyperlinks + PageRank

PageRank vs. Without PageRank

Outline

• VisualRank
– Robust estimation of image similarities (Visual-Hyperlinks).
– Random-walk on visual-hyperlinks to find “visual authority.”

• Experiments
– 2000 product queries
– 150 user evaluations
– Click analysis

Experiment/Results

• Selection of queries
– 2000 most popular product search queries
» Product items are a popular set of queries
» Well suited for the patch-based features we are studying

• 153 user evaluations
– Combined both results and asked: which images are irrelevant to the query?
– User click analysis
» Back testing
» Lower bound on the improvement

• Alternative experiment methods considered
– Mark our own ground-truth data
– Ask users to rank results
– Ask users to compare groups of results

Experiment/Results

wii

picasso

Microsoft zune

ipod

Experiment/Results

1) 85% of the irrelevant images are removed.

2) 10% increase in user clicks on the top 20 results.

Mistakes

Dell

Playstation

USB keychain

Click Study

• Idea: images clicked after a search are good

• Given click stats for top 40 images of 130 common product queries

• Examine: # of clicks of the first 20 images

• ImageRank: 17.5% more clicks than default ranking
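The click comparison can be sketched as a small back-test: take the recorded clicks per image, then sum the clicks that fall in the first k positions under each ordering. The function names, the image identifiers, and the click counts below are all made up for illustration; only the idea of comparing clicks in the first 20 positions comes from the slide.

```python
def clicks_in_top_k(ranking, clicks, k=20):
    """Total recorded clicks on the first k images of a ranking."""
    return sum(clicks.get(image, 0) for image in ranking[:k])


def relative_click_gain(default_ranking, new_ranking, clicks, k=20):
    """Fractional change in top-k clicks when switching rankings."""
    base = clicks_in_top_k(default_ranking, clicks, k)
    new = clicks_in_top_k(new_ranking, clicks, k)
    if base == 0:
        raise ValueError("default ranking has no clicks in its top k")
    return (new - base) / base


# Toy data (hypothetical): four images, compare the top 2 positions.
toy_clicks = {"a": 10, "b": 0, "c": 30, "d": 5}
toy_default = ["a", "b", "c", "d"]
toy_reranked = ["c", "a", "d", "b"]
toy_gain = relative_click_gain(toy_default, toy_reranked, toy_clicks, k=2)  # 3.0
```

Because the clicks were presumably recorded while users saw the default ordering, images that the default ranked low had less chance of being clicked, which may be why the slides describe the measured gain as a lower bound.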

More results

Conclusion/Future Work

• Conclusion
– Robust visual-hyperlinks + graph algorithms are a pragmatic choice for web images

• Future work
– How to make local feature matching efficient
– Incorporate more features into the construction of visual-hyperlinks
– Incorporate Google's initial ranking into PageRank