PageRank for Product Image Search
Kevin Jing (Google Inc.; GVU, College of Computing, Georgia Institute of Technology)
Shumeet Baluja (Google Inc.)
WWW 2008
Shimin Chen
Big Data Reading Group
Motivation
• Image search is an important part of commercial search engines
• Current image search relies on the text of the pages from which the images are linked
– Anchor text
– Quality of the anchor page
– Etc.
Why?
• Text-based search is well studied.
• General object detection/recognition in images remains an open problem.
• Image processing is much more expensive than text processing
Discussion (by Shimin)
Contribution
• Extending PageRank to image search
• Visual-hyperlinks estimated from local feature patches
• Most comprehensive experiment to date
– Limited and noisy real-world images
– Large number of user evaluations
– 2000 queries
Limitations of prior work:
• Visual Category Recognition Filters (Fergus et al. ECCV 2004)
– Probabilistic graphical model with hidden layers
» Susceptible to data noise
» Large number of parameters to estimate
» High dimensionality in feature space
» Limited training data
– Limited experiment
» 11 hand-selected, hand-labeled queries (bottles, etc.)
– Cannot handle multiple visual concepts
– Computationally expensive
Our observation
• Due to the high dimensionality of the feature space, learning feature correlations can be difficult with limited and noisy data
• Estimating image similarities is a somewhat easier task
• Visual image ranking != object category model
– Model the relationships among images instead of among features
– As most users rarely look beyond the first page of results,
Outline
• VisualRank
– Robust estimation of image similarities (visual-hyperlinks)
– Random walk on visual-hyperlinks to find “visual authority”
• Experiments
– 2000 product queries
– 150 user evaluations
– Click analysis
Idea
• Extract local features of an image
• Construct a graph with images as nodes, similarity as edge weights
• Use PageRank to generate the ranking
• Visual-hyperlinks
Discussion
Visual-hyperlinks
Step 1) Generate visual-hyperlinks via robust image similarity estimation
• Interest point selection + descriptor representation
– SIFT: 128-dimensional vectors
• Find similar patches (L2 distance)
• Geometric verification (affine transformation)
• Similarity = (# similar patches) / (average # patches)
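The similarity formula can be sketched as follows, under loud assumptions: descriptors are plain lists of floats (real SIFT descriptors are 128-dimensional; toy 2-D vectors are used in the usage example), two patches count as "similar" when they are mutual nearest neighbours within a hypothetical L2 threshold `max_dist`, and the geometric verification (affine) step is omitted entirely. Mutual-nearest-neighbour matching is a common heuristic chosen here for illustration, not necessarily what the authors used; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Descriptor = Sequence[float]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(d: Descriptor, pool: List[Descriptor]) -> int:
    """Index of the descriptor in pool closest to d (first one wins on ties)."""
    return min(range(len(pool)), key=lambda k: l2(d, pool[k]))


def mutual_matches(desc_a: List[Descriptor], desc_b: List[Descriptor], max_dist: float) -> int:
    """Count patch pairs that are each other's nearest neighbour and within max_dist.

    Hypothetical matching rule; the affine geometric verification is not performed.
    """
    count = 0
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        if nearest(desc_b[j], desc_a) == i and l2(d, desc_b[j]) <= max_dist:
            count += 1
    return count


def visual_similarity(desc_a: List[Descriptor], desc_b: List[Descriptor], max_dist: float) -> float:
    """Similarity = (# similar patches) / (average # patches)."""
    if not desc_a or not desc_b:
        return 0.0
    avg_patches = (len(desc_a) + len(desc_b)) / 2
    return mutual_matches(desc_a, desc_b, max_dist) / avg_patches
```

For example, `visual_similarity([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.1], [5.0, 5.0], [9.0, 9.0]], 0.5)` finds one matched pair over an average of 2.5 patches. Because mutual matches can never exceed the smaller patch count, dividing by the average keeps the score in [0, 1].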
Query-Dependent Ranking
• Too expensive to construct a graph over all images
• Instead, construct a graph over the images returned by a (text-based) search
• In other words, the goal is to re-rank the images returned by a text-based search
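A sketch of the query-dependent restriction, assuming the text search returns an ordered list of candidates and that some pairwise `similarity` function (for instance a patch-matching score) is available. `build_similarity_matrix` and the cut-off `top_n` are hypothetical names, and no particular value of `top_n` is implied by these slides.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def build_similarity_matrix(
    text_results: Sequence[T],
    similarity: Callable[[T, T], float],
    top_n: int,
) -> List[List[float]]:
    """Build the visual-hyperlink graph only over the top_n text-search results."""
    candidates = list(text_results[:top_n])
    n = len(candidates)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = similarity(candidates[i], candidates[j])
            sim[i][j] = sim[j][i] = s  # symmetric weights, no self-loops
    return sim
```

The loop performs n(n-1)/2 similarity computations, so cost grows quadratically with the number of images; restricting the graph to one query's text results keeps n small enough to be practical.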
Discussion
Visual-hyperlinks
[Figure: “Lincoln Memorial” query, the top 5 images with the highest weighted “neighbors”]
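One plausible reading of "highest weighted neighbors" is the weighted degree of each image, i.e. the sum of its edge weights in the similarity matrix. A minimal sketch under that assumption (the function name is hypothetical):

```python
from typing import List


def top_by_weighted_degree(sim: List[List[float]], k: int) -> List[int]:
    """Indices of the k images whose visual-hyperlink weights sum highest."""
    # Subtract the diagonal so self-similarity never counts as a neighbour.
    degree = [sum(row) - row[i] for i, row in enumerate(sim)]
    return sorted(range(len(sim)), key=lambda i: degree[i], reverse=True)[:k]
```

Weighted degree only looks at direct neighbours. The random-walk formulation (visual-hyperlinks + PageRank) additionally rewards images whose neighbours are themselves well connected.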
Visual-hyperlinks + PageRank
• Intuition
• Eigen-centrality
• Visual “authority”
• Random Surfer
• Principal eigenvector of the weighted similarity matrix
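The random-surfer view can be made concrete with power iteration. This is a minimal sketch assuming the standard PageRank formulation: the similarity matrix is column-normalized into transition probabilities, and a damping factor (0.85 here, the classic PageRank default, not a value taken from these slides) mixes in a uniform random jump. Images with no visual-hyperlinks jump uniformly. The function name `visual_rank` and its parameters are hypothetical.

```python
from typing import List


def visual_rank(
    sim: List[List[float]],
    damping: float = 0.85,
    iters: int = 100,
    tol: float = 1e-10,
) -> List[float]:
    """Power iteration for the damped random-surfer score over a similarity graph."""
    n = len(sim)
    # Column sums turn sim[i][j] into P(move j -> i) = sim[i][j] / col_sums[j].
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    s += sim[i][j] / col_sums[j] * rank[j]
                else:
                    s += rank[j] / n  # image with no links: jump uniformly
            new.append(damping * s + (1 - damping) / n)
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank
```

For a star-shaped graph `[[0, 1, 1], [1, 0, 0], [1, 0, 0]]`, the centre image receives the highest score, the "visual authority" the slides describe. The scores sum to 1 and form the principal eigenvector of the damped transition matrix.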
Experiment/Results
• Selection of queries
– 2000 most popular product search queries
» Product queries are a popular set of queries
» Well suited to the patch-based features we are studying
• 153-user evaluation
– Combined both sets of results and asked users which images are irrelevant to the query
• User click analysis
– Back testing
– Lower bound on the improvement
• Alternative experiment methods considered
– Mark our own ground-truth data
– Ask users to rank results
– Ask users to compare groups of results
Experiment/Results
1) 85% of the irrelevant images are removed.
2) 10% increase in user clicks on the top 20 results.
Click Study
• Idea: images clicked after a search are good
• Given click stats for the top 40 images of 130 common product queries
• Examine: # of clicks on the first 20 images
• ImageRank: 17.5% more clicks than default ranking
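The back-testing idea can be sketched as follows, assuming click counts are keyed by image ID and were logged under the default ranking (all names are hypothetical). An image promoted into the first 20 that has no logged clicks (for example, one that was outside the top 40) contributes zero, which fits reading the measured gain as a lower bound on the improvement.

```python
from typing import Dict, Sequence


def clicks_in_top_k(ranking: Sequence[str], clicks: Dict[str, int], k: int = 20) -> int:
    """Total logged clicks received by the first k images of a ranking."""
    return sum(clicks.get(img, 0) for img in ranking[:k])


def relative_click_gain(
    new_ranking: Sequence[str],
    default_ranking: Sequence[str],
    clicks: Dict[str, int],
    k: int = 20,
) -> float:
    """Fractional change in top-k clicks of new_ranking versus default_ranking."""
    base = clicks_in_top_k(default_ranking, clicks, k)
    if base == 0:
        raise ValueError("default ranking has no clicks in its top k")
    return (clicks_in_top_k(new_ranking, clicks, k) - base) / base
```

A return value of 0.175 would correspond to "17.5% more clicks than default ranking" for a single query; how the per-query numbers were aggregated across the 130 queries is not stated on the slide.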
More results