surrogate functions for maximizing precision at the top€¦ · mountains, museums, mythology,...
TRANSCRIPT
32nd International Conference on Machine Learning (ICML 2015) Full Paper: http://tinyurl.com/p3vjpg7
Surrogate Functions for Maximizing Precision at the TopPurushottam Kar#, Harikrishna Narasimhan*, and Prateek Jain#
#Microsoft Research, Bengaluru, INDIA*Indian Institute of Science, Bengaluru, INDIA
Scalable routines for provable maximization of precision at the top of ranked lists
The Goal
Drug Discovery Recommendation Systems
Convex, Upper-bounding and Conditionally Consistent surrogates for prec@𝑘
Question I
Scalable optimization of prec@𝑘 in large-scale and streaming data settings
Question II
Applications
prec@𝑘 ⋅ Error Function
• Non-convex, non-smooth performance measure• Non-additive: direct application of SGD/Perceptron methods not possible• Existing Work [Joachims 05]
Struct-SVM surrogates: not satisfactoryCutting-plane solvers: unsuitable for large, streaming data
Methodology
• Given 𝑛 objects• Assign scores• Predict top-𝑘 scoring objects as relevant• Learn models that predict good score vectors• Learning on streaming data?
What is a good Surrogate for prec@𝑘?
• Convexity (CV)
• Upper-bounding Property (UB)
• Tight under a Margin (TuM)For classes of score vectors satisfying an appropriate margin condition
Struct-SVM Ramp (Our) Avg (Our)
CV
UB
TuM
A Notion of Margin for prec@𝑘
Classification Margin (Weak) prec@𝑘 Margin
Surrogates for prec@𝑘
Penalize score vectors that don’t
give 𝑘 relevant objects the highest
scores
PERCEPTRON@𝑘 Algorithm for Optimizing prec@𝑘
PERCEPTRON@𝑘-MAX PERCEPTRON@𝑘-AVG
PERCEPTRON@𝑘-MAX
PERCEPTRON@𝑘-AVG
Experiments
Relax the Ramp surrogate or else
add corrections to the struct-SVM
surrogate
Lemma: The avg (ramp) surrogate is tight for any class of score vectors thatcontains a score vector realizing a unit (weak) prec@𝑘 margin.
Lemma: If there exists a linear scoring function that realizes a prec@𝑘 margin of 𝛾,then PERCEPTRON@𝑘-AVG terminates in 4𝑘 𝛾2 steps
Lemma: If PERCEPTRON@𝑘-AVG is executed for 𝑇 steps and ,
• Gradient descent-based approach GD@𝑘 based on surrogates• Mini-batch versions of PERCEPTRON@𝑘 and GD@𝑘• Mistake/generalization bounds via OTB/UC
Other Results: OTB and UC bounds for preck@𝑘 and its surrogates
Document Tagging
American Football, Architecture, Artificial Intelligence, Aviation,Billionaires, Bodies of water, Broadcasting, Card Games, Cardiology,Celebrity, Censorship in the arts, Comics, Continents, Countries, Crafts,Crime, Critical theory, Cultural anthropology, Dances by name, Deserts,Design, Drawing, Earth, Epidemiology, Family, Film, Film, Folklore, Foodand drink, Food culture, Fishing, Geometry, Humor, Indian culture,Internet, Lacrosse, Lakes, Landforms, Languages, Literature, Magazines,Mountains, Museums, Mythology, Navigation, Newspapers, Nootropics,Oceanography, Opera, Oral hygiene, Performing arts in India, Philosophers,Plumbing, Pilates, Protected areas, Public Administration, Puppetry,Puzzles, Real Estate, Regions, Rivers, Rugby union, Skiing, Sleep, SouthIndia, Suffixes, Swimming, Tamil culture, Telecommunications, Tennis,Territories, Towns, Urology, Villages, Walking trails, Water Sports, World
Dances by nameMuseums
South IndiaPerforming arts in India
PilatesCelebrity
Tamil cultureCensorship in the arts
PuppetryWalking trails
RegionsFilm
Indian culturePublic Administration
Protected areasArchitecture
Rank objects in order of (perceived) relevance scores and retrieve top ranked objects
Prec@𝑘 𝑠 : # irrelevant objects in top-𝑘 positions if sorted by scores 𝑠