

32nd International Conference on Machine Learning (ICML 2015) Full Paper: http://tinyurl.com/p3vjpg7

Surrogate Functions for Maximizing Precision at the Top

Purushottam Kar#, Harikrishna Narasimhan*, and Prateek Jain#

#Microsoft Research, Bengaluru, INDIA
*Indian Institute of Science, Bengaluru, INDIA

The Goal

Scalable routines for provable maximization of precision at the top of ranked lists

Question I: Convex, upper-bounding and conditionally consistent surrogates for prec@𝑘

Question II: Scalable optimization of prec@𝑘 in large-scale and streaming data settings

Applications: Drug Discovery, Recommendation Systems

prec@𝑘(⋅) Error Function

• Non-convex, non-smooth performance measure
• Non-additive: direct application of SGD/Perceptron methods not possible
• Existing Work [Joachims 05]
  - Struct-SVM surrogates: not satisfactory
  - Cutting-plane solvers: unsuitable for large, streaming data
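The non-additivity point can be illustrated with a small sketch. It assumes binary labels (1 = relevant, 0 = irrelevant) and stable tie-breaking; the helper name `topk_errors` is hypothetical and not taken from the poster. Whether the irrelevant object counts as an error depends on the scores of the other objects, so the error is not a sum of independent per-object losses.

```python
import numpy as np

def topk_errors(scores, labels, k):
    """Number of irrelevant (label 0) objects among the k highest scores."""
    # Stable sort on negated scores; tie-breaking by index is an assumption.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return int(np.sum(np.asarray(labels)[order[:k]] == 0))

labels = [1, 0, 1]
# The irrelevant object (index 1) keeps score 0.5 in both cases.
before = topk_errors([0.9, 0.5, 0.4], labels, k=2)  # index 1 is in the top 2
after = topk_errors([0.9, 0.5, 0.6], labels, k=2)   # index 2 displaces it
print(before, after)
```

Only the score of a different object changed between the two calls, yet the contribution of the irrelevant object to the error changed as well.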

Methodology

• Given 𝑛 objects
• Assign scores
• Predict top-𝑘 scoring objects as relevant
• Learn models that predict good score vectors
• Learning on streaming data?

What is a good Surrogate for prec@𝑘?

• Convexity (CV)

• Upper-bounding Property (UB)

• Tight under a Margin (TuM): for classes of score vectors satisfying an appropriate margin condition

Table: the surrogates Struct-SVM, Ramp (Our) and Avg (Our) compared on the properties CV, UB and TuM.

A Notion of Margin for prec@𝑘

Classification Margin (Weak) prec@𝑘 Margin
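The poster's formal margin definitions did not survive extraction, so the following is only a sketch under an assumed formalization: the classification margin compares the lowest-scoring relevant object with the highest-scoring irrelevant one, while the weak prec@𝑘 margin only asks that some 𝑘 relevant objects clear every irrelevant one. Both function names are hypothetical.

```python
import numpy as np

def classification_margin(scores, labels):
    """Assumed definition: min relevant score minus max irrelevant score."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    return scores[labels == 1].min() - scores[labels == 0].max()

def weak_prec_at_k_margin(scores, labels, k):
    """Assumed definition: k-th highest relevant score minus max irrelevant score."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    pos = np.sort(scores[labels == 1])[::-1]  # relevant scores, descending
    if len(pos) < k:
        raise ValueError("need at least k relevant objects")
    return pos[k - 1] - scores[labels == 0].max()

scores = [3.0, 2.5, 0.25, 1.0, 0.5]
labels = [1, 1, 1, 0, 0]
cls = classification_margin(scores, labels)
weak = weak_prec_at_k_margin(scores, labels, k=2)
print(cls, weak)
```

In this toy example the classification margin is negative because one relevant object scores below an irrelevant one, while the weak margin for 𝑘 = 2 is positive: under the assumed definitions the weak condition is the easier one to satisfy.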

Surrogates for prec@𝑘

Penalize score vectors that don’t give 𝑘 relevant objects the highest scores

PERCEPTRON@𝑘 Algorithm for Optimizing prec@𝑘

Variants: PERCEPTRON@𝑘-MAX and PERCEPTRON@𝑘-AVG
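The algorithm boxes themselves are not recoverable from the transcript. As a rough illustration only, a perceptron-style step for prec@𝑘 might look like the sketch below. The update rule (subtract the irrelevant objects found in the top 𝑘, then add the relevant objects with a weight averaged over all of them) is an assumption and should not be read as the authors' exact PERCEPTRON@𝑘-AVG; the function name and the toy data are hypothetical.

```python
import numpy as np

def perceptron_at_k_avg_step(w, X, y, k):
    """One assumed update on a batch X (n x d) with binary labels y."""
    scores = X @ w
    top_k = np.argsort(-scores, kind="stable")[:k]
    false_pos = top_k[y[top_k] == 0]          # irrelevant objects in the top k
    delta = len(false_pos)                    # prec@k error on this batch
    if delta == 0:
        return w, 0                           # no mistake, no update
    w_new = w - X[false_pos].sum(axis=0)      # push offending objects down
    # Spread the correction evenly over all relevant objects (assumes >= 1).
    w_new = w_new + (delta / y.sum()) * X[y == 1].sum(axis=0)
    return w_new, delta

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 1, 1])
w0 = np.zeros(2)
w1, d1 = perceptron_at_k_avg_step(w0, X, y, k=1)
w2, d2 = perceptron_at_k_avg_step(w1, X, y, k=1)
print(w1, d1, d2)
```

On this toy batch the first step makes one mistake and moves the weights; the second step makes none, so the weights stay put.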

Experiments

Relax the Ramp surrogate or else add corrections to the struct-SVM surrogate

Lemma: The avg (ramp) surrogate is tight for any class of score vectors that contains a score vector realizing a unit (weak) prec@𝑘 margin.

Lemma: If there exists a linear scoring function that realizes a prec@𝑘 margin of 𝛾, then PERCEPTRON@𝑘-AVG terminates in 4𝑘/𝛾² steps
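To get a feel for the scale of this bound, here is a worked instance with illustrative values that are not from the poster: 𝑘 = 10 and 𝛾 = 0.5. Reading the bound as stated presumably relies on suitably normalized feature vectors, which is an assumption here.

```latex
\[
\frac{4k}{\gamma^{2}} \;=\; \frac{4 \cdot 10}{(0.5)^{2}} \;=\; \frac{40}{0.25} \;=\; 160 .
\]
```

Halving the margin to 𝛾 = 0.25 quadruples the bound to 640, since the dependence on the margin is inverse-quadratic.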

Lemma: If PERCEPTRON@𝑘-AVG is executed for 𝑇 steps and ,

Other Results: OTB and UC bounds for prec@𝑘 and its surrogates

• Gradient descent-based approach GD@𝑘 based on surrogates
• Mini-batch versions of PERCEPTRON@𝑘 and GD@𝑘
• Mistake/generalization bounds via OTB/UC

Document Tagging

American Football, Architecture, Artificial Intelligence, Aviation, Billionaires, Bodies of water, Broadcasting, Card Games, Cardiology, Celebrity, Censorship in the arts, Comics, Continents, Countries, Crafts, Crime, Critical theory, Cultural anthropology, Dances by name, Deserts, Design, Drawing, Earth, Epidemiology, Family, Film, Film, Folklore, Food and drink, Food culture, Fishing, Geometry, Humor, Indian culture, Internet, Lacrosse, Lakes, Landforms, Languages, Literature, Magazines, Mountains, Museums, Mythology, Navigation, Newspapers, Nootropics, Oceanography, Opera, Oral hygiene, Performing arts in India, Philosophers, Plumbing, Pilates, Protected areas, Public Administration, Puppetry, Puzzles, Real Estate, Regions, Rivers, Rugby union, Skiing, Sleep, South India, Suffixes, Swimming, Tamil culture, Telecommunications, Tennis, Territories, Towns, Urology, Villages, Walking trails, Water Sports, World Dances by name

Museums, South India, Performing arts in India, Pilates, Celebrity, Tamil culture, Censorship in the arts, Puppetry, Walking trails, Regions, Film, Indian culture, Public Administration, Protected areas, Architecture

Rank objects in order of (perceived) relevance scores and retrieve top ranked objects

prec@𝑘(𝑠): the number of irrelevant objects in the top-𝑘 positions when objects are sorted by their scores 𝑠
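The ranking step and the error count defined above can be sketched in a few lines. This is a minimal illustration assuming binary labels (1 = relevant, 0 = irrelevant) and ties broken by index; the function names `retrieve_top_k` and `prec_at_k_error` are hypothetical.

```python
import numpy as np

def retrieve_top_k(scores, k):
    """Indices of the k highest-scoring objects, best first."""
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]

def prec_at_k_error(scores, labels, k):
    """Number of irrelevant objects among the retrieved top k."""
    top_k = retrieve_top_k(scores, k)
    return int(np.sum(np.asarray(labels)[top_k] == 0))

scores = [0.2, 0.9, 0.4, 0.8, 0.7]
labels = [0, 1, 1, 0, 1]
top3 = retrieve_top_k(scores, 3)            # objects 1, 3, 4
err3 = prec_at_k_error(scores, labels, 3)   # object 3 is irrelevant
print(top3, err3)
```

Dividing the error by 𝑘 and subtracting from 1 would give the more familiar precision value, if that form is preferred.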