Learning to Rank: New Techniques and Applications
Martin Szummer
Microsoft Research, Cambridge, UK
Why learning to rank?
• Current rankers use many features, in complex combinations
• Applications
  – Web search ranking, enterprise search
  – Image search
  – Ad selection
  – Merging multiple results lists
• The good: uses training data to find combinations of features that optimize IR metrics
• The bad: requires judged training data, which is expensive, subjective, not provided by end-users, and quickly out-of-date
This talk
• Learning to rank with IR metrics
  A single, simple yet competition-winning recipe. Works for NDCG, MAP, and Precision, with linear or non-linear ranking functions (neural nets, boosted trees, etc.)
• Semi-supervised ranking
  A new technique. Reduces the amount of judged training data required.
• Learning to merge
  Application: merging results lists from multiple query reformulations

Actually – I apply the same recipe in three different settings!
Ranking Background
• Classification: determine the class of an item i (operates on individual items)
• Ranking: determine the preference of item i versus j (operates on pairs of items)
• Ranking function: a score function f(x_i; w) over query-document features x_i, with parameters w
  Example: linear function f(x_i; w) = w · x_i
  The ranking function induces a preference: i ≻ j when f(x_i; w) > f(x_j; w)
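As a concrete illustration (a minimal sketch, not code from the talk; the feature vectors and weights are invented):

```python
import numpy as np

# A linear ranking function f(x; w) = w . x over query-document features.
w = np.array([0.7, 0.3])       # parameters (invented for illustration)
x_i = np.array([2.0, 1.0])     # query-doc features of item i
x_j = np.array([1.0, 1.5])     # query-doc features of item j

s_i, s_j = w @ x_i, w @ x_j    # scores: 1.7 and 1.15
print(s_i > s_j)               # True: the function prefers i over j
```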
From Ranking Function to the Ranking
• Applying the ranking function to define a ranking: sort items by score, Sort{ f(x_i; w) }
• Above: a deterministic model of preference. Henceforth: a probabilistic model that translates score differences into a probability of preference (Bradley-Terry/Mallows):
  P(i ≻ j) = 1 / (1 + exp(−σ (s_i − s_j))), where s_i = f(x_i; w)
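A minimal sketch of this step (the scale parameter sigma is an assumption; the talk does not specify one):

```python
import numpy as np

def preference_prob(s_i, s_j, sigma=1.0):
    """Bradley-Terry: probability that item i is preferred to item j."""
    return 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))

scores = np.array([1.7, 1.15, 0.4])
print(np.argsort(-scores))           # deterministic ranking: sort by score
print(preference_prob(1.7, 1.15))    # ~0.63: a soft preference for item 0
```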
Learning to Rank
• Learning to rank: given query-document features and given preference pairs in the training data, determine the parameters w of the ranking function Sort{ f(x_i; w) }
• Maximize the likelihood of the preference pairs given in training data:
  max_w Σ_{(i ≻ j) ∈ train} log P(i ≻ j)
  e.g. the RankNet model [Burges et al 2005]
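A sketch of this maximum-likelihood training for a linear scorer (illustrative only: RankNet in the cited paper uses a neural net, and sigma, the learning rate, and the epoch count here are assumptions):

```python
import numpy as np

def train_pairwise_linear(X, pairs, sigma=1.0, lr=0.1, epochs=100):
    """X: query-doc feature matrix; pairs: (i, j) meaning i is preferred to j.
    Gradient ascent on the log-likelihood of the training preference pairs."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i, j in pairs:
            p_ij = 1.0 / (1.0 + np.exp(-sigma * ((X[i] - X[j]) @ w)))
            w += lr * sigma * (1.0 - p_ij) * (X[i] - X[j])   # d log P / dw
    return w
```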
Learning to Rank for IR metrics
• IR metrics such as NDCG, MAP or Precision depend on:
  – the sorted order of items
  – the ranks of items: they weight the top of the ranking more

Recipe:
1) Express the metric as a sum of pairwise swap deltas
2) Smooth it by multiplying by a Bradley-Terry term
3) Optimize parameters by gradient descent over a judged training set

LambdaRank & LambdaMART [Burges et al] are instances of this recipe. The latter won the Yahoo! Learning to Rank Challenge (2010).
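A sketch of the resulting gradients as published for LambdaRank (sign conventions and the sigma parameter vary between write-ups):

```python
import numpy as np

def lambda_gradients(scores, metric_deltas, pairs, sigma=1.0):
    """pairs: (i, j) with i judged more relevant than j.
    metric_deltas[i, j]: |change in NDCG/MAP/Precision if i and j swap ranks|
    (step 1 of the recipe). The Bradley-Terry factor smooths the metric
    (step 2); the per-document lambdas then drive gradient descent (step 3)."""
    lambdas = np.zeros_like(scores)
    for i, j in pairs:
        bt = sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
        lambdas[i] += bt * metric_deltas[i, j]   # force pushing i up
        lambdas[j] -= bt * metric_deltas[i, j]   # force pushing j down
    return lambdas
```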
Example: Apply recipe to NDCG metric
Unpublished material. Email me if interested.
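The recipe application itself is withheld above; for reference, a sketch of the standard (public) NDCG@k definition it targets, with the common 2^rel − 1 gain and log2 rank discount:

```python
import numpy as np

def ndcg_at_k(rels_in_ranked_order, k=10):
    """NDCG@k for graded relevance judgments listed in ranked order."""
    rels = np.asarray(rels_in_ranked_order, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, min(k, rels.size) + 2))
    dcg = np.sum((2.0 ** rels[:k] - 1.0) * discounts)
    ideal = np.sort(rels)[::-1]                     # best possible ordering
    idcg = np.sum((2.0 ** ideal[:k] - 1.0) * discounts)
    return dcg / idcg if idcg > 0 else 0.0
```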
Gradients - intuition
• Gradients act as forces on doc pairs
[Figure: a ranked list of documents (ranks 1-5); the gradients dC/ds_ij are drawn as arrows, forces that push each document of a pair up or down the ranking.]
Semi-supervised Ranking
[with Emine Yilmaz]
Train with judged AND unjudged query-document pairs
• Applications
  – (Pseudo) relevance feedback
  – Reduce the number of (expensive) human judgments
  – Use when judgments are hard to obtain
    • customers may not want to judge their collections
    • adaptation to a specific company in enterprise search
    • ranking for small markets, special-interest domains
• Approach
  – preference learning
  – end-to-end optimization of ranking metrics (NDCG, MAP)
  – multiple and completely unlabeled rank instances
  – scalability
How to benefit from unlabeled data?
Unlabeled data gives information about the data distribution P(x). We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution P(R|x).
A common assumption is the cluster assumption: unlabeled data defines the extent of clusters; labeled data determines the class/function value of each cluster.
Semi-supervised
  classification: similar documents ⇒ same class
  regression: similar documents ⇒ similar function value
  ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other
• Differences from classification & regression:
  – Preferences provide weaker constraints than function values or classes
  – The similarity term is a type of regularizer on the function we are learning
  – Similarity can be defined based on content; it does not require judgments
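The actual similarity term used here is unpublished (next slide); purely as an illustration, one natural regularizer consistent with the assumption penalizes score differences between content-similar documents:

```python
def unlabeled_regularizer(scores, similar_pairs):
    """Hypothetical illustration only: not the talk's (unpublished) term.
    similar_pairs: (i, j, w_ij) with w_ij a content-similarity weight;
    the penalty discourages preferring either document of a similar pair."""
    return sum(w_ij * (scores[i] - scores[j]) ** 2
               for i, j, w_ij in similar_pairs)

# Combined objective: C = C_L (labeled pairwise cost) + beta * C_U (above)
```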
Quantify Similarity
similar documents ⇒ similar preference, i.e. neither is preferred to the other
Unpublished material. Email me if interested.
Semi-supervised Gradients
[Figure: labeled and unlabeled document pairs. Labeled pairs contribute gradients dC_L/ds_ij, unlabeled pairs contribute dC_U/ds_ij, and the combined gradient on each pair is dC_L/ds_ij + β dC_U/ds_ij.]
Experiments
Relevance feedback task:
1) the user issues a query and labels a few of the resulting documents from a traditional ranker (BM25)
2) the system trains a query-specific ranker and re-ranks

Data: TREC collection. 528,000 documents, 150 queries; 1000 total documents per query; 2-15 docs are labeled.

Features:
  ranking features (q, d): 22 features from LETOR
  content features (d1, d2): TF-IDF distance between top 50 words

Neighbors in input space are found using either of the above (see the sketch below). Note: at test time, only ranking features are used; at training time the method can exploit features of type (d1, d2) and (q, d1, d2) that other algorithms cannot use.

Ranking function f(): neural network, 3 hidden units; K=5 neighbors.
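A sketch of the content-feature neighbor construction (tokenization details are assumptions, and the per-document top-50-words step is approximated here by a 50-term TF-IDF vocabulary):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def neighbor_pairs(docs, k=5):
    """Pair each document with its K nearest neighbors under TF-IDF
    cosine distance; these become the unlabeled similar pairs."""
    tfidf = TfidfVectorizer(max_features=50).fit_transform(docs)
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(tfidf)
    _, idx = nn.kneighbors(tfidf)
    return [(i, j) for i, row in enumerate(idx) for j in row[1:]]  # drop self
```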
Relevance Feedback Task
[Plot: NDCG@10 (y-axis, 0.1-0.6) vs. number of labeled documents (x-axis: 2, 3, 5, 10, 15). Methods: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, RankBoost L.]
Novel Queries Task
90,000 training documents; 3500 preference pairs; 40 million unlabeled pairs
[Plot: NDCG@10 (y-axis, 0.1-0.5) vs. number of labeled preference pairs (x-axis, 10^2 to 10^3, log scale). Methods: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L; an upper bound is also shown.]
Learning to Merge
Task: learn a ranker that merges results from other rankers

Example application: users do not know the best way to express their web search query, and a single query may not be enough to reach all relevant documents.
[Diagram: the user issues the query "wp7"; the solution reformulates it in parallel ("wp7 phone", "microsoft wp7") and merges the results lists.]
Merging Multiple Queries [with Sheldon, Shokouhi, Craswell]
• Traditional approach: alter the query before retrieval
• Merging: alter after retrieval
  – Prospecting: see the results first, then decide
  – Flexibility: any rewrite is allowed, arbitrary features
  – Upside potential: better than any individual list
  – Increased query load on the engine: use a cache to mitigate it
LambdaMerge: learn to merge
A weighted mixture of ranking functions: a gating net weights each rewrite's results list based on rewrite features, and a scoring net scores each document based on scoring features (see the sketch below).

Rewrite features:
  Rewrite-difficulty: ListMean, ListStd, Clarity
  Rewrite-drift: IsRewrite, RewriteRank, RewriteScore, Overlap@N
Scoring features: dynamic rank score, BM25, Rank, IsTopN

[Diagram: results lists for the query "jupiters mass" and its rewrite "mass of jupiter"; rewrite features feed the gating net, score features feed the scoring nets.]
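A minimal sketch of the weighted mixture (linear gating and scoring functions stand in for the nets, and the parameter shapes are assumptions):

```python
import numpy as np

def merged_scores(lists, g_params, f_params):
    """lists: one (z_k, docs) entry per rewrite, where z_k holds the rewrite
    features of list k and docs holds (doc_id, x_dk) scoring-feature pairs.
    A gating weight per list scales a per-document score; documents that
    appear in several lists accumulate score. Sort descending to merge."""
    scores = {}
    for z_k, docs in lists:
        gate = 1.0 / (1.0 + np.exp(-(g_params @ z_k)))   # gating net weight
        for doc_id, x_dk in docs:
            scores[doc_id] = scores.get(doc_id, 0.0) + gate * (f_params @ x_dk)
    return scores
```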
[Scatter plot: λ-Merge results. x-axis: Reformulation – Original NDCG; y-axis: Merged – Original NDCG.]
Summary
• Learning to Rank
  – An indispensable tool
  – Requires judgments: but semi-supervised learning can help
    • crowd-sourcing is also a possibility
    • research frontier: implicit judgments from clicks
  – Many applications beyond those shown
    • Merging: multiple local search engines, multiple language engines
    • Ranking recommendations in collaborative filtering
    • Many thresholding tasks (filtering) can be posed as ranking
    • Ranking ads for relevance
    • Elections
– Use it!