learning to ranksifaka.cs.uiuc.edu/~wang296/course/ir_fall/docs/pdfs/learning2ran… · lambdamart:...
TRANSCRIPT
![Page 1: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/1.jpg)
Learning to Rankfrom heuristics to theoretic approaches
Hongning Wang
![Page 2: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/2.jpg)
Congratulations
• Job Offer
– Design the ranking module for Bing.com
CS 6501: Information Retrieval 2CS@UVa
![Page 3: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/3.jpg)
How should I rank documents?
Answer: Rank by relevance!
CS 6501: Information Retrieval 3CS@UVa
![Page 4: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/4.jpg)
Relevance ?!
CS 6501: Information Retrieval 4CS@UVa
![Page 5: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/5.jpg)
The Notion of Relevance
Relevance
(Rep(q), Rep(d))
Similarity
P(r=1|q,d) r {0,1}Probability of Relevance
P(d q) or P(q d)
Probabilistic inference
Different
rep & similarity
Vector space
model
(Salton et al., 75)
Prob. distr.
model
(Wong & Yao, 89)
…
Generative
Model
Regression
Model (Fuhr 89)
Classical
prob. Model
(Robertson &
Sparck Jones, 76)
Doc
generation
Query
generation
LM
approach
(Ponte & Croft, 98)
(Lafferty & Zhai, 01a)
Prob. concept
space model
(Wong & Yao, 95)
Different
inference system
Inference
network
model
(Turtle & Croft, 91)
Div. from Randomness
(Amati & Rijsbergen 02)
Learn. To Rank
(Joachims 02,
Berges et al. 05)
Relevance constraints[Fang et al. 04]
Lecture notes from CS598CXZCS 6501: Information Retrieval 5CS@UVa
![Page 6: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/6.jpg)
Relevance Estimation
• Query matching
– Language model
– BM25
– Vector space cosine similarity
• Document importance
– PageRank
– HITS
CS 6501: Information Retrieval 6CS@UVa
![Page 7: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/7.jpg)
Did I do a good job of ranking documents?
• IR evaluations metrics
– Precision
– MAP
– NDCG
PageRankBM25
CS 6501: Information Retrieval 7CS@UVa
![Page 8: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/8.jpg)
Take advantage of different relevance estimator?
• Ensemble the cues
– Linear?
•
– Non-linear?
• Decision tree-like
𝑎1 × 𝐵𝑀25 + 𝛼2 × 𝐿𝑀 + 𝛼3 × 𝑃𝑎𝑔𝑒𝑅𝑎𝑛𝑘 + 𝛼4 × 𝐻𝐼𝑇𝑆
BM25>0.5
LM > 0.1PageRank
> 0.3
True
FalseTrue
r= 1.0 r= 0.7
FalseTrue
r= 0.4 r= 0.1
False
{𝛼1= 0.2, 𝛼2 = 0.2, 𝛼3 = 0.3, 𝛼4 = 0.1} → {𝑀𝐴𝑃 = 0.12, 𝑁𝐷𝐶𝐺 = 0.5 }
{𝛼1= 0.1, 𝛼2 = 0.1, 𝛼3 = 0.5, 𝛼4 = 0.2} → {𝑀𝐴𝑃 = 0.18, 𝑁𝐷𝐶𝐺 = 0.7}
{𝛼1= 0.4, 𝛼2 = 0.2, 𝛼3 = 0.1, 𝛼4 = 0.1} → {𝑀𝐴𝑃 = 0.20, 𝑁𝐷𝐶𝐺 = 0.6}
CS 6501: Information Retrieval 8CS@UVa
![Page 9: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/9.jpg)
What if we have thousands of features?
• Is there any way I can do better?
– Optimizing the metrics automatically!
How to determine those 𝛼s? Where to find those tree structures?
CS 6501: Information Retrieval 9CS@UVa
![Page 10: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/10.jpg)
Rethink the task
• Given: (query, document) pairs represented by a set of relevance estimators, a.k.a., features
• Needed: a way of combining the estimators
– 𝑓 𝑞, 𝑑 𝑖=1𝐷 → ordered 𝑑 𝑖=1
𝐷
• Criterion: optimize IR metrics
– P@k, MAP, NDCG, etc.
DocID BM25 LM PageRank Label
0001 1.6 1.1 0.9 0
0002 2.7 1.9 0.2 1
Key!
CS 6501: Information Retrieval 10CS@UVa
![Page 11: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/11.jpg)
Machine Learning
• Input: { 𝑋1, 𝑌1 , 𝑋2, 𝑌2 , … , (𝑋𝑛, 𝑌𝑛)}, where 𝑋𝑖 ∈ 𝑅
𝑁, 𝑌𝑖 ∈ 𝑅𝑀
• Object function : 𝑂(𝑌′, 𝑌)
• Output: 𝑓 𝑋 → 𝑌, such that 𝑓 =argmax𝑓′⊂𝐹𝑂(𝑓
′(𝑋), 𝑌)
𝑂 𝑌′, 𝑌 = 𝛿(𝑌′ = 𝑌)
Classification
http://en.wikipedia.org/wiki/Statistical_classification
Regression
𝑂 𝑌′, 𝑌 = −||𝑌′ − 𝑌||
http://en.wikipedia.org/wiki/Regression_analysis
M
NOTE: We will only talk about supervised learning.
CS 6501: Information Retrieval 11CS@UVa
![Page 12: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/12.jpg)
Learning to Rank
• General solution in optimization framework
– Input: (𝑞𝑖 , 𝑑1), 𝑦1 , (𝑞𝑖 , 𝑑2), 𝑦2 , … , (𝑞𝑖 , 𝑑𝑛), 𝑦𝑛 ,
where dn ∈ 𝑅𝑁, 𝑦𝑖 ∈ {0,… , 𝐿}
– Object: O = {P@k, MAP, NDCG}
– Output: 𝑓 𝑞, 𝑑 → 𝑌, s.t., 𝑓 =argmax𝑓′⊂𝐹𝑂(𝑓
′(𝑞, 𝑑), 𝑌)
DocID BM25 LM PageRank Label
0001 1.6 1.1 0.9 0
0002 2.7 1.9 0.2 1
CS 6501: Information Retrieval 12CS@UVa
![Page 13: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/13.jpg)
Challenge: how to optimize?
• Evaluation metric recap
– Average Precision
•
– DCG
•
• Order is essential!
– 𝑓 → order→metric
Not continuous with respect to f(X)!
CS 6501: Information Retrieval 13CS@UVa
![Page 14: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/14.jpg)
Approximating the Objects!
• Pointwise
– Fit the relevance labels individually
• Pairwise
– Fit the relative orders
• Listwise
– Fit the whole order
CS 6501: Information Retrieval 14CS@UVa
![Page 15: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/15.jpg)
Pointwise Learning to Rank
• Ideally perfect relevance prediction leads to perfect ranking– 𝑓 → score→ order →metric
• Reducing ranking problem to– Regression
• 𝑂 𝑓 𝑄,𝐷 , 𝑌 = − 𝑖 |𝑓 𝑞𝑖 , 𝑑𝑖 − 𝑦𝑖 |
• Subset Ranking using Regression, D.Cossock and T.Zhang, COLT 2006
– (multi-)Classification
• 𝑂 𝑓 𝑄,𝐷 , 𝑌 = 𝑖 𝛿(𝑓 𝑞𝑖 , 𝑑𝑖 = 𝑦𝑖)
• Ranking with Large Margin Principles, A. Shashua and A. Levin, NIPS 2002
CS 6501: Information Retrieval 15CS@UVa
![Page 16: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/16.jpg)
Subset Ranking using RegressionD.Cossock and T.Zhang, COLT 2006
• Fit relevance labels via regression
–
– Emphasize more on relevant documents
•
http://en.wikipedia.org/wiki/Regression_analysis
Weights on each document Most positive document
CS 6501: Information Retrieval 16CS@UVa
![Page 17: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/17.jpg)
Ranking with Large Margin PrinciplesA. Shashua and A. Levin, NIPS 2002
• Goal: correctly placing the documents in the corresponding category and maximize the margin
Y=0 Y=2
Y=1
𝛼𝑠
Reduce the violations
CS 6501: Information Retrieval 17CS@UVa
![Page 18: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/18.jpg)
• Maximizing the sum of margins
Ranking with Large Margin Principles A. Shashua and A. Levin, NIPS 2002
Y=0
Y=2
Y=1
𝛼𝑠 CS 6501: Information Retrieval 18CS@UVa
![Page 19: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/19.jpg)
Ranking with Large Margin Principles A. Shashua and A. Levin, NIPS 2002
• Ranking lost is consistently decreasing with more training data
CS 6501: Information Retrieval 19CS@UVa
![Page 20: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/20.jpg)
What did we learn
• Machine learning helps!
– Derive something optimizable
– More efficient and guided
CS 6501: Information Retrieval 20CS@UVa
![Page 21: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/21.jpg)
There is always a catch
• Cannot directly optimize IR metrics
– (0→ 1, 2→ 0) worse than (0->-2, 2->4)
• Position of documents are ignored
– Penalty on documents at higher positions should be larger
• Favor the queries with more documents
CS 6501: Information Retrieval 21CS@UVa
![Page 22: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/22.jpg)
Pairwise Learning to Rank
• Ideally perfect partial order leads to perfect ranking
– 𝑓 → partial order→ order →metric
• Ordinal regression– 𝑂 𝑓 𝑄,𝐷 , 𝑌 = 𝑖≠𝑗 𝛿(𝑦𝑖 > 𝑦𝑗)𝛿(𝑓 𝑞𝑖 , 𝑑𝑖 > 𝑓(𝑞𝑖 , 𝑑𝑖))
• Relative ordering between different documents is significant
• E.g., (0->-2, 2->4) is better than (0→ 1, 2→ 0)
– Large body of work
CS 6501: Information Retrieval 22CS@UVa
![Page 23: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/23.jpg)
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
• Minimizing the number of mis-ordered pairs
𝑦1 > 𝑦2, 𝑦2 > 𝑦3 , 𝑦1 > 𝑦41
0
linear combination of features
𝑓 𝑞, 𝑑 = 𝑤𝑇𝑋𝑞,𝑑
RankingSVM
𝑦1 > 𝑦2
Keep the relative orders
CS 6501: Information Retrieval 23CS@UVa
![Page 24: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/24.jpg)
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
• How to use it?
– 𝑓 → s𝐜𝐨𝐫𝐞 → order
𝑦1 > 𝑦2 > 𝑦3 > 𝑦4
CS 6501: Information Retrieval 24CS@UVa
![Page 25: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/25.jpg)
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
• What did it learn from the data?
– Linear correlations
Positive correlated features
Negative correlated features CS 6501: Information Retrieval 25CS@UVa
![Page 26: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/26.jpg)
• How good is it?
– Test on real system
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
CS 6501: Information Retrieval 26CS@UVa
![Page 27: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/27.jpg)
An Efficient Boosting Algorithm for Combining Preferences
Y. Freund, R. Iyer, et al. JMLR 2003
• Smooth the loss on mis-ordered pairs
– 𝑦𝑖>𝑦𝑗 𝑃𝑟 𝑑𝑖 , 𝑑𝑗 𝑒𝑥𝑝[𝑓 𝑞, 𝑑𝑗 − 𝑓 𝑞, 𝑑𝑖 ]
exponential loss
hinge loss (RankingSVM)
0/1 loss
square loss
from Pattern Recognition and Machine Learning, P337
CS 6501: Information Retrieval 27CS@UVa
![Page 28: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/28.jpg)
• RankBoost: optimize via boosting
– Vote by a committee
An Efficient Boosting Algorithm for Combining Preferences
Y. Freund, R. Iyer, et al. JMLR 2003
from Pattern Recognition and Machine Learning, P658
Updating Pr(𝑑𝑖 , 𝑑𝑗)
Credibility of each committee member (ranking feature)
BM25 PageRank Cosine
CS 6501: Information Retrieval 28CS@UVa
![Page 29: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/29.jpg)
• How good is it?
An Efficient Boosting Algorithm for Combining Preferences
Y. Freund, R. Iyer, et al. JMLR 2003
CS 6501: Information Retrieval 29CS@UVa
![Page 30: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/30.jpg)
A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments
Zheng et al. SIRIG’07
• Non-linear ensemble of features
– Object: 𝑦𝑖>𝑦𝑗 max 0, 𝑓 𝑞, 𝑑𝑗 − 𝑓 𝑞, 𝑑𝑖2
– Gradient descent boosting tree
• Boosting tree– Using regression tree to minimize the residuals
– 𝑟𝑡(𝑞, 𝑑, 𝑦) = 𝑂𝑡(𝑞, 𝑑, 𝑦) − 𝑓 𝑡−1 (𝑞, 𝑑, 𝑦)
r= 0.4 r= 0.1
BM25>0.5
LM > 0.1PageRank
> 0.3
True
FalseTrue
r= 1.0 r= 0.7
FalseTrue
False
CS 6501: Information Retrieval 30CS@UVa
![Page 31: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/31.jpg)
A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments
Zheng et al. SIRIG’07
• Non-linear v.s. linear
– Comparing with RankingSVM
CS 6501: Information Retrieval 31CS@UVa
![Page 32: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/32.jpg)
Where do we get the relative orders
• Human annotations
– Small scale, expensive to acquire
• Clickthroughs
– Large amount, easy to acquire
CS 6501: Information Retrieval 32CS@UVa
![Page 33: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/33.jpg)
Accurately Interpreting Clickthrough Data as Implicit Feedback
Thorsten Joachims, et al., SIGIR’05
• Position biasYour click is not because of its relevance, but its position?!
CS 6501: Information Retrieval 33CS@UVa
![Page 34: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/34.jpg)
Accurately Interpreting Clickthrough Data as Implicit Feedback
Thorsten Joachims, et al., SIGIR’05
• Controlled experiment
– Over trust of the top ranked positions
CS 6501: Information Retrieval 34CS@UVa
![Page 35: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/35.jpg)
• Pairwise preference matters
– Click: examined and clicked document
– Skip: examined but non-clicked document
Accurately Interpreting Clickthrough Data as Implicit Feedback
Thorsten Joachims, et al., SIGIR’05
Click > SkipCS 6501: Information Retrieval 35CS@UVa
![Page 36: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/36.jpg)
What did we learn
• Predicting relative order
– Getting closer to the nature of ranking
• Promising performance in practice
– Pairwise preferences from click-throughs
CS 6501: Information Retrieval 36CS@UVa
![Page 37: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/37.jpg)
Listwise Learning to Rank
• Can we directly optimize the ranking?
– 𝑓 → order→metric
• Tackle the challenge
– Optimization without gradient
CS 6501: Information Retrieval 37CS@UVa
![Page 38: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/38.jpg)
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
• Minimizing mis-ordered pair => maximizing IR metrics?
Mis-ordered pairs: 6 Mis-ordered pairs: 4
AP: 5
8AP:
5
12
DCG: 1.333 DCG: 0.931
Position is crucial!
CS 6501: Information Retrieval 38CS@UVa
![Page 39: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/39.jpg)
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
• Weight the mis-ordered pairs?
– Some pairs are more important to be placed in the right order
– Inject into object function
• 𝑦𝑖>𝑦𝑗Ω 𝑑𝑖 , 𝑑𝑗 exp −𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗
– Inject into gradient
• 𝜆𝑖𝑗 =𝜕𝑂𝑎𝑝𝑝𝑟𝑜
𝜕𝑤Δ𝑂
Gradient with respect to approximated object, i.e., exponential loss on mis-ordered pairs
Change in original object, e.g., NDCG, if we switch the documents i and j, leaving the other documents unchanged
Depend on the ranking of document i, j in the whole list
CS 6501: Information Retrieval 39CS@UVa
![Page 40: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/40.jpg)
• Lambda functions
– Gradient?
• Yes, it meets the sufficient and necessary condition of being partial derivative
– Lead to optimal solution of original problem?
• Empirically
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
CS 6501: Information Retrieval 40CS@UVa
![Page 41: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/41.jpg)
• Evolution
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
RankNet LambdaRank LambdaMART
ObjectCross entropy over
the pairsUnknown Unknown
Gradient (𝜆 function)
Gradient of cross entropy
Gradient of cross entropy times
pairwise change in target metric
Gradient of cross entropy times
pairwise change in target metric
Optimization method
neural networkstochastic gradient
descent
Multiple Additive Regression Trees
(MART)
As we discussed in RankBoost
Optimize solely by gradient
Non-linear combination
CS 6501: Information Retrieval 41CS@UVa
![Page 42: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/42.jpg)
• A Lambda tree
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
splitting
Combination of features
CS 6501: Information Retrieval 42CS@UVa
![Page 43: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/43.jpg)
AdaRank: a boosting algorithm for information retrieval
Jun Xu & Hang Li, SIGIR’07
• Loss defined by IR metrics
– 𝑞∈𝑄𝑃𝑟 𝑞 𝑒𝑥𝑝[−𝑂(𝑞)]
– Optimizing by boostingTarget metrics: MAP, NDCG, MRR
from Pattern Recognition and Machine Learning, P658
Updating Pr(𝑞)
Credibility of each committee member (ranking feature)
BM25 PageRank Cosine
CS 6501: Information Retrieval 43CS@UVa
![Page 44: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/44.jpg)
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR’07
RankingSVM
• Minimizing the pairwise loss
SVM-MAP
• Minimizing the structural loss
Loss defined on the number of mis-ordered document pairs
Loss defined on the quality of the whole list of ordered documents
CS 6501: Information Retrieval 44
MAP difference
CS@UVa
![Page 45: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/45.jpg)
• Max margin principle
– Push the ground-truth far away from any mistakes you might make
– Finding the most violated constraints
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR’07
CS 6501: Information Retrieval 45CS@UVa
![Page 46: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/46.jpg)
• Finding the most violated constraints
– MAP is invariant to permutation of (ir)relevant documents
– Maximize MAP over a series of swaps between relevant and irrelevant documents
•
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR’07
Right-hand side of constraints
Greedy solutionCS 6501: Information Retrieval 46
Start from the reverse order of ideal ranking
CS@UVa
![Page 47: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/47.jpg)
• Experiment results
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR’07
CS 6501: Information Retrieval 47CS@UVa
![Page 48: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/48.jpg)
Other listwise solutions
• Soften the metrics to make them differentiable
– Michael Taylor et al., SoftRank: optimizing non-smooth rank metrics, WSDM'08
• Minimize a loss function defined on permutations
– Zhe Cao et al., Learning to rank: from pairwise approach to listwise approach, ICML'07
CS 6501: Information Retrieval 48CS@UVa
![Page 49: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/49.jpg)
What did we learn
• Taking a list of documents as a whole
– Positions are visible for the learning algorithm
– Directly optimizing the target metric
• Limitation
– The search space is huge!
CS 6501: Information Retrieval 49CS@UVa
![Page 50: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/50.jpg)
Summary
• Learning to rank– Automatic combination of ranking features for
optimizing IR evaluation metrics
• Approaches– Pointwise
• Fit the relevance labels individually
– Pairwise• Fit the relative orders
– Listwise• Fit the whole order
CS 6501: Information Retrieval 50CS@UVa
![Page 51: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/51.jpg)
Experimental Comparisons
• Ranking performance
CS 6501: Information Retrieval 51CS@UVa
![Page 52: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/52.jpg)
Experimental Comparisons
• Winning count
– Over seven different data sets
CS 6501: Information Retrieval 52CS@UVa
![Page 53: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/53.jpg)
Experimental Comparisons
• My experiments
– 1.2k queries, 45.5K documents with 1890 features
– 800 queries for training, 400 queries for testing
MAP P@1 ERR MRR NDCG@5
ListNET 0.2863 0.2074 0.1661 0.3714 0.2949
LambdaMART 0.4644 0.4630 0.2654 0.6105 0.5236
RankNET 0.3005 0.2222 0.1873 0.3816 0.3386
RankBoost 0.4548 0.4370 0.2463 0.5829 0.4866RankingSVM 0.3507 0.2370 0.1895 0.4154 0.3585
AdaRank 0.4321 0.4111 0.2307 0.5482 0.4421pLogistic 0.4519 0.3926 0.2489 0.5535 0.4945
Logistic 0.4348 0.3778 0.2410 0.5526 0.4762
CS 6501: Information Retrieval 53CS@UVa
![Page 54: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/54.jpg)
Analysis of the Approaches
• What are they really optimizing?
– Relation with IR metrics
gap
CS 6501: Information Retrieval 54CS@UVa
![Page 55: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/55.jpg)
Pointwise Approaches
• Regression based
• Classification based
Regression lossDiscount coefficients in DCG
Classification lossDiscount coefficients in DCG
CS 6501: Information Retrieval 55CS@UVa
![Page 56: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/56.jpg)
From
CS 6501: Information Retrieval 56CS@UVa
![Page 57: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/57.jpg)
From
Discount coefficients in DCG
CS 6501: Information Retrieval 57CS@UVa
![Page 58: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/58.jpg)
Listwise Approaches
• No general analysis
– Method dependent
– Directness and consistency
CS 6501: Information Retrieval 58CS@UVa
![Page 59: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/59.jpg)
Connection with Traditional IR
• People have foreseen this topic long time ago
– Nicely fit in the risk minimization framework
CS 6501: Information Retrieval 59CS@UVa
![Page 60: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/60.jpg)
Applying Bayesian Decision Theory
Choice: (D1,1)
Choice: (D2,2)
Choice: (Dn,n)
...
query q
user U
doc set C
source S
q
1
N
dSCUqpDLDD
),,,|(),,(minarg*)*,(,
hidden observedloss
Bayes risk for choice (D, )RISK MINIMIZATION
Loss
L
L
L
Metric to be optimized Available ranking features
CS 6501: Information Retrieval 60CS@UVa
![Page 61: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/61.jpg)
Traditional Solution
• Set-based models (choose D)
• Ranking models (choose )
– Independent loss
• Relevance-based loss
• Distance-based loss
– Dependent loss
• MMR loss
• MDR loss
Boolean model
Probabilistic relevance model
Generative Relevance Theory
Subtopic retrieval model
Vector-space Model
Two-stage LM
KL-divergence model
Pointwise
Pairwise/Listwise
CS 6501: Information Retrieval 61CS@UVa
![Page 62: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/62.jpg)
Traditional Notion of Relevance
Relevance
(Rep(q), Rep(d))
Similarity
P(r=1|q,d) r {0,1}Probability of Relevance
P(d q) or P(q d)
Probabilistic inference
Different
rep & similarity
Vector space
model
(Salton et al., 75)
Prob. distr.
model
(Wong & Yao, 89)
…
Generative
Model
Regression
Model (Fuhr 89)
Classical
prob. Model
(Robertson &
Sparck Jones, 76)
Doc
generation
Query
generation
LM
approach
(Ponte & Croft, 98)
(Lafferty & Zhai, 01a)
Prob. concept
space model
(Wong & Yao, 95)
Different
inference system
Inference
network
model
(Turtle & Croft, 91)
Div. from Randomness
(Amati & Rijsbergen 02)
Learn. To Rank
(Joachims 02,
Berges et al. 05)
Relevance constraints[Fang et al. 04]
CS 6501: Information Retrieval 62CS@UVa
![Page 63: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/63.jpg)
Broader Notion of Relevance
• Traditional view– Content-driven
• Vector space model
• Probability relevance model
• Language model
• Modern view– Anything related to the quality of the document
• Clicks/views
• Link structure
• Visual structure
• Social network
• ….
Query-Document specificUnsupervised
Query, Document, Query-Document specificSupervised
CS 6501: Information Retrieval 63CS@UVa
![Page 64: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/64.jpg)
Broader Notion of Relevance
DocumentsQuery
BM25
Language Model
Cosine
DocumentsQuery
BM25
Language Model
Cosine
likes
Linkage structure
Query relation
Query relation
Linkage structure
DocumentsQuery
BM25
Language Model
Cosine
Click/View
Visual StructureSocial network
CS 6501: Information Retrieval 64CS@UVa
![Page 65: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/65.jpg)
Future
• Tighter bounds
• Faster solution
• Larger scale
• Wider application scenario
CS 6501: Information Retrieval 65CS@UVa
![Page 66: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/66.jpg)
Resources• Books
– Liu, Tie-Yan. Learning to rank for information retrieval. Vol. 13. Springer, 2011.
– Li, Hang. "Learning to rank for information retrieval and natural language processing." Synthesis Lectures on Human Language Technologies 4.1 (2011): 1-113.
• Helpful pages– http://en.wikipedia.org/wiki/Learning_to_rank
• Packages– RankingSVM: http://svmlight.joachims.org/– RankLib: http://people.cs.umass.edu/~vdang/ranklib.html
• Data sets– LETOR http://research.microsoft.com/en-
us/um/beijing/projects/letor//– Yahoo! Learning to rank challenge
http://learningtorankchallenge.yahoo.com/
CS 6501: Information Retrieval 66CS@UVa
![Page 67: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/67.jpg)
References• Liu, Tie-Yan. "Learning to rank for information retrieval." Foundations and
Trends in Information Retrieval 3.3 (2009): 225-331.
• Cossock, David, and Tong Zhang. "Subset ranking using regression." Learning theory (2006): 605-619.
• Shashua, Amnon, and Anat Levin. "Ranking with large margin principle: Two approaches." Advances in neural information processing systems 15 (2003): 937-944.
• Joachims, Thorsten. "Optimizing search engines using clickthrough data." Proceedings of the eighth ACM SIGKDD. ACM, 2002.
• Freund, Yoav, et al. "An efficient boosting algorithm for combining preferences." The Journal of Machine Learning Research 4 (2003): 933-969.
• Zheng, Zhaohui, et al. "A regression framework for learning ranking functions using relative relevance judgments." Proceedings of the 30th annual international ACM SIGIR. ACM, 2007.
CS 6501: Information Retrieval 67CS@UVa
![Page 68: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/68.jpg)
References
• Joachims, Thorsten, et al. "Accurately interpreting clickthrough data as implicit feedback." Proceedings of the 28th annual international ACM SIGIR. ACM, 2005.
• Burges, C. "From ranknet to lambdarank to lambdamart: An overview." Learning 11 (2010): 23-581.
• Xu, Jun, and Hang Li. "AdaRank: a boosting algorithm for information retrieval." Proceedings of the 30th annual international ACM SIGIR. ACM, 2007.
• Yue, Yisong, et al. "A support vector method for optimizing average precision." Proceedings of the 30th annual international ACM SIGIR. ACM, 2007.
• Taylor, Michael, et al. "Softrank: optimizing non-smooth rank metrics." Proceedings of the international conference WSDM. ACM, 2008.
• Cao, Zhe, et al. "Learning to rank: from pairwise approach to listwiseapproach." Proceedings of the 24th ICML. ACM, 2007.
CS 6501: Information Retrieval 68CS@UVa
![Page 69: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/69.jpg)
Thank you!
• Q&A
CS 6501: Information Retrieval 69CS@UVa
![Page 70: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/70.jpg)
Recap of last lecture
• Goal
– Design the ranking module for Bing.com
CS 6501: Information Retrieval 70CS@UVa
![Page 71: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/71.jpg)
Basic Search Engine Architecture
• The anatomy of a large-scale hypertextualWeb search engine
– Sergey Brin and Lawrence Page. Computer networks and ISDN systems 30.1 (1998): 107-117.
CS 6501: Information Retrieval 71
Your JobCS@UVa
![Page 72: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/72.jpg)
Learning to Rank
• Given: (query, document) pairs represented by a set of relevance estimators, a.k.a., features
• Needed: a way of combining the estimators
– 𝑓 𝑞, 𝑑 𝑖=1𝐷 → ordered 𝑑 𝑖=1
𝐷
• Criterion: optimize IR metrics
– P@k, MAP, NDCG, etc.
QueryID DocID BM25 LM PageRank Label
0001 0001 1.6 1.1 0.9 0
0001 0002 2.7 1.9 0.2 1
Key!
CS 6501: Information Retrieval 72CS@UVa
![Page 73: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/73.jpg)
Challenge: how to optimize?
• Order is essential!
– 𝑓 → order→metric
• Evaluation metrics are not continuous and not differentiable
CS 6501: Information Retrieval 73CS@UVa
![Page 74: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/74.jpg)
Approximating the Objects!
• Pointwise
– Fit the relevance labels individually
• Pairwise
– Fit the relative orders
• Listwise
– Fit the whole order
CS 6501: Information Retrieval 74CS@UVa
![Page 75: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/75.jpg)
Pointwise Learning to Rank
• Ideally perfect relevance prediction leads to perfect ranking
– 𝑓 → score→ order →metric
• Reduce ranking problem to
– Regression
CS 6501: Information Retrieval 75
http://en.wikipedia.org/wiki/Statistical_classification
http://en.wikipedia.org/wiki/Regression_analysis
- Classification
CS@UVa
![Page 76: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/76.jpg)
Deficiency
• Cannot directly optimize IR metrics
– (0→ 1, 2→ 0) worse than (0->-2, 2->4)
• Position of documents are ignored
– Penalty on documents at higher positions should be larger
• Favor the queries with more documents
CS 6501: Information Retrieval 76
𝑑1′
𝑑2𝑑1
1
0
-2
2
-1
𝑑1′′
CS@UVa
![Page 77: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/77.jpg)
Pairwise Learning to Rank
• Ideally perfect partial order leads to perfect ranking
– 𝑓 → partial order→ order →metric
• Ordinal regression– 𝑂 𝑓 𝑄,𝐷 , 𝑌 = 𝑖≠𝑗 𝛿(𝑦𝑖 > 𝑦𝑗)𝛿(𝑓 𝑞𝑖 , 𝑑𝑖 < 𝑓(𝑞𝑖 , 𝑑𝑖))
• Relative ordering between different documents is significant
CS 6501: Information Retrieval 77
𝑑1′
𝑑2𝑑1
1
0
-2
2
-1
𝑑1′′
CS@UVa
![Page 78: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/78.jpg)
RankingSVMThorsten Joachims, KDD’02
• Minimizing the number of mis-ordered pairs
𝑦1 > 𝑦2, 𝑦2 > 𝑦3 , 𝑦1 > 𝑦41
0
linear combination of features
𝑓 𝑞, 𝑑 = 𝑤𝑇𝑋𝑞,𝑑𝑦1 > 𝑦2
Keep the relative orders
CS 6501: Information Retrieval 78
QueryID DocID BM25 PageRank Label
0001 0001 1.6 0.9 4
0001 0002 2.7 0.2 2
0001 0003 1.3 0.2 1
0001 0004 1.2 0.7 0
CS@UVa
![Page 79: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/79.jpg)
RankingSVMThorsten Joachims, KDD’02
• Minimizing the number of mis-ordered pairs
CS 6501: Information Retrieval 79
𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗 ≥ 1 − 𝜉𝑖,𝑗,𝑛
CS@UVa
![Page 80: Learning to Ranksifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/docs/PDFs/learning2ran… · LambdaMART: An Overview Christopher J.C. Burges, 2010 •Minimizing mis-ordered pair => maximizing](https://reader035.vdocuments.us/reader035/viewer/2022071007/5fc50dec1ca4e1756528a844/html5/thumbnails/80.jpg)
General Idea of Pairwise Learning to Rank
• For any pair of 𝑦𝑖 > 𝑦𝑗
CS 6501: Information Retrieval 80
exponential loss
hinge loss (RankingSVM)
0/1 loss
square loss (GBRank)
80Guest Lecture for Learing to Rank
max 0,1 − 𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗
max 0,1 − 𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗2
exp −𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗
𝛿 𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗 < 0
𝑤ΔΦ 𝑞𝑛, 𝑑𝑖 , 𝑑𝑗CS@UVa