CS 572: Information Retrieval - Emory University (lecture15-ltr2.pdf)
CS 572: Information Retrieval
Learning to Rank (wrap-up)
Acknowledgements
Some slides in this lecture are adapted from Chris Manning (Stanford), Jan
Pedersen (Yahoo/MSFT), Yisong Yue (Cornell) and Filip Radlinski (MSR)
“Real World” Ranking
• Many different possible sources of evidence:
– Relevance: Is the page relevant to the query?
– Page quality: Is this a reliable source/site?
– Freshness: How old is the index record for the result?
– Spam: Is this page likely to be optimized or spammed?
– Clickthrough: How often do people click on this result? Why?
– Context: Is this a reformulation of a previous query?
3/2/2016 CS 572: Information Retrieval. Spring 2016 2
Today's Plan
• Learning to Rank (LTR)
– RankNet
– Gradient Boosted Decision Trees
– Resources
• Talk: NLP for IR: Yuval Pinter (Yahoo Research)
• Midterm exam to be handed out, answers due back by Thursday (tomorrow) 10pm EST.
Neural Nets
• RankNet: Burges et al., [ICML 2005]
– Scalable Neural Net implementation
– Input: feature vectors and relevance labels
• LambdaRank: extension to RankNet that directly optimizes IR measures (MRR, MAP, nDCG).
• LambdaMART: adds boosting with weighted regression trees (inspired by the success of GBDT).
Training RankNet
• For results 1 and 2 of a query, present the pair of feature vectors and labels, ordered so that label(1) > label(2)
RankNet [Burges et al. 2005]
[Figure: Feature Vector 1, Label 1 → NN output 1]
RankNet [Burges et al. 2005]
[Figure: Feature Vector 2, Label 2 → NN output 2, next to NN output 1]
RankNet [Burges et al. 2005]
[Figure: NN output 1 and NN output 2]
• Error is a function of both outputs (desired: output 1 > output 2)
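The slides only state that the error depends on both network outputs. As a minimal sketch, assuming the pairwise cross-entropy cost from the RankNet paper (the probability that result 1 outranks result 2 is modeled as a sigmoid of the score difference), the per-pair error and its gradient could look like this. The function names are illustrative, not from the lecture.

```python
import math


def pairwise_loss(s1: float, s2: float) -> float:
    """Cost for a pair in which result 1 should be ranked above result 2.

    P(1 above 2) = sigmoid(s1 - s2); cost = -log P = log(1 + exp(-(s1 - s2))).
    Computed in a numerically stable form.
    """
    diff = s1 - s2
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def pairwise_loss_grad(s1: float, s2: float) -> float:
    """Derivative of the cost with respect to s1 (the derivative for s2 is its negative)."""
    diff = s1 - s2
    if diff >= 0:
        e = math.exp(-diff)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(diff))


# The cost is small when the pair is ordered correctly and large when it is inverted.
correct_order = pairwise_loss(2.0, 0.0)
wrong_order = pairwise_loss(0.0, 2.0)
```

Because the cost depends only on the difference of the two scores, the same network can be applied to each result separately and trained on pairs.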
Predicting with RankNet
[Figure: Feature Vector 1 → NN output]
• Present each feature vector individually and get a score
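At prediction time no pairs are needed: each result is scored on its own and the results are sorted by score. A minimal sketch, assuming a hypothetical linear `toy_score` function as a stand-in for the trained network (the weights, document ids, and feature values are invented for illustration):

```python
from typing import Callable, Dict, List, Sequence


def rank_results(features: Dict[str, Sequence[float]],
                 score: Callable[[Sequence[float]], float]) -> List[str]:
    """Score every result independently, then sort by descending score."""
    scored = {doc_id: score(x) for doc_id, x in features.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical stand-in for the trained scoring network.
WEIGHTS = [0.7, 0.2, 0.1]


def toy_score(x: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(WEIGHTS, x))


candidates = {
    "docA": [0.1, 0.9, 0.3],
    "docB": [0.8, 0.2, 0.5],
    "docC": [0.4, 0.4, 0.4],
}
ranking = rank_results(candidates, toy_score)
```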
Regression Trees
Regression Tree Ensemble
Tree Ensemble Methods
Boosted Decision Trees
Boosting (Training)
Overview of GBDT Algorithm
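As a rough illustration of the boosting idea named in these slides, here is a minimal sketch of gradient boosting with regression trees. It assumes squared-error loss (so each new tree is fit to the current residuals) and depth-1 trees (stumps); these choices and all names are assumptions for illustration, not the algorithm as presented in the lecture.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Fit a depth-1 regression tree that minimizes squared error on targets r."""
    mean = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: constant prediction
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if x[j] <= t else rv


def fit_gbdt(X, y, n_trees: int = 50, learning_rate: float = 0.1):
    """Start from the mean label, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, x) for pi, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbdt(model, x: Sequence[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy data: one feature, graded relevance label as the regression target.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.0, 1.0, 1.0]
model = fit_gbdt(X_train, y_train)
```

The learning rate shrinks each tree's contribution, so the ensemble approaches the training labels gradually over many small trees.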
Resources
• Many Learning to Rank tutorials: http://research.microsoft.com/en-us/um/beijing/projects/letor/tutorial.aspx
• LETOR benchmark datasets (Microsoft)
– Website with data, links to papers, benchmarks, etc.
– http://research.microsoft.com/users/LETOR/
– Everything you need to start research in this area!
• RankLib: https://sourceforge.net/p/lemur/wiki/RankLib/
• Yahoo Learning to Rank challenge:
– http://webscope.sandbox.yahoo.com/catalog.php?datatype=c
ADDITIONAL MATERIAL
Multi-class classification
• Given: some data items that belong to one of M possible classes
• Task: Train the classifier and predict the class for a new data item
• Geometrically: a harder problem than the two-class case; the geometry is no longer simple
Multi-class classification
Multi-class classification: Examples
• Author identification
• Language identification
• Text categorization (topics)
More Than Two Classes
• Any-of or multivalue classification
– Classes are independent of each other.
– A document can belong to 0, 1, or >1 classes.
– Decompose into n binary problems
– Quite common for documents
• One-of or multinomial or polytomous classification
– Classes are mutually exclusive.
– Each document belongs to exactly one class
– E.g., digit recognition is polytomous classification
  • Digits are mutually exclusive
Sec.14.5
Set of Binary Classifiers: Any of
• Build a separator between each class and its complementary set (docs from all other classes).
• Given test doc, evaluate it for membership in each class.
• Apply decision criterion of classifiers independently
• Done
– Though maybe you could do better by considering dependencies between categories
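The any-of procedure above can be sketched as follows: one binary classifier per class, each applied independently, so a document may receive zero, one, or several labels. The keyword-based classifiers, class names, and threshold here are hypothetical placeholders for real trained separators.

```python
from typing import Callable, Dict, Set

# A binary classifier maps a document to a membership score for one class.
BinaryClassifier = Callable[[str], float]


def classify_any_of(doc: str,
                    classifiers: Dict[str, BinaryClassifier],
                    threshold: float = 0.5) -> Set[str]:
    """Apply each class-vs-rest decision independently and keep every class that accepts."""
    return {label for label, clf in classifiers.items() if clf(doc) >= threshold}


def keyword_classifier(keyword: str) -> BinaryClassifier:
    """Toy stand-in for a trained separator: fires when the keyword occurs in the document."""
    return lambda doc: 1.0 if keyword in doc.lower().split() else 0.0


classifiers = {
    "sports": keyword_classifier("match"),
    "finance": keyword_classifier("stock"),
    "politics": keyword_classifier("election"),
}

labels = classify_any_of("Stock prices fell after the election", classifiers)
no_labels = classify_any_of("Nothing relevant here", classifiers)
```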
Sec.14.5
Set of Binary Classifiers: One of
• Build a separator between each class and its complementary set (docs from all other classes).
• Given test doc, evaluate it for membership in each class.
• Assign document to class with:
– maximum score
– maximum confidence
– maximum probability
• Why is this different from multiclass/any-of classification?
[Figure: regions marked "?"]
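For contrast with the any-of case, a minimal sketch of the one-of rule: the same kind of per-class scorers are evaluated, but the document is assigned to the single class with the maximum score, even when several scorers (or none) would accept it on their own. The linear weights and class names are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical linear class-vs-rest scorers: (weight vector, bias) per class.
SCORERS: Dict[str, Tuple[List[float], float]] = {
    "A": ([1.0, 0.0], 0.0),
    "B": ([0.0, 1.0], 0.0),
    "C": ([-1.0, -1.0], 0.5),
}


def class_score(label: str, x: Sequence[float]) -> float:
    weights, bias = SCORERS[label]
    return sum(w * v for w, v in zip(weights, x)) + bias


def classify_one_of(x: Sequence[float]) -> str:
    """Assign x to exactly one class: the one whose scorer gives the maximum score."""
    return max(SCORERS, key=lambda label: class_score(label, x))


# Both "A" and "B" score above zero for this point, yet only one class is returned.
point = [2.0, 1.0]
positive_classes = {label for label in SCORERS if class_score(label, point) > 0}
assigned = classify_one_of(point)
```

Under an any-of rule the point above would receive both labels; the argmax resolves the overlap by picking the stronger one.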
Sec.14.5
(Some) Algorithms for Multi-class classification
• Linear
– Parallel (axis-aligned) class separators: Decision Trees
– Non-parallel class separators: Naïve Bayes
• Nonlinear
– K-nearest neighbors
– Decision Trees
– Neural Networks
Linear, parallel class separators (ex: Decision Trees)
Linear, NON parallel class separators (ex: Naïve Bayes)