ranking
DESCRIPTION
ranking. David Kauchak CS 451 – Fall 2013. Admin. Assignment 4 Assignment 5. Course feedback. Thanks!. Course feedback. Difficulty. Course feedback. Time/week. Course feedback. Experiments/follow-up questions More interesting data sets More advanced topics I’m hoping to get to: - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/1.jpg)
RANKING
David KauchakCS 451 – Fall 2013
![Page 2: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/2.jpg)
Admin
Assignment 4
Assignment 5
![Page 3: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/3.jpg)
Course feedback
Thanks!
![Page 4: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/4.jpg)
Course feedback
Difficulty
![Page 5: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/5.jpg)
Course feedback
Time/week
![Page 6: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/6.jpg)
Course feedback
Experiments/follow-up questions
More interesting data sets
More advanced topics I’m hoping to get to:- large margin classifiers- probabilistic modeling- Unsupervised learning/clustering- Ensemble learning- Collaborative filtering- MapReduce/Hadoop
![Page 7: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/7.jpg)
Java tip for the day
static
ArrayList<Example> data = dataset.getData();
How can I iterate over it?
![Page 8: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/8.jpg)
Java tip for the day
ArrayList<Example> data = dataset.getData();
for( int i = 0; i < data.size(); i++ ){
Example ex = data.get(i)
}
OR
// can do on anything that implements the Iterable interface
for( Example ex: data ){
}
![Page 9: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/9.jpg)
An aside: text classification
Raw data labels
Chardonnay
Pinot Grigio
Zinfandel
![Page 10: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/10.jpg)
Text: raw data
Raw data Features?labels
Chardonnay
Pinot Grigio
Zinfandel
![Page 11: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/11.jpg)
Feature examples
Raw data Features
(1, 1, 1, 0, 0, 1, 0, 0, …)
clin
ton
said
calif
orni
aac
ross tv
wro
ngca
pita
l
pino
t
Clinton said pinot repeatedly last week on tv, “pinot, pinot, pinot”
Occurrence of words
labels
Chardonnay
Pinot Grigio
Zinfandel
![Page 12: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/12.jpg)
Feature examples
Raw data Features
(4, 1, 1, 0, 0, 1, 0, 0, …)
clin
ton
said
calif
orni
aac
ross tv
wro
ngca
pita
l
pino
t
Clinton said pinot repeatedly last week on tv, “pinot, pinot, pinot”
Frequency of word occurrences
labels
Chardonnay
Pinot Grigio
Zinfandel
This is the representation we’re using for assignment 5
![Page 13: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/13.jpg)
Decision trees for text
Each internal node represents whether or not the text has a particular word
![Page 14: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/14.jpg)
Decision trees for text
wheat is a commodity that can be found in states across the nation
![Page 15: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/15.jpg)
Decision trees for text
The US views technology as a commodity that it can export by the buschl.
![Page 16: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/16.jpg)
Printing out decision trees
(wheat
(buschl
predict: not wheat
(export
predict: not wheat
predict: wheat))
(farm
(commodity
(agriculture
predict: not wheat
predict: wheat)
predict: wheat)
predict: wheat))
![Page 17: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/17.jpg)
Ranking problems
Suggest a simpler word for the word below:
vital
![Page 18: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/18.jpg)
Suggest a simpler word
Suggest a simpler word for the word below:
vitalword frequency
important 13
necessary 12
essential 11
needed 8
critical 3
crucial 2
mandatory 1
required 1
vital 1
![Page 19: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/19.jpg)
Suggest a simpler word
Suggest a simpler word for the word below:
acquired
![Page 20: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/20.jpg)
Suggest a simpler word
Suggest a simpler word for the word below:
acquiredword frequency
gotten 12
received 9
gained 8
obtained 5
got 3
purchased 2
bought 2
got hold of 1
acquired 1
![Page 21: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/21.jpg)
Suggest a simpler word
vitalimportantnecessaryessentialneededcriticalcrucialmandatoryrequiredvital
gottenreceivedgainedobtainedgotpurchasedboughtgot hold ofacquired
acquired
… training data
train
rankerlist of synonyms list ranked by simplicity
![Page 22: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/22.jpg)
Ranking problems in general
ranking1 ranking2 ranking3
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
training data:a set of rankings where each ranking consists of a set of ranked examples
…
train
ranker ranking/ordering or examples
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
![Page 23: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/23.jpg)
Ranking problems in general
ranking1 ranking2 ranking3
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
training data:a set of rankings where each ranking consists of a set of ranked examples
…
Real-world ranking problems?
![Page 24: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/24.jpg)
Netflix My List
![Page 25: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/25.jpg)
Search
![Page 26: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/26.jpg)
Ranking Applications
reranking N-best output lists- machine translation- computational biology- parsing- …
flight search
…
![Page 27: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/27.jpg)
Black box approach to rankingAbstraction: we have a generic binary classifier, how can we use it to solve our new problem
binary classifier
+1
-1
optionally: also output a confidence/score
Can we solve our ranking problem with this?
![Page 28: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/28.jpg)
Predict better vs. worse
ranking1
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
Train a classifier to decide if the first input is better than second:- Consider all possible pairings of the examples in a ranking- Label as positive if the first example is higher ranked, negative otherwise
![Page 29: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/29.jpg)
Predict better vs. worse
ranking1
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
Train a classifier to decide if the first input is better than second:- Consider all possible pairings of the examples in a ranking- Label as positive if the first example is higher ranked, negative otherwise
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
new examples binary label
+1+1-1+1-1-1
![Page 30: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/30.jpg)
Predict better vs. worse
binary classifier
+1
-1
Our binary classifier only takes one example as input
![Page 31: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/31.jpg)
Predict better vs. worse
binary classifier
+1
-1
Our binary classifier only takes one example as input
a1, a2, …, an b1, b2, …, bnf’1, f’2, …, f’n’
How can we do this?We want features that compare the two examples.
![Page 32: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/32.jpg)
Combined feature vector
Many approaches! Will depend on domain and classifier
Two common approaches:1. difference:
2. greater than/less than:
![Page 33: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/33.jpg)
Training
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
new examples label
+1+1-1+1-1-1
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
extr
act
featu
res
train
cla
ssifi
er
binary classifier
![Page 34: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/34.jpg)
Testing
unranked
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
binary classifier
ranking?
![Page 35: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/35.jpg)
Testing
unranked
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fnf1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fnextr
act
featu
res
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
![Page 36: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/36.jpg)
Testing
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fnf1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
extr
act
featu
res
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
f’1, f’2, …, f’n’
binary classifier
-1-1+1+1-1+1
![Page 37: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/37.jpg)
Testing
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fnf1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
-1-1+1+1-1+1
What is the ranking?Algorithm?
![Page 38: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/38.jpg)
Testing
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fnf1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
-1-1+1+1-1+1
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
for each binary example ejk: label[j] += fjk(ejk) label[k] -= fjk(ejk)
rank according to label scores
![Page 39: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/39.jpg)
An improvement?
ranking1
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
new examples binary label
+1+1-1+1-1-1
Are these two examples the same?
![Page 40: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/40.jpg)
Weighted binary classification
ranking1
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
new examples weighted label
+1+2-1+1-2-1
Weight based on distance in ranking
![Page 41: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/41.jpg)
Weighted binary classification
ranking1
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
f1, f2, …, fn
new examples weighted label
+1+2-1+1-2-1
In general can weight with any consistent distance metric
Can we solve this problem?
![Page 42: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/42.jpg)
Testing
If the classifier outputs a confidence, then we’ve learned a distance measure between examples
During testing we want to rank the examples based on the learned distance measure
Ideas?
![Page 43: ranking](https://reader030.vdocuments.us/reader030/viewer/2022032709/568130a0550346895d9697d7/html5/thumbnails/43.jpg)
Testing
If the classifier outputs a confidence, then we’ve learned a distance measure between examples
During testing we want to rank the examples based on the learned distance measure
Sort the examples and use the output of the binary classifier as the similarity between examples!