Pranking with Ranking
Koby Crammer and Yoram Singer
Lecture: Dudu Yanay
The Problem
Input: Each instance is associated with a rank or a rating, i.e. an integer from ‘1’ to ‘K’.
Goal: To find a rank-prediction rule that assigns to each instance a rank that is as close as possible to the instance's true rank.
Similar problems:
◦ Classification.
◦ Regression.
Natural Setting For…
◦ Information retrieval.
◦ Collaborative filtering: predict a user's rating on new items (books, movies, etc.) given the user's past ratings of similar items.
Possible Solutions
◦ Cast a rating problem as a regression problem.
◦ Reduce a total order into a set of preferences over pairs. This is time consuming, since it might require increasing the sample size from $n$ to $O(n^2)$.
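A small sketch of the pairwise reduction, assuming each instance is an `(item, rank)` tuple; the function name `to_pairwise_preferences` and the toy data are hypothetical. Every pair of instances with different ranks becomes one preference "a above b", so $n$ instances can yield up to $n(n-1)/2$ pairs.

```python
from itertools import combinations

def to_pairwise_preferences(samples):
    """Reduce (item, rank) samples to preference pairs (a, b) meaning 'a ranked above b'.

    Pairs with equal ranks carry no preference and are skipped.
    """
    prefs = []
    for (xa, ya), (xb, yb) in combinations(samples, 2):
        if ya > yb:
            prefs.append((xa, xb))
        elif yb > ya:
            prefs.append((xb, xa))
    return prefs

# Hypothetical toy data: n = 6 items with distinct ranks 1..6.
samples = [("item%d" % i, i) for i in range(1, 7)]
pairs = to_pairwise_preferences(samples)
print(len(samples), len(pairs))  # 6 15, i.e. n(n-1)/2 pairs
```

Doubling the number of instances roughly quadruples the number of pairs, which is the cost the slide warns about.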
Let's try another approach…
Online algorithm (Littlestone 1988):
◦ Each hypothesis $h_i$ can be computed in polynomial time.
◦ If the problem is separable, then after a polynomial number of failures (answers of "no") the learner no longer makes mistakes. Meaning: from that point on, $h_i(x) = f(x)$.
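[Animation: a teacher and a learner interact in rounds. In round $i$ the teacher sends $x_i$, the learner predicts $h_i(x_i)$, and the teacher answers yes/no, revealing $y_i = f(x_i)$; this continues until $h_i(x) = f(x)$.]
Animation from Nader Bshouty's course.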
The PERCEPTRON algorithm
[Animation: points in the $(x_1, x_2)$ plane and the weight vector $(w_1, w_2)$ of the separating line.]
Animation from Nader Bshouty's course.
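The PERCEPTRON algorithm
1) $w_0 \leftarrow 0$; $t \leftarrow 0$;
2) Get $(x_i, y_i)$, where $y_i \in \{-1, +1\}$.
3) Predict $z \leftarrow \mathrm{sign}(w_t^T x_i)$.
4) If mistake ($z \neq y_i$):
4.1) $w_{t+1} \leftarrow w_t + y_i x_i$;
4.2) $t \leftarrow t + 1$;
5) Goto 2.
A slide from Nader Bshouty's course.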
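The steps above can be sketched in NumPy as follows. This is an illustrative version, not the course's code: the function name `perceptron`, the `max_epochs` cap, cycling over a fixed sample until a pass is mistake-free, and the convention $\mathrm{sign}(0) = -1$ are all assumptions.

```python
import numpy as np

def perceptron(X, Y, max_epochs=100):
    """Mistake-driven perceptron.

    X: (m, n) array of instances; Y: labels in {-1, +1}.
    Cycles over the sample until a full pass makes no mistake (or max_epochs
    passes). Returns w and t, the number of updates (mistakes).
    """
    w = np.zeros(X.shape[1])              # 1) w_0 <- 0
    t = 0                                 #    t <- 0 counts mistakes
    for _ in range(max_epochs):
        mistake_in_pass = False
        for x, y in zip(X, Y):            # 2) get (x_i, y_i)
            z = 1 if w @ x > 0 else -1    # 3) z <- sign(w_t^T x_i), sign(0) taken as -1
            if z != y:                    # 4) mistake
                w = w + y * x             # 4.1) w_{t+1} <- w_t + y_i x_i
                t += 1                    # 4.2) t <- t + 1
                mistake_in_pass = True
        if not mistake_in_pass:           # stop once a whole pass is mistake-free
            break
    return w, t

X = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 1.0], [-1.0, -2.0]])
Y = np.array([1, -1, 1, -1])
w, t = perceptron(X, Y)
print(w, t)  # w = [1. 1.], t = 1
```

On this toy sample the first instance causes the only update; after it, $w = (1, 1)$ separates all four points.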
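[Figure: points in the $(x_1, x_2)$ plane inside a ball of radius $R$, separated by $w$ with margin $\gamma$.]
Theorem (Perceptron mistake bound): if some $w$ with $\|w\|_2 = 1$ classifies every instance correctly with $|w^T x| \ge \gamma$, and $\|x\|_2 \le R$ for every instance, then
$\#\text{Mistakes} \le \left(\frac{R}{\gamma}\right)^2$.
A slide from Nader Bshouty's course.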
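To see the bound in action, the sketch below builds a hypothetical separable data set in which a random unit vector `w_star` has margin at least 0.1, runs the perceptron update until a pass is mistake-free, and compares the mistake count with $(R/\gamma)^2$. The seed, sample size and margin are illustrative assumptions; a tie ($y\,w^T x = 0$) is counted as a mistake, which the proof still covers.

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = rng.normal(size=2)
w_star /= np.linalg.norm(w_star)               # ||w*||_2 = 1

# Hypothetical data: uniform points in [-1, 1]^2 with margin at least 0.1 w.r.t. w*.
pts = rng.uniform(-1.0, 1.0, size=(2000, 2))
pts = pts[np.abs(pts @ w_star) >= 0.1]
labels = np.sign(pts @ w_star)                 # w* classifies every point correctly

R = np.max(np.linalg.norm(pts, axis=1))        # ||x||_2 <= R
gamma = np.min(np.abs(pts @ w_star))           # |w*^T x| >= gamma

w = np.zeros(2)
mistakes = 0
while True:                                    # cycle until a pass makes no mistake
    clean_pass = True
    for x, y in zip(pts, labels):
        if y * (w @ x) <= 0:                   # mistake (a tie counts as a mistake)
            w += y * x
            mistakes += 1
            clean_pass = False
    if clean_pass:
        break

bound = (R / gamma) ** 2
print(f"mistakes = {mistakes}, bound = {bound:.1f}")
```

The loop is guaranteed to terminate because, by the theorem, the total number of updates over all passes cannot exceed `bound`.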
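PRank algorithm - The model
Input: A sequence $(x^1, y^1), (x^2, y^2), \dots, (x^t, y^t)$, where $x^i \in \mathbb{R}^n$ and $y^i \in \{1, 2, \dots, k\}$, with ">" as the order relation.
Output: A ranking rule $H_{w,b}: \mathbb{R}^n \to \{1, 2, \dots, k\}$, where:
◦ $w \in \mathbb{R}^n$.
◦ $b = (b_1, b_2, \dots, b_{k-1}, b_k)$ with $b_1 \le b_2 \le \dots \le b_{k-1} \le b_k = \infty$.
◦ $H_{w,b}(x) = \min_{r \in \{1, 2, \dots, k\}} \{ r : w \cdot x - b_r < 0 \}$.
Ranking loss after $T$ rounds is $\sum_{t=1}^{T} |\hat{y}^t - y^t|$, where $y^t$ is the TRUE rank of the instance in round ‘t’ and $\hat{y}^t = H_{w^t,b^t}(x^t)$.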
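A direct transcription of $H_{w,b}$ and of the ranking loss, as a sketch: the names `predict_rank` and `ranking_loss` and the example numbers are hypothetical. Only the $k-1$ finite thresholds are stored; $b_k = \infty$ is implicit, so rank $k$ is returned when no finite threshold exceeds $w \cdot x$.

```python
import numpy as np

def predict_rank(w, b, x):
    """H_{w,b}(x) = min{r in 1..k : w.x - b_r < 0}, with b_k = +inf implicit.

    b holds the k-1 finite thresholds b_1 <= ... <= b_{k-1}.
    """
    score = float(np.dot(w, x))
    for r, b_r in enumerate(b, start=1):
        if score - b_r < 0:
            return r
    return len(b) + 1          # only b_k = +inf lies above the score

def ranking_loss(y_pred, y_true):
    """Ranking loss: sum over rounds of |y_hat^t - y^t|."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true))

w, b = np.array([1.0, 0.0]), [-1.0, 0.0, 1.0]    # k = 4 ranks
preds = [predict_rank(w, b, x) for x in ([0.5, 3.0], [-2.0, 0.0], [5.0, 0.0])]
print(preds, ranking_loss(preds, [3, 2, 2]))      # [3, 1, 4] 3
```

Unlike a 0/1 classification error, the loss charges a prediction in proportion to how many ranks it is off by.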
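PRank algorithm - The update rule
Given an input instance-rank pair $(x, y)$, if $H_{w,b}(x) = y$ then:
◦ $\forall r \in \{1, \dots, y-1\}: w \cdot x - b_r > 0$.
◦ $\forall r \in \{y, \dots, k-1\}: w \cdot x - b_r < 0$.
Let's represent the above inequalities by the TRUE rank vector $(y_1, \dots, y_y, \dots, y_{k-1}) = (1, \dots, 1, -1, \dots, -1)$, where $y_r = 1$ if $r < y$ and $y_r = -1$ otherwise. Then
$H_{w,b}(x) = y \iff \forall r: y_r (w \cdot x - b_r) > 0$.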
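The TRUE rank vector and the equivalence can be checked in a few lines. This is a sketch with hypothetical names (`rank_vector`, `is_correct`) and toy numbers; it assumes $w \cdot x$ does not coincide exactly with a threshold, since the strict inequality fails on a tie.

```python
import numpy as np

def rank_vector(y, k):
    """TRUE rank vector (y_1, ..., y_{k-1}): y_r = 1 if r < y, else -1."""
    return np.array([1 if r < y else -1 for r in range(1, k)])

def is_correct(w, b, x, y):
    """Check H_{w,b}(x) = y via: y_r (w.x - b_r) > 0 for all r = 1..k-1."""
    k = len(b) + 1
    margins = rank_vector(y, k) * (np.dot(w, x) - np.asarray(b))
    return bool(np.all(margins > 0))

print(rank_vector(3, 5))                                      # [ 1  1 -1 -1]
print(is_correct([1.0], [-1.0, 0.0, 1.0], [0.5], 3),          # True
      is_correct([1.0], [-1.0, 0.0, 1.0], [0.5], 2))          # False
```

With $w \cdot x = 0.5$ and thresholds $(-1, 0, 1)$ the model predicts rank 3, so only $y = 3$ makes every $y_r (w \cdot x - b_r)$ positive.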
PRank algorithm - The update rule
Given an input instance-rank pair $(x, y)$, if $H_{w,b}(x) \neq y$, then $\exists r: y_r (w \cdot x - b_r) \le 0$.
So, let's "move" the values of $w \cdot x$ and $b_r$ towards each other:
◦ $b_r \leftarrow b_r - y_r$.
◦ $w \leftarrow w + \left(\sum_r y_r\right) x$.
Both updates use only the indices ‘r’ for which there was a prediction error, i.e., $y_r (w \cdot x - b_r) \le 0$.
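A worked instance of one correction, with hypothetical numbers: $k = 4$, $w = (1)$, thresholds $(-1, 0, 1)$ and $x = (0.5)$, so the model predicts rank 3 while the true rank is 1. The helper name `prank_correction` is an assumption for illustration.

```python
import numpy as np

def prank_correction(w, b, x, y):
    """Apply one PRank correction for (x, y) and return the new (w, b).

    b holds the k-1 finite thresholds. Only thresholds with
    y_r (w.x - b_r) <= 0 move, each by one unit toward w.x, and
    w moves by (sum of those y_r) times x.
    """
    w, b, x = (np.asarray(v, dtype=float) for v in (w, b, x))
    r = np.arange(1, len(b) + 1)
    yr = np.where(r < y, 1, -1)                          # TRUE rank vector
    tau = np.where(yr * (w @ x - b) <= 0, yr, 0)         # y_r on violated thresholds, else 0
    return w + tau.sum() * x, b - tau

w_new, b_new = prank_correction([1.0], [-1.0, 0.0, 1.0], [0.5], y=1)
print(w_new, b_new)                                      # [0.] [0. 1. 1.]
```

Thresholds $b_1$ and $b_2$ were violated, so each rises by one while $w \cdot x$ drops from 0.5 to 0. The new prediction is rank 2: closer to the true rank, though not yet equal to it.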
The update rule - Illustration
[Figure: ranks 1 to 5 on a line, marking the predicted rank and the correct interval.]
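The PRank algorithm
1) $w^0 \leftarrow 0$; $b_1^0, \dots, b_{k-1}^0 \leftarrow 0$, $b_k^0 \leftarrow \infty$; $t \leftarrow 0$;
2) Get $(x^t, y^t)$.
3) Predict $z \leftarrow \min_{r \in \{1, 2, \dots, k\}} \{ r : w^t \cdot x^t - b_r^t < 0 \}$.
4) If mistake ($z \neq y^t$):
4.1) Building the TRUE rank vector: for $r = 1, \dots, k-1$: if $y^t \le r$ then $y_r^t \leftarrow -1$, else $y_r^t \leftarrow 1$.
4.2) Checking which threshold prediction is wrong: for $r = 1, \dots, k-1$: if $(w^t \cdot x^t - b_r^t)\, y_r^t \le 0$ then $\tau_r^t \leftarrow y_r^t$, else $\tau_r^t \leftarrow 0$.
4.3) Updating the hypothesis: $w^{t+1} \leftarrow w^t + \left(\sum_r \tau_r^t\right) x^t$;
4.4) $b_r^{t+1} \leftarrow b_r^t - \tau_r^t$ for $r = 1, \dots, k-1$;
Otherwise: $w^{t+1} \leftarrow w^t$, $b^{t+1} \leftarrow b^t$.
5) $t \leftarrow t + 1$; Goto 2.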
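A minimal NumPy sketch of the loop above, assuming instances are NumPy vectors and ranks are integers $1..k$; the class `PRank`, its methods and the helper `run_online` are hypothetical names. The thresholds $b_1..b_{k-1}$ are stored explicitly and $b_k = \infty$ is implicit.

```python
import numpy as np

class PRank:
    """Online PRank: weight vector w and k-1 finite thresholds (b_k = +inf implicit)."""

    def __init__(self, n_features, k):
        self.k = k
        self.w = np.zeros(n_features)          # 1) w^0 <- 0
        self.b = np.zeros(k - 1)               #    b_1^0 = ... = b_{k-1}^0 = 0

    def predict(self, x):
        """3) min{r : w.x - b_r < 0}; rank k if no finite threshold exceeds w.x."""
        below = np.nonzero(self.w @ x - self.b < 0)[0]
        return int(below[0]) + 1 if below.size else self.k

    def update(self, x, y):
        """Run one round on (x, y) and return the predicted rank."""
        y_hat = self.predict(x)
        if y_hat != y:                                              # 4) mistake
            r = np.arange(1, self.k)
            yr = np.where(y <= r, -1, 1)                            # 4.1) TRUE rank vector
            tau = np.where((self.w @ x - self.b) * yr <= 0, yr, 0)  # 4.2) wrong thresholds
            self.w = self.w + tau.sum() * x                         # 4.3) update w
            self.b = self.b - tau                                   # 4.4) update b
        return y_hat

def run_online(model, X, Y):
    """Feed the sequence once and return the ranking loss sum_t |y_hat^t - y^t|."""
    return sum(abs(model.update(x, y) - y) for x, y in zip(X, Y))

# Usage on hypothetical data ranked by a fixed linear rule with thresholds (-1, 0, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
scores = X @ np.array([1.0, -1.0, 0.5])
Y = 1 + np.sum(scores[:, None] > np.array([-1.0, 0.0, 1.0])[None, :], axis=1)
model = PRank(n_features=3, k=4)
loss = run_online(model, X, Y)
print(loss, model.b)
```

Storing only the finite thresholds keeps the update a pair of vector operations; the implicit $b_k = \infty$ is handled by the fallback `self.k` in `predict`.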
PRank Analysis – Consistent Hypothesis
First, we need to show that the output hypothesis of PRank is acceptable. Meaning, if $w^f$ and $b^f$ are the final ranking rule, then $b_1^f \le b_2^f \le \dots \le b_{k-1}^f$.
Proof (by induction): since the thresholds are initialized such that $b_1^0 \le b_2^0 \le \dots \le b_{k-1}^0$, it suffices to show that the claim holds inductively.
Lemma 1 (Order Preservation): Let $w^t$ and $b^t$ be the current ranking rule, where $b_1^t \le b_2^t \le \dots \le b_{k-1}^t$, and let $(x^t, y^t)$ be an instance-rank pair fed to PRank on round ‘t’. Denote by $w^{t+1}$ and $b^{t+1}$ the resulting ranking rule after the update of PRank; then $b_1^{t+1} \le b_2^{t+1} \le \dots \le b_{k-1}^{t+1}$.
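The lemma can be sanity-checked numerically. The sketch below feeds the PRank update random instances with random, non-separable ranks and asserts after every round that $b_1 \le \dots \le b_{k-1}$. The seed, sizes and data are arbitrary assumptions; this is an empirical check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 5
w, b = np.zeros(n), np.zeros(k - 1)          # b^0 is (trivially) ordered
r = np.arange(1, k)
for t in range(5000):
    x = rng.normal(size=n)
    y = int(rng.integers(1, k + 1))          # arbitrary ranks: the lemma needs no separability
    below = np.nonzero(w @ x - b < 0)[0]
    y_hat = below[0] + 1 if below.size else k
    if y_hat != y:                           # PRank updates only on a mistake
        yr = np.where(y <= r, -1, 1)
        tau = np.where((w @ x - b) * yr <= 0, yr, 0)
        w, b = w + tau.sum() * x, b - tau
    assert np.all(np.diff(b) >= 0), f"order broken at round {t}"
print("thresholds stayed ordered:", b)
```

Because the thresholds start at 0 and move by $\pm 1$, they remain integers throughout, which is what lets adjacent thresholds never cross.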
Lemma 1 – Proof
[Figure: two cases, Option 1 and Option 2, each showing the ranks on a line with the predicted rank and the correct interval.]
PRank Analysis – Mistake bound
Theorem 2: Let $(x^1, y^1), \dots, (x^T, y^T)$ be an input sequence for PRank where $x^t \in \mathbb{R}^n$ and $y^t \in \{1, \dots, k\}$. Denote by $R^2 = \max_t \|x^t\|^2$. Assume that there is a ranking rule $v^* = (w^*, b^*)$ with $b_1^* \le \dots \le b_{k-1}^*$, of unit norm, that classifies the entire sequence correctly with margin $\gamma = \min_{r,t} (w^* \cdot x^t - b_r^*)\, y_r^t > 0$. Then the rank loss of the algorithm, $\sum_{t=1}^{T} |\hat{y}^t - y^t|$, is at most $\frac{(k-1)(R^2 + 1)}{\gamma^2}$.
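A numerical illustration of Theorem 2, under stated assumptions: a hypothetical target rule $(w^*, b^*)$ is rescaled to unit norm, points too close to any threshold are dropped so the margin is positive, and one online pass of PRank is compared with $(k-1)(R^2+1)/\gamma^2$. All sizes, the seed and the 0.05 cut-off are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, T = 2, 4, 3000

# Hypothetical target rule v* = (w*, b*), rescaled so that ||(w*, b*)|| = 1.
w_star = np.array([1.0, 2.0])
b_star = np.array([-1.0, 0.0, 1.0])
scale = np.linalg.norm(np.concatenate([w_star, b_star]))
w_star, b_star = w_star / scale, b_star / scale

# Keep only points at distance >= 0.05 from every threshold, so the margin is positive.
X = rng.uniform(-1.0, 1.0, size=(T, n))
scores = X @ w_star
keep = np.min(np.abs(scores[:, None] - b_star[None, :]), axis=1) >= 0.05
X, scores = X[keep], scores[keep]
Y = 1 + np.sum(scores[:, None] > b_star[None, :], axis=1)   # ranks assigned by (w*, b*)

# One online pass of PRank, accumulating the rank loss.
w, b = np.zeros(n), np.zeros(k - 1)
r = np.arange(1, k)
loss = 0
for x, y in zip(X, Y):
    below = np.nonzero(w @ x - b < 0)[0]
    y_hat = below[0] + 1 if below.size else k
    loss += abs(y_hat - y)
    if y_hat != y:
        yr = np.where(y <= r, -1, 1)
        tau = np.where((w @ x - b) * yr <= 0, yr, 0)
        w, b = w + tau.sum() * x, b - tau

# Quantities from Theorem 2.
R2 = np.max(np.sum(X ** 2, axis=1))
yr_all = np.where(Y[:, None] <= r[None, :], -1, 1)
gamma = np.min((scores[:, None] - b_star[None, :]) * yr_all)
bound = (k - 1) * (R2 + 1) / gamma ** 2
print(f"rank loss = {loss}, bound = {bound:.1f}")
```

The norm constraint is on the concatenated vector $(w^*, b^*)$, which is why the thresholds are rescaled together with $w^*$.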
Experiments
Comparison between:
◦ PRank.
◦ MultiClass Perceptron (MCP).
◦ Widrow-Hoff, i.e. online regression (WH).
Datasets:
◦ Synthetic.
◦ EachMovie.
Synthetic Dataset
Randomly generated points $x = (x_1, x_2) \in [0, 1]^2$, drawn uniformly at random.
Each point was assigned a rank $y \in \{1, \dots, 5\}$ according to:
$y = \max_r \{ r : 10\,((x_1 - 0.5)(x_2 - 0.5)) + \xi > b_r \}$, where $b = (-\infty, -1, -0.1, 0.25, 1)$,
◦ $\xi \sim N(0, 0.125)$ is noise.
Generated 100 sequences of instance-rank pairs, each of length 7000.
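A sketch of the synthetic generator described above. The function name `synthetic_sequence` is hypothetical, and reading $N(0, 0.125)$ as a standard deviation of 0.125 (rather than a variance) is an assumption.

```python
import numpy as np

def synthetic_sequence(T=7000, seed=None):
    """Draw T instance-rank pairs: x ~ U([0,1]^2), rank y in 1..5.

    Assumption: N(0, 0.125) is read as standard deviation 0.125.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(T, 2))               # x ~ U([0,1]^2)
    xi = rng.normal(0.0, 0.125, size=T)                  # noise term
    s = 10 * (X[:, 0] - 0.5) * (X[:, 1] - 0.5) + xi
    b = np.array([-np.inf, -1.0, -0.1, 0.25, 1.0])       # b_1, ..., b_5
    # With sorted b, max{r : s > b_r} equals the number of thresholds below s;
    # b_1 = -inf guarantees every point gets at least rank 1.
    Y = np.sum(s[:, None] > b[None, :], axis=1)
    return X, Y

X, Y = synthetic_sequence(seed=0)
print(X.shape, np.bincount(Y, minlength=6)[1:])          # instances per rank 1..5
```

The 100 sequences of the experiment would correspond to calling `synthetic_sequence` with 100 different seeds.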
EachMovie Dataset
Collaborative filtering dataset. Contains ratings of movies provided by 61,265 people.
6 possible ratings: 0, 0.2, 0.4, 0.6, 0.8, 1. Only people with at least 100 ratings were considered.
One person was chosen at random to provide the TRUE rank, and the other people's ratings were used as features (-0.5, -0.3, -0.1, 0.1, 0.3, 0.5).
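One way the instances could be assembled, as a sketch under loudly stated assumptions: the ratings live in a hypothetical movies × people array with NaN for "not rated"; each movie rated by the chosen person becomes one instance; ratings are shifted by $-0.5$ to give the listed feature values; and a missing rating becomes feature 0. The slides do not say how missing ratings were handled, so that last choice is purely an assumption. The function name `build_instances` is hypothetical.

```python
import numpy as np

def build_instances(ratings, target):
    """ratings: (movies, people) array in {0, 0.2, ..., 1}, NaN = not rated.

    Each movie rated by `target` is one instance. Features are the other
    people's ratings minus 0.5 (unrated -> 0, an assumption); the label is
    the target's rating mapped to a rank in 1..6.
    """
    rated = ~np.isnan(ratings[:, target])
    others = np.delete(ratings[rated], target, axis=1)
    X = np.nan_to_num(others - 0.5, nan=0.0)
    Y = np.rint(ratings[rated, target] * 5).astype(int) + 1
    return X, Y

# Hypothetical toy matrix: 3 movies x 3 people.
ratings = np.array([[0.0, 1.0, np.nan],
                    [0.6, np.nan, 0.2],
                    [np.nan, 0.4, 0.8]])
X, Y = build_instances(ratings, target=0)
print(X, Y)
```

Rounding with `np.rint` guards against floating-point products such as `0.6 * 5` landing just off an integer.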
EachMovie Dataset – cont’
Batch setting: ran PRank over the training data as an online algorithm and used its last hypothesis to rank the unseen data.
Thank You
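PERCEPTRON Theorem: if $\|w\|_2 = 1$, $|w^T x| \ge \gamma$ and $\|x\|_2 \le R$, then $\#\text{Mistakes} \le \left(\frac{R}{\gamma}\right)^2$.
Proof: let $a_i$ be the instance on which the $i$-th mistake occurs, and let $\theta_t$ be the angle between $w$ and $w_t$, so that $\cos(\theta_t) = \frac{w^T w_t}{\|w_t\|_2}$ (recall $\|w\|_2 = 1$).
◦ $w^T w_{i+1} = w^T (w_i + y_i a_i) = w^T w_i + y_i (w^T a_i) = w^T w_i + |w^T a_i| \ge w^T w_i + \gamma$, where $y_i (w^T a_i) = |w^T a_i|$ because $w$ classifies $a_i$ correctly. Hence $w^T w_t \ge t\gamma$.
◦ $\|w_{i+1}\|_2^2 = \|w_i + y_i a_i\|_2^2 = \|w_i\|_2^2 + \|a_i\|_2^2 + 2 y_i (w_i^T a_i) \le \|w_i\|_2^2 + \|a_i\|_2^2 \le \|w_i\|_2^2 + R^2$, where $y_i (w_i^T a_i) \le 0$ because $a_i$ caused a mistake. Hence $\|w_t\|_2^2 \le t R^2$.
◦ Therefore $1 \ge \cos(\theta_t) = \frac{w^T w_t}{\|w_t\|_2} \ge \frac{t\gamma}{\sqrt{t}\,R} = \frac{\sqrt{t}\,\gamma}{R}$, so $\#\text{Mistakes} = t \le \left(\frac{R}{\gamma}\right)^2$.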