Stable Coactive Learning via Perturbation
Karthik Raman¹, Thorsten Joachims¹, Pannaga Shivaswamy², Tobias Schnabel³
¹Cornell University {karthik,tj}@cs.cornell.edu
²AT&T Research [email protected]
³Stuttgart University [email protected]
June 19, 2013
Karthik Raman (Cornell) Stable Coactive Learning June 19, 2013 1 / 5
Coactive Learning

Learning model (repeat forever):
- System receives context x_t.
- System makes prediction y_t.
- Regret = Regret + U(x_t, y*_t) − U(x_t, y_t).
- System gets feedback:
  - Full information: U(x_t, y^(1)), U(x_t, y^(2)), ...
  - Bandit: U(x_t, y_t)
  - Coactive: U(x_t, ȳ_t) ≥_α U(x_t, y_t)

Running example: a search engine. The user query is the context, the presented ranking is the prediction, and clicks reflect user utility.

Full-information and bandit feedback are unrealistic for users to provide; coactive feedback can be gathered implicitly (e.g., from clicks). The Preference Perceptron achieves average regret O(1/(α√T)) for linear utility U(x, y) = w*ᵀφ(x, y).
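The coactive loop above can be sketched as a small simulation. Everything here is an illustrative assumption (the toy utility weights `w_star`, the candidate sets, and the feedback oracle that returns some object better than the prediction), not the authors' code; the update is the Preference Perceptron rule from the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([1.0, 0.5, -0.3])  # hidden "true" utility weights (toy assumption)
w = np.zeros(3)                      # learner's weight vector
regret = 0.0
T = 1000

for t in range(T):
    # context x_t: five candidate objects, each represented by its phi(x_t, y)
    candidates = rng.normal(size=(5, 3))
    y_t = candidates[np.argmax(candidates @ w)]          # system prediction
    y_best = candidates[np.argmax(candidates @ w_star)]  # optimal prediction
    regret += w_star @ y_best - w_star @ y_t             # regret accumulation

    # coactive feedback: user returns some object strictly better than y_t
    better = candidates[candidates @ w_star > w_star @ y_t]
    y_bar = better[rng.integers(len(better))] if len(better) else y_t

    # preference-perceptron update: w += phi(x, y_bar) - phi(x, y_t)
    w = w + y_bar - y_t

avg_regret = regret / T  # shrinks as w aligns with w_star
```

Note that the user never reports a utility value: the learner only sees the improved object ȳ_t, which is exactly the coactive setting.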
User Study: Learning Rankings using the Preference Perceptron

Preference Perceptron algorithm:
1. Initialize weight vector w_1 ← 0.
2. Given context x_t, present y_t ← argmax_y w_tᵀ φ(x_t, y).
3. Observe clicks on the presented ranking y_t and construct the feedback ranking ȳ_t.
4. Update w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
5. Repeat from step 2.

Setup: run on a live search engine, with the goal of learning a ranking function from user clicks, evaluated by interleaved comparison against a hand-tuned baseline. A win ratio of 1 means no better than the baseline (higher is better).

[Figure: User Study Results — win ratio vs. number of iterations (0 to 30,000) for the Preference Perceptron.]

The Preference Perceptron performs poorly!
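A minimal sketch of steps 2–4 for rankings. The position-discounted feature map `phi` and the feedback construction (clicked documents moved to the top, relative order otherwise preserved) are common choices stated here as assumptions, not necessarily the exact ones used in the study.

```python
import numpy as np

def phi(docs, ranking):
    """Joint feature map: position-discounted sum of document feature vectors."""
    return sum(docs[d] / np.log2(r + 2) for r, d in enumerate(ranking))

def present(w, docs):
    """Step 2: rank documents by the current score w . phi(doc)."""
    return sorted(docs, key=lambda d: -(w @ docs[d]))

def feedback_ranking(ranking, clicks):
    """Step 3 (one common construction): move clicked documents to the top."""
    clicked = [d for d in ranking if d in clicks]
    rest = [d for d in ranking if d not in clicks]
    return clicked + rest

# one round of the preference-perceptron update (step 4) on toy data
docs = {d: np.random.default_rng(d).normal(size=4) for d in range(6)}
w = np.zeros(4)
y = present(w, docs)
y_bar = feedback_ranking(y, clicks={y[2], y[4]})  # user clicked ranks 3 and 5
w = w + phi(docs, y_bar) - phi(docs, y)
```

Because clicked documents sit higher in ȳ_t than in y_t, the update moves weight toward the features of documents the user preferred.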
Perturbed Preference Perceptron algorithm:
1. Initialize weight vector w_1 ← 0.
2. Given context x_t, compute the predicted ranking ŷ_t ← argmax_y w_tᵀ φ(x_t, y).
3. Present y_t ← Perturb(ŷ_t) (randomly swap adjacent pairs).
4. Observe clicks and construct the feedback ranking ȳ_t.
5. Update w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
6. Repeat from step 2.

[Figure: User Study Results — win ratio vs. number of iterations (0 to 30,000) for the Preference Perceptron and the Perturbed Preference Perceptron.]
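The Perturb step can be sketched as follows. Swapping each disjoint adjacent pair (ranks 1–2, 3–4, ...) independently with probability 0.5 is one natural reading of "randomly swap adjacent pairs", stated here as an assumption.

```python
import random

def perturb(ranking, p=0.5, seed=None):
    """Randomly swap disjoint adjacent pairs (0,1), (2,3), ...
    Each pair is swapped independently with probability p."""
    rng = random.Random(seed)
    out = list(ranking)
    for i in range(0, len(out) - 1, 2):
        if rng.random() < p:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

print(perturb(["a", "b", "c", "d"], p=1.0))  # ['b', 'a', 'd', 'c']
```

The perturbation only reorders the predicted ranking, so every document still gets shown; what changes is that a click on the lower item of a swapped pair now yields an informative pairwise preference even when the user only inspects the top results.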
Please come to our poster

We will tell you:
- Why does the Preference Perceptron perform poorly?
- Why does perturbation fix the problem?
- What are the regret bounds for the algorithm?
- How do we do this more generally for non-ranking problems?