Stable Coactive Learning via Perturbation
Karthik Raman¹, Thorsten Joachims¹, Pannaga Shivaswamy², Tobias Schnabel³
¹Cornell University {karthik,tj}@cs.cornell.edu
²AT&T Research [email protected]
³Stuttgart University [email protected]
June 19, 2013
Karthik Raman (Cornell) Stable Coactive Learning June 19, 2013 1 / 5
Coactive Learning

Learning model (repeat forever):
- System receives context x_t.
- System makes prediction y_t.
- Regret = Regret + U(x_t, y*_t) − U(x_t, y_t).
- System gets feedback:
  - Full information: U(x_t, y^(1)), U(x_t, y^(2)), ...
  - Bandit: U(x_t, y_t)
  - Coactive: U(x_t, ȳ_t) ≥_α U(x_t, y_t)

Running example: a search engine. The user query is the context, the presented ranking is the prediction, and clicks reflect user utility.

Full-information and bandit feedback are unrealistic for users to provide; coactive feedback can be gathered implicitly (e.g., from clicks). The Preference Perceptron achieves average regret O(1/(α√T)) for linear utility U(x, y) = w*ᵀφ(x, y).
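The coactive loop above can be sketched as a small simulation. Everything here is an illustrative assumption (the toy utility weights `w_star`, the candidate sets, and the feedback oracle that returns some object better than the prediction), not the authors' code; the update is the Preference Perceptron rule from the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([1.0, 0.5, -0.3])  # hidden "true" utility weights (toy assumption)
w = np.zeros(3)                      # learner's weight vector
regret = 0.0
T = 1000

for t in range(T):
    # context x_t: five candidate objects, each represented by its phi(x_t, y)
    candidates = rng.normal(size=(5, 3))
    y_t = candidates[np.argmax(candidates @ w)]          # system prediction
    y_best = candidates[np.argmax(candidates @ w_star)]  # optimal prediction
    regret += w_star @ y_best - w_star @ y_t             # regret accumulation

    # coactive feedback: user returns some object strictly better than y_t
    better = candidates[candidates @ w_star > w_star @ y_t]
    y_bar = better[rng.integers(len(better))] if len(better) else y_t

    # preference-perceptron update: w += phi(x, y_bar) - phi(x, y_t)
    w = w + y_bar - y_t

avg_regret = regret / T  # shrinks as w aligns with w_star
```

Note that the user never reports a utility value: the learner only sees the improved object ȳ_t, which is exactly the coactive setting.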
User Study: Learning Rankings using the Preference Perceptron

Preference Perceptron algorithm:
1. Initialize weight vector w_1 ← 0.
2. Given context x_t, present y_t ← argmax_y w_tᵀ φ(x_t, y).
3. Observe clicks on the presented ranking y_t and construct the feedback ranking ȳ_t.
4. Update w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
5. Repeat from step 2.

Setup: run on a live search engine, with the goal of learning a ranking function from user clicks, evaluated by interleaved comparison against a hand-tuned baseline. A win ratio of 1 means no better than the baseline (higher is better).

[Figure: User Study Results — win ratio vs. number of iterations (0 to 30,000) for the Preference Perceptron.]

The Preference Perceptron performs poorly!
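A minimal sketch of steps 2–4 for rankings. The position-discounted feature map `phi` and the feedback construction (clicked documents moved to the top, relative order otherwise preserved) are common choices stated here as assumptions, not necessarily the exact ones used in the study.

```python
import numpy as np

def phi(docs, ranking):
    """Joint feature map: position-discounted sum of document feature vectors."""
    return sum(docs[d] / np.log2(r + 2) for r, d in enumerate(ranking))

def present(w, docs):
    """Step 2: rank documents by the current score w . phi(doc)."""
    return sorted(docs, key=lambda d: -(w @ docs[d]))

def feedback_ranking(ranking, clicks):
    """Step 3 (one common construction): move clicked documents to the top."""
    clicked = [d for d in ranking if d in clicks]
    rest = [d for d in ranking if d not in clicks]
    return clicked + rest

# one round of the preference-perceptron update (step 4) on toy data
docs = {d: np.random.default_rng(d).normal(size=4) for d in range(6)}
w = np.zeros(4)
y = present(w, docs)
y_bar = feedback_ranking(y, clicks={y[2], y[4]})  # user clicked ranks 3 and 5
w = w + phi(docs, y_bar) - phi(docs, y)
```

Because clicked documents sit higher in ȳ_t than in y_t, the update moves weight toward the features of documents the user preferred.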
Perturbed Preference Perceptron algorithm:
1. Initialize weight vector w_1 ← 0.
2. Given context x_t, compute the predicted ranking ŷ_t ← argmax_y w_tᵀ φ(x_t, y).
3. Present y_t ← Perturb(ŷ_t) (randomly swap adjacent pairs).
4. Observe clicks and construct the feedback ranking ȳ_t.
5. Update w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
6. Repeat from step 2.

[Figure: User Study Results — win ratio vs. number of iterations (0 to 30,000) for the Preference Perceptron and the Perturbed Preference Perceptron.]
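The Perturb step can be sketched as follows. Swapping each disjoint adjacent pair (ranks 1–2, 3–4, ...) independently with probability 0.5 is one natural reading of "randomly swap adjacent pairs", stated here as an assumption.

```python
import random

def perturb(ranking, p=0.5, seed=None):
    """Randomly swap disjoint adjacent pairs (0,1), (2,3), ...
    Each pair is swapped independently with probability p."""
    rng = random.Random(seed)
    out = list(ranking)
    for i in range(0, len(out) - 1, 2):
        if rng.random() < p:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

print(perturb(["a", "b", "c", "d"], p=1.0))  # ['b', 'a', 'd', 'c']
```

The perturbation only reorders the predicted ranking, so every document still gets shown; what changes is that a click on the lower item of a swapped pair now yields an informative pairwise preference even when the user only inspects the top results.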
Please come to our poster

We will tell you:
- Why does the Preference Perceptron perform poorly?
- Why does perturbation fix the problem?
- What are the regret bounds for the algorithm?
- How do we do this more generally for non-ranking problems?