![Page 1: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/1.jpg)
Beat the Mean Bandit
ICML 2011
Yisong Yue Carnegie Mellon University
Joint work with Thorsten Joachims (Cornell University)
![Page 2: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/2.jpg)
Optimizing Information Retrieval Systems
• Increasingly reliant on user feedback
  – E.g., clicks on search results
• Online learning is a popular modeling tool
  – Especially partial-information (bandit) settings
• Our focus: learning from relative preferences
  – Motivated by recent work on interleaved retrieval evaluation (example following)
![Page 3: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/3.jpg)
Team Draft Interleaving (Comparison Oracle for Search)

Ranking A
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Valley Wineries - Plan your wine... (www.napavalley.com/wineries)
3. Napa Valley College (www.napavalley.edu/homex.asp)
4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681)
5. Napa Valley Wineries and Wine (www.napavintners.com)
6. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)

Ranking B
1. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
2. Napa Valley – The authority for lodging... (www.napavalley.com)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
5. NapaValley.org (www.napavalley.org)
6. The Napa Valley Marathon (www.napavalleymarathon.org)

Presented Ranking (team that contributed each result in brackets)
1. Napa Valley – The authority for lodging... (www.napavalley.com) [A]
2. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley) [B]
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...) [B]
4. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries) [A]
5. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com) [B]
6. Napa Valley College (www.napavalley.edu/homex.asp) [A]
7. NapaValley.org (www.napavalley.org) [B]

[Radlinski et al. 2008]
![Page 4: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/4.jpg)
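Team Draft Interleaving (Comparison Oracle for Search)
The user clicks two results in the presented ranking, and each click is credited to the ranking that contributed the clicked result. Both clicks are credited to B, so B wins!
[Radlinski et al. 2008]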
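The drafting procedure can be sketched in Python. This is a minimal sketch, not Radlinski et al.'s reference implementation. The function names `team_draft_interleave` and `credit_clicks`, the `ScriptedCoin` helper, and the simplification of stopping once a team runs out of unseen results are all assumptions made for illustration. The coin flips let A draft first in round one and B draft first afterwards; with them, the sketch reproduces the presented ranking above. The clicks are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Merge two rankings by Team Draft Interleaving (sketch).

    Returns the presented list and a dict mapping each shown result
    to the team ("A" or "B") that contributed it.
    """
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    presented, team_of = [], {}
    while len(presented) < length:
        # The team with fewer picks drafts next; ties are broken by a coin flip.
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            team = "A"
        else:
            team = "B"
        # Draft that team's highest-ranked result that is not already shown.
        pick = next((doc for doc in rankings[team] if doc not in team_of), None)
        if pick is None:
            break  # simplification: stop once a team runs out of unseen results
        presented.append(pick)
        team_of[pick] = team
        picks[team] += 1
    return presented, team_of


def credit_clicks(team_of, clicked):
    """Credit each click to the contributing team; more credited clicks wins."""
    a = sum(team_of.get(doc) == "A" for doc in clicked)
    b = sum(team_of.get(doc) == "B" for doc in clicked)
    return "A" if a > b else "B" if b > a else "tie"


class ScriptedCoin:
    """Hypothetical stand-in for random.Random that replays fixed coin flips."""

    def __init__(self, flips):
        self._flips = iter(flips)

    def random(self):
        return next(self._flips)


ranking_a = ["www.napavalley.com", "www.napavalley.com/wineries",
             "www.napavalley.edu/homex.asp", "www.ivebeenthere.co.uk/tips/16681",
             "www.napavintners.com", "en.wikipedia.org/wiki/Napa_Valley"]
ranking_b = ["en.wikipedia.org/wiki/Napa_Valley", "www.napavalley.com",
             "books.google.co.uk/books?isbn=...", "www.napalinks.com",
             "www.napavalley.org", "www.napavalleymarathon.org"]

# Flips chosen so that A drafts first in round one and B first in later rounds.
presented, team_of = team_draft_interleave(ranking_a, ranking_b, 7,
                                           ScriptedCoin([0.0, 0.9, 0.9, 0.9]))
# Hypothetical clicks on the 2nd and 3rd presented results.
winner = credit_clicks(team_of, presented[1:3])

# With a real random source, the merged list differs from run to run.
presented_random, _ = team_draft_interleave(ranking_a, ranking_b, 7, random.Random(42))
```

In live use, the `rng` argument would normally be a `random.Random` instance. The scripted coin only exists to replay the slide's example deterministically.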
![Page 5: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/5.jpg)
Interleave A vs B (A wins)

| | Wins vs A | Wins vs B | Wins vs C | Total wins | Total losses |
|---|---|---|---|---|---|
| A | 0 | 1 | 0 | 1 | 0 |
| B | 0 | 0 | 0 | 0 | 1 |
| C | 0 | 0 | 0 | 0 | 0 |
![Page 6: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/6.jpg)
Interleave A vs C (C wins)

| | Wins vs A | Wins vs B | Wins vs C | Total wins | Total losses |
|---|---|---|---|---|---|
| A | 0 | 1 | 0 | 1 | 1 |
| B | 0 | 0 | 0 | 0 | 1 |
| C | 1 | 0 | 0 | 1 | 0 |
![Page 7: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/7.jpg)
Interleave B vs C (B wins)

| | Wins vs A | Wins vs B | Wins vs C | Total wins | Total losses |
|---|---|---|---|---|---|
| A | 0 | 1 | 0 | 1 | 1 |
| B | 0 | 0 | 1 | 1 | 1 |
| C | 1 | 0 | 0 | 1 | 1 |
![Page 8: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/8.jpg)
Interleave A vs B (B wins)

| | Wins vs A | Wins vs B | Wins vs C | Total wins | Total losses |
|---|---|---|---|---|---|
| A | 0 | 1 | 0 | 1 | 2 |
| B | 1 | 0 | 1 | 2 | 1 |
| C | 1 | 0 | 0 | 1 | 1 |
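The bookkeeping in these tables amounts to counting (winner, loser) outcomes. Below is a minimal sketch, assuming each duel is logged as such a pair; `tally` is a hypothetical name. It replays the four duels above: A beats B, C beats A, B beats C, then B beats A. The result should match the final totals: A with 1 win and 2 losses, B with 2 wins and 1 loss, C with 1 win and 1 loss.

```python
def tally(duels, bandits):
    """Return wins[winner][loser] plus total wins and losses per bandit."""
    wins = {b: {o: 0 for o in bandits} for b in bandits}
    for winner, loser in duels:
        wins[winner][loser] += 1
    total_wins = {b: sum(wins[b].values()) for b in bandits}
    total_losses = {b: sum(wins[o][b] for o in bandits) for b in bandits}
    return wins, total_wins, total_losses


# The four interleaving duels from the walkthrough, as (winner, loser) pairs.
duels = [("A", "B"), ("C", "A"), ("B", "C"), ("B", "A")]
wins, total_wins, total_losses = tally(duels, ["A", "B", "C"])
for b in ["A", "B", "C"]:
    print(b, [wins[b][o] for o in ["A", "B", "C"]], total_wins[b], total_losses[b])
```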
![Page 9: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/9.jpg)
Outline
• Learning Formulation
  – Dueling Bandits Problem [Yue et al. 2009]
• Modeling transitivity violation
  – E.g., does (A >> B) AND (B >> C) imply (A >> C)?
  – Not done in previous work
![Page 10: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/10.jpg)
• Algorithm: Beat-the-Mean
• Empirical Validation
![Page 11: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/11.jpg)
Dueling Bandits Problem
• Given K bandits b1, …, bK
• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions
[Yue et al. 2009]
![Page 12: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/12.jpg)
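• Cost function (regret):
$$R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$$
  – (b_t, b_t') are the two bandits chosen at iteration t
  – b* is the overall best bandit
  – Each term measures the fraction of users who prefer the best bandit over the chosen ones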
[Yue et al. 2009]
![Page 13: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/13.jpg)
Example Pairwise Preferences

| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| A | 0 | 0.05 | 0.05 | 0.04 | 0.11 | 0.11 |
| B | -0.05 | 0 | 0.05 | 0.06 | 0.08 | 0.10 |
| C | -0.05 | -0.05 | 0 | 0.04 | 0.01 | 0.06 |
| D | -0.04 | -0.06 | -0.04 | 0 | 0.04 | 0.00 |
| E | -0.11 | -0.08 | -0.01 | -0.04 | 0 | 0.01 |
| F | -0.11 | -0.10 | -0.06 | -0.00 | -0.01 | 0 |

• Values are Pr(row > col) - 0.5
• Derived from interleaving experiments on http://arXiv.org
![Page 14: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/14.jpg)
Compare E & F:
• P(A > E) = 0.61
• P(A > F) = 0.61
• Incurred regret = 0.61 + 0.61 - 1 = 0.22
![Page 15: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/15.jpg)
Compare B & C:
• P(A > B) = 0.55
• P(A > C) = 0.55
• Incurred regret = 0.55 + 0.55 - 1 = 0.10
![Page 16: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/16.jpg)
Compare A & A:
• P(A > A) = 0.50 for each of the two slots
• Incurred regret = 0.50 + 0.50 - 1 = 0.00
Interleaving A with itself simply shows the ranking produced by A.
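The incurred-regret arithmetic can be checked with a short sketch that encodes the example table, taking A as b*. The lower triangle is filled in by assuming skew-symmetry, P(j > i) = 1 - P(i > j). `pref` and `incurred_regret` are hypothetical helper names.

```python
# Upper triangle of the example table: EPS[(i, j)] = P(i > j) - 0.5.
EPS = {
    ("A", "B"): 0.05, ("A", "C"): 0.05, ("A", "D"): 0.04, ("A", "E"): 0.11, ("A", "F"): 0.11,
    ("B", "C"): 0.05, ("B", "D"): 0.06, ("B", "E"): 0.08, ("B", "F"): 0.10,
    ("C", "D"): 0.04, ("C", "E"): 0.01, ("C", "F"): 0.06,
    ("D", "E"): 0.04, ("D", "F"): 0.00,
    ("E", "F"): 0.01,
}


def pref(i, j):
    """P(i > j), using P(j > i) = 1 - P(i > j) and P(i > i) = 0.5."""
    if i == j:
        return 0.5
    if (i, j) in EPS:
        return 0.5 + EPS[(i, j)]
    return 0.5 - EPS[(j, i)]


def incurred_regret(best, b, b_prime):
    """One term of R_T: P(best > b) + P(best > b') - 1."""
    return pref(best, b) + pref(best, b_prime) - 1.0


for pair in [("E", "F"), ("B", "C"), ("A", "A")]:
    print(pair, round(incurred_regret("A", *pair), 2))
```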
![Page 17: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/17.jpg)
Violation of internal consistency! Under strong stochastic transitivity, since A > B > D, the entry for A > D should be at least max(0.05, 0.06) = 0.06, but it is only 0.04.
![Page 18: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/18.jpg)
Violation of internal consistency! Under strong stochastic transitivity, since C > D > E, the entry for C > E should be at least max(0.04, 0.04) = 0.04, but it is only 0.01.
![Page 19: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/19.jpg)
Violation of internal consistency! Under strong stochastic transitivity, since D > E > F, the entry for D > F should be at least max(0.04, 0.01) = 0.04, but it is only 0.00.
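The three violations can be found mechanically. This sketch assumes the ranking A > B > C > D > E > F and reads strong stochastic transitivity as requiring ε_ik ≥ max(ε_ij, ε_jk) whenever i > j > k. The helper name `sst_violations` is hypothetical. On the example table it reports exactly the three pairs flagged above.

```python
from itertools import combinations

ORDER = ["A", "B", "C", "D", "E", "F"]  # assumed ranking, best first
# Upper triangle of the example table: EPS[(i, j)] = P(i > j) - 0.5.
EPS = {
    ("A", "B"): 0.05, ("A", "C"): 0.05, ("A", "D"): 0.04, ("A", "E"): 0.11, ("A", "F"): 0.11,
    ("B", "C"): 0.05, ("B", "D"): 0.06, ("B", "E"): 0.08, ("B", "F"): 0.10,
    ("C", "D"): 0.04, ("C", "E"): 0.01, ("C", "F"): 0.06,
    ("D", "E"): 0.04, ("D", "F"): 0.00,
    ("E", "F"): 0.01,
}


def sst_violations(order, eps):
    """List (i, k, required, actual) wherever eps(i, k) falls below the
    strong-transitivity bound max(eps(i, j), eps(j, k)), maximized over
    every bandit j ranked strictly between i and k."""
    violations = []
    for a, c in combinations(range(len(order)), 2):
        between = order[a + 1:c]
        if not between:
            continue  # adjacent pair: no intermediate bandit to check against
        i, k = order[a], order[c]
        required = max(max(eps[(i, j)], eps[(j, k)]) for j in between)
        if eps[(i, k)] < required:
            violations.append((i, k, required, eps[(i, k)]))
    return violations


for i, k, required, actual in sst_violations(ORDER, EPS):
    print(f"{i} > {k}: should be at least {required:.2f}, observed {actual:.2f}")
```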
![Page 20: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/20.jpg)
Modeling Assumptions
• $P(b_i > b_j) = \tfrac{1}{2} + \epsilon_{ij}$
• Let b1 be the best overall bandit
• Relaxed Stochastic Transitivity
  – For three bandits b1 > bj > bk: $\gamma\,\epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\}$
  – γ ≥ 1 (γ = 1 for strong transitivity **)
  – Relaxed internal consistency property
• Stochastic Triangle Inequality
  – For three bandits b1 > bj > bk: $\epsilon_{1k} \le \epsilon_{1j} + \epsilon_{jk}$
  – Diminishing returns property
(** γ = 1 required in previous work, and required to apply for all bandit triplets)
![Page 21: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/21.jpg)
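On the example pairwise preferences (taking A as b1), relaxed stochastic transitivity holds with γ = 1.5:
$$\gamma\,\epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\}$$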
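The γ = 1.5 figure can be recovered from the example table. This sketch assumes A is b1 and the ranking is A > B > C > D > E > F; `min_gamma` is a hypothetical helper. The binding triple is A > B > D, where max(0.05, 0.06) / 0.04 = 1.5.

```python
from itertools import combinations

ORDER = ["A", "B", "C", "D", "E", "F"]  # assumed ranking, best (b1) first
# Upper triangle of the example table: EPS[(i, j)] = P(i > j) - 0.5.
EPS = {
    ("A", "B"): 0.05, ("A", "C"): 0.05, ("A", "D"): 0.04, ("A", "E"): 0.11, ("A", "F"): 0.11,
    ("B", "C"): 0.05, ("B", "D"): 0.06, ("B", "E"): 0.08, ("B", "F"): 0.10,
    ("C", "D"): 0.04, ("C", "E"): 0.01, ("C", "F"): 0.06,
    ("D", "E"): 0.04, ("D", "F"): 0.00,
    ("E", "F"): 0.01,
}


def min_gamma(order, eps):
    """Smallest gamma with gamma * eps(b1, bk) >= max(eps(b1, bj), eps(bj, bk))
    for every pair bj ranked above bk, where b1 = order[0]."""
    b1 = order[0]
    return max(max(eps[(b1, j)], eps[(j, k)]) / eps[(b1, k)]
               for j, k in combinations(order[1:], 2))


print(round(min_gamma(ORDER, EPS), 2))
```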
![Page 22: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/22.jpg)
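Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| E | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |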
![Page 23: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/23.jpg)
Comparison results (columns A through F: wins/total against each opponent)
![Page 24: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/24.jpg)
Mean score and confidence interval (Mean, Lower Bound, and Upper Bound columns)
![Page 25: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/25.jpg)
A’s performance vs rest
![Page 26: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/26.jpg)
A’s mean performance
![Page 27: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/27.jpg)
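Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 1.00 | 1 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| E | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |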
![Page 28: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/28.jpg)
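Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 1.00 | 1 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/1 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| E | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |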
![Page 29: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/29.jpg)
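Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 1.00 | 1 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/1 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 1/1 | 1.00 | 1 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| E | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |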
![Page 30: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/30.jpg)
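Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 1.00 | 1 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/1 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 1/1 | 1.00 | 1 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/1 | 0/0 | 0/0 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| E | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |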
![Page 31: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/31.jpg)
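Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 1.00 | 1 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/1 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 1/1 | 1.00 | 1 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/1 | 0/0 | 0/0 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| E | 0/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | -- | 0 | 0.00 | 1.00 |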
![Page 32: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/32.jpg)
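Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0/0 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 1.00 | 1 | 0.00 | 1.00 |
| B | 0/0 | 0/0 | 0/0 | 0/0 | 0/1 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| C | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 1/1 | 1.00 | 1 | 0.00 | 1.00 |
| D | 0/0 | 0/0 | 0/1 | 0/0 | 0/0 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| E | 0/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |
| F | 0/0 | 0/0 | 0/1 | 0/0 | 0/0 | 0/0 | 0.00 | 1 | 0.00 | 1.00 |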
![Page 33: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/33.jpg)
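Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 13/25 | 16/24 | 11/22 | 16/28 | 20/30 | 13/21 | 0.59 | 150 | 0.49 | 0.69 |
| B | 14/30 | 15/30 | 13/19 | 15/20 | 17/26 | 20/25 | 0.63 | 150 | 0.53 | 0.73 |
| C | 12/28 | 10/22 | 13/23 | 15/28 | 20/24 | 13/25 | 0.55 | 150 | 0.45 | 0.65 |
| D | 9/20 | 15/28 | 10/21 | 11/23 | 15/28 | 15/30 | 0.50 | 150 | 0.40 | 0.60 |
| E | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F | 11/29 | 4/25 | 10/18 | 12/25 | 14/30 | 13/23 | 0.43 | 150 | 0.33 | 0.53 |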
![Page 34: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/34.jpg)
B dominates E! (B’s lower bound is greater than E’s upper bound.)
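The numbers on this slide can be checked with a short sketch. It recomputes the Mean column from the wins/total counts, then applies the dominance rule. The confidence bounds are used exactly as displayed, because the slides do not show how they are computed. All names here are hypothetical.

```python
# (wins, total) against opponents A..F for each bandit, copied from this slide.
COUNTS = {
    "A": [(13, 25), (16, 24), (11, 22), (16, 28), (20, 30), (13, 21)],
    "B": [(14, 30), (15, 30), (13, 19), (15, 20), (17, 26), (20, 25)],
    "C": [(12, 28), (10, 22), (13, 23), (15, 28), (20, 24), (13, 25)],
    "D": [(9, 20), (15, 28), (10, 21), (11, 23), (15, 28), (15, 30)],
    "E": [(8, 24), (11, 25), (6, 22), (14, 29), (14, 31), (10, 19)],
    "F": [(11, 29), (4, 25), (10, 18), (12, 25), (14, 30), (13, 23)],
}
# (lower, upper) confidence bounds as displayed on the slide.
BOUNDS = {"A": (0.49, 0.69), "B": (0.53, 0.73), "C": (0.45, 0.65),
          "D": (0.40, 0.60), "E": (0.32, 0.52), "F": (0.33, 0.53)}


def mean_score(row):
    """Win rate against the mean bandit: total wins / total comparisons."""
    return sum(w for w, _ in row) / sum(n for _, n in row)


def dominated(bounds):
    """Bandits whose upper bound lies strictly below the best lower bound."""
    best_lower = max(lo for lo, _ in bounds.values())
    return [b for b, (_, hi) in bounds.items() if hi < best_lower]


print({b: round(mean_score(row), 2) for b, row in COUNTS.items()})
print(dominated(BOUNDS))
```

F is not flagged: its upper bound (0.53) only ties B's lower bound, and the rule asks for strict dominance.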
![Page 35: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/35.jpg)
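Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 13/25 | 16/24 | 11/22 | 16/28 | 20/30 | 13/21 | 0.58 | 120 | 0.49 | 0.67 |
| B | 14/30 | 15/30 | 13/19 | 15/20 | 15/26 | 20/25 | 0.62 | 124 | 0.51 | 0.73 |
| C | 12/28 | 10/22 | 13/23 | 15/28 | 20/24 | 13/25 | 0.50 | 126 | 0.39 | 0.61 |
| D | 9/20 | 15/28 | 10/21 | 11/23 | 15/28 | 15/30 | 0.49 | 122 | 0.38 | 0.60 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F | 11/29 | 4/25 | 10/18 | 12/25 | 14/30 | 13/23 | 0.42 | 120 | 0.31 | 0.53 |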
![Page 36: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/36.jpg)
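Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 13/25 | 17/25 | 11/22 | 16/28 | 20/30 | 13/21 | 0.58 | 121 | 0.49 | 0.67 |
| B | 14/30 | 15/30 | 13/19 | 15/20 | 15/26 | 20/25 | 0.62 | 124 | 0.51 | 0.73 |
| C | 12/28 | 10/22 | 13/23 | 15/28 | 20/24 | 13/25 | 0.50 | 126 | 0.39 | 0.61 |
| D | 9/20 | 15/28 | 10/21 | 11/23 | 15/28 | 15/30 | 0.49 | 122 | 0.38 | 0.60 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F | 11/29 | 4/25 | 10/18 | 12/25 | 14/30 | 13/23 | 0.42 | 120 | 0.31 | 0.53 |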
![Page 37: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/37.jpg)
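Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 15/30 | 19/29 | 14/28 | 18/33 | 23/30 | 15/25 | 0.56 | 145 | 0.46 | 0.66 |
| B | 15/33 | 17/34 | 15/24 | 20/27 | 15/26 | 23/27 | 0.62 | 145 | 0.52 | 0.72 |
| C | 13/31 | 11/28 | 14/29 | 15/30 | 20/24 | 16/27 | 0.48 | 145 | 0.38 | 0.68 |
| D | 11/26 | 17/31 | 12/26 | 14/29 | 15/28 | 17/33 | 0.49 | 145 | 0.39 | 0.59 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F | 12/32 | 7/30 | 13/26 | 13/28 | 14/30 | 15/29 | 0.41 | 145 | 0.31 | 0.51 |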
![Page 38: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/38.jpg)
B dominates F! (B’s lower bound is greater than F’s upper bound.)
![Page 39: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/39.jpg)
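Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 15/30 | 19/29 | 14/28 | 18/33 | 23/30 | 15/25 | 0.55 | 120 | 0.43 | 0.67 |
| B | 15/33 | 17/34 | 15/24 | 20/27 | 15/26 | 23/27 | 0.56 | 118 | 0.44 | 0.68 |
| C | 13/31 | 11/28 | 14/29 | 15/30 | 20/24 | 16/27 | 0.45 | 118 | 0.33 | 0.57 |
| D | 11/26 | 17/31 | 12/26 | 14/29 | 15/28 | 17/33 | 0.48 | 112 | 0.36 | 0.60 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F (removed) | 12/32 | 7/30 | 13/26 | 13/28 | 14/30 | 15/29 | 0.41 | 145 | 0.31 | 0.51 |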
![Page 40: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/40.jpg)
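Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 41/80 | 44/75 | 38/70 | 42/75 | 23/30 | 15/25 | 0.55 | 300 | 0.48 | 0.62 |
| B | 31/69 | 38/78 | 47/78 | 51/75 | 15/26 | 23/27 | 0.56 | 300 | 0.49 | 0.63 |
| C | 33/77 | 31/77 | 35/70 | 39/76 | 20/24 | 16/27 | 0.46 | 300 | 0.49 | 0.53 |
| D | 30/76 | 27/77 | 35/74 | 35/73 | 15/28 | 17/33 | 0.42 | 300 | 0.35 | 0.49 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F (removed) | 12/32 | 7/30 | 13/26 | 13/28 | 14/30 | 15/29 | 0.41 | 145 | 0.31 | 0.51 |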
B dominates D! (B’s lower bound is greater than D’s upper bound.)
![Page 41: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/41.jpg)
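Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 41/80 | 44/75 | 38/70 | 42/75 | 23/30 | 15/25 | 0.55 | 225 | 0.46 | 0.64 |
| B | 31/69 | 38/78 | 47/78 | 51/75 | 15/26 | 23/27 | 0.52 | 225 | 0.43 | 0.61 |
| C | 33/77 | 31/77 | 35/70 | 39/76 | 20/24 | 16/27 | 0.33 | 225 | 0.24 | 0.42 |
| D (removed) | 30/76 | 27/77 | 35/74 | 35/73 | 15/28 | 17/33 | 0.42 | 300 | 0.35 | 0.49 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F (removed) | 12/32 | 7/30 | 13/26 | 13/28 | 14/30 | 15/29 | 0.41 | 145 | 0.31 | 0.51 |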
A dominates C! (A’s lower bound is greater than C’s upper bound.)
![Page 42: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/42.jpg)
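Beat-the-Mean

| Wins/Total | A | B | C | D | E | F | Mean | Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 41/80 | 44/75 | 38/70 | 42/75 | 23/30 | 15/25 | 0.51 | 80 | 0.38 | 0.64 |
| B | 31/69 | 38/78 | 47/78 | 51/75 | 15/26 | 23/27 | 0.52 | 147 | 0.45 | 0.49 |
| C (removed) | 33/77 | 31/77 | 35/70 | 39/76 | 20/24 | 16/27 | 0.33 | 225 | 0.24 | 0.42 |
| D (removed) | 30/76 | 27/77 | 35/74 | 35/73 | 15/28 | 17/33 | 0.42 | 300 | 0.35 | 0.49 |
| E (removed) | 8/24 | 11/25 | 6/22 | 14/29 | 14/31 | 10/19 | 0.42 | 150 | 0.32 | 0.52 |
| F (removed) | 12/32 | 7/30 | 13/26 | 13/28 | 14/30 | 15/29 | 0.41 | 145 | 0.31 | 0.51 |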
Eventually, A is the last bandit remaining and is declared the best bandit!
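Putting the walkthrough together, a simplified Beat-the-Mean loop might look like the sketch below. It is not the authors' implementation. The following are all assumptions made for illustration:

- the Hoeffding-style confidence radius and the `delta` default;
- the rule that the least-explored active bandit plays next;
- the toy `linear_pref` preferences.

What the sketch does take from the slides:

- each bandit duels a uniformly random active bandit, possibly itself;
- its score is its win rate against the active set;
- a bandit whose upper bound falls below another bandit's lower bound is removed, and comparisons made against it stop counting.

```python
import math
import random


def beat_the_mean(pref, bandits, horizon, delta=0.05, rng=None):
    """Simplified Beat-the-Mean loop (sketch); returns the surviving bandits.

    pref(i, j) is the probability that i wins a duel against j.
    """
    rng = rng or random.Random(0)
    active = list(bandits)
    # wins[b][o] / plays[b][o]: outcomes of b's comparisons against opponent o.
    wins = {b: {o: 0 for o in bandits} for b in bandits}
    plays = {b: {o: 0 for o in bandits} for b in bandits}

    def n(b):  # only comparisons against still-active opponents count
        return sum(plays[b][o] for o in active)

    def mean(b):
        return sum(wins[b][o] for o in active) / n(b) if n(b) else 0.5

    def radius(b):  # assumed Hoeffding-style width, not the paper's exact term
        return math.sqrt(math.log(1 / delta) / (2 * n(b))) if n(b) else 1.0

    for _ in range(horizon):
        if len(active) == 1:
            break
        b = min(active, key=n)          # least-explored active bandit plays next
        opponent = rng.choice(active)   # "the mean": a uniformly random active bandit
        plays[b][opponent] += 1
        if rng.random() < pref(b, opponent):
            wins[b][opponent] += 1
        leader = max(active, key=lambda x: mean(x) - radius(x))
        worst = min(active, key=lambda x: mean(x) + radius(x))
        if mean(leader) - radius(leader) > mean(worst) + radius(worst):
            active.remove(worst)        # its comparisons no longer count for others
    return active


def linear_pref(i, j):
    """Hypothetical toy preferences: lower index is better, P = 0.5 + 0.1 (j - i)."""
    return 0.5 + 0.1 * (j - i)


survivors = beat_the_mean(linear_pref, [0, 1, 2, 3], horizon=20000, delta=0.01,
                          rng=random.Random(1))
```

Because each bandit is scored against the same mean opponent, the scores are directly comparable. As a result, only one estimate per active bandit is needed.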
![Page 43: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/43.jpg)
Regret Guarantee
• Playing against the mean bandit calibrates preference scores
  – Estimates of (active) bandits are directly comparable
  – One estimate per active bandit = linear number of estimates
![Page 44: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/44.jpg)
• We can bound the number of comparisons needed to remove the worst bandit
  – Varies smoothly with transitivity parameter γ
  – High-probability bound
• We can bound the regret incurred by each comparison
  – Varies smoothly with transitivity parameter γ
![Page 45: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/45.jpg)
• Thus, we can bound the total regret with high probability:
$$R_T = O\left(\gamma^{7} K \log T\right)$$
  – γ is typically close to 1
We also have a similar PAC guarantee.
![Page 46: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/46.jpg)
Not possible with previous approaches!
![Page 47: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/47.jpg)
• Simulation experiment where γ = 1.3
• Light = Beat-the-Mean
• Dark = Interleaved Filter [Yue et al. 2009]
• Beat-the-Mean maintains its linear regret guarantee
• Interleaved Filter suffers quadratic regret in the worst case
![Page 48: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/48.jpg)
• Simulation experiment where γ = 1 (original Dueling Bandits setting)
• Light = Beat-the-Mean
• Dark = Interleaved Filter [Yue et al. 2009]
• Beat-the-Mean has a high-probability bound
• Beat-the-Mean exhibits significantly lower variance
![Page 49: Beat the Mean Bandit](https://reader034.vdocuments.us/reader034/viewer/2022042608/56815fb7550346895dceb10e/html5/thumbnails/49.jpg)
Conclusions
• Online learning approach using pairwise feedback
  – Well-suited for optimizing information retrieval systems from user feedback
  – Models violations in preference transitivity
• Algorithm: Beat-the-Mean
  – Regret linear in #bandits and logarithmic in #iterations
  – Degrades smoothly with transitivity violation
  – Stronger guarantees than previous work
  – Empirically supported