sumeet katariya, ardhendu tripathy, robert nowak=1 2 =0.98 0 500 1k 1.5k 2k #samples 0.0 0.2 0.4 0.6...
TRANSCRIPT
MaxGap Bandit:Adaptive Sampling for Approximate RankingSumeet Katariya, Ardhendu Tripathy, Robert Nowak
University of Wisconsin-Madison
ProblemMotivation
1) Surveys
µ(1)µ(2)µ(3)µ(4)µ(5)µ(6)
maybe unsafe safeunsafelarge gap
2) Grading
3) Identify Outliers
FormulationAdaptively sample to partially order distributions according totheir means (identify large gaps).
The simplest problem for 2 clusters is: MaxGap Bandit
MaxGap Bandit: Cluster using largest gap
•K arms with means µ1, . . . , µK.•Arm with the largest gap
m = arg maxi∈[K−1]
(µ(i) − µ(i+1))
•Defines two clustersC1 = {(1), . . . , (m)} and C2 = [K] \ C1
• Let algorithm A stop after T (A) samples and return clustersC1, C2 = [K] \ C1
Objective: minAT (A)
subject to P(C1 6= C1) ≤ 1− δ
Existing Solutions don’t work
•Best Arm Identification on gaps:1) Only K out of
(K2
)gaps matter
2) arm rewards not independent
•Adaptive Sorting
∆min ∆max
Requires O(K/∆2min) samples (compared to O(K/∆2
max)needed in this work)
SolutionAlgorithms
•Elimination: Play active arms whose gap upper bound ishigher than the gap lower bound•UCB: Play all (there will be multiple) arms with the highestgap UCB•Top2UCB: Play arms with the top-two highest gap UCBsHow to compute confidence intervals on gaps from confidenceintervals on means?
Gap Upper Bounds: MIP
Computing upper bounds requires solving a MIPΔU r
a = max{b∈[K]:b≠a}
maxµ1,…,µK
(µb − µa)
1) li ≤ µi ≤ ri ∀ i ∈ [K ]2)∀i ∉ {a, b}, µi ∉ (µa, µb)
µi ≤ µa + M(1 − yi)µi ≥ µb − Myi
yi ∈ {0,1}
Gap Upper Bounds: Efficient O(K2)Algorithm
Max-gap if µa known
Max-gap if µa unknown
Gap Lower Bounds
L∆
AnalysisSample Complexity
H =∑
i 6={m,m+1}
1γ2i
whereγi = min{γri , γli}
γri = maxj:∆i,j>0
min{∆i,j,∆max −∆i,j}
γli = maxj:∆i,j<0
min{∆j,i,∆max −∆j,i}
Distribution i can be eliminated quickly if there is another dis-tribution j that has a moderately large gap from i (so that thisgap can be easily detected), but not too large so that gap is easyto distinguish from ∆max.
MaxGap Bandit 6= Best-arm Identificationon Gaps
max
Ui L
Gap of iLarge gap in nbrhd
An arm may have a small gap, but if there is a large gap inits neighborhood, it cannot be eliminated quickly.
Minimax Lower Bound
For the above problem, we prove a lower bound that matches theupper bound.
ExperimentsEmpirical vs Theoretical Stopping Time
∆2 ∆1∆3
(a)
(b)20K 25K 30K 35K 40K
Hardness parameter0
5K
10K
15K
20K
No. o
f sam
ples
bef
ore
stop
ping Elimination
UCBTop2UCB
20K 25K 30K 35K 40KHardness parameter
0
20K
40K
60K
80K
No. o
f sam
ples
bef
ore
stop
ping Random
EliminationUCBTop2UCB
(c)
Adaptive Gains
μ(1)μ(9)μ(10)μ(18)μ(19)μ(24)
∆max=1 ∆2=0.98
0 500 1K 1.5K 2K#Samples
0.0
0.2
0.4
0.6
Mist
ake
prob
abilit
y
RandomEliminationUCBTop2UCB
0
2
4
6
8 t=40 t=60 t=80
24 19 18 10 9 10
50
100
150
200 t=200
24 19 18 10 9 1
t=400
24 19 18 10 9 1
t=600
Profile of samples allocated by MaxGapUCB
Arm index
Num
ber o
f sam
ples
Streetscore Dataset
μ(1)μ(90)
∆max