sumeet katariya, ardhendu tripathy, robert nowak=1 2 =0.98 0 500 1k 1.5k 2k #samples 0.0 0.2 0.4 0.6...

1
MaxGap Bandit:Adaptive Sampling for Approximate Ranking Sumeet Katariya, Ardhendu Tripathy, Robert Nowak University of Wisconsin-Madison Problem Motivation 1) Surveys μ (1) μ (2) μ (3) μ (4) μ (5) μ (6) maybe unsafe safe unsafe large gap 2) Grading 3) Identify Outliers Formulation Adaptively sample to partially order distributions according to their means (identify large gaps). The simplest problem for 2 clusters is: MaxGap Bandit MaxGap Bandit: Cluster using largest gap K arms with means μ 1 ,...,μ K . Arm with the largest gap m = arg max i[ K -1] (μ (i) - μ (i+1) ) Defines two clusters C 1 = {(1),..., (m)} and C 2 =[ K ] \ C 1 Let algorithm A stop after T (A) samples and return clusters ˆ C 1 , ˆ C 2 =[ K ] \ ˆ C 1 Objective: min A T (A) subject to P( ˆ C 1 = C 1 ) 1 - δ Existing Solutions don’t work Best Arm Identification on gaps: 1) Only K out of K 2 gaps matter 2) arm rewards not independent Adaptive Sorting Δ min Δ max Requires O (K/Δ 2 min ) samples (compared to O (K/Δ 2 max ) needed in this work) Solution Algorithms Elimination : Play active arms whose gap upper bound is higher than the gap lower bound UCB : Play all (there will be multiple) arms with the highest gap UCB Top2UCB : Play arms with the top-two highest gap UCBs How to compute confidence intervals on gaps from confidence intervals on means? Gap Upper Bounds: MIP Computing upper bounds requires solving a MIP U r a = max {b[K ]:ba} max μ 1 ,…,μ K ( μ b - μ a ) 1) l i μ i r i i [K ] 2) i {a, b}, μ i ( μ a , μ b ) μ i μ a + M(1 - y i ) μ i μ b - My i y i {0,1} Gap Upper Bounds: Efficient O (K 2 ) Algorithm Max-gap if μ a known Max-gap if μ a unknown Gap Lower Bounds L∆ Analysis Sample Complexity H = X i={m,m+1} 1 γ 2 i where γ i = min{γ r i l i } γ r i = max j i,j >0 min{Δ i,j , Δ max - Δ i,j } γ l i = max j i,j <0 min{Δ j,i , Δ max - Δ j,i } Distribution i can be eliminated quickly if there is another dis- tribution j that has a moderately large gap from i (so that this gap can be easily detected), but not too large so that gap is easy to distinguish from Δ max . MaxGap Bandit = Best-arm Identification on Gaps max U i L Gap of i Large gap in nbrhd An arm may have a small gap, but if there is a large gap in its neighborhood, it cannot be eliminated quickly. Minimax Lower Bound For the above problem, we prove a lower bound that matches the upper bound. Experiments Empirical vs Theoretical Stopping Time Δ 2 Δ 1 Δ 3 (a) (b) (c) Adaptive Gains μ (1) μ (9) μ (10) μ (18) μ (19) μ (24) Δ max =1 Δ 2 =0.98 Streetscore Dataset μ (1) μ (90) Δ max

Upload: others

Post on 02-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sumeet Katariya, Ardhendu Tripathy, Robert Nowak=1 2 =0.98 0 500 1K 1.5K 2K #Samples 0.0 0.2 0.4 0.6 Mistake probability Random Elimination UCB Top2UCB 0 2 4 6 8 t=40 t=60 t=80 24

MaxGap Bandit:Adaptive Sampling for Approximate RankingSumeet Katariya, Ardhendu Tripathy, Robert Nowak

University of Wisconsin-Madison

ProblemMotivation

1) Surveys

µ(1)µ(2)µ(3)µ(4)µ(5)µ(6)

maybe unsafe safeunsafelarge gap

2) Grading

3) Identify Outliers

FormulationAdaptively sample to partially order distributions according totheir means (identify large gaps).

The simplest problem for 2 clusters is: MaxGap Bandit

MaxGap Bandit: Cluster using largest gap

•K arms with means µ1, . . . , µK.•Arm with the largest gap

m = arg maxi∈[K−1]

(µ(i) − µ(i+1))

•Defines two clustersC1 = {(1), . . . , (m)} and C2 = [K] \ C1

• Let algorithm A stop after T (A) samples and return clustersC1, C2 = [K] \ C1

Objective: minAT (A)

subject to P(C1 6= C1) ≤ 1− δ

Existing Solutions don’t work

•Best Arm Identification on gaps:1) Only K out of

(K2

)gaps matter

2) arm rewards not independent

•Adaptive Sorting

∆min ∆max

Requires O(K/∆2min) samples (compared to O(K/∆2

max)needed in this work)

SolutionAlgorithms

•Elimination: Play active arms whose gap upper bound ishigher than the gap lower bound•UCB: Play all (there will be multiple) arms with the highestgap UCB•Top2UCB: Play arms with the top-two highest gap UCBsHow to compute confidence intervals on gaps from confidenceintervals on means?

Gap Upper Bounds: MIP

Computing upper bounds requires solving a MIPΔU r

a = max{b∈[K]:b≠a}

maxµ1,…,µK

(µb − µa)

1) li ≤ µi ≤ ri ∀ i ∈ [K ]2)∀i ∉ {a, b}, µi ∉ (µa, µb)

µi ≤ µa + M(1 − yi)µi ≥ µb − Myi

yi ∈ {0,1}

Gap Upper Bounds: Efficient O(K2)Algorithm

Max-gap if µa known

Max-gap if µa unknown

Gap Lower Bounds

L∆

AnalysisSample Complexity

H =∑

i 6={m,m+1}

1γ2i

whereγi = min{γri , γli}

γri = maxj:∆i,j>0

min{∆i,j,∆max −∆i,j}

γli = maxj:∆i,j<0

min{∆j,i,∆max −∆j,i}

Distribution i can be eliminated quickly if there is another dis-tribution j that has a moderately large gap from i (so that thisgap can be easily detected), but not too large so that gap is easyto distinguish from ∆max.

MaxGap Bandit 6= Best-arm Identificationon Gaps

max

Ui L

Gap of iLarge gap in nbrhd

An arm may have a small gap, but if there is a large gap inits neighborhood, it cannot be eliminated quickly.

Minimax Lower Bound

For the above problem, we prove a lower bound that matches theupper bound.

ExperimentsEmpirical vs Theoretical Stopping Time

∆2 ∆1∆3

(a)

(b)20K 25K 30K 35K 40K

Hardness parameter0

5K

10K

15K

20K

No. o

f sam

ples

bef

ore

stop

ping Elimination

UCBTop2UCB

20K 25K 30K 35K 40KHardness parameter

0

20K

40K

60K

80K

No. o

f sam

ples

bef

ore

stop

ping Random

EliminationUCBTop2UCB

(c)

Adaptive Gains

μ(1)μ(9)μ(10)μ(18)μ(19)μ(24)

∆max=1 ∆2=0.98

0 500 1K 1.5K 2K#Samples

0.0

0.2

0.4

0.6

Mist

ake

prob

abilit

y

RandomEliminationUCBTop2UCB

0

2

4

6

8 t=40 t=60 t=80

24 19 18 10 9 10

50

100

150

200 t=200

24 19 18 10 9 1

t=400

24 19 18 10 9 1

t=600

Profile of samples allocated by MaxGapUCB

Arm index

Num

ber o

f sam

ples

Streetscore Dataset

μ(1)μ(90)

∆max