sumeet katariya, ardhendu tripathy, robert nowak=1 2 =0.98 0 500 1k 1.5k 2k #samples 0.0 0.2 0.4 0.6...

MaxGap Bandit: Adaptive Sampling for Approximate Ranking

Sumeet Katariya, Ardhendu Tripathy, Robert Nowak

University of Wisconsin-Madison

Problem

Motivation

1) Surveys

[Figure: sorted means µ(1), ..., µ(6) with regions labeled "safe", "maybe unsafe", and "unsafe", and a large gap marked.]

2) Grading

3) Identify Outliers

Formulation

Adaptively sample to partially order distributions according to their means (identify large gaps).

The simplest problem for 2 clusters is: MaxGap Bandit

MaxGap Bandit: Cluster using largest gap

• K arms with means $\mu_1, \dots, \mu_K$.

• Arm with the largest gap: $m = \arg\max_{i \in [K-1]} \left(\mu_{(i)} - \mu_{(i+1)}\right)$

• Defines two clusters $C_1 = \{(1), \dots, (m)\}$ and $C_2 = [K] \setminus C_1$
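As a concrete illustration of the definition above, the following minimal sketch computes the max-gap split when the means are known exactly. It assumes the order statistics are sorted in decreasing order, so that arm (1) has the largest mean; the function name `max_gap_clusters` and the example means are hypothetical and not taken from the poster.

```python
from typing import List, Tuple


def max_gap_clusters(means: List[float]) -> Tuple[List[int], List[int], float]:
    """Split arm indices into two clusters at the largest gap between adjacent sorted means."""
    if len(means) < 2:
        raise ValueError("need at least two arms")
    # order[0] is the arm with the largest mean, i.e. arm (1) (assumed convention).
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    # gaps[i] is the gap between the (i+1)-th and (i+2)-th largest means.
    gaps = [means[order[i]] - means[order[i + 1]] for i in range(len(order) - 1)]
    # 0-based position of the largest gap; the first position wins on ties.
    m = max(range(len(gaps)), key=lambda i: gaps[i])
    return order[: m + 1], order[m + 1:], gaps[m]


# Hypothetical means: three arms near 0.85 and two arms near 0.15.
c1, c2, gap = max_gap_clusters([0.9, 0.1, 0.85, 0.2, 0.8])
print(c1, c2)  # [0, 2, 4] [3, 1]
```

In the bandit setting the means are unknown, so this computation is only the target that the sampling algorithm tries to recover.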

• Let algorithm $A$ stop after $T(A)$ samples and return clusters $\hat{C}_1$ and $\hat{C}_2 = [K] \setminus \hat{C}_1$.

Objective: $\min_A T(A)$ subject to $P(\hat{C}_1 \neq C_1) \le 1 - \delta$

Existing Solutions don’t work

• Best Arm Identification on gaps: 1) only $K$ out of $\binom{K}{2}$ gaps matter; 2) arm rewards are not independent.

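• Adaptive Sorting: requires $O(K/\Delta_{\min}^2)$ samples (compared to $O(K/\Delta_{\max}^2)$ needed in this work).

[Figure: sorted means with the smallest gap ∆min and the largest gap ∆max marked.]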

Solution

Algorithms

• Elimination: Play active arms whose gap upper bound is higher than the gap lower bound

• UCB: Play all (there will be multiple) arms with the highest gap UCB

• Top2UCB: Play arms with the top-two highest gap UCBs

How to compute confidence intervals on gaps from confidence intervals on means?
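The three selection rules listed above can be sketched as follows, assuming the per-arm gap upper bounds are already available. Everything here is illustrative: the function `arms_to_play`, the single scalar `gap_lcb` (taken to be one lower bound on the largest gap), and the reading of "top-two" as the two largest distinct UCB values are assumptions, not details given on the poster.

```python
from typing import List


def arms_to_play(gap_ucb: List[float], gap_lcb: float, rule: str) -> List[int]:
    """Return the indices of the arms to sample next under one of three rules."""
    if rule == "elimination":
        # Keep sampling arms whose gap upper bound still exceeds the lower bound.
        return [a for a, u in enumerate(gap_ucb) if u > gap_lcb]
    # Distinct UCB values, largest first (assumed tie handling).
    distinct = sorted(set(gap_ucb), reverse=True)
    if rule == "ucb":
        top = distinct[:1]
    elif rule == "top2ucb":
        top = distinct[:2]
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Several arms can share the same gap UCB, so more than one may be returned.
    return [a for a, u in enumerate(gap_ucb) if u in top]


# Hypothetical gap upper bounds for four arms.
ucbs = [0.9, 0.4, 0.9, 0.7]
print(arms_to_play(ucbs, 0.5, "elimination"))  # [0, 2, 3]
print(arms_to_play(ucbs, 0.5, "ucb"))          # [0, 2]
print(arms_to_play(ucbs, 0.5, "top2ucb"))      # [0, 2, 3]
```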

Gap Upper Bounds: MIP

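Computing upper bounds requires solving a MIP:

$U\Delta_a^r = \max_{b \in [K] : b \neq a} \; \max_{\mu_1, \dots, \mu_K} (\mu_b - \mu_a)$

subject to

1) $l_i \le \mu_i \le r_i \quad \forall i \in [K]$

2) $\forall i \notin \{a, b\}: \mu_i \notin (\mu_a, \mu_b)$, encoded as $\mu_i \le \mu_a + M(1 - y_i)$, $\mu_i \ge \mu_b - M y_i$, $y_i \in \{0, 1\}$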
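To make the role of the binary variables concrete, here is a brute-force sketch that enumerates every assignment of the $y_i$ instead of calling a MIP solver. It is exponential in K and meant only for tiny examples. The function name, the argument names `lo` and `hi` (standing for the interval endpoints $l_i$ and $r_i$), and the example intervals are hypothetical.

```python
from itertools import product
from typing import List


def gap_upper_bound_bruteforce(lo: List[float], hi: List[float], a: int) -> float:
    """Maximize mu_b - mu_a over b and over all 0/1 assignments of the y_i."""
    K = len(lo)
    best = float("-inf")
    for b in range(K):
        if b == a:
            continue
        others = [i for i in range(K) if i not in (a, b)]
        for y in product((0, 1), repeat=len(others)):
            # y_i = 1 places arm i at or below mu_a, which forces mu_a >= lo[i].
            mu_a = max([lo[a]] + [lo[i] for i, yi in zip(others, y) if yi == 1])
            # y_i = 0 places arm i at or above mu_b, which forces mu_b <= hi[i].
            mu_b = min([hi[b]] + [hi[i] for i, yi in zip(others, y) if yi == 0])
            # Both means must stay inside their own confidence intervals.
            if mu_a <= hi[a] and mu_b >= lo[b]:
                best = max(best, mu_b - mu_a)
    return best


# Hypothetical confidence intervals for three arms.
lo = [0.0, 0.1, 0.6]
hi = [0.2, 0.5, 1.0]
print(gap_upper_bound_bruteforce(lo, hi, a=0))
```

In this example the bound for arm 0 is about 0.9: arm 1 can sit at 0.1 together with arm 0, which leaves arm 2 at 1.0 as the nearest arm above.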

Gap Upper Bounds: Efficient $O(K^2)$ Algorithm

Max-gap if µa known

Max-gap if µa unknown
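The two cases named above suggest a way to avoid the MIP; the following is a sketch of that idea under stated assumptions, not necessarily the authors' exact procedure. If $\mu_a = x$ is known, every arm whose interval lies strictly above $x$ must sit to the right of arm a, so the gap is at most the smallest right endpoint among those arms minus $x$. If $\mu_a$ is unknown, this quantity only jumps upward when $x$ reaches some left endpoint, so it suffices to try $x = l_a$ and every $l_j$ inside $[l_a, r_a]$. The function names, the names `lo` and `hi`, and the fallback used when no arm is forced above $x$ are assumptions.

```python
from typing import List


def gap_if_known(lo: List[float], hi: List[float], a: int, x: float) -> float:
    """Largest possible gap above arm a when its mean is known to equal x."""
    K = len(lo)
    # Arms whose whole interval lies above x cannot be placed at or below x.
    forced = [hi[j] for j in range(K) if j != a and lo[j] > x]
    if forced:
        return min(forced) - x
    # Assumed fallback: no arm is forced above x, so any one arm may serve as b.
    return max(hi[j] for j in range(K) if j != a) - x


def gap_upper_bound(lo: List[float], hi: List[float], a: int) -> float:
    """Largest possible gap above arm a when its mean is only known to lie in [lo[a], hi[a]]."""
    K = len(lo)
    # Candidate values of mu_a: the left end of its own interval and each
    # left endpoint of another arm that falls inside that interval.
    candidates = [lo[a]] + [lo[j] for j in range(K) if j != a and lo[a] <= lo[j] <= hi[a]]
    return max(gap_if_known(lo, hi, a, x) for x in candidates)


# Hypothetical confidence intervals for three arms.
lo = [0.0, 0.1, 0.6]
hi = [0.2, 0.5, 1.0]
print(gap_if_known(lo, hi, 0, 0.0))  # arms 1 and 2 both forced above x
print(gap_upper_bound(lo, hi, 0))    # attained at x = 0.1
```

Each call to `gap_upper_bound` tries at most K candidates at O(K) cost each, so this sketch costs O(K^2) per arm.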

Gap Lower Bounds

L∆

Analysis

Sample Complexity

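$H = \sum_{i \notin \{m, m+1\}} \frac{1}{\gamma_i^2}$

where $\gamma_i = \min\{\gamma_i^r, \gamma_i^l\}$,

$\gamma_i^r = \max_{j : \Delta_{i,j} > 0} \min\{\Delta_{i,j}, \Delta_{\max} - \Delta_{i,j}\}$

$\gamma_i^l = \max_{j : \Delta_{i,j} < 0} \min\{\Delta_{j,i}, \Delta_{\max} - \Delta_{j,i}\}$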

Distribution i can be eliminated quickly if there is another distribution j whose gap from i is moderately large (so that this gap can be easily detected), but not so large that it becomes hard to distinguish from ∆max.
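A small numeric sketch can make the hardness parameter tangible. It assumes distinct means, takes $\Delta_{i,j}$ to be a signed difference of means so that one side collects arms above i and the other side arms below i, and skips a side that contains no arms. These conventions, the function name `hardness`, and the example means are assumptions rather than details stated on the poster. Because $\gamma_i$ is a minimum over both sides, the choice of which side is called "r" or "l" does not affect the result.

```python
from typing import List


def hardness(means: List[float]) -> float:
    """Sum of 1 / gamma_i^2 over all arms except the two adjacent to the largest gap."""
    mu = sorted(means, reverse=True)  # mu[0] is the largest mean
    K = len(mu)
    gaps = [mu[i] - mu[i + 1] for i in range(K - 1)]
    m = max(range(K - 1), key=lambda i: gaps[i])  # 0-based position of the max gap
    d_max = gaps[m]
    total = 0.0
    for i in range(K):
        if i in (m, m + 1):
            continue  # the two arms that define the largest gap are excluded
        # For each side: a gap is useful when it is large but well below d_max.
        above = [min(mu[j] - mu[i], d_max - (mu[j] - mu[i])) for j in range(i)]
        below = [min(mu[i] - mu[j], d_max - (mu[i] - mu[j])) for j in range(i + 1, K)]
        # Assumption: a side with no arms imposes no constraint and is skipped.
        gamma = min(max(side) for side in (above, below) if side)
        total += 1.0 / gamma ** 2
    return total


# Hypothetical means: the largest gap (0.8) separates 1.0 from the rest.
print(hardness([0.0, 0.1, 0.2, 1.0]))
```

For these means the arm at 0.1 has $\gamma = 0.1$ and the arm at 0.0 has $\gamma = 0.2$, so the sum is approximately 100 + 25 = 125.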

MaxGap Bandit ≠ Best-arm Identification on Gaps

[Figure: the gap of arm i compared with a large gap in its neighborhood.]

An arm may have a small gap, but if there is a large gap in its neighborhood, it cannot be eliminated quickly.

Minimax Lower Bound

For the above problem, we prove a lower bound that matches the upper bound.

Experiments

Empirical vs Theoretical Stopping Time

[Figure, panels (a) to (c): arm configuration with gaps ∆1, ∆2, ∆3; number of samples before stopping versus hardness parameter (20K to 40K) for Elimination, UCB, and Top2UCB; the same comparison including Random sampling.]

Adaptive Gains

[Figure: arm means µ(1), µ(9), µ(10), µ(18), µ(19), µ(24) with ∆max = 1 and ∆2 = 0.98.]

[Figure: mistake probability versus number of samples (0 to 2K) for Random, Elimination, UCB, and Top2UCB.]

[Figure: profile of samples allocated by MaxGapUCB, shown as number of samples per arm index at t = 40, 60, 80, 200, 400, and 600.]

Streetscore Dataset

[Figure: sorted means µ(1), ..., µ(90) with the largest gap ∆max marked.]
