TRANSCRIPT
IDSIA Lugano Switzerland
On the Convergence Speed of MDL Predictions for Bernoulli Sequences
Jan Poland and Marcus Hutter
or: Is MDL Really So Bad?
Big Picture
• MDL
• Bayes
• Other methods, e.g. PAC-Bayes
Bernoulli Classes
θ      w_θ     Code
0      1/4     00
1      1/4     01
1/2    1/4     10
1/4    1/16    1100
3/4    1/16    1101
1/8    1/64    111000
3/8    1/64    111001
5/8    1/64    111010
7/8    1/64    111011

Code structure (example 111010): 111 = 1 + #bits ones, 0 = stop bit, 10 = data bits

• Set of parameters Θ = {θ1, θ2, …} ⊂ [0, 1]
• Weights w_θ for each θ ∈ Θ
• Weights correspond to codes: w_θ = 2^−ℓ(Code_θ), where ℓ denotes the code length
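The code column can be reproduced programmatically. A minimal sketch, assuming the data bits are θ's binary expansion with the trailing 1 dropped (an inference from the table above, not stated explicitly on the slide):

```python
from fractions import Fraction

def encode(theta: Fraction) -> str:
    """Prefix code matching the table: a run of 1s of length 1 + #data-bits,
    a 0 as stop bit, then the data bits (theta in binary, final 1 dropped)."""
    assert 0 < theta < 1
    bits, x = [], theta
    while x != 0:                  # binary expansion of theta
        x *= 2
        bits.append(1 if x >= 1 else 0)
        if x >= 1:
            x -= 1
    assert bits[-1] == 1           # dyadic theta: expansion ends in a 1
    data = bits[:-1]               # drop the trailing 1
    return "1" * (1 + len(data)) + "0" + "".join(map(str, data))

def weight(theta: Fraction) -> Fraction:
    """w_theta = 2^-length(Code_theta)."""
    return Fraction(1, 2 ** len(encode(theta)))

print(encode(Fraction(1, 2)), weight(Fraction(1, 2)))   # 10 1/4
print(encode(Fraction(5, 8)), weight(Fraction(5, 8)))   # 111010 1/64
```

The special codes 00 and 01 for θ = 0 and θ = 1 from the table are not covered by this sketch.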
Estimators

• Given an observed sequence x = x1 x2 … xn
• Probability of x given θ: p_θ(x) = θ^#ones(x) (1 − θ)^(n − #ones(x))
• Posterior weights: w_θ(x) = w_θ p_θ(x) / Σ_θ' w_θ' p_θ'(x)
• Bayes mixture: ξ(x) = Σ_θ w_θ(x) θ
• MDL/MAP: θ*(x) = argmax_θ w_θ(x)
• Maximum Likelihood (ML): same as MAP, but with all prior weights set to 1
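These estimators can be sketched in code for a small finite class; the parameters and weights reuse the code table from the Bernoulli-classes slide, and the test sequence is illustrative:

```python
# Illustrative finite Bernoulli class: dyadic parameters with
# code-length weights w_theta = 2^-len(code), as in the code table.
thetas  = [0.0, 1.0, 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8]
weights = [1/4, 1/4, 1/4, 1/16, 1/16, 1/64, 1/64, 1/64, 1/64]

def likelihood(theta, x):
    """p_theta(x) = theta^#ones(x) * (1 - theta)^(n - #ones(x))."""
    ones = sum(x)
    return theta**ones * (1 - theta)**(len(x) - ones)

def posterior(x):
    """Posterior weights w_theta(x) = w_theta p_theta(x) / normalizer."""
    post = [w * likelihood(t, x) for w, t in zip(weights, thetas)]
    z = sum(post)
    return [p / z for p in post]

def bayes_mixture(x):
    """Bayes predictive probability of a 1: xi(x) = sum_theta w_theta(x) theta."""
    return sum(p * t for p, t in zip(posterior(x), thetas))

def map_estimate(x):
    """MDL/MAP estimator: theta*(x) = argmax_theta w_theta(x)."""
    post = posterior(x)
    return thetas[max(range(len(thetas)), key=post.__getitem__)]

def ml_estimate(x):
    """ML: same as MAP, but with all prior weights set to 1."""
    return max(thetas, key=lambda t: likelihood(t, x))

x = [0, 1, 0]
print(bayes_mixture(x), ml_estimate(x), map_estimate(x))
```

Note how the three estimators already disagree on a three-symbol sequence: ML tracks the empirical frequency, while MAP is pulled toward the heavily weighted θ = 1/2.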
An Example Process
Sequence x    Bayes mixture   ML estimate   MAP (MDL) θ*
0             0.21            0             0
01            0.5             0.5           0.5
010           0.45            0.34          0.5
0000011       0.4             5/16          0.5
...(32)...    0.27            0.25          0.25
...(640)...   0.3             5/16          5/16

True parameter θ0 = 5/16 = 0.3125
What We Know

• Let θ0 ∈ Θ be the true parameter, with weight w0
• ξ converges to θ0 almost surely and fast; precisely, Σ_{t=0}^∞ E(ξ − θ0)² ≤ ln(w0^−1)
• θ* converges to θ0 almost surely but in general slowly; precisely, Σ_{t=0}^∞ E(θ* − θ0)² ≤ O(w0^−1)
• This even holds for arbitrary non-i.i.d. (semi-)measures!
• The ML estimates converge to θ0 almost surely; no such assertion about the convergence speed is possible
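As a sanity check on the flavor of these two bounds, here is a small Monte-Carlo sketch; the class, weights, true parameter, horizon, and seeds are all illustrative choices, not from the talk:

```python
import random

# Illustrative dyadic Bernoulli class with code-length weights.
thetas  = [0.0, 1.0, 0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875]
weights = [0.25, 0.25, 0.25, 1/16, 1/16, 1/64, 1/64, 1/64, 1/64]
theta0 = 0.375  # true parameter, in the class, with weight w0 = 1/64

def run(T, seed):
    """Cumulative squared errors of the Bayes and MAP predictions over T steps."""
    rng = random.Random(seed)
    ones = zeros = 0
    sq_bayes = sq_map = 0.0
    for _ in range(T):
        post = [w * th**ones * (1 - th)**zeros for w, th in zip(weights, thetas)]
        z = sum(post)
        xi = sum(p * th for p, th in zip(post, thetas)) / z          # Bayes mixture
        tmap = thetas[max(range(len(thetas)), key=lambda i: post[i])]  # MAP
        sq_bayes += (xi - theta0) ** 2
        sq_map += (tmap - theta0) ** 2
        if rng.random() < theta0:   # sample the next symbol from Bernoulli(theta0)
            ones += 1
        else:
            zeros += 1
    return sq_bayes, sq_map

# Average cumulative squared errors over a few runs; both stay well below
# their respective bounds ln(w0^-1) = ln 64 ~ 4.16 and O(w0^-1) = O(64).
runs = [run(200, s) for s in range(20)]
avg_bayes = sum(r[0] for r in runs) / len(runs)
avg_map = sum(r[1] for r in runs) / len(runs)
print(avg_bayes, avg_map)
```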
Is MDL Really So Bad?

• The Bayes mixture bound is description length(θ0)
• The MDL bound is exp(description length(θ0))
• ⇒ MDL is exponentially worse in general
• This is also a loss bound!
• How about simple classes?
• Deterministic classes: one can show a bound of (huge constant) × (description length(θ0))³
• Simple stochastic classes, e.g. Bernoulli?
MDL Is Really So Bad!

Σ_t E(θ* − θ0)² = O(w0^−1) in the following example:

• N parameters, w_θ = 1/N for all θ, true parameter θ0 = 1/2
• The remaining parameters are 1/2 + 1/4, 1/2 + 1/8, 1/2 + 1/16, …
• Σ_t E(θ* − θ0)² · 1{θ* ∈ [1/2 + 1/8, 1/2 + 1/4]} = O(1)
• Σ_t E(θ* − θ0)² · 1{θ* ∈ [1/2 + 1/16, 1/2 + 1/8]} = O(1)
• … and so on: one O(1) contribution per parameter, N of them in total
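The construction can be sketched as follows; N, the horizon, and the seed are illustrative, and with uniform weights MAP coincides with ML on this class:

```python
import random

# Sketch of the lower-bound construction: theta0 = 1/2 together with
# wrong parameters 1/2 + 2^-k, all with uniform prior weight 1/N.
N = 8
thetas = [0.5] + [0.5 + 2.0**-k for k in range(2, N + 1)]

def map_path(T, seed):
    """Return the MAP estimate theta* after each of T observations."""
    rng = random.Random(seed)
    ones = zeros = 0
    path = []
    for _ in range(T):
        # Uniform prior weights cancel, so MAP = ML over this class.
        tmap = max(thetas, key=lambda th: th**ones * (1 - th)**zeros)
        path.append(tmap)
        if rng.random() < 0.5:   # true parameter theta0 = 1/2
            ones += 1
        else:
            zeros += 1
    return path

path = map_path(5000, seed=1)
# theta* sits at a wrong parameter 1/2 + 2^-k whenever the empirical
# frequency of ones drifts above 1/2; summed over all N parameters these
# excursions are what make the cumulative squared error grow like w0^-1 = N.
print(sorted(set(path)))
```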
MDL Is Not That Bad!

• The instantaneous loss bound is good; precisely, E(θ* − θ0)² ≤ (1/n) · O(ln w0^−1)
• This does not imply a finitely bounded cumulative loss! (Σ_n 1/n diverges)
• The cumulative loss bound is good for certain nice classes (parameters + weights)
• Intuitively: the bound is good if parameters of equal weight are uniformly distributed
Prepare Sharper Upper Bound

• Define an interval construction (I_k, J_k) which exponentially contracts to θ0
• Let K(I_k) be the shortest description length of some θ ∈ I_k

(Number-line illustration: θ0 = 1/4; J0 = [0, 1/2), I0 = [1/2, 1]; then I1, J1, … contract toward θ0.)
Sharper Upper Bound

• Let K(J_k) be the shortest description length of some θ ∈ J_k
• Let Δ(k) = max{K(I_k) − K(J_k), 0}
• Theorem: Σ_t E(θ* − θ0)² ≤ O(ln w0^−1 + Σ_{k=1}^∞ 2^−Δ(k) √Δ(k))
• Corollary: uniformly distributed weights ⇒ good bounds
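To get a feel for the series in the theorem, here is a quick numerical evaluation for two Δ(k) profiles; both profiles are assumptions chosen for illustration:

```python
import math

def series(delta, kmax=10_000):
    """Partial sum of sum_{k>=1} 2^-Delta(k) * sqrt(Delta(k))."""
    return sum(2.0 ** -delta(k) * math.sqrt(delta(k)) for k in range(1, kmax + 1))

# Uniformly spread weights: Delta(k) grows linearly, the series converges,
# and the theorem yields a good cumulative bound.
print(series(lambda k: k))      # ~1.35

# Delta(k) bounded: the terms do not vanish, the partial sums grow without
# bound, and the theorem gives nothing.
print(series(lambda k: 1))      # = kmax / 2
```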
The Universal Case

• Θ = {all computable θ ∈ [0, 1]}
• w_θ = 2^−K(θ), where K denotes the prefix Kolmogorov complexity
• Σ_k 2^−Δ(k) √Δ(k) = ∞ ⇒ the theorem is not applicable
• Conjecture: Σ_t E(θ* − θ0)² ≤ O(ln w0^−1 + Σ_{k=1}^∞ 2^−Δ(k))
• ⇒ a bound of (huge constant) × polynomial holds for incompressible θ0
• Compare to the deterministic case
Conclusions

• Cumulative and instantaneous bounds are incompatible
• The main positive result generalizes to arbitrary i.i.d. classes
• Open problem: good bounds for more general classes?
• Thank you!