TRANSCRIPT
IDSIA Lugano Switzerland
On the Convergence Speed of MDL Predictions for Bernoulli Sequences
Jan Poland and Marcus Hutter
or: Is MDL Really So Bad?
Big Picture
• MDL
• Bayes
• Other methods, e.g. PAC-Bayes
Bernoulli Classes
θ      w_θ     Code
0      1/4     00
1      1/4     01
1/2    1/4     10
1/4    1/16    1100
3/4    1/16    1101
1/8    1/64    111000
3/8    1/64    111001
5/8    1/64    111010
7/8    1/64    111011

Code structure (example 111010): 111 = 1 + #bits ones, 0 = stop bit, 10 = data bits

• Set of parameters Θ = {θ1, θ2, …} ⊂ [0, 1]
• Weights w_θ for each θ ∈ Θ
• Weights correspond to codes: w_θ = 2^−ℓ(Code_θ), where ℓ denotes the code length
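The code column can be reproduced programmatically. A minimal sketch, assuming the data bits are θ's binary expansion with the trailing 1 dropped (an inference from the table above, not stated explicitly on the slide):

```python
from fractions import Fraction

def encode(theta: Fraction) -> str:
    """Prefix code matching the table: a run of 1s of length 1 + #data-bits,
    a 0 as stop bit, then the data bits (theta in binary, final 1 dropped)."""
    assert 0 < theta < 1
    bits, x = [], theta
    while x != 0:                  # binary expansion of theta
        x *= 2
        bits.append(1 if x >= 1 else 0)
        if x >= 1:
            x -= 1
    assert bits[-1] == 1           # dyadic theta: expansion ends in a 1
    data = bits[:-1]               # drop the trailing 1
    return "1" * (1 + len(data)) + "0" + "".join(map(str, data))

def weight(theta: Fraction) -> Fraction:
    """w_theta = 2^-length(Code_theta)."""
    return Fraction(1, 2 ** len(encode(theta)))

print(encode(Fraction(1, 2)), weight(Fraction(1, 2)))   # 10 1/4
print(encode(Fraction(5, 8)), weight(Fraction(5, 8)))   # 111010 1/64
```

The special codes 00 and 01 for θ = 0 and θ = 1 from the table are not covered by this sketch.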
Estimators

• Given an observed sequence x = x1 x2 … xn
• Probability of x given θ: p_θ(x) = θ^#ones(x) (1 − θ)^(n − #ones(x))
• Posterior weights: w_θ(x) = w_θ p_θ(x) / Σ_θ' w_θ' p_θ'(x)
• Bayes mixture: ξ(x) = Σ_θ w_θ(x) θ
• MDL/MAP: θ*(x) = argmax_θ w_θ(x)
• Maximum Likelihood (ML): same as MAP, but with all prior weights set to 1
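These estimators can be sketched in code for a small finite class; the parameters and weights reuse the code table from the Bernoulli-classes slide, and the test sequence is illustrative:

```python
# Illustrative finite Bernoulli class: dyadic parameters with
# code-length weights w_theta = 2^-len(code), as in the code table.
thetas  = [0.0, 1.0, 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8]
weights = [1/4, 1/4, 1/4, 1/16, 1/16, 1/64, 1/64, 1/64, 1/64]

def likelihood(theta, x):
    """p_theta(x) = theta^#ones(x) * (1 - theta)^(n - #ones(x))."""
    ones = sum(x)
    return theta**ones * (1 - theta)**(len(x) - ones)

def posterior(x):
    """Posterior weights w_theta(x) = w_theta p_theta(x) / normalizer."""
    post = [w * likelihood(t, x) for w, t in zip(weights, thetas)]
    z = sum(post)
    return [p / z for p in post]

def bayes_mixture(x):
    """Bayes predictive probability of a 1: xi(x) = sum_theta w_theta(x) theta."""
    return sum(p * t for p, t in zip(posterior(x), thetas))

def map_estimate(x):
    """MDL/MAP estimator: theta*(x) = argmax_theta w_theta(x)."""
    post = posterior(x)
    return thetas[max(range(len(thetas)), key=post.__getitem__)]

def ml_estimate(x):
    """ML: same as MAP, but with all prior weights set to 1."""
    return max(thetas, key=lambda t: likelihood(t, x))

x = [0, 1, 0]
print(bayes_mixture(x), ml_estimate(x), map_estimate(x))
```

Note how the three estimators already disagree on a three-symbol sequence: ML tracks the empirical frequency, while MAP is pulled toward the heavily weighted θ = 1/2.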
An Example Process
Sequence x    Bayes mixture   ML estimate   MAP (MDL) θ*
0             0.21            0             0
01            0.5             0.5           0.5
010           0.45            0.34          0.5
0000011       0.4             5/16          0.5
...(32)...    0.27            0.25          0.25
...(640)...   0.3             5/16          5/16

True parameter θ0 = 5/16 = 0.3125
What We Know

• Let θ0 ∈ Θ be the true parameter, with weight w0
• ξ converges to θ0 almost surely and fast; precisely, Σ_{t=0}^∞ E(ξ − θ0)² ≤ ln(w0^−1)
• θ* converges to θ0 almost surely but in general slowly; precisely, Σ_{t=0}^∞ E(θ* − θ0)² ≤ O(w0^−1)
• This even holds for arbitrary non-i.i.d. (semi-)measures!
• The ML estimates converge to θ0 almost surely; no such assertion about the convergence speed is possible
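As a sanity check on the flavor of these two bounds, here is a small Monte-Carlo sketch; the class, weights, true parameter, horizon, and seeds are all illustrative choices, not from the talk:

```python
import random

# Illustrative dyadic Bernoulli class with code-length weights.
thetas  = [0.0, 1.0, 0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875]
weights = [0.25, 0.25, 0.25, 1/16, 1/16, 1/64, 1/64, 1/64, 1/64]
theta0 = 0.375  # true parameter, in the class, with weight w0 = 1/64

def run(T, seed):
    """Cumulative squared errors of the Bayes and MAP predictions over T steps."""
    rng = random.Random(seed)
    ones = zeros = 0
    sq_bayes = sq_map = 0.0
    for _ in range(T):
        post = [w * th**ones * (1 - th)**zeros for w, th in zip(weights, thetas)]
        z = sum(post)
        xi = sum(p * th for p, th in zip(post, thetas)) / z          # Bayes mixture
        tmap = thetas[max(range(len(thetas)), key=lambda i: post[i])]  # MAP
        sq_bayes += (xi - theta0) ** 2
        sq_map += (tmap - theta0) ** 2
        if rng.random() < theta0:   # sample the next symbol from Bernoulli(theta0)
            ones += 1
        else:
            zeros += 1
    return sq_bayes, sq_map

# Average cumulative squared errors over a few runs; both stay well below
# their respective bounds ln(w0^-1) = ln 64 ~ 4.16 and O(w0^-1) = O(64).
runs = [run(200, s) for s in range(20)]
avg_bayes = sum(r[0] for r in runs) / len(runs)
avg_map = sum(r[1] for r in runs) / len(runs)
print(avg_bayes, avg_map)
```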
Is MDL Really So Bad?

• The Bayes mixture bound is description length(θ0)
• The MDL bound is exp(description length(θ0))
• ⇒ MDL is exponentially worse in general
• This is also a loss bound!
• How about simple classes?
• Deterministic classes: one can show a bound of (huge constant) × (description length(θ0))³
• Simple stochastic classes, e.g. Bernoulli?
MDL Is Really So Bad!

Σ_t E(θ* − θ0)² = O(w0^−1) in the following example:

• N parameters, w_θ = 1/N for all θ, true parameter θ0 = 1/2
• The remaining parameters are 1/2 + 1/4, 1/2 + 1/8, 1/2 + 1/16, …
• Σ_t E(θ* − θ0)² · 1{θ* ∈ [1/2 + 1/8, 1/2 + 1/4]} = O(1)
• Σ_t E(θ* − θ0)² · 1{θ* ∈ [1/2 + 1/16, 1/2 + 1/8]} = O(1)
• … and so on: one O(1) contribution per parameter, N of them in total
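The construction can be sketched as follows; N, the horizon, and the seed are illustrative, and with uniform weights MAP coincides with ML on this class:

```python
import random

# Sketch of the lower-bound construction: theta0 = 1/2 together with
# wrong parameters 1/2 + 2^-k, all with uniform prior weight 1/N.
N = 8
thetas = [0.5] + [0.5 + 2.0**-k for k in range(2, N + 1)]

def map_path(T, seed):
    """Return the MAP estimate theta* after each of T observations."""
    rng = random.Random(seed)
    ones = zeros = 0
    path = []
    for _ in range(T):
        # Uniform prior weights cancel, so MAP = ML over this class.
        tmap = max(thetas, key=lambda th: th**ones * (1 - th)**zeros)
        path.append(tmap)
        if rng.random() < 0.5:   # true parameter theta0 = 1/2
            ones += 1
        else:
            zeros += 1
    return path

path = map_path(5000, seed=1)
# theta* sits at a wrong parameter 1/2 + 2^-k whenever the empirical
# frequency of ones drifts above 1/2; summed over all N parameters these
# excursions are what make the cumulative squared error grow like w0^-1 = N.
print(sorted(set(path)))
```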
MDL Is Not That Bad!

• The instantaneous loss bound is good; precisely, E(θ* − θ0)² ≤ (1/n) · O(ln w0^−1)
• This does not imply a finitely bounded cumulative loss! (Σ_n 1/n diverges)
• The cumulative loss bound is good for certain nice classes (parameters + weights)
• Intuitively: the bound is good if parameters of equal weight are uniformly distributed
Prepare Sharper Upper Bound

• Define an interval construction (I_k, J_k) which exponentially contracts to θ0
• Let K(I_k) be the shortest description length of some θ ∈ I_k

(Number-line illustration: θ0 = 1/4; J0 = [0, 1/2), I0 = [1/2, 1]; then I1, J1, … contract toward θ0.)
Sharper Upper Bound

• Let K(J_k) be the shortest description length of some θ ∈ J_k
• Let Δ(k) = max{K(I_k) − K(J_k), 0}
• Theorem: Σ_t E(θ* − θ0)² ≤ O(ln w0^−1 + Σ_{k=1}^∞ 2^−Δ(k) √Δ(k))
• Corollary: uniformly distributed weights ⇒ good bounds
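To get a feel for the series in the theorem, here is a quick numerical evaluation for two Δ(k) profiles; both profiles are assumptions chosen for illustration:

```python
import math

def series(delta, kmax=10_000):
    """Partial sum of sum_{k>=1} 2^-Delta(k) * sqrt(Delta(k))."""
    return sum(2.0 ** -delta(k) * math.sqrt(delta(k)) for k in range(1, kmax + 1))

# Uniformly spread weights: Delta(k) grows linearly, the series converges,
# and the theorem yields a good cumulative bound.
print(series(lambda k: k))      # ~1.35

# Delta(k) bounded: the terms do not vanish, the partial sums grow without
# bound, and the theorem gives nothing.
print(series(lambda k: 1))      # = kmax / 2
```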
The Universal Case

• Θ = {all computable θ ∈ [0, 1]}
• w_θ = 2^−K(θ), where K denotes the prefix Kolmogorov complexity
• Σ_k 2^−Δ(k) √Δ(k) = ∞ ⇒ the theorem is not applicable
• Conjecture: Σ_t E(θ* − θ0)² ≤ O(ln w0^−1 + Σ_{k=1}^∞ 2^−Δ(k))
• ⇒ a bound of (huge constant) × polynomial holds for incompressible θ0
• Compare to the deterministic case
Conclusions

• Cumulative and instantaneous bounds are incompatible
• The main positive result generalizes to arbitrary i.i.d. classes
• Open problem: good bounds for more general classes?
• Thank you!