PRML Bernoulli Beta distribution
1 / 12
PRML2.1.1a
M-1!W$S
May 23
The maximum likelihood case
2 / 12
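MLE for the Bernoulli distribution:
$$\mu_{ML} = \frac{m}{N} \quad (2.8)$$
In (2.8), $m$ is the number of observations with $x = 1$ (heads, in the coin example) and $N$ is the total number of observed data points.
• For the binomial distribution (2.9), if the values of $N$ and $m$ are known, maximizing $\mathrm{Bin}(m|N, \mu)$ gives the same estimate of $\mu$: $m/N$.
• Small data set → overfitting (see p. 70).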
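The overfitting point can be illustrated with a minimal Python sketch of (2.8). The function name `bernoulli_mle` and the three-heads data set are illustrative assumptions, not part of the slides.

```python
def bernoulli_mle(xs):
    """Maximum likelihood estimate of mu for binary data: m / N, as in (2.8)."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    m = sum(xs)  # number of observations with x = 1
    return m / n


# Hypothetical tiny data set: three tosses, all heads.
three_heads = [1, 1, 1]
mu_ml = bernoulli_mle(three_heads)
print(mu_ml)  # the estimate leaves no probability for tails
```

With so few observations the estimate sits at the extreme value, which is the overfitting behaviour that the Bayesian treatment is meant to soften.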
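The Bayesian case
3 / 12
To resolve the overfitting in a Bayesian way, a prior distribution $p(\mu)$ over $\mu$ is needed. ∵ Likelihood:
$$p(D|\mu) = \prod_{n=1}^{N} \mu^{x_n}(1-\mu)^{1-x_n} \quad (2.5)$$
Bayes' theorem:
$$p(\mu|D) \propto p(D|\mu)\,p(\mu)$$
From these two expressions, if the prior $p(\mu)$ is also proportional to powers of $\mu$ and $(1-\mu)$, the same functional form as the likelihood (2.5), then the posterior $p(\mu|D)$ has that functional form as well → conjugacy.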
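Beta distribution
4 / 12
$$\mathrm{Beta}(\mu|a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\mu^{a-1}(1-\mu)^{b-1} \quad (2.13)$$
In (2.13):
• $\Gamma(\cdot)$ is the gamma function, defined by
$$\Gamma(x) = \int_0^{\infty} u^{x-1} e^{-u}\,du \quad (1.141)$$
• $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ is the normalization coefficient, i.e. the constant that makes the following hold:
$$\int_0^1 \mathrm{Beta}(\mu|a, b)\,d\mu = 1 \quad (2.14)$$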
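As a rough numerical check of (2.13) and (2.14), the sketch below evaluates the density with `math.gamma` from the Python standard library and integrates it with a simple midpoint rule. The helper names `beta_pdf` and `integrate_midpoint`, and the choice a = 2, b = 3, are assumptions made for illustration.

```python
import math


def beta_pdf(mu, a, b):
    """Beta density (2.13), with the gamma-function normalization coefficient."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * mu ** (a - 1) * (1 - mu) ** (b - 1)


def integrate_midpoint(f, lo=0.0, hi=1.0, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Illustrative hyperparameters; the density is a smooth polynomial here.
total = integrate_midpoint(lambda mu: beta_pdf(mu, 2.0, 3.0))
print(total)  # should be very close to 1, as (2.14) requires
```

For a < 1 or b < 1 the density diverges at the endpoints, so a plain midpoint rule converges slowly there; the sketch is only meant for well-behaved cases.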
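Beta distribution
5 / 12
Mean and variance:
$$E[\mu] = \frac{a}{a+b} \quad (2.15)$$
$$\mathrm{var}[\mu] = \frac{ab}{(a+b)^2(a+b+1)} \quad (2.16)$$
Because $a$ and $b$ are parameters of the distribution of the parameter $\mu$, they are called hyperparameters.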
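Formulas (2.15) and (2.16) translate directly into code. The following is a minimal sketch; the function names are assumptions, and the (a, b) pairs are the ones used in the panels of Figure 2.2.

```python
def beta_mean(a, b):
    """Mean of Beta(mu | a, b), as in (2.15)."""
    return a / (a + b)


def beta_var(a, b):
    """Variance of Beta(mu | a, b), as in (2.16)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Hyperparameter pairs taken from the panels of Figure 2.2.
for a, b in [(0.1, 0.1), (1, 1), (2, 3), (8, 4)]:
    print(f"a={a}, b={b}: mean={beta_mean(a, b):.4f}, var={beta_var(a, b):.4f}")
```

Larger values of a + b give a smaller variance, which matches the narrower curves in the later panels.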
Figure 2.2: Shape of the Beta distribution for different values of a and b. [Four panels plotted against µ on [0, 1]: a = 0.1, b = 0.1; a = 1, b = 1; a = 2, b = 3; a = 8, b = 4.]
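Posterior distribution
6 / 12
From (2.9) (the definition of the binomial distribution) and (2.13) (prior: Beta distribution), the posterior distribution over $\mu$ takes the following form:
$$p(\mu|m, l, a, b) \propto \mu^{m+a-1}(1-\mu)^{l+b-1} \quad (2.17)$$
• $l = N - m$ is the number of tails in the coin example.
• The posterior has the same functional form as the prior → Beta distribution.
Using the form of the Beta distribution, the posterior can be written as
$$p(\mu|m, l, a, b) = \frac{\Gamma(m+a+l+b)}{\Gamma(m+a)\Gamma(l+b)}\,\mu^{m+a-1}(1-\mu)^{l+b-1} \quad (2.18)$$
$a$ and $b$ act as the effective number of observations of $x = 1$ and $x = 0$, respectively.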
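Posterior distribution
7 / 12
Figure 2.3: Sequential Bayesian inference. [Three panels plotted against µ on [0, 1]: prior, likelihood function, posterior.]
• The prior is a Beta distribution with a = 2, b = 2.
• The likelihood function is (2.9) with N = m = 1, in which case (2.9) reduces to $f(\mu) = \mu$.
In this case the posterior is a Beta distribution with a = 3, b = 2.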
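The conjugate update behind (2.18) amounts to adding the observed counts to the hyperparameters. A minimal sketch follows, with `posterior_params` as an assumed helper name, reproducing the Figure 2.3 example.

```python
def posterior_params(a, b, m, l):
    """Hyperparameters of the Beta posterior (2.18).

    a, b: prior hyperparameters; m, l: observed counts of x = 1 and x = 0.
    """
    return a + m, b + l


# Figure 2.3 example: prior Beta(2, 2), one observation with x = 1 (N = m = 1).
a_post, b_post = posterior_params(2, 2, m=1, l=0)
print(a_post, b_post)
```

The result is the Beta(3, 2) posterior stated on the slide.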
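Sequential approach
8 / 12
• Once the Bayesian viewpoint is adopted, the sequential approach arises naturally.
• It does not depend on the choice of prior or likelihood function; it only assumes that the observed data are i.i.d.
• In the sequential case, a small amount of data is used at a time and can be discarded afterwards → suited to large data sets and to real-time learning.
• Maximum likelihood estimation can also be carried out with a sequential approach.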
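One way to see the sequential approach at work is to update the Beta hyperparameters one observation at a time and compare with a single batch update. This is a sketch under the Beta-Bernoulli model of the earlier slides; the function names and the data stream are illustrative assumptions.

```python
def sequential_update(a, b, xs):
    """Update Beta hyperparameters one observation at a time.

    Each observation is folded into (a, b) and is not needed afterwards.
    """
    for x in xs:
        if x == 1:
            a += 1
        else:
            b += 1
    return a, b


def batch_update(a, b, xs):
    """Update Beta hyperparameters using all observations at once."""
    m = sum(xs)       # count of x = 1
    l = len(xs) - m   # count of x = 0
    return a + m, b + l


stream = [1, 0, 1, 1]  # hypothetical i.i.d. binary observations
print(sequential_update(2, 2, stream), batch_update(2, 2, stream))
```

Both routes give the same posterior, so the posterior after each step can serve as the prior for the next.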
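Predictive distribution
9 / 12
Predict the outcome of the next trial:
$$p(x = 1|D) = \int_0^1 p(x = 1|\mu)\,p(\mu|D)\,d\mu = \int_0^1 \mu\,p(\mu|D)\,d\mu = E[\mu|D] \quad (2.19)$$
Using the results (2.18) (posterior distribution) and (2.15) (mean of the Beta distribution):
$$p(x = 1|D) = \frac{m+a}{m+a+l+b} \quad (2.20)$$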
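Equation (2.20) is a one-line computation. The sketch below contrasts it with the maximum likelihood estimate on a small data set; `predictive_prob_one`, the Beta(2, 2) prior and the three-heads data are assumptions chosen for illustration.

```python
def predictive_prob_one(a, b, m, l):
    """Predictive probability p(x = 1 | D) from (2.20)."""
    return (m + a) / (m + a + l + b)


# Hypothetical data: three heads, no tails, with a Beta(2, 2) prior.
m, l = 3, 0
p_bayes = predictive_prob_one(2, 2, m, l)
p_ml = m / (m + l)
print(p_bayes, p_ml)
```

The Bayesian prediction stays strictly below 1 because the prior's effective observations contribute to both counts.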
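Predictive distribution
10 / 12
If $m, l \to \infty$,
$$p(x = 1|D) \to \frac{m}{m+l} = \frac{m}{N}$$
which agrees with the maximum likelihood result (2.8).
• In the limit of an infinitely large data set, the Bayesian and maximum likelihood results agree.
• For a finite data set, the posterior mean $E[\mu|D]$ lies between the prior mean and the maximum likelihood estimate (2.7). For example, when the prior mean is below the maximum likelihood estimate:
$$E[\mu] < E[\mu|D] < \mu_{ML}$$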
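The ordering above and the large-data limit can be checked numerically. This is a sketch assuming a Beta(2, 2) prior and hypothetical data sets in which 80% of the observations are x = 1; none of these numbers come from the slides.

```python
def posterior_mean(a, b, m, l):
    """Posterior mean E[mu | D] for a Beta(a, b) prior and counts m, l."""
    return (m + a) / (m + a + l + b)


a, b = 2, 2                 # assumed prior; its mean is 0.5
prior_mean = a / (a + b)
mu_ml = 0.8                 # ML estimate for every data set below

means = []
for n in (10, 100, 1000):
    m = 8 * n // 10         # 80% of the observations are x = 1
    l = n - m
    means.append(posterior_mean(a, b, m, l))

print(prior_mean, means, mu_ml)
```

Each posterior mean falls between the prior mean and the maximum likelihood estimate, and moves toward the latter as N grows.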
Posterior distribution
11 / 12
Figure 2.2 (repeated): Shape of the Beta distribution for different values of a and b.
As the number of observations increases, the posterior distribution becomes more sharply peaked.
$$\mathrm{var}[\mu] = \frac{ab}{(a+b)^2(a+b+1)} \quad (2.16)$$
If $a \to \infty$ or $b \to \infty$, then $\mathrm{var}[\mu] \to 0$.
Is this a general property of Bayesian learning? That is, as the number of observed data points increases, does the uncertainty in the posterior distribution decrease?
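Proof
12 / 12
Conjecture: a general property of Bayesian learning is that, as the number of observed data points increases, the uncertainty in the posterior distribution decreases.
Parameter: θ; observed data: D; joint distribution: p(θ, D)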
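$$E_\theta[\theta] = E_D[E_\theta[\theta|D]] \quad (2.21)$$
Averaging the posterior mean $E_\theta[\theta|D]$ over the distribution $p(D)$ that generates the data gives the prior mean $E_\theta[\theta]$.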
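$$\mathrm{var}_\theta[\theta] = E_D[\mathrm{var}_\theta[\theta|D]] + \mathrm{var}_D[E_\theta[\theta|D]] \quad (2.24)$$
∵ The variance of the posterior mean satisfies $\mathrm{var}_D[E_\theta[\theta|D]] \ge 0$, so the posterior variance averaged over $p(D)$ is no larger than the prior variance.
Note: it is the posterior variance averaged over $p(D)$ that is bounded by the prior variance; the result holds on average, not for every individual data set.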
Proofs of (2.21) and (2.24)
Beta distribution
13 / 12
Proof of (2.21)
14 / 12
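$$E_\theta[\theta] = E_D[E_\theta[\theta|D]] \quad (2.21)$$
Proof:
$$\begin{aligned}
E_D[E_\theta[\theta|D]] &\overset{(2.23)}{\equiv} \int \left\{ \int \theta\,p(\theta|D)\,d\theta \right\} p(D)\,dD \\
&= \int \theta \left\{ \int p(\theta|D)\,p(D)\,dD \right\} d\theta \\
&= \int p(\theta)\,\theta\,d\theta \overset{(2.22)}{\equiv} E_\theta[\theta]
\end{aligned}$$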
Proof of (2.24)
15 / 12
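$$\mathrm{var}_\theta[\theta] = E_D[\mathrm{var}_\theta[\theta|D]] + \mathrm{var}_D[E_\theta[\theta|D]] \quad (2.24)$$
Proof:
$$\mathrm{var}_\theta[\theta] = E_\theta\big[(\theta - E_\theta[\theta])^2\big] = E_D\Big[E_\theta\big[(\theta - E_\theta[\theta])^2 \,\big|\, D\big]\Big] \quad (1)$$
$$\begin{aligned}
E_\theta\big[(\theta - E_\theta[\theta])^2 \,\big|\, D\big] &= E_\theta\big[(\theta - E_\theta[\theta|D] + E_\theta[\theta|D] - E_\theta[\theta])^2 \,\big|\, D\big] \\
&= E_\theta\big[(\theta - E_\theta[\theta|D])^2 \,\big|\, D\big] + 2E_\theta\big[(\theta - E_\theta[\theta|D])(E_\theta[\theta|D] - E_\theta[\theta]) \,\big|\, D\big] + (E_\theta[\theta|D] - E_\theta[\theta])^2 \\
&= \mathrm{var}_\theta[\theta|D] + 0 + \big(E_\theta[\theta|D] - E_D[E_\theta[\theta|D]]\big)^2 \quad (2)
\end{aligned}$$
The second term is 0 because, given $D$, the factor $E_\theta[\theta|D] - E_\theta[\theta]$ is a constant and $E_\theta[\theta - E_\theta[\theta|D] \mid D] = 0$. The last term uses (2.21) to replace $E_\theta[\theta]$ with $E_D[E_\theta[\theta|D]]$. Substituting (2) into (1) and noting that the average over $D$ of the last term is $\mathrm{var}_D[E_\theta[\theta|D]]$ gives (2.24).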
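The identities (2.21) and (2.24) can also be checked exactly for the Beta-Bernoulli model of the earlier slides. In this sketch the data set D of N tosses is summarized by the count m, whose marginal distribution p(D) is beta-binomial. The function names and the values a = 2, b = 3, N = 10 are assumptions made for illustration.

```python
import math


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(m, n, a, b):
    """p(D): probability of m ones in n trials with mu integrated out."""
    log_p = (math.log(math.comb(n, m))
             + log_beta_fn(m + a, n - m + b) - log_beta_fn(a, b))
    return math.exp(log_p)


def beta_mean(a, b):
    return a / (a + b)


def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


def check_identities(a, b, n):
    """Return E_D[posterior mean], E_D[posterior var], var_D[posterior mean]."""
    probs = [beta_binomial_pmf(m, n, a, b) for m in range(n + 1)]
    post_means = [beta_mean(a + m, b + n - m) for m in range(n + 1)]
    post_vars = [beta_var(a + m, b + n - m) for m in range(n + 1)]
    mean_of_means = sum(p * e for p, e in zip(probs, post_means))
    mean_of_vars = sum(p * v for p, v in zip(probs, post_vars))
    var_of_means = sum(p * (e - mean_of_means) ** 2
                       for p, e in zip(probs, post_means))
    return mean_of_means, mean_of_vars, var_of_means


a, b, n = 2.0, 3.0, 10
mean_of_means, mean_of_vars, var_of_means = check_identities(a, b, n)
print(mean_of_means, beta_mean(a, b))                # (2.21): both sides
print(mean_of_vars + var_of_means, beta_var(a, b))   # (2.24): both sides
```

The averaged posterior variance comes out below the prior variance, while individual values of m can still give a posterior variance above or below that average.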