Computation of the marginal likelihood
DESCRIPTION
First talk at the BigMC seminar on 06/01/2011 (Institut Henri Poincaré, Paris), by Jean-Louis Foulley, INRA, on "Computation of the marginal likelihood".
TRANSCRIPT
Computation of the marginal likelihood:
brief summary and method of power posteriors
JLF/BigMC 06/01/2011
Jean-Louis Foulley
Outline
Objectives
Brief summary of current methods
  - Direct Monte Carlo
  - Harmonic mean
  - Generalized harmonic mean
  - Chib
  - Bridge sampling
  - Nested sampling
Power Posteriors
  - Relationship with fractional BF
  - Algorithm
  - Examples
Conclusion
Objectives
Marginal likelihood ("Prior predictive", "Evidence")
- Normalization constant of $\pi^*(\theta|y)$:
$$m(y) = \int_\Theta f(y|\theta)\,\pi(\theta)\,d\theta, \qquad \pi(\theta|y) = \frac{\pi^*(\theta|y)}{m(y)} \quad\text{where } \pi^*(\theta|y) = f(y|\theta)\,\pi(\theta)$$
- Component of the Bayes factor:
$$BF_{12} = \frac{\pi(M_1|y)/\pi(M_2|y)}{\pi(M_1)/\pi(M_2)} = \frac{m_1(y)}{m_2(y)}$$
$$\Delta D_{m,12} = -2\ln BF_{12} = D_{m,1} - D_{m,2}, \qquad D_{m,j} = -2\ln m_j(y): \text{marginal deviance}$$
Calibration: Jeffreys & Turing (deciban: $10\log_{10} BF$)
Methods/Monte Carlo, Harmonic Mean
1) Direct Monte Carlo
$$\hat m_{MC}(y) = \frac{1}{G}\sum_{g=1}^{G} f(y|\theta^{(g)}), \qquad \theta^{(1)},\dots,\theta^{(G)}: \text{draws from } \pi(\theta)$$
Converges (a.s.) to $m(y)$ but very inefficient: many samples fall outside the regions of high likelihood.
2) Harmonic mean (Newton & Raftery, 1994)
$$\hat m_{NR}(y) = \left[\frac{1}{G}\sum_{g=1}^{G} \frac{1}{f(y|\theta^{(g)})}\right]^{-1}, \qquad \theta^{(1)},\dots,\theta^{(G)}: \text{draws from } \pi(\theta|y)$$
A special case of WIS (weighted importance sampling): $\hat m(y) = \sum_{j=1}^{J} w_j f(y|\theta^{(j)}) / \sum_{j=1}^{J} w_j$, where $w_j \propto \pi(\theta^{(j)})/g(\theta^{(j)})$ for $g(\theta) \propto \pi(\theta|y) \propto f(y|\theta)\,\pi(\theta)$.
Converges (a.s.) but very unstable (infinite variance): to be absolutely avoided.
"Worst Monte Carlo Method Ever" Radford Neal (2010)
The harmonic mean is not really affected by a change in prior, while the true marginal is highly sensitive to the prior.
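The two estimators above can be tried on a toy case where $m(y)$ is known in closed form. The sketch below is only an illustration under assumed settings: the data vector, the conjugate model $y_i \sim N(\theta,1)$, $\theta \sim N(\mu,\tau^2)$, the function names and the number of draws are all hypothetical choices, not taken from the talk.

```python
import math
import random

random.seed(1)

# Hypothetical toy data: y_i ~ N(theta, 1), prior theta ~ N(mu, tau2)
y = [0.3, -0.4, 1.1, 0.8, 0.2, 1.5, -0.1, 0.9]
N = len(y)
ybar = sum(y) / N
s2 = sum((v - ybar) ** 2 for v in y) / N


def log_lik(theta):
    """log f(y | theta) for the unit-variance normal model."""
    return -0.5 * N * math.log(2 * math.pi) - 0.5 * sum((v - theta) ** 2 for v in y)


def exact_log_m(mu, tau2):
    """Closed-form log marginal likelihood of the conjugate model."""
    d = (N * math.log(2 * math.pi) + N * s2 + math.log(1 + N * tau2)
         + N * (ybar - mu) ** 2 / (1 + N * tau2))
    return -0.5 * d


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def direct_mc(mu, tau2, G):
    """Direct Monte Carlo: average the likelihood over prior draws."""
    draws = [random.gauss(mu, math.sqrt(tau2)) for _ in range(G)]
    return log_mean_exp([log_lik(t) for t in draws])


def harmonic_mean(mu, tau2, G):
    """Harmonic mean: average 1/likelihood over (here exact) posterior draws."""
    post_var = 1.0 / (N + 1.0 / tau2)
    post_mean = post_var * (N * ybar + mu / tau2)
    draws = [random.gauss(post_mean, math.sqrt(post_var)) for _ in range(G)]
    return -log_mean_exp([-log_lik(t) for t in draws])


for tau2 in (1.0, 100.0):
    print(tau2, exact_log_m(0.0, tau2), direct_mc(0.0, tau2, 20000),
          harmonic_mean(0.0, tau2, 20000))
```

Comparing the printed columns for the two prior variances gives a feel for the last remark on the slide: the exact value moves with $\tau^2$, whereas the harmonic mean output typically moves much less.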
Methods/Gelfand&Dey & Chib
3) Generalized harmonic mean (Gelfand & Dey, 1994; Chen & Shao, 1997)
$$\hat m_{GD}(y) = \left[\frac{1}{G}\sum_{g=1}^{G} \frac{g(\theta^{(g)})}{f(y|\theta^{(g)})\,\pi(\theta^{(g)})}\right]^{-1}, \qquad \theta^{(1)},\dots,\theta^{(G)}: \text{draws from } \pi(\theta|y)$$
$g(\cdot)$ as an approximation of the posterior: problems in large dimensions.
4) Chib's method (1995)
$$\ln m(y) = \ln f(y|\theta) + \ln\pi(\theta) - \ln\pi(\theta|y), \quad \forall\theta$$
$$\ln \hat m_{SC}(y) = \ln f(y|\theta^*) + \ln\pi(\theta^*) - \ln\hat\pi(\theta^*|y)$$
$\theta^*$ selected, e.g. ML, MAP, $E(\theta|y)$; $\hat\pi(\theta^*|y)$ to be estimated.
Simple & often effective.
Chib(Cont.)
4) Chib (1995)
$$\ln \hat m_{SC}(y) = \ln f(y|\theta^*) + \ln\pi(\theta^*) - \ln\hat\pi(\theta^*|y)$$
a) Gibbs & Rao-Blackwellization (Chib, 1995)
b) Metropolis-Hastings (Chib & Jeliazkov, 2001)
c) Kernel estimator (Chen, 1994)
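Chib's identity holds for every $\theta$, which can be checked numerically when the posterior ordinate is available exactly. The following minimal sketch assumes a hypothetical conjugate model ($y_i \sim N(\theta,1)$, $\theta \sim N(\mu,\tau^2)$) and made-up data; in real applications $\pi(\theta^*|y)$ is not known and has to be estimated by one of the routes a) to c).

```python
import math

# Hypothetical toy data: y_i ~ N(theta, 1), prior theta ~ N(mu, tau2)
y = [0.3, -0.4, 1.1, 0.8, 0.2, 1.5, -0.1, 0.9]
N = len(y)
ybar = sum(y) / N


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def chib_log_marginal(theta_star, mu, tau2):
    """ln m(y) = ln f(y|theta*) + ln pi(theta*) - ln pi(theta*|y)."""
    post_var = 1.0 / (N + 1.0 / tau2)
    post_mean = post_var * (N * ybar + mu / tau2)
    log_f = sum(log_normal_pdf(v, theta_star, 1.0) for v in y)
    log_prior = log_normal_pdf(theta_star, mu, tau2)
    # Exact posterior ordinate (available here only because the model is conjugate)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_f + log_prior - log_post


# The value should not depend on the point theta* at which it is evaluated
print(chib_log_marginal(0.0, 0.0, 4.0), chib_log_marginal(0.6, 0.0, 4.0))
```

In practice a high-density point such as the posterior mean is preferred for $\theta^*$ because the estimate of the posterior ordinate is most accurate there.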
Chib via Gibbs
If $\theta = (\theta_1, \theta_2)$:
$$\pi(\theta_1,\theta_2|y) = \underbrace{\pi(\theta_1|y,\theta_2)}_{\text{known}}\;\underbrace{\pi(\theta_2|y)}_{\text{estimated}}$$
$$\pi(\theta_2|y) = \int \underbrace{\pi(\theta_2|y,\theta_1)}_{\text{known}}\;\underbrace{\pi(\theta_1|y)}_{\text{MCMC draws}}\,d\theta_1$$
"Estimation by Rao-Blackwellization":
$$\hat\pi(\theta_2^*|y) = \frac{1}{G}\sum_{g=1}^{G} \pi(\theta_2^*|y,\theta_1^{(g)}), \qquad \theta_1^{(1)},\dots,\theta_1^{(G)}: \text{draws from } \pi(\theta_1|y)$$
Bridge sampling
5) Bridge sampling (Meng & Wong, 1996)
$$m(y) = \frac{\int f(y|\theta)\,\pi(\theta)\,\alpha(\theta)\,g(\theta)\,d\theta}{\int \alpha(\theta)\,g(\theta)\,\pi(\theta|y)\,d\theta}$$
Bridge sampling/cont.
5) Bridge sampling (Meng & Wong, 1996)
$\alpha(\theta)$: "bridge function"; $g(\theta)$: density to be calibrated
$$m(y) = \frac{\int \alpha(\theta)\,f(y|\theta)\,\pi(\theta)\,g(\theta)\,d\theta}{\int \alpha(\theta)\,g(\theta)\,\pi(\theta|y)\,d\theta} = \frac{E_{g}\left[\alpha(\theta)\,f(y|\theta)\,\pi(\theta)\right]}{E_{\pi(\theta|y)}\left[\alpha(\theta)\,g(\theta)\right]}$$
$\theta^{(l)}$, $l=1,\dots,L$: draws from $g(\theta)$; $\theta^{(m)}$, $m=1,\dots,M$: draws from $\pi(\theta|y)$
For $\alpha = 1/g$ (importance sampling, IS):
$$\hat m_{BS1}(y) = \frac{1}{L}\sum_{l=1}^{L} \frac{f(y|\theta^{(l)})\,\pi(\theta^{(l)})}{g(\theta^{(l)})}$$
For $\alpha = 1/[f(y|\theta)\,\pi(\theta)]$: Gelfand-Dey (1994)
$$\hat m_{BS2}(y) = \left[\frac{1}{M}\sum_{m=1}^{M} \frac{g(\theta^{(m)})}{f(y|\theta^{(m)})\,\pi(\theta^{(m)})}\right]^{-1}$$
Lopes-West (2004):
$$\hat m_{BS3}(y) = \frac{L^{-1}\sum_{l=1}^{L}\left[f(y|\theta^{(l)})\,\pi(\theta^{(l)})/g(\theta^{(l)})\right]^{1/2}}{M^{-1}\sum_{m=1}^{M}\left[g(\theta^{(m)})/\{f(y|\theta^{(m)})\,\pi(\theta^{(m)})\}\right]^{1/2}}$$
Bridge sampling (cont.)
5) Bridge sampling (Meng & Wong, 1996)
$$m(y) = \frac{\int \alpha(\theta)\,f(y|\theta)\,\pi(\theta)\,g(\theta)\,d\theta}{\int \alpha(\theta)\,g(\theta)\,\pi(\theta|y)\,d\theta} = \frac{E_{g}\left[\alpha(\theta)\,f(y|\theta)\,\pi(\theta)\right]}{E_{\pi(\theta|y)}\left[\alpha(\theta)\,g(\theta)\right]}$$
$\theta^{(l)}$, $l=1,\dots,L$: draws from $g(\theta)$ (cf. numerator); $\theta^{(m)}$, $m=1,\dots,M$: draws from $\pi(\theta|y)$ (cf. denominator)
For $\alpha = 1/[\dots]$: $\hat m_{BS4}(y)$ (Lopes & West, 2004; Ando, 2010)
For $\alpha(\theta) \propto \left[s_L\,g(\theta) + s_M\,\pi(\theta|y)\right]^{-1}$ with $s_M = 1 - s_L = M/(M+L)$: optimum estimator with respect to E(RMSE) (Meng & Wong, 1996; Lopes & West, 2004; Frühwirth-Schnatter, 2004)
$$\hat m_{BS5}^{\,t+1}(y) = \frac{L^{-1}\sum_{l=1}^{L} f(y|\theta^{(l)})\,\pi(\theta^{(l)}) \big/ \left[s_L\,g(\theta^{(l)}) + s_M\,\hat\pi^{t}(\theta^{(l)}|y)\right]}{M^{-1}\sum_{m=1}^{M} g(\theta^{(m)}) \big/ \left[s_L\,g(\theta^{(m)}) + s_M\,\hat\pi^{t}(\theta^{(m)}|y)\right]}$$
where $\hat\pi^{t}(\theta|y) = f(y|\theta)\,\pi(\theta)/\hat m_{BS5}^{\,t}(y)$ and $\hat m_{BS5}^{\,0} = \hat m_{BS1}$ or $\hat m_{BS2}$
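The iterative optimal bridge estimator can be sketched on a toy problem with a known answer. Everything specific below is an assumption made for illustration: the data, the conjugate normal model (so that posterior draws are exact), the choice of $g$ as a normal centred on the posterior mean with doubled variance, and the number of iterations.

```python
import math
import random

random.seed(3)

# Hypothetical toy data: y_i ~ N(theta, 1), prior theta ~ N(MU, TAU2)
y = [0.3, -0.4, 1.1, 0.8, 0.2, 1.5, -0.1, 0.9]
N = len(y)
ybar = sum(y) / N
s2 = sum((v - ybar) ** 2 for v in y) / N
MU, TAU2 = 0.0, 4.0
POST_VAR = 1.0 / (N + 1.0 / TAU2)
POST_MEAN = POST_VAR * (N * ybar + MU / TAU2)
G_VAR = 2.0 * POST_VAR  # assumed calibration of g: posterior with inflated variance


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def q(theta):
    """Unnormalized posterior f(y|theta) * pi(theta)."""
    log_f = sum(log_normal_pdf(v, theta, 1.0) for v in y)
    return math.exp(log_f + log_normal_pdf(theta, MU, TAU2))


def g(theta):
    return math.exp(log_normal_pdf(theta, POST_MEAN, G_VAR))


def exact_log_m():
    d = (N * math.log(2 * math.pi) + N * s2 + math.log(1 + N * TAU2)
         + N * (ybar - MU) ** 2 / (1 + N * TAU2))
    return -0.5 * d


def bridge_estimate(L=4000, M=4000, n_iter=20):
    """Iterate the optimal bridge update, starting from the IS estimate."""
    g_draws = [random.gauss(POST_MEAN, math.sqrt(G_VAR)) for _ in range(L)]
    p_draws = [random.gauss(POST_MEAN, math.sqrt(POST_VAR)) for _ in range(M)]
    s_l, s_m = L / (L + M), M / (L + M)
    qg = [(q(t), g(t)) for t in g_draws]   # evaluated at draws from g
    qp = [(q(t), g(t)) for t in p_draws]   # evaluated at posterior draws
    m = sum(qv / gv for qv, gv in qg) / L  # starting value: importance sampling
    for _ in range(n_iter):
        num = sum(qv / (s_l * gv + s_m * qv / m) for qv, gv in qg) / L
        den = sum(gv / (s_l * gv + s_m * qv / m) for qv, gv in qp) / M
        m = num / den
    return m


print(math.log(bridge_estimate()), exact_log_m())
```

The fixed point of the update is $m(y)$ itself, since at the true value the denominator equals the numerator divided by $m(y)$; the iteration count used here is arbitrary.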
Nested sampling
6) Nested sampling (Skilling, 2006; Murray et al, 2006; Chopin & Robert, 2010)
$$Z = m(y) = \int f(y|\theta)\,\pi(\theta)\,d\theta = E_{\pi}\left[L(\theta)\right]$$
Let $x = \varphi^{-1}(l) = \Pr\left(L(\theta) > l\right)$ be the survival function of the rv $L(\theta)$,
where $l = \varphi(x)$ is the (upper tail) quantile function of $L(\theta)$, so that $\varphi^{-1}(L(\theta)) \sim U(0,1)$ for $\theta \sim \pi$.
Then $Z = \int_0^1 \varphi(x)\,dx$ = area under the curve $l = \varphi(x)$, and
$$\hat Z = \sum_{i=1}^{m} \Delta x_i\, l_i \quad\text{with } \Delta x_i = x_{i-1} - x_i \text{ or } \tfrac12\left(x_{i-1} - x_{i+1}\right) \text{ if trapezoidal integration}$$
Nested sampling/Cont.
1) Draw $N$ points $\theta_{1,i}$, $i=1,\dots,N$, from the prior; set $\theta_1 = \mathrm{Argmin}_{i}\, L(\theta_{1,i})$ and $l_1 = L(\theta_1)$
2) Obtain $N$ points $\theta_{2,i}$ by repeating the $\theta_{1,i}$, except $\theta_1$ which is replaced by a draw from the prior constrained by $L(\theta) > l_1$; record $\theta_2 = \mathrm{Argmin}_{i}\, L(\theta_{2,i})$ and set $l_2 = L(\theta_2)$
3) Repeat 1 & 2 until a stopping rule is met (change in max of $L$ $\le \varepsilon$)
Since $x_i = \varphi^{-1}(l_i)$ is unknown, set
a) deterministic: $x_i = \exp(-i/N)$, so that $\ln x_i = E\left[\ln \varphi^{-1}(l_i)\right]$
or b) random: $x_{i+1} = t_i\,x_i$ with $x_0 = 1$, $t_i \sim Be(N,1)$
Main difficulty: sampling from the prior constrained by $L(\theta) > l_i$?
See Chopin & Robert (2010), Extended Importance Sampling scheme:
$$\hat Z = \sum_{i=1}^{m} \Delta x_i\,\varphi_i\,w(\theta_i) \quad\text{with } w(\theta) = \frac{\pi(\theta)\,L(\theta)}{\tilde\pi(\theta)\,\tilde L(\theta)}$$
where $\tilde\pi$ and $\tilde L$ denote the instrumental prior and likelihood used for the sampling.
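Steps 1) to 3) with the deterministic rule $x_i = \exp(-i/N)$ can be sketched on a one-dimensional problem where the constrained prior can be sampled exactly. The setting is entirely hypothetical (uniform prior on $[0,1]$, a Gaussian-shaped likelihood centred at 0.5, fixed number of iterations instead of the stopping rule), chosen so that $Z$ is known in closed form.

```python
import math
import random

random.seed(2)

SIGMA = 0.05  # assumed width of the toy likelihood


def lik(theta):
    """Toy likelihood L(theta), maximal at theta = 0.5."""
    return math.exp(-0.5 * ((theta - 0.5) / SIGMA) ** 2)


def draw_constrained(l_min):
    """Draw from the U(0,1) prior constrained by L(theta) > l_min.

    Here the constraint is the interval |theta - 0.5| < r, so the draw is exact.
    """
    r = min(0.5, SIGMA * math.sqrt(-2.0 * math.log(l_min)))
    return 0.5 + r * (2.0 * random.random() - 1.0)


def nested_sampling(n_live=200, n_iter=2000):
    live = [random.random() for _ in range(n_live)]       # step 1: prior draws
    live_l = [lik(t) for t in live]
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        k = min(range(n_live), key=live_l.__getitem__)    # worst live point
        l_i = live_l[k]
        x_i = math.exp(-i / n_live)                       # deterministic x_i
        z += (x_prev - x_i) * l_i                         # Delta x_i * l_i
        x_prev = x_i
        live[k] = draw_constrained(l_i)                   # step 2: replace it
        live_l[k] = lik(live[k])
    z += x_prev * sum(live_l) / n_live                    # remaining prior mass
    return z


exact = SIGMA * math.sqrt(2 * math.pi) * math.erf(0.5 / (SIGMA * math.sqrt(2)))
print(nested_sampling(), exact)
```

The exact constrained draw is what makes this toy case easy; in realistic models that step is the main difficulty noted on the slide.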
Power Posteriors/basic principle
Method due to Friel & Pettitt (2008)
Lartillot & Philippe (2006) "Annealing-Melting"
Power posterior defined as
$$\pi(\theta|y,t) = \frac{f(y|\theta)^{t}\,\pi(\theta)}{z_t(y)} \quad\text{where } z_t(y) = \int f(y|\theta)^{t}\,\pi(\theta)\,d\theta$$
and $t \in [0,1]$, with $t^{-1}$ equivalent to a "physical temperature"
0 to 1: cooling down or "annealing"; 1 to 0: "melting"
Notice the path sampling scheme (Gelman & Meng, 1998):
$$\pi(\theta|y,t=0) = \pi(\theta) \text{ with } z_0(y) = 1$$
$$\pi(\theta|y,t=1) = \pi(\theta|y) \text{ with } z_1(y) = m(y)$$
PP/key result
$$\log m(y) = \int_0^1 E_{\theta|y,t}\left[\log f(y|\theta)\right]dt$$
where $\theta|y,t$ has density
$$\pi(\theta|y,t) = \frac{f(y|\theta)^{t}\,\pi(\theta)}{z_t(y)}$$
Thermodynamic integration (end of the 70's)
Ripley (1988), Ogata (1989), Neal (1993)
"Path sampling" (Gelman & Meng, 1998)
PP formula/proof as a special case of path sampling
If $p(\theta|t) = q(\theta|t)/z(t)$ where $z(t) = \int q(\theta|t)\,d\theta$
Label $U(\theta,t) = \frac{d}{dt}\ln q(\theta|t)$ as the potential
One has
$$\ln\frac{z(1)}{z(0)} = \int_0^1 E_{\theta|t}\left[U(\theta,t)\right]dt$$
Here $p(\theta|t) = \pi(\theta|y,t)$; $q(\theta|t) = f(y|\theta)^{t}\,\pi(\theta)$
Then $U(\theta,t) = \ln f(y|\theta)$
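The intermediate step behind the path sampling identity can be written out as follows. This is a sketch that assumes differentiation under the integral sign is permitted and that the prior is proper, so that $z(0) = 1$.

```latex
\begin{align*}
\frac{d}{dt}\ln z(t)
  &= \frac{1}{z(t)}\int \frac{\partial q(\theta|t)}{\partial t}\,d\theta
   = \int \frac{\partial \ln q(\theta|t)}{\partial t}\,\frac{q(\theta|t)}{z(t)}\,d\theta
   = E_{\theta|t}\left[U(\theta,t)\right],\\
\ln\frac{z(1)}{z(0)}
  &= \int_0^1 \frac{d}{dt}\ln z(t)\,dt
   = \int_0^1 E_{\theta|t}\left[U(\theta,t)\right]dt,\\
q(\theta|t) = f(y|\theta)^{t}\,\pi(\theta)
  &\;\Rightarrow\; U(\theta,t) = \ln f(y|\theta),\quad z(0) = 1,\quad z(1) = m(y)
  \;\Rightarrow\; \ln m(y) = \int_0^1 E_{\theta|y,t}\left[\ln f(y|\theta)\right]dt .
\end{align*}
```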
PP/Example
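$$y_i|\theta \sim_{iid} N(\theta, 1),\ i = 1,\dots,N; \qquad \theta \sim N(\mu, \tau^2)$$
Then $\theta|y,t \sim N(\mu_t, \tau_t^2)$ with
$$\mu_t = \frac{N t\,\bar y + \tau^{-2}\mu}{N t + \tau^{-2}}, \qquad \tau_t^2 = \frac{1}{N t + \tau^{-2}}$$
$$\bar D_\theta(t) = -2\,E_{\theta|y,t}\left[\log f(y|\theta)\right] = N\log 2\pi + N s^2 + \frac{N(\bar y - \mu)^2}{(1 + N t\,\tau^2)^2} + \frac{N}{N t + \tau^{-2}}$$
$$\bar y = N^{-1}\sum_{i=1}^{N} y_i; \qquad s^2 = N^{-1}\sum_{i=1}^{N}(y_i - \bar y)^2$$
$$D_m = -2\log m(y) = \int_0^1 \bar D_\theta(t)\,dt = N\log 2\pi + N s^2 + \log(1 + N\tau^2) + \frac{N(\bar y - \mu)^2}{1 + N\tau^2}$$
High sensitivity to $\tau^2$: as $\tau^2 \to \infty$, $\bar D_\theta(0) \to \infty$ and $D_m \to \infty$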
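The closed forms of this example make it easy to see how the integral over $t$ behaves under discretization. The sketch below assumes hypothetical summary statistics ($N$, $\bar y$, $s^2$) and a temperature grid $t_i = (i/n)^c$; it applies the trapezoidal rule to the expected deviance and compares the result with the exact marginal deviance.

```python
import math


def expected_deviance(t, N, ybar, s2, mu, tau2):
    """-2 E[log f(y|theta)] under the power posterior at temperature t."""
    tau2_t = 1.0 / (N * t + 1.0 / tau2)
    mu_t = tau2_t * (N * t * ybar + mu / tau2)
    return N * math.log(2 * math.pi) + N * s2 + N * ((ybar - mu_t) ** 2 + tau2_t)


def marginal_deviance_exact(N, ybar, s2, mu, tau2):
    """Closed-form D_m = -2 log m(y) for the normal-normal model."""
    return (N * math.log(2 * math.pi) + N * s2 + math.log(1 + N * tau2)
            + N * (ybar - mu) ** 2 / (1 + N * tau2))


def marginal_deviance_pp(N, ybar, s2, mu, tau2, n=100, c=4):
    """Trapezoidal rule on the grid t_i = (i/n)^c, denser near t = 0."""
    temps = [(i / n) ** c for i in range(n + 1)]
    vals = [expected_deviance(t, N, ybar, s2, mu, tau2) for t in temps]
    return sum(0.5 * (temps[i + 1] - temps[i]) * (vals[i + 1] + vals[i])
               for i in range(n))


# Hypothetical summary statistics
N, ybar, s2, mu = 8, 0.5375, 0.3624, 0.0
for tau2 in (1.0, 100.0, 1e6):
    print(tau2, marginal_deviance_exact(N, ybar, s2, mu, tau2),
          marginal_deviance_pp(N, ybar, s2, mu, tau2))
```

Because the integrand changes fastest near $t = 0$, and more sharply as $\tau^2$ grows, a grid concentrated near 0 is a natural choice; the values of $n$ and $c$ used here are only illustrative.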
PP/Example/cont.
KL distance Prior-Posterior
$$KL\left(\pi(\theta|y), \pi(\theta)\right) = \int \ln\frac{\pi(\theta|y)}{\pi(\theta)}\,\pi(\theta|y)\,d\theta = \int \ln\frac{f(y|\theta)}{m(y)}\,\pi(\theta|y)\,d\theta$$
$$KL = E_{\theta|y}\left[\ln f(y|\theta)\right] - \ln m(y)$$
$$-2\,KL = \bar D - D_m \;\Rightarrow\; D_m = \bar D + 2\,KL \quad\text{(by-product of PP)}$$
$$DIC = \bar D + p_D \quad\text{where } p_D = \bar D - D(\bar\theta): \text{model complexity}$$
PP/partial BF
1) If $\pi(\theta)$ is improper, then the marginal $m(y)$ is also improper, resulting in problems for defining the BF
2) High sensitivity of the BF to priors (does not vanish with increasing sample size)
Idea behind the partial BF (Lempers, 1971): $y = (y_P, y_T)$
- Learning or pilot sample $y_P$ to tune the prior
- Testing sample $y_T$ for data analysis
Intrinsic BF (Berger & Pericchi, 1996)
Fractional BF (O'Hagan, 1995)
Fractional BF
A fraction $b$ of the likelihood is used to tune the prior:
$$f(y_P|\theta) \approx f(y|\theta)^{b}, \qquad b = m/N < 1 \quad\text{(O'Hagan, 1995)}$$
resulting in:
$$\pi(\theta, b) \propto f(y|\theta)^{b}\,\pi(\theta)$$
PP & fractional BF
$$\pi(\theta, b) \propto f(y|\theta)^{b}\,\pi(\theta)$$
$$m_F(y, b) = \int f(y|\theta)^{1-b}\,\pi(\theta, b)\,d\theta = \frac{\int f(y|\theta)\,\pi(\theta)\,d\theta}{\int f(y|\theta)^{b}\,\pi(\theta)\,d\theta} = \frac{m(y, 1)}{m(y, b)}$$
where $m(y, b) = \int f(y|\theta)^{b}\,\pi(\theta)\,d\theta$
PP directly provides $m_F(y, b)$ via $t = b$:
$$\log m_F(y, b) = \int_b^1 E_{\theta|y,t}\left[\log f(y|\theta)\right]dt$$
PP/algorithm
MCMC with discretization of $t$ on $[0,1]$:
$$0 = t_0 < t_1 < \dots < t_i < \dots < t_{n-1} < t_n = 1$$
$$t_i = (i/n)^{c} \text{ with } i = 1,\dots,n; \quad n = 20\text{-}100; \quad c = 2\text{-}5$$
1) Make $G$ MCMC draws $\theta^{(g,i)}$, $g = 1,\dots,G$, from $\pi(\theta|y,t_i)$
2) Compute
$$\hat E_i = \hat E_{\theta|y,t_i}\left[\log f(y|\theta)\right] = \frac{1}{G}\sum_{g=1}^{G} \log f(y|\theta^{(g,i)})$$
Often conditional independence: $\log f(y|\theta) = \sum_{i=1}^{N} \log f(y_i|\theta_i)$, e.g. if $\theta_i$ is the closest stochastic parent of $y_i$ (as for DIC)
3) Approximate the integral (e.g. trapezoidal rule):
$$\log \hat m(y) = \tfrac12\sum_{i=0}^{n-1} (t_{i+1} - t_i)\left(\hat E_{i+1} + \hat E_i\right)$$
Error due to this numerical approximation: see Calderhead & Girolami (2009)
Formula for the MC sampling error: see Friel & Pettitt
PP/Little toy example
0) $y_i|\lambda_i \sim_{id} \mathcal{P}(\lambda_i x_i) \;\Leftrightarrow\; f(y_i|\lambda_i) = \dfrac{(\lambda_i x_i)^{y_i}\exp(-\lambda_i x_i)}{y_i!}$
1) $\lambda_i \sim_{iid} \mathcal{G}(\alpha, \beta) \;\Leftrightarrow\; \pi(\lambda_i) = \dfrac{\beta^{\alpha}\lambda_i^{\alpha-1}\exp(-\beta\lambda_i)}{\Gamma(\alpha)}$
0+1) $y_i \sim_{id} \mathcal{BN}(\alpha, p_i)$ where $p_i = \beta/(\beta + x_i)$
Direct approach:
$$f(y_i) = \frac{\Gamma(y_i + \alpha)}{\Gamma(\alpha)\,y_i!}\,p_i^{\alpha}(1 - p_i)^{y_i}$$
$$\ln f(y) = -n\ln\Gamma(\alpha) + \sum_{i=1}^{n}\ln\Gamma(y_i + \alpha) - \sum_{i=1}^{n}\ln y_i! + \sum_{i=1}^{n}\left[\alpha\ln p_i + y_i\ln(1 - p_i)\right]$$
Indirect approach:
$$f(y) = \prod_{i=1}^{n}\int f(y_i|\lambda_i)\,\pi(\lambda_i)\,d\lambda_i$$
PP/Little toy example/cont.
Ex: Pump data: Ex#2 in WinBUGS, Carlin-Louis (p126)
$y$ = number of failures of pumps in $x$ ($10^3$ hrs)
$y = (5, 1, 5, 14, 3, 19, 1, 1, 4, 22)$; $n = 10$; $\alpha = \beta = 1$
$x = (94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5)$
$D = -2\ln f(y) = 66.03$; $\hat D_{PP} = 66.28 \pm 0.03$ (20 pts)
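The pump example can be reproduced approximately without MCMC, because in this gamma-Poisson model the power posterior of each $\lambda_i$ is itself a gamma distribution, $\mathcal{G}(\alpha + t\,y_i,\ \beta + t\,x_i)$, and can be sampled directly. The sketch below relies on that observation; the grid ($n = 20$, $c = 4$), the number of draws and the seed are assumptions, not the settings used in the talk, so the power posterior value will not match the slide exactly.

```python
import math
import random

random.seed(4)

# Pump data as given on the slide
y = [5, 1, 5, 14, 3, 19, 1, 1, 4, 22]
x = [94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5]
ALPHA, BETA = 1.0, 1.0


def direct_deviance():
    """D = -2 ln f(y) from the negative binomial marginal (direct approach)."""
    total = 0.0
    for yi, xi in zip(y, x):
        p = BETA / (BETA + xi)
        total += (math.lgamma(yi + ALPHA) - math.lgamma(ALPHA) - math.lgamma(yi + 1)
                  + ALPHA * math.log(p) + yi * math.log(1 - p))
    return -2.0 * total


def log_lik(lams):
    """Poisson log likelihood log f(y | lambda)."""
    return sum(yi * math.log(li * xi) - li * xi - math.lgamma(yi + 1)
               for yi, xi, li in zip(y, x, lams))


def mean_log_lik(t, G):
    """Monte Carlo estimate of E[log f(y|lambda)] under the power posterior at t."""
    total = 0.0
    for _ in range(G):
        # random.gammavariate takes (shape, scale); scale = 1 / rate
        lams = [random.gammavariate(ALPHA + t * yi, 1.0 / (BETA + t * xi))
                for yi, xi in zip(y, x)]
        total += log_lik(lams)
    return total / G


def power_posterior_deviance(n=20, c=4, G=2000):
    """Steps 1-3 of the PP algorithm with exact draws and the trapezoidal rule."""
    temps = [(i / n) ** c for i in range(n + 1)]
    e = [mean_log_lik(t, G) for t in temps]
    log_m = sum(0.5 * (temps[i + 1] - temps[i]) * (e[i + 1] + e[i]) for i in range(n))
    return -2.0 * log_m


print(direct_deviance(), power_posterior_deviance())
```

The direct value can be compared with the 66.03 reported above; the power posterior value carries both Monte Carlo noise and the discretization error of the trapezoidal rule.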
PP/Toy example in OpenBUGS
[Slides 25-28: OpenBUGS material for the toy example, shown as images]
Sampling both θ & t
$$\log m(y) = \int_0^1 E_{\theta|y,t}\left[\log f(y|\theta)\right]dt = \int_0^1 E_{\theta|y,t}\left[\frac{\log f(y|\theta)}{p(t)}\right]p(t)\,dt = E_{\theta,t|y}\left[\frac{\log f(y|\theta)}{p(t)}\right]$$
$$\pi(\theta, t|y) = \pi(\theta|y,t)\,p(t) = \frac{f(y|\theta)^{t}\,\pi(\theta)}{z_t(y)}\,p(t)$$
If we assume $p(t) \propto z_t(y)$, then $\pi(\theta, t|y) \propto f(y|\theta)^{t}\,\pi(\theta)$
Sampling $(\theta, t)$ in such conditions gives poor estimation (too few draws of $t$ close to 0)
Example 1/ Potthoff & Roy's data
Growth measurements in 11 girls and 16 boys: Potthoff and Roy, 1964; Little and Rubin, 1987
Distance from the centre of the pituitary to the pterygomaxillary fissure (unit: 10^-4 m); NA: missing value

          Age (years)                     Age (years)
Girl     8    10    12    14     Boy     8    10    12    14
   1   210   200   215   230       1   260   250   290   310
   2   210   215   240   255       2   215    NA   230   265
   3   205    NA   245   260       3   230   225   240   275
   4   235   245   250   265       4   255   275   265   270
   5   215   230   225   235       5   200    NA   225   260
   6   200    NA   210   225       6   245   255   270   285
   7   215   225   230   250       7   220   220   245   265
   8   230   230   235   240       8   240   215   245   255
   9   200    NA   220   215       9   230   205   310   260
  10   165    NA   190   195      10   275   280   310   315
  11   245   250   280   280      11   230   230   235   250
                                  12   215    NA   240   280
                                  13   170    NA   260   295
                                  14   225   255   255   260
                                  15   230   245   260   300
                                  16   220    NA   235   250
Model comparison on Potthoff's data
$i$: subscript for individual, $i = 1,\dots,I = 27$ (11 girls + 16 boys)
$j$: subscript for measurement at age $t_j$ (8, 10, 12, 14 yrs)
1) Purely fixed model
$$y_{ij} = \underbrace{(\alpha_0 + \alpha x_i)}_{\text{intercept}} + \underbrace{(\beta_0 + \beta x_i)}_{\text{slope}}(t_j - 8) + e_{ij}$$
2) Random intercept model
$$y_{ij} = (\alpha_0 + \alpha x_i + a_i) + (\beta_0 + \beta x_i)(t_j - 8) + e_{ij}$$
3) Random intercept & slope model assuming independent effects
$$y_{ij} = (\alpha_0 + \alpha x_i + a_i) + (\beta_0 + \beta x_i + b_i)(t_j - 8) + e_{ij}$$
or
$$y_{ij} = \varphi_{i1} + \varphi_{i2}(t_j - 8) + e_{ij}, \qquad y_{ij} \sim_{id} N(\eta_{ij}, \sigma_e^2)$$
$$\text{with } \varphi_{i1} \sim N(\alpha_0 + \alpha x_i, \sigma_a^2), \qquad \varphi_{i2} \sim N(\beta_0 + \beta x_i, \sigma_b^2)$$
4) Random intercept & slope model assuming correlated effects
$$\begin{pmatrix}\varphi_{i1}\\ \varphi_{i2}\end{pmatrix} \sim N\left(\begin{pmatrix}\alpha_0 + \alpha x_i\\ \beta_0 + \beta x_i\end{pmatrix}, \begin{pmatrix}\sigma_a^2 & \sigma_{ab}\\ \sigma_{ab} & \sigma_b^2\end{pmatrix}\right)$$
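Model presentation: Hierarchical Bayes
1st level: $y_{ij} \sim_{id} N(\eta_{ij}, \sigma_e^2)$ with $\eta_{ij} = \varphi_{i1} + \varphi_{i2}(t_j - 8)$
2nd level:
2a) $\varphi_i = \begin{pmatrix}\varphi_{i1}\\ \varphi_{i2}\end{pmatrix} \sim N\left(\begin{pmatrix}\alpha_0 + \alpha x_i\\ \beta_0 + \beta x_i\end{pmatrix}, \Sigma\right)$ with $\Sigma = \begin{pmatrix}\sigma_a^2 & \sigma_{ab}\\ \sigma_{ab} & \sigma_b^2\end{pmatrix}$
2b) $\sigma_e \sim U(0, \Delta_e)$ or $\sigma_e^2 \sim InvG(1, \sigma_{e,0}^2)$
3rd level:
Fixed effects: $\alpha_0, \alpha, \beta_0, \beta \sim U(\text{inf}, \text{sup})$
Var (Covar) components:
- If $\sigma_{ab} = 0$, then i) $\sigma_a \sim U(0, \Delta_a)$, same for $\sigma_b \sim U(0, \Delta_b)$
  or ii) $\sigma_a^2 \sim InvG(1, \sigma_{a,0}^2)$, same for $\sigma_b^2 \sim InvG(1, \sigma_{b,0}^2)$
- If $\sigma_{ab} \neq 0$, then i) $\sigma_a \sim U(0, \Delta_a)$, $\sigma_b \sim U(0, \Delta_b)$, $\rho \sim U(-1, 1)$
  or ii) $\Omega \sim W^{*}\left((\nu\Sigma_0)^{-1}, \nu\right)$ for $\Omega = \Sigma^{-1}$, with $\nu = \dim(\Omega) + 1$ and $\Sigma_0$ a known location parameter
(subscript 0: known value)
*Take care, as WinBUGS uses another notation, i.e. $W(\nu\Sigma_0, \nu)$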
Results
Results/fractional priors (b=0 vs 0.125)
Example 2:Models of genetic differentiation
2-level hierarchical model
$i$ = locus; $j$ = (sub)population
0) $y_{ij}|\alpha_{ij} \sim_{id} B(n_{ij}, \alpha_{ij})$
$y_{ij}$ = number of genes carrying a given allele at locus $i$ in pop. $j$
$\alpha_{ij}$ = frequency of that allele at locus $i$ in pop. $j$
1) $\alpha_{ij}|\pi_i, c_j \sim_{id} Beta\left(\tau_j\pi_i, \tau_j(1 - \pi_i)\right)$ where $\tau_j = (1 - c_j)/c_j$
$c_j$ = differentiation index
$\pi_i$ = frequency of that allele at locus $i$ in the gene pool
2) $\pi_i \sim_{id} Beta(a_\pi, b_\pi)$, $c_j \sim_{id} Beta(a_c, b_c)$
Migration-drift at equilibrium (Balding)
Ex2: Nicholson’s model
Nicholson et al (2002): same as previously but
1) $\alpha^{*}_{ij}|\pi_i, c_j \sim_{id} N\left(\pi_i, c_j\,\pi_i(1 - \pi_i)\right)$
Truncated normal with masses in 0 and 1,
so that $y_{ij}|\alpha_{ij} \sim_{id} B(n_{ij}, \alpha_{ij})$ where $\alpha_{ij} = \max(0, \min(1, \alpha^{*}_{ij}))$
2) $\pi_i \sim_{id} Beta(a_\pi, b_\pi)$, $c_j \sim_{id} Beta(a_c, b_c)$
Pure drift model
Results
Conclusion
Derived from thermodynamic integration
Link with "path sampling"
Easy to understand and quite general
Well suited to complex hierarchical models
The "thetas" can be defined as the closest stochastic parents of the data, making the latter conditionally independent
Draws only from posterior distributions
Gives the fractional BF as a by-product
Easy to implement (including in OpenBUGS) but time consuming
Caution needed in the discretization of t (close to 0)
Some references
Chen M, Shao Q, Ibrahim J (2000) Monte Carlo methods in Bayesian computation. Springer
Chib S (1995) Marginal likelihood from the Gibbs output. JASA, 90, 1313-1321
Chopin N, Robert CP (2010) Properties of nested sampling. Biometrika, 97, 741-755
Friel N, Pettitt AN (2008) Marginal likelihood estimation via power posteriors. JRSS, B, 70, 589-607
Frühwirth-Schnatter (2004) Estimating marginal likelihoods from mixtures & Markov switching models using bridge sampling techniques. Econometrics Journal, 7, 143-167
Gelman A, Meng X-L (1998) Simulating normalizing constants: from importance sampling to bridge sampling and path sampling. Statistical Science, 13, 163-185
Lartillot N, Philippe H (2006) Computing Bayes factors using thermodynamic integration. Systematic Biology, 55, 195-207
Marin JM, Robert CP (2009) Importance sampling methods for Bayesian discrimination between embedded models. arXiv:0910.2325v1
Meng X-L, Wong WH (1996) Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica, 6, 831-860
O'Hagan A (1995) Fractional Bayes factors for model comparison. JRSS, B, 57, 99-138
Acknowledgements
Nial Friel (University College Dublin) for his interest in these applications and his invaluable explanations & suggestions
Tony O'Hagan for further insight into FBF
Gilles Celeux, Mathieu Gautier as co-advisors of the Master dissertation of Yoan Soussan (Paris VI)
Christian Robert for his blog and his relevant comments, standpoints and bibliographical references
The Applibugs & Babayes groups for stimulating discussions on DIC, BF, CPO & other information criteria (AIC, BIC)