nested sampling
Post on 14-Jul-2015
Nested Sampling for General Bayesian Computation
Presented by WU Changye
12 February 2015
Outline
Nested Sampling
Posterior Simulation
Nested Sampling Termination and Size of N
Numerical Examples
Conclusion
Introduction
In the Bayesian paradigm, the parameter θ follows the prior distribution π and the observations y follow the distribution L(y|θ) given θ. The posterior distribution f(θ|y) of θ given the observations y then has the form:

f(θ|y) = L(y|θ)π(θ) / ∫_Θ L(y|θ)π(θ)dθ

The objective of nested sampling is to compute the "evidence":

Z = ∫_Θ L(y|θ)π(θ)dθ
Since θ is a random variable,

Z = E_π(L(θ))

For simplicity, let L(θ) denote the likelihood L(y|θ). The cumulative distribution function of L(θ) is

F(λ) = ∫_{L(θ)<λ} π(θ)dθ

Define the measure µ on ℝ induced by the likelihood function and the prior as follows:

µ(A) = P_π(L(θ) ∈ A)
Lemma 1: E_π(L(θ)) = E_µ(X).
Proof: First let g = I_A be the indicator function of a measurable set A in ℝ. Then

E_π(g(L(θ))) = E_π(I_A(L(θ))) = ∫_{L(θ)∈A} π(θ)dθ

On the other hand, µ(dx) = ∫_Θ δ_{L(θ)}(dx) π(θ)dθ, so

E_µ(g(X)) = ∫_ℝ I_A(x)µ(dx) = ∫_Θ ( ∫_ℝ I_A(x)δ_{L(θ)}(dx) ) π(θ)dθ

Therefore,

E_µ(g(X)) = E_π(I_A(L(θ))) = E_π(g(L(θ)))
In the general case, let {g_n} be an increasing sequence of step functions converging to the identity function Id; then {g_n ∘ L} is an increasing sequence of simple functions converging to L, and the desired conclusion follows by taking limits (monotone convergence).
Lemma 2: If X is a positive-valued random variable with p.d.f. f and c.d.f. F, then

∫_0^∞ (1 − F(x))dx = ∫_0^∞ x f(x)dx = E(X)

Proof:

∫_0^∞ (1 − F(x))dx = ∫_0^∞ (1 − P(X < x))dx
= ∫_0^∞ P(X ≥ x)dx
= ∫_0^∞ ∫_x^∞ f(y) dy dx
= ∫_0^∞ f(y) ∫_0^y dx dy   (by Fubini)
= ∫_0^∞ y f(y)dy = E(X)
According to Lemmas 1 and 2,

Z = E_µ(X) = ∫_0^∞ x dF(x) = ∫_0^∞ (1 − F(x))dx

Let ϕ^{-1}(x) = 1 − F(x) = P{θ : L(θ) > x}. Then

Z = ∫_0^∞ ϕ^{-1}(x)dx = ∫_0^1 ϕ(x)dx

Therefore, the evidence is represented as a one-dimensional integral.
In order to compute the integral

J = ∫_0^1 ϕ(x)dx

there are three sampling-based methods.
1) Importance sampling: for i = 1, …, n, draw U_i ∼ U[0,1] and set

J_1 = (1/n) ∑_{i=1}^n ϕ(U_i)

2) Riemann approximation: for i = 1, …, n, draw U_i ∼ U[0,1]; let U_(i) denote the order statistics of (U_1, …, U_n), with U_(1) ≤ ⋯ ≤ U_(n), and set

J_2 = ∑_{i=1}^{n-1} ϕ(U_(i)) (U_(i+1) − U_(i))

3) A more elaborate method: set x_0 = 1.
Step 1: for i = 1, …, N, draw U_i^1 ∼ U[0,1] and set x_1 = max{U_1^1, …, U_N^1}.
Step 2: for i = 1, …, N, draw U_i^2 ∼ U[0, x_1] and set x_2 = max{U_1^2, …, U_N^2}.
……
Step n: for i = 1, …, N, draw U_i^n ∼ U[0, x_{n-1}] and set x_n = max{U_1^n, …, U_N^n}.

J_3 = ∑_{i=1}^n ϕ(x_i) (x_{i-1} − x_i)
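As an illustrative check (not part of the slides), the three estimators can be compared on a toy decreasing ϕ whose integral is known in closed form; the function phi, the sample sizes, the number of shrinkage steps, and the seed below are all arbitrary choices:

```python
import random
import math

# Toy decreasing curve standing in for phi; its integral over [0,1]
# is known in closed form, so the three estimators can be checked.
def phi(x):
    return math.exp(-10.0 * x)

exact = (1.0 - math.exp(-10.0)) / 10.0  # integral of phi over [0,1]

random.seed(1)
n = 10_000

# 1) plain Monte Carlo average over U[0,1] draws
u = [random.random() for _ in range(n)]
J1 = sum(phi(x) for x in u) / n

# 2) Riemann sum on the order statistics of the same draws
s = sorted(u)
J2 = sum(phi(s[i]) * (s[i + 1] - s[i]) for i in range(n - 1))

# 3) the nested scheme: each step shrinks the interval to the
#    maximum of N fresh uniforms on [0, x_{k-1}]
N = 100
x_prev, J3 = 1.0, 0.0
for _ in range(1000):
    x_cur = x_prev * max(random.random() for _ in range(N))
    J3 += phi(x_cur) * (x_prev - x_cur)
    x_prev = x_cur
```

All three land close to the exact value here; the point of the third scheme is that it keeps placing evaluation points where ϕ still carries mass even when ϕ drops over many orders of magnitude.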
Nested sampling uses the third method, because ϕ is a decreasing function and in many cases it decreases rapidly.

Figure: Graph of ϕ(x) and the trace of (x_i, ϕ(x_i))
First, we consider the distributions of x_1, …, x_n. For u ∈ [0,1],

P(x_1 < u) = P(U_1^1 < u, …, U_N^1 < u) = ∏_{i=1}^N P(U_i^1 < u) = u^N

As a result, the density function of x_1 is

f(x_1) = N x_1^{N-1}

By the same method, we have

f(x_k | x_{k-1}) = (N / x_{k-1}) (x_k / x_{k-1})^{N-1}
Let t_k = x_k / x_{k-1}. Then

P(t_k ≤ t) = ∫ P(x_k ≤ t x | x_{k-1} = x) f_{x_{k-1}}(x)dx
= ∫_{x_{k-1}} ∫_0^{tx} f_{x_k|x_{k-1}}(y|x) f_{x_{k-1}}(x) dy dx
= ∫_{x_{k-1}} ∫_0^{tx} (N/x) (y/x)^{N-1} f_{x_{k-1}}(x) dy dx
= ∫_{x_{k-1}} t^N f_{x_{k-1}}(x)dx = t^N

Besides,

P(t_k ≤ t | x_{k-1} = x) = P(x_k ≤ t x | x_{k-1} = x) = t^N

As a result, t_k ⊥ x_{k-1}.
Moreover, a point estimate for x_k can be written entirely in terms of point estimates for the t_k:

x_k = (x_k / x_{k-1}) × (x_{k-1} / x_{k-2}) × ⋯ × (x_1 / x_0) × x_0 = t_k · t_{k-1} ⋯ t_1 · x_0 = (∏_{i=1}^k t_i) · x_0

More appropriate to the large ranges common to many problems, on the logarithmic scale this becomes

log x_k = log( (∏_{i=1}^k t_i) · x_0 ) = ∑_{i=1}^k log t_i + log x_0

where the logarithmic shrinkage is distributed as

f(log t) = N e^{(N-1) log t}

with mean and variance

E(log t) = −1/N,   V(log t) = 1/N²
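The stated moments of the logarithmic shrinkage can be verified by simulation; a small sketch (the values of N, the replication count, and the seed are arbitrary):

```python
import random
import math

random.seed(0)
N = 50          # number of "live" points per step
n_rep = 20000   # independent replications of one shrinkage step

# one shrinkage factor t is the maximum of N uniforms on [0,1];
# its logarithm should have mean -1/N and variance 1/N^2
logs = [math.log(max(random.random() for _ in range(N))) for _ in range(n_rep)]
mean = sum(logs) / n_rep
var = sum((v - mean) ** 2 for v in logs) / n_rep
```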
Taking the mean as the point estimate for each log t_i finally gives

log(x_k / x_0) = −k/N ± √k / N

Parameterizing x_k in terms of the shrinkage proves immediately advantageous: because the log t_i are independent, the errors in the point estimates tend to cancel, and the estimates for the x_k grow increasingly more accurate with k:

x_k = exp(−k/N)
Next, we consider the distribution of ϕ(X), where X ∼ U[0,1]. Consider the random variable X = ϕ^{-1}(L(θ)), where θ ∼ π. Notice that

ϕ^{-1} : [0, L_max] → [0,1],   λ ↦ P(L(θ) > λ)

For u ∈ [0,1],

P(X < u) = P(ϕ^{-1}(L(θ)) < u) = P(L(θ) > ϕ(u)) = ϕ^{-1}(ϕ(u)) = u

This means that ϕ^{-1}(L(θ)) follows U[0,1], and hence ϕ(X) ∼ L(θ).
Consider now the truncated distribution

π̃(θ) ∝ π(θ) if L(θ) > L_0,   and 0 otherwise

Let X_0 = ϕ^{-1}(L_0) and X = ϕ^{-1}(L(θ)), where θ ∼ π̃. For u ∈ [0, X_0],

P(X < u) = P(ϕ^{-1}(L(θ)) < u | L(θ) > L_0) = P(L(θ) > ϕ(u)) / P(L(θ) > L_0) = ϕ^{-1}(ϕ(u)) / X_0 = u / X_0

so X ∼ U[0, X_0]. As a result, ϕ(X) ∼ L(θ), where X ∼ U[0, X_0] and θ ∼ π̃.
Algorithm
The algorithm based on the method discussed in the previous section is described below:
– Iteration 1: sample N points θ_{1,i} independently from the prior π(θ), determine θ_1 = argmin_{1≤i≤N} L(θ_{1,i}), and set ϕ_1 = L(θ_1).
– Iteration 2: obtain the N current values θ_{2,i} by keeping the θ_{1,i}'s, except for θ_1, which is replaced by a draw from the prior distribution π conditional upon L(θ) ≥ ϕ_1; then select θ_2 = argmin_{1≤i≤N} L(θ_{2,i}) and set ϕ_2 = L(θ_2).
– Iterate the above step until a given stopping rule is satisfied, for instance when observing very small changes in the approximation Ẑ, or when reaching the maximal value of L(θ) when it is known.
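A minimal sketch of these iterations on a 1-D Gaussian toy problem (prior N(0,1), likelihood N(y; θ, 1)), for which the evidence is known analytically. The constrained draw is done here by naive rejection from the prior, which is only workable because y is moderate; all names and constants are illustrative choices, not from the slides:

```python
import random
import math

random.seed(42)
y = 2.0  # observation; kept moderate so rejection sampling stays cheap

def lik(theta):
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2.0 * math.pi)

N = 100                                   # number of live points
live = [random.gauss(0.0, 1.0) for _ in range(N)]
x_prev, Z = 1.0, 0.0
for k in range(1, 601):
    i_min = min(range(N), key=lambda i: lik(live[i]))
    L_min = lik(live[i_min])
    x_cur = math.exp(-k / N)              # deterministic shrinkage x_k = e^{-k/N}
    Z += L_min * (x_prev - x_cur)
    x_prev = x_cur
    while True:                           # replace the worst point by a prior
        cand = random.gauss(0.0, 1.0)     # draw satisfying L(theta) > L_min
        if lik(cand) > L_min:
            live[i_min] = cand
            break
Z += x_prev * sum(lik(t) for t in live) / N   # crude remaining-mass correction

# analytic evidence for this toy: exp(-y^2/4) / (2 sqrt(pi))
Z_exact = math.exp(-y * y / 4.0) / (2.0 * math.sqrt(math.pi))
```

In realistic problems the rejection step is replaced by constrained MCMC moves, since the prior mass above the threshold shrinks geometrically.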
The evidence is then approximated by

Ẑ = ∑_{i=1}^J ϕ_i (x_{i-1} − x_i)
By-product of Nested Sampling
Skilling indicates that nested sampling provides simulations from the posterior distribution at no extra cost: "the existing sequence of points θ_1, θ_2, θ_3, … already gives a set of posterior representatives, provided the i'th is assigned the appropriate importance weight ω_i L_i".

The posterior expectation of a function f is

E(f(θ) | y) = ∫_Θ π(θ)L(θ)f(θ)dθ / ∫_Θ π(θ)L(θ)dθ

We can use a single run of nested sampling to obtain estimators of both the numerator and the denominator, the latter being the evidence Z. The estimator of the numerator is

∑_{i=1}^j (x_{i-1} − x_i) ϕ_i f(θ_i)   (1)
Lemma 3 (N. Chopin & C. P. Robert): Let f̄(l) = E_π{f(θ) | L(θ) = l} for l > 0. Then, if f̄ is absolutely continuous,

∫_0^1 ϕ(x) f̄(ϕ(x)) dx = ∫ π(θ)L(θ)f(θ)dθ

Proof: Let ψ : x ↦ x f̄(x). Then

∫ π(θ)L(θ)f(θ)dθ = E_π[ψ{L(θ)}] = ∫_0^{+∞} P_π(ψ{L(θ)} > l) dl = ∫_0^{+∞} ϕ^{-1}(ψ^{-1}(l)) dl = ∫_0^1 ψ(ϕ(x)) dx
Termination
The author suggests the rule

max(L_1, ⋯, L_N) X_j < f Ẑ_j ⟹ termination

where f is some small fraction: the run stops once even the largest current likelihood value, applied to all of the remaining prior mass X_j, could no longer change the accumulated evidence Ẑ_j appreciably.
Size of N
The larger N is, the smaller the variability of the approximation.
How to sample N points from the constrained parameter space
Use an MCMC method that constructs a Markov chain whose invariant distribution is the truncated distribution.
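One common form of such a move is sketched below: a random-walk Metropolis step whose invariant law is the prior restricted to {θ : L(θ) > L_0}. The helper name, the box-shaped toy likelihood, and all constants are illustrative assumptions, not from the slides:

```python
import random
import math

def constrained_rw_step(theta, log_prior, lik, L0, scale=0.5):
    """One random-walk Metropolis step whose invariant distribution is
    the prior truncated to the region {theta : lik(theta) > L0}."""
    prop = theta + random.gauss(0.0, scale)
    if lik(prop) <= L0:                   # outside the hard constraint
        return theta
    if math.log(random.random()) < log_prior(prop) - log_prior(theta):
        return prop
    return theta

# toy check: N(0,1) prior restricted by a box-shaped "likelihood"
random.seed(3)
log_prior = lambda t: -0.5 * t * t
lik = lambda t: 1.0 if abs(t - 2.0) < 1.0 else 0.0
theta = 2.0                               # start inside the constraint
draws = []
for i in range(20000):
    theta = constrained_rw_step(theta, log_prior, lik, L0=0.5)
    if i >= 1000:                         # discard burn-in
        draws.append(theta)
```

Rejected proposals simply keep the current point, so the chain never leaves the constrained region once started inside it.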
A decentred Gaussian example
The prior is

π(θ) = ∏_{k=1}^d (1/√(2π)) exp(−(θ^(k))² / 2)

and the likelihood is

L(y|θ) = ∏_{k=1}^d (1/√(2π)) exp(−(y_k − θ^(k))² / 2)

In this example, we can calculate the evidence analytically:

Z = ∫_{ℝ^d} L(θ)π(θ)dθ = exp(−∑_{k=1}^d y_k² / 4) / (2^d π^{d/2})
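The analytic formula can be sanity-checked against naive Monte Carlo from the prior; this only works for mild y (it degenerates for decentred cases like y = 10, which is precisely the slides' motivation for nested sampling). A sketch with illustrative names and constants:

```python
import random
import math

def evidence_exact(y):
    # analytic evidence for the decentred Gaussian example
    d = len(y)
    return math.exp(-sum(v * v for v in y) / 4.0) / (2.0 ** d * math.pi ** (d / 2.0))

def evidence_mc(y, n=200_000, seed=0):
    # naive Monte Carlo from the prior: Z is the prior mean of L(y|theta)
    rng = random.Random(seed)
    d = len(y)
    total = 0.0
    for _ in range(n):
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        sq = sum((yk - tk) ** 2 for yk, tk in zip(y, theta))
        total += math.exp(-0.5 * sq) / (2.0 * math.pi) ** (d / 2.0)
    return total / n

y = [1.0, 1.0]
exact = evidence_exact(y)
mc = evidence_mc(y)
```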
Figure: Graph of ϕ(x) and the trace of (x_i, ϕ(x_i)) with d = 1 and y = 10.
Figure: The prior distribution and the likelihood with d = 1 and y = 10.
Figure: Box-plot of log Ẑ − log Z with d = 1 and y = 10 for nested sampling and Monte Carlo.
Figure: Box-plot of log Ẑ − log Z with d = 5 and y = (3, 3, 3, 3, 3).
A Probit Model
We consider the arsenic dataset and a probit model studied in Chapter 5 of Gelman & Hill (2006). The observations are independent Bernoulli variables y_i such that P(y_i = 1 | x_i) = Φ(x_i^T θ), where x_i is a vector of d covariates, θ is a parameter vector of size d, and Φ denotes the standard normal distribution function. In this particular example, d = 7.
The prior is

θ ∼ N(0, 10² I_d)

and the likelihood is

L(θ) = ∏_{i=1}^n Φ(x_i^T θ)^{y_i} (1 − Φ(x_i^T θ))^{1−y_i}
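This likelihood is easy to code from the formula; a small sketch (function names are illustrative), with the standard normal c.d.f. expressed through the error function:

```python
import math

def norm_cdf(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_log_lik(theta, X, y):
    """Log of prod_i Phi(x_i^T theta)^{y_i} (1 - Phi(x_i^T theta))^{1 - y_i}."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = norm_cdf(sum(a * b for a, b in zip(xi, theta)))
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll
```

Working on the log scale is essential here, since the product of n Bernoulli probabilities underflows quickly.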
Figure: Box-plot of log Ẑ with N = 20 for HMC and random-walk MCMC. The blue line marks the true value of log Z (Chib's method).
Posterior Samples
We use the Gaussian example to illustrate this result. Let f(θ) = exp(−3θ + 9d/2).

Figure: Box-plot of the log-relative errors log Ẑ − log Z and log Ê(f) − log E(f).
Conclusion
– Nested sampling reverses the accepted approach to Bayesian computation by putting the evidence first.
– Nested sampling samples more sparsely from the prior in regions where the likelihood is low and more densely where the likelihood is high, resulting in greater efficiency than a sampler that draws directly from the prior.
– The procedure runs with an evolving collection of N points, where N can be chosen small for speed or large for accuracy.
– Nested sampling always reduces a multidimensional integral to the integral of a one-dimensional monotonic function, no matter how many dimensions θ occupies, and no matter how strange the shape of the likelihood function L(θ) is.
Problems
– How to generate N independent points in the constrained parameter space is an important problem; techniques to do so effectively and efficiently may vary from problem to problem.
– Termination is another practical difficulty.
Thank you!