Introduction to the Bayesian Approach to Inverse Problems - Part 2: Algorithms
Lecture 3
Claudia Schillings, Aretha Teckentrup∗,†
∗School of Mathematics, University of Edinburgh
†Alan Turing Institute, London
LMS Short Course - May 11, 2017
Schillings/Teckentrup (Edinburgh) Bayesian Inverse Problems 11 May 2017 1 / 24
Contents
The second half of this short course will focus on algorithms in Bayesian inverse problems, in particular algorithms for computing expectations with respect to the posterior distribution.

The emphasis will be on convergence properties of the algorithms rather than implementation.

The first lecture will focus on standard Monte Carlo methods: sampling methods based on independent and identically distributed (i.i.d.) samples.

The second lecture will focus on Markov chain Monte Carlo methods: sampling methods based on correlated and approximate samples.
Outline of first lecture
1 Bayesian Inverse Problems
2 Standard Monte Carlo Method
3 Convergence of Standard Monte Carlo Method
4 Multilevel Monte Carlo Method
5 Convergence of Multilevel Monte Carlo Method
Bayesian Inverse Problems
Mathematical Formulation [Kaipio, Somersalo ’04] [Stuart ’10]
We are interested in the following inverse problem: given observational data y ∈ R^J, determine the model parameter u ∈ R^n such that

    y = G(u) + η,

where η ∼ N(0, Γ) represents observational noise.
In the Bayesian approach, the solution to the inverse problem is the posterior distribution µ^y on R^n, given by

    (dµ^y/dµ_0)(u) = (1/Z) exp(−Φ(u; y)),

where Z = E_{µ_0}[exp(−Φ(·; y))] and Φ(u; y) = (1/2) ‖y − G(u)‖²_Γ.
Bayesian Inverse Problems
Computing expectations
We will here focus on computing the expected value of a quantity of interest φ(u), φ : R^n → R, under the posterior distribution µ^y.

In most cases, we do not have a closed-form expression for the posterior distribution µ^y, since the normalising constant Z is not known explicitly. (Exception: forward map G linear and prior µ_0 Gaussian ⇒ posterior µ^y also Gaussian.)

However, the prior distribution is known in closed form, and furthermore often has a simple structure (e.g. multivariate Gaussian or independent uniform).
Bayesian Inverse Problems
Computing Expectations
Using Bayes’ Theorem, we can write E_{µ^y}[φ] as

    E_{µ^y}[φ] = ∫_{R^n} φ(u) dµ^y(u)
               = ∫_{R^n} φ(u) (dµ^y/dµ_0)(u) dµ_0(u)
               = (1/Z) ∫_{R^n} φ(u) exp[−Φ(u; y)] dµ_0(u)
               = E_{µ_0}[φ exp[−Φ(·; y)]] / E_{µ_0}[exp[−Φ(·; y)]].
We have rewritten the posterior expectation as a ratio of two prior expectations.

We will now use Monte Carlo methods to estimate the two prior expectations.
Standard Monte Carlo Method
Sampling methods and random number generators
The standard Monte Carlo method is a sampling method.
To estimate E_{µ_0}[f], for some f : R^n → R, sampling methods use a sample average:

    E_{µ_0}[f] = ∫_{R^n} f(u) dµ_0(u) ≈ Σ_{i=1}^N w_i f(u^{(i)}),

where the choice of samples {u^{(i)}}_{i=1}^N and weights {w_i}_{i=1}^N determines the sampling method.

In standard Monte Carlo, w_i = 1/N and {u^{(i)}}_{i=1}^N is a sequence of independent and identically distributed (i.i.d.) random variables: {u^{(i)}}_{i=1}^N are mutually independent and u^{(i)} ∼ µ_0, for all 1 ≤ i ≤ N.
Since µ_0 is fully known and simple, i.i.d. samples from µ_0 can be generated on a computer using a (pseudo-)random number generator. For more details, see [Robert, Casella ’99], [L’Ecuyer ’11].
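As a concrete illustration of the sample average above, the following Python sketch estimates a prior expectation by standard Monte Carlo. The prior µ_0 = N(0, I) on R^n and the integrand f(u) = ‖u‖² are hypothetical choices (not from the lecture); they are convenient because E_{µ_0}[f] = n is known exactly, so the estimate can be checked.

```python
import numpy as np

def monte_carlo_estimate(f, sample_prior, N, rng):
    """Standard Monte Carlo: average f over N i.i.d. samples from the prior."""
    u = sample_prior(N, rng)   # shape (N, n); row i is the sample u^(i) ~ mu_0
    return np.mean(f(u))       # equal weights w_i = 1/N

rng = np.random.default_rng(0)
n = 2

# Hypothetical prior mu_0 = N(0, I) on R^n and integrand f(u) = ||u||^2,
# chosen so that E_{mu_0}[f] = n exactly.
sample_prior = lambda N, rng: rng.standard_normal((N, n))
f = lambda u: np.sum(u**2, axis=1)

est = monte_carlo_estimate(f, sample_prior, N=100_000, rng=rng)
```

With N = 100 000 the sampling error is of size (V_{µ_0}[f]/N)^{1/2} ≈ 0.006 here, so the estimate lands close to the exact value n = 2.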
Standard Monte Carlo Method
Definition of Monte Carlo Estimator
In the Bayesian inverse problem, we want to compute

    E_{µ^y}[φ] = E_{µ_0}[φ exp[−Φ(·; y)]] / E_{µ_0}[exp[−Φ(·; y)]].

Using Monte Carlo, we approximate this by

    E_{µ_0}[φ exp[−Φ(·; y)]] ≈ (1/N) Σ_{i=1}^N φ(u^{(i)}) exp[−Φ(u^{(i)}; y)],

    E_{µ_0}[exp[−Φ(·; y)]] ≈ (1/N) Σ_{i=1}^N exp[−Φ(u^{(i)}; y)],

where {u^{(i)}}_{i=1}^N is an i.i.d. sequence distributed according to µ_0.
(It is also possible to use different samples in the two estimators.)
Standard Monte Carlo Method
Definition of Monte Carlo Estimator
In applications, it is usually not possible to evaluate φ and Φ exactly, since this involves the solution of the forward problem.

  - In the groundwater flow example, it involves the solution of a PDE.

Denote by φ_h and Φ_h numerical approximations to φ and Φ, respectively, where h is the step length of the numerical method.
The computable Monte Carlo ratio estimator of E_{µ^y}[φ] is then

    E_{µ^y}[φ] ≈ [ (1/N) Σ_{i=1}^N φ_h(u^{(i)}) exp[−Φ_h(u^{(i)}; y)] ] / [ (1/N) Σ_{i=1}^N exp[−Φ_h(u^{(i)}; y)] ] =: Q^{MC}_{h,N} / Z^{MC}_{h,N}.

There are two sources of error in the Monte Carlo ratio estimator:

  - the sampling error due to using Monte Carlo,
  - the discretisation error due to the numerical approximation.
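The ratio estimator can be sketched in a few lines of Python. The toy setup here (scalar u with standard Gaussian prior, forward map G_h(u) = u, noise variance γ² = 1, quantity of interest φ(u) = u) is a hypothetical choice for illustration, not the lecture's groundwater problem; it is conjugate, so the posterior mean y/(1 + γ²) is known exactly and the estimator can be checked against it.

```python
import numpy as np

def mc_ratio_estimator(phi_h, Phi_h, y, sample_prior, N, rng):
    """Computable MC ratio estimator Q^MC_{h,N} / Z^MC_{h,N} of E_{mu^y}[phi]."""
    u = sample_prior(N, rng)        # i.i.d. samples u^(i) from the prior mu_0
    w = np.exp(-Phi_h(u, y))        # likelihood weights exp(-Phi_h(u^(i); y))
    Q = np.mean(phi_h(u) * w)       # estimates E_{mu_0}[phi_h exp(-Phi_h(.; y))]
    Z = np.mean(w)                  # estimates E_{mu_0}[exp(-Phi_h(.; y))]
    return Q / Z

# Hypothetical toy problem: u ~ N(0, 1), forward map G_h(u) = u,
# noise variance gamma^2 = 1, phi(u) = u. The posterior is Gaussian
# with mean y / (1 + gamma^2).
gamma2 = 1.0
y = 1.0
Phi_h = lambda u, y: 0.5 * (y - u) ** 2 / gamma2
phi_h = lambda u: u

rng = np.random.default_rng(1)
sample_prior = lambda N, rng: rng.standard_normal(N)
est = mc_ratio_estimator(phi_h, Phi_h, y, sample_prior, N=200_000, rng=rng)
```

Note that the same samples u^{(i)} are reused in numerator and denominator, matching the estimator on the slide.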
Convergence of Standard Monte Carlo Method
Expected Value and Variance [Billingsley ’95]
Consider a general Monte Carlo estimator E^{MC}_{h,N} = (1/N) Σ_{i=1}^N f_h(u^{(i)}), with {u^{(i)}}_{i=1}^N an i.i.d. sequence distributed as µ_0.

Lemma (Expected Value and Variance)

    E[E^{MC}_{h,N}] = E_{µ_0}[f_h],    V[E^{MC}_{h,N}] = V_{µ_0}[f_h] / N.

Proof: Since {u^{(i)}}_{i=1}^N is an i.i.d. sequence, we have

    E[(1/N) Σ_{i=1}^N f_h(u^{(i)})] = (1/N) E[Σ_{i=1}^N f_h(u^{(i)})] = (1/N) Σ_{i=1}^N E_{µ_0}[f_h] = E_{µ_0}[f_h],

and

    V[(1/N) Σ_{i=1}^N f_h(u^{(i)})] = (1/N²) V[Σ_{i=1}^N f_h(u^{(i)})] = (1/N²) Σ_{i=1}^N V_{µ_0}[f_h] = (1/N) V_{µ_0}[f_h].
Convergence of Standard Monte Carlo Method
Central Limit Theorem [Billingsley ’95]
Theorem (Central Limit Theorem)

If V_{µ_0}[f_h] ∈ (0, ∞), then as N → ∞ we have

    E^{MC}_{h,N} →_D N(E_{µ_0}[f_h], V_{µ_0}[f_h]/N).

Here, →_D denotes convergence in distribution, i.e. point-wise convergence of the distribution function: with X ∼ N(E_{µ_0}[f_h], V_{µ_0}[f_h]/N),

    Pr[E^{MC}_{h,N} ≤ x] → Pr[X ≤ x], for all x ∈ R.

The Central Limit Theorem crucially uses the fact that E^{MC}_{h,N} is based on i.i.d. samples. If this is not the case, we require stronger assumptions and/or obtain a different limiting distribution.
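In practice the Central Limit Theorem is what justifies attaching approximate confidence intervals to a Monte Carlo estimate, with V_{µ_0}[f_h] replaced by the sample variance. A minimal sketch, using the hypothetical integrand f(u) = u² with u ∼ N(0, 1) (so the true expectation is 1; neither the integrand nor the interval routine is from the lecture):

```python
import numpy as np

def mc_with_confidence(f_samples, z=1.96):
    """Monte Carlo estimate with an approximate 95% confidence interval
    via the CLT: E^MC +/- z * sqrt(sample variance / N)."""
    N = len(f_samples)
    est = np.mean(f_samples)
    half_width = z * np.sqrt(np.var(f_samples, ddof=1) / N)
    return est, est - half_width, est + half_width

# Hypothetical toy check: f(u) = u^2 with u ~ N(0, 1), so E_{mu_0}[f] = 1.
rng = np.random.default_rng(2)
samples = rng.standard_normal(50_000) ** 2
est, lo, hi = mc_with_confidence(samples)
```

The interval half-width shrinks like N^{-1/2}, reflecting the V_{µ_0}[f_h]/N variance of the estimator.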
Convergence of Standard Monte Carlo Method
Strong Law of Large Numbers [Billingsley ’95]
Theorem (Strong Law of Large Numbers)

If E_{µ_0}[|f_h|] < ∞, then as N → ∞ we have

    E^{MC}_{h,N} →_{a.s.} E_{µ_0}[f_h].

Here, →_{a.s.} denotes almost sure convergence, i.e. convergence with probability 1:

    Pr[E^{MC}_{h,N} → E_{µ_0}[f_h]] = 1.

The Strong Law of Large Numbers crucially uses the fact that E^{MC}_{h,N} is based on i.i.d. samples. If this is not the case, we may require stronger assumptions.
Convergence of Standard Monte Carlo Method
Mean Square Error [Billingsley ’95]
A measure of accuracy of E^{MC}_{h,N} = (1/N) Σ_{i=1}^N f_h(u^{(i)}) as an estimator of E_{µ_0}[f] is given by the mean square error (MSE):

    e(E^{MC}_{h,N})² := E[(E^{MC}_{h,N} − E_{µ_0}[f])²].

Theorem (Mean Square Error)

    e(E^{MC}_{h,N})² = V_{µ_0}[f_h]/N  (sampling error)  +  (E_{µ_0}[f_h − f])²  (numerical error).

Proof: Since E[E^{MC}_{h,N}] = E_{µ_0}[f_h] and V[E^{MC}_{h,N}] = V_{µ_0}[f_h]/N, we have

    e(E^{MC}_{h,N})² = E[(E^{MC}_{h,N} − E_{µ_0}[f_h] + E_{µ_0}[f_h] − E_{µ_0}[f])²]
                     = E[(E^{MC}_{h,N} − E_{µ_0}[f_h])²] + (E_{µ_0}[f_h − f])²
                     = V_{µ_0}[f_h]/N + (E_{µ_0}[f_h − f])²,

where the cross term vanishes since E[E^{MC}_{h,N}] = E_{µ_0}[f_h] and the second summand E_{µ_0}[f_h] − E_{µ_0}[f] is deterministic.
Convergence of Standard Monte Carlo Method
Mean Square Error of Monte Carlo ratio estimator [Scheichl, Stuart, ALT ’16]
Recall:

    E_{µ^y}[φ] = E_{µ_0}[φ exp[−Φ(·; y)]] / E_{µ_0}[exp[−Φ(·; y)]] =: Q/Z ≈ Q^{MC}_{h,N} / Z^{MC}_{h,N}.

We know how to bound the MSEs of the individual estimators Q^{MC}_{h,N} and Z^{MC}_{h,N}. Can we bound the MSE of Q^{MC}_{h,N}/Z^{MC}_{h,N}?

Rearranging the MSE and applying the triangle inequality, we have

    e(Q^{MC}_{h,N}/Z^{MC}_{h,N})² = E[(Q/Z − Q^{MC}_{h,N}/Z^{MC}_{h,N})²]
        ≤ (2/Z²) ( E[(Q − Q^{MC}_{h,N})²] + E[(Q^{MC}_{h,N}/Z^{MC}_{h,N})² (Z − Z^{MC}_{h,N})²] ).
Convergence of Standard Monte Carlo Method
Mean Square Error of Monte Carlo ratio estimator [Scheichl, Stuart, ALT ’16]
    e(Q^{MC}_{h,N}/Z^{MC}_{h,N})² ≤ (2/Z²) ( E[(Q − Q^{MC}_{h,N})²] + E[(Q^{MC}_{h,N}/Z^{MC}_{h,N})² (Z − Z^{MC}_{h,N})²] )

Theorem (Hölder’s Inequality)

For any random variables X, Y and p, q ∈ [1, ∞] with p⁻¹ + q⁻¹ = 1,

    E[|XY|] ≤ E[|X|^p]^{1/p} E[|Y|^q]^{1/q}.

Here, E[|X|^∞]^{1/∞} := ess sup |X|.

If ess sup_{{u^{(i)}}_{i=1}^N} (Q^{MC}_{h,N}/Z^{MC}_{h,N})² ≤ C, for a constant C independent of N and h, then the MSE of Q^{MC}_{h,N}/Z^{MC}_{h,N} can be bounded in terms of the individual MSEs of Q^{MC}_{h,N} and Z^{MC}_{h,N}.
For more details, see [Scheichl, Stuart, ALT ’16].
Multilevel Monte Carlo Method
Motivation
The standard Monte Carlo estimator E^{MC}_{h,N} = (1/N) Σ_{i=1}^N f_h(u^{(i)}) of E_{µ_0}[f] has mean square error

    e(E^{MC}_{h,N})² = V_{µ_0}[f_h]/N + (E_{µ_0}[f_h − f])².

To make e(E^{MC}_{h,N})² small, we need to

  - choose a large number of samples N,
  - choose a small step length h in our numerical approximation.

Since the cost of sampling methods grows as

    cost per sample × number of samples,

the cost of standard Monte Carlo can be prohibitively large in applications.
Multilevel Monte Carlo Method
Definition of Multilevel Monte Carlo Estimator [Giles ’08], [Heinrich ’01]
The multilevel method works with a decreasing sequence of step lengths {h_ℓ}_{ℓ=0}^L, where h_L gives the most accurate numerical approximation.

Linearity of expectation gives us

    E_{µ_0}[f_{h_L}] = E_{µ_0}[f_{h_0}] + Σ_{ℓ=1}^L E_{µ_0}[f_{h_ℓ} − f_{h_{ℓ−1}}].

The multilevel Monte Carlo (MLMC) estimator

    E^{ML}_{{N_ℓ}} = (1/N_0) Σ_{i=1}^{N_0} f_{h_0}(u^{(i,0)}) + Σ_{ℓ=1}^L (1/N_ℓ) Σ_{i=1}^{N_ℓ} [ f_{h_ℓ}(u^{(i,ℓ)}) − f_{h_{ℓ−1}}(u^{(i,ℓ)}) ]

is a sum of L + 1 independent MC estimators.
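The telescoping estimator above can be sketched in Python. In the lecture f_h comes from a PDE discretisation; here the family f_h(u) = u² + h·u is an analytic stand-in (a hypothetical choice, with E_{µ_0}[f_h] = 1 for u ∼ N(0, 1) and level corrections whose variance shrinks with h). The essential structure is the telescoping sum and the reuse of the same sample u^{(i,ℓ)} in both terms of each correction.

```python
import numpy as np

def mlmc_estimate(f_h, h_levels, N_levels, sample_prior, rng):
    """Multilevel Monte Carlo: coarse-level estimate plus telescoping
    corrections. Each level l draws its own independent samples u^(i,l);
    within a level the SAME sample is used for f_{h_l} and f_{h_{l-1}}."""
    est = 0.0
    for ell, (h, N) in enumerate(zip(h_levels, N_levels)):
        u = sample_prior(N, rng)
        if ell == 0:
            est += np.mean(f_h(u, h))
        else:
            est += np.mean(f_h(u, h) - f_h(u, h_levels[ell - 1]))
    return est

# Hypothetical analytic stand-in for a discretised quantity of interest:
# f_h(u) = u^2 + h*u with u ~ N(0, 1), so E_{mu_0}[f_h] = 1 for every h,
# and V[f_{h_l} - f_{h_{l-1}}] = (h_l - h_{l-1})^2 -> 0.
f_h = lambda u, h: u**2 + h * u
h_levels = [0.5, 0.25, 0.125, 0.0625]     # decreasing step lengths h_l
N_levels = [100_000, 10_000, 1_000, 100]  # far fewer samples on finer levels

rng = np.random.default_rng(3)
sample_prior = lambda N, rng: rng.standard_normal(N)
est = mlmc_estimate(f_h, h_levels, N_levels, sample_prior, rng)
```

Note how the sample counts N_ℓ decrease sharply with the level: the shrinking correction variance is exactly what makes this allocation accurate, as quantified on the following slides.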
Convergence of Multilevel Monte Carlo Method
Expected Value and Variance [Giles ’08]
    E^{ML}_{{N_ℓ}} = (1/N_0) Σ_{i=1}^{N_0} f_{h_0}(u^{(i,0)}) + Σ_{ℓ=1}^L (1/N_ℓ) Σ_{i=1}^{N_ℓ} [ f_{h_ℓ}(u^{(i,ℓ)}) − f_{h_{ℓ−1}}(u^{(i,ℓ)}) ]

Lemma (Expected Value and Variance)

    E[E^{ML}_{{N_ℓ}}] = E_{µ_0}[f_{h_L}],    V[E^{ML}_{{N_ℓ}}] = V[f_{h_0}]/N_0 + Σ_{ℓ=1}^L V[f_{h_ℓ} − f_{h_{ℓ−1}}]/N_ℓ.

Proof: Uses the linearity of expectation and the fact that the L + 1 estimators are independent, together with results for standard Monte Carlo.
Convergence of Multilevel Monte Carlo Method
Central Limit Theorem and Strong Law of Large Numbers [Billingsley ’95]
Theorem (Central Limit Theorem)

If σ²_{ML} := V[f_{h_0}] N_0⁻¹ + Σ_{ℓ=1}^L V[f_{h_ℓ} − f_{h_{ℓ−1}}] N_ℓ⁻¹ ∈ (0, ∞) and {V[f_{h_ℓ} − f_{h_{ℓ−1}}]}_{ℓ=1}^L satisfies a Lindeberg condition, then as {N_ℓ}_{ℓ=0}^L → ∞ we have

    E^{ML}_{{N_ℓ}} →_D N(E_{µ_0}[f_{h_L}], σ²_{ML}).

Proof: Requires a Lindeberg condition to deal with the sum of L + 1 Monte Carlo estimators. For details, see [Collier et al ’15] and [Billingsley ’95].

Theorem (Strong Law of Large Numbers)

If E_{µ_0}[|f_{h_ℓ}|] < ∞ for 0 ≤ ℓ ≤ L, then as {N_ℓ}_{ℓ=0}^L → ∞ we have

    E^{ML}_{{N_ℓ}} →_{a.s.} E_{µ_0}[f_{h_L}].

Proof: Follows from the linearity of a.s. convergence, together with results for standard Monte Carlo.
Convergence of Multilevel Monte Carlo Method
Mean Square Error of Multilevel Monte Carlo [Giles ’08]
Theorem (Mean Square Error)

    e(E^{ML}_{{N_ℓ}})² = V[f_{h_0}]/N_0 + Σ_{ℓ=1}^L V[f_{h_ℓ} − f_{h_{ℓ−1}}]/N_ℓ  (sampling error)  +  (E_{µ_0}[f_{h_L} − f])²  (numerical error).

Proof: The derivation is identical to the standard Monte Carlo case.

Thus,

  - N_0 still needs to be large, but samples are much cheaper to obtain on the coarse grid,
  - N_ℓ (ℓ > 0) can be much smaller, since V[f_{h_ℓ} − f_{h_{ℓ−1}}] → 0 as h_ℓ → 0.
Convergence of Multilevel Monte Carlo Method
Numerical Comparison: Mean Square Error
We compute E_{µ^y}[φ] for a typical model problem in groundwater flow, using a ratio of standard Monte Carlo and multilevel Monte Carlo estimators.

Computational cost is measured as the number of FLOPs required.
[Figure: log-log plots of computational cost (10³ to 10¹⁰) against root mean square error ε (10⁻² to 10⁰). Left panel: MC (dep.) vs. MLMC (dep.); right panel: MC (indep.) vs. MLMC (indep.). Reference slopes ε⁻⁴ and ε⁻² are shown in both panels.] [Scheichl, Stuart, ALT ’16]
References I
P. Billingsley, Probability and measure, John Wiley & Sons, 1995.
N. Collier, A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone, A continuation multilevel Monte Carlo algorithm, BIT Numerical Mathematics, 55 (2015), pp. 399–432.

S. Cotter, M. Dashti, and A. Stuart, Variational data assimilation using targetted random walks, International Journal for Numerical Methods in Fluids, 68 (2012), pp. 403–421.

T. Dodwell, C. Ketelsen, R. Scheichl, and A. Teckentrup, A hierarchical multilevel Markov chain Monte Carlo algorithm with applications to uncertainty quantification in subsurface flow, SIAM/ASA Journal on Uncertainty Quantification, 3 (2015), pp. 1075–1108.
References II
Y. Efendiev, B. Jin, M. Presho, and X. Tan, Multilevel Markov Chain Monte Carlo Method for High-Contrast Single-Phase Flow Problems, Communications in Computational Physics, 17 (2015), pp. 259–286.

M. Giles, Multilevel Monte Carlo Path Simulation, Operations Research, 56 (2008), pp. 607–617.

S. Heinrich, Multilevel Monte Carlo Methods, in International Conference on Large-Scale Scientific Computing, Springer, 2001, pp. 58–67.

V. Hoang, C. Schwab, and A. Stuart, Complexity analysis of accelerated MCMC methods for Bayesian inversion, Inverse Problems, 29 (2013), p. 085010.

J. Kaipio and E. Somersalo, Statistical and computational inverse problems, Springer, 2004.
References III
P. L’Ecuyer, Random number generation, in Handbook of Computational Statistics, Springer, 2011, pp. 35–71.

C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 1999.

D. Rudolf, Explicit error bounds for Markov chain Monte Carlo, arXiv preprint arXiv:1108.3201, (2011).

R. Scheichl, A. Stuart, and A. Teckentrup, Quasi-Monte Carlo and Multilevel Monte Carlo Methods for Computing Posterior Expectations in Elliptic Inverse Problems, arXiv preprint arXiv:1602.04704, (2016).

A. Stuart, Inverse Problems: A Bayesian Perspective, Acta Numerica, 19 (2010), pp. 451–559.