Probabilistic Meshless Methods for Bayesian Inverse Problems

Jon Cockayne
July 8, 2016

Co-Authors: Chris Oates, Tim Sullivan, Mark Girolami
What is PN?

Many problems in mathematics have no analytical solution and must be solved numerically, which introduces error.

In Probabilistic Numerics we phrase such problems as inference problems and construct a probabilistic description of this numerical error.

This is not a new idea [Kadane, 1985, Diaconis, 1988, O'Hagan, 1992, Skilling, 1991]!

Lots of recent development on integration, optimization, ODEs, PDEs... see http://probnum.org/
What does this mean for PDEs?

A prototypical linear PDE: given g, κ, b, find u such that

−∇ · (κ(x)∇u(x)) = g(x) in D
u(x) = b(x) on ∂D

For general D, κ(x) this cannot be solved analytically.

The majority of PDE solvers produce an approximation of the form

u(x) ≈ ∑_{i=1}^{N} w_i φ_i(x)

We want to quantify the error from finite N probabilistically.
What does this mean for PDEs?

Inverse problem: given partial information about g, b, u, find κ:

−∇ · (κ(x)∇u(x)) = g(x) in D
u(x) = b(x) on ∂D

Bayesian inverse problem:

κ ∼ Π_κ    Data: u(x_i) = y_i    κ|y ∼ Π_κ^y

We want to account for an inaccurate forward solver in the inverse problem.
Why do this?

Using an inaccurate forward solver in an inverse problem can produce biased and overconfident posteriors.

[Figure: posteriors for θ under the probabilistic method (left panel, "Probabilistic") and symmetric collocation (right panel, "Standard"), for m_A = 4, 8, 16, 32, 64 design points.]

Comparison of inverse problem posteriors produced using the Probabilistic Meshless Method (PMM) vs. symmetric collocation.
Forward Problem
Abstract Formulation

Au(x) = g(x) in D

Forward inference procedure:

u ∼ Π_u    "Data": Au(x_i) = g(x_i)    u|g ∼ Π_u^g
Posterior for the forward problem

Use a Gaussian process prior u ∼ Π_u = GP(0, k). Assuming linearity, the posterior Π_u^g is available in closed form [Cockayne et al., 2016, Särkkä, 2011, Cialenco et al., 2012, Owhadi, 2014]:

Π_u^g = GP(m1, Σ1)

m1(x) = ĀK(x, X) [AĀK(X, X)]⁻¹ g
Σ1(x, x′) = k(x, x′) − ĀK(x, X) [AĀK(X, X)]⁻¹ AK(X, x′)

where Ā is the adjoint of A.

Observation: the mean function is the same as in symmetric collocation!
Theoretical Results

Theorem (Forward Contraction)
For a ball Bε(u0) of radius ε centered on the true solution u0 of the PDE, we have

1 − Π_u^g[Bε(u0)] = O(h^(2β−2ρ−d) / ε)

where
• h is the fill distance
• β is the smoothness of the prior
• ρ is the order of the PDE, with ρ < β − d/2
• d is the input dimension
Toy Example

−∇²u(x) = g(x)   x ∈ (0, 1)
u(x) = 0          x = 0, 1

To associate with the notation from before:

A = d²/dx²    Ā = d²/dy²
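To make this concrete, here is a minimal sketch (not code from the talk) of the closed-form posterior m1, Σ1 for this toy problem, taking A = −d²/dx² so that Au = g and treating the boundary conditions as additional identity observations. The squared-exponential kernel, its length-scale, the collocation grid, and the jitter term are all illustrative assumptions.

```python
# Sketch of the closed-form forward posterior m1, Sigma1 for
# -u''(x) = sin(2*pi*x) on (0, 1) with u(0) = u(1) = 0.
import numpy as np

ell = 0.25  # prior kernel length-scale (assumption)

def se(r):
    # Squared-exponential kernel as a function of r = x - y.
    return np.exp(-r**2 / (2 * ell**2))

def k(x, y):
    return se(x[:, None] - y[None, :])

def Ak(x, y):
    # (A k)(x, y) with A = -d^2/dx^2 (sign chosen so that A u = g).
    # For this even, stationary kernel the same expression also gives
    # the adjoint version (Abar k)(x, y) = -d^2/dy^2 k(x, y).
    r = x[:, None] - y[None, :]
    return (1 / ell**2 - r**2 / ell**4) * se(r)

def AAk(x, y):
    # (A Abar k)(x, y) = d^4 k / dx^2 dy^2.
    r = x[:, None] - y[None, :]
    return (3 / ell**4 - 6 * r**2 / ell**6 + r**4 / ell**8) * se(r)

X_int = np.linspace(0.1, 0.9, 9)   # interior points: observe A u(x_i) = g(x_i)
X_bnd = np.array([0.0, 1.0])       # boundary points: observe u(x) = 0
obs = np.concatenate([np.sin(2 * np.pi * X_int), np.zeros(2)])

# Gram matrix of all observations, and cross-covariance with u(x).
G = np.block([[AAk(X_int, X_int), Ak(X_int, X_bnd)],
              [Ak(X_bnd, X_int),  k(X_bnd, X_bnd)]])
x = np.linspace(0, 1, 201)
C = np.hstack([Ak(x, X_int), k(x, X_bnd)])

jitter = 1e-8 * np.eye(len(obs))   # small nugget for numerical stability
m1 = C @ np.linalg.solve(G + jitter, obs)
Sigma1 = k(x, x) - C @ np.linalg.solve(G + jitter, C.T)

u_true = np.sin(2 * np.pi * x) / (4 * np.pi**2)  # exact solution for comparison
print("max |m1 - u| =", np.abs(m1 - u_true).max())
```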
Forward problem: posterior samples

g(x) = sin(2πx)

[Figure: posterior samples of u over x ∈ [0, 1], shown against the posterior mean and the true solution. Legend: Mean, Truth, Samples.]
Forward problem: convergence

[Figure 2: Convergence. (a) Error of the posterior mean from the truth, ‖µ − u‖, against the number of design points m_A. (b) Trace of the posterior covariance, Tr(Σ1), against m_A. Both panels on a log scale.]
Inverse Problem
Recap

−∇ · (κ(x)∇u(x)) = g(x) in D
u(x) = b(x) on ∂D

Now we need to incorporate the forward posterior measure Π_u^g into the posterior measure for κ in the inverse problem.
Incorporation of Forward Measure

Assuming the data in the inverse problem are

y_i = u(x_i) + ξ_i,   i = 1, . . . , n,   ξ ∼ N(0, Γ)

implies the standard likelihood

p(y | κ, u) = N(y; u, Γ)

But we don't know u!

Marginalise the forward posterior Π_u^g to obtain a "PN" likelihood:

p_PN(y | κ) ∝ ∫ p(y | κ, u) dΠ_u^g(u) = N(y; m1, Γ + Σ1)

with m1 and Σ1 evaluated at the observation locations x_1, . . . , x_n.
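As a minimal sketch of this marginalisation (again not the talk's code): the PN likelihood is just a Gaussian density whose covariance is inflated by the forward posterior covariance. Here forward_posterior is a hypothetical stand-in for a forward solve such as the one sketched earlier, returning (m1, Σ1) at the observation points.

```python
# Sketch of the PN likelihood; forward_posterior is a hypothetical helper.
from scipy.stats import multivariate_normal

def log_pn_likelihood(y, kappa, Gamma, forward_posterior):
    m1, Sigma1 = forward_posterior(kappa)   # forward solve under this kappa
    # Solver uncertainty Sigma1 simply inflates the noise covariance Gamma.
    return multivariate_normal.logpdf(y, mean=m1, cov=Gamma + Sigma1)
```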
Inverse Contraction

Denote by Π_κ^y the posterior for κ from the likelihood p, and by Π_{κ,PN}^y the posterior for κ from the likelihood p_PN.

Theorem (Inverse Contraction)
Assume Π_κ^y → δ(κ0) as n → ∞. Then Π_{κ,PN}^y → δ(κ0) provided

h = o(n^(−1/(β−ρ−d/2)))
Back to the Toy Example

−∇ · (κ∇u(x)) = sin(2πx)   x ∈ (0, 1)
u(x) = 0                    x = 0, 1

Infer κ ∈ ℝ⁺; data generated for κ = 1 at x = 0.25, 0.75, corrupted with independent Gaussian noise ξ ∼ N(0, 0.01²).
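A hedged sketch of how such posteriors for κ might be computed: for constant κ the operator is A = −κ d²/dx², so the kernel cross-covariances from the earlier forward sketch simply pick up powers of κ, and a random-walk Metropolis chain with a flat prior on κ > 0 can target the PN posterior. Boundary observations are omitted for brevity, and all tuning constants (kernel length-scale, grid, proposal scale, chain length) are assumptions, not values from the talk.

```python
# Sketch: random-walk Metropolis over kappa under the PN likelihood for
# -kappa * u''(x) = sin(2*pi*x), u(0) = u(1) = 0. Kernel derivatives
# d2se := -d^2/dy^2 se and d4se := d^4 se / dx^2 dy^2, as in the forward sketch.
import numpy as np

ell = 0.25
rng = np.random.default_rng(0)

def se(r):   return np.exp(-r**2 / (2 * ell**2))
def d2se(r): return (1 / ell**2 - r**2 / ell**4) * se(r)
def d4se(r): return (3 / ell**4 - 6 * r**2 / ell**6 + r**4 / ell**8) * se(r)

X = np.linspace(0.1, 0.9, 9)           # collocation points (assumption)
X_obs = np.array([0.25, 0.75])         # observation locations from the talk
g = np.sin(2 * np.pi * X)
Gamma = 0.01**2 * np.eye(2)            # noise covariance from the talk

def pn_loglik(kappa, y):
    # Gaussian log-density N(y; m1, Gamma + Sigma1) for A = -kappa d^2/dx^2.
    r = lambda a, b: a[:, None] - b[None, :]
    G = kappa**2 * d4se(r(X, X)) + 1e-8 * np.eye(len(X))
    C = kappa * d2se(r(X_obs, X))      # cov(u(X_obs), A u(X))
    m1 = C @ np.linalg.solve(G, g)
    S1 = se(r(X_obs, X_obs)) - C @ np.linalg.solve(G, C.T)
    cov = Gamma + S1
    resid = y - m1
    return -0.5 * (resid @ np.linalg.solve(cov, resid)
                   + np.linalg.slogdet(cov)[1])

# Synthetic data at kappa = 1, as in the talk; true u = sin(2*pi*x)/(4*pi^2*kappa).
y = np.sin(2 * np.pi * X_obs) / (4 * np.pi**2) + 0.01 * rng.standard_normal(2)

kappa, samples = 0.5, []
ll = pn_loglik(kappa, y)
for _ in range(5000):
    prop = kappa + 0.1 * rng.standard_normal()     # symmetric proposal
    if prop > 0:                                   # flat prior on kappa > 0
        ll_prop = pn_loglik(prop, y)
        if np.log(rng.random()) < ll_prop - ll:
            kappa, ll = prop, ll_prop
    samples.append(kappa)
print("posterior mean of kappa:", np.mean(samples[1000:]))
```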
Posteriors for κ

[Figure: (a) Posterior distributions for θ under the probabilistic (PMM) and standard (symmetric collocation) likelihoods, for m_A = 4, 8, 16, 32, 64 design points. (b) Convergence of the posterior distributions with the number of design points n, probabilistic vs. standard.]
Nonlinear Example: Steady-State Allen–Cahn
Allen–Cahn

A prototypical nonlinear model:

−δ∇²u(x) + δ⁻¹(u(x)³ − u(x)) = 0   x ∈ (0, 1)²
u(x) = 1    x₁ ∈ {0, 1}, 0 < x₂ < 1
u(x) = −1   x₂ ∈ {0, 1}, 0 < x₁ < 1

Goal: infer δ from 16 equally spaced observations of u(x) in the interior of the domain.

[Figure: the three steady states on (0, 1)², titled "Negative Stable", "Unstable", and "Positive Stable"; colour scale from −1 to 1.]
Allen–Cahn: Forward Solutions

The PDE is nonlinear, so the forward posterior is no longer a GP; dedicated sampling schemes are required, see [Cockayne et al., 2016].

[Figure: forward posterior solutions on (0, 1)² for the initial design and an optimal design; colour scale from −1.2 to 1.2.]
Allen–Cahn: Inverse Problem

[Figure: posteriors for θ. (a) PMM forward solver with the PN likelihood, for ℓ = 5, 10, 20, 40, 80. (b) FEA forward solver with a Gaussian likelihood, for FEM meshes 5×5, 10×10, 25×25, 50×50.]

Comparison of posteriors for δ with different solver resolutions, when using the PMM forward solver with the PN likelihood vs. the FEA forward solver with a Gaussian likelihood.
Conclusions
We have shown...

• How to build probability measures for the forward solution of PDEs.
• How to use these to make robust inferences in PDE inverse problems, even with inaccurate forward solvers.

Coming soon...

• Evolutionary problems (∂/∂t).
• More profound nonlinearities.
• Non-Gaussian priors.
References

I. Cialenco, G. E. Fasshauer, and Q. Ye. Approximation of stochastic partial differential equations by a kernel-based collocation method. Int. J. Comput. Math., 89(18):2543–2561, 2012.

J. Cockayne, C. J. Oates, T. Sullivan, and M. Girolami. Probabilistic meshless methods for partial differential equations and Bayesian inverse problems. arXiv preprint arXiv:1605.07811, 2016.

P. Diaconis. Bayesian numerical analysis. Statistical Decision Theory and Related Topics IV, 1:163–175, 1988.

J. B. Kadane. Parallel and sequential computation: a statistician's view. Journal of Complexity, 1:256–263, 1985.

A. O'Hagan. Some Bayesian numerical analysis. Bayesian Statistics, 4:345–363, 1992.

H. Owhadi. Bayesian numerical homogenization. arXiv preprint arXiv:1406.6668, 2014.

S. Särkkä. Linear operators and stochastic partial differential equations in Gaussian process regression. In Artificial Neural Networks and Machine Learning – ICANN 2011, Part II, pages 151–158. Springer, 2011.

J. Skilling. Bayesian solution of ordinary differential equations. Maximum Entropy and Bayesian Methods, 50:23–37, 1991.