Inference for complex models
Richard Wilkinson, r.d.wilkinson@sheffield.ac.uk
School of Maths and Statistics, University of Sheffield
23 April 2015


  • Inference for complex models

    Richard Wilkinson, r.d.wilkinson@sheffield.ac.uk

    School of Maths and Statistics, University of Sheffield

    23 April 2015

  • Computer experiments

    Rohrlich (1991): Computer simulation is

    ‘a key milestone somewhat comparable to the milestone that started the empirical approach (Galileo) and the deterministic mathematical approach to dynamics (Newton and Laplace)’

    Challenges for statistics: how do we make inferences about the world from a simulation of it?

    how do we relate simulators to reality?

    how do we estimate tunable parameters?

    how do we deal with computational constraints?

    how do we make uncertainty statements about the world that combine models, data and their corresponding errors?

    There is an inherent lack of quantitative information on the uncertainty surrounding a simulation, unlike in physical experiments.


  • Bayesian statistics

    Represent all uncertainties as probability distributions:

    π(θ|D) = π(D|θ)π(θ) / π(D)

    π(θ|D) is the posterior distribution.
    - Always hard to compute: SMC², PGAS, tempered NUTS-HMC

    π(D|θ) is the likelihood function.
    - For complex models it can be slow to compute: GP emulators
    - It can also be impossible to compute in some cases: ABC
    - Often an intractable integral over latent variables X:

    π(D|θ) = ∫ π(D|X) π(X|θ) dX

    Relating the simulator to reality can make specifying π(D|θ) particularly difficult: simulator discrepancy modelling

    π(D) is the model evidence or normalising constant.
    - Requires us to integrate, and is thus harder to compute than π(θ|D): SMC², nested sampling
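For a toy one-dimensional problem, all three quantities in Bayes' rule can be computed by brute force on a grid, which makes their roles concrete. A minimal sketch in Python, with a hypothetical Gaussian prior and likelihood chosen purely for illustration (none of this is from the slides):

```python
import numpy as np

# Hypothetical toy problem: theta ~ N(0, 4) a priori, D | theta ~ N(theta, 1).
# On a 1-d grid, Bayes' rule can be normalised directly; the methods named
# above (SMC^2, nested sampling, ...) exist because this brute force fails
# once theta is high dimensional or the likelihood is slow or intractable.
theta = np.linspace(-10.0, 10.0, 2001)
D = 1.5

prior = np.exp(-theta**2 / (2 * 4.0)) / np.sqrt(2 * np.pi * 4.0)   # pi(theta)
likelihood = np.exp(-0.5 * (D - theta)**2) / np.sqrt(2 * np.pi)    # pi(D|theta)

unnormalised = likelihood * prior
evidence = unnormalised.sum() * (theta[1] - theta[0])   # pi(D), by quadrature
posterior = unnormalised / evidence                     # pi(theta|D)

print(evidence, theta[np.argmax(posterior)])
```

Even here, computing π(D) means integrating over θ; every extra dimension multiplies the grid size, which is exactly why the evidence is the hardest of the three quantities.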


  • Uncertainty Quantification (UQ) for computer experiments

    Calibration
    - Estimate unknown parameters θ
    - Usually via the posterior distribution π(θ|D)
    - Or history matching

    Uncertainty analysis
    - f(x) a complex simulator. If we are uncertain about x, e.g., X ∼ π(x), what is π(f(X))?

    Sensitivity analysis
    - X = (X1, . . . , Xd)^T. Can we decompose Var(f(X)) into contributions from each Var(Xi)?
    - If we can improve our knowledge of any Xi, which should we choose to minimise Var(f(X))?

    Simulator discrepancy
    - f(x) is imperfect. How can we quantify or correct simulator discrepancy?

    Data assimilation
    - Find π(x1:t | y1:t)
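When the simulator is cheap enough to run many times, the uncertainty and sensitivity analyses above can both be approximated by plain Monte Carlo. A sketch with a hypothetical additive stand-in for f (chosen so the variance decomposition is exact), not any simulator from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in simulator (a real f would be expensive): additive form, so
# Var(f(X)) decomposes exactly into per-input contributions.
def f(x):
    return 3.0 * x[..., 0] + x[..., 1] ** 2

# Uncertainty analysis: push input uncertainty X ~ pi(x) through f.
X = rng.normal(size=(100_000, 2))
fX = f(X)
print(fX.mean(), fX.var())

# First-order sensitivity of X0: Var(E[f(X) | X0]) / Var(f(X)).
# For this f, E[f | X0] = 3 X0 + E[X1^2], so the index is 9 / (9 + 2) = 9/11.
cond_mean = 3.0 * X[:, 0] + (X[:, 1] ** 2).mean()
S0 = cond_mean.var() / fX.var()
print(S0)
```

For a general (non-additive) f the conditional expectation is not available in closed form and the first-order indices are estimated with dedicated Sobol' sampling schemes; the idea is the same.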


  • Meta-modelling

    Surrogate modelling

    Emulation

  • Code uncertainty

    For complex simulators, run times might be long, ruling out brute-force approaches such as Monte Carlo methods.

    Consequently, we will only know the simulator output at a finite number of points. We call this code uncertainty.

    All inference must be done using a finite ensemble of model runs

    Dsim = {(θi, f(θi))}i=1,...,N

    If θ is not in the ensemble, then we are uncertain about the value of f(θ).


  • Meta-modelling

    Idea: if the simulator is expensive, build a cheap model of it and use this in any analysis.

    ‘a model of the model’

    We call this meta-model an emulator of our simulator.

    Gaussian process emulators are the most popular choice of emulator.

    Built using an ensemble of model runs Dsim = {(θi, f(θi))}i=1,...,N, they give an assessment of their prediction accuracy, π(f(θ)|Dsim).


  • Meta-modelling: Gaussian Process Emulators

    Gaussian processes provide a flexible nonparametric distribution for our prior beliefs about the functional form of the simulator:

    f(·) ∼ GP(m(·), σ²c(·, ·))

    where m(·) is the prior mean function, and c(·, ·) is the prior covariance function (positive semi-definite). Gaussian processes are closed under Bayesian updating: the posterior f(·)|Dsim is again a Gaussian process.

    Definition: if f(·) ∼ GP(m(·), σ²c(·, ·)), then for any collection of inputs x1, . . . , xn the vector

    (f(x1), . . . , f(xn))^T ∼ MVN(m(x), σ²Σ)

    where Σij = c(xi, xj).
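The definition above is what makes emulation practical: conditioning the multivariate normal on the ensemble gives the posterior mean and variance of f at new inputs in closed form. A minimal numpy sketch, using sin as a stand-in for the expensive simulator and hand-picked, fixed hyperparameters (in practice m, σ² and the correlation lengths would be estimated):

```python
import numpy as np

# Squared-exponential covariance sigma^2 c(a, b); hyperparameters fixed by hand.
def k(a, b, sigma2=1.0, ell=1.0):
    return sigma2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

f = np.sin                                   # stand-in for the expensive simulator
x_train = np.linspace(0.0, 2 * np.pi, 8)     # design points of the ensemble D_sim
y_train = f(x_train)                         # (zero prior mean m = 0 assumed)

x_star = np.linspace(0.0, 2 * np.pi, 100)
K = k(x_train, x_train) + 1e-8 * np.eye(8)   # jitter for numerical stability
K_s = k(x_star, x_train)

# MVN conditioning: posterior mean and variance of f(x_star) given D_sim
mean = K_s @ np.linalg.solve(K, y_train)
var = k(x_star, x_star).diagonal() - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))

print(np.abs(mean - f(x_star)).max(), var.max())
```

The posterior variance is zero at the design points and grows between them, which is exactly the quantified code uncertainty π(f(θ)|Dsim) described above.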


  • Gaussian Process Illustration: zero mean

  • Gaussian Process Illustration

  • Gaussian Process Illustration

  • Challenges

    Design: if we can afford n simulator runs, which parameters should we run it at?

    High dimensional inputs
    - If θ is multidimensional, then even short run times can rule out brute force approaches

    High dimensional outputs
    - Spatio-temporal.

    Incorporating physical knowledge

    Difficult behaviour, e.g., switches, step-functions, non-stationarity...

  • Uncertainty quantification for Carbon Capture and Storage (EPSRC): transport

    [Figure: pressure-volume isotherms at 290K, 296K, 304.2K (Tc), 307K and 310K, showing the critical point and the co-existing vapour and liquid branches. Axes: Volume [m^3/mol] against Pressure [MPa].]

    Technical challenges:

    How do we find non-parametric Gaussian process models that i) obey the fugacity constraints and ii) have the correct asymptotic behaviour?

    How do we fit parametric equations of state (Peng-Robinson and variants)? Tempered NUTS-HMC.

  • Storage

    Knowledge of the physical problem is encoded in a simulator f.

    Inputs: permeability field K (2d field)

    ↓ f(K)

    Outputs: stream function (2d field), concentration (2d field), surface flux (1d scalar), ...

    [Figures: the input permeability field, the true truncated streamfield, and the true truncated concfield. Surface flux = 6.43, ...]

  • CCS examples

    Left = true, right = emulated; 118 training runs, held-out test set.

    [Figures: true and emulated streamfield; true and emulated concfield.]

  • ABC: inference for complex stochastic models

  • Estimating Divergence Times

  • Forward simulation

    Model evolution and fossil finds.

    Let τ be the temporal gap between the divergence time and the oldest fossil.

    The posterior for τ is then used as a prior for a genetic analysis.

    The likelihood function π(D|θ) is intractable, but the model is cheap to simulate from.

  • Approximate Bayesian Computation (ABC)

    Wilkinson 2008/2013, Wilkinson and Tavaré 2009

    If the likelihood function is intractable, then ABC is one of the few approaches we can use to do inference.

    Uniform Rejection Algorithm
    - Draw θ from π(θ)
    - Simulate X ∼ f(θ)
    - Accept θ if ρ(D, X) ≤ ε

    ε reflects the tension between computability and accuracy.

    As ε → ∞, we get observations from the prior, π(θ). If ε = 0, we generate observations from π(θ|D).

    ABC does not require explicit knowledge of the likelihood function.


  • ε = 10

    [Figure: left, scatter of θ against D with the acceptance band ±ε around the observed D marked; right, the ABC posterior density for θ against the true posterior.]

    θ ∼ U[−10, 10], X ∼ N(2(θ + 2)θ(θ − 2), 0.1 + θ²)

    ρ(D, X) = |D − X|, D = 2
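The rejection algorithm on this toy model is only a few lines of code. A vectorised sketch (seed and sample size are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# The toy model above: theta ~ U[-10, 10], X ~ N(2(theta+2)theta(theta-2), 0.1 + theta^2),
# with rho(D, X) = |D - X| and observed data D = 2.
def simulate(theta):
    mean = 2 * (theta + 2) * theta * (theta - 2)
    return rng.normal(mean, np.sqrt(0.1 + theta**2))

D, eps, N = 2.0, 1.0, 200_000
theta = rng.uniform(-10.0, 10.0, N)        # draw theta from the prior
X = simulate(theta)                        # simulate pseudo-data X ~ f(theta)
accepted = theta[np.abs(X - D) <= eps]     # accept if rho(D, X) <= eps

print(len(accepted) / N)
```

Shrinking eps trades acceptance rate for accuracy, which is exactly the tension the sequence of ε values below illustrates.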

  • ε = 7.5

    [Figure: as above, with a narrower acceptance band and the resulting ABC density against the true posterior.]

  • ε = 5

    [Figure: as above, with a narrower acceptance band and the resulting ABC density against the true posterior.]

  • ε = 2.5

    [Figure: as above, with a narrower acceptance band and the resulting ABC density against the true posterior.]

  • ε = 1

    [Figure: as above, with a narrow acceptance band and the resulting ABC density against the true posterior.]

  • Rejection ABC

    If the data are too high dimensional we never observe simulations that are ‘close’ to the field data: the curse of dimensionality. Reduce the dimension using summary statistics, S(D).

    Approximate Rejection Algorithm With Summaries
    - Draw θ from π(θ)
    - Simulate X ∼ f(θ)
    - Accept θ if ρ(S(D), S(X)) < ε

    If S is sufficient this is equivalent to the previous algorithm.

    Simple → popular with non-statisticians.

    ∃ many extensions and improvements:
    - How to choose S(D)
    - How to efficiently sample θ
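A sketch of why summaries help, using hypothetical 100-dimensional data where S(D) is the sample mean (here S happens to be sufficient, so nothing is lost; choosing S well is the hard part in general):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setting: D is 100 iid N(theta, 1) draws. Matching all 100
# coordinates within eps essentially never happens; matching the 1-d summary
# S(D) = mean(D) is easy.
D = rng.normal(1.0, 1.0, size=100)          # 'field data' with true theta = 1
S = D.mean()

N, eps = 50_000, 0.05
theta = rng.normal(0.0, 5.0, N)             # prior draws
X = rng.normal(theta[:, None], 1.0, size=(N, 100))   # pseudo-data sets
accepted = theta[np.abs(X.mean(axis=1) - S) <= eps]  # compare summaries, not raw data

print(len(accepted), accepted.mean())
```

Running the same scheme with ρ applied to the raw 100-dimensional vectors would accept essentially nothing at any usable ε.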


  • An integrated molecular and palaeontological analysis

    The fossil record does not constrain the primate divergence time as closely as previously believed.

    Genetic and palaeontological estimates unified; the human-chimp divergence time is pushed further back.

    Wilkinson et al. 2011, Bracken-Grissom et al. 2014.

  • Accelerating ABC: GP-ABC

    Monte Carlo methods (such as ABC) are costly and can require more simulation than is possible. However,

    most methods sample naively: they don't learn from previous simulations;

    they don't exploit known properties of the likelihood function, such as continuity;

    they sample randomly, rather than using careful design.

    Emulators are the usual approach to dealing with complex models, but emulating stochastic simulators is problematic.

    Instead of modelling the simulator output, we can model the likelihood L(θ) = π(D|θ).

    D remains fixed: we only need to learn L as a function of θ.

    It is a 1d response surface, but it can be hard to model.
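A sketch of the response-surface idea: estimate the likelihood by repeated simulation at a small design of θ values, giving noisy evaluations of L(θ) that a GP could then smooth. The model, design and kernel bandwidth below are all hypothetical stand-ins, chosen so the true log-likelihood is known for checking:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in model: X | theta ~ N(theta, 1), observed data D = 0, so
# log L(theta) = log N(0; theta, 1) is available in closed form.
D = 0.0
design = np.linspace(-3.0, 3.0, 9)    # a careful design over theta, not random draws
M = 20_000                            # simulations per design point

log_L = np.empty_like(design)
for i, th in enumerate(design):
    X = rng.normal(th, 1.0, M)        # simulate from the model at this theta
    h = 0.2                           # kernel bandwidth (hypothetical choice)
    # kernel density estimate of pi(D | theta) at the observed D
    log_L[i] = np.log(np.mean(np.exp(-0.5 * ((D - X) / h) ** 2) / (h * np.sqrt(2 * np.pi))))

truth = -0.5 * design**2 - 0.5 * np.log(2 * np.pi)
print(np.abs(log_L - truth).max())
```

D stays fixed throughout, so the nine noisy values trace out the one-dimensional response surface over θ that GP-ABC models.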


  • Iteration 24

    Left = estimate, right = truth.

    http://youtu.be/FF3KhKh6NHg

  • Climate science

    What drives the glacial-interglacial cycle?

    Eccentricity: orbital departure from a circle; controls the duration of the seasons.
    Obliquity: axial tilt; controls the amplitude of the seasonal cycle.
    Precession: variation in Earth's axis of rotation; affects the difference between seasons.

  • Model selection

    What drives the glacial-interglacial cycle?

    Which aspect of the astronomical forcing is of primary importance?

    Which models best represent the cycle?

    Most simple models of the [...] glacial cycles have at least four degrees of freedom [parameters], and some have as many as twelve. Unsurprisingly [...this is] insufficient to distinguish between the skill of the various models (Roe and Allen 1999)

    Bayesian model selection revolves around the use of Bayes factors, which are notoriously difficult to compute.

    Model selection for stochastic differential equations: 1000 observations, 3000 unknown state variables, 1000 unknown times, 17 unknown parameters, a choice of 5 different simulators.

    Simulation studies show we can accurately choose between competing models, and identify the correct forcing.


  • Age model

    Can we also quantify chronological uncertainty?

    dXt = g(Xt, θ)dt + F(t, γ)dt + ΣdW

    Yt = d + sX1,t + εt

    Plus an age model

    dH = −µs dT + σdW

    Target: π(θ, T1:N, X1:N, Mk | y1:N)

    where T1:N are the unknown times of the observations Y1:N, X1:N are the climate state variables through time, Mk is the simulation model used, and θ is the corresponding parameter.

    I.e., can we simultaneously date the stack, do climate reconstruction, fit the model, and choose between models?
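An SDE of this form can be simulated forward with the Euler-Maruyama scheme, which is the building block for the inference above. The drift g, forcing F and all parameter values below are hypothetical stand-ins, not the glacial-cycle models themselves:

```python
import numpy as np

rng = np.random.default_rng(4)

# Euler-Maruyama sketch for dX_t = g(X_t, theta) dt + F(t, gamma) dt + Sigma dW
def g(x, theta):
    return -theta * x                              # mean-reverting drift (stand-in)

def F(t, gamma):
    return gamma * np.sin(2 * np.pi * t / 41.0)    # periodic forcing (stand-in)

theta, gamma, sigma = 0.5, 1.0, 0.3
dt, n = 0.1, 5000
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    t = i * dt
    x[i] = x[i - 1] + (g(x[i - 1], theta) + F(t, gamma)) * dt \
           + sigma * np.sqrt(dt) * rng.normal()

# Observation equation Y_t = d + s X_t + noise, mirroring the slide
d, s, sd_obs = 0.2, 1.5, 0.1
y = d + s * x + sd_obs * rng.normal(size=n)
print(x.mean(), x.std())
```

In the real problem the observation times T1:N are themselves unknown (the age model), which is what makes the joint target above so high dimensional.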


  • Simulation study results - age vs depth (trend removed)

    Dots = truth, black line = estimate, grey = 95% CI.

    [Figure: time (drift removed, kyr) against depth (m).]

  • Simulation study results - climate reconstruction

    Dots = truth, black line = estimate, grey = 95% CI.

    [Figures: reconstructed climate states X1 and X2 against depth (m).]

  • Simulation study results - parameter estimation

    [Figure: marginal posterior densities for β0, β1, β2, δ, γp, γc, γe, σ1, σ2, σobs, D, S, µs, σs, α, φ0 and c.]

    Simultaneous inference of the choice between 5 models, 17 parameters, 800 ages, 2400 climate variables, using just 800 observations.

  • Results for ODP846 - age vs depth (trend removed)Black = posterior mean, grey = 95%CI, red = Huybers 2007, blue = Lisieki and Raymo2004

    [Figure: Time (drift removed, kyr) against Depth (m)]

    Advantages: full UQ, model selection, and simultaneous parameter estimation and climate reconstruction. Ignoring uncertainty leads to incorrect conclusions.

  • Model discrepancy

    Consider the state space model:

    xt+1 = fθ(xt) + et ,    yt = g(xt) + εt

    et ∼ p(·),    εt ∼ q(·)

    How do we correct errors in f or g?

    Use a GP discrepancy model - e.g., xt+1 = fθ(xt) + δ(xt) + et
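The idea above can be sketched in a few lines: simulate a trajectory from some "true" dynamics, run a deliberately wrong simulator fθ, and fit a GP regression to the one-step residuals to learn δ(·). This is only a minimal illustration; the specific transition functions, kernel, and hyperparameters below are assumptions (loosely echoing the simulator forms shown in Figure 6.6), not the method as implemented in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (assumptions): a "true" transition with a nonlinear
# term, and a wrong simulator that omits it.
def f_true(x, u):
    return 0.5 * x + 25.0 * x / (1.0 + x**2) + 8.0 * u

def f_sim(x, u):
    return 0.5 * x + 8.0 * u  # misses the nonlinear term

# Simulate a trajectory from the true dynamics, e_t ~ N(0, q^2)
T, q = 500, 0.5
u = np.cos(1.2 * np.arange(T))
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = f_true(x[t], u[t]) + q * rng.standard_normal()

# One-step residuals of the wrong simulator are noisy observations of delta(x_t)
X = x[:-1]
y = x[1:] - f_sim(X, u)

# GP regression with a squared-exponential kernel (posterior mean only)
def kern(a, b, ell=2.0, sf=8.0):
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

K = kern(X, X) + q**2 * np.eye(T)
xs = np.linspace(-8, 8, 101)
delta_hat = kern(xs, X) @ np.linalg.solve(K, y)  # learnt discrepancy
delta_true = 25.0 * xs / (1.0 + xs**2)           # the term the simulator misses
```

With a long enough trajectory the GP posterior mean recovers the missing nonlinear term over the visited state range; a full treatment would also infer the kernel hyperparameters and θ jointly.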

    Chapter 6: Gaussian Process Models of Simulator Discrepancy

    [Figure panels: x(t+1) − f(xt,ut) against x(t) for the simulators f(xt,ut) = 0.5xt + 8ut, 25xt/(1+xt²) + 8ut, 8ut, and 0.5 + 8ut, each with a GP(0,K) discrepancy]

    Figure 6.6: The learnt discrepancy (solid black line) and the true discrepancy function (red line) from using different incorrect simulators with Gaussian process discrepancy. Data are generated from Equations 6.4.1 and 6.4.2 with true parameters (q2, r2) of (0.1, 1) (top row), (1, 0.1) (middle row), and (1, 100) (bottom row), respectively.


    Technical challenge: inference using PGAS works but is expensive. A variational approach looks more promising.
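PGAS (particle Gibbs with ancestor sampling) itself is too involved to sketch here, but the particle machinery it builds on can be illustrated with a bootstrap particle filter for a simple state space model. Everything below (the transition function, noise levels, particle count) is an illustrative assumption, not the model from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative state-space model (all specific choices are assumptions):
#   x_{t+1} = 0.5 x_t + 8 u_t + e_t,  e_t ~ N(0, q^2)
#   y_t     = x_t + eps_t,            eps_t ~ N(0, r^2)
q, r, T, N = 1.0, 1.0, 100, 500

def f(x, u):
    return 0.5 * x + 8.0 * u

# Simulate synthetic data
u = np.cos(1.2 * np.arange(T))
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = f(x_true[t - 1], u[t - 1]) + q * rng.standard_normal()
y = x_true + r * rng.standard_normal(T)

def bootstrap_pf(y, u, q, r, N, rng):
    """Bootstrap particle filter: filtered means and a log-likelihood estimate."""
    x = rng.standard_normal(N)                    # initial particles
    loglik, means = 0.0, np.zeros(len(y))
    for t in range(len(y)):
        if t > 0:
            x = f(x, u[t - 1]) + q * rng.standard_normal(N)  # propagate
        logw = -0.5 * ((y[t] - x) / r) ** 2 - 0.5 * np.log(2 * np.pi * r**2)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())            # log of average likelihood
        w /= w.sum()
        means[t] = np.sum(w * x)                  # weighted filtered mean
        x = x[rng.choice(N, size=N, p=w)]         # multinomial resampling
    return means, loglik

means, ll = bootstrap_pf(y, u, q, r, N, rng)
```

The expense of PGAS comes from rerunning a conditional version of exactly this filter at every MCMC iteration, which is what motivates the variational alternative mentioned above.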


  • Conclusions

    UQ can be vital: ignoring uncertainty can lead to incorrect conclusions, often in subtle ways.

    Computational tractability is one of the key bottlenecks: big simulation and big data.

    Methods from machine learning have the potential to help us make large advances in statistical methodology.

    Thank you for listening!
