
Approximate Bayesian Computation

Sarah Filippi

Department of Statistics, University of Oxford

09/02/2016

Parameter inference in a signalling pathway

[Figure: the MAPK signalling cascade. A growth factor binds its receptor at the cell membrane and activates RAS, then RAF (MAP kinase kinase kinase), MEK (MAP kinase kinase) and ERK (MAP kinase) through distributive phosphorylation and dephosphorylation, leading to transcription, proliferation, survival and differentiation. Alongside: time-course measurements of pERK and pMEK.]

Model via differential equations: 9 species, 16 parameters, 2 initial conditions.


Notation

We have:
• an observed dataset, denoted by x∗
• a parametric model, either deterministic or stochastic. The model is the data-generating process: given a value of the parameter θ ∈ Rd, the output of the system is

X ∼ f(·|θ)

The likelihood function: θ ↦ f(x∗|θ)

"All models are wrong but some are useful." (George E. P. Box)

Aim:
• parameter inference: determine how likely each value of θ is to explain the data
• model selection: rank candidate models in terms of their ability to explain the data

Bayesian methods

In the Bayesian framework, we combine
• the information brought by the data x∗, via the likelihood f(x∗|θ)
• some a priori information, specified in a prior distribution π(θ)

This information is summarized in the posterior distribution, which is derived using Bayes' formula:

p(θ|x∗) = f(x∗|θ) π(θ) / ∫ f(x∗|θ′) π(θ′) dθ′

[Figure: likelihood × prior = posterior, each plotted as a function of θ.]
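To make the formula concrete, here is a minimal numerical sketch (not part of the slides): on a grid, the posterior is just the pointwise product of likelihood and prior, renormalised. The Gaussian model and prior are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Illustrative setup: data from a N(theta, 1) model, N(0, 2^2) prior on theta.
x_star = np.array([1.2, 0.7, 1.9, 1.1])      # observed dataset x*
theta = np.linspace(-5, 5, 1001)             # grid over the parameter
dtheta = theta[1] - theta[0]

# Likelihood f(x*|theta) on the grid (product over the observations).
likelihood = np.prod(norm.pdf(x_star[:, None], loc=theta, scale=1.0), axis=0)
prior = norm.pdf(theta, loc=0.0, scale=2.0)  # prior pi(theta)

# Bayes' formula: posterior proportional to likelihood x prior; normalise numerically.
unnorm = likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)
print("posterior mean ~", (theta * posterior).sum() * dtheta)
```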

Sample from the posterior distribution

• In general, there is no closed-form solution.
• Use computer simulation and Monte Carlo techniques to sample from the posterior distribution.
• Typical likelihood-based approaches:
  ◦ Importance sampling
  ◦ Markov chain Monte Carlo (MCMC)
  ◦ Population Monte Carlo
  ◦ Sequential Monte Carlo (SMC) samplers
  ◦ ...

What if we cannot compute the likelihood?

In many applications of interest, evaluating the likelihood is either impossible or computationally too expensive:
• stochastic differential equations
• population genetics
• ...

Use so-called likelihood-free methods such as Approximate Bayesian Computation (ABC).

Approximate Bayesian Computation

• Principle:
  ◦ sample parameters θ from the prior distribution
  ◦ select the values of θ such that the simulated data are close to the observed data.

• More formally: given a small value of ε > 0,

p(θ|x∗) = f(x∗|θ) π(θ) / p(x∗) ≈ pε(θ|x∗) = ∫ f(x|θ) π(θ) 1(∆(x, x∗) ≤ ε) dx / p(x∗)
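A minimal rejection-ABC sketch of this principle (the Gaussian toy model, the mean-based distance and the tolerance are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x_star = rng.normal(2.0, 1.0, size=50)        # pretend this is the observed data x*

def simulate(theta, rng):
    """Data-generating process f(.|theta): here, 50 draws from N(theta, 1)."""
    return rng.normal(theta, 1.0, size=50)

def distance(x, x_star):
    """Delta(x, x*): distance between summary statistics (here, sample means)."""
    return abs(x.mean() - x_star.mean())

eps = 0.1                                      # tolerance
accepted = []
while len(accepted) < 1000:
    theta = rng.uniform(-10, 10)               # sample theta from the prior
    x = simulate(theta, rng)                   # simulate data given theta
    if distance(x, x_star) <= eps:             # keep theta if simulation is close to data
        accepted.append(theta)

print("approximate posterior mean:", np.mean(accepted))
```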

Approximate Bayesian Computation

[Schematic: a parameter (θ1, θ2) is drawn and the model is run to produce a simulation x(θ); the simulation is compared to the data x∗ through d = ∆(x(θ), x∗). Reject θ if d > ε; accept θ if d ≤ ε.]

Acknowledgement to Prof. Michael Stumpf (Theoretical Systems Biology group, Imperial College London) for the slides.
Toni & Stumpf, Bioinformatics (2010)


Approximate Bayesian Computation

• Very sensitive to the choice of ε:
  ◦ ε should be close to 0 to obtain samples from a distribution close to the posterior distribution:

lim_{ε→0} pε(θ|x∗) = p(θ|x∗)

  ◦ if ε is too small, the acceptance rate is very low.
• More efficient variants of ABC:
  ◦ regression-adjusted ABC (Tallmon et al., 2004; Fagundes et al., 2007; Blum and François, 2010)
  ◦ Markov chain Monte Carlo ABC schemes (Marjoram et al., 2003; Ratmann et al., 2007)
  ◦ ABC implementing some variant of sequential importance sampling (SIS) or sequential Monte Carlo (SMC) (Sisson et al., 2007; Beaumont et al., 2009; Toni et al., 2009; Del Moral et al., 2011)

Sequential ABC – a schematic representation

Define a set of intermediate distributions πt, t = 1, ..., T, associated with a decreasing sequence of thresholds ε1 > ε2 > ... > εT:

Population 1: sampled from the prior, π(θ)
...
Population t − 1: πt−1(θ | ∆(x∗, x(θ)) < εt−1)
Population t: πt(θ | ∆(x∗, x(θ)) < εt)
...
Population T: πT(θ | ∆(x∗, x(θ)) < εT)

Acknowledgement to Prof. Michael Stumpf (Theoretical Systems Biology group, Imperial College London) for the slides.
Toni et al., J. Roy. Soc. Interface (2009); Toni & Stumpf, Bioinformatics (2010).

Sequential ABC – the approach (1)

In the rest of this lecture, we focus on the approach proposed by Beaumont et al. (2009) and Toni et al. (2009).

The algorithms are related to the population Monte Carlo algorithm (Cappé et al., 2004) and invoke importance sampling arguments:

• if a sample θ(t) = (θ(1,t), ..., θ(N,t)) is produced by simulating each θ(i,t) ∼ qit independently of one another, conditionally on the past samples,

• and if each point θ(i,t) is associated with an importance weight

ω(i,t) ∝ π(θ(i,t)) / qit(θ(i,t))

then (1/N) ∑_{i=1}^{N} ω(i,t) h(θ(i,t)) is an unbiased estimator of ∫ h(θ) π(θ) dθ.
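A quick numerical check of this identity (a sketch; the target, the proposal and the test function h are arbitrary choices, not from the slides):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Target pi = N(0, 1); proposal q = N(0, 2), which has heavier tails than pi.
N = 100_000
theta = rng.normal(0.0, 2.0, size=N)                        # theta^(i) ~ q
w = norm.pdf(theta, 0.0, 1.0) / norm.pdf(theta, 0.0, 2.0)   # w^(i) = pi/q

h = theta**2                                                # h(theta) = theta^2
print((w * h).mean())   # ~ integral of h(theta) pi(theta) dtheta = 1
```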

Sequential ABC – the approach (2)

In the sequential ABC algorithm, at each population t, we consider a sample θ(t) = (θ(1,t), ..., θ(N,t)).

Each θ(i,t) is sampled independently as follows:
• θ is sampled from the previous population {θ(i,t−1), ω(i,t−1)}1≤i≤N,
• then it is perturbed using a perturbation kernel, θ′ ∼ Kt(·|θ).

Therefore the proposal distribution is

qt(·) = ∑_{j=1}^{N} ω(j,t−1) Kt(·|θ(j,t−1))

and the importance weight associated with θ′ is

ω ∝ π(θ′) / ∑_{j=1}^{N} ω(j,t−1) Kt(θ′|θ(j,t−1)).

Sequential ABC – the algorithm

1: for t = 1, ..., T do
2:   i ← 1
3:   repeat
4:     if t = 1 then
5:       sample θ′ from π(θ)
6:     else
7:       sample θ from the previous population {θ(i,t−1), ω(i,t−1)}1≤i≤N
8:       perturb it with Kt(·|θ) to obtain θ′ such that π(θ′) > 0
9:     end if
10:    sample x from f(·|θ′)
11:    if ∆(x∗, x) ≤ εt then
12:      θ(i,t) ← θ′; i ← i + 1
13:    end if
14:  until i = N + 1
15:  calculate the weights: ω(i,1) = 1/N and, for t > 1,
       ω(i,t) ∝ π(θ(i,t)) / ∑_{j=1}^{N} ω(j,t−1) Kt(θ(i,t)|θ(j,t−1))
16: end for

Toni et al., 2009; Beaumont et al., 2009
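Putting the pieces together, a compact ABC-SMC sketch of this algorithm, under simplifying assumptions (the same toy Gaussian model as the rejection sketch above, a hand-picked threshold schedule, and a component-wise Gaussian kernel whose scale of twice the weighted variance is one common choice, not the only one):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x_star = rng.normal(2.0, 1.0, size=50)            # toy observed data x*

def prior_pdf(th):
    """pi(theta): uniform on [-10, 10]; works on scalars and arrays."""
    return np.where((th >= -10.0) & (th <= 10.0), 1.0 / 20.0, 0.0)

def simulate(th):
    """Data-generating process f(.|theta): 50 draws from N(theta, 1)."""
    return rng.normal(th, 1.0, size=50)

def distance(x):
    """Delta(x, x*): absolute difference of sample means."""
    return abs(x.mean() - x_star.mean())

N = 500
epsilons = [2.0, 1.0, 0.5, 0.2, 0.1]              # decreasing schedule eps_1 > ... > eps_T

thetas = weights = None
for t, eps in enumerate(epsilons):
    if t > 0:
        # Component-wise Gaussian kernel; twice the weighted variance as the scale.
        sigma = np.sqrt(2.0 * np.cov(thetas, aweights=weights))
    new_thetas = np.empty(N)
    i = 0
    while i < N:
        if t == 0:
            th = rng.uniform(-10.0, 10.0)         # line 5: sample from the prior
        else:
            th = rng.choice(thetas, p=weights)    # line 7: resample previous population
            th = rng.normal(th, sigma)            # line 8: perturb with K_t
            if prior_pdf(th) == 0:                # require pi(theta') > 0
                continue
        if distance(simulate(th)) <= eps:         # lines 10-13: accept if close enough
            new_thetas[i] = th
            i += 1
    if t == 0:
        new_weights = np.full(N, 1.0 / N)         # omega^(i,1) = 1/N
    else:
        # line 15: omega ~ pi / sum_j omega^(j,t-1) K_t(theta^(i,t)|theta^(j,t-1))
        denom = (weights * norm.pdf(new_thetas[:, None], thetas, sigma)).sum(axis=1)
        new_weights = prior_pdf(new_thetas) / denom
        new_weights /= new_weights.sum()
    thetas, weights = new_thetas, new_weights

print("weighted posterior mean ~", np.average(thetas, weights=weights))
```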

Impact on computational efficiency:
• perturbation kernels {Kt}t
• threshold schedule {εt}t

Choice of the perturbation kernel

Use an adaptive perturbation kernel.

• Local perturbation kernel:
  ◦ hardly moves the particles
  ◦ high acceptance rate if successive values of ε are close enough
• Widely spread kernel:
  ◦ exploration of the parameter space
  ◦ low acceptance rate

Balance between exploring the parameter space and ensuring a high acceptance rate.

Properties of optimal kernel

1. From sequential importance sampling theory: similarity between the two joint distributions of (θ(t−1), θ(t)) where θ(t−1) ∼ pεt−1 and
  ◦ θ(t) is constructed by perturbing θ(t−1) and accepting according to threshold εt, or
  ◦ θ(t) ∼ pεt

2. Computational efficiency: high acceptance rate.

3. Theoretical requirements for convergence:
  ◦ a kernel with larger support than the target distribution ⇒ guarantees asymptotic unbiasedness of the empirical mean
  ◦ a kernel that vanishes slowly enough in the tails of the target ⇒ guarantees finite variance of the estimator

Derivation of optimal kernel

Criterion 1: resemblance between the two distributions

qεt−1,εt(θ(t−1), θ(t)|x∗) = pεt−1(θ(t−1)|x∗) Kt(θ(t)|θ(t−1)) ∫ f(x|θ(t)) 1(∆(x∗, x) ≤ εt) dx / α(Kt, εt−1, εt, x∗)

and

q∗εt−1,εt(θ(t−1), θ(t)|x∗) = pεt−1(θ(t−1)|x∗) pεt(θ(t)|x∗),

measured in terms of the KL divergence (Douc et al., 2007; Cappé et al., 2008; Beaumont et al., 2009):

KL(qεt−1,εt ; q∗εt−1,εt) = −Q(Kt, εt−1, εt, x∗) + log α(Kt, εt−1, εt, x∗) + C(εt−1, εt, x∗)

where

Q(Kt, εt−1, εt, x∗) = ∫∫ pεt−1(θ(t−1)|x∗) pεt(θ(t)|x∗) log Kt(θ(t)|θ(t−1)) dθ(t−1) dθ(t)

Derivation of optimal kernel

Criterion 1: minimise the KL divergence

KL(qεt−1,εt ; q∗εt−1,εt) = −Q(Kt, εt−1, εt, x∗) + log α(Kt, εt−1, εt, x∗) + C(εt−1, εt, x∗)

Criterion 2: maximise the acceptance rate α(Kt, εt−1, εt, x∗)

Method
Select the kernel Kt which maximises Q(Kt, εt−1, εt, x∗); this is equivalent to maximising −KL(qεt−1,εt ; q∗εt−1,εt) + log α(Kt, εt−1, εt, x∗).

Remark:

Q(Kt, εt−1, εt, x∗) = ∫∫ pεt−1(θ(t−1)|x∗) pεt(θ(t)|x∗) log Kt(θ(t)|θ(t−1)) dθ(t−1) dθ(t)

can be maximised easily for some families of kernels.
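As a worked illustration of the remark (a sketch consistent with the optimality results of Filippi et al., 2013, not reproduced from the slides): for a Gaussian random-walk kernel Kt(θ′|θ) = N(θ′; θ, Σ), Q depends on Σ only through an expected log-density, and the maximiser is available in closed form.

```latex
\[
Q(\Sigma) = \mathbb{E}_{q^*}\!\bigl[\log \mathcal{N}(\theta^{(t)};\,\theta^{(t-1)},\,\Sigma)\bigr]
= -\tfrac{1}{2}\log\det(2\pi\Sigma)
  -\tfrac{1}{2}\,\mathbb{E}_{q^*}\!\bigl[\Delta\theta^{\top}\Sigma^{-1}\Delta\theta\bigr],
\qquad \Delta\theta = \theta^{(t)} - \theta^{(t-1)},
\]
\[
\text{which is maximised at}\quad
\Sigma^{\star} = \mathbb{E}_{q^*}\!\bigl[\Delta\theta\,\Delta\theta^{\top}\bigr].
\]
```

The weighted empirical version of Σ⋆ is exactly the covariance matrix used for the multivariate normal kernel on the next slide.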


Gaussian random walk kernels

• Component-wise kernel: diagonal covariance matrix Σ(t) (Beaumont et al., 2009)

• Multivariate normal kernel:

Σ(t) ≈ ∑_{i=1}^{N} ∑_{k=1}^{N0} ω(i,t−1) ω(k) (θ(k) − θ(i,t−1))(θ(k) − θ(i,t−1))^T

where {θ(k)}1≤k≤N0 = {θ(i,t−1) s.t. ∆(x∗, x(i,t−1)) ≤ εt, 1 ≤ i ≤ N}

[Figure: populations and perturbation kernels in the (θ1, θ2) plane.]

• Local multivariate normal kernels: use a different kernel for each particle θ(t−1).
  ◦ Nearest neighbours: Σ(t)_{θ(t−1),M} based on the M nearest neighbours
  ◦ Optimal local covariance matrix (OLCM):

    Σ(t)_{θ(t−1)} ≈ ∑_{k=1}^{N0} ω(k) (θ(k) − θ(t−1))(θ(k) − θ(t−1))^T

  ◦ Based on the Fisher information matrix (FIM), for models defined by ODEs or SDEs
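A sketch of these two covariance computations (the particle and weight arrays are assumed given; the θ(k) are the previous-population particles already within εt, as defined above):

```python
import numpy as np

def mvn_whole_population_cov(thetas_prev, w_prev, thetas_tilde, w_tilde):
    """Sigma(t) ~ sum_i sum_k w(i,t-1) w(k) (theta(k)-theta(i,t-1))(...)^T."""
    diff = thetas_tilde[:, None, :] - thetas_prev[None, :, :]   # (N0, N, d)
    wk_wi = w_tilde[:, None] * w_prev[None, :]                  # (N0, N)
    return np.einsum("ki,kid,kie->de", wk_wi, diff, diff)

def olcm_cov(theta, thetas_tilde, w_tilde):
    """Optimal local covariance matrix for a single particle theta(t-1)."""
    diff = thetas_tilde - theta                                 # (N0, d)
    return np.einsum("k,kd,ke->de", w_tilde, diff, diff)

# Hypothetical usage with a 2-d parameter:
rng = np.random.default_rng(3)
prev = rng.normal(size=(100, 2)); w = np.full(100, 1 / 100)
tilde = prev[:40]; wt = np.full(40, 1 / 40)                     # particles within eps_t
Sigma_t = mvn_whole_population_cov(prev, w, tilde, wt)
Sigma_local = olcm_cov(prev[0], tilde, wt)
```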

Computational efficiency of perturbation kernels

[Figure: number of simulations and running time for the Repressilator system and for the Hes1 transcriptional regulation model, comparing the uniform kernel, the component-wise normal kernel, and multivariate normal kernels based on the whole population, the M nearest neighbours, the OLCM, the FIM scaled on the whole population, and the FIM scaled on the M neighbours.]

Computational cost of perturbation kernels

• simulating the data dominates
• computational cost of implementing the perturbation kernel:

  Component-wise normal                                        O(dN²)
  Multivariate normal based on the whole previous population   O(d²N²)
  Multivariate normal based on the M nearest neighbours        O((d + M)N² + d²M²N)
  Multivariate normal with OLCM                                O(d²N²)
  Multivariate normal based on the FIM                         O(dCN + d²N²)
  (normalized with the entire population)

N = population size; d = parameter dimension

Recommendation
Use multivariate normal kernels with the OLCM:
• highest acceptance rate in our examples
• relatively easy to implement at acceptable computational cost

Choice of the threshold schedule

(Recall the sequential ABC algorithm: at each population t, a candidate θ is accepted only if ∆(x∗, x) ≤ εt.)

• final value of ε close to 0
• minimise the number of simulations, i.e.
  ◦ minimise the number of populations
  ◦ maximise the acceptance rate per population

Choice of the threshold schedule

Previous approaches:
• trial and error
• adaptive methods based on a quantile (Beaumont et al., 2009; Del Moral et al., 2008; Drovandi et al., 2011; Lenormand et al., 2012)
• ...

The quantile-based approach:

εt = α-quantile of the distances {∆(x(i,t−1), x∗)}1≤i≤N

Which value of α should we choose?

An adaptive choice of α: Sedki et al., 2013.
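The quantile rule itself is one line (a sketch; the distances are those recorded for the previous population's accepted particles, and the values of α and the synthetic distances are arbitrary examples):

```python
import numpy as np

def next_epsilon(distances_prev, alpha=0.5):
    """eps_t = alpha-quantile of {Delta(x^(i,t-1), x*)}_{1<=i<=N}."""
    return np.quantile(distances_prev, alpha)

# Hypothetical usage with synthetic distances from the previous population:
rng = np.random.default_rng(4)
distances = rng.gamma(2.0, 1.0, size=500)
print(next_epsilon(distances, alpha=0.3))
```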

Drawback of the quantile approach

[Figure: distance between g(θ) and g(θ∗) as a function of θ, with θ∗ marked near 0.05; the successive populations concentrate around a local minimum of the distance away from θ∗, so the algorithm gets stuck in a local minimum.]

• For large α, the algorithm ‘fails‘.
• The optimal (or safe) choice of α depends on the data, the model and the prior range.

The threshold-acceptance rate curve

Idea:
• use the threshold-acceptance rate curve α(ε)
• avoid regions with an excessively high acceptance rate

[Figure: distance between g(θ) and g(θ∗) as a function of θ for populations 1, 2 and 3, together with the acceptance-rate curve α(ε).]

Exploit the threshold-acceptance rate curve

[Figure: acceptance rate as a function of ε. Convex shapes are a symptom of sampling from local optima.]

Proposed method:
• set ε∗ = argmaxε ∂²αt/∂ε²
• if α(ε∗) is not too small, εt = ε∗
• otherwise, εt = argminε ∆((ε/εt−1, αt(ε)/αt(εt−1)), (0, 1))
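A sketch of this selection rule, assuming the curve αt(ε) has already been estimated on a grid; the fallback is the distance to the point (0, 1) from the slide, and the "not too small" cutoff alpha_min is an illustrative assumption:

```python
import numpy as np

def choose_epsilon(eps_grid, alpha_curve, eps_prev, alpha_min=0.05):
    """Pick eps_t from an estimated threshold-acceptance rate curve alpha_t(eps)."""
    # eps* = argmax of the second derivative of alpha_t with respect to eps.
    d2 = np.gradient(np.gradient(alpha_curve, eps_grid), eps_grid)
    eps_star = eps_grid[np.argmax(d2)]
    alpha_star = np.interp(eps_star, eps_grid, alpha_curve)
    if alpha_star >= alpha_min:                   # acceptance rate not too small
        return eps_star
    # Fallback: point of the normalised curve closest to (0, 1).
    alpha_prev = np.interp(eps_prev, eps_grid, alpha_curve)
    dist = np.hypot(eps_grid / eps_prev, alpha_curve / alpha_prev - 1.0)
    return eps_grid[np.argmin(dist)]

# Hypothetical usage with a synthetic sigmoid-shaped curve:
eps = np.linspace(0.01, 2.0, 200)
alpha = 1.0 / (1.0 + np.exp(-(eps - 1.0) * 6.0))
print(choose_epsilon(eps, alpha, eps_prev=2.0))
```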

Estimating the threshold-acceptance rate curve

• Acceptance rate:

αt(ε) = ∫∫ qt(θ) f(x|θ) 1(∆(x, x∗) ≤ ε) dx dθ

where qt(θ) = ∑_{i=1}^{N} ω(i,t−1) Kt(θ|θ(i,t−1))

• No closed-form expression, and difficult to approximate.
• For deterministic models, use the Unscented Transform (UT).

[Figure: the transform maps points from parameter space to output space.]


Adaptive method for threshold choice

At each population t:
1. generate a population of perturbed particles,
2. fit a Gaussian mixture model to the perturbed population,
3. estimate

pt(x) = ∫ qt(θ) f(x|θ) dθ

using the unscented transform independently for each component of the Gaussian mixture,
4. estimate

αt(ε) = ∫ pt(x) 1(∆(x, x∗) ≤ ε) dx

Remark
The threshold-acceptance rate curve depends on the perturbation kernel.
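For stochastic models, where the unscented-transform step does not apply directly, the same curve can instead be estimated by brute-force simulation. This is a plain Monte Carlo sketch, not the UT-based method of the slides, and it reuses the toy model from the earlier sketches:

```python
import numpy as np

def acceptance_rate_curve(sample_q, simulate, distance, eps_grid, n=2000):
    """Monte Carlo estimate of alpha_t(eps) = P(Delta(x, x*) <= eps), theta ~ q_t."""
    d = np.array([distance(simulate(sample_q())) for _ in range(n)])
    return np.array([(d <= e).mean() for e in eps_grid])

# Hypothetical usage:
rng = np.random.default_rng(5)
x_star = rng.normal(2.0, 1.0, size=50)
sample_q = lambda: rng.normal(2.0, 0.5)                # stand-in for the proposal q_t
simulate = lambda th: rng.normal(th, 1.0, size=50)     # f(.|theta)
distance = lambda x: abs(x.mean() - x_star.mean())     # Delta(x, x*)
curve = acceptance_rate_curve(sample_q, simulate, distance, np.linspace(0, 1, 50))
```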

Computational efficiency of the adaptive method

The Repressilator system:

[Figure: results for the Repressilator system.]

Computational efficiency of the adaptive method

Chemical reaction system illustrating the "local minimum problem":

[Figure: number of simulations per threshold schedule; one schedule is marked as a failure.]

Take-home messages

Approximate Bayesian Computation
To be used for parameter inference when the likelihood cannot be computed but it is possible to simulate from the model.

Sequential approaches
Computationally more efficient.

Perturbation kernel
Gaussian random walk with the optimal (local) covariance matrix:
• low computational cost
• high acceptance rate

Threshold schedule
• the quantile approach can lead to the wrong posterior
• exploit the threshold-acceptance rate curve

Filippi et al., 2013; Silk, Filippi and Stumpf, 2013.

References

Beaumont, M., Cornuet, J.-M., Marin, J.-M., and Robert, C. (2009). Adaptive approximate Bayesian computation. Biometrika.

Blum, M. and François, O. (2010). Non-linear regression models for approximate Bayesian computation. Statist. Comput.

Cappé, O., Guillin, A., Marin, J.-M., and Robert, C. (2004). Population Monte Carlo. J. Comput. Graph. Statist.

Del Moral, P., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers. JRSS Series B.

Fagundes, N., et al. (2007). Statistical evaluation of alternative models of human evolution. PNAS.

Filippi, S., Barnes, C., Cornebise, J., and Stumpf, M. (2013). On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. SAGMB.

Marjoram, P., Molitor, J., Plagnol, V., and Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. PNAS.

Ratmann, O., et al. (2007). Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum. PLoS Comput Biol.

Silk, D., Filippi, S., and Stumpf, M. (2013). Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. SAGMB.

Sisson, S. A., Fan, Y., and Tanaka, M. (2007). Sequential Monte Carlo without likelihoods. PNAS.

Toni, T., Welch, D., Strelkowa, N., Ipsen, A., and Stumpf, M. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface.

Other references

Blum, M. (2010). Approximate Bayesian computation: a non-parametric perspective. J. Amer. Statist. Assoc.

Fearnhead, P. and Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. JRSS Series B (with discussion).

Marin, J.-M., Pudlo, P., Robert, C., and Ryder, R. (2012). Approximate Bayesian computational methods. Statistics and Computing.

Robert, C., Cornuet, J.-M., Marin, J.-M., and Pillai, N. (2011). Lack of confidence in ABC model choice. PNAS.

Rubin, D. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist.

Wilkinson, R. (2013). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. SAGMB.