![Page 1: An introduction to Sequential Monte Carlomlg.eng.cam.ac.uk/thang/docs/talks/rcc_smc.pdf · Doucet, Arnaud, De Freitas, Nando, ... An introduction to sequential Monte Carlo methods](https://reader031.vdocuments.us/reader031/viewer/2022022502/5aabe3a77f8b9a2e088c68dd/html5/thumbnails/1.jpg)
An introduction to Sequential Monte Carlo

Thang Bui, Jes Frellsen
Department of Engineering, University of Cambridge
Research and Communication Club, 6 February 2014
Sequential Monte Carlo (SMC) methods
- Initially designed for online inference in dynamical systems
  - Observations arrive sequentially and one needs to update the posterior distribution of hidden variables
  - Analytically tractable solutions are available for linear Gaussian models, but not for complex models
  - Examples: target tracking, time series analysis, computer vision
- Increasingly used to perform inference for a wide range of applications, not just dynamical systems
  - Examples: graphical models, population genetics, ...
- SMC methods are scalable, easy to implement and flexible!
Outline

- Motivation
  - References
- Introduction
  - MCMC and importance sampling
  - Sequential importance sampling and resampling
  - Example: a dynamical system
  - Proposal
  - Smoothing
  - MAP estimation
  - Parameter estimation
- A generic SMC algorithm
- Particle MCMC
- Particle learning for GP regression
- Summary
Bibliography I
Andrieu, Christophe, Doucet, Arnaud, & Holenstein, Roman. 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269–342.

Cappé, Olivier, Godsill, Simon J., & Moulines, Eric. 2007. An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE, 95(5), 899–924.

Doucet, Arnaud, De Freitas, Nando, & Gordon, Neil. 2001. An introduction to sequential Monte Carlo methods. Pages 3–14 of: Sequential Monte Carlo Methods in Practice. Springer.

Gramacy, Robert B., & Polson, Nicholas G. 2011. Particle learning of Gaussian process models for sequential design and optimization. Journal of Computational and Graphical Statistics, 20(1), 102–118.

Holenstein, Roman. 2009. Particle Markov Chain Monte Carlo. Ph.D. thesis, The University of British Columbia.
Bibliography II
Holenstein, Roman, & Doucet, Arnaud. 2007. Particle Markov Chain Monte Carlo. Workshop: New Directions in Monte Carlo Methods, Fleurance, France.

Wilkinson, Richard. 2014. GPs huh? What are they good for? Gaussian Process Winter School, Sheffield.
State Space Models (Doucet et al., 2001; Cappé et al., 2007)

The Markovian, nonlinear, non-Gaussian state space model:

- Unobserved signal or states $\{x_t \mid t \in \mathbb{N}\}$
- Observations or outputs $\{y_t \mid t \in \mathbb{N}^+\}$ or $\{y_t \mid t \in \mathbb{N}\}$

$$P(x_0) \quad \text{(initial distribution)}$$
$$P(x_t \mid x_{t-1}) \text{ for } t \ge 1 \quad \text{(transition probability)}$$
$$P(y_t \mid x_t) \text{ for } t \ge 0 \quad \text{(emission/observation probability)}$$

[Figure: graphical model with hidden chain $x_{t-1} \to x_t \to x_{t+1}$ and emissions $y_{t-1}, y_t, y_{t+1}$.]
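The generative structure above can be sampled ancestrally: draw $x_0$, then alternate emission and transition draws. The following is a minimal sketch with a hypothetical linear-Gaussian choice of transition and emission (the specific coefficients are illustrative and not from the slides):

```python
import random

def simulate_ssm(T, seed=0):
    """Ancestral sampling from a toy state space model (hypothetical):
    x_0 ~ N(0, 1), x_t = 0.9 x_{t-1} + N(0, 0.5^2), y_t = x_t + N(0, 1)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                       # draw x_0 from P(x_0)
    xs, ys = [], []
    for _ in range(T + 1):
        xs.append(x)
        ys.append(x + rng.gauss(0.0, 1.0))        # emission P(y_t | x_t)
        x = 0.9 * x + rng.gauss(0.0, 0.5)         # transition P(x_t | x_{t-1})
    return xs, ys

xs, ys = simulate_ssm(100)
```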
Inference for State Space Models (Doucet et al., 2001; Cappé et al., 2007)

- We are interested in the posterior distributions of the unobserved signal:
  - $P(x_{0:t} \mid y_{0:t})$ – fixed interval smoothing distribution
  - $P(x_{t-L} \mid y_{0:t})$ – fixed lag smoothing distribution
  - $P(x_t \mid y_{0:t})$ – filtering distribution
- and expectations under these posteriors, e.g.

$$\mathbb{E}_{P(x_{0:t} \mid y_{0:t})}(h_t) = \int h_t(x_{0:t}) \, P(x_{0:t} \mid y_{0:t}) \, dx_{0:t}$$

for some function $h_t : \mathcal{X}^{t+1} \to \mathbb{R}^{n_{h_t}}$
Couldn't we use MCMC? (Doucet et al., 2001; Holenstein, 2009)

- Sure, generate $N$ samples from $P(x_{0:t} \mid y_{0:t})$ using Metropolis–Hastings (MH)
- Sample a candidate $x'_{0:t}$ from a proposal distribution

$$x'_{0:t} \sim q(x'_{0:t} \mid x_{0:t})$$

- Accept the candidate $x'_{0:t}$ with probability

$$\alpha(x'_{0:t} \mid x_{0:t}) = \min\left[1, \frac{P(x'_{0:t} \mid y_{0:t}) \, q(x_{0:t} \mid x'_{0:t})}{P(x_{0:t} \mid y_{0:t}) \, q(x'_{0:t} \mid x_{0:t})}\right]$$

- Obtain a set of samples $\{x_{0:t}^{(i)}\}_{i=1}^N$
- Calculate empirical estimates of the posterior and the expectation

$$\hat{P}(x_{0:t} \mid y_{0:t}) = \frac{1}{N} \sum_i \delta_{x_{0:t}^{(i)}}(x_{0:t})$$

$$\hat{\mathbb{E}}_{P(x_{0:t} \mid y_{0:t})}(h_t) = \int h_t(x_{0:t}) \, \hat{P}(x_{0:t} \mid y_{0:t}) \, dx_{0:t} = \frac{1}{N} \sum_{i=1}^N h_t(x_{0:t}^{(i)})$$
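The MH recipe above can be sketched for a generic one-dimensional target; the random-walk proposal and the standard-normal target used below are illustrative choices, not from the slides. With a symmetric proposal, the $q$ terms cancel in the acceptance ratio:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk MH: propose x' ~ N(x, step^2); the symmetric proposal
    cancels in the acceptance ratio, leaving min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, step)
        log_alpha = log_target(x_prop) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = x_prop
        samples.append(x)
    return samples

# Empirical estimate of E[x] under a standard normal target
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
mean_est = sum(samples) / len(samples)
```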
Couldn't we use MCMC? (Doucet et al., 2001; Holenstein, 2009)

- Unbiased estimates and, in most cases, nice convergence

$$\hat{\mathbb{E}}_{P(x_{0:t} \mid y_{0:t})}(h_t) \xrightarrow{a.s.} \mathbb{E}_{P(x_{0:t} \mid y_{0:t})}(h_t) \quad \text{as } N \to \infty$$

- Problem solved!?
  - It can be hard to design a good proposal $q$
  - Single-site updates $q(x'_j \mid x_{0:t})$ can lead to slow mixing
  - What happens if we get a new data point $y_{t+1}$?
    - We cannot (directly) reuse the samples $\{x_{0:t}^{(i)}\}$
    - We have to run a new MCMC simulation for $P(x_{0:t+1} \mid y_{0:t+1})$
- MCMC is not well-suited for recursive estimation problems
What about importance sampling? (Doucet et al., 2001)

- Generate $N$ i.i.d. samples $\{x_{0:t}^{(i)}\}_{i=1}^N$ from an arbitrary importance sampling distribution $\pi(x_{0:t} \mid y_{0:t})$
- The empirical estimates are

$$\hat{P}(x_{0:t} \mid y_{0:t}) = \sum_{i=1}^N \tilde{w}_t^{(i)} \, \delta_{x_{0:t}^{(i)}}(x_{0:t})$$

$$\hat{\mathbb{E}}_{P(x_{0:t} \mid y_{0:t})}(h_t) = \sum_{i=1}^N \tilde{w}_t^{(i)} \, h_t(x_{0:t}^{(i)})$$

where the unnormalised and normalised importance weights are

$$w(x_{0:t}) = \frac{P(x_{0:t} \mid y_{0:t})}{\pi(x_{0:t} \mid y_{0:t})} \quad \text{and} \quad \tilde{w}_t^{(i)} = \frac{w(x_{0:t}^{(i)})}{\sum_j w(x_{0:t}^{(j)})}$$
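The self-normalised estimator above can be sketched for a simple one-dimensional case; the $\mathcal{N}(1, 1)$ target and $\mathcal{N}(0, 2^2)$ proposal below are illustrative choices, not from the slides:

```python
import math
import random

def importance_estimate(h, log_target, log_proposal, sample_proposal, n, seed=0):
    """Self-normalised importance sampling: E_p[h] ~ sum_i w~_i h(x_i),
    with w_i = p(x_i)/pi(x_i) handled in log space for stability."""
    rng = random.Random(seed)
    xs = [sample_proposal(rng) for _ in range(n)]
    logw = [log_target(x) - log_proposal(x) for x in xs]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]      # shift cancels in normalisation
    total = sum(w)
    return sum(wi * h(xi) for wi, xi in zip(w, xs)) / total

# Estimate E[x] = 1 under a N(1, 1) target using a N(0, 2^2) proposal
est = importance_estimate(
    h=lambda x: x,
    log_target=lambda x: -0.5 * (x - 1.0) ** 2,
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
    sample_proposal=lambda rng: rng.gauss(0.0, 2.0),
    n=50000,
)
```

Note that only the unnormalised densities are needed, since constant factors cancel in the normalised weights.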
What about importance sampling? (Doucet et al., 2001)

- $\hat{\mathbb{E}}_{P(x_{0:t} \mid y_{0:t})}(h_t)$ is biased, but converges to $\mathbb{E}_{P(x_{0:t} \mid y_{0:t})}(h_t)$
- Problem solved!?
  - Designing a good importance distribution can be hard!
  - Still not adequate for recursive estimation
    - When seeing new data $y_{t+1}$, we cannot reuse the samples and weights for time $t$, $\{x_{0:t}^{(i)}, \tilde{w}_t^{(i)}\}_{i=1}^N$, to sample from $P(x_{0:t+1} \mid y_{0:t+1})$
Sequential importance sampling (Doucet et al., 2001; Cappé et al., 2007)

- Assume that the importance distribution can be factored as

$$\pi(x_{0:t} \mid y_{0:t}) = \underbrace{\pi(x_{0:t-1} \mid y_{0:t-1})}_{\text{importance distribution at time } t-1} \, \underbrace{\pi(x_t \mid x_{0:t-1}, y_{0:t})}_{\text{extension to time } t} = \pi(x_0 \mid y_0) \prod_{k=1}^t \pi(x_k \mid x_{0:k-1}, y_{0:k})$$

- The importance weights can then be evaluated recursively

$$w_t^{(i)} \propto w_{t-1}^{(i)} \, \frac{P(y_t \mid x_t^{(i)}) \, P(x_t^{(i)} \mid x_{t-1}^{(i)})}{\pi(x_t^{(i)} \mid x_{0:t-1}^{(i)}, y_{0:t})} \quad (1)$$

- Given past i.i.d. trajectories $\{x_{0:t-1}^{(i)} \mid i = 1, \ldots, N\}$ we can
  1. simulate $x_t^{(i)} \sim \pi(x_t \mid x_{0:t-1}^{(i)}, y_{0:t})$
  2. update the weight $w_t^{(i)}$ for $x_{0:t}^{(i)}$ based on $w_{t-1}^{(i)}$ using eq. (1)
- Note that the extended trajectories $\{x_{0:t}^{(i)}\}$ remain i.i.d.
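The recursion in eq. (1) is easy to run, but the weights concentrate quickly. The toy run below (a hypothetical random-walk state with observations fixed at zero, using the bootstrap proposal so the incremental weight reduces to the likelihood) tracks the effective sample size $1/\sum_i (\tilde{w}_t^{(i)})^2$ to make the degeneracy visible:

```python
import math
import random

def ess(logw):
    """Effective sample size 1 / sum_i w~_i^2 of the normalised weights."""
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    s = sum(w)
    return 1.0 / sum((wi / s) ** 2 for wi in w)

# Hypothetical model: x_t = x_{t-1} + N(0, 1), y_t = x_t + N(0, 0.5^2),
# with the data fixed at y_t = 0 for illustration. Bootstrap proposal:
# pi(x_t | x_{0:t-1}, y_{0:t}) = P(x_t | x_{t-1}), so eq. (1) multiplies
# the weight by P(y_t | x_t) only.
rng = random.Random(1)
N = 500
particles = [rng.gauss(0.0, 1.0) for _ in range(N)]
logw = [0.0] * N
ess_trace = []
for t in range(20):
    particles = [x + rng.gauss(0.0, 1.0) for x in particles]     # propagate
    logw = [lw - 2.0 * x * x for lw, x in zip(logw, particles)]  # times P(y_t = 0 | x_t)
    ess_trace.append(ess(logw))
# Without resampling, the effective sample size collapses within a few steps.
```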
Sequential importance sampling
[Figure: weighted particles $\{x_{t-1}^{(i)}, w_{t-1}^{(i)}\}$ extended to $\{x_t^{(i)}, w_t^{(i)}\}$; adapted from (Doucet et al., 2001)]

- Problem solved!?
Sequential importance sampling
[Figure: weighted particles $\{x_{t-1}^{(i)}, w_{t-1}^{(i)}\}$, $\{x_t^{(i)}, w_t^{(i)}\}$, $\{x_{t+1}^{(i)}, w_{t+1}^{(i)}\}$ over three consecutive steps; adapted from (Doucet et al., 2001)]

- Weights become highly degenerate after a few steps
Sequential importance resampling (Doucet et al., 2001; Cappé et al., 2007)

- Key idea to eliminate weight degeneracy:
  1. Eliminate particles with low importance weights
  2. Multiply particles with high importance weights
- Introduce a resampling step at each time step (or "occasionally")
- Resample a new set of trajectories $\{x_{0:t}^{\prime(i)} \mid i = 1, \ldots, N\}$
  - Draw $N$ samples from

$$\hat{P}(x_{0:t} \mid y_{0:t}) = \sum_{i=1}^N \tilde{w}_t^{(i)} \, \delta_{x_{0:t}^{(i)}}(x_{0:t})$$

  - The weights of the new samples are $w_t^{\prime(i)} = \frac{1}{N}$
- The new empirical (unweighted) distribution at time step $t$ is

$$\hat{P}'(x_{0:t} \mid y_{0:t}) = \frac{1}{N} \sum_{i=1}^N N_t^{(i)} \, \delta_{x_{0:t}^{(i)}}(x_{0:t})$$

where $N_t^{(i)}$ is the number of copies of $x_{0:t}^{(i)}$.
- $N_t^{(i)}$ is sampled from a multinomial with parameters $\tilde{w}_t^{(i)}$
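Multinomial resampling as described above can be sketched with the standard library; `random.Random.choices` draws each new particle independently in proportion to its weight, so the copy counts follow the multinomial distribution:

```python
import random

def multinomial_resample(particles, weights, rng):
    """Multinomial resampling: draw N independent indices with
    probabilities proportional to the weights; afterwards every
    surviving copy carries the uniform weight 1/N."""
    return rng.choices(particles, weights=weights, k=len(particles))

rng = random.Random(0)
particles = ["a", "b", "c", "d"]
weights = [0.05, 0.05, 0.85, 0.05]   # one dominant particle
resampled = multinomial_resample(particles, weights, rng)
counts = {p: resampled.count(p) for p in particles}   # "c" is typically multiplied
```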
Sequential importance resampling (Doucet et al., 2001; Cappé et al., 2007)

1: for $i = 1, \ldots, N$ do
2:   Sample $x_0^{(i)} \sim \pi(x_0 \mid y_0)$
3:   $w_0^{(i)} \leftarrow \dfrac{P(y_0 \mid x_0^{(i)}) \, P(x_0^{(i)})}{\pi(x_0^{(i)} \mid y_0)}$
4: for $t = 1, \ldots, T$ do
     (Importance sampling step)
5:   for $i = 1, \ldots, N$ do
6:     Sample $x_t^{(i)} \sim \pi(x_t \mid x_{0:t-1}^{(i)}, y_{0:t})$
7:     $x_{0:t}^{(i)} \leftarrow (x_{0:t-1}^{(i)}, x_t^{(i)})$
8:     $w_t^{(i)} \leftarrow w_{t-1}^{(i)} \, \dfrac{P(y_t \mid x_t^{(i)}) \, P(x_t^{(i)} \mid x_{t-1}^{(i)})}{\pi(x_t^{(i)} \mid x_{0:t-1}^{(i)}, y_{0:t})}$
     (Resampling/selection step)
9:   Sample $N$ particles $\{x_{0:t}^{(i)}\}$ from $\{x_{0:t}^{(i)}\}$ according to $\{w_t^{(i)}\}$
10:  $w_t^{(i)} \leftarrow \frac{1}{N}$ for $i = 1, \ldots, N$
11: return $\{x_{0:t}^{(i)}\}_{i=1}^N$
Sequential importance resampling
[Figure: one cycle of the algorithm — the weighted particles $\{x_{t-1}^{(i)}, \tilde{w}_{t-1}^{(i)}\}$ are resampled to $\{x_{t-1}^{(i)}, N^{-1}\}$, then propagated and reweighted (importance sampling) to $\{x_t^{(i)}, \tilde{w}_t^{(i)}\}$; $i = 1, \ldots, N$ with $N = 10$, figure modified from (Doucet et al., 2001)]
Example - A dynamical system

$$x_t = \frac{1}{2} x_{t-1} + 25 \, \frac{x_{t-1}}{1 + x_{t-1}^2} + 8 \cos(1.2 t) + u_t$$

$$y_t = \frac{x_t^2}{20} + v_t$$

where $x_0 \sim \mathcal{N}(0, \sigma_0^2)$, $u_t \sim \mathcal{N}(0, \sigma_u^2)$, $v_t \sim \mathcal{N}(0, \sigma_v^2)$, with $\sigma_0^2 = \sigma_u^2 = 10$ and $\sigma_v^2 = 1$.
[Figure: a simulated state sequence $x_t$ and the corresponding observations $y_t$ for $t = 0, \ldots, 100$.]
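The SIR algorithm with the bootstrap proposal can be applied directly to this benchmark model. The sketch below simulates data from the model and runs a bootstrap particle filter on it; the particle count, horizon and seeds are arbitrary choices for illustration:

```python
import math
import random

def bootstrap_filter(ys, N=2000, seed=0):
    """Bootstrap (SIR) particle filter for the benchmark model above:
    propagate with the transition prior, weight by the likelihood
    P(y_t | x_t) with sigma_v^2 = 1, record the posterior mean, resample."""
    rng = random.Random(seed)
    su = math.sqrt(10.0)                                         # sigma_u
    xs = [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(N)]     # x_0 ~ N(0, sigma_0^2)
    means = []
    for t, y in enumerate(ys, start=1):
        xs = [0.5 * x + 25.0 * x / (1.0 + x * x) + 8.0 * math.cos(1.2 * t)
              + rng.gauss(0.0, su) for x in xs]                  # bootstrap proposal
        logw = [-0.5 * (y - x * x / 20.0) ** 2 for x in xs]      # log P(y_t | x_t)
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        s = sum(w)
        means.append(sum(wi * xi for wi, xi in zip(w, xs)) / s)  # filtering mean
        xs = rng.choices(xs, weights=w, k=N)                     # multinomial resampling
    return means

# Simulate T = 50 observations from the same model, then filter them
rng = random.Random(42)
x = rng.gauss(0.0, math.sqrt(10.0))
ys = []
for t in range(1, 51):
    x = 0.5 * x + 25.0 * x / (1.0 + x * x) + 8.0 * math.cos(1.2 * t) + rng.gauss(0.0, math.sqrt(10.0))
    ys.append(x * x / 20.0 + rng.gauss(0.0, 1.0))
filter_means = bootstrap_filter(ys)
```

Since $y_t$ depends on $x_t^2$, the filtering posterior is often bimodal in the sign of $x_t$, so the posterior mean should be read with that caveat in mind.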
Posterior distribution of states
[Figure: filtering distributions $p(x_t \mid y_{1:t})$ for $t = 1, \ldots, 100$, shown as a density over $x_t \in [-25, 25]$.]
Posterior distribution of states
[Figure: filtering distributions $p(x_t \mid y_{1:t})$ for $t = 1, \ldots, 100$, shown as a density over $x_t \in [-25, 25]$.]
Posterior distribution of states
[Figure: 3D view of the filtering distributions $p(x_t \mid y_{1:t})$ over $t = 0, \ldots, 100$ and $x_t \in [-30, 30]$.]
Proposal

- The bootstrap filter uses $\pi_t(x_t \mid x_{t-1}, y_t) = P(x_t \mid x_{t-1})$, which leads to a simple form for the importance weight update: $w_t^{(i)} \propto w_{t-1}^{(i)} \, P(y_t \mid x_t^{(i)})$
  - The weight update depends on the new proposed state and the observation!
  - Uninformative observations can lead to poor performance
- Optimal proposal: $\pi_t(x_t \mid x_{t-1}, y_t) = P(x_t \mid x_{t-1}, y_t)$, therefore

$$w_t^{(i)} \propto w_{t-1}^{(i)} \, P(y_t \mid x_{t-1}^{(i)}) = w_{t-1}^{(i)} \int P(y_t \mid x_t) \, P(x_t \mid x_{t-1}^{(i)}) \, dx_t$$

  - The weight update depends on the previous state and the observation
  - The integral is analytically intractable in general, so one needs to resort to approximation techniques.
Smoothing

- For a complex SSM, the posterior distribution of the state variables can be smoothed by including future observations.
- The joint smoothing distribution can be factorised:

$$P(x_{0:T} \mid y_{0:T}) = P(x_T \mid y_{0:T}) \prod_{t=0}^{T-1} P(x_t \mid x_{t+1}, y_{0:t}) \propto P(x_T \mid y_{0:T}) \prod_{t=0}^{T-1} \underbrace{P(x_t \mid y_{0:t})}_{\text{filtering distribution}} \, \underbrace{P(x_{t+1} \mid x_t)}_{\text{likelihood of future state}}$$

Hence, the weight update:

$$\tilde{w}_t^{(i)}(x_{t+1}) = w_t^{(i)} \, P(x_{t+1} \mid x_t^{(i)})$$
Particle smoother

Algorithm:
- Run a forward simulation to obtain the particle paths $\{x_t^{(i)}, w_t^{(i)}\}_{i=1,\ldots,N;\, t=1,\ldots,T}$
- Draw $\tilde{x}_T$ from $\hat{P}(x_T \mid y_{0:T})$
- Repeat for $t = T-1, \ldots, 0$:
  - Adjust and normalise the filtering weights $w_t^{(i)}$:

$$\tilde{w}_t^{(i)} = w_t^{(i)} \, P(\tilde{x}_{t+1} \mid x_t^{(i)})$$

  - Draw a random sample $\tilde{x}_t$ from the reweighted particle approximation

The sequence $(\tilde{x}_0, \tilde{x}_1, \ldots, \tilde{x}_T)$ is a random draw from the approximate distribution $\hat{P}(x_{0:T} \mid y_{0:T})$ [cost $O(NT)$ per draw]
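One backward draw of this smoother can be sketched as follows, given the stored forward particles and weights. The Gaussian random-walk transition in the tiny example is an illustrative choice, not from the slides:

```python
import math
import random

def backward_sample(xs, ws, trans_logpdf, rng):
    """Draw one smoothed trajectory: sample x~_T from the final filtering
    weights, then for t = T-1, ..., 0 reweight slice t by P(x~_{t+1} | x_t)
    and sample x~_t from the adjusted weights."""
    T = len(xs) - 1
    traj = [0.0] * (T + 1)
    traj[T] = rng.choices(xs[T], weights=ws[T], k=1)[0]
    for t in range(T - 1, -1, -1):
        adj = [w * math.exp(trans_logpdf(traj[t + 1], x)) for w, x in zip(ws[t], xs[t])]
        traj[t] = rng.choices(xs[t], weights=adj, k=1)[0]
    return traj

# Tiny example: 2 particles, 2 time steps, random-walk transition N(x_t, 1).
# The backward reweighting makes the particle near the sampled successor win.
trans = lambda x_next, x: -0.5 * (x_next - x) ** 2
xs = [[0.0, 10.0], [0.1, 9.9]]
ws = [[0.5, 0.5], [1.0, 0.0]]
traj = backward_sample(xs, ws, trans, random.Random(0))
```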
MAP estimation

- Maximum a posteriori (MAP) estimate:

$$\arg\max_{x_{0:T}} P(x_{0:T} \mid y_{0:T}) = \arg\max_{x_{0:T}} P(x_0) \prod_{t=1}^T P(x_t \mid x_{t-1}) \prod_{t=0}^T P(y_t \mid x_t)$$

- Question: can we just choose the particle trajectory with the largest weight? Answer: NO!
- Assume a discrete particle grid, $x_t \in \{x_t^{(i)}\}_{1 \le i \le N}$; the approximation can then be interpreted as a hidden Markov model with $N$ states, and the MAP estimate can be found using the Viterbi algorithm:
  - Keep track of the probability of the most likely path so far
  - Keep track of the last state index of the most likely path so far
Viterbi algorithm for MAP estimation
[Figure: a two-particle trellis over three time steps, annotated with transition and emission probabilities, illustrating the most likely path.]

Path probability update:

$$\alpha_t^{(j)} = \max_i \left[ \alpha_{t-1}^{(i)} \, P(x_t^{(j)} \mid x_{t-1}^{(i)}) \right] P(y_t \mid x_t^{(j)})$$
Parameter estimation

- Consider SSMs with $P(x_t \mid x_{t-1}, \theta)$ and $P(y_t \mid x_t, \theta)$, where $\theta$ is a static parameter vector that one wishes to estimate.
- Marginal likelihood:

$$l(y_{0:T} \mid \theta) = \int p(y_{0:T}, x_{0:T} \mid \theta) \, dx_{0:T}$$

- Optimise $l(y_{0:T} \mid \theta)$ using the EM algorithm:
  - E-step:

$$\tau(\theta, \theta_k) = \sum_{i=1}^N w_T^{(i, \theta_k)} \sum_{t=0}^{T-1} s_{t,\theta}(x_t^{(i, \theta_k)}, x_{t+1}^{(i, \theta_k)})$$

where

$$s_{t,\theta}(x_t, x_{t+1}) = \log P(x_{t+1} \mid x_t, \theta) + \log P(y_{t+1} \mid x_{t+1}, \theta)$$

  - M-step: optimise $\tau(\theta, \theta_k)$ over $\theta$ to update $\theta_k$.
A generic SMC algorithm (Holenstein, 2009)

- We want to sample from a target distribution $\pi(x)$, $x \in \mathcal{X}^p$
- Assume we have a sequence of bridging distributions of increasing dimension

$$\{\pi_n(x_{1:n})\}_{n=1}^p = \{\pi_1(x_1), \pi_2(x_{1:2}), \ldots, \pi_p(x_{1:p})\}$$

where
  - $\pi_n(x_{1:n}) = Z_n^{-1} \gamma_n(x_{1:n})$
  - $\pi_p(x_{1:p}) = \pi(x)$
- A sequence of importance densities on $\mathcal{X}$: $M_1(x_1)$ for the initial sample, and $\{M_n(x_n \mid x_{1:n-1})\}_{n=2}^p$ for extending $x_{1:n-1} \in \mathcal{X}^{n-1}$ by sampling $x_n \in \mathcal{X}$
- A resampling distribution

$$r(A_n \mid w_n), \quad A_n \in \{1, \ldots, N\}^N, \quad w_n \in [0, 1]^N$$

where $A_{n-1}^i$ is the parent at time $n-1$ of particle $X_n^i$
A generic SMC algorithm
From (Holenstein, 2009)
A generic SMC algorithm (Holenstein, 2009)

- Again we can calculate empirical estimates of the target and the normalisation constant ($\pi(x) = Z^{-1} \gamma(x)$)

$$\hat{\pi}^N(x) = \sum_{i=1}^N W_p^{(i)} \, \delta_{x_{1:p}^{(i)}}(x)$$

$$\hat{Z}^N = \prod_{n=1}^p \left( \frac{1}{N} \sum_{i=1}^N w_n(X_n^{(i)}) \right)$$

- Convergence can be shown under weak assumptions

$$\hat{\pi}^N(x) \xrightarrow{a.s.} \pi(x) \quad \text{and} \quad \hat{Z}^N \xrightarrow{a.s.} Z \quad \text{as } N \to \infty$$

- The SIR algorithm for state space models is a special case of this generic SMC algorithm
Motivation for Particle MCMC (Holenstein & Doucet, 2007; Holenstein, 2009)

- Let's return to the problem of sampling from a target

$$\pi(x), \quad \text{where } x = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n,$$

using MCMC
- Single-site proposals $q(x'_j \mid x)$
  - Easy to design
  - Often lead to slow mixing
- It would be more efficient if we could update larger blocks
  - Such proposals are harder to construct
- We could use SMC as a proposal distribution!
Particle Metropolis Hastings Sampler
From (Holenstein, 2009)
Particle Metropolis Hastings (PMH) Sampler (Holenstein, 2009)

- Standard independent MH algorithm ($q(x' \mid x) = q(x')$)
- Target $\pi^N$ and proposal $q^N$ defined on an extended space, such that

$$\frac{\pi^N(\cdot)}{q^N(\cdot)} = \frac{\hat{Z}^N}{Z}$$

which leads to the acceptance ratio

$$\alpha = \min\left[1, \frac{\hat{Z}^{N,*}}{\hat{Z}^N(i-1)}\right]$$

- Note that $\alpha \to 1$ as $N \to \infty$, since $\hat{Z}^N \to Z$ as $N \to \infty$
Particle Gibbs (PG) Sampler (Holenstein, 2009)

- Assume that we are interested in sampling from

$$\pi(\theta, x) = \frac{\gamma(\theta, x)}{Z}$$

- Assume that sampling from
  - $\pi(\theta \mid x)$ is easy
  - $\pi(x \mid \theta)$ is hard
- The PG sampler uses SMC to sample from $\pi(x \mid \theta)$:
  1. Sample $\theta(i) \sim \pi(\theta \mid x(i-1))$
  2. Sample $x(i) \sim \hat{\pi}^N(x \mid \theta(i))$
- What if sampling from $\pi(\theta \mid x)$ is not easy?
  - We can use an MH update for $\theta$
Parameter estimation in a state space model using PG (Andrieu et al., 2010)

- (Re)consider the non-linear state space model

$$x_t = \frac{1}{2} x_{t-1} + 25 \, \frac{x_{t-1}}{1 + x_{t-1}^2} + 8 \cos(1.2 t) + V_t$$

$$y_t = \frac{x_t^2}{20} + W_t$$

where $x_0 \sim \mathcal{N}(0, \sigma_0^2)$, $V_t \sim \mathcal{N}(0, \sigma_V^2)$ and $W_t \sim \mathcal{N}(0, \sigma_W^2)$

- Assume that $\theta = (\sigma_V^2, \sigma_W^2)$ is unknown
- Simulate $y_{1:T}$ for $T = 500$, $\sigma_0^2 = 5$, $\sigma_V^2 = 10$ and $\sigma_W^2 = 1$
- Sample from $P(\theta, x_{1:T} \mid y_{1:T})$ using
  - a Particle Gibbs sampler, with importance distribution $f_\theta(x_n \mid x_{n-1})$
  - a one-at-a-time MH sampler, with proposal $f_\theta(x_n \mid x_{n-1})$
- The algorithms ran for 50,000 iterations (burn-in of 10,000)
- Vague inverse-Gamma priors for $\theta = (\sigma_V^2, \sigma_W^2)$
Parameter estimation in a state space model using PG (Andrieu et al., 2010)
One-at-a-time Metropolis Hastings
Particle Gibbs Sampler
Particle learning for GP regression – motivation
Training a GP on data $\mathcal{D}_{1:n} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ and making predictions requires:

$$P(y^* \mid \theta, \mathcal{D}, x^*) \quad (2)$$

or

$$P(y^* \mid \mathcal{D}, x^*) = \int P(y^* \mid \theta, \mathcal{D}, x^*) \, P(\theta \mid \mathcal{D}) \, d\theta \quad (3)$$

- Estimate the model hyperparameters $\theta_n$ by maximum likelihood (2), or use sampling to find the posterior distribution (3)
- Find the inverse of the covariance matrix, $K_n^{-1}$
- Computational cost: $O(n^3)$.
Sequential update
Given a new observation pair $(x_{n+1}, y_{n+1})$ that we want to add to the training set, we need to find $K_{n+1}^{-1}$ and re-estimate the hyperparameters $\theta_{n+1}$.

- A naive implementation costs $O(n^3)$
- We need an efficient approach that exploits the sequential nature of the data.
Particle learning for GP regression (Gramacy & Polson, 2011; Wilkinson, 2014)

Sufficient information for each particle: $S_n^{(i)} = \{K_n^{(i)}, \theta_n^{(i)}\}$

Two-step update based on:

$$P(S_{n+1} \mid \mathcal{D}_{1:n+1}) = \int P(S_{n+1} \mid S_n, \mathcal{D}_{n+1}) \, P(S_n \mid \mathcal{D}_{1:n+1}) \, dS_n \propto \int P(S_{n+1} \mid S_n, \mathcal{D}_{n+1}) \, P(\mathcal{D}_{n+1} \mid S_n) \, P(S_n \mid \mathcal{D}_{1:n}) \, dS_n$$

1. Resample the indices $\{i\}_{i=1}^N$ with replacement to obtain new indices $\{\zeta(i)\}_{i=1}^N$ according to the weights

$$w_i \propto P(\mathcal{D}_{n+1} \mid S_n^{(i)}) = P(y_{n+1} \mid x_{n+1}, \mathcal{D}_n, \theta_n^{(i)})$$

2. Propagate the sufficient information from $S_n$ to $S_{n+1}$:

$$S_{n+1}^{(i)} \sim P(S_{n+1} \mid S_n^{\zeta(i)}, \mathcal{D}_{1:n+1})$$
Propagation

- The parameters $\theta_n$ are static and can be deterministically copied from $S_n^{\zeta(i)}$ to $S_{n+1}^{(i)}$.
- Covariance matrix: a rank-one update builds $K_{n+1}^{-1}$ from $K_n^{-1}$:

$$K_{n+1} = \begin{bmatrix} K_n & k(x_{n+1}) \\ k^\top(x_{n+1}) & k(x_{n+1}, x_{n+1}) \end{bmatrix}$$

then

$$K_{n+1}^{-1} = \begin{bmatrix} K_n^{-1} + g_n(x_{n+1}) \, g_n^\top(x_{n+1}) / \mu_n(x_{n+1}) & g_n(x_{n+1}) \\ g_n^\top(x_{n+1}) & \mu_n(x_{n+1}) \end{bmatrix}$$

where

$$g_n(x) = -\mu_n(x) \, K_n^{-1} k(x)$$
$$\mu_n(x) = \left[ k(x, x) - k^\top(x) K_n^{-1} k(x) \right]^{-1}$$

- Use a Cholesky update for numerical stability
- Cost: $O(n^2)$
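The block-inverse update above can be sketched in plain Python; this is a minimal illustration on lists of lists (a real implementation would use a Cholesky update for stability, as the slide notes):

```python
def rank_one_inverse_update(Kinv, k_new, k_self):
    """Build K_{n+1}^{-1} from K_n^{-1}: with mu = 1/(k(x,x) - k^T K^{-1} k)
    and g = -mu * K^{-1} k, the new inverse is
    [[K^{-1} + g g^T / mu, g], [g^T, mu]]. Cost O(n^2)."""
    n = len(Kinv)
    Kk = [sum(Kinv[i][j] * k_new[j] for j in range(n)) for i in range(n)]  # K^{-1} k
    mu = 1.0 / (k_self - sum(k_new[i] * Kk[i] for i in range(n)))
    g = [-mu * v for v in Kk]
    top = [[Kinv[i][j] + g[i] * g[j] / mu for j in range(n)] + [g[i]] for i in range(n)]
    return top + [g + [mu]]

# Extend K_1 = [[2]] with k(x_2) = [1] and k(x_2, x_2) = 3:
# K_2 = [[2, 1], [1, 3]], whose exact inverse is [[0.6, -0.2], [-0.2, 0.4]]
K2inv = rank_one_inverse_update([[0.5]], [1.0], 3.0)
```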
Illustrative result 1 - Prediction
From (Gramacy & Polson, 2011)
Illustrative result 2 - Particle locations
From (Gramacy & Polson, 2011)
SMC for learning GP models
Advantages:
- Fast for sequential learning problems

Disadvantages:
- Particle degeneracy/depletion
  - Use an MCMC sampler to augment the propagation step and rejuvenate the particles after every $m$ sequential updates.
- The predictive distribution given the model hyperparameters needs to be analytically tractable [see the resample step]

A similar treatment for classification can be found in (Gramacy & Polson, 2011).
Summary
- SMC is a powerful method for sampling from distributions with a sequential nature
  - Online learning in state space models
  - Sampling from high-dimensional distributions
  - As a proposal distribution in MCMC
- We presented two concrete examples of using SMC:
  - Particle Gibbs for sampling from the posterior distributions of the parameters in a non-linear state space model
  - Particle learning of the hyperparameters in a GP model
- Thank you for your attention!