i)user.it.uu.se/~thosc112/lecture3.pdf · [email protected], www: user.it.uu.se/ ~ thosc112 2015. 3 ......
TRANSCRIPT
Nonlinear system identification usingsequential Monte Carlo methods
Part 3 – Identification strategy 1 (Marginalization)
Thomas SchonDivision of Systems and ControlDepartment of Information TechnologyUppsala University.
Email: [email protected],www: user.it.uu.se/~thosc112
Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Outline – Part 3
Aim: Show how SMC can be used to implementidentification strategy 1 – Marginalization.
1. Summary of Part 2
2. Where are we?
3. Identification strategy 1 – Marginalization
4. Maximum likelihood via Fisher’s identity
5. MCMC again
6. Particle Metropolis Hastings for Bayesian identification
a) Pseudo-marginal MCMCb) Particle MCMC (PMCMC) idea – “exact approximation”
1 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Summary of Part 2 (I/III)
Algorithm 1 Importance sampler (IS)
1. Sample xi ∼ q(x).2. Compute the weights w(xi) = γ(xi)/q(xi).3. Normalize the weights wi = w(xi)/
∑Nj=1w(xj).
Sampling from a user-chosen proposal distribution q iscorrected for by the weights, which accounts for thediscrepancy between the proposal q and the target π.
2 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Summary of Part 2 (II/III)
Problem formulation: Sample sequentially from a se-quence of target distributions πt(x1:t)t≥1 of increasing di-mension, where
πt(x1:t) =γt(x1:t)
Zt,
such that γt(xt) : Xt → R+ is known pointwise and Zt =∫π(x1:t)dx1:t is computationally challenging.
So far we have seen that this formulation includes nonlinear SSMs,
πt(x1:t) = p(x1:t | y1:t), γt(x1:t) = p(x1:t, y1:t), Zt = p(y1:t),
3 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Summary of Part 2 (III/III)
SMC = resampling + sequential importance sampling
1. Resampling: P(ait = j
)= wjt−1/
∑l w
lt−1.
2. Propagation: xit ∼ fθ(xt |xait1:t−1) and xi1:t = xa
it
1:t−1, xit.
3. Weighting: wit = Wt(xit) = gθ(yt |xt).
The ancestor indices aitNi=1 are very useful auxiliary variables!They make the stochasticity of the resampling step explicit.
4 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Weighting Resampling Propagation Weighting Resampling
Where are we?
Part 1 – Introduce the problem and the strategies (Section 1–4)
a) Strategies for identification of nonlinear SSMs
b) Strategies for state estimation in nonlinear SSMs
Part 2 – Solving nonlinear state estimation using SMC (Section 5)
a) Introduce importance sampling
b) Derive the bootstrap particle filter
Part 3 – Strategy 1 Marginalization (Section 6)
a) Maximum likelihood via Fisher’s identity
b) Particle Metropolis Hastings for Bayesian identification
Part 4 – Strategy 2 Data augmentation (Section 7–8)
a) Expectation Maximization and Gibbs sampling
b) Sampling state trajectories using Markov kernels
c) Some of our current research activities (if there is time)
5 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Where are we?
Part 1 – We introduced general strategies for identification ofnonlinear and non-Gaussian SSMs.
Part 2 – We saw how SMC can be used to approximate:
• filtering distribution p(xt | y1:t).
• smoothing distributions, e.g. p(xt | y1:T ), p(x1:T | y1:T ).
• likelihood pθ(y1:T ).
• etc.
It is time to put the two together, i.e. use SMC toimplement the strategies for identification in nonlinear
and non-Gaussian SSMs.
6 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Identification strategy – marginalization
Marginalization (integrate out a variable)
p(a) =
∫p(a, b)db
Deal with the states x1:T by integrating them out and view θ asthe only unknown quantity of interest,
pθ(y1:T ) =
∫pθ(x1:T , y1:T )dx1:T =
T∏
t=1
∫pθ(yt, xt | y1:t−1)dxt
We are averaging pθ(x1:T , y1:T ) over all possible state sequences.
Frequentistic formulation: prediction error method (PEM) anddirect maximization of the likelihood.
Bayesian formulation: Metropolis Hastings algorithm.7 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Marginalization 1 – Direct optimization
1. Direct optimization work directly with the optimizationproblem
θML = arg maxθ∈Θ
T∏
t=1
∫gθ(yt |xt)pθ(xt | y1:t−1)dxt.
Cannot be solved in closed form, use iterative numerical methods
θk+1 = θk + αksk.
The search direction is typically computed according to
sk = Hkgk, gk = ∇θpθ(y1:T )∣∣θ=θk
.
SMC used to approximate the cost function and its derivative(s).
8 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
MCMC – again and a bit more details
Bayesian formulation – model the unknown parameters as arandom variable θ ∼ p(θ) and compute the posterior distribution
p(θ | y1:T ) =p(y1:T | θ)p(θ)
p(y1:T )=pθ(y1:T )p(θ)
p(y1:T ).
MCMC offers a constructive way of doing this.
9 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Micro: MCMC – AR(1) example (I/II)
Let us play the game where I ask you to generate samples from
πs(z) = N(z∣∣ 0, 1/(1− 0.82)
).
One realisation from z[t+ 1] = 0.8z[t] + v[t] where v[t] ∼ N (0, 1).Initialise in z[0] = −40.
0 100 200 300 400 500−40
−35
−30
−25
−20
−15
−10
−5
0
5
10
Time
x
This will eventuallygenerate samples from thefollowing stationarydistribution:
πs(z) = N(z
∣∣∣∣ 0,1
1− 0.82
)
as t→∞.
10 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Micro: MCMC – AR(1) example (II/II)
−8 −6 −4 −2 0 2 4 6 80
0.05
0.1
0.15
0.2
0.25
0.3
1 000 samples
−8 −6 −4 −2 0 2 4 6 80
0.05
0.1
0.15
0.2
0.25
100 000 samples
The true stationary distribution is showed in black and theempirical histogram obtained by simulating the Markov chainz[t+ 1] = 0.8z[t] + v[t] is plotted in gray.
The initial 1 000 samples are discarded (burn-in).
11 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
MCMC again
The idea behind MCMC is to simulate a Markov chain θ[m]m≥1.
The chain is constructed such that its stationary distributioncoincides with the target distribution, here p(θ | y1:T ).
If the chain is ergodic the sample path from the chain can beused to approximate expectations w.r.t. to the target distribution:
1
M − k + 1
M∑
m=k
ϕ(θ[m])a.s.−→
∫ϕ(θ)p(θ | y1:T )dθ,
as M →∞.
Need constructive strategies for constructing the Markov chainwith the desired properties.
12 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Outline – Part 3
Aim: Show how SMC can be used to implementidentification strategy 1 – marginalization.
1. Summary of Part 2
2. Where are we?
3. Identification strategy: Marginalization
4. Maximum likelihood via Fisher’s identity
5. MCMC again
6. Particle Metropolis Hastings for Bayesian identification
a) Pseudo-marginal MCMCb) Particle MCMC (PMCMC) idea – “exact approximation”
13 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Using unbiased likelihood within MH
Fact (non-trivial): SMC produce an unbiased estimate of thelikelihood,
pθ(y1:T ) = pθ(y1)T∏
t=2
pθ(yt | y1:t−1) =T∏
t=1
(1
N
N∑
i=1
wit
).
Intuitive idea: What about using this estimate within MH?!
14 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
The extended target distribution
Introduce an auxiliary variable
u = (x1:T ,a2:T ), u ∼ ψθ(u | y1:T ).
Note that,
p(θ, u | y1:T ) = ψθ(u | y1:T )p(θ | y1:T ) =pθ(y1:T )ψθ(u | y1:T )p(θ)
p(y1:T )
has the original target distribution p(θ | y1:T ) as a marginal.Non-trivial construction: Consider the following extendedtarget distribution
φ(θ, u | y1:T ) =pθ,u(y1:T )ψθ(u | y1:T )p(θ)
p(y1:T ),
defined on Θ× XNT × 1, . . . , NN(T−1).
15 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
MH construction
We can now set up a standard MH algorithm that operates in thenon-standard extended space Θ× XNT × 1, . . . , NN(T−1).
The resulting algorithm will generate samples asymptotically fromp(θ | y1:T ) despite the fact that we employ an approximatelikelihood!
To understand this let us marginalize φ(θ, u | y1:T ) w.r.t. theauxiliary variables u.
16 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Marginalization
Marginalize (recall strategy) out the auxiliary variables u
∫φ(θ, u | y1:T )du =
p(θ)
p(y1:T )
∫pθ,u(y1:T )ψθ(u | y1:T )du.
What can we do about the integral?
SMC produce an unbiased estimate of pθ,u(y1:T )
Eu | θ [pθ,u(y1:T )] =
∫pθ,u(y1:T )ψθ(u | y1:T )du = pθ(y1:T ),
Result: p(θ | y1:T ) is recovered exactly as the marginal of theextended target distribution φ(θ, u | y1:T ), despite the fact that weemploy an SMC approximation of the likelihood using a finitenumber of particles N .
17 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Particle Metropolis Hastings (PMH)
An interpretation is that using the likelihood estimate from the PFdoes change the marginal w.r.t. u, but it does not change themarginal w.r.t. θ.
This explains the slightly awkward name exact approximation forthese algorithms.
18 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Particle Metropolis Hastings (PMH)
Based on the current sample (θ[m], u[m]) a new sample (θ′, u′) isproposed according to
θ′ ∼ q(· | θ[m], u[m]), u′ ∼ ψθ′(· | y1:T ).
The probability of accepting this sample is given by
α = min
(1,
pθ′,u′(y1:T )p(θ′)pθ[m],u[m](y1:T )p(θ[m])
q(θ[m] | θ′, u′)q(θ′ | θ[m], u[m])
).
Note: Importantly, α does not require evaluation of ψθ′(u | y1:T )!
Originally appeared in (different derivation)Christophe Andrieu, Arnaud Doucet and Roman Holenstein, Particle Markov chain Monte Carlo methods, Journalof the Royal Statistical Society: Series B, 72:269-342, 2010.
and further developed in,Johan Dahlin, Fredrik Lindsten and Thomas B. Schon, Particle Metropolis Hastings using gradient and Hessianinformation, Statistics and Computing, 25(1):81-92, 2015.
19 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Pseudo-marginal algorithms
The derivation we gave for PMH above is based on a neat ideacalled the pseudo-marginal approach.Christophe Andrieu and Gareth O. Roberts, The pseudo-marginal approach for efficient Monte Carlocomputations, The Annals of Statistics, 37(2):697-725, 2009.
We have recently used this construction also to solve the renderingproblem in computer graphics.
Online Submission ID: 45678
Pseudo-marginal Metropolis Light Transport
(a) Pseudo-marginal ERPT using ratio tracking (b) ERPT with ray marching (c) PSSMLT with ratio tracking
Figure 1: Equal time renderings of a refractive glass cube containing isotropic heterogenous participating media and a diffuse torus. a) Ourmethod using ERPT [Cline et al. 2005] with ratio tracking [Novak et al. 2014] to obtain an unbiased estimate of the transmission. b) Sameas a) but with deterministic ray-marching used to evaluate the transmission. c) PSSMLT [Kelemen et al. 2002] with ratio tracking.
1 Introduction1
Accurate and efficient simulation of light transport in heteroge-2
neous participating media, such as smoke, clouds and fire, plays3
a key role in the synthesis of visually interesting renderings for e.g.4
visual effects, computer games and product visualization. Simi-5
lar problems also has important applications in other fields such as6
neutron transport and medical radiation dosimetry. While signifi-7
cant progress has been made towards robust and efficient algorithms8
for rendering heterogeneous participating media, scenes containing9
highly anisotropic phase functions, glossy surfaces and complex10
visibility still remain challenging.11
Modern global illumination rendering is based on Monte Carlo12
methods computing averages of stochastically sampled light paths13
connecting the sensor with light sources in the scene. The efficiency14
of these algorithms is directly dependent on the strategy used in15
the light path sampling. For complex scenes, the Metropolis light16
transport (MLT), [Veach and Guibas 1997], algorithms provide a17
powerful framework for sampling the path space. These, Markov18
chain Monte Carlo (MCMC), methods sample paths according to19
some target distribution, π(x), proportional to their contribution20
to the image. However, in many applications the target distribu-21
tion, π(x), cannot be evaluated exactly. A common scenario is of22
rendering scenes with heterogenous participating media, where, in23
general, computing the transmittance along a path is analytically24
intractable. This limits direct use of MLT, as the algorithm depends25
on Metropolis-Hastings (MH) acceptance probabilities which can-26
not be evaluated in closed form. The same problem occur for other27
rendering methods relying on MCMC sampling, such as Energy28
Redistribution path tracing (ERPT) [Cline et al. 2005]. Previous29
work, [Pauly et al. 2000], proposed to substitute the exact quantity30
of interest, π(x), with a biased approximation, π(x), in the compu-31
tation of the MH probability. However, a direct application of this32
method does not to lead to a consistent estimator.33
Our main contribution is a new sampling strategy for MCMC meth-34
ods, e.g. MLT and ERPT, for rendering of scenes with heterogenous35
participating media. Specifically, we show that a positive and un-36
biased estimator of the target distribution E [π(x)] = π(x) can37
replace the exact quantity to simulate a Markov Chain with a sta-38
tionary distribution that has a marginal that is the exact target dis-39
tribution of interest. This enables us to evaluate the transmittance40
function with recent unbiased estimators, [Novak et al. 2014], lead-41
ing to significantly shorter rendering times. Our approach is based42
on pseudo-marginal MCMC methods developed for Bayesian in-43
ference with intractable likelihoods, [Beaumont 2003; Andrieu and44
Roberts 2009]. Compared to previous work (relying on biased ray-45
marching for evaluating transmittance), our method enables simu-46
lation of longer Markov chains and a better exploration of the path47
space in a given computational budget.48
2 Background and related work49
2.1 Path Integral Formulation50
Using the path integral formulation of light transport [Veach 1997;Pauly et al. 2000; Wenzel 2013] the j-th pixel measurement Ij isexpressed as an integral over the space, Ω, of all valid light transportpaths of all lengths:
Ij =
∫
Ω
f(x)Wjdµ(x), with dµ(x) =k∏
i=0
dµ(xi), (1)
where a valid path of length k ∈ (0,∞), x = x0, ..., xk con-nects a light source to the sensor through a series of scatteringvertices, xi, located on surfaces or in participating media. HereWj = Wj(xk−1 → xk) denotes the pixel sensitivity/filter at thesensor. The path measure, dµ(x), is a product measure over themeasures of the path vertices, where path vertices on surfaces cor-respond to area integration, dµ(xi) = dA(xi), and vertices in amedia correspond to volume integration, dµ(xi) = dV (xi). Themeasurement contribution function, fj(x), is defined by
fj(x) = Le
[k−1∏
i=0
G(xi, xi+1)Tr(xi, xi+1)
][k−1∏
i=1
ρ(xi)
]. (2)
where Le = Le(x0 → x1) is the emitted radiance at the light51
source, ρ(xi) denotes scattering functions defined at path vertices,52
G(xi, xi+1) is the geometry term, and Tr(xi, xi+1) is the transmit-53
tance defined on the segments between path vertices. For a detailed54
description of these terms see e.g. [Wenzel 2013].55
1
We (joint work with Joel Kronander and Jonas Unger) will submitthe corresponding paper on Wednesday.
20 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.
Using SMC within MCMC (PMCMC)
Particle MCMC (PMCMC) is a systematic way of combining SMCand MCMC.
The extended target construction φ(·) is at the core of all PMCMCconstructions.
Intuitively: SMC is used as a high-dimensional proposalmechanism on the space of state trajectories XT .
A bit more precise: Construct a Markov chain with p(θ | y1:T )(or p(θ, x1:T | y1:T )) as its stationary distribution.
Pioneered by the workChristophe Andrieu, Arnaud Doucet and Roman Holenstein, Particle Markov chain Monte Carlo methods, Journalof the Royal Statistical Society: Series B, 72:269-342, 2010.
21 / 21 Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.