
Nonlinear system identification using sequential Monte Carlo methods

Part 3 – Identification strategy 1 (Marginalization)

Thomas Schön, Division of Systems and Control, Department of Information Technology, Uppsala University.

Email: [email protected], www: user.it.uu.se/~thosc112

Identification of nonlinear dynamic systems (guest lectures) Brussels, Belgium, June 8, 2015.

Outline – Part 3

Aim: Show how SMC can be used to implement identification strategy 1 – Marginalization.

1. Summary of Part 2

2. Where are we?

3. Identification strategy 1 – Marginalization

4. Maximum likelihood via Fisher’s identity

5. MCMC again

6. Particle Metropolis Hastings for Bayesian identification

a) Pseudo-marginal MCMC
b) Particle MCMC (PMCMC) idea – “exact approximation”


Summary of Part 2 (I/III)

Algorithm 1 Importance sampler (IS)

1. Sample $x^i \sim q(x)$.
2. Compute the weights $w(x^i) = \gamma(x^i)/q(x^i)$.
3. Normalize the weights $w^i = w(x^i) / \sum_{j=1}^{N} w(x^j)$.

Sampling from a user-chosen proposal distribution $q$ is corrected for by the weights, which account for the discrepancy between the proposal $q$ and the target $\pi$.
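The three steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unnormalized target (a Gaussian centred at 1), the Gaussian proposal with standard deviation 2, and all function names are hypothetical choices that do not come from the lecture. The weights are computed on the log scale for numerical stability.

```python
import math
import random

def importance_sampler(log_gamma, sample_q, log_q, n, rng):
    """Self-normalized importance sampler following the three steps of Algorithm 1."""
    xs = [sample_q(rng) for _ in range(n)]             # 1. sample x^i ~ q(x)
    log_w = [log_gamma(x) - log_q(x) for x in xs]      # 2. w(x^i) = gamma(x^i) / q(x^i), on log scale
    shift = max(log_w)                                 # subtract the maximum to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return xs, [wi / total for wi in w]                # 3. normalize the weights

# Hypothetical example: unnormalized target gamma(x) = exp(-(x - 1)^2 / 2),
# i.e. pi = N(1, 1), and proposal q = N(0, 2^2).
def log_gamma(x):
    return -0.5 * (x - 1.0) ** 2

def sample_q(rng):
    return rng.gauss(0.0, 2.0)

def log_q(x):
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0 * math.sqrt(2.0 * math.pi))

rng = random.Random(0)
xs, ws = importance_sampler(log_gamma, sample_q, log_q, 20000, rng)
mean_estimate = sum(w * x for w, x in zip(ws, xs))     # estimates E_pi[x], which is 1 here
print(mean_estimate)
```

Note that only $\gamma$ is evaluated, never the normalizing constant, which is what makes the sampler usable when $\pi$ is known only up to proportionality.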


Summary of Part 2 (II/III)

Problem formulation: Sample sequentially from a sequence of target distributions $\{\pi_t(x_{1:t})\}_{t \geq 1}$ of increasing dimension, where

$$\pi_t(x_{1:t}) = \frac{\gamma_t(x_{1:t})}{Z_t},$$

such that $\gamma_t : \mathsf{X}^t \to \mathbb{R}_+$ is known pointwise and $Z_t = \int \gamma_t(x_{1:t})\, dx_{1:t}$ is computationally challenging.

So far we have seen that this formulation includes nonlinear SSMs,

$$\pi_t(x_{1:t}) = p(x_{1:t} \mid y_{1:t}), \qquad \gamma_t(x_{1:t}) = p(x_{1:t}, y_{1:t}), \qquad Z_t = p(y_{1:t}).$$


Summary of Part 2 (III/III)

SMC = resampling + sequential importance sampling

1. Resampling: $\mathbb{P}(a_t^i = j) = w_{t-1}^j / \sum_l w_{t-1}^l$.
2. Propagation: $x_t^i \sim f_\theta(x_t \mid x_{1:t-1}^{a_t^i})$ and $x_{1:t}^i = \{x_{1:t-1}^{a_t^i}, x_t^i\}$.
3. Weighting: $w_t^i = W_t(x_t^i) = g_\theta(y_t \mid x_t^i)$.

The ancestor indices $\{a_t^i\}_{i=1}^{N}$ are very useful auxiliary variables! They make the stochasticity of the resampling step explicit.
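The resampling, propagation and weighting steps can be sketched as a bootstrap particle filter. The sketch assumes a scalar linear Gaussian model, $x_t = 0.8 x_{t-1} + v_t$, $y_t = x_t + e_t$ with unit-variance noises and $x_1 \sim \mathcal{N}(0, 1)$; this model, the simulated data and the function names are illustrative assumptions, not taken from the lecture. The ancestor indices are stored explicitly to show where the randomness of the resampling step enters.

```python
import math
import random

def bootstrap_particle_filter(y, n, rng, phi=0.8):
    """Bootstrap PF for the assumed model
    x_t = phi * x_{t-1} + v_t,  y_t = x_t + e_t,  v_t, e_t ~ N(0, 1),  x_1 ~ N(0, 1)."""
    def g(y_t, x_t):
        # Observation density g_theta(y_t | x_t) for unit-variance Gaussian noise.
        return math.exp(-0.5 * (y_t - x_t) ** 2) / math.sqrt(2.0 * math.pi)

    x = [rng.gauss(0.0, 1.0) for _ in range(n)]           # initial particles
    w = [g(y[0], xi) for xi in x]                          # initial weights
    means = [sum(wi * xi for wi, xi in zip(w, x)) / sum(w)]
    ancestors = []
    for t in range(1, len(y)):
        a_t = rng.choices(range(n), weights=w, k=n)        # 1. resampling: P(a_t^i = j) prop. to w_{t-1}^j
        x = [phi * x[j] + rng.gauss(0.0, 1.0) for j in a_t]  # 2. propagation from the chosen ancestors
        w = [g(y[t], xi) for xi in x]                      # 3. weighting with g_theta(y_t | x_t^i)
        ancestors.append(a_t)
        means.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
    return means, ancestors

# Simulate hypothetical data from the same model and run the filter.
rng = random.Random(0)
T, N = 50, 500
x_true, y = [], []
x_t = rng.gauss(0.0, 1.0)
for _ in range(T):
    x_true.append(x_t)
    y.append(x_t + rng.gauss(0.0, 1.0))
    x_t = 0.8 * x_t + rng.gauss(0.0, 1.0)

means, ancestors = bootstrap_particle_filter(y, N, rng)
rmse = math.sqrt(sum((m - xt) ** 2 for m, xt in zip(means, x_true)) / T)
print(rmse)
```

Multinomial resampling via `random.choices` is used here because it is the simplest option; other resampling schemes would fit the same structure.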


[Figure: illustration of the particle filter cycling through weighting, resampling, propagation, weighting, resampling.]

Where are we?

Part 1 – Introduce the problem and the strategies (Section 1–4)

a) Strategies for identification of nonlinear SSMs

b) Strategies for state estimation in nonlinear SSMs

Part 2 – Solving nonlinear state estimation using SMC (Section 5)

a) Introduce importance sampling

b) Derive the bootstrap particle filter

Part 3 – Strategy 1 Marginalization (Section 6)

a) Maximum likelihood via Fisher’s identity

b) Particle Metropolis Hastings for Bayesian identification

Part 4 – Strategy 2 Data augmentation (Section 7–8)

a) Expectation Maximization and Gibbs sampling

b) Sampling state trajectories using Markov kernels

c) Some of our current research activities (if there is time)


Where are we?

Part 1 – We introduced general strategies for identification of nonlinear and non-Gaussian SSMs.

Part 2 – We saw how SMC can be used to approximate:

• filtering distribution $p(x_t \mid y_{1:t})$.

• smoothing distributions, e.g. $p(x_t \mid y_{1:T})$, $p(x_{1:T} \mid y_{1:T})$.

• likelihood $p_\theta(y_{1:T})$.

• etc.

It is time to put the two together, i.e. use SMC to implement the strategies for identification in nonlinear and non-Gaussian SSMs.


Identification strategy – marginalization

Marginalization (integrate out a variable)

$$p(a) = \int p(a, b)\, db$$

Deal with the states $x_{1:T}$ by integrating them out and view $\theta$ as the only unknown quantity of interest,

$$p_\theta(y_{1:T}) = \int p_\theta(x_{1:T}, y_{1:T})\, dx_{1:T} = \prod_{t=1}^{T} \int p_\theta(y_t, x_t \mid y_{1:t-1})\, dx_t.$$

We are averaging $p_\theta(x_{1:T}, y_{1:T})$ over all possible state sequences.

Frequentist formulation: prediction error method (PEM) and direct maximization of the likelihood.

Bayesian formulation: Metropolis Hastings algorithm.
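The product form of the likelihood follows from two standard steps, written out here as a short derivation. It uses the observation density $g_\theta$ from the summary of Part 2 and the convention $p_\theta(y_1 \mid y_{1:0}) = p_\theta(y_1)$; the second line relies on the state space model property that $y_t$ is conditionally independent of $y_{1:t-1}$ given $x_t$.

```latex
\begin{align}
p_\theta(y_{1:T})
  &= \prod_{t=1}^{T} p_\theta(y_t \mid y_{1:t-1}), \\
p_\theta(y_t \mid y_{1:t-1})
  &= \int p_\theta(y_t, x_t \mid y_{1:t-1})\, dx_t
   = \int g_\theta(y_t \mid x_t)\, p_\theta(x_t \mid y_{1:t-1})\, dx_t .
\end{align}
```

Each factor is thus an integral of the observation density against the one-step predictive density of the state, which is exactly the kind of quantity a particle filter approximates.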


Marginalization 1 – Direct optimization

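1. Direct optimization: work directly with the optimization problem

$$\widehat{\theta}^{\mathrm{ML}} = \arg\max_{\theta \in \Theta} \prod_{t=1}^{T} \int g_\theta(y_t \mid x_t)\, p_\theta(x_t \mid y_{1:t-1})\, dx_t.$$

This cannot be solved in closed form, so we use iterative numerical methods

$$\theta_{k+1} = \theta_k + \alpha_k s_k.$$

The search direction is typically computed according to

$$s_k = H_k g_k, \qquad g_k = \nabla_\theta p_\theta(y_{1:T}) \big|_{\theta = \theta_k}.$$

SMC is used to approximate the cost function and its derivative(s).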
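The iteration $\theta_{k+1} = \theta_k + \alpha_k s_k$ with $s_k = H_k g_k$ can be illustrated on a toy problem where everything is available in closed form. The sketch below is an assumption-laden stand-in: it maximizes the log-likelihood (rather than the likelihood itself) of i.i.d. unit-variance Gaussian data with unknown mean, uses the exact gradient where the lecture would use an SMC approximation, and takes $H_k$ to be the inverse of the negative Hessian. The data values and function names are hypothetical.

```python
def iterate(grad, scaling, theta0, step_sizes):
    """Run theta_{k+1} = theta_k + alpha_k * s_k with s_k = H_k * g_k."""
    theta = theta0
    for alpha_k in step_sizes:
        g_k = grad(theta)                  # gradient of the cost at theta_k
        s_k = scaling(theta) * g_k         # search direction s_k = H_k g_k
        theta = theta + alpha_k * s_k
    return theta

# Hypothetical data, modelled as y_i ~ N(theta, 1).
data = [1.2, 0.7, 1.9, 1.4, 0.8]

def grad(theta):
    # Exact gradient of the Gaussian log-likelihood; in the SMC setting this
    # would be replaced by a (noisy) particle approximation.
    return sum(y - theta for y in data)

def scaling(theta):
    # H_k: inverse of the negative Hessian of the log-likelihood, constant here.
    return 1.0 / len(data)

theta_hat = iterate(grad, scaling, theta0=0.0, step_sizes=[0.5] * 50)
print(theta_hat)   # approaches the sample mean of the data
```

With noisy SMC estimates of the gradient the step lengths $\alpha_k$ would typically have to decrease over the iterations; the constant step length here only works because the toy gradient is exact.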


MCMC – again and a bit more details

Bayesian formulation – model the unknown parameters as a random variable $\theta \sim p(\theta)$ and compute the posterior distribution

$$p(\theta \mid y_{1:T}) = \frac{p(y_{1:T} \mid \theta)\, p(\theta)}{p(y_{1:T})} = \frac{p_\theta(y_{1:T})\, p(\theta)}{p(y_{1:T})}.$$

MCMC offers a constructive way of doing this.


Micro: MCMC – AR(1) example (I/II)

Let us play the game where I ask you to generate samples from

$$\pi^s(z) = \mathcal{N}\left(z \,\middle|\, 0, \frac{1}{1 - 0.8^2}\right).$$

One realisation from $z[t+1] = 0.8 z[t] + v[t]$ where $v[t] \sim \mathcal{N}(0, 1)$. Initialise in $z[0] = -40$.

[Figure: one realisation of the chain plotted against time (0 to 500); the vertical axis, labelled x, spans −40 to 10.]

This will eventually generate samples from the following stationary distribution:

$$\pi^s(z) = \mathcal{N}\left(z \,\middle|\, 0, \frac{1}{1 - 0.8^2}\right)$$

as $t \to \infty$.


Micro: MCMC – AR(1) example (II/II)

[Figure: two histograms over the range −8 to 8, one based on 1 000 samples and one based on 100 000 samples.]

The true stationary distribution is shown in black and the empirical histogram obtained by simulating the Markov chain $z[t+1] = 0.8 z[t] + v[t]$ is plotted in gray.

The initial 1 000 samples are discarded (burn-in).
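The AR(1) experiment is easy to reproduce. The following sketch simulates the chain from $z[0] = -40$, discards the first 1 000 samples as burn-in and compares the empirical mean and variance of the remaining 100 000 samples with the stationary values $0$ and $1/(1 - 0.8^2) \approx 2.78$. The seed and the function name are arbitrary choices made for this illustration.

```python
import random

def simulate_ar1(n_steps, z0, rng, a=0.8):
    """Simulate z[t+1] = a * z[t] + v[t] with v[t] ~ N(0, 1), starting from z[0] = z0."""
    z, path = z0, []
    for _ in range(n_steps):
        z = a * z + rng.gauss(0.0, 1.0)
        path.append(z)
    return path

rng = random.Random(1)
path = simulate_ar1(101000, -40.0, rng)
samples = path[1000:]                          # discard the initial 1 000 samples (burn-in)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
stationary_var = 1.0 / (1.0 - 0.8 ** 2)        # variance of the stationary distribution

print(mean, var, stationary_var)
```

The poor initialisation at −40 is forgotten quickly because the influence of $z[0]$ decays as $0.8^t$, which is why a short burn-in suffices in this example.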


MCMC again

The idea behind MCMC is to simulate a Markov chain $\{\theta[m]\}_{m \geq 1}$.

The chain is constructed such that its stationary distribution coincides with the target distribution, here $p(\theta \mid y_{1:T})$.

If the chain is ergodic the sample path from the chain can be used to approximate expectations w.r.t. the target distribution:

$$\frac{1}{M - k + 1} \sum_{m=k}^{M} \varphi(\theta[m]) \xrightarrow{\text{a.s.}} \int \varphi(\theta)\, p(\theta \mid y_{1:T})\, d\theta,$$

as $M \to \infty$.

We need constructive strategies for building a Markov chain with the desired properties.
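One such constructive strategy is the Metropolis Hastings algorithm. The sketch below runs a random-walk version on a stand-in target, an unnormalized Gaussian with mean 2 and standard deviation 0.5 that plays the role of $p(\theta \mid y_{1:T})$, and then forms the ergodic average with $\varphi(\theta) = \theta$ after a burn-in. The target, step size, seed and names are assumptions chosen for illustration only.

```python
import math
import random

def metropolis_hastings(log_target, theta0, n_iter, step, rng):
    """Random-walk Metropolis Hastings; the symmetric proposal cancels in the ratio."""
    theta, log_p = theta0, log_target(theta0)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        log_p_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        chain.append(theta)
    return chain

def log_target(theta):
    # Hypothetical unnormalized log target: N(2, 0.5^2) up to a constant.
    return -0.5 * ((theta - 2.0) / 0.5) ** 2

rng = random.Random(3)
chain = metropolis_hastings(log_target, theta0=-5.0, n_iter=20000, step=1.0, rng=rng)

k = 1000                                            # number of burn-in samples to discard
ergodic_average = sum(chain[k:]) / len(chain[k:])   # approximates the target mean
print(ergodic_average)
```

Only ratios of the target appear in the acceptance step, so the normalizing constant $p(y_{1:T})$ of a posterior would never be needed.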


Using unbiased likelihood within MH

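Fact (non-trivial): SMC produces an unbiased estimate of the likelihood,

$$\widehat{p}_\theta(y_{1:T}) = \widehat{p}_\theta(y_1) \prod_{t=2}^{T} \widehat{p}_\theta(y_t \mid y_{1:t-1}) = \prod_{t=1}^{T} \left( \frac{1}{N} \sum_{i=1}^{N} w_t^i \right).$$

Intuitive idea: What about using this estimate within MH?!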
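The unbiasedness can be checked numerically on a model where the exact likelihood is available. The sketch below assumes a scalar linear Gaussian model (parameters `PHI`, `Q`, `R`, `P0` are hypothetical), computes the product of average unnormalized weights from a bootstrap particle filter, and compares the average over many independent runs with the exact likelihood from a Kalman filter. It is a sanity check under these assumptions, not a proof.

```python
import math
import random

# Assumed model: x_t = PHI * x_{t-1} + v_t, y_t = x_t + e_t,
# v_t ~ N(0, Q), e_t ~ N(0, R), x_1 ~ N(0, P0).
PHI, Q, R, P0 = 0.8, 1.0, 1.0, 1.0

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def pf_likelihood(y, n, rng):
    """Likelihood estimate prod_t (1/N) sum_i w_t^i from a bootstrap particle filter."""
    x = [rng.gauss(0.0, math.sqrt(P0)) for _ in range(n)]
    estimate = 1.0
    for t, y_t in enumerate(y):
        if t > 0:
            idx = rng.choices(range(n), weights=w, k=n)                    # resampling
            x = [PHI * x[j] + rng.gauss(0.0, math.sqrt(Q)) for j in idx]   # propagation
        w = [normal_pdf(y_t, xi, R) for xi in x]                           # unnormalized weights
        estimate *= sum(w) / n
    return estimate

def kalman_likelihood(y):
    """Exact p(y_{1:T}) for the same model, used as a reference."""
    m, p, lik = 0.0, P0, 1.0              # predictive mean and variance of x_1
    for y_t in y:
        s = p + R                         # predictive variance of y_t
        lik *= normal_pdf(y_t, m, s)
        gain = p / s
        m_f, p_f = m + gain * (y_t - m), (1.0 - gain) * p   # measurement update
        m, p = PHI * m_f, PHI ** 2 * p_f + Q                # time update
    return lik

y = [0.3, -0.5, 1.1, 0.4]                 # hypothetical observations
rng = random.Random(2)
estimates = [pf_likelihood(y, 200, rng) for _ in range(300)]
average = sum(estimates) / len(estimates)
exact = kalman_likelihood(y)
print(average, exact)
```

Note that unbiasedness holds for the likelihood estimate itself, not for its logarithm, which is why the comparison is made on the original scale.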


The extended target distribution

Introduce an auxiliary variable

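$$u = (x_{1:T}, a_{2:T}), \qquad u \sim \psi_\theta(u \mid y_{1:T}).$$

Note that,

$$p(\theta, u \mid y_{1:T}) = \psi_\theta(u \mid y_{1:T})\, p(\theta \mid y_{1:T}) = \frac{p_\theta(y_{1:T})\, \psi_\theta(u \mid y_{1:T})\, p(\theta)}{p(y_{1:T})}$$

has the original target distribution $p(\theta \mid y_{1:T})$ as a marginal.

Non-trivial construction: Consider the following extended target distribution

$$\phi(\theta, u \mid y_{1:T}) = \frac{\widehat{p}_{\theta,u}(y_{1:T})\, \psi_\theta(u \mid y_{1:T})\, p(\theta)}{p(y_{1:T})},$$

defined on $\Theta \times \mathsf{X}^{NT} \times \{1, \dots, N\}^{N(T-1)}$.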


MH construction

We can now set up a standard MH algorithm that operates in the non-standard extended space $\Theta \times \mathsf{X}^{NT} \times \{1, \dots, N\}^{N(T-1)}$.

The resulting algorithm will generate samples asymptotically from $p(\theta \mid y_{1:T})$ despite the fact that we employ an approximate likelihood!

To understand this let us marginalize $\phi(\theta, u \mid y_{1:T})$ w.r.t. the auxiliary variables $u$.


Marginalization

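Marginalize (recall strategy) out the auxiliary variables $u$

$$\int \phi(\theta, u \mid y_{1:T})\, du = \frac{p(\theta)}{p(y_{1:T})} \int \widehat{p}_{\theta,u}(y_{1:T})\, \psi_\theta(u \mid y_{1:T})\, du.$$

What can we do about the integral?

SMC produces an unbiased estimate $\widehat{p}_{\theta,u}(y_{1:T})$ of the likelihood,

$$\mathbb{E}_{u \mid \theta}\left[\widehat{p}_{\theta,u}(y_{1:T})\right] = \int \widehat{p}_{\theta,u}(y_{1:T})\, \psi_\theta(u \mid y_{1:T})\, du = p_\theta(y_{1:T}).$$

Substituting this into the marginal gives $p_\theta(y_{1:T})\, p(\theta) / p(y_{1:T}) = p(\theta \mid y_{1:T})$.

Result: $p(\theta \mid y_{1:T})$ is recovered exactly as the marginal of the extended target distribution $\phi(\theta, u \mid y_{1:T})$, despite the fact that we employ an SMC approximation of the likelihood using a finite number of particles $N$.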


Particle Metropolis Hastings (PMH)

An interpretation is that using the likelihood estimate from the PF does change the marginal w.r.t. $u$, but it does not change the marginal w.r.t. $\theta$.

This explains the slightly awkward name “exact approximation” for these algorithms.


Particle Metropolis Hastings (PMH)

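Based on the current sample $(\theta[m], u[m])$ a new sample $(\theta', u')$ is proposed according to

$$\theta' \sim q(\cdot \mid \theta[m], u[m]), \qquad u' \sim \psi_{\theta'}(\cdot \mid y_{1:T}).$$

The probability of accepting this sample is given by

$$\alpha = \min\left(1, \frac{\widehat{p}_{\theta',u'}(y_{1:T})\, p(\theta')}{\widehat{p}_{\theta[m],u[m]}(y_{1:T})\, p(\theta[m])} \, \frac{q(\theta[m] \mid \theta', u')}{q(\theta' \mid \theta[m], u[m])}\right).$$

Note: Importantly, $\alpha$ does not require evaluation of $\psi_{\theta'}(u \mid y_{1:T})$!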
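A minimal PMH sketch is given below for an assumed scalar model, $x_t = \theta x_{t-1} + v_t$, $y_t = x_t + e_t$ with unit-variance noises, a uniform prior on $(-1, 1)$ and a Gaussian random-walk proposal. All of these choices, the simulated data and the function names are hypothetical. Drawing $u' \sim \psi_{\theta'}$ corresponds to running the particle filter at $\theta'$; the proposal is symmetric, so the ratio of $q$ terms cancels in $\alpha$.

```python
import math
import random

def pf_log_likelihood(theta, y, n, rng):
    """Log of the bootstrap PF likelihood estimate for the assumed model
    x_t = theta * x_{t-1} + v_t,  y_t = x_t + e_t,  v_t, e_t ~ N(0, 1),  x_1 ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_lik = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            idx = rng.choices(range(n), weights=w, k=n)            # resampling
            x = [theta * x[j] + rng.gauss(0.0, 1.0) for j in idx]  # propagation
        log_w = [-0.5 * (y_t - xi) ** 2 - 0.5 * math.log(2.0 * math.pi) for xi in x]
        c = max(log_w)                                             # shift for numerical stability
        w = [math.exp(lw - c) for lw in log_w]
        log_lik += c + math.log(sum(w) / n)                        # log of (1/N) sum_i w_t^i
    return log_lik

def log_prior(theta):
    # Assumed uniform prior on (-1, 1).
    return 0.0 if -1.0 < theta < 1.0 else -math.inf

def particle_metropolis_hastings(y, n_iter, n_particles, step, rng, theta0=0.0):
    theta = theta0
    log_post = pf_log_likelihood(theta, y, n_particles, rng) + log_prior(theta)
    chain, accepted = [], 0
    for _ in range(n_iter):
        theta_prop = theta + rng.gauss(0.0, step)                  # theta' ~ q(. | theta[m])
        log_post_prop = log_prior(theta_prop)
        if log_post_prop > -math.inf:
            # Running the PF at theta' draws u' and returns the likelihood estimate.
            log_post_prop += pf_log_likelihood(theta_prop, y, n_particles, rng)
        if math.log(1.0 - rng.random()) < log_post_prop - log_post:
            theta, log_post = theta_prop, log_post_prop            # keep the estimate with theta
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter

# Hypothetical data simulated from the model with theta = 0.7.
rng = random.Random(4)
x_t, y = rng.gauss(0.0, 1.0), []
for _ in range(50):
    y.append(x_t + rng.gauss(0.0, 1.0))
    x_t = 0.7 * x_t + rng.gauss(0.0, 1.0)

chain, acceptance_rate = particle_metropolis_hastings(y, n_iter=500, n_particles=50, step=0.1, rng=rng)
print(acceptance_rate, sum(chain[100:]) / len(chain[100:]))
```

The likelihood estimate of the current sample is stored and reused rather than recomputed at every iteration; recomputing it would change the algorithm and break the exactness argument above.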

Originally appeared in (different derivation)
Christophe Andrieu, Arnaud Doucet and Roman Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.

and further developed in,
Johan Dahlin, Fredrik Lindsten and Thomas B. Schön, Particle Metropolis Hastings using gradient and Hessian information, Statistics and Computing, 25(1):81-92, 2015.


Pseudo-marginal algorithms

The derivation we gave for PMH above is based on a neat idea called the pseudo-marginal approach.
Christophe Andrieu and Gareth O. Roberts, The pseudo-marginal approach for efficient Monte Carlo computations, The Annals of Statistics, 37(2):697-725, 2009.

We have recently used this construction also to solve the rendering problem in computer graphics.

Online Submission ID: 45678

Pseudo-marginal Metropolis Light Transport

[Figure 1 panels: (a) Pseudo-marginal ERPT using ratio tracking, (b) ERPT with ray marching, (c) PSSMLT with ratio tracking.]

Figure 1: Equal time renderings of a refractive glass cube containing isotropic heterogeneous participating media and a diffuse torus. a) Our method using ERPT [Cline et al. 2005] with ratio tracking [Novak et al. 2014] to obtain an unbiased estimate of the transmission. b) Same as a) but with deterministic ray-marching used to evaluate the transmission. c) PSSMLT [Kelemen et al. 2002] with ratio tracking.

1 Introduction

Accurate and efficient simulation of light transport in heterogeneous participating media, such as smoke, clouds and fire, plays a key role in the synthesis of visually interesting renderings for e.g. visual effects, computer games and product visualization. Similar problems also have important applications in other fields such as neutron transport and medical radiation dosimetry. While significant progress has been made towards robust and efficient algorithms for rendering heterogeneous participating media, scenes containing highly anisotropic phase functions, glossy surfaces and complex visibility still remain challenging.

Modern global illumination rendering is based on Monte Carlo methods computing averages of stochastically sampled light paths connecting the sensor with light sources in the scene. The efficiency of these algorithms is directly dependent on the strategy used in the light path sampling. For complex scenes, the Metropolis light transport (MLT) algorithms [Veach and Guibas 1997] provide a powerful framework for sampling the path space. These Markov chain Monte Carlo (MCMC) methods sample paths according to some target distribution, $\pi(x)$, proportional to their contribution to the image. However, in many applications the target distribution, $\pi(x)$, cannot be evaluated exactly. A common scenario is that of rendering scenes with heterogeneous participating media, where, in general, computing the transmittance along a path is analytically intractable. This limits direct use of MLT, as the algorithm depends on Metropolis-Hastings (MH) acceptance probabilities which cannot be evaluated in closed form. The same problem occurs for other rendering methods relying on MCMC sampling, such as Energy Redistribution path tracing (ERPT) [Cline et al. 2005]. Previous work [Pauly et al. 2000] proposed to substitute the exact quantity of interest, $\pi(x)$, with a biased approximation, $\widehat{\pi}(x)$, in the computation of the MH probability. However, a direct application of this method does not lead to a consistent estimator.

Our main contribution is a new sampling strategy for MCMC methods, e.g. MLT and ERPT, for rendering of scenes with heterogeneous participating media. Specifically, we show that a positive and unbiased estimator of the target distribution, $\mathbb{E}[\widehat{\pi}(x)] = \pi(x)$, can replace the exact quantity to simulate a Markov chain with a stationary distribution that has a marginal that is the exact target distribution of interest. This enables us to evaluate the transmittance function with recent unbiased estimators [Novak et al. 2014], leading to significantly shorter rendering times. Our approach is based on pseudo-marginal MCMC methods developed for Bayesian inference with intractable likelihoods [Beaumont 2003; Andrieu and Roberts 2009]. Compared to previous work (relying on biased ray-marching for evaluating transmittance), our method enables simulation of longer Markov chains and a better exploration of the path space in a given computational budget.

2 Background and related work

2.1 Path Integral Formulation

Using the path integral formulation of light transport [Veach 1997; Pauly et al. 2000; Wenzel 2013] the j-th pixel measurement $I_j$ is expressed as an integral over the space, $\Omega$, of all valid light transport paths of all lengths:

$$I_j = \int_\Omega f(x)\, W_j\, d\mu(x), \quad \text{with } d\mu(x) = \prod_{i=0}^{k} d\mu(x_i), \qquad (1)$$

where a valid path of length $k \in (0, \infty)$, $x = x_0, \dots, x_k$, connects a light source to the sensor through a series of scattering vertices, $x_i$, located on surfaces or in participating media. Here $W_j = W_j(x_{k-1} \to x_k)$ denotes the pixel sensitivity/filter at the sensor. The path measure, $d\mu(x)$, is a product measure over the measures of the path vertices, where path vertices on surfaces correspond to area integration, $d\mu(x_i) = dA(x_i)$, and vertices in a medium correspond to volume integration, $d\mu(x_i) = dV(x_i)$. The measurement contribution function, $f_j(x)$, is defined by

$$f_j(x) = L_e \left[ \prod_{i=0}^{k-1} G(x_i, x_{i+1})\, T_r(x_i, x_{i+1}) \right] \left[ \prod_{i=1}^{k-1} \rho(x_i) \right], \qquad (2)$$

where $L_e = L_e(x_0 \to x_1)$ is the emitted radiance at the light source, $\rho(x_i)$ denotes scattering functions defined at path vertices, $G(x_i, x_{i+1})$ is the geometry term, and $T_r(x_i, x_{i+1})$ is the transmittance defined on the segments between path vertices. For a detailed description of these terms see e.g. [Wenzel 2013].

We (joint work with Joel Kronander and Jonas Unger) will submit the corresponding paper on Wednesday.


Using SMC within MCMC (PMCMC)

Particle MCMC (PMCMC) is a systematic way of combining SMC and MCMC.

The extended target construction $\phi(\cdot)$ is at the core of all PMCMC constructions.

Intuitively: SMC is used as a high-dimensional proposal mechanism on the space of state trajectories $\mathsf{X}^T$.

A bit more precise: Construct a Markov chain with $p(\theta \mid y_{1:T})$ (or $p(\theta, x_{1:T} \mid y_{1:T})$) as its stationary distribution.

Pioneered by the work
Christophe Andrieu, Arnaud Doucet and Roman Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.
