TRANSCRIPT
Current limitations of sequential inference in general hidden Markov models
Pierre Jacob, Department of Statistics, University of Oxford
March 5th
Pierre Jacob Sequential inference in HMM 1/60
Outline
1. Setting: online inference in time series (Hidden Markov Models; Implicit models; Exact / sequential / online methods)
2. Plug and play methods (Approximate Bayesian Computation; Particle Filters)
3. SMC2 for sequential inference (A sequential method for HMM; Not online)
4. Numerical experiments
5. Discussion
Hidden Markov Models
[Figure: graph representation of a general HMM — hidden chain X_0 → X_1 → X_2 → … → X_T, each X_t emitting an observation y_t.]
(X_t): initial distribution µ_θ, transition f_θ. (Y_t) given (X_t): measurement g_θ. Prior on the parameter θ ∈ Θ.
Phytoplankton–Zooplankton
Figure: A time series of 365 observations generated according to a phytoplankton–zooplankton model.
General questions
For each model, how much do the data inform the parameters?
For each model, how much do the data inform the latent Markov process?
How much do the data inform the choice of a model?
How to predict future observations?
Questions translated into integrals
Filtering question:
∫_X φ(x_t) p(dx_t | y_{0:t}, θ) = (1 / Z_t(θ)) ∫_{X^{t+1}} φ(x_t) p(dx_{0:t} | θ) ∏_{s=0}^{t} p(y_s | x_s, θ).
Prediction question:
∫_Y φ(y_{t+k}) p(dy_{t+k} | y_{0:t}, θ) = ∫_Y ∫_X φ(y_{t+k}) p(dx_{t+k} | y_{0:t}, θ) p(dy_{t+k} | x_{t+k}, θ).
Questions translated into integrals
Parameter estimation:
p(y_{0:t} | θ) = ∫_{X^{t+1}} p(dx_0 | θ) ∏_{s=1}^{t} p(dx_s | x_{s−1}, θ) ∏_{s=0}^{t} p(y_s | x_s, θ),
and eventually
∫_Θ φ(θ) π_{θ,t}(dθ) = (1 / Z_t) ∫_Θ φ(θ) p(y_{0:t} | θ) π_θ(dθ).
If we acknowledge parameter uncertainty, then more questions:
∫_X φ(x_t) p(dx_t | y_{0:t}) = ∫_Θ ∫_X φ(x_t) p(dx_t | y_{0:t}, θ) π_{θ,t}(dθ).
Questions translated into integrals
Model choice:
P(M = M^{(m)} | y_{0:t}) = P(M = M^{(m)}) Z_t^{(m)} / ∑_{m′=1}^{M} P(M = M^{(m′)}) Z_t^{(m′)}.
If we acknowledge model uncertainty, then more questions:
∫_Y φ(y_{t+k}) P(dy_{t+k} | y_{0:t}) = ∑_{m=1}^{M} ∫_{Θ^{(m)}} ∫_Y φ(y_{t+k}) p(dy_{t+k} | y_{0:t}, θ, M^{(m)}) π_{θ^{(m)},t}(dθ) P(M = M^{(m)} | y_{0:t}).
Phytoplankton–Zooplankton model
Hidden process (x_t) = (α_t, p_t, z_t).
At each (integer) time, α_t ∼ N(µ_α, σ_α²).
Given α_t,
dp_t/dt = α p_t − c p_t z_t,
dz_t/dt = e c p_t z_t − m_l z_t − m_q z_t².
Observations: log y_t ∼ N(log p_t, σ_y²).
Set c = 0.25 and e = 0.3, and (log p_0, log z_0) ∼ N(log 2, 0.2).
Unknown parameters: θ = (µ_α, σ_α, σ_y, m_l, m_q).
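This model can be simulated but its transition density cannot be evaluated. A minimal simulator sketch in Python, assuming an Euler discretisation of the ODEs between integer observation times with step size dt; the function name, step size, and the small positivity floor are illustrative choices, not from the talk:

```python
import numpy as np

def simulate_pz(T, theta, dt=0.01, rng=None):
    """Simulate the phytoplankton-zooplankton model: alpha is redrawn
    at each integer time, the ODEs are integrated by Euler steps over
    each unit interval, and observations are log-normal around p_t.
    theta = (mu_alpha, sigma_alpha, sigma_y, m_l, m_q); c, e as on the slide."""
    mu_a, sigma_a, sigma_y, m_l, m_q = theta
    c, e = 0.25, 0.3
    rng = np.random.default_rng(rng)
    log_p, log_z = rng.normal(np.log(2.0), np.sqrt(0.2), size=2)
    p, z = np.exp(log_p), np.exp(log_z)
    ys = np.empty(T + 1)
    ys[0] = np.exp(rng.normal(np.log(p), sigma_y))
    for t in range(1, T + 1):
        alpha = rng.normal(mu_a, sigma_a)      # new growth rate each unit time
        for _ in range(int(1 / dt)):           # Euler steps over one unit interval
            dp = alpha * p - c * p * z
            dz = e * c * p * z - m_l * z - m_q * z ** 2
            # positivity floor is a numerical safeguard, not part of the model
            p, z = max(p + dt * dp, 1e-10), max(z + dt * dz, 1e-10)
        ys[t] = np.exp(rng.normal(np.log(p), sigma_y))
    return ys
```

This is exactly the "plug and play" access assumed below: the simulator can be called, but the implied densities are never evaluated.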
Implicit models
Even simple, standard scientific models are such that the implied probability distribution p(dx_{0:t} | θ) admits a density function that cannot be computed pointwise.
To cover as many models as possible, we can only assume that the hidden process can be simulated.
This covers cases where x_t = ψ(x_{t−1}, k, v_{1:k}), for some integer k, vector v_{1:k} ∈ R^k, and deterministic function ψ.
Calls for "plug and play" methods.
Time series analysis via mechanistic models, Breto, He, Ionides and King, 2009.
Exact methods
Consider the problem of estimating some quantity I_t.
Consider an estimator I_t^N where N is a tuning parameter.
Hopefully N is such that I_t^N → I_t, in some sense, as N → ∞.
For instance, E[(I_t^N − I_t)²] goes to zero when N → ∞.
Variational methods / Ensemble Kalman Filters are not exact.
Consider the estimator that always returns 29.5. . .
Sequential methods
Consider the problem of estimating some quantity I_t, for all t ≥ 0, e.g. upon the arrival of new data.
Assume the quantities I_t for all t ≥ 0 are related one to the other.
A sequential method "updates" the estimate I_t^N into I_{t+1}^N.
MCMC methods are not sequential: they have to be re-run from scratch whenever a new observation arrives.
Therefore, sequential methods are not to be confused with iterative methods.
Online methods
Consider the problem of estimating some quantity I_t, for all t ≥ 0, e.g. upon the arrival of new data.
A method is online if it provides estimates I_t^N of I_t for all t ≥ 0, such that. . .
. . . the computational cost of obtaining each I_t^N given I_{t−1}^N is independent of t,
. . . the precision of the estimate does not explode over time:
r(I_t^N) = ( E[(I_t^N − I_t)²] )^{1/2} / |I_t|
can be uniformly bounded over t.
Consider the estimator that always returns 29.5. . .
Approximate Bayesian Computation
1. Draw θ from the prior distribution π_θ.
2. Draw x⋆_{0:t}, a realisation of the hidden Markov chain given θ.
3. Draw y⋆_{0:t}, a realisation of the observations given x⋆_{0:t} and θ.
4. If D(y⋆_{0:t}, y_{0:t}) ≤ ε, keep (θ, x⋆_{0:t}).
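The four steps above can be sketched as a generic rejection sampler; `prior_sample`, `simulate`, and `distance` are hypothetical user-supplied callables, not names from the talk:

```python
import numpy as np

def abc_rejection(y_obs, prior_sample, simulate, distance, eps, n_draws, rng=None):
    """ABC rejection for a hidden Markov model: keep the
    (parameter, trajectory) pairs whose simulated observations
    fall within eps of the observed data."""
    rng = np.random.default_rng(rng)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)            # step 1: draw from the prior
        x, y_sim = simulate(theta, rng)      # steps 2-3: simulate chain and observations
        if distance(y_sim, y_obs) <= eps:    # step 4: accept if close to the data
            accepted.append((theta, x))
    return accepted
```

Nothing about the model is evaluated, only simulated, which is what makes the method fully plug and play.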
Approximate Bayesian Computation
Plug and play: only requires simulations from the model.
Exact if D is a distance and ε is zero.
In practice, D is typically not a distance.
The tolerance ε is often chosen implicitly.
E.g., ε is chosen so that 1% of the generated samples is kept.
Better than the 29.5 estimator?
Sequential Monte Carlo for filtering
Objects of interest:
filtering distributions: p(x_t | y_{0:t}, θ), for all t, for a given θ,
likelihood: p(y_{0:t} | θ) = ∫ p(y_{0:t} | x_{0:t}, θ) p(x_{0:t} | θ) dx_{0:t}.
Particle filters:
propagate recursively N_x particles approximating p(x_t | y_{0:t}, θ) for all t,
give likelihood estimates p^{N_x}(y_{0:t} | θ) of p(y_{0:t} | θ) for all t.
Plug and play requirement
Particle filters can be implemented if
the hidden process can be simulated forward, given any θ:
x_0 ∼ µ_θ and x_t ∼ f_θ(· | x_{t−1}),
the measurement density g_θ(y | x) can be evaluated point-wise, for any x, y, θ.
A bit less "plug and play" than ABC.
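Under exactly these two requirements, a bootstrap particle filter can be sketched as follows; the function names are assumptions, and multinomial resampling at every step is one simple choice among several:

```python
import numpy as np

def bootstrap_pf(y, Nx, mu_sample, f_sample, g_density, rng=None):
    """Bootstrap particle filter using only the two 'plug and play'
    ingredients: forward simulation of the hidden process and pointwise
    evaluation of the measurement density.  Returns the final particles
    and the log of the likelihood estimate p^{Nx}(y_{0:T} | theta)
    (the estimate itself, not its log, is unbiased)."""
    rng = np.random.default_rng(rng)
    x = mu_sample(Nx, rng)                    # x_0 ~ mu_theta
    log_lik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = f_sample(x, rng)              # propagate: x_t ~ f_theta(. | x_{t-1})
        w = g_density(yt, x)                  # weight by g_theta(y_t | x_t)
        log_lik += np.log(np.mean(w))         # factor estimating p(y_t | y_{0:t-1})
        a = rng.choice(Nx, size=Nx, p=w / w.sum())   # multinomial resampling
        x = x[a]
    return x, log_lik
```

For instance, with a scalar linear Gaussian model, `mu_sample` draws N(0, 1) variates, `f_sample` applies x ↦ 0.9x plus noise, and `g_density` is the N(x, 1) density evaluated at y.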
Sequential Monte Carlo for filtering
[Diagram: the HMM graph X_0 → X_1 → X_2 → … → X_T with observations y_1, y_2, …, y_T and parameter θ; the particle filter processes the chain one observation at a time.]
Sequential Monte Carlo for filtering
Consider I(φ_t) = ∫ φ_t(x_t) p(x_t | y_{0:t}) dx_t.
L^p-bound:
E[ |I^N(φ_t) − I(φ_t)|^p ]^{1/p} ≤ c(p) ||φ_t||_∞ / √N.
Central limit theorem:
√N ( I^N(φ_t) − I(φ_t) ) → N(0, σ_t²) in distribution as N → ∞,
where σ_t² < σ_max² for all t.
Particle filters are fully online, plug and play, and exact. . . for filtering.
Sequential Monte Carlo for filtering
Properties of the likelihood estimator
The likelihood estimator is unbiased,
E[ p^{N_x}(y_{0:t} | θ) ] = E[ ∏_{s=0}^{t} (1/N_x) ∑_{k=1}^{N_x} w_s^k ] = p(y_{0:t} | θ),
and the relative variance is bounded linearly in time,
V[ p^{N_x}(y_{0:t} | θ) / p(y_{0:t} | θ) ] ≤ C t / N_x
for some constant C (under some conditions!).
Particle filters are not online for likelihood estimation.
SMC samplers
The goal is now to approximate sequentially
p(θ), p(θ | y_0), . . . , p(θ | y_{0:T}).
Sequential Monte Carlo samplers. Jarzynski 1997, Neal 2001, Chopin 2002, Del Moral, Doucet & Jasra 2006. . .
Propagates a number N_θ of θ-particles approximating p(θ | y_{0:t}) for all t.
Evidence estimates p^{N_θ}(y_{0:t}) ≈ p(y_{0:t}) for all t.
Targets
Figure: Sequence of target distributions p(θ), p(θ|y_1), p(θ|y_1, y_2), p(θ|y_1, y_2, y_3), shown as densities over Θ.
First step
Figure: First distribution p(θ) in black, next distribution p(θ|y_1) in red; current samples shown along the Θ axis.
Importance Sampling
Figure: Samples θ weighted by p(θ | y_1)/p(θ) ∝ p(y_1 | θ).
Resampling and move
Figure: Samples θ after resampling and MCMC move.
SMC samplers
1: Sample from the prior θ^{(m)} ∼ p(·) for m ∈ [1, N_θ].
2: Set ω^{(m)} ← 1/N_θ.
3: for t = 0 to T do
4:   Reweight ω^{(m)} ← ω^{(m)} × p(y_t | y_{0:t−1}, θ^{(m)}) for m ∈ [1, N_θ].
5:   if some degeneracy criterion is met then
6:     Resample the particles, reset the weights ω^{(m)} ← 1/N_θ.
7:     MCMC move for each particle, targeting p(θ | y_{0:t}).
8:   end if
9: end for
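A direct rendering of this pseudocode, assuming a model whose incremental likelihood p(y_t | y_{0:t−1}, θ) is tractable (for hidden Markov models it is not, which is what motivates SMC2); `incr_lik` and `mcmc_move` are user-supplied callables with assumed signatures:

```python
import numpy as np

def smc_sampler(y, Ntheta, prior_sample, incr_lik, mcmc_move, ess_threshold=0.5, rng=None):
    """SMC sampler over parameter space: reweight by the incremental
    likelihood at each observation, and resample-move when the
    effective sample size degenerates."""
    rng = np.random.default_rng(rng)
    thetas = prior_sample(Ntheta, rng)
    w = np.full(Ntheta, 1.0 / Ntheta)
    for t in range(len(y)):
        w *= incr_lik(y, t, thetas)            # line 4: reweight
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)             # ESS of normalised weights
        if ess < ess_threshold * Ntheta:       # line 5: degeneracy criterion
            a = rng.choice(Ntheta, size=Ntheta, p=w)          # line 6: resample
            thetas = mcmc_move(thetas[a], y[: t + 1], rng)    # line 7: move
            w = np.full(Ntheta, 1.0 / Ntheta)
    return thetas, w
```

A proper `mcmc_move` would apply a Markov kernel targeting p(θ | y_{0:t}); any such kernel leaves the weighted approximation valid.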
Proposed method
SMC samplers require
pointwise evaluations of p(y_t | y_{0:t−1}, θ),
MCMC moves targeting each intermediate distribution.
For hidden Markov models, the likelihood is intractable.
Particle filters provide likelihood approximations for a given θ.
Hence, we equip each θ-particle with its own particle filter.
One step of SMC2
For each θ-particle θ_t^{(m)}, perform one step of its particle filter to obtain p^{N_x}(y_{t+1} | y_{0:t}, θ_t^{(m)}), and reweight:
ω_{t+1}^{(m)} = ω_t^{(m)} × p^{N_x}(y_{t+1} | y_{0:t}, θ_t^{(m)}).
One step of SMC2
Whenever
Effective sample size = ( ∑_{m=1}^{N_θ} ω_{t+1}^{(m)} )² / ∑_{m=1}^{N_θ} ( ω_{t+1}^{(m)} )² < threshold × N_θ
(Kong, Liu & Wong, 1994),
resample the θ-particles and move them by PMCMC, i.e.
Propose θ⋆ ∼ q(· | θ_t^{(m)}) and run PF(N_x, θ⋆) for t + 1 steps.
Accept or not based on p^{N_x}(y_{0:t+1} | θ⋆).
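The reweighting step can be sketched as follows, with each θ-particle carrying its own array of x-particles; the function names and the resample-at-every-step scheme inside each filter are illustrative choices:

```python
import numpy as np

def smc2_reweight_step(thetas, omegas, x_particles, y_next, f_sample, g_density, rng=None):
    """One SMC2 reweighting step: advance each theta-particle's internal
    particle filter by one observation and multiply its weight by the
    resulting likelihood factor p^{Nx}(y_{t+1} | y_{0:t}, theta)."""
    rng = np.random.default_rng(rng)
    Nx = x_particles.shape[1]
    for m, theta in enumerate(thetas):
        x = f_sample(theta, x_particles[m], rng)      # propagate the x-particles
        w = g_density(theta, y_next, x)               # measurement weights
        omegas[m] *= np.mean(w)                       # incremental likelihood factor
        a = rng.choice(Nx, size=Nx, p=w / w.sum())    # resample within the filter
        x_particles[m] = x[a]
    return thetas, omegas / omegas.sum(), x_particles
```

The PMCMC move step (not shown) is the expensive part: it re-runs a fresh particle filter from time zero for each proposed θ⋆, which is what prevents SMC2 from being online.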
Exact approximation
SMC2 is a standard SMC sampler on an extended space, with target distribution:
π_t(θ, x_{0:t}^{1:N_x}, a_{0:t−1}^{1:N_x}) = p(θ | y_{0:t}) × (1 / N_x^{t+1}) ∑_{n=1}^{N_x} [ p(x_{0:t}^n | θ, y_{0:t}) ∏_{i=1, i≠h_t^n(1)}^{N_x} q_{0,θ}(x_0^i) × ∏_{s=1}^{t} ∏_{i=1, i≠h_t^n(s)}^{N_x} W_{s−1,θ}^{a_{s−1}^i} q_{s,θ}(x_s^i | x_{s−1}^{a_{s−1}^i}) ].
Related to pseudo-marginal and PMCMC methods.
Exact approximation
From the extended target representation, we obtain
θ from p(θ | y_{1:t}),
x_{0:t}^n from p(x_{0:t} | θ, y_{1:t}),
thus allowing joint state and parameter inference.
Evidence estimates are obtained by computing the average of the θ-weights ω_t^{(m)}.
The "extended target" argument yields consistency for any fixed N_x, when N_θ goes to infinity.
Exact method, sequential by design, but not online.
Scalability in T
Cost if MCMC move at each time step:
A single move step at time t costs O(t N_x N_θ).
If we move at every step, the total cost becomes O(t² N_x N_θ).
If N_x = Ct, the total cost becomes O(t³ N_θ).
With adaptive resampling, the cost is only O(t² N_θ). Why?
Scalability in T
Figure: Effective Sample Size against time, for the PZ model.
Scalability in T
Figure: Cumulative cost per θ-particle during one run of SMC2. The cost is measured by the number of calls to the transition sampling function. N_x is fixed.
Scalability in T
Figure: Effective Sample Size against time, for a linear Gaussian model.
Scalability in T
Figure: √(computing time) against time. N_x is increased to achieve a fixed acceptance rate in the PMCMC steps.
Scalability in T
Under Bernstein–von Mises conditions, the posterior becomes Gaussian.
[Figure: densities of p(θ | y_{1:t}) and p(θ | y_{1:ct}) over Θ.]
E[ESS] from p(θ | y_{1:t}) to p(θ | y_{1:ct}) becomes independent of t. Hence resampling times occur geometrically: τ_k ≈ c^k with c > 1.
Scalability in T
More formally. . . The expected ESS at time t + k, if the last resampling time was t, is related to
V_{p(θ|y_{1:t})}[ p(θ | y_{1:t+k}) / p(θ | y_{1:t}) ] = V_{p(θ|y_{1:t})}[ ( L(θ; y_{1:t+k}) / L(θ; y_{1:t}) ) × ( ∫_Θ L(θ; y_{1:t}) p(dθ) / ∫_Θ L(θ; y_{1:t+k}) p(dθ) ) ].
Then Laplace expansions of L yield similar results as before, under regularity conditions.
Scalability in T
Open problem: online exact Bayesian inference in linear time?
On one hand dim(X_{0:t}) = dim(X) × (t + 1), which grows . . .
. . . but θ itself is of fixed dimension and p(θ | y_{1:t}) ≈ N(θ⋆, v⋆/t)!
Our specific problem: move steps at time t imply running a particle filter from time zero.
Attempts have been made at re-starting from t − ∆, but then, bias.
Phytoplankton–Zooplankton: model
Hidden process (x_t) = (α_t, p_t, z_t).
At each (integer) time, α_t ∼ N(µ_α, σ_α²).
Given α_t,
dp_t/dt = α p_t − c p_t z_t,
dz_t/dt = e c p_t z_t − m_l z_t − m_q z_t².
Observations: log y_t ∼ N(log p_t, σ_y²).
Set c = 0.25 and e = 0.3, and (log p_0, log z_0) ∼ N(log 2, 0.2).
Unknown parameters: θ = (µ_α, σ_α, σ_y, m_l, m_q).
Phytoplankton–Zooplankton: observations
Figure: A time series of 365 observations generated according to a phytoplankton–zooplankton model.
Phytoplankton–Zooplankton: parameters
Figure: Posterior distribution of the parameters, shown as pairwise scatter plots (µ_α against σ_α, σ_y against σ_α, and m_q against m_l).
Phytoplankton–Zooplankton: parameters
Figure: Evolution, over the first 50 time steps, of the posterior distribution of µ_α.
Phytoplankton–Zooplankton: parameters
Figure: Evolution, over the first 50 time steps, of the posterior distribution of σ_α.
Phytoplankton–Zooplankton: parameters
Figure: Evolution, over the first 50 time steps, of the posterior distribution of σ_y.
Phytoplankton–Zooplankton: parameters
Figure: Evolution, over the first 50 time steps, of the posterior distribution of m_l.
Phytoplankton–Zooplankton: parameters
Figure: Evolution, over the first 50 time steps, of the posterior distribution of m_q.
Phytoplankton–Zooplankton: prediction
Figure: One-step predictions under parameter uncertainty (first 50 time steps).
Phytoplankton–Zooplankton: prediction
Figure: One-step predictions under parameter uncertainty (full series of 365 observations).
Phytoplankton–Zooplankton: another model
dp_t/dt = α p_t − c p_t z_t,  dz_t/dt = e c p_t z_t − m_l z_t
or
dp_t/dt = α p_t − c p_t z_t,  dz_t/dt = e c p_t z_t − m_l z_t − m_q z_t² ?
Phytoplankton–Zooplankton: model choice
Figure: Bayes Factor against time (first 100 observations).
Phytoplankton–Zooplankton: model choice
Figure: Bayes Factor against time (full series, log scale).
Forgetting mechanism for hidden states
Forgetting property of a uniformly ergodic Markov chain:
||p_t^ν − p_t^µ||_TV ≤ C ρ^t,
where ν, µ are two initial distributions, p_t^ν is the distribution of X_t after t steps, ρ < 1, C > 0.
Similarly, the filtering distribution π_t(dx_t) = p(dx_t | y_{0:t}) forgets its initial condition geometrically fast.
Introduce the operator Φ_t, taking a measure, applying a Markov kernel to it, and then a Bayes update using y_t.
Under conditions on the data generating process and the model,
||Φ_{0:t}(µ) − Φ_{0:t}(ν)||_TV ≤ C ρ^t.
Forgetting mechanism for parameters
Forgetting mechanism for the Bayesian posterior distribution:
||p_t^ν − p_t^µ||_TV ≤ C / √t.
Huge literature on prior robustness.
Posterior forgetting goes much slower than Markov chain forgetting.
An error in the approximation of p(θ | y_{1:t}) damages the subsequent approximations of p(θ | y_{1:t+k}), for many k's.
SMC samplers are stable because of the added MCMC steps, whose cost increases with t.
Other challenges
Dimensionality: the other big open problem.
Particle filters' errors grow exponentially fast with dim(X).
Can local particle filters beat the curse of dimensionality? Rebeschini, van Handel, 2013.
Carefully analyzed biased approximations.
Assumption of a spatial forgetting effect from the model.
Other challenges
Particle filters provide useful estimates. . .
. . . but no estimates of their associated variance.
Can we estimate the variance without having to run the algorithm many times?
Other challenges
Particle methods are more and more commonly used outside the setting of HMMs.
For instance, in the setting of long memory processes: probabilistic programming, Bayesian non-parametric applications.
Are particle methods useful for models that do not satisfy forgetting properties?
Stability of Feynman-Kac formulae with path-dependent potentials, Chopin, Del Moral, Rubenthaler, 2009.
Discussion
SMC2 allows sequential exact approximation in HMMs, but not online.
Properties of posterior distributions could help achieve exact online inference, or prove that it is, in fact, impossible.
Do we want to sample from the posterior as t → ∞?
Importance of plug and play inference for time series.
Implementation in LibBi, with GPU support.
Links
Particle Markov chain Monte Carlo, Andrieu, Doucet, Holenstein, 2010 (JRSS B).
Sequential Monte Carlo samplers: error bounds and insensitivity to initial conditions, Whiteley, 2011 (Stoch. Analysis and Appl.).
SMC2: an algorithm for sequential analysis of HMM, Chopin, Jacob, Papaspiliopoulos, 2013 (JRSS B).
www.libbi.org