
Knowledge Repn. & Reasoning Lec #24: Approximate Inference in DBNs UIUC CS 498: Section EA Professor: Eyal Amir Fall Semester 2004 (Some slides by X. Boyen & D. Koller, and by S. H. Lim; Some slides by Doucet, de Freitas, Murphy, Russell, and H. Zhou)



Page 1: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Knowledge Repn. & Reasoning
Lec #24: Approximate Inference in DBNs

UIUC CS 498: Section EA
Professor: Eyal Amir
Fall Semester 2004

(Some slides by X. Boyen & D. Koller, and by S. H. Lim; some slides by Doucet, de Freitas, Murphy, Russell, and H. Zhou)

Page 2: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Dynamic Systems

• Filtering in stochastic, dynamic systems:
– Monitoring freeway traffic (from an autonomous driver or for traffic analysis)
– Monitoring a patient’s symptoms

• Models to deal with uncertainty and/or partial observability in dynamic systems:
– Hidden Markov Models (HMMs), Kalman filters, etc.
– All are special cases of Dynamic Bayesian Networks (DBNs)


Page 3: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


• Exact DBN inference
– Filtering
– Smoothing
– Projection
– Explanation

Page 4: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

DBN Myth

• Bayesian Network: a decomposed structure to represent the full joint distribution

• Does it imply easy decomposition for the belief state?

• No!

Page 5: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Tractable, approximate representation

• Exact inference in DBN is intractable

• Need approximation
– Maintain an approximate belief state
– E.g. assume Gaussian processes

• Today:
– Factored belief state approximation [Boyen & Koller ’98]
– Particle filtering (if time permits)

Page 6: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


• Use a decomposable representation for the belief state (i.e., assume some independence in advance)
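As an illustration of what a decomposable belief state looks like, here is a minimal sketch that projects a joint distribution onto the product of its single-variable marginals. The two-variable joint and the helper names are hypothetical; a real factored belief state may use larger clusters than single variables.

```python
from itertools import product

def marginal(joint, index):
    """Marginal of one variable from a joint given as {tuple_of_values: prob}."""
    out = {}
    for state, p in joint.items():
        out[state[index]] = out.get(state[index], 0.0) + p
    return out

def project_to_product(joint, n_vars):
    """Approximate the joint by the product of its single-variable marginals."""
    margs = [marginal(joint, i) for i in range(n_vars)]
    approx = {}
    for state in product(*[sorted(m) for m in margs]):
        p = 1.0
        for i, value in enumerate(state):
            p *= margs[i][value]
        approx[state] = p
    return margs, approx

# Hypothetical belief state over two correlated binary variables (A, B).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
margs, approx = project_to_product(joint, 2)
print(margs)    # the factored representation that is actually stored
print(approx)   # the joint it implies; the A-B correlation is lost
```

The factored form stores one small table per variable instead of one entry per joint state, which is where the savings come from and also where the approximation error enters.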

Page 7: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


• What about the approximation errors?
– They might accumulate and grow without bound…

Page 8: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Contraction property

• Main result:
– If the process is mixing, then every state transition results in a contraction of the distance between the two distributions by a constant factor
– Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely

Page 9: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

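Basic framework

• Definition 1:
– Prior belief state:

$\sigma^{(\bullet t)}[s_i] = P[s_i^{(t)} \mid r^{(0)}, \dots, r^{(t-1)}]$

– Posterior belief state:

$\sigma^{(t \bullet)}[s_i] = P[s_i^{(t)} \mid r^{(0)}, \dots, r^{(t-1)}, r^{(t)}]$

• Monitoring task:
– Propagate through the transition model T:

$\sigma^{(\bullet t+1)}[s_j] = \sum_i \sigma^{(t \bullet)}[s_i] \, T[s_i \to s_j]$

– Condition on the observation $r^{(t)} = r_h$ through the observation model O:

$\sigma^{(t \bullet)}[s_i] = \dfrac{\sigma^{(\bullet t)}[s_i] \, O[s_i][r_h]}{\sum_l \sigma^{(\bullet t)}[s_l] \, O[s_l][r_h]}$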
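The monitoring task can be sketched for a small discrete model as follows. The two-state transition matrix `T`, observation matrix `O`, and the observation sequence are made-up values for illustration only.

```python
def condition(prior, O, obs):
    """Posterior belief: prior[i] * O[i][obs], renormalised over states."""
    unnorm = [prior[i] * O[i][obs] for i in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def predict(posterior, T):
    """Next prior belief: sum_i posterior[i] * T[i][j]."""
    n_next = len(T[0])
    return [sum(posterior[i] * T[i][j] for i in range(len(posterior)))
            for j in range(n_next)]

# Hypothetical two-state model: T[i][j] = P(s_j at t+1 | s_i at t),
# O[i][h] = P(observation h | s_i).
T = [[0.7, 0.3], [0.2, 0.8]]
O = [[0.9, 0.1], [0.3, 0.7]]

prior = [0.5, 0.5]
for obs in [0, 0, 1]:
    posterior = condition(prior, O, obs)   # prior belief -> posterior belief
    prior = predict(posterior, T)          # posterior belief -> next prior
print(posterior, prior)
```

For a DBN the state space is the cross product of all variable domains, so these vectors grow exponentially with the number of variables; that is the cost the approximations in this lecture try to avoid.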


















Page 10: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Simple contraction

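• Distance measure:
– Relative entropy (KL-divergence) between the actual and the approximate belief state:

$D[\phi \,\|\, \psi] = \sum_i \phi[s_i] \ln \dfrac{\phi[s_i]}{\psi[s_i]}$

• Contraction due to O (in expectation over the observation):

$E_{r^{(t)}}\big[ D[\sigma^{(t \bullet)} \,\|\, \hat{\sigma}^{(t \bullet)}] \big] \le D[\sigma^{(\bullet t)} \,\|\, \hat{\sigma}^{(\bullet t)}]$

• Contraction due to T (can we do better?):

$D\big[ T[\sigma^{(t \bullet)}] \,\|\, T[\hat{\sigma}^{(t \bullet)}] \big] \le D[\sigma^{(t \bullet)} \,\|\, \hat{\sigma}^{(t \bullet)}]$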

Page 11: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Simple contraction (cont)

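• Definition:
– Minimal mixing rate:

$\gamma_Q = \min_{i_1, i_2} \sum_j \min\big( Q[i_1 \to j], \, Q[i_2 \to j] \big)$

• Theorem 3 (the single process contraction theorem):
– For process Q, anterior distributions φ and ψ, ulterior distributions φ’ and ψ’,

$D[\phi' \,\|\, \psi'] \le (1 - \gamma_Q) \, D[\phi \,\|\, \psi]$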
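A small numerical check of the single process contraction bound, assuming the minimal mixing rate is the minimum over pairs of rows of the summed elementwise minimum. The transition matrix `Q` and the two distributions are made-up values.

```python
import math

def kl(phi, psi):
    """Relative entropy D[phi || psi] for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(phi, psi) if p > 0)

def mixing_rate(Q):
    """Minimal mixing rate: min over row pairs of sum_j min(Q[a][j], Q[b][j])."""
    n = len(Q)
    return min(sum(min(Q[a][j], Q[b][j]) for j in range(len(Q[0])))
               for a in range(n) for b in range(n))

def propagate(dist, Q):
    """Push a distribution through the stochastic process Q."""
    return [sum(dist[i] * Q[i][j] for i in range(len(dist)))
            for j in range(len(Q[0]))]

Q = [[0.7, 0.3], [0.2, 0.8]]      # hypothetical process
phi, psi = [0.9, 0.1], [0.4, 0.6]  # hypothetical anterior distributions

gamma = mixing_rate(Q)
before = kl(phi, psi)
after = kl(propagate(phi, Q), propagate(psi, Q))
print(gamma, before, after, (1 - gamma) * before)
```

On this example the divergence after the transition falls below (1 − γ_Q) times the divergence before it, as the theorem requires. One example does not prove the theorem; it only illustrates how the quantities are computed.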

Page 12: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Simple contraction (cont)

• Proof Intuition:

Page 13: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Compound processes

• Mixing rate could be very small for large processes
• The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses
• Fully independent subprocesses:
– Theorem 5:
  • For L independent subprocesses $T_1, \dots, T_L$, let $\gamma_l$ be the mixing rate for $T_l$ and let $\gamma = \min_l \gamma_l$. Let φ and ψ be distributions over $S_1^{(t)}, \dots, S_L^{(t)}$, and assume that ψ renders the $S_l^{(t)}$ marginally independent. Then:

$D[\phi' \,\|\, \psi'] \le (1 - \gamma) \, D[\phi \,\|\, \psi]$

Page 14: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Compound processes (cont)

• Conditionally independent subprocesses
• Theorem 6 (the main theorem):
– For L independent subprocesses $T_1, \dots, T_L$, assume each process depends on at most r others, and each influences at most q others. Let $\gamma_l$ be the mixing rate for $T_l$ and let $\gamma = \min_l \gamma_l$. Let φ and ψ be distributions over $S_1^{(t)}, \dots, S_L^{(t)}$, and assume that ψ renders the $S_l^{(t)}$ marginally independent. Then:

$D[\phi' \,\|\, \psi'] \le (1 - \gamma^*) \, D[\phi \,\|\, \psi], \qquad \gamma^* = \left( \dfrac{\gamma}{r} \right)^{q}$

Page 15: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Efficient, approximate monitoring

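• If each approximation incurs an error bounded by ε, and each step contracts the accumulated error by a factor (1 − γ), then
– Total error ≤ ε + (1 − γ)ε + (1 − γ)²ε + … ≤ ε / γ

• => the error remains bounded
• Conditioning on observations might introduce momentary errors, but the expected error will contract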
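The bounded-error argument can be checked numerically with a short sketch. It assumes the worst case in which every step adds exactly ε after contracting the previous error by (1 − γ); the values of `eps` and `gamma` are arbitrary.

```python
def error_trajectory(eps, gamma, steps):
    """Worst-case accumulated error: e_{t+1} = (1 - gamma) * e_t + eps."""
    errors, e = [], 0.0
    for _ in range(steps):
        e = (1.0 - gamma) * e + eps
        errors.append(e)
    return errors

eps, gamma = 0.01, 0.2   # hypothetical per-step error and contraction rate
errors = error_trajectory(eps, gamma, 200)
bound = eps / gamma      # limit of the geometric series
print(errors[0], errors[-1], bound)
```

The trajectory increases toward ε / γ and never exceeds it, so a smaller contraction rate means a looser bound.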


Page 16: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Approximate DBN monitoring

• Algorithm (based on standard clique tree inference):

1. Construct a clique tree from the 2-TBN
2. Initialize the clique tree with conditional probabilities from the CPTs of the DBN
3. For each time step:
   a. Create a working copy Y of the tree. Create $\sigma^{(t+1)}$.
   b. For each subprocess l, incorporate the marginal $\sigma^{(t)}[X_l^{(t)}]$ in the appropriate factor in Y.
   c. Incorporate evidence $r^{(t+1)}$ in Y.
   d. Calibrate the potentials in Y.
   e. For each l, query Y for the marginal over $X_l^{(t+1)}$ and store it in $\sigma^{(t+1)}$.
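The per-time-step loop can be sketched on a toy model as below. This is not the clique tree algorithm itself: brute-force enumeration over a two-variable 2-TBN stands in for clique tree calibration, and every CPT value, name, and the evidence sequence is hypothetical.

```python
from itertools import product

# Hypothetical 2-TBN over two binary subprocesses; all numbers are made up.
P_X1 = {0: [0.9, 0.1], 1: [0.3, 0.7]}            # P(X1' | X1)
P_X2 = {(0, 0): [0.8, 0.2], (0, 1): [0.4, 0.6],  # P(X2' | X1, X2)
        (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
P_OBS = {0: [0.85, 0.15], 1: [0.25, 0.75]}       # P(r | X2')

def bk_step(marginals, evidence):
    """One approximate monitoring step on a fully factored belief state."""
    m1, m2 = marginals
    joint_next = {}
    for x1, x2, y1, y2 in product((0, 1), repeat=4):
        # Step b: the previous belief enters only as a product of marginals.
        p = m1[x1] * m2[x2]
        # Transition through the 2-TBN.
        p *= P_X1[x1][y1] * P_X2[(x1, x2)][y2]
        # Step c: weight by the likelihood of the evidence.
        p *= P_OBS[y2][evidence]
        joint_next[(y1, y2)] = joint_next.get((y1, y2), 0.0) + p
    # Step d stand-in: normalise instead of calibrating a clique tree.
    z = sum(joint_next.values())
    # Step e: keep only the per-subprocess marginals.
    new1 = [sum(p for (y1, _), p in joint_next.items() if y1 == v) / z
            for v in (0, 1)]
    new2 = [sum(p for (_, y2), p in joint_next.items() if y2 == v) / z
            for v in (0, 1)]
    return [new1, new2]

sigma = [[0.5, 0.5], [0.5, 0.5]]   # initial factored belief state
for r in [0, 1, 1]:                # hypothetical evidence sequence
    sigma = bk_step(sigma, r)
print(sigma)
```

Enumeration is exponential in the number of variables and is used here only to keep the sketch short; the point of the clique tree is to get the same marginals without building the full joint.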

Page 17: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Conclusion of Factored DBNs

• Accuracy-efficiency tradeoff:
– Small partition =>
  • Faster inference
  • Better contraction
  • Worse approximation

• Key to good approximation:
– Discover weak/sparse interactions among subprocesses and factor the DBN along these lines
– Domain knowledge helps

Page 18: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


• Factored inference in DBNs

• Sampling: Particle Filtering

Page 19: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

A sneak peek at particle filtering

Page 20: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Introduction

• Analytical methods
– Kalman filter: linear-Gaussian models
– HMM: models with finite state space

• Statistical approximation methods for non-parametric distributions and large discrete DBNs

• Different names:
– Sequential Monte Carlo (Handschin and Mayne 1969, Akashi and Kumamoto 1975)
– Particle filtering (Doucet et al. 1997)
– Survival of the fittest (Kanazawa, Koller and Russell 1995)
– Condensation in computer vision (Isard and Blake)


Page 21: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


• Importance Sampling (IS) revisited
– Sequential IS (SIS)
– Particle Filtering = SIS + Resampling

• Dynamic Bayesian Networks
– A simple example: ABC network

• Inference in DBN:
– Exact inference
– Pure Particle Filtering
– Rao-Blackwellised PF

• Demonstration in ABC network
• Discussions

Page 22: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Importance Sampling Revisited

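• Goal: evaluate the following functional:

$I(f_k) = \int f_k(x_{0:k}) \, p(x_{0:k} \mid y_{0:k}) \, dx_{0:k}$

• Importance Sampling (batch mode):
– Sample $x_{0:k}^{(i)}$, $i = 1, \dots, N$, from the importance function $q(x_{0:k} \mid y_{0:k})$
– Assign $w_k^{(i)} \propto p(x_{0:k}^{(i)} \mid y_{0:k}) \,/\, q(x_{0:k}^{(i)} \mid y_{0:k})$ as the weight of each sample
– The posterior estimate of $I(f_k)$ is:

$\hat{I}(f_k) = \sum_{i=1}^{N} \tilde{w}_k^{(i)} f_k(x_{0:k}^{(i)}), \qquad \tilde{w}_k^{(i)} = \dfrac{w_k^{(i)}}{\sum_j w_k^{(j)}}$

Importance function: its support must include that of the state posterior, and it must be normalized.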
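A minimal sketch of batch importance sampling with self-normalised weights. The target (a standard normal), the wider normal proposal, and the test function x² are stand-ins chosen so that the true answer, 1, is known; none of them come from the slides.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, target_pdf, proposal_pdf, proposal_sample, n, rng):
    """Self-normalised importance sampling estimate of E_target[f(x)]."""
    xs = [proposal_sample(rng) for _ in range(n)]
    ws = [target_pdf(x) / proposal_pdf(x) for x in xs]   # unnormalised weights
    total = sum(ws)
    return sum(w * f(x) for w, x in zip(ws, xs)) / total

rng = random.Random(0)
estimate = importance_estimate(
    f=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),      # wider than the target
    proposal_sample=lambda r: r.gauss(0.0, 2.0),
    n=20000,
    rng=rng,
)
print(estimate)   # should be close to 1, the variance of the target
```

The proposal is deliberately wider than the target so that its support covers the target's and the weights stay bounded.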
Page 23: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

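Sequential Importance Sampling

• How to make it sequential?

• Choose the importance function to factor recursively:

$q(x_{0:k} \mid y_{0:k}) = q(x_{0:k-1} \mid y_{0:k-1}) \, q(x_k \mid x_{0:k-1}, y_{0:k})$

• We get the SIS filter

• Benefit of SIS
– Observations $y_k$ don’t have to be given in batch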
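One SIS recursion can be sketched as below. It assumes a made-up scalar model (x_k = 0.9 x_{k-1} + unit Gaussian noise, y_k = x_k + Gaussian noise with standard deviation 0.5) and uses the transition prior as the importance function, which is one common choice rather than necessarily the one on the slides. With that choice the weight update reduces to multiplying by the likelihood p(y_k | x_k).

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def sis_step(particles, weights, y, rng):
    """Extend each particle by one time step and update its weight."""
    # Propose from the transition prior (assumed model: x_k = 0.9 x_{k-1} + noise).
    new_particles = [0.9 * x + rng.gauss(0.0, 1.0) for x in particles]
    # With the prior as proposal, w_k is proportional to w_{k-1} * p(y_k | x_k).
    new_weights = [w * normal_pdf(y, x, 0.5)
                   for w, x in zip(weights, new_particles)]
    total = sum(new_weights)
    return new_particles, [w / total for w in new_weights]

rng = random.Random(1)
n = 500
particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
weights = [1.0 / n] * n

# Observations are consumed one at a time, never as a batch.
for y in [0.2, 0.5, 0.1]:
    particles, weights = sis_step(particles, weights, y, rng)

posterior_mean = sum(w * x for w, x in zip(weights, particles))
print(posterior_mean)
```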

Page 24: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Sequential Importance Sampling

Page 25: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


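• Why do we need to resample?
– Degeneracy of SIS
  • The variance of the importance weights ($y_{0:k}$ is a r.v.) increases in each recursion step
– Optimal importance function: $q(x_k \mid x_{0:k-1}^{(i)}, y_{0:k}) = p(x_k \mid x_{k-1}^{(i)}, y_k)$
  • Need to sample from $p(x_k \mid x_{k-1}^{(i)}, y_k)$ and evaluate $p(y_k \mid x_{k-1}^{(i)})$

• Resampling: eliminate samples with small weights and concentrate on samples with large weights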

Page 26: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


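• Measure of degeneracy: effective sample size

$N_{eff} = \dfrac{N}{1 + \mathrm{Var}(w_k^{*})}, \qquad \hat{N}_{eff} = \dfrac{1}{\sum_{i=1}^{N} (\tilde{w}_k^{(i)})^2}$

$N_{eff}$ should be large enough. Otherwise, the variance of the sample weights will be too large. Proved by [Kong, Liu and Wong 1994].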
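The usual estimate of the effective sample size is one over the sum of squared normalised weights. A minimal sketch, with made-up weight vectors:

```python
def effective_sample_size(weights):
    """Estimate N_eff as (sum w)^2 / sum(w^2); equals 1 / sum(w_norm^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

uniform = [0.25, 0.25, 0.25, 0.25]
degenerate = [1.0, 0.0, 0.0, 0.0]
print(effective_sample_size(uniform))      # 4.0: every particle contributes
print(effective_sample_size(degenerate))   # 1.0: a single particle carries all weight
```

A common rule of thumb is to resample only when this estimate drops below some fraction of N; the threshold is a tuning choice.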
Page 27: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Resampling Step

Particle filtering = SIS + Resampling
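A minimal sketch of a resampling step, using systematic resampling. This is one common scheme and not necessarily the one shown on the slide. It assumes the weights are already normalised.

```python
import random

def systematic_resample(particles, weights, rng):
    """Draw N particles with probability proportional to weight.

    Uses a single random offset and N evenly spaced pointers into the
    cumulative weight distribution. Returns the new particles with
    uniform weights.
    """
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step          # first pointer, in [0, 1/N)
    out = []
    i, cumulative = 0, weights[0]
    for _ in range(n):
        # Advance to the particle whose cumulative weight covers the pointer.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out, [step] * n

rng = random.Random(0)
particles = ["a", "b", "c", "d"]
weights = [0.0, 0.0, 1.0, 0.0]       # all mass on "c"
resampled, new_weights = systematic_resample(particles, weights, rng)
print(resampled, new_weights)
```

After resampling, low-weight particles are gone and high-weight particles are duplicated, so the next SIS step spends its samples where the posterior mass is.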

Page 28: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Rao-Blackwellisation for SIS

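• A method to reduce the variance of the final posterior estimate

• Useful when the state can be partitioned as $x_k = (r_k, z_k)$, in which $z_k$ can be analytically marginalized.

• Assuming $E_{p(z_{0:k} \mid y_{0:k}, r_{0:k})}[f_k(r_{0:k}, z_{0:k})]$ can be evaluated analytically given $r_{0:k}$, one can rewrite the posterior estimate as

$\hat{I}(f_k) = \sum_{i=1}^{N} \tilde{w}_k^{(i)} \, E_{p(z_{0:k} \mid y_{0:k}, r_{0:k}^{(i)})}\big[ f_k(r_{0:k}^{(i)}, z_{0:k}) \big]$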
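The variance reduction can be illustrated with a toy sketch. It uses plain Monte Carlo with equal weights rather than SIS, and a made-up two-part state: a binary r and a Gaussian z whose mean depends on r. The Rao-Blackwellised estimator samples only r and plugs in E[z | r] analytically.

```python
import random
import statistics

MU = {0: 0.0, 1: 2.0}   # hypothetical E[z | r]
P_R1 = 0.3              # hypothetical P(r = 1)

def sample_r(rng):
    return 1 if rng.random() < P_R1 else 0

def plain_estimate(n, rng):
    """Estimate E[z] by sampling both r and z."""
    total = 0.0
    for _ in range(n):
        r = sample_r(rng)
        total += rng.gauss(MU[r], 1.0)
    return total / n

def rao_blackwellised_estimate(n, rng):
    """Estimate E[z] by sampling r only and using E[z | r] in closed form."""
    return sum(MU[sample_r(rng)] for _ in range(n)) / n

rng = random.Random(0)
plain = [plain_estimate(50, rng) for _ in range(300)]
rb = [rao_blackwellised_estimate(50, rng) for _ in range(300)]
var_plain = statistics.variance(plain)
var_rb = statistics.variance(rb)
print(var_plain, var_rb)   # the Rao-Blackwellised variance should be smaller
```

Both estimators target the same value, 0.3 × 2.0 = 0.6, but the second removes the sampling noise in z, which is the part that could be integrated out exactly.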

Page 29: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Example: ABC network

Page 30: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Inference in DBN


for the hidden variables. Observed variables can have a Gaussian distribution.
Page 31: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Exact inference in ABC network

Page 32: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Particle filtering

Page 33: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Rao-Blackwellised PF

Page 34: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Rao-Blackwellised PF (2)

Page 35: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Rao-Blackwellised PF (3)

Page 36: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs

Rao-Blackwellised PF (4)

Page 37: Knowledge Repn. & Reasoning Lec #24:  Approximate Inference in DBNs


• Structure of the network:
– A, C dependent on B
– $y_t$ can also be separated into 3 independent parts