
Page 1: Probabilistic Reasoning Over Time 2 - Queen's U

Probabilistic Reasoning Over Time 2

CISC 453, Amy VanBerlo

[email protected]

Adapted from Ch 15 AIMA3e

Page 2: Probabilistic Reasoning Over Time 2 - Queen's U

Overview

15.3 Hidden Markov Models – HMM
◦ + the power of linear algebra

15.4 Kalman Filters
◦ + Gaussian distributions revisited

15.5 Dynamic Bayesian Networks
◦ the MEGAVARIABLE & particle filtering

Page 3: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Recall:

initial state model: P(X0)

transition model: P(Xt | Xt-1)

sensor model: P(Et | Xt)

Discrete State Variables!

Page 4: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

transition model: P(Xt | Xt-1) => Transition Matrix

sensor model: P(Et | Xt) => Diagonal Matrices

Page 5: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Transition Matrix

state variable: Xt takes an integer value 1…S, where S = number of possible states

transition model: P(Xt | Xt-1) becomes an S×S matrix T:

Tij = P(Xt = j | Xt-1 = i)

Tij is the probability of a transition from state i to j.

Page 6: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Transition Matrix Example:

From Umbrella World:

Tij = P(Xt = j | Xt-1 = i)

T = P(Xt | Xt-1) = ( 0.7  0.3 )
                   ( 0.3  0.7 )

Page 7: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Diagonal Matrices (Evidence Variable / Sensor Model)

The evidence value et is known at time t.

We need P(et | Xt = i) for each state i; these values form the diagonal of an S×S matrix Ot.

Page 8: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Diagonal Matrices Example:

From Umbrella World:

U1 = true; U3 = false;

O1 = ( 0.9  0   )        O3 = ( 0.1  0   )
     ( 0    0.2 )             ( 0    0.8 )

Page 9: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Forward equation => column vector f:

f1:t+1 = α Ot+1 Tᵀ f1:t

Backward equation => column vector b:

bk+1:t = T Ok+1 bk+2:t
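These two recursions, together with the umbrella-world matrices from the earlier slides, are enough to run filtering and smoothing end to end. Below is a minimal NumPy sketch (an illustration, not part of the slides) of the matrix-vector form; it reproduces the textbook smoothed value P(Rain1 | u1, u2) ≈ 0.883.

```python
import numpy as np

# Umbrella world, using the matrices from the previous slides.
T = np.array([[0.7, 0.3],            # T_ij = P(X_t = j | X_t-1 = i)
              [0.3, 0.7]])
O = {True:  np.diag([0.9, 0.2]),     # O_t when the umbrella is observed
     False: np.diag([0.1, 0.8])}     # O_t when no umbrella is observed

def forward_backward(evidence, prior=np.array([0.5, 0.5])):
    """Smoothed estimates P(X_k | e_1:t) via the matrix-vector equations."""
    # Forward pass: f_1:t+1 = alpha * O_t+1 * T^T * f_1:t
    f = [prior]
    for e in evidence:
        v = O[e] @ T.T @ f[-1]
        f.append(v / v.sum())
    # Backward pass: b_k+1:t = T * O_k+1 * b_k+2:t, starting from a vector of ones
    b = np.ones(2)
    smoothed = []
    for k in range(len(evidence), 0, -1):
        s = f[k] * b
        smoothed.insert(0, s / s.sum())
        b = T @ O[evidence[k - 1]] @ b
    return smoothed

print(forward_backward([True, True]))   # P(Rain_1 | u1:2) ≈ [0.883, 0.117]
```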

Page 10: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Advantages

All computations become simple matrix-vector operations!

Complexity

Forward-backward algorithm: for a sequence of length t, the cost is O(S²t)

Improved Smoothing Algorithm: …

Page 11: Probabilistic Reasoning Over Time 2 - Queen's U
Page 12: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Original: robot with obstacle sensors N, S, E, W

Actions: Move(N/S/E/W)

Belief state: set of all possible locations the robot could be in

Additions: allow for sensor noise

Probabilistic model of the robot's motion

Domain: set of empty squares {s1, … sn}

Neighbours(s), written N(s)

Page 13: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Transition model for Move:
Tij = P(Xt = j | Xt-1 = i) = 1/|N(i)| if j ∈ N(i), else 0

Unknown start state, so assume a uniform distribution over all squares: P(X0 = i) = 1/n

Et has 16 possible values (one bit per sensor direction)

ε = each sensor's error rate

Page 14: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Et has 16 possible values

ε = each sensor's error rate; the probability of getting all four bits right is (1 − ε)^4, and of getting all four wrong is ε^4

Discrepancy dit = the number of bits that differ between the reading et and the true values for square i

Probability that a robot in square i would receive the sensor reading et:

P(Et = et | Xt = i) = Otii = (1 − ε)^(4 − dit) ε^dit
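As a small, hedged illustration of these two models: the sketch below assumes a 4-bit (N, S, E, W) obstacle reading and a value for ε; the helper names and the example bits are invented for the illustration, not taken from the slides.

```python
# Sketch of the localization models, assuming a 4-bit (N, S, E, W) obstacle reading.
eps = 0.05   # assumed per-sensor error rate

def transition_prob(i, j, neighbours):
    """T_ij = 1/|N(i)| if j is a neighbour of i, else 0."""
    return 1.0 / len(neighbours[i]) if j in neighbours[i] else 0.0

def sensor_prob(reading, true_bits):
    """P(E_t = e_t | X_t = i) = (1 - eps)^(4 - d_it) * eps^d_it,
    where d_it counts the bits that differ from the true values for square i."""
    d = sum(r != t for r, t in zip(reading, true_bits))
    return (1 - eps) ** (4 - d) * eps ** d

# One bit wrong out of four: probability (1 - eps)^3 * eps.
print(sensor_prob((1, 0, 1, 1), (1, 0, 1, 0)))
```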

Page 15: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Page 16: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Where HMMs deal with DISCRETE variables, Kalman filter variables are CONTINUOUS

Ex: tracking a bird flying through the forest

◦ Xt = (X, Y, Z, Xveloc, Yveloc, Zveloc)

Page 17: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Updating Gaussian Distributions

Current distribution: P(Xt | e1:t)

Prediction: P(Xt+1 | e1:t) = ∫ P(Xt+1 | xt) P(xt | e1:t) dxt

Sensor model: P(et+1 | Xt+1)

Updated distribution:

P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
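To make the prediction/update cycle concrete, here is a minimal 1-D sketch under assumed models: a random-walk transition x_t+1 = x_t + N(0, σx²) and a noisy sensor z_t = x_t + N(0, σz²). Both steps return a Gaussian, which is the point of the next slide.

```python
# Minimal 1-D Kalman filter step, assuming a random-walk transition model and
# a Gaussian sensor; the variances var_x and var_z are illustrative values.
def kalman_step(mu, var, z, var_x=2.0, var_z=1.0):
    """One prediction + update; returns the new Gaussian (mean, variance)."""
    # Prediction P(X_t+1 | e_1:t): mean unchanged, variance grows by var_x.
    var_pred = var + var_x
    # Update P(X_t+1 | e_1:t+1): precision-weighted average of prediction and z.
    mu_new = (var_z * mu + var_pred * z) / (var_pred + var_z)
    var_new = var_pred * var_z / (var_pred + var_z)
    return mu_new, var_new

mu, var = 0.0, 1.0                  # prior P(X_0) = N(0, 1)
for z in [0.8, 1.1, 1.3]:           # noisy observations
    mu, var = kalman_step(mu, var, z)
    print(mu, var)                  # the posterior stays Gaussian at every step
```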

Page 18: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Updating Gaussian Distributions

If P(Xt | e1:t) is Gaussian, then the prediction P(Xt+1 | e1:t) is Gaussian.

If P(Xt+1 | e1:t) is Gaussian, then the updated distribution P(Xt+1 | e1:t+1) is Gaussian.

therefore (for a linear Gaussian model):

P(Xt | e1:t) is multivariate Gaussian N(μt, Σt) for all t

LINEAR GAUSSIAN transition model => Xt+1 is a linear function of Xt plus Gaussian noise

Page 19: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Multivariate Gaussian Implication:

Filtering with a linear Gaussian model produces a Gaussian state distribution for all time

Why so important?

For general continuous distributions, the representation of the state distribution grows without bound over time

Being able to model with normal (Gaussian) distributions keeps the representation fixed-size, allowing accurate calculations with reduced complexity

Page 20: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Multivariate Gaussian Implication:

Page 21: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Where it Breaks:

Cannot be applied if the transition model is nonlinear

ex: a bird evading a tree

Extended Kalman Filter models transitions as locally linear; fails if system is locally unsmooth.

Page 22: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

In general: each ‘slice’ of a DBN can have any number of:

state variables Xt

Sensor/evidence variables Et

Assume variables and their relationships are preserved/replicated from time t to t+1

Page 23: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Hidden Markov Models

Every HMM can be represented as a DBN with a single state variable Xt and a single evidence variable Et

Every discrete-variable DBN can be represented as an HMM: combine all the Xt into a MEGAVARIABLE

◦ Values: all possible tuples of values of the individual Xt variables

Page 24: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Hidden Markov Models

◦ If they are interchangeable… where does the difference lie?

"Sparseness": Example: suppose a DBN has 20 Boolean state variables Xt, each with 3 parents

DBN transition model: 20 × 2^3 = 160 probabilities

HMM transition matrix: 2^20 states, 2^40 probabilities (~a trillion!; a quick check follows below)
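A quick check of the arithmetic behind the sparseness claim (plain Python, using nothing beyond the figures on the slide):

```python
# "Sparseness" check: DBN vs megavariable HMM for 20 Boolean state variables.
n_vars, n_parents = 20, 3
dbn_probs  = n_vars * 2 ** n_parents   # one probability per parent assignment per variable
hmm_states = 2 ** n_vars               # values of the megavariable
hmm_probs  = hmm_states ** 2           # entries in the full transition matrix
print(dbn_probs)     # 160
print(hmm_states)    # 1048576
print(hmm_probs)     # 1099511627776  (~a trillion)
```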

Page 25: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Kalman Filters

Every KF can be represented as a DBN with continuous variables and linear Gaussian conditional distributions

Ex: tracking a bird flying through the forest

◦ Xt = (X, Y, Z, Xveloc, Yveloc, Zveloc)

Page 26: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Kalman Filters

Every KF can be represented as a DBN, but few DBNs are KFs;

DBNs can model arbitrary distributions

KFs always model a single multivariate Gaussian distribution

Aspects of the real world (e.g., obstacles) introduce nonlinearities and require a combination of discrete and continuous variables

Page 27: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Constructing DBNs: must specify:

1. prior distribution over state variables P(X0)

2. transition model P(Xt+1 | Xt)

3. sensor model P(Et | Xt)

Must also specify the connections between slices

RECALL: the model assumes variables and their relationships are preserved/replicated from time t to t+1

Simply specify the first slice and copy (a tiny illustrative spec follows below)!
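As a tiny illustration of "specify the first slice and copy", here is a hypothetical dictionary spec of the umbrella DBN with exactly these three pieces; the structure and field names are invented for the example, while the probabilities are the umbrella-world values used earlier.

```python
# Hypothetical two-slice (2-TBN) specification of the umbrella DBN.
umbrella_dbn = {
    # 1. prior distribution over state variables, P(X0)
    "prior":      {"Rain": 0.5},                          # P(Rain_0 = true)
    # 2. transition model, P(X_t+1 | X_t)
    "transition": {"Rain": {True: 0.7, False: 0.3}},      # P(Rain_t+1 = true | Rain_t)
    # 3. sensor model, P(E_t | X_t)
    "sensor":     {"Umbrella": {True: 0.9, False: 0.2}},  # P(Umbrella_t = true | Rain_t)
}
# Every later slice is a copy of this one, linked to its predecessor by the
# transition model, so only the first slice ever needs to be written down.
```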

Page 28: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs

We have seen inference in Bayesian networks before: given a sequence of observations, we can construct the full Bayesian network representation of a DBN by replicating slices until the network is large enough for the observations.

Unrolling

Then apply an inference algorithm from Ch. 14: variable elimination, clustering, etc.

Page 29: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs: Unrolling

Problem:

inference cost for each update grows with t

Rollup Filtering: add slice t+1, "sum out" slice t using variable elimination

Largest factor is O(d^(n+1)), update cost is O(d^(n+2)) (for n state variables with domain size d)

Page 30: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs: Unrolling

We can use DBNs to represent very complex temporal processes with many sparsely connected variables

but we CANNOT reason both efficiently and exactly about those processes!

Page 31: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Approximate Inference in DBNs

Likelihood Weighting, adapted from Section 14.5

Sample the non-evidence nodes of the network in topological order, weighting each sample by the likelihood it accords to the evidence

To avoid the growth problem seen in exact inference, simply run all N samples together through the DBN, one slice at a time

Page 32: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Approximate Inference in DBNs

Likelihood Weighting STILL FLAWED!

LW samples pay no attention to the evidence
◦ the fraction "agreeing" with the evidence falls exponentially with t
◦ # of samples required grows exponentially with t

Idea: focus the set of samples on high-probability regions of the state space… Particle Filtering:

Page 33: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

A population of N initial-state samples is created from the prior distribution P(X0)

Update cycle, repeated for each time step:

1. Given xt (the current state value of each sample), propagate the sample forward based on the transition model P(Xt+1 | xt)

2. Each sample is weighted by the likelihood it assigns to the new evidence, P(et+1 | xt+1)

3. Resample the population to obtain N new unweighted samples (a runnable sketch follows below)
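A minimal runnable sketch of this cycle for the umbrella world (the probabilities are the ones used earlier in the deck; the implementation itself is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Umbrella-world models (same numbers as the HMM slides).
P_TRANS  = {True: 0.7, False: 0.3}    # P(Rain_t+1 = true | Rain_t)
P_SENSOR = {True: 0.9, False: 0.2}    # P(Umbrella_t = true | Rain_t)

def particle_filter(evidence, N=2000):
    # Population of N initial-state samples drawn from the prior P(X0) = <0.5, 0.5>.
    particles = rng.random(N) < 0.5
    for e in evidence:
        # 1. Propagate each sample forward through the transition model.
        particles = rng.random(N) < np.where(particles, P_TRANS[True], P_TRANS[False])
        # 2. Weight each sample by the likelihood it assigns to the new evidence.
        p_umbrella = np.where(particles, P_SENSOR[True], P_SENSOR[False])
        weights = p_umbrella if e else 1.0 - p_umbrella
        # 3. Resample N new unweighted samples in proportion to the weights.
        idx = rng.choice(N, size=N, p=weights / weights.sum())
        particles = particles[idx]
    return particles.mean()               # estimate of P(Rain_t = true | e_1:t)

print(particle_filter([True, True]))      # close to the exact answer 0.883
```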

Page 34: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

Assume the sample population is consistent at time t:

◦ N(xt | e1:t) / N = P(xt | e1:t)

Propagate forward: the populations of xt+1 are
◦ N(xt+1 | e1:t) = ∑xt P(xt+1 | xt) N(xt | e1:t)

Weight samples by their likelihood for et+1:
◦ W(xt+1 | e1:t+1) = P(et+1 | xt+1) N(xt+1 | e1:t)

Resample to obtain populations proportional to W:
◦ N(xt+1 | e1:t+1) / N = α W(xt+1 | e1:t+1) = α P(et+1 | xt+1) N(xt+1 | e1:t)
◦ = α′ P(et+1 | xt+1) P(xt+1 | e1:t)
◦ = P(xt+1 | e1:t+1)

so the population is consistent at time t+1 as well.

Page 35: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

Performance:

Approximation error of PF remains bounded over time :D

At least empirically! – Theoretical analysis difficult.

Page 36: Probabilistic Reasoning Over Time 2 - Queen's U

Summary

Temporal models use state and sensor variables replicated over time

Hidden Markov Models have a single discrete state variable

Kalman Filters allow n continuous state variables with linear Gaussian models, giving multivariate Gaussian state distributions

Dynamic Bayesian Nets are selectively interchangeable with HMMs and KFs
◦ Particle Filtering is a good inference method / filtering algorithm for DBNs

Page 37: Probabilistic Reasoning Over Time 2 - Queen's U

Thanks!

Questions?