
Page 1: Probabilistic Reasoning Over Time 2 - Queen's U

Probabilistic Reasoning Over Time 2

CISC 453, Amy VanBerlo

[email protected]

Adapted from Ch 15 AIMA3e

Page 2: Probabilistic Reasoning Over Time 2 - Queen's U

Overview

15.3 Hidden Markov Models – HMM
◦ + the power of linear algebra

15.4 Kalman Filters
◦ + Gaussian distributions revisited

15.5 Dynamic Bayesian Networks
◦ the MEGAVARIABLE & particle filtering

Page 3: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Recall:

initial state model: P(X0)

transition model: P(Xt | Xt-1)

sensor model: P(Et | Xt)

Discrete State Variables!

Page 4: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

transition model: P(Xt | Xt-1) => Transition Matrix

sensor model: P(Et | Xt) => Diagonal Matrices

Page 5: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Transition Matrix

state variable: Xt takes an integer value 1…S, where S = number of possible states

transition model: P(Xt | Xt-1) becomes an S×S matrix T:

Tij = P(Xt = j | Xt-1 = i)

Tij is the probability of a transition from state i to j.

Page 6: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Transition Matrix Example:

From Umbrella World:

Tij = P(Xt = j | Xt-1 = i)

T = P(Xt | Xt-1) = ( 0.7  0.3 )
                   ( 0.3  0.7 )

Page 7: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Diagonal Matrices (Evidence Variable / Sensor Model)

The evidence value et is known at time t.

We need P(et | Xt = i) for each state i; these values form the diagonal of an S×S matrix Ot.

Page 8: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Diagonal Matrices Example:

From Umbrella World:

U1 = true; U3 = false;

O1 = ( 0.9  0   )        O3 = ( 0.1  0   )
     ( 0    0.2 )             ( 0    0.8 )

Page 9: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Forward equation => column vector f:

f1:t+1 = α Ot+1 Tᵀ f1:t

Backward equation => column vector b:

bk+1:t = T Ok+1 bk+2:t
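These two recursions, together with the umbrella-world matrices from the earlier slides, are enough to run filtering and smoothing end to end. Below is a minimal NumPy sketch (an illustration, not part of the slides) of the matrix-vector form; it reproduces the textbook smoothed value P(Rain1 | u1, u2) ≈ 0.883.

```python
import numpy as np

# Umbrella world, using the matrices from the previous slides.
T = np.array([[0.7, 0.3],            # T_ij = P(X_t = j | X_t-1 = i)
              [0.3, 0.7]])
O = {True:  np.diag([0.9, 0.2]),     # O_t when the umbrella is observed
     False: np.diag([0.1, 0.8])}     # O_t when no umbrella is observed

def forward_backward(evidence, prior=np.array([0.5, 0.5])):
    """Smoothed estimates P(X_k | e_1:t) via the matrix-vector equations."""
    # Forward pass: f_1:t+1 = alpha * O_t+1 * T^T * f_1:t
    f = [prior]
    for e in evidence:
        v = O[e] @ T.T @ f[-1]
        f.append(v / v.sum())
    # Backward pass: b_k+1:t = T * O_k+1 * b_k+2:t, starting from a vector of ones
    b = np.ones(2)
    smoothed = []
    for k in range(len(evidence), 0, -1):
        s = f[k] * b
        smoothed.insert(0, s / s.sum())
        b = T @ O[evidence[k - 1]] @ b
    return smoothed

print(forward_backward([True, True]))   # P(Rain_1 | u1:2) ≈ [0.883, 0.117]
```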

Page 10: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Advantages

All computations become simple matrix-vector operations!

Complexity

Forward-backward algorithm: for a sequence of length t, the cost is O(S²t)

Improved Smoothing Algorithm: …

Page 11: Probabilistic Reasoning Over Time 2 - Queen's U
Page 12: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Original: robot with obstacle sensors N, S, E, W

Actions: Move(N/S/E/W)

Belief state: set of all possible locations the robot could be in

Additions: allow for sensor noise

Probabilistic model of the robot's motion

Domain: set of empty squares {s1, … sn}

Neighbours(s), written N(s)

Page 13: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Transition model for Move:
Tij = P(Xt = j | Xt-1 = i) = 1/|N(i)| if j ∈ N(i), else 0

Unknown start state, so assume a uniform distribution over all squares: P(X0 = i) = 1/n

Et has 16 possible values (one bit per sensor direction)

ε = each sensor's error rate

Page 14: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Et has 16 possible values

ε = each sensor's error rate; the probability of getting all four bits right is (1 − ε)^4, and of getting all four wrong is ε^4

Discrepancy dit = the number of bits that differ between the reading et and the true values for square i

Probability that a robot in square i would receive the sensor reading et:

P(Et = et | Xt = i) = Otii = (1 − ε)^(4 − dit) ε^dit
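As a small, hedged illustration of these two models: the sketch below assumes a 4-bit (N, S, E, W) obstacle reading and a value for ε; the helper names and the example bits are invented for the illustration, not taken from the slides.

```python
# Sketch of the localization models, assuming a 4-bit (N, S, E, W) obstacle reading.
eps = 0.05   # assumed per-sensor error rate

def transition_prob(i, j, neighbours):
    """T_ij = 1/|N(i)| if j is a neighbour of i, else 0."""
    return 1.0 / len(neighbours[i]) if j in neighbours[i] else 0.0

def sensor_prob(reading, true_bits):
    """P(E_t = e_t | X_t = i) = (1 - eps)^(4 - d_it) * eps^d_it,
    where d_it counts the bits that differ from the true values for square i."""
    d = sum(r != t for r, t in zip(reading, true_bits))
    return (1 - eps) ** (4 - d) * eps ** d

# One bit wrong out of four: probability (1 - eps)^3 * eps.
print(sensor_prob((1, 0, 1, 1), (1, 0, 1, 0)))
```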

Page 15: Probabilistic Reasoning Over Time 2 - Queen's U

Hidden Markov Models – HMM(15.3)

Example: Localization (Vacuum World, simplified)

Page 16: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Where HMMs deal with DISCRETE variables, Kalman filter variables are CONTINUOUS

Ex: tracking a bird flying through the forest

◦ Xt = (X, Y, Z, Xveloc, Yveloc, Zveloc)

Page 17: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Updating Gaussian Distributions

Current distribution: P(Xt | e1:t)

Prediction: P(Xt+1 | e1:t) = ∫ P(Xt+1 | xt) P(xt | e1:t) dxt

Sensor model: P(et+1 | Xt+1)

Updated distribution:

P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
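To make the prediction/update cycle concrete, here is a minimal 1-D sketch under assumed models: a random-walk transition x_t+1 = x_t + N(0, σx²) and a noisy sensor z_t = x_t + N(0, σz²). Both steps return a Gaussian, which is the point of the next slide.

```python
# Minimal 1-D Kalman filter step, assuming a random-walk transition model and
# a Gaussian sensor; the variances var_x and var_z are illustrative values.
def kalman_step(mu, var, z, var_x=2.0, var_z=1.0):
    """One prediction + update; returns the new Gaussian (mean, variance)."""
    # Prediction P(X_t+1 | e_1:t): mean unchanged, variance grows by var_x.
    var_pred = var + var_x
    # Update P(X_t+1 | e_1:t+1): precision-weighted average of prediction and z.
    mu_new = (var_z * mu + var_pred * z) / (var_pred + var_z)
    var_new = var_pred * var_z / (var_pred + var_z)
    return mu_new, var_new

mu, var = 0.0, 1.0                  # prior P(X_0) = N(0, 1)
for z in [0.8, 1.1, 1.3]:           # noisy observations
    mu, var = kalman_step(mu, var, z)
    print(mu, var)                  # the posterior stays Gaussian at every step
```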

Page 18: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Updating Gaussian Distributions

If P(Xt | e1:t) is Gaussian, then the prediction P(Xt+1 | e1:t) is Gaussian.

If P(Xt+1 | e1:t) is Gaussian, then the updated distribution P(Xt+1 | e1:t+1) is Gaussian.

therefore (for a linear Gaussian model):

P(Xt | e1:t) is multivariate Gaussian N(μt, Σt) for all t

LINEAR GAUSSIAN transition model => Xt+1 is a linear function of Xt plus Gaussian noise

Page 19: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Multivariate Gaussian Implication:

Filtering with a linear Gaussian model produces a Gaussian state distribution for all time

Why so important?

For general continuous distributions, the representation of the state distribution grows without bound over time

Being able to model with normal (Gaussian) distributions keeps the representation fixed-size, allowing accurate calculations with reduced complexity

Page 20: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Multivariate Gaussian Implication:

Page 21: Probabilistic Reasoning Over Time 2 - Queen's U

Kalman Filters (15.4)

Where it Breaks:

Cannot be applied if the transition model is nonlinear

ex: a bird evading a tree

Extended Kalman Filter models transitions as locally linear; fails if system is locally unsmooth.

Page 22: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

In general: each ‘slice’ of a DBN can have any number of:

state variables Xt

Sensor/evidence variables Et

Assume variables and their relationships are preserved/replicated from time t to t+1

Page 23: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Hidden Markov Models

Every HMM can be represented as a DBN with a single state variable Xt and a single evidence variable Et

Every discrete-variable DBN can be represented as an HMM: combine all the Xt into a MEGAVARIABLE

◦ Values: all possible tuples of values of the individual Xt variables

Page 24: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Hidden Markov Models

◦ If they are interchangeable… where does the difference lie?

"Sparseness": Example: suppose a DBN has 20 Boolean state variables Xt, each with 3 parents

DBN transition model: 20 × 2^3 = 160 probabilities

HMM transition matrix: 2^20 states, 2^40 probabilities (~a trillion!; a quick check follows below)
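A quick check of the arithmetic behind the sparseness claim (plain Python, using nothing beyond the figures on the slide):

```python
# "Sparseness" check: DBN vs megavariable HMM for 20 Boolean state variables.
n_vars, n_parents = 20, 3
dbn_probs  = n_vars * 2 ** n_parents   # one probability per parent assignment per variable
hmm_states = 2 ** n_vars               # values of the megavariable
hmm_probs  = hmm_states ** 2           # entries in the full transition matrix
print(dbn_probs)     # 160
print(hmm_states)    # 1048576
print(hmm_probs)     # 1099511627776  (~a trillion)
```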

Page 25: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Kalman Filters

Every KF can be represented as a DBN with continuous variables and linear Gaussian conditional distributions

Ex: tracking a bird flying through the forest

◦ Xt = (X, Y, Z, Xveloc, Yveloc, Zveloc)

Page 26: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

vs Kalman Filters

Every KF can be represented as a DBN, but few DBNs are KFs;

DBNs can model arbitrary distributions

KFs always model a single multivariate Gaussian distribution

Aspects of the real world (e.g., obstacles) introduce nonlinearities and require a combination of discrete and continuous variables

Page 27: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Constructing DBNs: must specify:

1. prior distribution over state variables P(X0)

2. transition model P(Xt+1 | Xt)

3. sensor model P(Et | Xt)

Must also specify the connections between slices

RECALL: the model assumes variables and their relationships are preserved/replicated from time t to t+1

Simply specify the first slice and copy (a tiny illustrative spec follows below)!
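As a tiny illustration of "specify the first slice and copy", here is a hypothetical dictionary spec of the umbrella DBN with exactly these three pieces; the structure and field names are invented for the example, while the probabilities are the umbrella-world values used earlier.

```python
# Hypothetical two-slice (2-TBN) specification of the umbrella DBN.
umbrella_dbn = {
    # 1. prior distribution over state variables, P(X0)
    "prior":      {"Rain": 0.5},                          # P(Rain_0 = true)
    # 2. transition model, P(X_t+1 | X_t)
    "transition": {"Rain": {True: 0.7, False: 0.3}},      # P(Rain_t+1 = true | Rain_t)
    # 3. sensor model, P(E_t | X_t)
    "sensor":     {"Umbrella": {True: 0.9, False: 0.2}},  # P(Umbrella_t = true | Rain_t)
}
# Every later slice is a copy of this one, linked to its predecessor by the
# transition model, so only the first slice ever needs to be written down.
```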

Page 28: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs

We have seen inference in Bayesian networks before: given a sequence of observations, we can construct the full Bayesian network representation of a DBN by replicating slices until the network is large enough for the observations.

Unrolling

Then apply an inference algorithm from Ch. 14: variable elimination, clustering, etc.

Page 29: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs: Unrolling

Problem:

inference cost for each update grows with t

Rollup Filtering: add slice t+1, "sum out" slice t using variable elimination

Largest factor is O(d^(n+1)), update cost is O(d^(n+2)) (for n state variables with domain size d)

Page 30: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs: Unrolling

We can use DBNs to represent very complex temporal processes with many sparsely connected variables

but we CANNOT reason both efficiently and exactly about those processes!

Page 31: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Approximate Inference in DBNs

Likelihood Weighting, adapted from Section 14.5

Sample the non-evidence nodes of the network in topological order, weighting each sample by the likelihood it accords to the evidence

To avoid the growth problem seen in exact inference, simply run all N samples together through the DBN, one slice at a time

Page 32: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Approximate Inference in DBNs

Likelihood Weighting STILL FLAWED!

LW samples pay no attention to the evidence
◦ the fraction "agreeing" with the evidence falls exponentially with t
◦ # of samples required grows exponentially with t

Idea: focus the set of samples on high-probability regions of the state space… Particle Filtering:

Page 33: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

A population of N initial-state samples is created from the prior distribution P(X0)

Update cycle, repeated for each time step:

1. Given xt (the current state value of each sample), propagate the sample forward based on the transition model P(Xt+1 | xt)

2. Each sample is weighted by the likelihood it assigns to the new evidence, P(et+1 | xt+1)

3. Resample the population to obtain N new unweighted samples (a runnable sketch follows below)
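A minimal runnable sketch of this cycle for the umbrella world (the probabilities are the ones used earlier in the deck; the implementation itself is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Umbrella-world models (same numbers as the HMM slides).
P_TRANS  = {True: 0.7, False: 0.3}    # P(Rain_t+1 = true | Rain_t)
P_SENSOR = {True: 0.9, False: 0.2}    # P(Umbrella_t = true | Rain_t)

def particle_filter(evidence, N=2000):
    # Population of N initial-state samples drawn from the prior P(X0) = <0.5, 0.5>.
    particles = rng.random(N) < 0.5
    for e in evidence:
        # 1. Propagate each sample forward through the transition model.
        particles = rng.random(N) < np.where(particles, P_TRANS[True], P_TRANS[False])
        # 2. Weight each sample by the likelihood it assigns to the new evidence.
        p_umbrella = np.where(particles, P_SENSOR[True], P_SENSOR[False])
        weights = p_umbrella if e else 1.0 - p_umbrella
        # 3. Resample N new unweighted samples in proportion to the weights.
        idx = rng.choice(N, size=N, p=weights / weights.sum())
        particles = particles[idx]
    return particles.mean()               # estimate of P(Rain_t = true | e_1:t)

print(particle_filter([True, True]))      # close to the exact answer 0.883
```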

Page 34: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

Assume the sample population is consistent at time t:

◦ N(xt | e1:t) / N = P(xt | e1:t)

Propagate forward: the populations of xt+1 are
◦ N(xt+1 | e1:t) = ∑xt P(xt+1 | xt) N(xt | e1:t)

Weight samples by their likelihood for et+1:
◦ W(xt+1 | e1:t+1) = P(et+1 | xt+1) N(xt+1 | e1:t)

Resample to obtain populations proportional to W:
◦ N(xt+1 | e1:t+1) / N = α W(xt+1 | e1:t+1) = α P(et+1 | xt+1) N(xt+1 | e1:t)
◦ = α′ P(et+1 | xt+1) P(xt+1 | e1:t)
◦ = P(xt+1 | e1:t+1)

so the population is consistent at time t+1 as well.

Page 35: Probabilistic Reasoning Over Time 2 - Queen's U

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

Performance:

Approximation error of PF remains bounded over time :D

At least empirically! – Theoretical analysis difficult.

Page 36: Probabilistic Reasoning Over Time 2 - Queen's U

Summary

Temporal models use state and sensor variables replicated over time

Hidden Markov Models have a single discrete state variable

Kalman Filters allow n continuous state variables with linear Gaussian models, giving multivariate Gaussian state distributions

Dynamic Bayesian Nets are selectively interchangeable with HMMs and KFs
◦ Particle Filtering is a good inference method / filtering algorithm for DBNs

Page 37: Probabilistic Reasoning Over Time 2 - Queen's U

Thanks!

Questions?