Classic HMM tutorial – see class website: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257–286, 1989.
Time Series, HMMs, Kalman Filters
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 28th, 2005
Adventures of our BN hero
Compact representation for probability distributions
Fast inference
Fast learning
But… Who are the most popular kids?
1. Naïve Bayes
2 and 3. Hidden Markov models (HMMs) and Kalman filters
Handwriting recognition
Character recognition, e.g., kernel SVMs
[Figure: handwritten characters with per-character recognition labels]
Example of a hidden Markov model (HMM)
Understanding the HMM Semantics
[Figure: HMM chain X1 → X2 → X3 → X4 → X5, each Xi ∈ {a, …, z}, with one observation Oi (a character image) per state]
HMMs semantics: Details
[Figure: HMM chain as above]
Just 3 distributions: the initial state prior P(X1), the transition model P(Xi | Xi−1), and the observation model P(Oi | Xi).
HMMs semantics: Joint distribution
[Figure: HMM chain as above]
Joint distribution: P(X1:n, O1:n) = P(X1) ∏i=2..n P(Xi | Xi−1) ∏i=1..n P(Oi | Xi)
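As a concrete check of this factorization, here is a minimal numpy sketch (the table layout and function name are illustrative, not from the slides) that scores one fully specified state/observation sequence:

```python
import numpy as np

def hmm_log_joint(pi, A, B, states, obs):
    """log P(x1:n, o1:n) = log P(x1) + sum_i log P(xi | xi-1) + sum_i log P(oi | xi).

    pi[x]   : initial distribution P(X1 = x)
    A[x, y] : transition model    P(Xi = y | Xi-1 = x)
    B[x, o] : observation model   P(Oi = o | Xi = x)
    """
    logp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for i in range(1, len(states)):
        logp += np.log(A[states[i - 1], states[i]]) + np.log(B[states[i], obs[i]])
    return logp
```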
Learning HMMs from fully observable data is easy
[Figure: HMM chain as above]
Learn 3 distributions by maximum likelihood, i.e., relative frequencies: P(X1), P(Xi | Xi−1), and P(Oi | Xi), since all variables are observed.
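A sketch of these fully observed maximum-likelihood estimates, which are just normalized counts (array shapes and names are my assumptions; states or symbols that never occur would need smoothing):

```python
import numpy as np

def hmm_mle(state_seqs, obs_seqs, n_states, n_obs):
    """Relative-frequency (maximum likelihood) estimates from fully observed data."""
    pi = np.zeros(n_states)
    A = np.zeros((n_states, n_states))
    B = np.zeros((n_states, n_obs))
    for xs, oseq in zip(state_seqs, obs_seqs):
        pi[xs[0]] += 1.0
        for prev, cur in zip(xs, xs[1:]):
            A[prev, cur] += 1.0            # count transitions
        for x, o in zip(xs, oseq):
            B[x, o] += 1.0                 # count emissions
    # Normalize counts into distributions (rows of A and B sum to 1).
    pi /= pi.sum()
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```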
Possible inference tasks in an HMM
[Figure: HMM chain as above]
Marginal probability of a hidden variable: P(Xi | o1, …, on)
Viterbi decoding – most likely trajectory for hidden vars: argmax over x1, …, xn of P(x1, …, xn | o1, …, on)
Using variable elimination to compute P(Xi|o1:n)
[Figure: HMM chain as above]
Compute: P(Xi | o1:n) ∝ P(Xi, o1:n)
Variable elimination order? Eliminate X1, …, Xi−1 forwards and Xn, …, Xi+1 backwards, so every intermediate factor is over a single Xj.
Example:
What if I want to compute P(Xi|o1:n) for each i?
[Figure: HMM chain as above]
Compute: P(Xi | o1:n) for each i = 1, …, n.
Variable elimination for each i?
Variable elimination for each i, what's the complexity? O(n) per query, so O(n²) for all n queries; but the runs share almost all of their work.
Reusing computation
[Figure: HMM chain as above]
Compute: P(Xi | o1:n) for every i, reusing the shared forwards and backwards factors across queries.
The forwards-backwards algorithm
[Figure: HMM chain as above]
Forwards pass – initialization: α1(X1) = P(X1) P(o1 | X1). For i = 2 to n, generate a forwards factor by eliminating Xi−1: αi(Xi) = P(oi | Xi) Σxi−1 P(Xi | xi−1) αi−1(xi−1)
Backwards pass – initialization: βn(Xn) = 1. For i = n−1 to 1, generate a backwards factor by eliminating Xi+1: βi(Xi) = Σxi+1 P(xi+1 | Xi) P(oi+1 | xi+1) βi+1(xi+1)
∀ i, the probability is: P(Xi | o1:n) ∝ αi(Xi) βi(Xi)
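A compact numpy sketch of the two passes (layout and names are my own; there is no rescaling here, so long sequences would need log-space arithmetic or per-step normalization):

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior marginals P(Xi | o1:n) for every i, in O(n k^2) time."""
    n, k = len(obs), len(pi)
    alpha = np.zeros((n, k))              # forwards factors
    beta = np.ones((n, k))                # backwards factors (beta_n = 1)
    alpha[0] = pi * B[:, obs[0]]
    for i in range(1, n):                 # eliminate X_{i-1}
        alpha[i] = B[:, obs[i]] * (alpha[i - 1] @ A)
    for i in range(n - 2, -1, -1):        # eliminate X_{i+1}
        beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])
    post = alpha * beta                   # proportional to P(Xi, o1:n)
    return post / post.sum(axis=1, keepdims=True)
```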
Most likely explanation
[Figure: HMM chain as above]
Compute: argmax over x1, …, xn of P(x1, …, xn | o1:n)
Variable elimination order? The same forwards order works, with max in place of sum.
Example:
The Viterbi algorithm
[Figure: HMM chain as above]
Initialization: m1(X1) = P(X1) P(o1 | X1). For i = 2 to n, generate a forwards (max) factor by eliminating Xi−1: mi(Xi) = P(oi | Xi) maxxi−1 P(Xi | xi−1) mi−1(xi−1), recording the maximizing xi−1.
Computing the best explanation: x*n = argmax mn(xn); for i = n−1 to 1, use the recorded argmax to get the explanation: x*i is the choice recorded for x*i+1.
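A matching numpy sketch of Viterbi, run in log space to avoid underflow (names and layout are again my assumptions):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence argmax_x P(x1:n | o1:n)."""
    n, k = len(obs), len(pi)
    logm = np.zeros((n, k))               # max-product forwards factors
    back = np.zeros((n, k), dtype=int)    # recorded argmaxes for backtracking
    logm[0] = np.log(pi) + np.log(B[:, obs[0]])
    for i in range(1, n):
        scores = logm[i - 1][:, None] + np.log(A)   # scores[x_{i-1}, x_i]
        back[i] = scores.argmax(axis=0)
        logm[i] = scores.max(axis=0) + np.log(B[:, obs[i]])
    path = [int(logm[-1].argmax())]
    for i in range(n - 1, 0, -1):         # follow recorded argmaxes backwards
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```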
What about continuous variables?
In general, very hard! Must represent complex distributions
A special case is very doable:
When everything is Gaussian
Called a Kalman filter
One of the most used algorithms in the history of probability!
Time series data example: Temperatures from sensor network
[Figure: lab floor plan with 54 numbered temperature sensors, in rooms labeled SERVER, LAB, KITCHEN, COPY, ELEC, PHONE, QUIET, STORAGE, CONFERENCE, and OFFICE]
Operations in Kalman filter
Compute the belief state P(Xt | o1:t).
Start with the prior P(X1). At each time step t:
Condition on the observation ot
Roll-up (marginalize out the previous time step)
[Figure: chain X1 → X2 → X3 → X4 → X5 with observations O1, …, O5; the states are now continuous]
Detour: Understanding Multivariate Gaussians
Observe attributes. Example: observe X1 = 18.
P(X2 | X1 = 18)
Characterizing a multivariate Gaussian
Mean vector: μ = (μ1, …, μn)
Covariance matrix: Σ, with entries Σij = E[(Xi − μi)(Xj − μj)] (symmetric, positive semi-definite)
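For reference, the density being characterized here is the standard multivariate Gaussian:

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\Big)
```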
Conditional Gaussians
Conditional probabilities P(Y | X)
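The slide's formulas did not survive extraction; the standard conditioning result, with μ and Σ partitioned into X and Y blocks, is:

```latex
P(Y \mid X = x) = \mathcal{N}\!\big(\,
  \mu_Y + \Sigma_{YX}\Sigma_{XX}^{-1}(x - \mu_X),\;
  \Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\,\big)
```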
Kalman filter with Gaussians
[Figure: Kalman filter chain as above]
Equivalent to a linear system
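Written out (A, C and the noise covariances follow the usual Kalman filter conventions, which may differ from the deck's symbols):

```latex
X_{t+1} = A X_t + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \Sigma_X)
\qquad\qquad
O_t = C X_t + \nu_t, \quad \nu_t \sim \mathcal{N}(0, \Sigma_O)
```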
Detour 2: Canonical form
Standard form and canonical form are related: the canonical (information) form parameterizes the Gaussian by K = Σ⁻¹ and h = Σ⁻¹μ instead of Σ and μ.
Conditioning is easy in canonical form; marginalization is easy in standard form.
Conditioning in canonical form
First multiply: in canonical form, multiplying two Gaussians just adds their (K, h) parameters.
Then, condition on the value B = y.
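In formulas (standard information-form identities; the joint over (A, B) is partitioned into blocks):

```latex
(K_1, h_1)\cdot(K_2, h_2) \;\longrightarrow\; (K_1 + K_2,\; h_1 + h_2)
\qquad\qquad
K' = K_{AA}, \quad h' = h_A - K_{AB}\,y
```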
Operations in Kalman filter
Compute the belief state P(Xt | o1:t).
Start with the prior P(X1). At each time step t:
Condition on the observation ot
Roll-up (marginalize out the previous time step)
[Figure: Kalman filter chain as above]
Roll-up in canonical form
First multiply: bring the current belief and the transition model P(Xt+1 | Xt) into one canonical-form factor over (Xt, Xt+1).
Then, marginalize out Xt:
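Marginalization in canonical form is a Schur complement of the joint parameters over (Xt, Xt+1) (a standard identity; the block subscripts are my notation):

```latex
K' = K_{t+1,t+1} - K_{t+1,t}\,K_{t,t}^{-1}\,K_{t,t+1}
\qquad
h' = h_{t+1} - K_{t+1,t}\,K_{t,t}^{-1}\,h_t
```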
Operations in Kalman filter
Compute the belief state P(Xt | o1:t).
Start with the prior P(X1). At each time step t:
Condition on the observation ot
Roll-up (marginalize out the previous time step)
[Figure: Kalman filter chain as above]
Learning a Kalman filter
Must learn: P(X1), the transition model P(Xt+1 | Xt), and the observation model P(Ot | Xt).
Learn the joint, and use the division rule: e.g., P(Xt+1 | Xt) = P(Xt, Xt+1) / P(Xt).
Maximum likelihood learning of a multivariate Gaussian
Data: m samples x(1), …, x(m), each an n-vector.
Means are just empirical means: μ̂i = (1/m) Σj xi(j)
Empirical covariances: Σ̂ik = (1/m) Σj (xi(j) − μ̂i)(xk(j) − μ̂k)
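In numpy these estimates are one line each (using the 1/m maximum-likelihood normalizer rather than the unbiased 1/(m−1)):

```python
import numpy as np

def gaussian_mle(X):
    """ML fit of a multivariate Gaussian to data X with shape (m, n)."""
    mu = X.mean(axis=0)                         # empirical mean vector
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]  # empirical covariance, 1/m normalizer
    return mu, sigma
```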
What you need to know
Hidden Markov models (HMMs)
Very useful, very powerful!
Speech, OCR, …
Parameter sharing: only learn 3 distributions
Trick reduces inference from O(n²) to O(n)
Special case of BN

Kalman filter
Continuous-variable version of HMMs
Assumes Gaussian distributions
Equivalent to a linear system
Simple matrix operations for computations