Classic HMM tutorial – see class website: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257–286, 1989.
Time Series, HMMs, Kalman Filters
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 28th, 2005
Adventures of our BN hero
Compact representation for probability distributions
Fast inference
Fast learning
But… Who are the most popular kids?
1. Naïve Bayes
2 and 3. Hidden Markov models (HMMs) and Kalman filters
Handwriting recognition
Character recognition, e.g., kernel SVMs
[Figure: handwritten characters with per-character recognition labels]
Example of a hidden Markov model (HMM)
Understanding the HMM Semantics
[Figure: HMM chain X1 → X2 → X3 → X4 → X5, each Xi ∈ {a, …, z}, with one observation Oi (a character image) per state]
HMMs semantics: Details
[Figure: HMM chain as above]
Just 3 distributions: the initial state prior P(X1), the transition model P(Xi | Xi−1), and the observation model P(Oi | Xi).
HMMs semantics: Joint distribution
[Figure: HMM chain as above]
Joint distribution: P(X1:n, O1:n) = P(X1) ∏i=2..n P(Xi | Xi−1) ∏i=1..n P(Oi | Xi)
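As a concrete check of this factorization, here is a minimal numpy sketch (the table layout and function name are illustrative, not from the slides) that scores one fully specified state/observation sequence:

```python
import numpy as np

def hmm_log_joint(pi, A, B, states, obs):
    """log P(x1:n, o1:n) = log P(x1) + sum_i log P(xi | xi-1) + sum_i log P(oi | xi).

    pi[x]   : initial distribution P(X1 = x)
    A[x, y] : transition model    P(Xi = y | Xi-1 = x)
    B[x, o] : observation model   P(Oi = o | Xi = x)
    """
    logp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for i in range(1, len(states)):
        logp += np.log(A[states[i - 1], states[i]]) + np.log(B[states[i], obs[i]])
    return logp
```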
Learning HMMs from fully observable data is easy
[Figure: HMM chain as above]
Learn 3 distributions by maximum likelihood, i.e., relative frequencies: P(X1), P(Xi | Xi−1), and P(Oi | Xi), since all variables are observed.
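A sketch of these fully observed maximum-likelihood estimates, which are just normalized counts (array shapes and names are my assumptions; states or symbols that never occur would need smoothing):

```python
import numpy as np

def hmm_mle(state_seqs, obs_seqs, n_states, n_obs):
    """Relative-frequency (maximum likelihood) estimates from fully observed data."""
    pi = np.zeros(n_states)
    A = np.zeros((n_states, n_states))
    B = np.zeros((n_states, n_obs))
    for xs, oseq in zip(state_seqs, obs_seqs):
        pi[xs[0]] += 1.0
        for prev, cur in zip(xs, xs[1:]):
            A[prev, cur] += 1.0            # count transitions
        for x, o in zip(xs, oseq):
            B[x, o] += 1.0                 # count emissions
    # Normalize counts into distributions (rows of A and B sum to 1).
    pi /= pi.sum()
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```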
Possible inference tasks in an HMM
[Figure: HMM chain as above]
Marginal probability of a hidden variable: P(Xi | o1, …, on)
Viterbi decoding – most likely trajectory for hidden vars: argmax over x1, …, xn of P(x1, …, xn | o1, …, on)
Using variable elimination to compute P(Xi|o1:n)
[Figure: HMM chain as above]
Compute: P(Xi | o1:n) ∝ P(Xi, o1:n)
Variable elimination order? Eliminate X1, …, Xi−1 forwards and Xn, …, Xi+1 backwards, so every intermediate factor is over a single Xj.
Example:
What if I want to compute P(Xi|o1:n) for each i?
[Figure: HMM chain as above]
Compute: P(Xi | o1:n) for each i = 1, …, n.
Variable elimination for each i?
Variable elimination for each i, what's the complexity? O(n) per query, so O(n²) for all n queries; but the runs share almost all of their work.
Reusing computation
[Figure: HMM chain as above]
Compute: P(Xi | o1:n) for every i, reusing the shared forwards and backwards factors across queries.
The forwards-backwards algorithm
[Figure: HMM chain as above]
Forwards pass – initialization: α1(X1) = P(X1) P(o1 | X1). For i = 2 to n, generate a forwards factor by eliminating Xi−1: αi(Xi) = P(oi | Xi) Σxi−1 P(Xi | xi−1) αi−1(xi−1)
Backwards pass – initialization: βn(Xn) = 1. For i = n−1 to 1, generate a backwards factor by eliminating Xi+1: βi(Xi) = Σxi+1 P(xi+1 | Xi) P(oi+1 | xi+1) βi+1(xi+1)
∀ i, the probability is: P(Xi | o1:n) ∝ αi(Xi) βi(Xi)
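A compact numpy sketch of the two passes (layout and names are my own; there is no rescaling here, so long sequences would need log-space arithmetic or per-step normalization):

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior marginals P(Xi | o1:n) for every i, in O(n k^2) time."""
    n, k = len(obs), len(pi)
    alpha = np.zeros((n, k))              # forwards factors
    beta = np.ones((n, k))                # backwards factors (beta_n = 1)
    alpha[0] = pi * B[:, obs[0]]
    for i in range(1, n):                 # eliminate X_{i-1}
        alpha[i] = B[:, obs[i]] * (alpha[i - 1] @ A)
    for i in range(n - 2, -1, -1):        # eliminate X_{i+1}
        beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])
    post = alpha * beta                   # proportional to P(Xi, o1:n)
    return post / post.sum(axis=1, keepdims=True)
```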
Most likely explanation
[Figure: HMM chain as above]
Compute: argmax over x1, …, xn of P(x1, …, xn | o1:n)
Variable elimination order? The same forwards order works, with max in place of sum.
Example:
The Viterbi algorithm
[Figure: HMM chain as above]
Initialization: m1(X1) = P(X1) P(o1 | X1). For i = 2 to n, generate a forwards (max) factor by eliminating Xi−1: mi(Xi) = P(oi | Xi) maxxi−1 P(Xi | xi−1) mi−1(xi−1), recording the maximizing xi−1.
Computing the best explanation: x*n = argmax mn(xn); for i = n−1 to 1, use the recorded argmax to get the explanation: x*i is the choice recorded for x*i+1.
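A matching numpy sketch of Viterbi, run in log space to avoid underflow (names and layout are again my assumptions):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence argmax_x P(x1:n | o1:n)."""
    n, k = len(obs), len(pi)
    logm = np.zeros((n, k))               # max-product forwards factors
    back = np.zeros((n, k), dtype=int)    # recorded argmaxes for backtracking
    logm[0] = np.log(pi) + np.log(B[:, obs[0]])
    for i in range(1, n):
        scores = logm[i - 1][:, None] + np.log(A)   # scores[x_{i-1}, x_i]
        back[i] = scores.argmax(axis=0)
        logm[i] = scores.max(axis=0) + np.log(B[:, obs[i]])
    path = [int(logm[-1].argmax())]
    for i in range(n - 1, 0, -1):         # follow recorded argmaxes backwards
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```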
What about continuous variables?
In general, very hard! Must represent complex distributions
A special case is very doable:
When everything is Gaussian
Called a Kalman filter
One of the most used algorithms in the history of probability!
Time series data example: Temperatures from sensor network
[Figure: lab floor plan with 54 numbered temperature sensors, in rooms labeled SERVER, LAB, KITCHEN, COPY, ELEC, PHONE, QUIET, STORAGE, CONFERENCE, and OFFICE]
Operations in Kalman filter
Compute the belief state P(Xt | o1:t).
Start with the prior P(X1). At each time step t:
Condition on the observation ot
Roll-up (marginalize out the previous time step)
[Figure: chain X1 → X2 → X3 → X4 → X5 with observations O1, …, O5; the states are now continuous]
Detour: Understanding Multivariate Gaussians
Observe attributes. Example: observe X1 = 18.
P(X2 | X1 = 18)
Characterizing a multivariate Gaussian
Mean vector: μ = (μ1, …, μn)
Covariance matrix: Σ, with entries Σij = E[(Xi − μi)(Xj − μj)] (symmetric, positive semi-definite)
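For reference, the density being characterized here is the standard multivariate Gaussian:

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\Big)
```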
Conditional Gaussians
Conditional probabilities P(Y | X)
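The slide's formulas did not survive extraction; the standard conditioning result, with μ and Σ partitioned into X and Y blocks, is:

```latex
P(Y \mid X = x) = \mathcal{N}\!\big(\,
  \mu_Y + \Sigma_{YX}\Sigma_{XX}^{-1}(x - \mu_X),\;
  \Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\,\big)
```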
Kalman filter with Gaussians
[Figure: Kalman filter chain as above]
Equivalent to a linear system
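Written out (A, C and the noise covariances follow the usual Kalman filter conventions, which may differ from the deck's symbols):

```latex
X_{t+1} = A X_t + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \Sigma_X)
\qquad\qquad
O_t = C X_t + \nu_t, \quad \nu_t \sim \mathcal{N}(0, \Sigma_O)
```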
Detour 2: Canonical form
Standard form and canonical form are related: the canonical (information) form parameterizes the Gaussian by K = Σ⁻¹ and h = Σ⁻¹μ instead of Σ and μ.
Conditioning is easy in canonical form; marginalization is easy in standard form.
Conditioning in canonical form
First multiply: in canonical form, multiplying two Gaussians just adds their (K, h) parameters.
Then, condition on the value B = y.
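In formulas (standard information-form identities; the joint over (A, B) is partitioned into blocks):

```latex
(K_1, h_1)\cdot(K_2, h_2) \;\longrightarrow\; (K_1 + K_2,\; h_1 + h_2)
\qquad\qquad
K' = K_{AA}, \quad h' = h_A - K_{AB}\,y
```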
Operations in Kalman filter
Compute the belief state P(Xt | o1:t).
Start with the prior P(X1). At each time step t:
Condition on the observation ot
Roll-up (marginalize out the previous time step)
[Figure: Kalman filter chain as above]
Roll-up in canonical form
First multiply: bring the current belief and the transition model P(Xt+1 | Xt) into one canonical-form factor over (Xt, Xt+1).
Then, marginalize out Xt:
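Marginalization in canonical form is a Schur complement of the joint parameters over (Xt, Xt+1) (a standard identity; the block subscripts are my notation):

```latex
K' = K_{t+1,t+1} - K_{t+1,t}\,K_{t,t}^{-1}\,K_{t,t+1}
\qquad
h' = h_{t+1} - K_{t+1,t}\,K_{t,t}^{-1}\,h_t
```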
Operations in Kalman filter
Compute the belief state P(Xt | o1:t).
Start with the prior P(X1). At each time step t:
Condition on the observation ot
Roll-up (marginalize out the previous time step)
[Figure: Kalman filter chain as above]
Learning a Kalman filter
Must learn: P(X1), the transition model P(Xt+1 | Xt), and the observation model P(Ot | Xt).
Learn the joint, and use the division rule: e.g., P(Xt+1 | Xt) = P(Xt, Xt+1) / P(Xt).
Maximum likelihood learning of a multivariate Gaussian
Data: m samples x(1), …, x(m), each an n-vector.
Means are just empirical means: μ̂i = (1/m) Σj xi(j)
Empirical covariances: Σ̂ik = (1/m) Σj (xi(j) − μ̂i)(xk(j) − μ̂k)
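In numpy these estimates are one line each (using the 1/m maximum-likelihood normalizer rather than the unbiased 1/(m−1)):

```python
import numpy as np

def gaussian_mle(X):
    """ML fit of a multivariate Gaussian to data X with shape (m, n)."""
    mu = X.mean(axis=0)                         # empirical mean vector
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]  # empirical covariance, 1/m normalizer
    return mu, sigma
```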
What you need to know
Hidden Markov models (HMMs)
Very useful, very powerful!
Speech, OCR, …
Parameter sharing: only learn 3 distributions
Trick reduces inference from O(n²) to O(n)
Special case of BN

Kalman filter
Continuous-variable version of HMMs
Assumes Gaussian distributions
Equivalent to a linear system
Simple matrix operations for computations