Modeling time series with hidden Markov models
Advanced Machine Learning, 2017
Nadia Figueroa, Jose Medina and Aude Billard
Time series data
Modeling time series with HMMs 2
What's going on here?
[Figure: example time series (humidity, barometric pressure, temperature) plotted against time.]
Time series data
What's the problem setting?
• We don't care about time …
• We have several trajectories with identical duration → explicit time dependency
• We have unstructured trajectory(ies)! → consider dependency on the past → too complex!
Unstructured time series data
How to simplify this problem?
Consider dependency on the past
Markov assumption
Rainy
Cloudy
Sunny
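Under the Markov assumption, the next state depends only on the current one, so a weather sequence can be sampled one transition at a time. The transition probabilities below are made-up values for illustration, not the ones from the slide:

```python
import random

STATES = ["Sunny", "Cloudy", "Rainy"]

# A[i][j] = p(next state = j | current state = i); values are illustrative.
A = {
    "Sunny":  {"Sunny": 0.7, "Cloudy": 0.2, "Rainy": 0.1},
    "Cloudy": {"Sunny": 0.3, "Cloudy": 0.4, "Rainy": 0.3},
    "Rainy":  {"Sunny": 0.2, "Cloudy": 0.3, "Rainy": 0.5},
}

def sample_chain(start, length, rng=random):
    """Sample a state sequence: each next state depends only on the current one."""
    seq = [start]
    for _ in range(length - 1):
        row = A[seq[-1]]
        seq.append(rng.choices(list(row), weights=list(row.values()))[0])
    return seq
```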
Outline
First part (10:15 – 11:00):
• Recap on Markov chains
• Hidden Markov Model (HMM)
  - Recognition of time series
  - ML parameter estimation
Second part (11:15 – 12:00):
• Time series segmentation
• Bayesian non-parametrics for HMMs
https://github.com/epfl-lasa/ML_toolbox
Outline first part
Rainy
Cloudy
Sunny
Markov chains
[Figure: three-state chain over Sunny, Cloudy, Rainy.]
Transition matrix: rows = current state, columns = next state (Sunny, Cloudy, Rainy).
Initial probabilities: one value per state (Sunny, Cloudy, Rainy).
Likelihood of a Markov chain
Example sequence: Sunny, Sunny, Cloudy.
[Figure: transition matrix and initial probabilities over Sunny, Cloudy, Rainy.]
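The likelihood of an observed sequence such as Sunny, Sunny, Cloudy is the initial probability of the first state times the product of the transition probabilities along the sequence. The parameter values below are illustrative, not the slide's actual numbers:

```python
# Illustrative parameters (not the slide's actual values).
pi = {"Sunny": 0.5, "Cloudy": 0.3, "Rainy": 0.2}           # initial probabilities
A = {
    "Sunny":  {"Sunny": 0.7, "Cloudy": 0.2, "Rainy": 0.1},
    "Cloudy": {"Sunny": 0.3, "Cloudy": 0.4, "Rainy": 0.3},
    "Rainy":  {"Sunny": 0.2, "Cloudy": 0.3, "Rainy": 0.5},
}

def chain_likelihood(seq):
    """p(s_1, ..., s_T) = pi(s_1) * prod_t A[s_{t-1}][s_t]."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

# With these numbers: 0.5 * A[Sunny][Sunny] * A[Sunny][Cloudy] = 0.5 * 0.7 * 0.2
```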
Learning Markov chains
Example sequence: Sunny, Sunny, Cloudy.
Topologies: ergodic, left-to-right, periodic.
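For a fully observed chain, maximum-likelihood learning reduces to counting: the estimated transition probability from state i to state j is the number of observed i→j transitions divided by the number of times state i was left. A minimal sketch:

```python
from collections import Counter

def learn_transitions(sequences):
    """MLE of the transition probabilities from fully observed state sequences."""
    counts, totals = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1   # how often prev -> cur was seen
            totals[prev] += 1          # how often prev was left
    return {(i, j): counts[(i, j)] / totals[i] for (i, j) in counts}

A_hat = learn_transitions([["Sunny", "Sunny", "Cloudy", "Rainy"],
                           ["Sunny", "Cloudy", "Cloudy"]])
```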
Outline first part
Rainy
Cloudy
Sunny
Hidden Markov model
[Figure: hidden-state diagram over Sunny, Cloudy, Rainy.]
Transition matrix: rows = current state, columns = next state (Sunny, Cloudy, Rainy).
Initial probabilities: one value per state (Sunny, Cloudy, Rainy).
Likelihood of an HMM
[Figure: transition matrix and initial probabilities over Sunny, Cloudy, Rainy.]
Forward variable
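The forward variable α_k(i) = p(x_1, …, x_k, z_k = i) can be computed recursively, and summing it at the final step gives the likelihood of the whole observation sequence. A sketch for a discrete-emission HMM, with made-up parameters (a two-state weather model emitting umbrella observations):

```python
def forward(obs, pi, A, B):
    """Forward algorithm: returns p(obs) for a discrete-emission HMM.
    pi[i]: initial prob, A[i][j]: transition prob, B[i][o]: emission prob."""
    states = list(pi)
    # Initialization: alpha_1(i) = pi(i) * B[i][x_1]
    alpha = {i: pi[i] * B[i][obs[0]] for i in states}
    for x in obs[1:]:
        # Recursion: alpha_k(j) = (sum_i alpha_{k-1}(i) * A[i][j]) * B[j][x_k]
        alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][x]
                 for j in states}
    return sum(alpha.values())  # p(x_1..x_T) = sum_i alpha_T(i)

# Illustrative parameters: hidden Sunny/Rainy, observed umbrella use.
pi = {"Sunny": 0.6, "Rainy": 0.4}
A = {"Sunny": {"Sunny": 0.8, "Rainy": 0.2},
     "Rainy": {"Sunny": 0.3, "Rainy": 0.7}}
B = {"Sunny": {"umbrella": 0.1, "no_umbrella": 0.9},
     "Rainy": {"umbrella": 0.8, "no_umbrella": 0.2}}
```

For long sequences, a real implementation rescales α at each step (or works in log space) to avoid numerical underflow.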
Likelihood of an HMM
[Figure: transition matrix and initial probabilities over Sunny, Cloudy, Rainy.]
Backward variable
Learning an HMM
Baum-Welch algorithm (Expectation-Maximization for HMMs):
- Iterative solution
- Converges to a local maximum of the likelihood
Starting from an initial model λ, find a new model λ′ such that p(O | λ′) ≥ p(O | λ).
• E-step: Given an observation sequence and a model, find the probabilities of the states to have produced those observations.
• M-step: Given the output of the E-step, update the model parameters to better fit the observations.
Learning an HMM
E-step:
- Probability of being in state i at time k
- Probability of being in state i at time k and transitioning to state j
Learning an HMM
M-step: use these probabilities as expected counts to re-estimate the initial probabilities, the transition matrix, and the emission parameters.
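The two steps above can be sketched for a discrete-emission HMM. The dictionary-based parameterization is an assumption of this example, and a practical implementation would rescale α and β (or work in log space) to avoid numerical underflow:

```python
def baum_welch_step(obs, pi, A, B):
    """One EM iteration: E-step computes gamma and xi, M-step re-estimates pi, A, B."""
    states, T = list(pi), len(obs)
    # Forward pass: alpha[k][i] = p(x_1..x_k, z_k = i)
    alpha = [{i: pi[i] * B[i][obs[0]] for i in states}]
    for x in obs[1:]:
        alpha.append({j: sum(alpha[-1][i] * A[i][j] for i in states) * B[j][x]
                      for j in states})
    # Backward pass: beta[k][i] = p(x_{k+1}..x_T | z_k = i)
    beta = [{i: 1.0 for i in states}]
    for x in reversed(obs[1:]):
        beta.insert(0, {i: sum(A[i][j] * B[j][x] * beta[0][j] for j in states)
                        for i in states})
    lik = sum(alpha[-1].values())
    # E-step: gamma[k][i] = p(z_k = i | obs); xi[k][(i, j)] = p(z_k = i, z_{k+1} = j | obs)
    gamma = [{i: alpha[k][i] * beta[k][i] / lik for i in states} for k in range(T)]
    xi = [{(i, j): alpha[k][i] * A[i][j] * B[j][obs[k + 1]] * beta[k + 1][j] / lik
           for i in states for j in states} for k in range(T - 1)]
    # M-step: re-estimate parameters from the expected counts.
    new_pi = {i: gamma[0][i] for i in states}
    new_A = {i: {j: sum(x[(i, j)] for x in xi) / sum(g[i] for g in gamma[:-1])
                 for j in states} for i in states}
    symbols = {o for probs in B.values() for o in probs}
    new_B = {i: {o: sum(g[i] for g, x in zip(gamma, obs) if x == o) /
                    sum(g[i] for g in gamma)
                 for o in symbols} for i in states}
    return new_pi, new_A, new_B, lik
```

Iterating this step never decreases the likelihood, which is exactly the EM guarantee stated above.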
Learning an HMM
• HMM is a parametric technique (fixed number of states, fixed topology)
• Heuristics for determining the optimal number of states:
X: dataset; N: number of datapoints; K: number of free parameters; L: maximum likelihood of the model given K parameters.
- Akaike Information Criterion: AIC = −2 ln L + 2K
- Bayesian Information Criterion: BIC = −2 ln L + K ln N
Lower BIC implies either fewer explanatory variables, better fit, or both. As the number of datapoints (observations) increases, BIC assigns more weight to simpler models than AIC does.
Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC selects a more parsimonious (computationally cheaper) model.
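Given the maximized log-likelihood ln L of each candidate model, both criteria are simple to evaluate. The candidate log-likelihoods and parameter counts below are made-up numbers for illustration:

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: AIC = -2 ln L + 2K."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion: BIC = -2 ln L + K ln N."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical candidates: (number of states, log-likelihood, free parameters K).
candidates = [(2, -510.0, 14), (3, -480.0, 27), (5, -470.0, 65)]
n = 1000  # number of datapoints
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n))
```

Note that for N > e² ≈ 7.4 datapoints, ln N > 2, so BIC penalizes each extra parameter more heavily than AIC.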
Applications of HMMs
State estimation:What is the most probable state/state sequence of the system?
Prediction:What are the most probable next observations/state of the system?
Model selection:What is the most likely model that represents these observations?
Examples
Speech recognition:
• Left-to-right model
• States are phonemes
• Observations in frequency domain
D. B. Paul, Speech Recognition Using Hidden Markov Models, The Lincoln Laboratory Journal, 1990
Examples
Motion prediction:
• Periodic model
• Observations are the tracked joints
• Simulate/predict walking patterns
M. Karg et al., "Human movement analysis: Extension of the f-statistic to time series using HMM," IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2013
Examples
Motion prediction:
• Left-to-right models
• Autonomous segmentation
• Recognition + prediction
Examples
Motion prediction:
• Left-to-right model
• Each state is a dynamical system
Examples
Motion recognition:
• Recognition of most likely motion and prediction of next step.
MATLAB demo
Toy training set:
• 1 player
• 7 actions
• 1 Hidden Markov model per action
Outline
First part (10:15 – 11:00):
• Recap on Markov chains
• Hidden Markov Model (HMM)
  - Recognition of time series
  - ML parameter estimation
Second part (11:15 – 12:00):
• Time series segmentation
• Bayesian non-parametrics for HMMs
https://github.com/epfl-lasa/ML_toolbox
Time series Segmentation
Time series = sequence of discrete segments
Why is this an important problem?
Segmentation of Speech Signals
Segmenting a continuous speech signal into sets of distinct words.
Segmentation of Speech Signals
Segmenting a continuous speech signal into sets of distinct words:
"I am on a seafood diet."
"I see food and I eat it!"
Segmentation of Human Motion Data
Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009
Segmentation of continuous motion capture data from exercise routines into motion categories
Jumping Jacks
Arm Circles Squats
Knee Raises
Segmentation in Human Motion Data
Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009
12 variables:
- Torso position
- Waist angles (2)
- Neck angle
- Shoulder angles
- …
Segmentation in Robotics
Learning Complex Sequential Tasks from Demonstration
7 variables:
- Position
- Orientation
Segments: Reach, Grate, Trash
HMM for Time series Segmentation
Assumptions:
• The time series has been generated by a system that transitions between a set of hidden states.
• At each time step, a sample is drawn from an emission model associated with the current hidden state.
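In symbols (the notation here is assumed, since the original equations on the slide were images): with hidden states z_k ∈ {1, …, K}, transition matrix A, initial distribution π, and per-state emission parameters θ_i,

```latex
z_1 \sim \pi, \qquad
z_k \mid z_{k-1} \sim \mathrm{Cat}\!\left(A_{z_{k-1},\,\cdot}\right), \qquad
x_k \mid z_k \sim p\!\left(x \mid \theta_{z_k}\right),
\quad \text{e.g. } x_k \sim \mathcal{N}\!\left(\mu_{z_k}, \Sigma_{z_k}\right).
```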
HMM for Time series Segmentation
How do we find these segments?
HMM for Time series Segmentation
Initial State Probabilities
Transition Matrix
Emission Model Parameters
Baum-Welch algorithm (Expectation-Maximization for HMMs): iterative solution; converges to a local maximum of the likelihood.
Steps for Segmentation with HMM:
1. Learn the HMM parameters through a Maximum Likelihood Estimate (MLE):
HMM Likelihood
Hyper-parameter: number of possible states K
HMM for Time series Segmentation
Steps for Segmentation with HMM:
2. Find the most probable sequence of states generating the observations through the Viterbi algorithm:
HMM Joint Probability Distribution
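A minimal Viterbi sketch, reusing the dictionary-style discrete-emission parameterization of the earlier examples (the parameters themselves are assumptions of this example, not the slide's):

```python
def viterbi(obs, pi, A, B):
    """Most probable hidden-state sequence for a discrete-emission HMM."""
    states = list(pi)
    # delta[i]: probability of the best path ending in state i; psi stores backpointers.
    delta = {i: pi[i] * B[i][obs[0]] for i in states}
    psi = []
    for x in obs[1:]:
        prev = delta
        psi.append({j: max(states, key=lambda i: prev[i] * A[i][j]) for j in states})
        delta = {j: prev[psi[-1][j]] * A[psi[-1][j]][j] * B[j][x] for j in states}
    # Backtrack from the best final state.
    path = [max(states, key=lambda i: delta[i])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path))
```

Applied to the whole time series, the decoded state sequence directly yields the segmentation: each maximal run of one state is a segment.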
HMM for Time series Segmentation
Model Selection for HMMs
Limitations of classical finite HMMs for Segmentation
Cardinality: The choice of the number of hidden states is based on model-selection heuristics; there is little understanding of the strengths and weaknesses of such methods in this setting [1].
[1] Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008
[2] Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009
Topology: We assume that all time series share the same set of emission models and switch among them in exactly the same manner [2].
Undefined # hidden states
Fixed Transition Matrix
Solution: Bayesian Non-Parametrics
Bayesian Non-Parametrics
• Bayesian: Use Bayesian inference to estimate the parameters; i.e. priors on model parameters!
• Non-parametric: Does NOT mean methods with “no parameters”, rather models whose complexity (# of states, # Gaussians) is inferred from the data.
1. Number of parameters grows with sample size.
2. Infinite-dimensional parameter space!
BNP for HMMs: HDP-HMM
Cardinality
• Hierarchical Dirichlet Process (HDP) prior on the transition Matrix!
• Normal Inverse Wishart (NIW) prior on emission parameters!
Hyper-parameters!
Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008
BNP for HMMs: HDP-HMM
Cardinality
• The Dirichlet Process (DP) is a prior distribution over distributions.
• Used for clustering with infinite mixture models; i.e. instead of setting K in a GMM, the K is learned from data.
This only gives us an estimate of the K clusters! It cannot be used directly on the transition matrix:
• The Hierarchical Dirichlet Process (HDP) is a hierarchy of DPs!
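The "prior over distributions" idea can be made concrete with the stick-breaking construction of the DP: weights β_k = v_k ∏_{l<k} (1 − v_l) with v_k ~ Beta(1, α). A truncated sketch (the truncation level and the value of α below are arbitrary choices for illustration):

```python
import random

def stick_breaking(alpha, truncation, rng=random):
    """Sample mixture weights from a (truncated) DP stick-breaking prior."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)   # v_k ~ Beta(1, alpha)
        weights.append(remaining * v)     # beta_k = v_k * prod_{l<k} (1 - v_l)
        remaining *= 1.0 - v
    return weights  # sums to < 1; leftover mass lies beyond the truncation

# Small alpha concentrates mass on a few components; large alpha spreads it out,
# which is how the effective number of states/clusters is inferred from data.
```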
Segmentation with HDP-HMM
Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008
Compare to Model Selection with Classical HMMs
BNP for HMMs: BP-HMM
Cardinality Topology
• The Beta Process (BP) prior on the transition Matrix!
• Normal Inverse Wishart (NIW) prior on emission parameters!
Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009
BNP for HMMs: BP-HMM
Cardinality Topology
• The Beta Process (BP) prior on the transition Matrix!
Features (i.e. shared HMM States)
Time-Series
Segmentation in Human Motion Data
Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009
12 variables:
- Torso position
- Waist angles (2)
- Neck angle
- Shoulder angles
- …
Applications in Robotics
Learning Complex Sequential Tasks from Demonstration