
Hidden Markov Models (HMMs) Chapter 3 (Duda et al.) – Section 3.10 (Warning: this section has lots of typos) CS479/679 Pattern Recognition Spring 2013 – Dr. George Bebis


Page 1:

Hidden Markov Models (HMMs)

Chapter 3 (Duda et al.) – Section 3.10 (Warning: this section has lots of typos)

CS479/679 Pattern Recognition, Spring 2013 – Dr. George Bebis

Page 2:

Sequential vs Temporal Patterns

• Sequential patterns:
  – The order of data points is irrelevant.

• Temporal patterns:
  – The order of data points is important (i.e., time series).
  – Data can be represented by a number of states.
  – States at time t are influenced directly by states in previous time steps (i.e., correlated).

Page 3:

Hidden Markov Models (HMMs)

• HMMs are appropriate for problems that have an inherent temporality.

– Speech recognition
– Gesture recognition
– Human activity recognition

Page 4:

First-Order Markov Models

• Represented by a graph where every node corresponds to a state ωi.

• The graph can be fully-connected with self-loops.

Page 5:

First-Order Markov Models (cont’d)

• Links between nodes ωi and ωj are associated with a transition probability:

P(ω(t+1)=ωj / ω(t)=ωi) = aij

which is the probability of going to state ωj at time t+1 given that the state at time t was ωi (first-order model).

Page 6:

First-Order Markov Models (cont’d)

• Markov models are fully described by their transition probabilities aij.

• The following constraints should be satisfied:

$$a_{ij} \ge 0 \quad \text{and} \quad \sum_{j} a_{ij} = 1 \quad \text{for all } i$$

Page 7:

Example: Weather Prediction Model

• Assume three weather states:– ω1: Precipitation (rain, snow, hail, etc.)

– ω2: Cloudy

– ω3: Sunny

Transition Matrix

[Figure: transition matrix aij over the states ω1, ω2, ω3; the numerical values are shown in the slide.]

Page 8:

Computing the probability P(ωT) of a sequence of states ωT

• Given a sequence of states ωT = (ω(1), ω(2),..., ω(T)), the probability that the model generated ωT is equal to the product of the corresponding transition probabilities:

$$P(\omega^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1))$$

where P(ω(1) / ω(0)) = P(ω(1)) is the prior probability of the first state.

Page 9:

Example: Weather Prediction Model (cont’d)

• What is the probability that the weather for eight consecutive days is:

“sunny-sunny-sunny-rainy-rainy-sunny-cloudy-sunny” ?

ω8=ω3ω3ω3ω1ω1ω3ω2ω3

P(ω8) = P(ω3) P(ω3/ω3) P(ω3/ω3) P(ω1/ω3) P(ω1/ω1) P(ω3/ω1) P(ω2/ω3) P(ω3/ω2) = 1.536 × 10^-4
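
As a sketch of this computation: the slide's transition matrix is not reproduced in this transcript, so the values below are assumed from the standard weather example (rows and columns ordered ω1 = rain, ω2 = cloudy, ω3 = sunny); the variable names are mine. With these assumed values the product is 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2 = 1.536 × 10^-4, matching the slide.

```python
import numpy as np

# Assumed transition matrix A[i, j] = P(omega_j at t+1 | omega_i at t);
# values are illustrative (standard weather example), not copied from the slide figure.
A = np.array([[0.4, 0.3, 0.3],    # omega_1: rain
              [0.2, 0.6, 0.2],    # omega_2: cloudy
              [0.1, 0.1, 0.8]])   # omega_3: sunny

assert np.allclose(A.sum(axis=1), 1.0)   # each row must sum to 1

# "sunny sunny sunny rainy rainy sunny cloudy sunny" as 0-based state indices
seq = [2, 2, 2, 0, 0, 2, 1, 2]

# P(omega^8) = P(omega(1)) * prod_t P(omega(t) | omega(t-1)),
# taking P(omega(1) = sunny) = 1 since the first day is given.
p = 1.0
for prev, curr in zip(seq[:-1], seq[1:]):
    p *= A[prev, curr]

print(p)   # 1.536e-04 with the assumed matrix
```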

Page 10:

Limitations of Markov models

• In Markov models, each state is uniquely associated with an observable event.

• Once an observation is made, the state of the system is trivially retrieved.

• Such systems are not of practical use for most applications.

Page 11:

Hidden States and Observations

• Assume that each state can generate a number of outputs (i.e., observations) according to some probability distribution.

• Each observation can potentially be generated at any state.

• State sequence is not directly observable (i.e., hidden) but can be approximated from observation sequence.

Page 12:

First-order HMMs

• Augment Markov model such that when it is in state ω(t) it also emits some symbol v(t) (visible state) among a set of possible symbols.

• We have access to the visible states v(t) only, while ω(t) are unobservable.

Page 13:

Example: Weather Prediction Model (cont’d)

Observations:
v1: temperature
v2: humidity
etc.

Page 14:

Observation Probabilities

• When the model is in state ωj at time t, the probability of emitting a visible state vk at that time is denoted as:

P(v(t)=vk / ω(t)=ωj) = bjk   (observation probabilities)

where

$$\sum_{k} b_{jk} = 1 \quad \text{for all } j$$

• For every sequence of hidden states, there is an associated sequence of visible states:

ωT = (ω(1), ω(2),..., ω(T))   →   VT = (v(1), v(2),..., v(T))
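
To make the generative picture concrete, here is a minimal sketch that samples a hidden state sequence and the visible sequence it emits; the parameters π, A, B are illustrative values chosen here, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the slides): 2 hidden states, 3 visible symbols.
pi = np.array([0.6, 0.4])                 # prior state probabilities
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])               # a_ij = P(omega_j at t+1 | omega_i at t)
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])          # b_jk = P(v_k | omega_j)

def sample_hmm(T):
    """Draw a hidden state sequence and the visible sequence it emits."""
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)          # initial state from the prior
    for _ in range(T):
        states.append(s)
        obs.append(rng.choice(B.shape[1], p=B[s]))   # emit a visible symbol
        s = rng.choice(len(pi), p=A[s])               # transition to the next state
    return states, obs

states, obs = sample_hmm(10)
print("hidden: ", states)   # unobservable in practice
print("visible:", obs)      # what we actually get to see
```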

Page 15:

Absorbing State ω0

• Given a state sequence and its corresponding observation sequence:

ωT=(ω(1), ω(2),..., ω(T)) VT=(v(1), v(2),..., v(T))

we assume that ω(T)=ω0 is some absorbing state, which uniquely emits symbol v(T)=v0

• Once it enters the absorbing state, the system cannot escape from it.

Page 16:

HMM Formalism

• An HMM is defined by {Ω, V, π, A, B}:

– Ω = {ω1,…, ωn} are the possible states

– V = {v1,…, vm} are the possible observations

– π = {πi} are the prior state probabilities

– A = {aij} are the state transition probabilities

– B = {bjk} are the observation probabilities

Page 17:

Some Terminology

• Causal: the probabilities depend only upon previous states.

• Ergodic: Given some starting state, every one of the states has a non-zero probability of occurring.

“left-right” HMM

Page 18:

Coin toss example

• You are in a room with a barrier (e.g., a curtain) through which you cannot see what is happening on the other side.

• On the other side of the barrier is another person who is performing a coin (or multiple coin) toss experiment.

• The other person will tell you only the result of the experiment, not how he obtained that result.

e.g., VT=HHTHTTHH...T=v(1),v(2), ..., v(T)

Page 19:

Coin toss example (cont’d)

• Problem: derive an HMM model to explain the observed sequence of heads and tails.
  – The coins represent the hidden states, since we do not know which coin was tossed each time.
  – The outcome of each toss represents an observation.
  – A “likely” sequence of coins (state sequence) may be inferred from the observations.
  – The state sequence might not be unique in general.

Page 20:

Coin toss example: 1-fair coin model

• There are 2 states, each associated with either heads (state 1) or tails (state 2).

• Observation sequence uniquely defines the states (i.e., states are not hidden).

observation probabilities

Page 21:

Coin toss example: 2-fair coins model

• There are 2 states, each associated with a coin; a third coin is used to decide which of the fair coins to flip.

• Neither state is uniquely associated with either heads or tails.

observation probabilities

Page 22:

Coin toss example: 2-biased coins model

• There are 2 states, each associated with a biased coin; a third coin is used to decide which of the biased coins to flip.

• Neither state is uniquely associated with either heads or tails.

observation probabilities

Page 23:

Coin toss example: 3-biased coins model

• There are 3 states, each associated with a biased coin; we decide which coin to flip in some way (e.g., using other coins).

• Neither state is uniquely associated with either heads or tails.

observation probabilities

Page 24:

Which model is best?

• Since the states are not observable, the best we can do is to select the model θ that best explains the observations:

maxθ P(VT / θ)

• Longer observation sequences are typically better for selecting the best model.

Page 25:

Classification Using HMMs

• Given an observation sequence VT and a set of possible models θ, choose the model with the highest probability P(θ / VT).

Bayes rule:

$$P(\theta \mid V^T) = \frac{P(V^T \mid \theta)\, P(\theta)}{P(V^T)}$$
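
A minimal sketch of this classification rule, assuming the log-likelihoods log P(VT / θ) of the observation sequence under each candidate model have already been computed (e.g., with the forward algorithm discussed later) along with prior model probabilities; the numbers are made up for illustration.

```python
import numpy as np

# Hypothetical per-model log-likelihoods log P(V^T | theta) and priors P(theta).
log_lik = np.array([-52.3, -48.7, -50.1])      # e.g., from the forward algorithm
log_prior = np.log(np.array([1/3, 1/3, 1/3]))

log_post = log_lik + log_prior                 # P(V^T) is the same for all models
best = int(np.argmax(log_post))
print("choose model", best)
```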

Page 26:

Three basic HMM problems

• Evaluation
  – Determine the probability P(VT) that a particular sequence of visible states VT was generated by a given model (i.e., Forward/Backward algorithm).

• Decoding
  – Given a sequence of visible states VT, determine the most likely sequence of hidden states ωT that led to those observations (i.e., using the Viterbi algorithm).

• Learning
  – Given a set of visible observations, determine aij and bjk (i.e., using the EM algorithm – Baum-Welch algorithm).

Page 27:

Evaluation

• The probability that a model produces VT can be computed using the theorem of total probability:

$$P(V^T) = \sum_{r=1}^{r_{\max}} P(V^T \mid \omega_r^T)\, P(\omega_r^T)$$

where ωrT = (ω(1), ω(2),..., ω(T)) is a possible state sequence and rmax is the maximum number of state sequences.

• For a model with c states ω1, ω2,..., ωc, rmax = c^T.

Page 28:

Evaluation (cont’d)

• We can rewrite each term as follows:

$$P(\omega_r^T) = \prod_{t=1}^{T} P(\omega_r(t) \mid \omega_r(t-1))$$

$$P(V^T \mid \omega_r^T) = \prod_{t=1}^{T} P(v(t) \mid \omega_r(t))$$

• Combining the two equations we have:

$$P(V^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(v(t) \mid \omega_r(t))\, P(\omega_r(t) \mid \omega_r(t-1))$$

Page 29:

Evaluation (cont’d)

• Given aij and bjk, it is straightforward to compute P(VT).

• What is the computational complexity?

O(T · rmax) = O(T · c^T)
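
A sketch of this brute-force evaluation, enumerating all c^T state sequences; it uses an explicit prior π over the initial state rather than the slides' absorbing-state convention, and the parameters are illustrative (only feasible for very small T).

```python
import numpy as np
from itertools import product

# Illustrative parameters (not from the slides).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])

def evaluate_brute_force(V):
    """P(V^T) = sum over all c^T state sequences of P(V^T | omega_r^T) P(omega_r^T)."""
    c, T = len(pi), len(V)
    total = 0.0
    for states in product(range(c), repeat=T):     # r_max = c^T sequences
        p = pi[states[0]] * B[states[0], V[0]]
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[states[t], V[t]]
        total += p
    return total

V = [0, 2, 1]                    # a short visible sequence
print(evaluate_brute_force(V))
```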

Page 30:

Recursive computation of P(VT) (HMM Forward)

[Trellis diagram: hidden states ω(1),…, ω(t), ω(t+1),…, ω(T) emitting v(1),…, v(t), v(t+1),…, v(T); a transition from ωi at time t to ωj at time t+1 is highlighted.]

Page 31:

Recursive computation of P(VT) (HMM Forward) (cont’d)

• Define αj(t) as the probability that the model is in state ωj at time t and has generated the first t visible symbols. Using marginalization over the state at time t:

$$\alpha_j(t+1) = \sum_{i=1}^{c} P(v(1), v(2), \ldots, v(t), v(t+1), \omega(t)=\omega_i, \omega(t+1)=\omega_j) = \Big[\sum_{i=1}^{c} \alpha_i(t)\, a_{ij}\Big]\, b_{j\,v(t+1)}$$

Page 32:

Recursive computation of P(VT) (HMM Forward) (cont’d)

• Using the absorbing final state ω0:

$$P(V^T) = \alpha_0(T)$$

Page 33:

Recursive computation of P(VT) (HMM Forward) (cont’d)

HMM Forward algorithm:

  initialize t = 0, aij, bjk, visible sequence VT, α(0)
  for t = 1 to T:
      for j = 1 to c:
          αj(t) = bj v(t) Σi αi(t−1) aij    (if t = T, take j = 0)
  return P(VT) = α0(T)   (i.e., corresponds to state ω(T) = ω0)

• What is the computational complexity in this case?

O(c²T)
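
A minimal sketch of the forward recursion with the O(c²T) cost; it uses an explicit prior π and returns P(VT) by summing αj(T) over states rather than reading off α0(T) at an absorbing state, and the parameters and function names are illustrative. In practice the α values are scaled (or the recursion is run in log space) to avoid underflow on long sequences.

```python
import numpy as np

def forward(V, pi, A, B):
    """alpha[t, j] = P(v(1..t), omega(t)=j); returns (P(V^T), alpha)."""
    T, c = len(V), len(pi)
    alpha = np.zeros((T, c))
    alpha[0] = pi * B[:, V[0]]                      # initialization
    for t in range(1, T):
        # alpha_j(t) = b_{j v(t)} * sum_i alpha_i(t-1) a_ij
        alpha[t] = B[:, V[t]] * (alpha[t - 1] @ A)
    return alpha[-1].sum(), alpha                   # marginalize the final state

# Same illustrative parameters as before.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

p, alpha = forward([0, 2, 1], pi, A, B)
print(p)    # matches the brute-force result above
```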

Page 34:

Example

[Tables: transition probabilities aij and observation probabilities bjk over the states ω0, ω1, ω2, ω3; the numerical values are shown in the slide.]

Page 35:

Example (cont’d)

VT = v1 v3 v2 v0

[Figure: trellis showing the forward computation for this sequence, starting from the initial state.]

• Similarly for t = 2, 3, 4.
• Finally: P(VT) = α0(T) = 0.0011

Page 36:

Recursive computation of P(VT) (HMM backward)

[Trellis diagram: states ω(1),…, ω(t), ω(t+1),…, ω(T) with emissions v(1),…, v(T); βi(t) is attached to state ωi at time t and βj(t+1) to state ωj at time t+1, where βj(t+1) = P(v(t+2),…, v(T) / ω(t+1)=ωj).]

Page 37:

Recursive computation of P(VT) (HMM backward) (cont’d)

$$\beta_i(t) = P(v(t+1), v(t+2), \ldots, v(T) \mid \omega(t)=\omega_i) = \sum_{j=1}^{c} P(v(t+2), \ldots, v(T) \mid \omega(t+1)=\omega_j)\, P(v(t+1) \mid \omega(t+1)=\omega_j)\, P(\omega(t+1)=\omega_j \mid \omega(t)=\omega_i)$$

or

$$\beta_i(t) = \sum_{j=1}^{c} \beta_j(t+1)\, a_{ij}\, b_{j\,v(t+1)}$$

[Trellis diagram: transition from ωi at time t to ωj at time t+1.]

Page 38:

Recursive computation of P(VT) (HMM backward) (cont’d)

$$\beta_i(t) = \sum_{j=1}^{c} \beta_j(t+1)\, a_{ij}\, b_{j\,v(t+1)}$$
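
A matching sketch of the backward recursion, with βi(T) initialized to 1 instead of via an absorbing state; P(VT) = Σi πi bi v(1) βi(1) should then agree with the forward result. Parameters are the same illustrative values as before.

```python
import numpy as np

def backward(V, pi, A, B):
    """beta[t, i] = P(v(t+1..T) | omega(t)=i); returns (P(V^T), beta)."""
    T, c = len(V), len(pi)
    beta = np.ones((T, c))                          # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij * b_{j v(t+1)} * beta_j(t+1)
        beta[t] = A @ (B[:, V[t + 1]] * beta[t + 1])
    p = np.sum(pi * B[:, V[0]] * beta[0])           # P(V^T) from the first step
    return p, beta

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

p, beta = backward([0, 2, 1], pi, A, B)
print(p)    # same value as the forward algorithm
```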

Page 39:

Decoding

• Find the most probable sequence of hidden states.

• Use an optimality criterion - different optimality criteria lead to different solutions.

• Algorithm 1: choose the states ω(t) which are individually most likely.

Page 40:

Decoding – Algorithm 1

Page 41:

Decoding (cont’d)

• Algorithm 2: at each time step t, find the state that has the highest probability αi(t) (i.e., use forward algorithm with minor changes).

Page 42:

Decoding – Algorithm 2

Page 43:

Decoding – Algorithm 2 (cont’d)

Page 44:

Decoding – Algorithm 2 (cont’d)

• There is no guarantee that the path is a valid one.
• The path might imply a transition that is not allowed by the model.

Example: [Figure: a decoded path over time steps 0 to 4 containing a transition that is not allowed, since a32 = 0.]

Page 45:

Decoding (cont’d)

• Algorithm 3: find the single best sequence ωT by maximizing P(ωT/VT)

• This is the most widely used algorithm, known as the Viterbi algorithm.

Page 46:

Decoding – Algorithm 3

maximize: P(ωT/VT)

Page 47:

Decoding – Algorithm 3 (cont’d)

• Recursion: similar to the Forward Algorithm, except that it uses maximization over previous states instead of summation.
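
A minimal Viterbi sketch: the forward recursion with the sum over previous states replaced by a max, plus back-pointers so the single best state sequence can be recovered. Parameters and names are illustrative, not the lecture's.

```python
import numpy as np

def viterbi(V, pi, A, B):
    """Most probable hidden state sequence for the visible sequence V."""
    T, c = len(V), len(pi)
    delta = np.zeros((T, c))            # delta[t, j]: prob. of best path ending in j at t
    psi = np.zeros((T, c), dtype=int)   # back-pointers
    delta[0] = pi * B[:, V[0]]
    for t in range(1, T):
        # max over previous states instead of summation
        scores = delta[t - 1][:, None] * A      # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, V[t]]
    # backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

path, p = viterbi([0, 2, 1], pi, A, B)
print(path, p)
```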

Page 48:

Learning

• Determine the transition and emission probabilities aij and bjk from a set of training examples (i.e., observation sequences V1T, V2T,..., VnT).

• There is no known way to find the ML solution analytically.
  – It would be easy if we knew the hidden states.
  – Hidden variable problem → use the EM algorithm!

ML approach: $\max_{\theta} \; p(V_1^T, V_2^T, \ldots, V_n^T \mid \theta)$

Page 49:

Learning (cont’d)

• EM algorithm
  – Update aij and bjk iteratively to better explain the observed training sequences V: V1T, V2T,..., VnT.

• Expectation step: p(ωT/V, θ)

• Maximization step:

θ^(t+1) = argmax_θ E[ log p(ωT, VT / θ) / VT, θ^t ]

Page 50:

Learning (cont’d)

• Updating transition/emission probabilities:

$$\hat{a}_{ij} = \frac{E[\#\ \text{times it goes from } \omega_i \text{ to } \omega_j]}{E[\#\ \text{times it goes from } \omega_i \text{ to any other state}]}$$

$$\hat{b}_{jv} = \frac{E[\#\ \text{times it emits symbol } v \text{ while at state } \omega_j]}{E[\#\ \text{times it emits any other symbol while at state } \omega_j]}$$

Page 51:

Learning (cont’d)

• Define the probability of transitioning from ωi to ωj at step t given VT:

$$\gamma_{ij}(t) = P(\omega(t-1)=\omega_i,\ \omega(t)=\omega_j \mid V^T) = \frac{\alpha_i(t-1)\, a_{ij}\, b_{j\,v(t)}\, \beta_j(t)}{P(V^T)}$$

Expectation step

Page 52:

Learning (cont’d)

[Trellis diagram: αi(t−1) at time t−1, the transition aij with emission bj v(t), and βj(t) at time t.]

Page 53:

Learning (cont’d)

$$\hat{a}_{ij} = \frac{E[\#\ \text{times it goes from } \omega_i \text{ to } \omega_j]}{E[\#\ \text{times it goes from } \omega_i \text{ to any other state}]} = \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{t=1}^{T} \sum_{k} \gamma_{ik}(t)}$$

Maximization step

Page 54:

Learning (cont’d)

$$\hat{b}_{jv} = \frac{E[\#\ \text{times it emits symbol } v \text{ while at state } \omega_j]}{E[\#\ \text{times it emits any other symbol while at state } \omega_j]} = \frac{\sum_{t=1,\ v(t)=v}^{T} \sum_{k} \gamma_{jk}(t)}{\sum_{t=1}^{T} \sum_{k} \gamma_{jk}(t)}$$

Maximization step
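
A compact sketch of one Baum-Welch iteration built from the forward/backward quantities above: it computes γij(t) for t ≥ 1 and applies the two re-estimation formulas; the emission update uses the standard state posterior αj(t)βj(t)/P(VT), and π is re-estimated from the posterior at the first step. Data and parameters are made up, and this didactic version does no smoothing or scaling.

```python
import numpy as np

def forward(V, pi, A, B):
    T, c = len(V), len(pi)
    alpha = np.zeros((T, c))
    alpha[0] = pi * B[:, V[0]]
    for t in range(1, T):
        alpha[t] = B[:, V[t]] * (alpha[t - 1] @ A)
    return alpha

def backward(V, pi, A, B):
    T, c = len(V), len(pi)
    beta = np.ones((T, c))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, V[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(V, pi, A, B):
    """One EM update of (pi, A, B) from a single observation sequence V."""
    T, c = len(V), len(pi)
    alpha, beta = forward(V, pi, A, B), backward(V, pi, A, B)
    pV = alpha[-1].sum()

    # E-step: gamma_ij(t) = alpha_i(t-1) a_ij b_{j v(t)} beta_j(t) / P(V^T)
    gamma = np.zeros((T, c, c))
    for t in range(1, T):
        gamma[t] = alpha[t - 1][:, None] * A * B[:, V[t]][None, :] * beta[t][None, :] / pV

    # M-step: expected transition counts / expected visits
    A_new = gamma[1:].sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)

    # state posterior at every t, used for the emission update and the prior
    post = alpha * beta / pV                     # post[t, j] = P(omega(t)=j | V^T)
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = post[np.array(V) == k].sum(axis=0)
    B_new /= post.sum(axis=0)[:, None]

    pi_new = post[0]
    return pi_new, A_new, B_new

# Illustrative run with made-up parameters and data.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
V  = [0, 2, 1, 2, 2, 0, 1]

for _ in range(20):
    pi, A, B = baum_welch_step(V, pi, A, B)
print(np.round(A, 3))
print(np.round(B, 3))
```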

Page 55:

Practical Problems

• How do we decide on the number of states and the structure of the model?
  – Use domain knowledge; otherwise this is a very hard problem!

• What about the size of the observation sequence?
  – It should be sufficiently long to guarantee that all state transitions appear a sufficient number of times.
  – A large amount of training data is necessary to learn the HMM parameters.