Markov Models - Seminar
Practical issues in computing the Baum-Welch re-estimation formulae
Choosing the number of hidden nodes
Generative modelling and stochastic sampling
Coursework
Computing the BW parameters (1)
Choose λ = (π, A, B) at random (subject to probability constraints: π and each row of A and B must sum to 1)

π (1 × N, over the N hidden states):

       Sunny  Rain  Wet
π      0.6    0.3   0.1

A (N × N hidden-state transitions):

       Sunny  Rain  Wet
Sunny  0.6    0.3   0.1
Rain   0.1    0.6   0.3
Wet    0.2    0.2   0.6

B (N hidden states × M observable states):

       Red   Green  Blue
Sunny  0.8   0.1    0.1
Rain   0.1   0.8    0.1
Wet    0.2   0.2    0.6
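The random initialisation above can be sketched in a few lines of NumPy: draw every entry uniformly at random, then normalise π and each row of A and B so the probability constraints hold. The function name `random_hmm` is illustrative, not from the seminar.

```python
import numpy as np

def random_hmm(n_hidden, n_obs, seed=0):
    """Draw lambda = (pi, A, B) at random, then normalise so that pi and
    each row of A and B sum to 1 (the probability constraints)."""
    rng = np.random.default_rng(seed)
    pi = rng.random(n_hidden)               # 1 x N initial-state weights
    A = rng.random((n_hidden, n_hidden))    # N x N hidden-state transitions
    B = rng.random((n_hidden, n_obs))       # N x M emission probabilities
    pi /= pi.sum()
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```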
Computing the BW parameters (2)
We want to be able to calculate:

ξ_t(i, j) = α_t(i) · a_ij · b_j(O_{t+1}) · β_{t+1}(j) / P(O | λ)

α_t(i) comes from forwards evaluation
β_{t+1}(j) comes from backwards evaluation
Given O
Have initial values for a_ij and b_j(O_{t+1})
Can calculate P(O | λ) but do we need to?
Computing the BW parameters (2)
Can calculate P(O | λ) but do we need to?

ξ_t(i, j) = α_t(i) · a_ij · b_j(O_{t+1}) · β_{t+1}(j) / P(O | λ)

P(O | λ) is a normalising constant and is the same value for all ξ_t(i, j) for any individual iteration
Can ignore P(O | λ) if we re-normalise λ = (π, A, B) at the end of the re-estimation
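A minimal NumPy sketch of this idea: compute the numerator α_t(i) · a_ij · b_j(O_{t+1}) · β_{t+1}(j) without ever dividing by P(O | λ). Since the numerator summed over all (i, j) equals P(O | λ), a variant of the slide's trick is to re-normalise each ξ_t directly; re-normalising λ at the end of the re-estimation is equivalent. The function name `xi_unnormalised` is an assumption for illustration.

```python
import numpy as np

def xi_unnormalised(alpha, beta, A, B, obs):
    """xi_t(i,j) proportional to alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j).
    Division by P(O|lambda) is skipped: it is the same constant for every
    (i, j) at a given iteration, so normalising afterwards is equivalent."""
    T, N = alpha.shape
    xi = np.empty((T - 1, N, N))
    for t in range(T - 1):
        # element (i, j): alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()   # re-normalise instead of dividing by P(O|lambda)
    return xi
```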
Computing the BW parameters (3)
Recall the Scaling Factor SF_t from the previous seminar …
Intended to prevent arithmetic underflow when calculating the α and β trellises
Calculate SF_t using the α trellis and apply the same factors to the β trellis
SF_t for α_t = SF_t for β_{t+1} (think why …)

SF_t = Σ_{i=1..N} α_t(i)
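A sketch of the scaled forward pass, assuming the definition above (SF_t is the sum of the unscaled α_t values, divided out at each step). A useful by-product is that log P(O | λ) can be recovered as the sum of log SF_t, so the probability itself never has to be represented and cannot underflow. The function name `scaled_forward` is illustrative.

```python
import numpy as np

def scaled_forward(pi, A, B, obs):
    """Forward pass with scaling: SF_t = sum_i alpha_t(i) is divided out of
    each trellis column to prevent arithmetic underflow. The same SF_t
    values are then reused when scaling the beta trellis."""
    T, N = len(obs), len(pi)
    alpha = np.empty((T, N))
    sf = np.empty(T)
    alpha[0] = pi * B[:, obs[0]]
    sf[0] = alpha[0].sum()
    alpha[0] /= sf[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        sf[t] = alpha[t].sum()
        alpha[t] /= sf[t]
    log_prob = np.log(sf).sum()   # log P(O|lambda), recovered from the scale factors
    return alpha, sf, log_prob
```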
Computing the BW parameters (4)
Everything else should now be straightforward …
Except … how to choose the number of hidden nodes N
Choosing N (1)
What is the actual complexity of the underlying task?
Too many nodes – over-learning and lack of generalisation capability (model learns precisely only those patterns that occur in O)
Too few nodes – over-generalisation (model has not adequately captured the dynamics of the underlying task)
Same problem as deciding how many hidden nodes there should be for a neural network
Choosing N (2)

[Figure: log likelihood per symbol plotted against N. The curve rises steeply at first, reaches an optimal point, then flattens – little additional performance with increasing N.]
Generative modelling (1)
OK, so now we know what a (Hidden) Markov Model is, and how to learn its parameters, but how is this all relevant to Cognitive/Computer Vision?
– (H)MMs are generative models
– Perception guided by expectation
– Visual control
– An example visual task …
Generative modelling (2)
Simple case study: Visual task

[Figure: five observable states, numbered 1–5, arranged spatially]

Training sequence: {3,3,2,2,2,2,5,5,4,4,3,3,1,1,1}
Generative modelling (3)
Example sequence 1 generated by HMM: 5 observed states & 14 hidden states
Generative modelling (4)
Example sequence 2 generated by HMM: 5 observed states & 14 hidden states
Stochastic sampling
To generate a sequence from λ = (π, A, B):
Select starting state according to the π distribution
FOR t = 1:T
– Generate h_t (a 1×N distribution over hidden states) using A (part of the α trellis computation)
– Select a state q according to the h_t distribution
– Generate o_t (a 1×M distribution over output symbols) using q and B
– Select an output symbol O_t according to the o_t distribution
END_FOR
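The loop above can be sketched directly with NumPy's categorical sampling: pick a start state from π, then repeatedly emit a symbol from the current state's row of B and transition according to its row of A. The function name `sample_sequence` is illustrative, not from the seminar.

```python
import numpy as np

def sample_sequence(pi, A, B, T, seed=0):
    """Stochastic sampling from lambda = (pi, A, B): choose a starting
    state from pi, then at each step emit a symbol from row q of B and
    move to the next hidden state using row q of A."""
    rng = np.random.default_rng(seed)
    q = rng.choice(len(pi), p=pi)                        # starting state from pi
    outputs = []
    for _ in range(T):
        outputs.append(int(rng.choice(B.shape[1], p=B[q])))  # symbol ~ b_q(.)
        q = rng.choice(len(pi), p=A[q])                  # next state ~ a_q.
    return outputs
```

Running this with the λ from the first slide yields weather-coloured symbol sequences whose statistics mirror the training dynamics, which is what makes generative models useful for expectation-driven perception.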