The University of Manchester
COMP14112
Lecture 11
Markov Chains, HMMs and Speech
Revision
What have we covered in the speech lectures?
• Extracting features from raw speech data
• Classification and the naive Bayes classifier
• Training
• Sequence data
• Markov models
• Hidden Markov models
1. Features and data
• We have to represent sensory information in a useful way: sound waves and robot sensor data are two examples.
• Good “features” are domain specific, but we often end up with a vector of numbers called a feature vector or data point
• For speech we use MFCC features derived from segmented data
• Methods for processing the feature vectors are general
• Probabilistic approaches are popular – not the only approach, but certainly a leading one
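As a toy illustration of segmentation and feature extraction, here is a hedged sketch: the signal is split into overlapping frames and one feature is computed per frame. The log-energy feature and all numbers are hypothetical stand-ins, not actual MFCC computation.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (rows of a 2-D array)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def log_energy(frames):
    """One toy feature per frame: log of the frame's total energy."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

# A synthetic "signal": quiet noise followed by a louder tone.
rng = np.random.default_rng(0)
signal = np.concatenate([0.01 * rng.standard_normal(1024),
                         np.sin(2 * np.pi * 440 * np.arange(1024) / 8000.0)])
frames = frame_signal(signal)
features = log_energy(frames)   # one feature value per frame; MFCCs give a vector
```

A real MFCC front end would window each frame, take a spectrum, apply a mel filterbank and a cosine transform; the framing step is the part shown here.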
2. Classification
• Given a data point x, what class does it belong
to?
• You constructed probabilistic classifiers in Labs 2 and 3 to distinguish between “yes” and “no”
• You should know what makes a good classifier
– how would you assess its performance?
• Lots of applications – one of the key AI tools
2.1 Probabilistic classification
• For a data point x …
– Estimate the probability density p(x|Ci) for each class i
– Apply Bayes’ theorem:
  p(C1|x) = p(x|C1) p(C1) / Σi p(x|Ci) p(Ci)
– Apply classification rule: for two classes, p(C1|x) > 0.5 ⇒ Class of x = C1
• Multiple classes?
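The posterior calculation above can be sketched directly. This is a minimal example with two hypothetical 1-D classes modelled as normal densities; the means, variances and priors are made up for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical classes: C1 ~ N(0, 1), C2 ~ N(3, 1), equal priors.
means, variances, priors = [0.0, 3.0], [1.0, 1.0], [0.5, 0.5]

def posterior(x):
    """Bayes' theorem: p(Ci|x) = p(x|Ci)p(Ci) / sum_j p(x|Cj)p(Cj)."""
    joint = np.array([gaussian_pdf(x, m, v) * p
                      for m, v, p in zip(means, variances, priors)])
    return joint / joint.sum()

post = posterior(0.5)             # data point nearer the C1 mean
predicted = int(np.argmax(post))  # classification rule: pick the larger posterior
```

Note the denominator is the same for every class, so taking the argmax over classes only needs the numerators.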
2.2 Naïve Bayes classifier
• The naïve Bayes assumption can be used if data are vectors
– Feature vector components are conditionally independent given the class
– See lecture notes and Lab 2 for application to time averaged MFCC features derived from speech
– Examples sheet 6 for discrete valued data example
  p(x|Ci) = p(x1|Ci) p(x2|Ci) … p(xd|Ci)
2.3 1-D Classification
• You’ve seen some example classification rules
• For 1-D data, a single feature x
2.4 n-D Classification
• For 2-D data with feature vector x = [x1, x2]
3. Training
• When we fit a probability density or probabilistic model to data, we have an example of training
• In the Labs, you’ve seen data being used to estimate parameters of a normal distribution and a HMM
• The data that’s used for this is training data
• Training is fundamental to machine learning, a large and important area of research in CS
• NB the performance of the Lab classifier would have improved with more training data
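Fitting a normal density is the simplest case of training: the maximum-likelihood parameter estimates are just the sample mean and sample variance of the training data. The synthetic data here is a stand-in for the labelled feature values used in the Labs.

```python
import numpy as np

# Synthetic "training data": features drawn from N(5, 2^2), standing in
# for, say, the time-averaged MFCC values of one class.
rng = np.random.default_rng(1)
training_data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Training = parameter estimation: sample mean and sample variance.
mean_hat = training_data.mean()
var_hat = training_data.var()   # ML estimate (divides by N, not N-1)
```

With more training data the estimates get closer to the true parameters, which is why the Lab classifier would have improved with a larger training set.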
4. Sequence data
• In some cases the data arrives in a sequence
– We used speech data
• Other examples
– Video
– Sequential games
• Anything real-time
– DNA sequence data
5. Markov chains
• You should know
– Definition of a first order Markov process:
  p(st | s1, s2, …, st−1) = p(st | st−1)
– Parameters are transition probabilities
– Normalisation condition
– Can be represented as a directed graph or a transition matrix
– Can be unfolded in time to show all paths of a fixed length (Examples sheet 7 and past paper)
– How to do a simple probabilistic calculation
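The points above can be sketched concretely: a transition matrix, its normalisation check, and a simple probabilistic calculation. The three-state chain and its numbers are hypothetical.

```python
import numpy as np

# A first-order Markov chain over three states as a transition matrix:
# row i holds p(next state | current state = i).
states = ["a", "b", "c"]
T = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0]])
assert np.allclose(T.sum(axis=1), 1.0)  # normalisation: each row sums to 1

def sequence_prob(seq, initial):
    """p(s1, ..., sn) = p(s1) * prod_t p(st | st-1)."""
    idx = [states.index(s) for s in seq]
    p = initial[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= T[prev, cur]
    return p

p = sequence_prob(["a", "b", "c"], initial=np.array([1.0, 0.0, 0.0]))
```

Here p = 1.0 × p(b|a) × p(c|b) = 0.5 × 0.5 = 0.25, the kind of calculation the exercise on the next slide asks for.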
5. Markov chains
[Diagram: Markov chain with states START, hh, ay, b and END; transition probabilities shown include 0.5, 0.5, 0.5, 0.5 and 0.25, with some missing]
• What are the missing numbers?
• Unroll the model for exactly three time steps
• What is the probability that the sequence will be “hi”?
• What is the probability that a sequence of length 3 will be “hi”?
5. Markov chains
• Naïve application of probabilistic calculations is prohibitively slow in Markov chains
• In the lectures we saw a more efficient method based on recursion (Examples sheet 8)
• Don’t need to remember the recursive algorithm used there, but should be able to apply it to a similar example
• Computationally efficient algorithms are very important – imagine what happens when a problem is scaled up.
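The efficiency point can be made concrete. Enumerating all length-n paths costs O(Sⁿ) for S states; the recursive method keeps one running probability per state and updates it each step, costing O(n·S²). A sketch with a hypothetical two-state chain:

```python
import numpy as np

# Hypothetical two-state transition matrix.
T = np.array([[0.5, 0.5],
              [0.2, 0.8]])
alpha = np.array([1.0, 0.0])  # start in state 0 with certainty

# Recursion: alpha[j] <- sum_i alpha[i] * T[i, j], one update per time step,
# instead of summing over every individual path.
for _ in range(3):
    alpha = alpha @ T

# alpha now holds the probability of occupying each state after 3 steps.
```

The same recursion summed over all 2³ = 8 explicit paths gives identical numbers, but the path count grows exponentially while the recursion does not; this is what matters when a problem is scaled up.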
6. Hidden Markov models
• HMMs have two parts
– Markov chain model of states. The parameters of the Markov chain model are the transition probabilities: p(st|st−1)
– Emission probability distribution for feature
vectors: p(xt|st)
– In Lab 3 this is a normal density parameterised by
mean and variance for each component of x
6. Hidden Markov models
• In Lab 3 you explored three things
– Training: constructing an HMM from labelled data (what is labelled data?)
– Classification: using the Forward algorithm to calculate p(x1,x2,…,xT|Ci) and plugging it into Bayes’ theorem
– Decoding: using the Viterbi algorithm to find the most likely path through the hidden states
• You should be able to understand the tasks, but don’t have to recall details of the algorithms
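You don't have to recall the algorithm details, but the classification task looks like this: the Forward algorithm sums over all hidden state paths to give p(x1,…,xT|Ci). The two-state model below and its parameters are hypothetical stand-ins for a trained word HMM.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def forward_likelihood(obs, initial, T, means, variances):
    """Forward algorithm: p(x1,...,xT | model), summing over all state paths."""
    b = lambda x: gaussian_pdf(x, means, variances)  # emission probs per state
    alpha = initial * b(obs[0])
    for x in obs[1:]:
        alpha = (alpha @ T) * b(x)  # one recursion step per observation
    return alpha.sum()

# Hypothetical two-state model with well-separated emission means.
T = np.array([[0.9, 0.1], [0.1, 0.9]])
means, variances = np.array([0.0, 5.0]), np.array([1.0, 1.0])
initial = np.array([0.5, 0.5])

obs = [0.1, -0.2, 4.9, 5.2]  # looks like state 0 followed by state 1
lik = forward_likelihood(obs, initial, T, means, variances)
```

For classification, this likelihood is computed once per class model and plugged into Bayes' theorem, exactly as in the naive Bayes case but with sequence data.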
6. Hidden Markov models
• Simple example of decoding (Lab 3) is
removing the silence from speech signals
• The data without silence is easier to classify (as in Lab 2)
[Diagram: HMM with states START, sil, yes, no, sil, STOP; transition probabilities shown: 1.0, 0.96, 0.96, 0.02, 0.02, 0.01, 0.01, 0.99, 0.99, 0.04]
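Decoding with Viterbi can be sketched as follows. A hypothetical two-state model stands in for the silence/speech HMM above: state 0 is "silence" and state 1 is "speech"; frames decoded as silence can then be dropped.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def viterbi(obs, initial, T, means, variances):
    """Viterbi algorithm: most likely hidden state path (computed in log space)."""
    logT = np.log(T)
    delta = np.log(initial) + np.log(gaussian_pdf(obs[0], means, variances))
    back = []
    for x in obs[1:]:
        scores = delta[:, None] + logT        # scores[i, j]: best path ending i, then go to j
        back.append(scores.argmax(axis=0))    # remember the best predecessor of each state
        delta = scores.max(axis=0) + np.log(gaussian_pdf(x, means, variances))
    path = [int(delta.argmax())]
    for b in reversed(back):                  # backtrack from the best final state
        path.append(int(b[path[-1]]))
    return path[::-1]

# Hypothetical model: state 0 = "silence" (mean 0), state 1 = "speech" (mean 5).
T = np.array([[0.9, 0.1], [0.1, 0.9]])
means, variances = np.array([0.0, 5.0]), np.array([1.0, 1.0])
path = viterbi([0.1, 0.0, 4.8, 5.1, 0.2], np.array([0.5, 0.5]), T, means, variances)
# Frames with path value 0 would be removed before classification.
```

Unlike the Forward algorithm, which sums over paths, Viterbi maximises over them, so it returns a single most likely state sequence rather than a likelihood.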
7. Applications to speech
• Survey of tasks and performance (Examples
sheet 5)
• Segmentation and MFCC features
• Phonemes and phoneme HMMs
• Triphones
• Decoding speech
• Simple language models
Other applications
• These methods can be generalised to many applications
– TrueSkill Ranking system in Xbox live
  • http://research.microsoft.com/mlp/trueskill/
– Vision applications
  • http://videolectures.net/mlss09uk_blake_cv
– Speech
  • http://videolectures.net/mlss09uk_bishop_ibi
– Medicine
  • Probabilistic “graphical models” to update probability of illness given symptoms
– Biology
  • Standard way to determine gene function and location of genes in DNA sequence
How to revise
• Work through Example class sheets and past
paper(s)
• Make sure you understand the relationship between the labs and the notes
• Notes, lectures and example sheet solutions are on the course website
  http://intranet.cs.man.ac.uk/csonly/courses/COMP10412/