The University of Manchester
COMP14112
Lecture 11
Markov Chains, HMMs and Speech
Revision
What have we covered in the speech lectures?
• Extracting features from raw speech data
• Classification and the naive Bayes classifier
• Training
• Sequence data
• Markov models
• Hidden Markov models
1. Features and data
• We have to represent sensory information in a useful way: sound waves and robot sensor data are two examples.
• Good “features” are domain specific, but we often end up with a vector of numbers called a feature vector or data point
• For speech we use MFCC features derived from segmented data
• Methods for processing the feature vectors are general
• Probabilistic approaches are popular – not the only approach, but certainly a leading one
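As a toy illustration of segmentation and feature extraction, here is a hedged sketch: the signal is split into overlapping frames and one feature is computed per frame. The log-energy feature and all numbers are hypothetical stand-ins, not actual MFCC computation.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (rows of a 2-D array)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def log_energy(frames):
    """One toy feature per frame: log of the frame's total energy."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

# A synthetic "signal": quiet noise followed by a louder tone.
rng = np.random.default_rng(0)
signal = np.concatenate([0.01 * rng.standard_normal(1024),
                         np.sin(2 * np.pi * 440 * np.arange(1024) / 8000.0)])
frames = frame_signal(signal)
features = log_energy(frames)   # one feature value per frame; MFCCs give a vector
```

A real MFCC front end would window each frame, take a spectrum, apply a mel filterbank and a cosine transform; the framing step is the part shown here.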
2. Classification
• Given a data point x, what class does it belong
to?
• You constructed probabilistic classifiers in Labs 2 and 3 to distinguish between “yes” and “no”
• You should know what makes a good classifier
– how would you assess its performance?
• Lots of applications – one of the key AI tools
2.1 Probabilistic classification
• For a data point x …
– Estimate the probability density p(x|Ci) for each class i
– Apply Bayes’ theorem:
  p(C1|x) = p(x|C1) p(C1) / Σi p(x|Ci) p(Ci)
– Apply classification rule: for two classes, p(C1|x) > 0.5 ⇒ Class of x = C1
• Multiple classes?
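The posterior calculation above can be sketched directly. This is a minimal example with two hypothetical 1-D classes modelled as normal densities; the means, variances and priors are made up for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical classes: C1 ~ N(0, 1), C2 ~ N(3, 1), equal priors.
means, variances, priors = [0.0, 3.0], [1.0, 1.0], [0.5, 0.5]

def posterior(x):
    """Bayes' theorem: p(Ci|x) = p(x|Ci)p(Ci) / sum_j p(x|Cj)p(Cj)."""
    joint = np.array([gaussian_pdf(x, m, v) * p
                      for m, v, p in zip(means, variances, priors)])
    return joint / joint.sum()

post = posterior(0.5)             # data point nearer the C1 mean
predicted = int(np.argmax(post))  # classification rule: pick the larger posterior
```

Note the denominator is the same for every class, so taking the argmax over classes only needs the numerators.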
2.2 Naïve Bayes classifier
• The naïve Bayes assumption can be used if data are vectors
– Feature vector components are conditionally independent given the class
– See lecture notes and Lab 2 for application to time averaged MFCC features derived from speech
– Examples sheet 6 for discrete valued data example
  p(x|Ci) = p(x1|Ci) p(x2|Ci) … p(xd|Ci)
2.3 1-D Classification
• You’ve seen some example classification rules
• For 1-D data, a single feature x
2.4 n-D Classification
• For 2-D data with feature vector x = [x1, x2]
3. Training
• When we fit a probability density or probabilistic model to data, we have an example of training
• In the Labs, you’ve seen data being used to estimate parameters of a normal distribution and a HMM
• The data that’s used for this is training data
• Training is fundamental to machine learning, a large and important area of research in CS
• NB the performance of the Lab classifier would have improved with more training data
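Fitting a normal density is the simplest case of training: the maximum-likelihood parameter estimates are just the sample mean and sample variance of the training data. The synthetic data here is a stand-in for the labelled feature values used in the Labs.

```python
import numpy as np

# Synthetic "training data": features drawn from N(5, 2^2), standing in
# for, say, the time-averaged MFCC values of one class.
rng = np.random.default_rng(1)
training_data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Training = parameter estimation: sample mean and sample variance.
mean_hat = training_data.mean()
var_hat = training_data.var()   # ML estimate (divides by N, not N-1)
```

With more training data the estimates get closer to the true parameters, which is why the Lab classifier would have improved with a larger training set.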
4. Sequence data
• In some cases the data arrives in a sequence
– We used speech data
• Other examples
– Video
– Sequential games
• Anything real-time
– DNA sequence data
5. Markov chains
• You should know
– Definition of a first order Markov process:
  p(st | s1, s2, …, st−1) = p(st | st−1)
– Parameters are transition probabilities
– Normalisation condition
– Can be represented as a directed graph or a transition matrix
– Can be unfolded in time to show all paths of a fixed length (Examples sheet 7 and past paper)
– How to do a simple probabilistic calculation
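The points above can be sketched concretely: a transition matrix, its normalisation check, and a simple probabilistic calculation. The three-state chain and its numbers are hypothetical.

```python
import numpy as np

# A first-order Markov chain over three states as a transition matrix:
# row i holds p(next state | current state = i).
states = ["a", "b", "c"]
T = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0]])
assert np.allclose(T.sum(axis=1), 1.0)  # normalisation: each row sums to 1

def sequence_prob(seq, initial):
    """p(s1, ..., sn) = p(s1) * prod_t p(st | st-1)."""
    idx = [states.index(s) for s in seq]
    p = initial[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= T[prev, cur]
    return p

p = sequence_prob(["a", "b", "c"], initial=np.array([1.0, 0.0, 0.0]))
```

Here p = 1.0 × p(b|a) × p(c|b) = 0.5 × 0.5 = 0.25, the kind of calculation the exercise on the next slide asks for.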
5. Markov chains
[Diagram: Markov chain with states START, hh, ay, b and END; transition probabilities shown include 0.5, 0.5, 0.5, 0.5 and 0.25, with some missing]
• What are the missing numbers?
• Unroll the model for exactly three time steps
• What is the probability that the sequence will be “hi”?
• What is the probability that a sequence of length 3 will be “hi”?
5. Markov chains
• Naïve application of probabilistic calculations is prohibitively slow in Markov chains
• In the lectures we saw a more efficient method based on recursion (Examples sheet 8)
• Don’t need to remember the recursive algorithm used there, but should be able to apply it to a similar example
• Computationally efficient algorithms are very important – imagine what happens when a problem is scaled up.
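The efficiency point can be made concrete. Enumerating all length-n paths costs O(Sⁿ) for S states; the recursive method keeps one running probability per state and updates it each step, costing O(n·S²). A sketch with a hypothetical two-state chain:

```python
import numpy as np

# Hypothetical two-state transition matrix.
T = np.array([[0.5, 0.5],
              [0.2, 0.8]])
alpha = np.array([1.0, 0.0])  # start in state 0 with certainty

# Recursion: alpha[j] <- sum_i alpha[i] * T[i, j], one update per time step,
# instead of summing over every individual path.
for _ in range(3):
    alpha = alpha @ T

# alpha now holds the probability of occupying each state after 3 steps.
```

The same recursion summed over all 2³ = 8 explicit paths gives identical numbers, but the path count grows exponentially while the recursion does not; this is what matters when a problem is scaled up.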
6. Hidden Markov models
• HMMs have two parts
– Markov chain model of states. The parameters of the Markov chain model are the transition probabilities: p(st|st−1)
– Emission probability distribution for feature
vectors: p(xt|st)
– In Lab 3 this is a normal density parameterised by
mean and variance for each component of x
6. Hidden Markov models
• In Lab 3 you explored three things
– Training: constructing an HMM from labelled data (what is labelled data?)
– Classification: using the Forward algorithm to calculate p(x1,x2,…,xT|Ci) and plugging it into Bayes’ theorem
– Decoding: using the Viterbi algorithm to find the most likely path through the hidden states
• You should be able to understand the tasks, but don’t have to recall details of the algorithms
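You don't have to recall the algorithm details, but the classification task looks like this: the Forward algorithm sums over all hidden state paths to give p(x1,…,xT|Ci). The two-state model below and its parameters are hypothetical stand-ins for a trained word HMM.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def forward_likelihood(obs, initial, T, means, variances):
    """Forward algorithm: p(x1,...,xT | model), summing over all state paths."""
    b = lambda x: gaussian_pdf(x, means, variances)  # emission probs per state
    alpha = initial * b(obs[0])
    for x in obs[1:]:
        alpha = (alpha @ T) * b(x)  # one recursion step per observation
    return alpha.sum()

# Hypothetical two-state model with well-separated emission means.
T = np.array([[0.9, 0.1], [0.1, 0.9]])
means, variances = np.array([0.0, 5.0]), np.array([1.0, 1.0])
initial = np.array([0.5, 0.5])

obs = [0.1, -0.2, 4.9, 5.2]  # looks like state 0 followed by state 1
lik = forward_likelihood(obs, initial, T, means, variances)
```

For classification, this likelihood is computed once per class model and plugged into Bayes' theorem, exactly as in the naive Bayes case but with sequence data.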
6. Hidden Markov models
• Simple example of decoding (Lab 3) is
removing the silence from speech signals
• The data without silence is easier to classify (as in Lab 2)
[Diagram: HMM with states START, sil, yes, no, sil, STOP; transition probabilities shown: 1.0, 0.96, 0.96, 0.02, 0.02, 0.01, 0.01, 0.99, 0.99, 0.04]
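Decoding with Viterbi can be sketched as follows. A hypothetical two-state model stands in for the silence/speech HMM above: state 0 is "silence" and state 1 is "speech"; frames decoded as silence can then be dropped.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def viterbi(obs, initial, T, means, variances):
    """Viterbi algorithm: most likely hidden state path (computed in log space)."""
    logT = np.log(T)
    delta = np.log(initial) + np.log(gaussian_pdf(obs[0], means, variances))
    back = []
    for x in obs[1:]:
        scores = delta[:, None] + logT        # scores[i, j]: best path ending i, then go to j
        back.append(scores.argmax(axis=0))    # remember the best predecessor of each state
        delta = scores.max(axis=0) + np.log(gaussian_pdf(x, means, variances))
    path = [int(delta.argmax())]
    for b in reversed(back):                  # backtrack from the best final state
        path.append(int(b[path[-1]]))
    return path[::-1]

# Hypothetical model: state 0 = "silence" (mean 0), state 1 = "speech" (mean 5).
T = np.array([[0.9, 0.1], [0.1, 0.9]])
means, variances = np.array([0.0, 5.0]), np.array([1.0, 1.0])
path = viterbi([0.1, 0.0, 4.8, 5.1, 0.2], np.array([0.5, 0.5]), T, means, variances)
# Frames with path value 0 would be removed before classification.
```

Unlike the Forward algorithm, which sums over paths, Viterbi maximises over them, so it returns a single most likely state sequence rather than a likelihood.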
7. Applications to speech
• Survey of tasks and performance (Examples
sheet 5)
• Segmentation and MFCC features
• Phonemes and phoneme HMMs
• Triphones
• Decoding speech
• Simple language models
Other applications
• These methods can be generalised to many applications
– TrueSkill Ranking system in Xbox live
  • http://research.microsoft.com/mlp/trueskill/
– Vision applications
  • http://videolectures.net/mlss09uk_blake_cv
– Speech
  • http://videolectures.net/mlss09uk_bishop_ibi
– Medicine
  • Probabilistic “graphical models” to update probability of illness given symptoms
– Biology
  • Standard way to determine gene function and location of genes in DNA sequence
How to revise
• Work through Example class sheets and past
paper(s)
• Make sure you understand the relationship between the labs and the notes
• Notes, lectures and example sheet solutions are on the course website
  http://intranet.cs.man.ac.uk/csonly/courses/COMP10412/