7- Speech Recognition (Cont'd)


Upload: adam-kirkland

Post on 15-Mar-2016



TRANSCRIPT

Page 1: 7- Speech Recognition (Cont’d)

7-Speech Recognition (Cont'd)

HMM Calculating Approaches
Neural Components
Three Basic HMM Problems
Viterbi Algorithm
State Duration Modeling
Training in HMM

Page 2: 7- Speech Recognition (Cont’d)

Speech Recognition Concepts

Speech recognition is the inverse of speech synthesis:

Speech Synthesis: Text → NLP → Phone Sequence → Speech Processing → Speech
Speech Recognition / Understanding: Speech → Speech Processing → Phone Sequence → NLP → Text

Page 3: 7- Speech Recognition (Cont’d)

Speech Recognition Approaches

Bottom-Up Approach
Top-Down Approach
Blackboard Approach

Page 4: 7- Speech Recognition (Cont’d)

Bottom-Up Approach

Processing stages: Signal Processing → Feature Extraction → Segmentation → Recognized Utterance

Knowledge sources feeding these stages: Voiced/Unvoiced/Silence decisions, Sound Classification Rules, Phonotactic Rules, Lexical Access, and the Language Model.

Page 5: 7- Speech Recognition (Cont’d)

Top-Down Approach

Feature Analysis → Unit Matching System → Lexical Hypothesis → Syntactic Hypothesis → Semantic Hypothesis → Utterance Verifier/Matcher → Recognized Utterance

Knowledge sources: Inventory of speech recognition units, Word Dictionary, Grammar, and Task Model.

Page 6: 7- Speech Recognition (Cont’d)

Blackboard Approach

Environmental, Acoustic, Lexical, Syntactic, and Semantic Processes all communicate through a shared Blackboard.

Page 7: 7- Speech Recognition (Cont’d)

An overall view of a speech recognition system, combining bottom-up and top-down processing. From Ladefoged 2001.

Page 8: 7- Speech Recognition (Cont’d)

Recognition Theories

Articulatory-Based Recognition: uses the articulatory system for recognition. This theory has been the most successful so far.
Auditory-Based Recognition: uses the auditory system for recognition.
Hybrid-Based Recognition: a hybrid of the above theories.
Motor Theory: models the intended gestures of the speaker.

Page 9: 7- Speech Recognition (Cont’d)

Recognition Problem

We have a sequence of acoustic symbols and want to find the words expressed by the speaker.

Solution: find the most probable word sequence given the acoustic symbols.

Page 10: 7- Speech Recognition (Cont’d)

Recognition Problem

A: acoustic symbols
W: word sequence

We should find $\hat{w}$ such that $P(\hat{w}|A) = \max_w P(w|A)$.

Page 11: 7- Speech Recognition (Cont’d)

Bayes Rule

$P(x|y)\,P(y) = P(x,y)$

$P(x|y) = \dfrac{P(y|x)\,P(x)}{P(y)}$

$P(w|A) = \dfrac{P(A|w)\,P(w)}{P(A)}$

Page 12: 7- Speech Recognition (Cont’d)

Bayes Rule (Cont'd)

$\hat{w} = \arg\max_w P(w|A) = \arg\max_w \dfrac{P(A|w)\,P(w)}{P(A)}$

Since $P(A)$ does not depend on $w$, it can be dropped from the maximization:

$\hat{w} = \arg\max_w P(A|w)\,P(w)$
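A toy sketch of the decision rule ŵ = argmax_w P(A|w) P(w). The candidate words and all probability values below are invented purely for illustration:

```python
# Toy illustration of the Bayes decision rule: pick the word maximizing
# P(A|w) * P(w). P(A) is the same for every candidate, so it is dropped.
likelihood = {"speech": 0.10, "beach": 0.30, "peach": 0.25}   # P(A|w), made up
prior      = {"speech": 0.50, "beach": 0.10, "peach": 0.05}   # P(w), made up

best = max(likelihood, key=lambda w: likelihood[w] * prior[w])
print(best)  # speech (0.10 * 0.50 = 0.05 beats 0.03 and 0.0125)
```

Note how the language model prior can overturn the acoustic evidence: "beach" has the highest likelihood, but "speech" wins once the prior is applied.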

Page 13: 7- Speech Recognition (Cont’d)

Simple Language Model

$w = w_1 w_2 w_3 \ldots w_n$

$P(w) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \ldots w_{i-1})$

Computing this probability directly is very difficult and needs a very large database, so trigram and bigram models are used instead.

Page 14: 7- Speech Recognition (Cont’d)

Simple Language Model (Cont'd)

Trigram: $P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2})$

Bigram: $P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$

Monogram: $P(w) = \prod_{i=1}^{n} P(w_i)$
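A minimal sketch of a bigram model estimated by relative frequency. The tiny corpus is invented for the example; a real model needs far more data:

```python
from collections import Counter

# Bigram model: P(w) = prod_i P(w_i | w_{i-1}), estimated from counts.
corpus = "the cat sat on the mat the cat ran".split()   # toy corpus

unigram = Counter(corpus)                  # c(w)
bigram = Counter(zip(corpus, corpus[1:]))  # c(w_{i-1}, w_i)

def p_bigram(w, prev):
    """Maximum-likelihood estimate P(w | prev) = c(prev, w) / c(prev)."""
    return bigram[(prev, w)] / unigram[prev]

def sentence_prob(words):
    p = unigram[words[0]] / len(corpus)    # unigram for the first word
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_bigram("cat", "the"))  # 2/3: "the" occurs 3 times, "the cat" twice
```

The same counting scheme extends to trigrams by counting triples instead of pairs.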

Page 15: 7- Speech Recognition (Cont’d)

Simple Language Model (Cont'd)

$P(w_3 \mid w_1 w_2)$

Computing method:
$P(w_3 \mid w_1 w_2) = \dfrac{\text{number of occurrences of } w_1 w_2 w_3}{\text{total number of occurrences of } w_1 w_2}$

Ad hoc method (with interpolation weights $\alpha_i$):
$P(w_3 \mid w_1 w_2) = \alpha_1 f(w_3 \mid w_1 w_2) + \alpha_2 f(w_3 \mid w_2) + \alpha_3 f(w_3)$
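The ad hoc method can be sketched as a weighted sum of the three relative frequencies. The weights and frequency values below are assumptions for illustration; the weights should sum to 1:

```python
# Linear interpolation of trigram, bigram, and unigram relative frequencies:
# P(w3|w1 w2) = a1*f(w3|w1 w2) + a2*f(w3|w2) + a3*f(w3).
a1, a2, a3 = 0.6, 0.3, 0.1   # made-up interpolation weights, sum to 1

def interpolated(f_tri, f_bi, f_uni):
    return a1 * f_tri + a2 * f_bi + a3 * f_uni

# Even when the trigram was never seen (f_tri = 0), the shorter-context
# frequencies still give the word a nonzero probability:
p = interpolated(0.0, 0.2, 0.05)
print(p)  # close to 0.065
```

This is why interpolation helps: pure trigram counts assign probability zero to any triple absent from the training data.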

Page 16: 7- Speech Recognition (Cont’d)

7-Speech Recognition

Speech Recognition Concepts
Speech Recognition Approaches
Recognition Theories
Bayes Rule
Simple Language Model
P(A|W) Network Types

Page 17: 7- Speech Recognition (Cont’d)

From Ladefoged 2001

Page 18: 7- Speech Recognition (Cont’d)

P(A|W) Computing Approaches

Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Hybrid Systems

Page 19: 7- Speech Recognition (Cont’d)

Dynamic Time Warping Method (DTW)

To obtain a global distance between two speech patterns, a time alignment must be performed.

Example: a time alignment path between a template pattern "SPEECH" and a noisy input "SsPEEhH".
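The alignment can be sketched with the standard DTW recurrence. This toy version aligns character sequences with a 0/1 local distance; a real recognizer aligns frames of acoustic feature vectors with a continuous distance:

```python
# Minimal DTW sketch: D[i][j] = local cost + min of the three predecessor
# cells (diagonal step, or warping along either axis). The returned value
# is the global distance of the best time-alignment path.
def dtw(template, inp):
    n, m = len(template), len(inp)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if template[i - 1] == inp[j - 1] else 1
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

print(dtw("SPEECH", "SsPEEhH"))  # 2: the noisy 's' and 'h' each cost one mismatch
```

The warping moves let the alignment absorb timing differences, so only the genuinely mismatched symbols contribute to the global distance.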

Page 20: 7- Speech Recognition (Cont’d)

Recognition Tasks

Isolated Word Recognition (IWR) and Continuous Speech Recognition (CSR)
Speaker-Dependent and Speaker-Independent
Vocabulary Size:
Small: <20
Medium: >100, <1000
Large: >1000, <10000
Very Large: >10000

Page 21: 7- Speech Recognition (Cont’d)

Error Production Factors

Prosody (recognition should be prosody-independent)
Noise (noise should be prevented)
Spontaneous Speech

Page 22: 7- Speech Recognition (Cont’d)

Artificial Neural Network

Simple computation element of a neural network: inputs $x_0, x_1, \ldots, x_{N-1}$ are weighted by $w_0, w_1, \ldots, w_{N-1}$, summed, and passed through a nonlinearity $\varphi$:

$y = \varphi\!\left(\sum_{i=0}^{N-1} w_i x_i\right)$
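The computation element above can be sketched in a few lines. The choice of sigmoid for the nonlinearity, and the weight and input values, are assumptions for illustration:

```python
import math

# One computation element: y = phi( sum_{i=0}^{N-1} w_i * x_i ),
# using a sigmoid as the nonlinearity phi.
def neuron(weights, inputs):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))   # phi = sigmoid, squashes to (0, 1)

y = neuron([0.5, -0.25, 0.1], [1.0, 2.0, 3.0])   # made-up weights and inputs
print(y)   # sigmoid of 0.5 - 0.5 + 0.3 = 0.3
```

A network is built by wiring many such elements together, with the outputs of one layer serving as the inputs of the next.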

Page 23: 7- Speech Recognition (Cont’d)

Artificial Neural Network (Cont'd)

Neural Network Types:
Perceptron
Time Delay
Time Delay Neural Network (TDNN) Computational Element

Page 24: 7- Speech Recognition (Cont’d)

Artificial Neural Network (Cont'd)

Single-Layer Perceptron: inputs $x_0, \ldots, x_{N-1}$ map directly to outputs $y_0, \ldots, y_{M-1}$.
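A single-layer perceptron is just M of the computation elements sharing the same N inputs. The sizes, weights, and step activation below are assumptions for the sketch:

```python
# Single-layer perceptron: each of the M outputs applies its own weight
# vector to the same N inputs (step activation here).
def perceptron_layer(W, x):
    """W is an M x N weight matrix; returns M binary outputs."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(row, x)) > 0 else 0
            for row in W]

W = [[0.5, -1.0],    # weights for output y0 (made up)
     [1.0,  1.0]]    # weights for output y1 (made up)
print(perceptron_layer(W, [1.0, 1.0]))  # [0, 1]
```

Stacking such layers, with a nonlinearity between them, gives the three-layer perceptron shown on the next slide.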

Page 25: 7- Speech Recognition (Cont’d)

Artificial Neural Network (Cont'd)

Three-Layer Perceptron

Page 26: 7- Speech Recognition (Cont’d)

Hybrid Methods

Hybrid Neural Network and Matched Filter for Recognition: speech acoustic features pass through delays into a pattern classifier, which produces the output units.

Page 27: 7- Speech Recognition (Cont’d)

Neural Network Properties

The system is simple, but very iterative.
It does not determine a specific structure.
Despite its simplicity, the results are good.
The training set is large, so training should be done offline.
Accuracy is relatively good.

Page 28: 7- Speech Recognition (Cont’d)

Hidden Markov Model

Observations: $O = O_1, O_2, \ldots, O_t$
States in time: $q = q_1, q_2, \ldots, q_t$
All states: $s_1, s_2, \ldots$

$a_{ij}$: transition probability from state $S_i$ to state $S_j$.
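The transition notation can be sketched with a small matrix. The two-state matrix and uniform start distribution below are invented for illustration; each row of A must sum to 1:

```python
# Transition probabilities a_ij = P(q_{t+1} = s_j | q_t = s_i) for a
# made-up two-state model (rows sum to 1).
A = [[0.7, 0.3],    # a_11, a_12
     [0.4, 0.6]]    # a_21, a_22

def path_prob(states, start=(0.5, 0.5)):
    """Probability of a state sequence (0-based state indices) under A."""
    p = start[states[0]]
    for i, j in zip(states, states[1:]):
        p *= A[i][j]
    return p

print(path_prob([0, 0, 1]))  # = 0.5 * a_11 * a_12 = 0.5 * 0.7 * 0.3
```

With hidden states, the observed O_t gives only indirect evidence about q_t, which is what the three basic HMM problems on the following slides address.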