74.419 Artificial Intelligence 2004 Speech & Natural Language Processing
• Speech Recognition
  • acoustic signal as input
  • conversion into written words
• Natural Language Processing
  • written text as input
  • sentences (well-formed or not)
• Spoken Language Understanding
  • analysis of spoken language (transcribed speech)
Speech & Natural Language Processing
Areas in Speech Recognition
• Signal Processing
• Phonetics
• Word Recognition
Areas in Natural Language Processing
• Morphology
• Grammar & Parsing (syntactic analysis)
• Semantics
• Pragmatics
• Discourse / Dialogue
• Spoken Language Understanding
Speech Production & Reception
Sound and Hearing
• change in air pressure → sound wave
• reception through inner ear membrane / microphone
• break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → Frequency Spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
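The break-up into frequency components can be sketched in a few lines of Python. This is a minimal illustration, not a production analyzer: it uses a textbook recursive radix-2 FFT, and the test signal (two sine tones at 250 Hz and 750 Hz, sampled at 8000 Hz) is a hypothetical stand-in for a recorded sound wave.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def magnitude_spectrum(samples, sample_rate):
    """Return (frequency in Hz, magnitude) pairs for the bins from 0 Hz up to Nyquist."""
    n = len(samples)
    spectrum = fft(samples)
    return [(k * sample_rate / n, abs(spectrum[k])) for k in range(n // 2 + 1)]

# Hypothetical test signal: a 250 Hz tone plus a weaker 750 Hz tone.
# Both fall exactly on FFT bins (8000 / 1024 = 7.8125 Hz per bin).
SAMPLE_RATE = 8000
N = 1024
signal = [
    math.sin(2 * math.pi * 250 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 750 * t / SAMPLE_RATE)
    for t in range(N)
]

spectrum = magnitude_spectrum(signal, SAMPLE_RATE)
peak_hz, peak_mag = max(spectrum, key=lambda pair: pair[1])
print(f"strongest component: {peak_hz} Hz")
```

The strongest bin corresponds to the louder tone; in real speech the same analysis is applied to short time frames rather than to the whole signal.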
Speech Recognition
Acoustic / sound wave
→ Filtering, Sampling; Spectral Analysis (FFT)
→ Frequency Spectrum
→ Features (Phonemes; Context)
→ Phoneme Recognition: HMM, Neural Networks
→ Phonemes
→ Grammar or Statistics
→ Phoneme Sequences / Words
→ Grammar or Statistics for likely word sequences
→ Word Sequence / Sentence
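As a rough sketch of the HMM-based phoneme recognition stage, the following Python code runs Viterbi decoding over a toy Hidden Markov Model. Everything specific here is an assumption made for illustration: the three phoneme states ("k", "ae", "t"), the coarse acoustic symbols ("burst", "voiced", "silence"), and all probabilities are invented, not taken from a trained recognizer.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, log-probability of that sequence)."""
    # best[t][s] = (log-prob of the best path ending in state s at time t, previous state)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    log_prob = best[-1][last][0]
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, log_prob

# Hypothetical toy model (all numbers invented for illustration).
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.5, "t": 0.4},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"burst": 0.7, "voiced": 0.1, "silence": 0.2},
    "ae": {"burst": 0.1, "voiced": 0.8, "silence": 0.1},
    "t": {"burst": 0.5, "voiced": 0.1, "silence": 0.4},
}

# One coarse acoustic symbol per time frame.
observations = ["burst", "voiced", "voiced", "burst"]
path, log_prob = viterbi(observations, states, start_p, trans_p, emit_p)

# Collapse repeated states to obtain the phoneme sequence.
phonemes = [p for i, p in enumerate(path) if i == 0 or p != path[i - 1]]
print(path, phonemes)
```

The decoded phoneme sequence would then be matched against a lexicon, and a grammar or statistical model would rank the resulting word sequences.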
Signal Processing / Analysis
Speech Signal
Analog-Digital Conversion of acoustic signal → Sampling in Time Frames (“windows”)
Characteristics of a Speech Signal
• formants: strong frequency components; characterize e.g. vowels, gender of speaker; visible as dark stripes in the spectrum
• pitch: fundamental frequency (baseline for the higher-frequency harmonics that the formants emphasize)
• place of articulation (recognition model based on a model of the vocal tract)
• change in frequency distribution
Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)
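The analysis of a sampled signal in time frames, and the notion of pitch as fundamental frequency, can be sketched as follows. This is a simplified illustration under stated assumptions: the frame length (50 ms), hop size (20 ms), Hamming window, and autocorrelation-based pitch picker are common textbook choices rather than anything prescribed by the slides, and the input is a synthetic 200 Hz tone with two harmonics instead of recorded speech.

```python
import math

def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, each weighted by a Hamming window."""
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
        for i in range(frame_len)
    ]
    return [
        [signal[start + i] * window[i] for i in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame by picking the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def autocorr(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

# Hypothetical voiced sound: 200 Hz fundamental plus two weaker harmonics.
SAMPLE_RATE = 8000
F0 = 200.0
signal = [
    math.sin(2 * math.pi * F0 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * F0 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 3 * F0 * t / SAMPLE_RATE)
    for t in range(1600)  # 0.2 s of audio
]

frame_list = frames(signal, frame_len=400, hop=160)  # 50 ms frames, 20 ms hop
pitches = [estimate_pitch(frame, SAMPLE_RATE) for frame in frame_list]
print(len(frame_list), "frames; pitch of first frame:", round(pitches[0], 1), "Hz")
```

Each frame should yield an estimate close to the 200 Hz fundamental; formants would show up separately, as broad peaks in the spectrum of each frame.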
Speech Recognition Characteristics
• Speech recognition vs. speaker identification
• Speaker-dependent vs. speaker-independent
• Single word vs. continuous speech
• Large vs. small vocabulary
Additional References
Huang, X., A. Acero & H.-W. Hon: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, NJ, 2001.