74.419 artificial intelligence 2004 speech & natural language processing speech recognition...


Upload: henry-cole

Post on 14-Jan-2016


74.419 Artificial Intelligence 2004 Speech & Natural Language Processing

• Speech Recognition
  • acoustic signal as input
  • conversion into written words
• Natural Language Processing
  • written text as input
  • sentences (well-formed or not)
• Spoken Language Understanding
  • analysis of spoken language (transcribed speech)


Speech & Natural Language Processing

Areas in Speech Recognition
• Signal Processing
• Phonetics
• Word Recognition

Areas in Natural Language Processing
• Morphology
• Grammar & Parsing (syntactic analysis)
• Semantics
• Pragmatics
• Discourse / Dialogue
• Spoken Language Understanding


Speech Production & Reception

Sound and Hearing
• change in air pressure → sound wave
• reception through inner ear membrane / microphone
• break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → Frequency Spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
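
The frequency analysis step named above can be illustrated with a small sketch. It is not taken from the slides: it is a plain-Python radix-2 FFT, and the sample rate, the two test tones, and the function names `fft` and `magnitude_spectrum` are assumptions chosen only for illustration. A real system would more likely use an optimized library routine.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the input length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])   # transform of even-indexed samples
    odd = fft(samples[1::2])    # transform of odd-indexed samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for the non-negative frequencies."""
    n = len(samples)
    bins = fft(samples)
    return [(k * sample_rate / n, abs(bins[k])) for k in range(n // 2 + 1)]


# Hypothetical test signal: a 500 Hz tone plus a weaker 1000 Hz tone,
# sampled at an assumed 8000 Hz for 256 samples.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
    for i in range(256)
]

spectrum = magnitude_spectrum(signal, SAMPLE_RATE)
peak_freq, peak_mag = max(spectrum, key=lambda pair: pair[1])
print(peak_freq)  # strongest frequency component in Hz
```

The two tones were picked to fall exactly on FFT bins (the bin spacing is 8000 / 256 = 31.25 Hz), so the spectrum shows two clean peaks. Real speech does not line up this neatly, which is one reason windowing is applied before the transform.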


Speech Recognition

Acoustic / sound wave
→ Signal Processing / Analysis: Filtering, Sampling; Spectral Analysis; FFT
Frequency Spectrum
→ Features (Phonemes; Context)
→ Phoneme Recognition: HMM, Neural Networks
Phonemes
→ Grammar or Statistics
Phoneme Sequences / Words
→ Grammar or Statistics for likely word sequences
Word Sequence / Sentence
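
The phoneme recognition stage with an HMM can be sketched with Viterbi decoding, a standard way to find the most likely hidden state sequence. The slides do not specify a decoding method, so this is only one plausible illustration. The phoneme states, the quantized frame symbols "A", "B", "C", and all probabilities below are invented for the example.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


# Hypothetical toy model: three phoneme states for a word like "cat".
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.5, "ae": 0.5, "t": 0.0},
    "ae": {"k": 0.0, "ae": 0.6, "t": 0.4},
    "t": {"k": 0.0, "ae": 0.0, "t": 1.0},
}
# Emission probabilities over made-up quantized spectral frames "A", "B", "C".
emit_p = {
    "k": {"A": 0.7, "B": 0.2, "C": 0.1},
    "ae": {"A": 0.1, "B": 0.8, "C": 0.1},
    "t": {"A": 0.2, "B": 0.1, "C": 0.7},
}

prob, path = viterbi(["A", "B", "B", "C"], states, start_p, trans_p, emit_p)
print(path)  # ['k', 'ae', 'ae', 't']
```

Practical recognizers usually work with log probabilities to avoid numerical underflow on long utterances; plain products are used here only to keep the sketch short.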


Speech Signal

Analog-Digital Conversion of acoustic signal → Sampling in Time Frames ("windows")

Characteristics of a Speech Signal
• formants - strong frequency components; characterize e.g. vowels, gender of speaker; dark stripes in spectrum
• pitch - fundamental frequency (baseline for the higher-frequency harmonics that formants emphasize)
• place of articulation (recognition model based on model of vocal tract)
• change in frequency distribution
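
Splitting the sampled signal into time frames can be sketched as follows. The 25 ms frame length, 10 ms hop, 8000 Hz sample rate, and the Hamming window are common choices assumed here for illustration; the slides do not state which values or window shape were intended.

```python
import math


def hamming_window(length):
    """Hamming window coefficients; tapers frame edges to reduce spectral leakage."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)
    ]


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames and apply a Hamming window to each."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(frame_len)])
    return frames


# Assumed parameters: 8000 Hz sampling, 25 ms frames, 10 ms hop.
SAMPLE_RATE = 8000
FRAME_LEN = int(0.025 * SAMPLE_RATE)  # 200 samples
HOP = int(0.010 * SAMPLE_RATE)        # 80 samples

# One second of a hypothetical 200 Hz tone standing in for recorded speech.
samples = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(SAMPLE_RATE)]
frames = frame_signal(samples, FRAME_LEN, HOP)
print(len(frames), len(frames[0]))
```

Overlapping frames are used because speech changes quickly; each frame is short enough to treat the signal as roughly stationary, and the overlap keeps events at frame boundaries from being missed.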


Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)


Speech Signal

Analog-Digital Conversion of Acoustic Signals → Sampling

Analysis of Signal in Time Frames ("windows")

Characteristics of a Speech Signal
• formants - strong frequency components; characterize e.g. vowels, gender of speaker; dark stripes in spectrum
• pitch - fundamental frequency (baseline for the higher-frequency harmonics that formants emphasize)
• place of articulation (recognition model based on model of vocal tract)
• change in frequency distribution
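
Pitch, as the fundamental frequency of a frame, can be estimated in several ways. The sketch below uses a simple autocorrelation peak search, which is one common approach and not necessarily the one the course had in mind. The search range of 50 to 400 Hz, the sample rate, and the synthetic test frame are assumptions.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Returns None if no lag in the search range has positive correlation.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlation of the frame with itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Hypothetical voiced frame: 200 Hz fundamental with two weaker harmonics,
# 50 ms long at an assumed 8000 Hz sample rate.
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * i / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 600 * i / SAMPLE_RATE)
    for i in range(400)
]

pitch = estimate_pitch(frame, SAMPLE_RATE)
print(pitch)  # estimated fundamental frequency in Hz
```

The signal repeats every 40 samples (8000 / 200), so the autocorrelation is largest at lag 40. On real speech the peak is less clean, and practical pitch trackers add normalization and voicing decisions that this sketch omits.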


Speech Recognition Characteristics

Speech Recognition vs. Speaker Identification

Speaker-dependent vs. speaker-independent

Single word vs. continuous speech

Large vs. small vocabulary


Additional References

Huang, X., A. Acero & H.-W. Hon: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, NJ, 2001.