speech processing basics
Post on 18-Jul-2015
Speech Processing
• Fundamentals of Digital Speech Processing
1. Anatomy and physiology of speech organs
2. The process of speech production
3. The acoustic theory of speech production
4. Digital models for speech signals
Applications of Speech Processing
• 1. Speech recognition: speech to text
• 2. Speech understanding: the meaning is important rather than the exact words, as in speech translation
• 3. Speech synthesis: text to speech, so the computer can speak to you
• 4. Word processing: check and correct spelling, grammar and style
• 5. Text prediction: speed up word processing
• 6. Automatic summarization: topic identification, summary generation
• 7. Text mining: finding the necessary data
• Anatomy: the study of the structure of the bodies of people or animals.
• Physiology: the study of how the bodies of people and animals function, including the higher-order mechanisms within the human central nervous system that account for speech production.
• Acoustics: the scientific study of sound.
• Phonetics: the study of the sounds of words and of the sounds that are used in languages.
• Phoneme: the smallest unit of sound that is significant in a language.
• Articulation: the action of producing a sound or word clearly, in speech or music.
• Linguistics: the study of the way in which language works.
• Semantics: the branch of linguistics that deals with the meanings of words and sentences.
Speech Processing (diagram of contributing fields)
• Signal processing: Fourier transforms, discrete-time filters, AR(MA) models
• Information theory: entropy, communication theory, rate-distortion theory
• Statistical SP: stochastic models
• Acoustics: psychoacoustics, room acoustics, speech production
• Phonetics
• Algorithms (programming)
ASR: Application
© James Glass, MIT
[Figure: Automatic Speech Recognition. Blocks shown: Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Recognition, Display, Feedback.]
Speech Generation
• First, the talker formulates a message (in his mind) that he wants to transmit to the listener via speech.
• The process of message formulation corresponds to creating printed text that expresses the words of the message.
• The next step is the conversion of the message into a language code.
• This roughly corresponds to converting the printed text of the message into a sequence of phonemes corresponding to the sounds that make up the words, together with the pitch accent associated with those sounds.
• Once the language code is chosen, the talker must execute a series of neuromuscular commands that cause the vocal cords to vibrate when appropriate and shape the vocal tract so that the proper sequence of speech sounds is created and spoken. The final output is an acoustic signal.
Speech Recognition
• First, the listener processes the acoustic signal along the basilar membrane in the inner ear, which provides a running spectrum analysis of the incoming signal.
• The neural activity along the auditory nerve is converted into a language code at higher centers of processing within the brain, and message comprehension is achieved.
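The idea of a "running spectrum analysis" can be illustrated with a short-time Fourier analysis: the signal is cut into overlapping frames and a spectrum is computed for each one. The sketch below is only an engineering analogy, not a model of the basilar membrane; the frame length, hop size, Hamming window, sampling rate and test tones are all assumptions chosen for illustration.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the DFT of one frame, for bins 0 .. N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def running_spectrum(signal, frame_len=64, hop=32):
    """Hamming-windowed magnitude spectra of successive overlapping frames."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        spectra.append(dft_magnitudes(frame))
    return spectra


# Illustrative input (assumed): a 500 Hz tone followed by a 1500 Hz tone,
# sampled at 8000 Hz.
fs = 8000
signal = ([math.sin(2 * math.pi * 500 * t / fs) for t in range(256)]
          + [math.sin(2 * math.pi * 1500 * t / fs) for t in range(256)])

spectra = running_spectrum(signal)
# Frequency (Hz) of the strongest bin in each frame; bin spacing is fs / 64.
peak_hz = [max(range(len(s)), key=s.__getitem__) * fs / 64 for s in spectra]
print(peak_hz[0], peak_hz[-1])
```

The peak frequency moves from the first tone to the second as the frames advance, which is what "running" means here: the spectrum is tracked over time rather than computed once for the whole signal.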
• The lungs and the associated muscles act as the source of air for exciting the vocal mechanism.
• The muscle force pushes air out of the lungs (shown as a piston pushing up within a cylinder) and through the bronchi and trachea.
• When the vocal cords are tensed, the air flow causes them to vibrate, producing so-called voiced speech sounds.
• When the vocal cords are relaxed, the air flow must pass through a constriction in the vocal tract in order to produce a sound; it thereby becomes turbulent, producing so-called unvoiced speech sounds.
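The two excitation types described above can be sketched in a simple source-filter form: a periodic pulse train stands in for vibrating vocal cords, white noise stands in for turbulent air flow, and a two-pole resonator stands in for a single vocal tract resonance. This is a minimal sketch under stated assumptions; the function names, sampling rate, pitch, resonance frequencies and bandwidths are hypothetical values, not taken from the slides.

```python
import math
import random


def voiced_excitation(n_samples, fs, f0):
    """Impulse train with one pulse per pitch period (vocal cord vibration)."""
    period = round(fs / f0)
    return [1.0 if t % period == 0 else 0.0 for t in range(n_samples)]


def unvoiced_excitation(n_samples, seed=0):
    """White Gaussian noise (turbulent air flow at a constriction)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def resonator(x, fs, freq, bandwidth):
    """Two-pole resonator: y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2]."""
    r = math.exp(-math.pi * bandwidth / fs)   # pole radius, below 1 so the filter is stable
    theta = 2 * math.pi * freq / fs           # pole angle for the resonance frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


fs = 8000  # assumed sampling rate in Hz
voiced_sound = resonator(voiced_excitation(800, fs, f0=100), fs, freq=500, bandwidth=100)
unvoiced_sound = resonator(unvoiced_excitation(800), fs, freq=2500, bandwidth=400)
```

A real vocal tract has several resonances that change over time; one fixed resonator is used here only to keep the voiced and unvoiced paths easy to compare.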
Classifications
• 1. Silence (S): no speech is produced.
• 2. Unvoiced (U): the vocal cords are not vibrating, so the speech signal is aperiodic or random in nature.
• 3. Voiced (V): the vocal cords vibrate periodically as air flows from the lungs, so the speech signal is periodic.
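One common way to approximate this three-way labelling is with two frame-level measurements: short-time energy (low for silence) and zero-crossing rate (high for noise-like unvoiced sounds, low for periodic voiced sounds). The sketch below assumes this approach; the slides do not name a method, and the thresholds, frame length and test signals are illustrative values that would need tuning on real speech.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Return 'S' (silence), 'U' (unvoiced) or 'V' (voiced). Thresholds are assumed."""
    if short_time_energy(frame) < energy_floor:
        return "S"
    if zero_crossing_rate(frame) > zcr_threshold:
        return "U"
    return "V"


fs = 8000   # assumed sampling rate in Hz
n = 240     # 30 ms frame
rng = random.Random(0)
silence = [0.0] * n
noise = [rng.uniform(-0.3, 0.3) for _ in range(n)]                     # noise-like, as in /s/
tone = [0.5 * math.sin(2 * math.pi * 120 * t / fs) for t in range(n)]  # periodic, 120 Hz

labels = [classify_frame(f) for f in (silence, noise, tone)]
print(labels)
```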
Speech Waveform Characteristics
• Loudness
• Voiced/Unvoiced.
• Pitch.
– Fundamental frequency.
• Spectral envelope.
– Formants.
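As an illustration of the pitch item, the fundamental frequency of a voiced frame can be estimated from the lag at which the frame best matches a shifted copy of itself. The sketch below uses a normalized autocorrelation search; this is one common technique, not necessarily the one the slides had in mind, and the search range (80 to 400 Hz), sampling rate and synthetic test frame are assumptions.

```python
import math


def estimate_f0(frame, fs, f0_min=80.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) by normalized autocorrelation."""
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        head = frame[:len(frame) - lag]   # frame[t]
        tail = frame[lag:]                # frame[t + lag]
        num = sum(a * b for a, b in zip(head, tail))
        den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


# Synthetic voiced-like frame (assumed): 125 Hz fundamental plus one harmonic.
fs = 8000
frame = [math.sin(2 * math.pi * 125 * t / fs) + 0.5 * math.sin(2 * math.pi * 250 * t / fs)
         for t in range(400)]
f0 = estimate_f0(frame, fs)
print(f0)
```

Restricting the search range keeps the estimator from locking onto a multiple of the true period; on real speech the range would be chosen to suit the talker.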
Speech Waveform Characteristics Cont.
[Figure: example waveforms of voiced speech (/ih/) and unvoiced speech (/s/).]
Phoneme Hierarchy
Speech sounds
• Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
• Diphthongs: ay, ey, oy, aw
• Consonants
– Glides: w, y
– Plosives: p, b, t, d, k, g
– Nasals: m, n, ng
– Fricatives: f, v, th, dh, s, z, sh, zh, h
– Retroflex liquid: r
– Lateral liquid: l
The phoneme set is language dependent. There are about 50 in English.
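The hierarchy above maps naturally onto a lookup table. The sketch below stores only the symbols listed on the slide (fewer than the roughly 50 quoted for English) and looks up the class of each symbol in a phoneme sequence; the dictionary layout and function names are hypothetical.

```python
# Phoneme classes as listed in the hierarchy; symbols are copied from the slide.
PHONEME_CLASSES = {
    "vowel": ["iy", "ih", "ae", "aa", "ah", "ao", "ax", "eh", "er", "ow", "uh", "uw"],
    "diphthong": ["ay", "ey", "oy", "aw"],
    "glide": ["w", "y"],
    "plosive": ["p", "b", "t", "d", "k", "g"],
    "nasal": ["m", "n", "ng"],
    "fricative": ["f", "v", "th", "dh", "s", "z", "sh", "zh", "h"],
    "retroflex liquid": ["r"],
    "lateral liquid": ["l"],
}

# Reverse index: symbol -> class name.
CLASS_OF = {symbol: cls for cls, symbols in PHONEME_CLASSES.items() for symbol in symbols}


def phoneme_class(symbol):
    """Class of one phoneme symbol, or 'unknown' if it is not in the table."""
    return CLASS_OF.get(symbol, "unknown")


def classify_sequence(symbols):
    """Classes for a sequence of phoneme symbols."""
    return [phoneme_class(s) for s in symbols]


# "sing" written with the slide's symbols: s ih ng
print(classify_sequence(["s", "ih", "ng"]))
```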
Signal processing
Digital speech processing
• Speech signals are composed of a sequence of sounds.
• The arrangement of these sounds is governed by the rules of language. The study of these rules and their implications in human communication is the domain of linguistics.
• The study and classification of the sounds of speech is called phonetics.