natural language processing
Natural Language Processing
• Automatic Speech Recognition
• Speech Coding
• Speaker identification
• Speech Transformation
• Speech Synthesis
• Speech Mining
• Dialog Systems
• Talking Heads
• Language Translation
• Hearing aids
• Speech Enhancements
• Mobile devices
• Gaming

• Signal Processing
• Acoustics
• Physics
• Engineering
• Linguistics
• Psychology
• Mathematics
• Computer Science
• Communication
• Cognition
A host of technologies touching many interdisciplinary areas
Human vs. Machine
• Turing Test: we cannot distinguish whether we are communicating with a human or a machine
• Telephone Automated Systems
  – How many of us are fooled?
• Why?
  – Speech has many ambiguities
  – Humans change words on-the-fly while speaking
  – Colloquial speech does not follow a strict grammar
  – Co-articulations and sloppy pronunciation
  – Humans are good at filtering noise
  – Humans understand world view and context
  – Humans recognize individual characteristics
  – Prosody contains information like emotion and emphasis
One sentence, eight possible meanings
I made her duck
  – I cooked waterfowl for her.
  – I stole her waterfowl and cooked it.
  – I used my abilities to create a living waterfowl for her.
  – I caused her to bid low in the game of bridge.
  – I created the plastic duck that she owns.
  – I caused her to quickly lower her head or body.
  – I waved my magic wand and turned her into waterfowl.
  – I caused her to avoid the test.
Ambiguities in speech
• Ambiguities in pronunciation
  – “haya dun”
  – “ay d ih s h er d s ah m th in ng ah b aw m uh v ih ng r ih s en l ih”
• Ambiguities in articulation (Coarticulation)
  – tee, tree, city, beaten, steep
  – this car, this ship
• Ambiguities in meaning
  – We will review that in the near future
  – He lives near the station
  – two, to, and too
Semantic Problems
– “I called my mother on the television and did not understand the door. It was too breakfast, but they came from far to near. My mother is not too old for me to be young.” (Wernicke’s aphasia)
– “We went up the river sunk.”
– “John I believe Sally said Bill believed Sue saw.”
Could a computer infer the meaning?
I cdnuolt blveiee that I cluod aulaclty uesdnatnrd what I was rdgnieg.
The phaonmneal pweor of the hmuan mnid Aoccdrnig to rscheearch at Cmabridgde Uinervtisy, it deosn't mttaer in what oredr the ltteers in a word are, the olny iprmoatnt tihng is that the frist and lsat ltteer be in the rghit pclae.
The rset can be a taotl mses and you can still raed it wouthit a problem.
This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the word as a wlohe.
Amzanig huh?
Yaeh and I awlyas thought slpeling was ipmorantt!
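One way a program might attempt to read the scrambled text is sketched below. It relies on the property the passage describes: the first and last letters stay in place, so two words match when their first letter, last letter, and sorted interior letters agree. The vocabulary `VOCAB` and the function names are hypothetical and chosen only for illustration; a real system would need a full dictionary plus context to choose between words that share a signature.

```python
def signature(word):
    """Key that ignores the order of a word's interior letters."""
    w = word.lower()
    if len(w) <= 3:
        return w  # nothing to scramble in very short words
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def build_index(vocabulary):
    """Map each signature to the vocabulary words that share it."""
    index = {}
    for word in vocabulary:
        index.setdefault(signature(word), []).append(word)
    return index


def unscramble(text, index):
    """Replace each token with its unique vocabulary match, if there is one."""
    words = []
    for token in text.split():
        core = token.strip(".,!?\"'").lower()
        candidates = index.get(signature(core), [])
        # Keep the token unchanged when it is unknown or ambiguous.
        words.append(candidates[0] if len(candidates) == 1 else core)
    return " ".join(words)


# Hypothetical toy vocabulary, just large enough for one sentence.
VOCAB = ["the", "rest", "can", "be", "a", "total", "mess", "and",
         "you", "still", "read", "it", "without", "problem"]

INDEX = build_index(VOCAB)
scrambled = ("The rset can be a taotl mses and you can still raed it "
             "wouthit a problem.")
print(unscramble(scrambled, INDEX))
# the rest can be a total mess and you can still read it without a problem
```

The sketch drops capitalization and punctuation, and it says nothing about meaning. Recovering the words is only the first step toward inferring what the sentence means.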
Robot-human dialog
Robo: “Hi, my name is Robo. I am looking for work to raise funds for Natural Language Processing research.”
Person: “Do you know how to paint?”
Robo: “I have successfully completed training in this skill.”
Person: “Great! The porch needs painting. Here are the brushes and paint.”
Robot rolls away efficiently. An hour later he returns.
Robo: “The task is complete.”
Person: “That was fast, here is your salary; good job, and come back again.”
Robo speaks while rolling away with the payment.
Robo: “The car was not a Porsche; it was a Mercedes.”
Moral: You need a sense of humor to work in this field.
99% accuracy
State-of-the-art
• Recognition
  – Large vocabulary recognition with 98% accuracy
  – Difficulty: filtering background noise
• Synthesis
  – Produce clear computer-generated speech that is easily understood
  – Difficulty: incorporate prosody to achieve natural-sounding speech
• Coding
  – Compress 256 kbps (kilobits per second) audio to 4 kbps
  – Research: compress to as low as 600 bps (Human brain: 50 bps)
• Examples of on-going research
  – Match speech to talking heads
  – Transform speech from one person to sound like another
  – Authentication by voice
  – Automated analysis of audio
  – Human-machine dialog
  – Language independent algorithms
A bit of history
• Talking machines go back to the middle ages
  – Vocoder (Based on Kempelen’s speaking machine)
  – Theremin (Precursor to digital synthesizers)
• Signal Analysis research goes back to the 1700s
  – Fourier and Laplace
  – Telephone and speech coding
  – Rex talking dog in 1922
  – More recently: Dudley, Levinson
  – Dramatic progress resulting from ARPA (Advanced Research Projects Agency) challenges
Sample Sound Waves (Sound Editor)
Top: “this is a demo” Bottom: “A goat …. A coat”
Download and install from ACORNS web-site
Time domain, frequency domain, cepstrals, windows, formants, features, frequency, period, amplitude, pitch, quasi periodic, vowels, fricatives, plosives, energy, zero crossings, sampling rate, frames, filter, Nyquist, onset, duration, phase
GraphCalc (Freeware – download)
• Freeware
• I advise you to download and install it
• Good for creating and visualizing signals
Introduction to Sound
• Amplitude: The distance from zero to the maximum height
• Period: The time it takes for a sine wave to complete one cycle
• Wavelength (λ): The distance from one point to the same point on the next cycle
• Frequency (Hz): The repetitions or cycles per second
Sound results from vibrations in air pressure
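The quantities defined on this slide are tied together by two simple relations: period = 1 / frequency, and wavelength = speed / frequency. The sketch below works through them. The speed of sound (343 m/s, roughly dry air at room temperature) is an assumption that does not come from the slides, and the function names are illustrative only.

```python
# Assumption: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def period_seconds(frequency_hz):
    """Time for one full cycle of a wave with the given frequency."""
    return 1.0 / frequency_hz


def wavelength_meters(frequency_hz, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance between matching points on consecutive cycles."""
    return speed / frequency_hz


for f in (100.0, 200.0, 300.0):
    print(f, "Hz: period", period_seconds(f), "s, wavelength",
          round(wavelength_meters(f), 3), "m")
```

Doubling the frequency halves both the period and the wavelength.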
Sound
• High pitched sounds vibrate fast
• Loud sounds have large amplitudes
• Sounds with different timbre (qualities) have subordinate frequencies attached
• Complex sound waves are a series of waves added together
http://www.colorado.edu/physics/phet/ contains sound and other simulations
Understanding Sine Waves
• Sine is the ratio of the height (the side opposite the angle) to the hypotenuse of a right triangle
• Many phenomena in nature occur in sine wave patterns
Complex Wave Patterns
• Sound waves occupying the same space combine to form a new wave of a different shape.
• Harmonically related waves add together and can create any complex wave pattern.
• Harmonically related waves have frequencies that are multiples of a basic frequency.
Fourier proposed that all sound signals can be decomposed into a group of sine waves
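As an illustration of harmonically related waves adding up to a complex shape, the sketch below sums sine waves at integer multiples of a basic frequency. The amplitudes used are the standard Fourier-series coefficients of a unit square wave (4 / (πk) for odd k, zero for even k); the square wave is chosen only as an example and does not come from the slides. Function names are illustrative.

```python
import math


def harmonic_sum(t, fundamental_hz, amplitudes):
    """Add sine waves at 1x, 2x, 3x, ... the fundamental frequency.

    amplitudes[k - 1] is the amplitude of harmonic number k.
    """
    return sum(a * math.sin(2 * math.pi * k * fundamental_hz * t)
               for k, a in enumerate(amplitudes, start=1))


def square_wave_amplitudes(n_harmonics):
    """Fourier-series amplitudes of a unit square wave (odd harmonics only)."""
    return [4.0 / (math.pi * k) if k % 2 == 1 else 0.0
            for k in range(1, n_harmonics + 1)]


f0 = 100.0                     # basic (fundamental) frequency in Hz
quarter_period = 1.0 / (4 * f0)
for n in (1, 9, 199):
    value = harmonic_sum(quarter_period, f0, square_wave_amplitudes(n))
    print(n, "harmonics ->", round(value, 4))
```

With one harmonic the result is a plain sine wave; as more harmonics are added, the value at the quarter-period point settles toward 1, the flat top of the square wave.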
Complex Wave Examples
Nyquist Theorem
Nyquist Frequency (fN) = highest detectable frequency = fs / 2
Sampling Frequency (fs) = samples per second
Maximum Signal Frequency (fmax)
Theorem: fs > 2 * fmax, that is, fmax < fN
Inadequate Sampling Adequate Sampling
How many cycles per second do we need?
Aliasing
• When does this occur?
  – Frequencies (f>N) are present that are above the Nyquist Frequency (fN)
  – If f∆ = f>N – fN, then fN+f∆ is indistinguishable from fN-f∆.
• What do we do about it?
  – Place an anti-aliasing filter to eliminate high frequencies
  – This CANNOT be done in software after sampling; the filter must be applied before the signal is sampled
• Example of aliasing: take a picture of the sun every 23 hours
  – 24 x 23 = 552 hours between sunrises
  – Sun appears to move from west to east
Different frequencies become indistinguishable
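The claim that fN+f∆ is indistinguishable from fN-f∆ can be checked numerically. The sketch below assumes a sampling rate of 8000 Hz (so fN = 4000 Hz) and f∆ = 1000 Hz; those numbers and the function names are chosen only for illustration. Cosines are used so the two sample sequences match exactly; with sines they would match up to a sign flip.

```python
import math


def sample_cosine(frequency_hz, sample_rate_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / sample_rate_hz."""
    return [math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency in [0, fs/2] that the sampled tone cannot be told apart from."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


fs = 8000.0                              # assumed sampling rate, so fN = 4000 Hz
above = sample_cosine(5000.0, fs, 32)    # fN + 1000 Hz
below = sample_cosine(3000.0, fs, 32)    # fN - 1000 Hz
print(max(abs(a - b) for a, b in zip(above, below)))   # essentially zero
print(alias_frequency(5000.0, fs))                     # 3000.0

# The sun example: one cycle per 24 hours, sampled once every 23 hours.
apparent = alias_frequency(1.0 / 24.0, 1.0 / 23.0)     # cycles per hour
print(round(1.0 / apparent))                           # 552
```

The last line reproduces the 552 hours from the slide: the sun, photographed every 23 hours, appears to complete one cycle only every 24 x 23 hours.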
Aliasing and Filtering
Time vs. Frequency Domain
Time Domain: Signal is a composite wave of different frequencies
Frequency Domain: Split time domain into the individual frequencies
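A small sketch of moving from the time domain to the frequency domain follows. It uses a direct (slow) discrete Fourier transform written with the standard library; practical systems would use an FFT routine instead. The test signal (100 Hz plus a weaker 250 Hz component, sampled at 800 Hz), the threshold, and the function names are assumptions made for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to the Nyquist bin."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitudes.append(abs(total) / n)
    return magnitudes


def dominant_frequencies(samples, sample_rate_hz, threshold=0.1):
    """Frequencies (Hz) of the bins whose magnitude exceeds the threshold."""
    n = len(samples)
    return [k * sample_rate_hz / n
            for k, m in enumerate(dft_magnitudes(samples)) if m > threshold]


fs = 800   # samples per second; 80 samples give 10 Hz wide bins
signal = [math.sin(2 * math.pi * 100 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 250 * i / fs)
          for i in range(80)]
print(dominant_frequencies(signal, fs))   # [100.0, 250.0]
```

The composite wave in `signal` is the time-domain view; the list of frequencies that comes back is the frequency-domain view of the same signal.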
Formants
• F0: Fundamental frequency of sound production (the rate of vocal cord vibration)
  – Male average: 100 Hz, Female average: 200 Hz, Child average: 300 Hz
• F1, F2, F3: Formants are resonances of the vocal tract; their frequencies vary depending on the shape of the vocal tract.
  – Articulators to the back move formants together
  – Articulators to the front move formants apart
  – Roundness impacts the complex relationship between F2 and F3
• Formants are an excellent feature for distinguishing vowels. They are less useful for distinguishing unvoiced sounds
Communication
• Form (message)
• Meaning (semantics)
• Signal (audio sound waves, written text)
• Channel (medium): spoken, written, gestures
Create and receive information rather than passively extracting it
Semiotics
• Affective communication – Express primitive emotions. Meanings universal.
• Iconic – Meaning easily inferred from the form of expression (slippery road signs).
• Symbolic – Create arbitrary relationships between form and meaning. Each symbol or sound is clearly distinguished (colors) – limited set of meanings.
• Natural – Add grammar, syntax, and sound combinations to express abstract concepts; express productively an unlimited number of messages.
The science of signs and symbols
Science of Language
• Morphology: Word structure
• Acoustics: Study of sound
• Phonology: Classification of linguistic sounds
• Semantics: Study of meaning
• Pragmatics: How language is used
• Phonetics: Speech production and perception
Natural Language Processing draws from these fields to engineer practical systems that work.
Speech
• Encode – send – signal – receive – decode
• Communication tends to be effective and efficient
• Speech is as easy on the mouth as possible while still being understood
• Speakers adjust according to implied knowledge they share with their listeners
Noisy channel
Human Language
• Verbal: discrete message carried with continuous signal.
• Prosodic: Continuous parallel intonation scale.
  – Affective: instinctive, sudden expression
  – Augmentative: varied by individual to clarify or inject personality.
  – Supra-segmental: intonation patterns of a language
  – Null (neutral): minimal use of prosody to accent words and phrases.
• Text: Written channel
  – punctuation and context hint at prosody
  – TTS (text-to-speech) infers prosody using language-specific knowledge.
Language Components
• Phoneme: Smallest discrete unit of sound that distinguishes words (Minimal Pair Principle)
• Syllable: Acoustic component perceived as a single unit
• Morpheme: Smallest linguistic unit with meaning
• Word: Speaker identifiable unit of meaning
• Phrase: Sub-message of one or more words
• Sentence: Self-contained message derived from a sequence of phrases and words
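The Minimal Pair Principle can be sketched in code: two words form a minimal pair when their phoneme sequences have the same length and differ in exactly one position, which shows that the differing sounds are separate phonemes. The tiny lexicon below, its phoneme symbols, and the function names are hypothetical and serve only as an illustration.

```python
def is_minimal_pair(phonemes_a, phonemes_b):
    """True when two phoneme sequences differ in exactly one position."""
    if len(phonemes_a) != len(phonemes_b):
        return False
    differences = sum(1 for a, b in zip(phonemes_a, phonemes_b) if a != b)
    return differences == 1


def minimal_pairs(lexicon):
    """All word pairs in the lexicon that form a minimal pair."""
    words = sorted(lexicon)
    return [(w1, w2)
            for i, w1 in enumerate(words)
            for w2 in words[i + 1:]
            if is_minimal_pair(lexicon[w1], lexicon[w2])]


# Hypothetical transcriptions, for illustration only.
LEXICON = {
    "goat": ("g", "ow", "t"),
    "coat": ("k", "ow", "t"),
    "code": ("k", "ow", "d"),
    "tee": ("t", "iy"),
}

print(minimal_pairs(LEXICON))
# [('coat', 'code'), ('coat', 'goat')]
```

Here "goat" and "coat" differ only in the first sound, so /g/ and /k/ distinguish words and count as separate phonemes in this toy lexicon.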
Natural Language Characteristics
• Phones are the set of all possible sounds that humans can articulate. Each phone has unique characteristics.
• Each language selects a set of phonemes from the larger set of phones (English – 40). Our hearing is tuned to respond to this smaller set.
• Speech is a highly redundant sequence of sounds (phonemes), pitch (prosody), gestures, and expressions varying with time.
Audio Signal Redundancy
• Continuous signal (virtually infinite)
• Sampled
  – Mac: 44,100 2-byte samples per second (705 kbps)
  – PC: 16,000 2-byte samples per second (256 kbps)
  – Telephone: 4k 1-byte samples per second (32 kbps)
  – CELP Compression: 8 kbps
  – Research: 4 kbps, 2.4 kbps
  – Military applications: 600 bps
  – Human brain: 50 bps
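The bit rates in this list follow from one multiplication: samples per second x bytes per sample x 8 bits per byte. The sketch below reproduces the uncompressed figures exactly as they are listed on the slide and compares them with the compressed rates; the function names are illustrative.

```python
def pcm_bit_rate(samples_per_second, bytes_per_sample, channels=1):
    """Bits per second of uncompressed sampled audio."""
    return samples_per_second * bytes_per_sample * 8 * channels


def compression_ratio(source_bps, coded_bps):
    """How many times smaller the coded stream is than the source."""
    return source_bps / coded_bps


# Sampling figures as listed on the slide.
print(pcm_bit_rate(44100, 2))   # 705600  (about 705 kbps)
print(pcm_bit_rate(16000, 2))   # 256000  (256 kbps)
print(pcm_bit_rate(4000, 1))    # 32000   (32 kbps)

pc_rate = pcm_bit_rate(16000, 2)
print(compression_ratio(pc_rate, 8000))   # 32.0, CELP at 8 kbps
print(compression_ratio(pc_rate, 600))    # about 427, the 600 bps case
```

The gap between 256 kbps and a few hundred bits per second is one measure of how redundant the raw audio signal is.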
Course Goals
• Introduce algorithms and techniques used in natural language processing
• Explain how these techniques are useful outside of this specific field
• Provide enough background so we, as a class, can begin to work towards significant contributions in the winter follow-up class
• Discuss various areas, but focus on speech synthesis
• Discuss topics in a manner accessible to students with diverse backgrounds
Projects
• Pronunciation aid
  – Useful for language learning, helping students grasp the phonemes that are not in their first language
  – Useful for the hearing impaired to be able to speak normally through visual feedback
• Generate speech from a language-independent script
  – Design a language-independent script and identify possible problems. A future application is to analyze speech and translate it into this script.
• Identify and codify speaker-dependent speech components
  – Future applications are computer-based games where audio can transform voices to a multitude of speakers
Pronunciation Lesson: The sound “m”
[Figure: sound wave from the program next to your sound wave for “mmmmmm”. Picture key: tongue placement, width of wind pipe, vocal cord vibration.]
Pronunciation Lesson: The sound “h”
[Figure: sound wave from the program next to your sound wave for “hhhhhhhh”. Picture key, changes in: tongue placement, width of wind pipe, vocal cord vibration.]