
Natural Language Processing

• Automatic Speech Recognition
• Speech Coding
• Speaker identification
• Speech Transformation
• Speech Synthesis
• Speech Mining
• Dialog Systems
• Talking Heads
• Language Translation
• Hearing aids
• Speech Enhancements
• Mobile devices
• Gaming

• Signal Processing
• Acoustics
• Physics
• Engineering
• Linguistics
• Psychology
• Mathematics
• Computer Science
• Communication
• Cognition

A host of technologies touching many interdisciplinary areas

Human vs. Machine

• Turing Test: Cannot distinguish if we are communicating with a human or machine
• Telephone Automated Systems
  – How many of us are fooled?
• Why?
  – Speech has many ambiguities
  – Humans change words on-the-fly while speaking
  – Colloquial speech does not follow a strict grammar
  – Co-articulations and sloppy pronunciation
  – Humans are good at filtering noise
  – Humans understand world view and context
  – Humans recognize individual characteristics
  – Prosody contains information like emotion and emphasis

One sentence, eight possible meanings

– I cooked waterfowl for her.
– I stole her waterfowl and cooked it.
– I used my abilities to create a living waterfowl for her.
– I caused her to bid low in the game of bridge.
– I created the plastic duck that she owns.
– I caused her to quickly lower her head or body.
– I waved my magic wand and turned her into waterfowl.
– I caused her to avoid the test.

I made her duck

Ambiguities in speech

• Ambiguities in pronunciation
  – “haya dun”
  – “ay d ih s h er d s ah m th in ng ah b aw m uh v ih ng r ih s en l ih”
• Ambiguities in articulation (Coarticulation)
  – tee, tree, city, beaten, steep
  – this car, this ship
• Ambiguities in meaning
  – We will review that in the near future
  – He lives near the station
  – two, to, and too

Semantic Problems

– “I called my mother on the television and did not understand the door. It was too breakfast, but they came from far to near. My mother is not too old for me to be young.” (Wernicke’s aphasia)

– “we went up the river sunk.”

– “John I believe Sally said Bill believed Sue saw.”

Could a computer infer the meaning?

I cdnuolt blveiee that I cluod aulaclty uesdnatnrd what I was rdgnieg.

The phaonmneal pweor of the hmuan mnid

Aoccdrnig to rscheearch at Cmabridgde Uinervtisy, it deosn't mttaer in what oredr the ltteers in a word are, the olny iprmoatnt tihng is that the frist and lsat ltteer be in the rghit pclae.

The rset can be a taotl mses and you can still raed it wouthit a problem.

This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the word as a wlohe.

Amzanig huh?

Yaeh and I awlyas thought slpeling was ipmorantt!
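The scrambling used on this slide can be reproduced with a few lines of code. The sketch below is only an illustration: the function names and the sample sentence are assumptions, and the words are split on whitespace, so punctuation attached to a word would be treated as one of its letters.

```python
import random


def scramble_word(word, rng):
    """Shuffle the interior letters of a word; the first and last letters stay in place."""
    if len(word) <= 3:
        # Words of three letters or fewer have at most one interior letter.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=0):
    """Scramble every whitespace-separated word of a text (seeded for repeatability)."""
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())


sample = "the human mind does not read every letter by itself"
scrambled = scramble_text(sample)
print(scrambled)
```

A program can produce such text easily; recovering the original words is the hard direction, since it requires a vocabulary and context.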

Robot-human dialog

Robo: “Hi, my name is Robo. I am looking for work to raise funds for Natural Language Processing research.”
Person: “Do you know how to paint?”
Robo: “I have successfully completed training in this skill.”
Person: “Great! The porch needs painting. Here are the brushes and paint.”

Robot rolls away efficiently. An hour later he returns.

Robo: “The task is complete.”
Person: “That was fast, here is your salary; good job, and come back again.”

Robo speaks while rolling away with the payment.

Robo: “The car was not a Porsche; it was a Mercedes.”

Moral: You need a sense of humor to work in this field.

99% accuracy

State-of-the-art

• Recognition
  – Large vocabulary recognition with 98% accuracy
  – Difficulty: filtering background noise
• Synthesis
  – Produce clear computer generated speech that is easily understood
  – Difficulty: incorporate prosody to achieve natural-sounding speech
• Coding
  – Compress 256 kbps (kilobits per second) audio to 4 kbps
  – Research: compress to as low as 600 bps (Human brain: 50 bps)
• Examples of on-going research
  – Match speech to talking heads
  – Transform speech from one person to sound like another
  – Authentication by voice
  – Automated analysis of audio
  – Human-machine dialog
  – Language independent algorithms

A bit of history

• Talking machines go back to the middle ages
  – Vocoder (Based on Kempelen’s speaking machine)
  – Theremin (Precursor to digital synthesizers)
• Signal Analysis research goes back to the 1700s
  – Fourier and Laplace
  – Telephone and speech coding
  – Rex talking dog in 1922
  – More recently: Dudley, Levinson
  – Dramatic progress resulting from ARPA (Advanced Research Projects Agency) challenges

Sample Sound Waves (Sound Editor)

Top: “this is a demo” Bottom: “A goat …. A coat”

Download and install from ACORNS web-site

Time domain, frequency domain, cepstrals, windows, formants, features, frequency, period, amplitude, pitch, quasi periodic, vowels, fricatives, plosives, energy, zero crossings, sampling rate, frames, filter, Nyquist, onset, duration, phase

GraphCalc (Freeware – download)

• Freeware
• I advise you to download and install it
• Good for creating and visualizing signals.

Introduction to Sound

• Amplitude: The distance from zero to the maximum height
• Period: The time it takes for a sine wave to complete one cycle
• Wavelength (λ): The distance from one point to the same point on the next cycle
• Frequency (Hz): The repetitions or cycles per second
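The quantities defined above are tied together by two simple relations: frequency is the reciprocal of the period, and wavelength is the propagation speed divided by the frequency. The sketch below works through one example. The speed of sound (343 m/s, roughly dry air at 20 °C) and the function names are assumptions added for illustration.

```python
# Assumed propagation speed: sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def frequency_from_period(period_s):
    """Frequency in Hz is the number of cycles per second, i.e. 1 / period."""
    return 1.0 / period_s


def wavelength_m(frequency_hz, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Wavelength is the distance the wave travels during one cycle."""
    return speed_m_per_s / frequency_hz


period = 0.01  # one cycle every 10 milliseconds
freq = frequency_from_period(period)
print("frequency (Hz):", freq)
print("wavelength (m):", wavelength_m(freq))
```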

Sound results from vibrations in air pressure

Sound

• High pitched sounds vibrate fast
• Loud sounds have large amplitudes
• Sounds with different timbre (qualities) have subordinate frequencies attached
• Complex sound waves are a series of waves added together

http://www.colorado.edu/physics/phet/ contains sound and other simulations


Understanding Sine Waves

• Sine of an angle is the ratio of the height (the opposite side of a right triangle) to the hypotenuse
• Many phenomena in nature occur in sine wave patterns

Complex Wave Patterns

• Sound waves occupying the same space combine to form a new wave of a different shape.
• Harmonically related waves add together and can create any complex wave pattern.
• Harmonically related waves have frequencies that are multiples of a basic frequency.

Fourier proposed that all sound signals can be decomposed into a group of sine waves
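As a small illustration of harmonically related waves adding up to a complex shape, the sketch below sums odd multiples of a base frequency, which is the standard Fourier series of a square wave. The function name and the choice of a square wave are assumptions made for this example; they are not taken from the slides.

```python
import math


def square_wave_approx(t, base_freq_hz, num_harmonics):
    """Sum the first `num_harmonics` odd harmonics of the base frequency.

    Each harmonic k (1, 3, 5, ...) is a sine wave at k times the base
    frequency with amplitude 1/k; the 4/pi factor scales the sum so that
    it approaches a square wave that alternates between +1 and -1.
    """
    total = 0.0
    for i in range(num_harmonics):
        k = 2 * i + 1
        total += math.sin(2 * math.pi * k * base_freq_hz * t) / k
    return 4.0 / math.pi * total


base = 100.0                      # base (fundamental) frequency in Hz
quarter_period = 1.0 / (4 * base)  # middle of the positive half cycle

for n in (1, 5, 50):
    print(n, "harmonics ->", square_wave_approx(quarter_period, base, n))
```

With a single harmonic the result is a plain sine wave; adding more harmonics moves the value at the middle of the half cycle toward the flat top of the square wave.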

Complex Wave Examples

Nyquist Theorem

Nyquist Frequency (fN) = highest detectable frequency
Sampling Frequency (fs) = samples per time period
Maximum Signal Frequency (fmax)

Theorem: fN = fs / 2; a signal is captured without aliasing when fmax < fN, that is, fs > 2 * fmax

Inadequate Sampling Adequate Sampling

How many cycles per second do we need?

Aliasing

• When does this occur?
  – Frequencies (f>N) present that are above Nyquist Frequency (fN)
  – If f∆ = f>N – fN, then fN+f∆ is indistinguishable from fN-f∆.
• What do we do about it?
  – Place an anti-aliasing filter to eliminate high frequencies
  – This CANNOT be done in software after sampling; the filter must act on the signal before it is sampled
• Example of aliasing: Take a picture of the sun every 23 hours
  – 24 x 23 = 552 hours between apparent sunrises in the pictures
  – Sun appears to move from west to east

Different frequencies become indistinguishable
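The claim that fN+f∆ and fN-f∆ become indistinguishable can be checked numerically. The sketch below samples two cosines, one above and one below the Nyquist frequency by the same offset, and compares the samples. The sampling rate, offset, and function name are assumed values chosen for illustration. Cosines are used so the samples match exactly; sine waves would match up to a sign flip.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, num_samples):
    """Return samples of cos(2*pi*f*t) taken at t = n / sample_rate."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


sample_rate = 8000.0          # assumed sampling frequency fs in Hz
nyquist = sample_rate / 2     # fN = fs / 2
delta = 1000.0                # offset f-delta from the Nyquist frequency

above = sample_cosine(nyquist + delta, sample_rate, 16)  # 5000 Hz
below = sample_cosine(nyquist - delta, sample_rate, 16)  # 3000 Hz

# The largest difference between corresponding samples is only rounding error.
max_diff = max(abs(a - b) for a, b in zip(above, below))
print("largest sample difference:", max_diff)
```

Once the samples are identical, no later processing can tell the two frequencies apart, which is why the filter has to come before sampling.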

Aliasing and Filtering

Time vs. Frequency Domain

Time Domain: Signal is a composite wave of different frequencies
Frequency Domain: Split time domain into the individual frequencies
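A minimal sketch of moving from the time domain to the frequency domain is shown below. It applies a direct discrete Fourier transform (DFT) to a composite signal and reports which frequencies are present. The signal, its two component frequencies, and the function name are assumptions chosen for illustration; practical systems use a fast Fourier transform rather than this slow direct sum.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return normalized DFT magnitudes for bins 0 .. N/2 of a real signal."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(acc) / n)
    return magnitudes


# One second of signal at 64 samples per second, so bin k corresponds to k Hz.
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]

mags = dft_magnitudes(signal)
peaks = [k for k, m in enumerate(mags) if m > 0.1]
print("frequencies present (Hz):", peaks)
```

The time-domain list `signal` is a single composite wave; the frequency-domain list `mags` separates it into its 5 Hz and 12 Hz components, with magnitudes proportional to their amplitudes.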

Formants

• F0: Fundamental frequency of the sound production (the rate of vocal cord vibration)
  – Male average: 100 Hz, Female average: 200 Hz, Child average: 300 Hz
• F1, F2, F3: Formants are resonances of the vocal tract that reinforce nearby multiples (harmonics) of the fundamental frequency; they vary depending on the shape of the vocal tract.
  – Articulators toward the back move formants together
  – Articulators toward the front move formants apart
  – Roundness impacts the complex relationship between F2 and F3
• Formants are an excellent feature for distinguishing vowels. They are less useful for distinguishing unvoiced sounds
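To illustrate why formants work well as vowel features, the sketch below labels a measured (F1, F2) pair with the nearest reference vowel. The reference values are approximate adult-male averages of the kind quoted in phonetics textbooks (after Peterson and Barney's 1952 measurements) and should be treated as illustrative assumptions, as should the function name and the three-vowel inventory.

```python
import math

# Approximate (F1, F2) averages in Hz for three vowels; illustrative values only.
REFERENCE_FORMANTS = {
    "iy": (270.0, 2290.0),  # as in "beet"
    "aa": (730.0, 1090.0),  # as in "father"
    "uw": (300.0, 870.0),   # as in "boot"
}


def classify_vowel(f1_hz, f2_hz):
    """Return the reference vowel whose (F1, F2) point is closest."""
    return min(
        REFERENCE_FORMANTS,
        key=lambda v: math.dist((f1_hz, f2_hz), REFERENCE_FORMANTS[v]),
    )


print(classify_vowel(280.0, 2250.0))
print(classify_vowel(700.0, 1100.0))
print(classify_vowel(320.0, 900.0))
```

Two numbers per vowel are enough to separate these three cases, which is the sense in which formants are a compact feature. Real speakers vary widely, so a practical classifier would need speaker normalization and more reference data.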

Communication

• Form (message)
• Meaning (semantics)
• Signal (audio sound waves, written text)
• Channel (medium): spoken, written, gestures

Create and receive information rather than passively extracting it

Semiotics

• Affective communication – Express primitive emotions. Meanings universal.

• Iconic – Meaning easily inferred from the form of expression (slippery road signs).

• Symbolic – Create arbitrary relationships between form and meaning. Each symbol or sound is clearly distinguished (colors) – limited set of meanings.

• Natural – Add grammar, syntax, and sound combinations to express abstract concepts; express productively an unlimited number of messages.

The science of signs and symbols

Science of Language

• Morphology: Structure of words
• Acoustics: Study of sound
• Phonology: Classification of linguistic sounds
• Semantics: Study of meaning
• Pragmatics: How language is used
• Phonetics: Speech production and perception

Natural Language Processing draws from these fields to engineer practical systems that work.

Speech

• Encode – send – signal – receive – decode
• Communication tends to be effective and efficient
• Speech is as easy on the mouth as possible while still being understood
• Speakers adjust according to implied knowledge they share with their listeners

Noisy channel

Human Language

• Verbal: discrete message carried with continuous signal.
• Prosodic: Continuous parallel intonation scale.
  – Affective: instinctive, sudden expression
  – Augmentative: varied by individual to clarify or inject personality.
  – Supra-segmental: intonation patterns of a language
  – Null (neutral): minimal use of prosody to accent words and phrases.
• Text: Written channel
  – punctuation and context hint at prosody
  – TTS infers prosody using language specific knowledge.

Language Components

• Phoneme: Smallest discrete unit of sound that distinguishes words (Minimal Pair Principle)
• Syllable: Acoustic component perceived as a single unit
• Morpheme: Smallest linguistic unit with meaning
• Word: Speaker identifiable unit of meaning
• Phrase: Sub-message of one or more words
• Sentence: Self-contained message derived from a sequence of phrases and words
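The Minimal Pair Principle mentioned above can be sketched in code: two words form a minimal pair when their phoneme sequences have the same length and differ in exactly one position, which is evidence that the two differing sounds are separate phonemes. The tiny lexicon and its phoneme labels below are hypothetical and exist only for this example.

```python
from itertools import combinations

# Hypothetical mini-lexicon: word -> phoneme sequence.
LEXICON = {
    "pat": ("p", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "pit": ("p", "ih", "t"),
    "bad": ("b", "ae", "d"),
}


def is_minimal_pair(phones_a, phones_b):
    """True when the sequences have equal length and differ in exactly one phoneme."""
    if len(phones_a) != len(phones_b):
        return False
    return sum(a != b for a, b in zip(phones_a, phones_b)) == 1


minimal_pairs = [
    (word_a, word_b)
    for (word_a, phones_a), (word_b, phones_b) in combinations(LEXICON.items(), 2)
    if is_minimal_pair(phones_a, phones_b)
]
print(minimal_pairs)
```

Here "pat" and "bat" differ only in the first sound, so /p/ and /b/ distinguish words; "pat" and "bad" differ in two positions and therefore do not count.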

Natural Language Characteristics

• Phones are the set of all possible sounds that humans can articulate. Each phone has unique characteristics.

• Each language selects a set of phonemes from the larger set of phones (English – 40). Our hearing is tuned to respond to this smaller set.

• Speech is a highly redundant sequence of sounds (phonemes), pitch (prosody), gestures, and expressions varying with time.

Audio Signal Redundancy

• Continuous signal (virtually infinite)
• Sampled
  – Mac: 44,100 2-byte samples per second (705 kbps)
  – PC: 16,000 2-byte samples per second (256 kbps)
  – Telephone: 4k 1-byte samples per second (32 kbps)
  – CELP Compression: 8 kbps
  – Research: 4 kbps, 2.4 kbps
  – Military applications: 600 bps
  – Human brain: 50 bps
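The bit rates listed above follow from multiplying samples per second by bits per sample. The sketch below reproduces that arithmetic and the compression ratio from the PC rate down to CELP's 8 kbps. The function and variable names are assumptions for illustration, and a single (mono) channel is assumed.

```python
def bit_rate_bps(samples_per_second, bytes_per_sample):
    """Uncompressed single-channel bit rate: samples/s * bytes/sample * 8 bits/byte."""
    return samples_per_second * bytes_per_sample * 8


# (samples per second, bytes per sample) as listed on the slide.
FORMATS = {
    "Mac": (44100, 2),
    "PC": (16000, 2),
    "Telephone": (4000, 1),
}

rates = {name: bit_rate_bps(*spec) for name, spec in FORMATS.items()}
for name, rate in rates.items():
    print(name, rate, "bps")

CELP_BPS = 8000  # CELP compression rate from the slide
print("PC to CELP compression ratio:", rates["PC"] / CELP_BPS)
```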

Course Goals

• Introduce algorithms and techniques used in natural language processing
• Explain how these techniques are useful outside of this specific field
• Provide enough background so we, as a class, can begin to work towards significant contributions in the winter follow-up class
• Discuss various areas, but focus on speech synthesis
• Discuss topics in a manner accessible to students with diverse backgrounds

Projects

• Pronunciation aid
  – Useful for language learning for students to grasp the phonemes that are not in their first language
  – Useful for the hearing impaired to be able to speak normally through visual feedback
• Generate speech from a language independent script
  – Design a language-independent script and identify possible problems. A future application is to analyze speech and translate into this script.
• Identify, codify speaker dependent speech components
  – Future applications are computer based games where audio can transform voices to a multitude of speakers

Pronunciation Lesson: The sound “m”

[Figure: the program's sound wave for “mmmmmm” beside the learner's sound wave for “mmmmmm”. Picture key: tongue placement, width of windpipe, vocal cord vibration.]

Pronunciation Lesson: The sound “h”

[Figure: the program's sound wave for “hhhhhhhh” beside the learner's sound wave for “hhhhhhhh”. Picture key, changes in: tongue placement, width of windpipe, vocal cord vibration.]
