natural language processing


Upload: taro

Post on 18-Mar-2016


TRANSCRIPT

Page 1: Natural Language Processing

Natural Language Processing

A host of technologies touching many interdisciplinary areas

• Automatic Speech Recognition
• Speech Coding
• Speaker identification
• Speech Transformation
• Speech Synthesis
• Speech Mining
• Dialog Systems
• Talking Heads
• Language Translation
• Hearing aids
• Speech Enhancements
• Mobile devices
• Gaming

• Signal Processing
• Acoustics
• Physics
• Engineering
• Linguistics
• Psychology
• Mathematics
• Computer Science
• Communication
• Cognition

Page 2: Natural Language Processing

Human vs. Machine

• Turing Test: We cannot distinguish whether we are communicating with a human or a machine
• Telephone Automated Systems
– How many of us are fooled?
• Why?
– Speech has many ambiguities
– Humans change words on-the-fly while speaking
– Colloquial speech does not follow a strict grammar
– Co-articulations and sloppy pronunciation
– Humans are good at filtering noise
– Humans understand world view and context
– Humans recognize individual characteristics
– Prosody contains information like emotion and emphasis

Page 3: Natural Language Processing

One sentence, eight possible meanings

I made her duck

– I cooked waterfowl for her.
– I stole her waterfowl and cooked it.
– I used my abilities to create a living waterfowl for her.
– I caused her to bid low in the game of bridge.
– I created the plastic duck that she owns.
– I caused her to quickly lower her head or body.
– I waved my magic wand and turned her into waterfowl.
– I caused her to avoid the test.

Page 4: Natural Language Processing

Ambiguities in speech

• Ambiguities in pronunciation
– “haya dun”
– “ay d ih s h er d s ah m th in ng ah b aw m uh v ih ng r ih s en l ih”
• Ambiguities in articulation (Coarticulation)
– tee, tree, city, beaten, steep
– this car, this ship
• Ambiguities in meaning
– We will review that in the near future
– He lives near the station
– two, to, and too

Page 5: Natural Language Processing

Semantic Problems

– “I called my mother on the television and did not understand the door. It was too breakfast, but they came from far to near. My mother is not too old for me to be young.” (Wernicke’s aphasia)

– “We went up the river sunk.”

– “John I believe Sally said Bill believed Sue saw.”

Page 6: Natural Language Processing

Could a computer infer the meaning?

I cdnuolt blveiee that I cluod aulaclty uesdnatnrd what I was rdgnieg.

The phaonmneal pweor of the hmuan mnid Aoccdrnig to rscheearch at Cmabridgde Uinervtisy, it deosn't mttaer in what oredr the ltteers in a word are, the olny iprmoatnt tihng is that the frist and lsat ltteer be in the rghit pclae.

The rset can be a taotl mses and you can still raed it wouthit a problem.

This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the word as a wlohe.

Amzanig huh?

Yaeh and I awlyas thought slpeling was ipmorantt!

Page 7: Natural Language Processing

Robot-human dialog

Robo: “Hi, my name is Robo. I am looking for work to raise funds for Natural Language Processing research.”
Person: “Do you know how to paint?”
Robo: “I have successfully completed training in this skill.”
Person: “Great! The porch needs painting. Here are the brushes and paint.”

Robo rolls away efficiently. An hour later he returns.

Robo: “The task is complete.”
Person: “That was fast. Here is your salary; good job, and come back again.”

Robo speaks while rolling away with the payment.

Robo: “The car was not a Porsche; it was a Mercedes.”

Moral: You need a sense of humor to work in this field.

99% accuracy

Page 8: Natural Language Processing

State-of-the-art

• Recognition
– Large vocabulary recognition with 98% accuracy
– Difficulty: filtering background noise
• Synthesis
– Produce clear computer-generated speech that is easily understood
– Difficulty: incorporate prosody to achieve natural-sounding speech
• Coding
– Compress 256 kbps (kilobits per second) audio to 4 kbps
– Research: compress to as low as 600 bps (Human brain: 50 bps)
• Examples of on-going research
– Match speech to talking heads
– Transform speech from one person to sound like another
– Authentication by voice
– Automated analysis of audio
– Human-machine dialog
– Language-independent algorithms

Page 9: Natural Language Processing

A bit of history

• Talking machines go back to the Middle Ages
– Vocoder (based on Kempelen’s speaking machine)
– Theremin (precursor to digital synthesizers)
• Signal analysis research goes back to the 1700s
– Fourier and Laplace
– Telephone and speech coding
– Rex talking dog in 1922
– More recently: Dudley, Levinson
– Dramatic progress resulting from ARPA (Advanced Research Projects Agency) challenges

Page 10: Natural Language Processing

Sample Sound Waves (Sound Editor)

Top: “this is a demo” Bottom: “A goat …. A coat”

Download and install from ACORNS web-site

Time domain, frequency domain, cepstrals, windows, formants, features, frequency, period, amplitude, pitch, quasi-periodic, vowels, fricatives, plosives, energy, zero crossings, sampling rate, frames, filter, Nyquist, onset, duration, phase

Page 11: Natural Language Processing

GraphCalc (Freeware – download)

• Freeware
• I advise you to download and install it
• Good for creating and visualizing signals

Page 12: Natural Language Processing

Introduction to Sound

Sound results from vibrations in air pressure

• Amplitude: The distance from zero to the maximum height
• Period: The time it takes for a sine wave to complete one cycle
• Wavelength (λ): The distance from one point to the same point on the next cycle
• Frequency (Hz): The repetitions or cycles per second
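The definitions above are tied together by two standard relations: period is the reciprocal of frequency, and wavelength is propagation speed divided by frequency. The sketch below is only an illustration; the function names are hypothetical, and the speed of sound (343 m/s, air at roughly room temperature) is an assumption that does not come from the slides.

```python
# Assumed speed of sound in air at roughly 20 degrees C (not from the slides).
SPEED_OF_SOUND_M_PER_S = 343.0


def period_seconds(frequency_hz: float) -> float:
    """Time for one full cycle: T = 1 / f."""
    return 1.0 / frequency_hz


def wavelength_metres(frequency_hz: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Distance covered by one cycle: wavelength = v / f."""
    return speed_m_per_s / frequency_hz


if __name__ == "__main__":
    f = 440.0  # example tone, chosen arbitrarily
    print(f"period     = {period_seconds(f) * 1000:.3f} ms")
    print(f"wavelength = {wavelength_metres(f):.3f} m")
```

Doubling the frequency halves both the period and the wavelength, which is why high-pitched sounds "vibrate fast".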

Page 13: Natural Language Processing

Sound

• High-pitched sounds vibrate fast
• Loud sounds have large amplitudes
• Sounds with different timbre (qualities) have subordinate frequencies attached
• Complex sound waves are a series of waves added together

http://www.colorado.edu/physics/phet/ contains sound and other simulations

Page 14: Natural Language Processing

Understanding Sine Waves

• Sine is the ratio of the height to the hypotenuse
• Many phenomena in nature occur in sine wave patterns

Page 15: Natural Language Processing

Complex Wave Patterns

• Sound waves occupying the same space combine to form a new wave of a different shape.
• Harmonically related waves add together and can create any complex wave pattern.
• Harmonically related waves have frequencies that are multiples of a basic frequency.

Fourier proposed that all sound signals can be decomposed into a group of sine waves
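As a rough illustration of harmonically related waves adding up to a complex shape, the sketch below sums sine waves whose frequencies are integer multiples of a basic frequency. The function names are hypothetical, and the square wave is just one convenient example; its amplitudes (4 / (π k) for odd k, zero for even k) are the standard Fourier series coefficients, not something stated on the slide.

```python
import math


def harmonic_sum(t, fundamental_hz, amplitudes):
    """Value at time t of a wave built from harmonics.

    amplitudes[k] is the amplitude of the sine wave at
    (k + 1) * fundamental_hz, so index 0 is the basic frequency itself.
    """
    return sum(
        a * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
        for k, a in enumerate(amplitudes)
    )


def square_wave_amplitudes(n_harmonics):
    """Odd harmonics get amplitude 4 / (pi * k); even harmonics are absent."""
    return [
        4 / (math.pi * k) if k % 2 == 1 else 0.0
        for k in range(1, n_harmonics + 1)
    ]


if __name__ == "__main__":
    f0 = 100.0  # basic frequency in Hz, chosen arbitrarily
    quarter_period = 1 / (4 * f0)
    for n in (1, 9, 199):
        value = harmonic_sum(quarter_period, f0, square_wave_amplitudes(n))
        print(f"{n:3d} harmonics -> {value:.4f}")
```

Adding more harmonics brings the value at the quarter-period point closer to 1, the height of the ideal square wave.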

Page 16: Natural Language Processing

Complex Wave Examples

Page 17: Natural Language Processing

Nyquist Theorem

Nyquist Frequency (fN) = highest detectable frequency = fs / 2
Sampling Frequency (fs) = samples per second
Maximum Signal Frequency (fmax)

Theorem: fs > 2 * fmax, that is, fmax < fN

[Figure: Inadequate Sampling vs. Adequate Sampling]

How many cycles per second do we need?
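One way to answer that question in code, using the common convention that the Nyquist frequency is half the sampling rate. This is a minimal sketch with hypothetical function names; the 4 kHz and 10 kHz signal limits in the demo are example values only.

```python
def nyquist_frequency(sampling_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent: fs / 2."""
    return sampling_rate_hz / 2.0


def is_adequately_sampled(max_signal_hz: float, sampling_rate_hz: float) -> bool:
    """True when the highest signal frequency lies below the Nyquist frequency."""
    return max_signal_hz < nyquist_frequency(sampling_rate_hz)


if __name__ == "__main__":
    for fmax, fs in ((4000.0, 16000.0), (10000.0, 16000.0), (10000.0, 44100.0)):
        verdict = "adequate" if is_adequately_sampled(fmax, fs) else "inadequate"
        print(f"fmax = {fmax:7.0f} Hz, fs = {fs:7.0f} Hz -> {verdict}")
```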

Page 18: Natural Language Processing

Aliasing

• When does this occur?
– Frequencies (f>N) are present that are above the Nyquist Frequency (fN)
– If f∆ = f>N – fN, then fN+f∆ is indistinguishable from fN–f∆
• What do we do about it?
– Place an anti-aliasing filter ahead of the sampler to eliminate high frequencies
– This CANNOT be done in software after sampling, because by then the aliased frequencies are already indistinguishable
• Example of aliasing: take a picture of the sun every 23 hours
– 24 x 23 = 552 hours between apparent sunrises
– The sun appears to move from west to east

Different frequencies become indistinguishable
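The claim that fN+f∆ and fN–f∆ cannot be told apart can be checked numerically. The sketch below samples two cosines, one above and one below the Nyquist frequency, and compares the samples; it assumes fN is half the sampling rate, and the function names and the 8000 Hz rate are illustrative choices only.

```python
import math


def sample_cosine(frequency_hz, sampling_rate_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [
        math.cos(2 * math.pi * frequency_hz * n / sampling_rate_hz)
        for n in range(n_samples)
    ]


def alias_frequency(frequency_hz, sampling_rate_hz):
    """Frequency in [0, fs/2] at which a sampled tone appears."""
    folded = frequency_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


if __name__ == "__main__":
    fs = 8000.0          # sampling rate, so fN = 4000 Hz
    f_delta = 1000.0
    above = sample_cosine(fs / 2 + f_delta, fs, 32)   # 5000 Hz
    below = sample_cosine(fs / 2 - f_delta, fs, 32)   # 3000 Hz
    worst = max(abs(a - b) for a, b in zip(above, below))
    print(f"largest sample difference: {worst:.2e}")
    print(f"5000 Hz appears as {alias_frequency(5000.0, fs):.0f} Hz")
```

Because the two sample sequences agree to within rounding error, nothing computed from the samples alone can separate them, which is why the filter has to act before sampling.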

Page 19: Natural Language Processing

Aliasing and Filtering

Page 20: Natural Language Processing

Time vs. Frequency Domain

Time Domain: Signal is a composite wave of different frequencies
Frequency Domain: Split time domain into the individual frequencies
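A small sketch of moving from the time domain to the frequency domain, using a direct discrete Fourier transform written out with the standard library. The slides do not name a specific algorithm, so the DFT, the function names, the threshold, and the test signal (50 Hz plus a weaker 120 Hz tone) are all assumptions made for illustration; real systems would normally use an FFT library.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to the Nyquist bin."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


def dominant_frequencies(samples, sampling_rate_hz, threshold=0.1):
    """Frequencies (Hz) whose bin magnitude exceeds the threshold."""
    n = len(samples)
    return [
        k * sampling_rate_hz / n
        for k, m in enumerate(dft_magnitudes(samples))
        if m > threshold
    ]


if __name__ == "__main__":
    fs = 800  # samples per second; 80 samples gives 10 Hz per bin
    signal = [
        math.sin(2 * math.pi * 50 * i / fs)
        + 0.5 * math.sin(2 * math.pi * 120 * i / fs)
        for i in range(80)
    ]
    print(dominant_frequencies(signal, fs))
```

The composite wave looks like a single irregular curve in the time domain, while the frequency-domain view lists the individual components that were added together.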

Page 21: Natural Language Processing

Formants

• F0: Fundamental frequency of sound production (the rate at which the vocal cords vibrate)
– Male average: 100 Hz, Female average: 200 Hz, Child average: 300 Hz
• F1, F2, F3: Formants are resonances of the vocal tract that reinforce some multiples (harmonics) of the fundamental frequency; they vary depending on the shape of the vocal tract.
– Articulators toward the back move formants together
– Articulators toward the front move formants apart
– Roundness impacts the complex relationship between F2 and F3
• Formants are an excellent feature for distinguishing vowels. They are less useful for distinguishing unvoiced sounds.

Page 22: Natural Language Processing

Communication

• Form (message)
• Meaning (semantics)
• Signal (audio sound waves, written text)
• Channel (medium): spoken, written, gestures

Create and receive information rather than passively extracting it

Page 23: Natural Language Processing

Semiotics

• Affective communication – Express primitive emotions. Meanings universal.

• Iconic – Meaning easily inferred from the form of expression (slippery road signs).

• Symbolic – Create arbitrary relationships between form and meaning. Each symbol or sound is clearly distinguished (colors) – limited set of meanings.

• Natural – Add grammar, syntax, and sound combinations to express abstract concepts; express productively an unlimited number of messages.

The science of signs and symbols

Page 24: Natural Language Processing

Science of Language

• Morphology: Structure of words
• Acoustics: Study of sound
• Phonology: Classification of linguistic sounds
• Semantics: Study of meaning
• Pragmatics: How language is used
• Phonetics: Speech production and perception

Natural Language Processing draws from these fields to engineer practical systems that work.

Page 25: Natural Language Processing

Speech

• Encode – send – signal – receive – decode
• Communication tends to be effective and efficient
• Speech is as easy on the mouth as possible while still being understood
• Speakers adjust according to implied knowledge they share with their listeners

Noisy channel

Page 26: Natural Language Processing

Human Language

• Verbal: discrete message carried with continuous signal.
• Prosodic: Continuous parallel intonation scale.
– Affective: instinctive, sudden expression
– Augmentative: varied by individual to clarify or inject personality.
– Supra-segmental: intonation patterns of a language
– Null (neutral): minimal use of prosody to accent words and phrases.
• Text: Written channel
– Punctuation and context hint at prosody
– TTS infers prosody using language-specific knowledge.

Page 27: Natural Language Processing

Language Components

• Phoneme: Smallest discrete unit of sound that distinguishes words (Minimal Pair Principle)
• Syllable: Acoustic component perceived as a single unit
• Morpheme: Smallest linguistic unit with meaning
• Word: Speaker identifiable unit of meaning
• Phrase: Sub-message of one or more words
• Sentence: Self-contained message derived from a sequence of phrases and words

Page 28: Natural Language Processing

Natural Language Characteristics

• Phones are the set of all possible sounds that humans can articulate. Each phone has unique characteristics.

• Each language selects a set of phonemes from the larger set of phones (English: about 40). Our hearing is tuned to respond to this smaller set.

• Speech is a highly redundant sequence of sounds (phonemes), pitch (prosody), gestures, and expressions varying with time.

Page 29: Natural Language Processing

Audio Signal Redundancy

• Continuous signal (virtually infinite)
• Sampled
– Mac: 44,100 2-byte samples per second (705 kbps)
– PC: 16,000 2-byte samples per second (256 kbps)
– Telephone: 4k 1-byte samples per second (32 kbps)
– CELP Compression: 8 kbps
– Research: 4 kbps, 2.4 kbps
– Military applications: 600 bps
– Human brain: 50 bps
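The sampled bit rates listed above follow from samples per second times bytes per sample times 8 bits. The sketch below reproduces that arithmetic and the implied compression ratios; the function names are hypothetical, and the sample rates and coded rates are the ones quoted on the slide.

```python
def bit_rate_bps(samples_per_second: int, bytes_per_sample: int) -> int:
    """Uncompressed bit rate: samples/s * bytes/sample * 8 bits/byte."""
    return samples_per_second * bytes_per_sample * 8


def compression_ratio(raw_bps: float, coded_bps: float) -> float:
    """How many times smaller the coded stream is than the raw one."""
    return raw_bps / coded_bps


if __name__ == "__main__":
    sources = {
        "Mac": (44_100, 2),
        "PC": (16_000, 2),
        "Telephone": (4_000, 1),
    }
    for name, (rate, width) in sources.items():
        print(f"{name:10s} {bit_rate_bps(rate, width):>8d} bps")

    pc_bps = bit_rate_bps(16_000, 2)
    for label, coded in (("CELP", 8_000), ("Military", 600)):
        print(f"{label:10s} {compression_ratio(pc_bps, coded):.1f}x smaller than PC audio")
```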

Page 30: Natural Language Processing

Course Goals

• Introduce algorithms and techniques used in natural language processing
• Explain how these techniques are useful outside of this specific field
• Provide enough background so we, as a class, can begin to work towards significant contributions in the winter follow-up class
• Discuss various areas, but focus on speech synthesis
• Discuss topics in a manner accessible to students with diverse backgrounds

Page 31: Natural Language Processing

Projects

• Pronunciation aid
– Useful for language learning, so students can grasp the phonemes that are not in their first language
– Useful for the hearing impaired, so they can learn to speak normally through visual feedback
• Generate speech from a language-independent script
– Design a language-independent script and identify possible problems. A future application is to analyze speech and translate it into this script.
• Identify and codify speaker-dependent speech components
– Future applications are computer-based games where audio can transform voices to a multitude of speakers

Page 32: Natural Language Processing

Pronunciation Lesson: The sound “m”

[Figure: the program's sound wave and your sound wave for “mmmmmm”. Picture key: tongue placement, width of windpipe, vocal cord vibration.]

Page 33: Natural Language Processing

Pronunciation Lesson: The sound “h”

[Figure: the program's sound wave and your sound wave for “hhhhhhhh”. Picture key, changes in: tongue placement, width of windpipe, vocal cord vibration.]