Speech and Language Processing

SUBMITTED BY-

VIKALP MAHENDRA

(EC-11)

SUBMITTED TO-

MR. ABHISHEK SRIVASTAVA

Introduction

Block Diagram

Linguistic Levels Of Analysis

Phonetics

Organs Of Speech And Articulation

Acoustic Model

Circuit Diagram

Components Used

Features Of HM2007

Working

Extracting Phonemes In Frequency Domain

Markov Model

Advantages

Applications

Conclusion

Analyses sound and converts spoken word into text

Uses knowledge of spoken English

Programs are available for voice recognition.

Systems work best on Windows XP & Windows Vista

Block diagram labels: Computers, Algorithms, Databases, Robotics, Search, Natural Language Processing, Information Retrieval, Machine Translation, Language Analysis, Semantics

Speech

Written language

Phonology: sounds / letters / pronunciation

Morphology: the structure of words

Syntax: how these sequences are structured

Semantics: meaning of the strings

Phonetics - the study of the way humans make, transmit, and receive sounds

Phonology - the study of sound systems of languages

A typical word such as moon can be broken down into three phonemes: m, ue, n.

Phonemes represent all the vowel and consonant sounds of spoken speech

Most vowel sounds are modified by the shape of the lips (rounded / spread / neutral)

Sounds are made by vibrating the vocal cords (voicing)

Vowels can be:

Single sounds – Monophthongs or pure vowels

Double sounds - Diphthongs

Triple sounds - Triphthongs

Pure vowels usually come in pairs consisting of long and short sounds

The long sound is found in the word tea. The lips are spread.

The short sound is found in the word hip. The lips are slightly spread.

The tongue tip is raised slightly at the front towards the alveolar ridge. In the longer sound the tongue is raised higher.

This sound (the schwa) is made by relaxing the mouth, keeping the lips in a neutral position, and making a short sound. It is found in words like paper, over, and about, and is common in unstressed (weak) forms in spoken English.

The long sound – you, too & blue

The short sound – good, would & wool

The lips are rounded and the centre and back of the tongue are raised towards the soft palate. For the longer sound the tongue is raised higher and the lips are more rounded.

This sound is made with the mouth spread wide open. It is found in cat, man, apple & ran.

Here we have three sounds, heard in: 1) for 2) tour 3) go

Triphthongs are combinations of three sounds: a diphthong + a schwa sound.

Diphthongs are combinations of two pure vowel sounds.

• a: + I = ‘aI’ - tie, buy, height & night

• e + I = ‘eI’ - way, paid & gate

• o: + I = ‘oI’ - boy, coin & coy

• e + ə = ‘eə’ - where, hair & care

• I + ə = ‘Iə’ - here, hear & beer

An acoustic model is built from audio recordings of speech to create a statistical representation of the sounds.

To create a speech recognition engine, a large database of models is created to match each phoneme.

These database models hold the stored phonemes.

The language model holds the grammar of the sentence, which is used to decode our spoken words to text.
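The matching step can be illustrated with a deliberately simplified sketch. It swaps the statistical models for nearest-neighbour template matching: each phoneme is stored as a single feature vector, and an observed vector is assigned to the closest one. The phoneme labels, the two-number feature vectors, and the function name `closest_phoneme` are all assumptions made for illustration; they do not come from the slides.

```python
import math

# Hypothetical stored models: one 2-D feature vector per phoneme
# (for example, two resonance frequencies in Hz). Values are illustrative only.
PHONEME_MODELS = {
    "aa": (730.0, 1090.0),
    "iy": (270.0, 2290.0),
    "uw": (300.0, 870.0),
}

def closest_phoneme(features, models=PHONEME_MODELS):
    """Return the label of the stored model nearest to the observed features."""
    return min(models, key=lambda label: math.dist(features, models[label]))

# An observed vector close to the stored "iy" template.
print(closest_phoneme((290.0, 2200.0)))
```

A real engine would compare probability distributions over many features rather than single points, but the idea of matching an observation against a stored database is the same.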

HM 2007 IC

SRAM 8K*8

LATCH 74LS373

BCD TO 7-SEGMENT DECODER 7448

XTAL 3.57MHz

PCB

KEYPAD

PC MOUNTED SWITCHES

7 SEGMENT DISPLAY

MICROPHONE

22K RESISTOR

100K RESISTOR

0.0047 µF CAPACITOR

A single-chip voice recognition system in a 48-pin package

Manufactured by Hualon

Recognizes a maximum of 40 words; maximum word length 1.92 sec

Microphone support

5V power supply

How does a computer convert spoken speech into data?

When we speak, a microphone picks up the analog signal of our voice, which is converted into digital chunks of data that the computer analyzes.

It is from this data that the computer extracts enough information to confidently guess the word being spoken.
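The "digital chunks" can be pictured with a small sketch. It assumes an 8 kHz sampling rate and 20 ms chunks (both chosen for illustration, not stated in the slides) and uses a synthetic sine tone as a stand-in for the microphone and converter. The names `sample_tone` and `split_into_frames` are hypothetical.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_MS = 20        # chunk length in milliseconds (assumed)

def sample_tone(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Stand-in for microphone + converter: sample a pure sine tone."""
    n = round(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def split_into_frames(samples, rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Cut the sample stream into fixed-length, non-overlapping chunks."""
    size = rate * frame_ms // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

# 0.1 s of signal at 8 kHz is 800 samples, i.e. five chunks of 160 samples.
frames = split_into_frames(sample_tone(440.0, 0.1))
print(len(frames), len(frames[0]))
```

Each chunk is then analyzed on its own, which is what makes the later frequency analysis possible.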

The aim is to extract phonemes

Phonemes are linguistic units

They are the sounds that group together to form words

How a phoneme is realized as sound depends on many factors

aa - father

ae - cat

ah - cut

ao - dog

aw - foul

ng - sing

t - talk

th - thin

uh - book

The waveform shows the frequency characteristics of the phonemes

Phonemes are extracted by running the waveform through a Fourier transform

They are easily visible in the frequency domain

This can be seen on a spectrogram

A spectrogram is a 3-D plot of the waveform's frequency and amplitude versus time, with amplitude shown in shades of grey
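As a rough sketch of the frequency-domain step, the code below computes a spectrogram with a naive discrete Fourier transform applied to fixed-size chunks of the signal. The sampling rate, the chunk size, and the two-tone test signal are assumptions chosen so that the result is easy to check; a practical system would use a fast Fourier transform with overlapping, windowed chunks.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive discrete Fourier transform; magnitudes for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_size):
    """One magnitude spectrum per non-overlapping chunk (rows follow time)."""
    return [
        dft_magnitudes(samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def dominant_frequency(magnitudes, rate, frame_size):
    """Frequency in Hz of the strongest bin in one spectrum."""
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * rate / frame_size

RATE = 8000   # assumed sampling rate in Hz
FRAME = 64    # assumed chunk size in samples (125 Hz per bin)

# Synthetic signal: 500 Hz for the first half, 1500 Hz for the second half.
signal = [math.sin(2 * math.pi * 500 * i / RATE) for i in range(256)]
signal += [math.sin(2 * math.pi * 1500 * i / RATE) for i in range(256)]

spec = spectrogram(signal, FRAME)
peaks = [dominant_frequency(row, RATE, FRAME) for row in spec]
print(peaks)
```

Each row of `spec` corresponds to one column of the grey-scale plot: time runs across rows, frequency across bins, and the magnitude is what the plot draws as darkness.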

The computer generates a list of phonemes

These phonemes have to be converted into words and then into sentences, so a Markov model is used

It compares the observed phonemes with the stored phonemes

In this example, the word tomato is modelled in both its British English and American English pronunciations

This idea is applied up to the level of sentences, which improves recognition
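The tomato example can be sketched as a small Markov chain over phonemes. The states, the phoneme labels, and especially the 0.6 / 0.4 split between the two pronunciations are invented for illustration; a real recognizer would estimate such probabilities from recorded speech and would typically use hidden states rather than the directly observed phonemes assumed here.

```python
# State ids map to phonemes; "t" and "ow" appear twice, so ids keep them apart.
STATES = {0: "t", 1: "ow", 2: "m", 3: "ey", 4: "aa", 5: "t", 6: "ow"}
START, END = "START", "END"

# Hypothetical transition probabilities for the word model "tomato".
TRANSITIONS = {
    START: {0: 1.0},
    0: {1: 1.0},
    1: {2: 1.0},
    2: {3: 0.6, 4: 0.4},   # the two pronunciations branch at the second vowel
    3: {5: 1.0},
    4: {5: 1.0},
    5: {6: 1.0},
    6: {END: 1.0},
}

def sequence_probability(phonemes):
    """Probability that the word model produces exactly this phoneme list."""
    current = {START: 1.0}          # probability mass per current state
    for ph in phonemes:
        nxt = {}
        for state, p in current.items():
            for target, tp in TRANSITIONS.get(state, {}).items():
                if target != END and STATES[target] == ph:
                    nxt[target] = nxt.get(target, 0.0) + p * tp
        current = nxt
    # Only paths that can finish the word count.
    return sum(p * TRANSITIONS.get(s, {}).get(END, 0.0) for s, p in current.items())

print(sequence_probability("t ow m ey t ow".split()))
print(sequence_probability("t ow m aa t ow".split()))
print(sequence_probability("t ow m t ow".split()))
```

Comparing such scores across every word model in the vocabulary is one way to pick the most likely word for an observed phoneme list.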

It is used to translate between different forms of language

It is used in telephones

The standard landline telephone channel has a bit rate of 64 kbit/s, at a sampling rate of 8 kHz.

In a standard desktop PC, the limiting factor is the sound card. It can record at sampling rates between 16 kHz and 48 kHz
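The telephone figures above fit together by simple arithmetic, sketched below. The 8 bits per sample is an assumption (it is the usual figure for uncompressed telephone audio, but the slides do not state it), and the function names are hypothetical. The second function applies the sampling theorem: a signal sampled at a given rate can represent frequencies up to half that rate.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def highest_frequency_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2

# Telephone: 8 kHz sampling, assumed 8 bits per sample.
print(pcm_bit_rate(8000, 8))          # bits per second
print(highest_frequency_hz(8000))     # Hz

# Sound card range quoted above: 16 kHz to 48 kHz sampling.
print(highest_frequency_hz(16000), highest_frequency_hz(48000))
```

Under these assumptions the telephone channel works out to 64,000 bits per second and captures frequencies up to 4 kHz, while a sound card sampling at 16 kHz to 48 kHz captures up to 8 kHz to 24 kHz.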

MILITARY

HELICOPTERS

IN MOBILE SMARTPHONES

SPEECH CONTROLLED APPLIANCES

VOICE RECOGNITION SECURITY

Speech recognition is one of the latest technologies.

It reduces costs, such as the cost of training

Steps :

Fourier transform of signal

Extraction of Phonemes

Formation of word on the basis of Markov Models

Charm of Simplicity

With the advent of this technology, we will hopefully see a new era of human-computer interaction.

From: Chapter 1 of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin

http://en.wikipedia.org/wiki/acoustic model

http://en.wikipedia.org/wiki/speech recognition

www.wikpedia.org

www.slideshare.net

Natural Language Processing by Rada Mihalcea

www.youtube.com

