Speech and Language Processing
SUBMITTED BY-
VIKALP MAHENDRA
(EC-11)
SUBMITTED TO-
MR. ABHISHEK SRIVASTAVA
Introduction
Block Diagram
Linguistic Levels Of Analysis
Phonetics
Organs Of Speech And Articulation
Acoustic Model
Circuit Diagram
Components Used
Features Of HM2007
Working
Extracting Phonemes In Frequency Domain
Markov Model
Advantages
Applications
Conclusion
Analyses sound and converts spoken words into text
Uses knowledge of spoken English
Programs are available for voice recognition.
Systems work best on Windows XP & Windows Vista
[Block diagram: Natural Language Processing and related areas: Computers, Algorithms, Databases, Robotics, Search, Information Retrieval, Machine Translation, Language Analysis]
[Figure: linguistic levels of analysis for speech and written language, up to semantics]
Phonology: sounds / letters / pronunciation
Morphology: the structure of words
Syntax: how words are combined into structured sequences (phrases and sentences)
Semantics: the meaning of these strings
The study of the way humans make, transmit and receive sounds
Phonology - the study of sound systems of languages
A typical word such as moon can be broken down into
three phonemes: /m/, /uː/, /n/.
Phonemes represent all the vowels and consonants of
spoken language
Most vowel sounds are modified by the shape of the lips (rounded / spread / neutral)
Voiced sounds are made by vibrating the vocal cords (voicing)
Vowels can be:
Single sounds – Monophthongs or pure vowels
Double sounds - Diphthongs
Triple sounds - Triphthongs
Pure vowels usually come in pairs consisting of a long and a short sound
The long sound /iː/ is found in the word tea. The lips are spread and the sound is long.
The short sound /ɪ/ is found in the word hip. The lips are slightly spread and the sound is short.
The front of the tongue is raised slightly towards the alveolar ridge. In the longer sound the
tongue is raised higher.
This sound, the schwa /ə/, is made by relaxing the mouth,
keeping the lips in a neutral position and making
a short sound. It is found in words like paper,
over and about, and is common in weak forms in spoken
English.
The long sound /uː/: you, too and blue
The short sound /ʊ/: good, would and
wool
The lips are rounded and the centre
and back of the tongue are raised towards
the soft palate. For the longer sound the
tongue is raised higher and the lips are
more rounded.
This sound, /æ/, is made with the mouth
spread wide open. It is found in cat,
man, apple and ran
Here we have three sounds: the vowel sounds in
1) for, 2) tour and 3) go
Triphthongs are combinations of three sounds:
in English, a diphthong followed by a
schwa sound, as in fire /aɪə/ and hour /aʊə/
Diphthongs are combinations of two sounds: a glide from one pure vowel to another.
• a + ɪ = aɪ: tie, buy, height and night
• e + ɪ = eɪ: way, paid and gate
• ɔ + ɪ = ɔɪ: boy, coin and coy
• e + ə = eə: where, hair and care
• ɪ + ə = ɪə: here, hear and beer
An acoustic model uses audio recordings of speech to create a statistical representation of the sounds of a language.
To create a speech recognition engine, a large
database of models is built, with a model to match each phoneme.
These database models hold the stored phonemes.
The language model holds the grammar of the language
and is used to decode our spoken words into text.
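As a rough sketch of how the two models can work together, assuming the common formulation in which the recognizer picks the word sequence W that maximizes P(O | W) · P(W) (acoustic likelihood times language-model probability): the two candidate phrases and all scores below are invented for illustration.

```python
import math

# Hypothetical scores for one utterance. In a real engine the acoustic model
# would compute P(audio | words) from the stored phoneme models.
acoustic_likelihood = {   # P(O | W): how well the audio matches each candidate
    "recognise": 0.020,
    "wreck a nice": 0.025,
}
language_prior = {        # P(W): how plausible each candidate is under the grammar
    "recognise": 0.010,
    "wreck a nice": 0.001,
}

def decode(candidates):
    """Return the candidate maximizing log P(O|W) + log P(W)."""
    def score(w):
        return math.log(acoustic_likelihood[w]) + math.log(language_prior[w])
    return max(candidates, key=score)

print(decode(acoustic_likelihood))  # recognise
```

The acoustic scores alone slightly favour "wreck a nice"; the language model tips the decision, which is why both models are needed. Log probabilities are used because products of many small probabilities underflow in longer utterances.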
HM2007 IC
SRAM 8K × 8
LATCH 74LS373
BCD-TO-7-SEGMENT DECODER 7448
XTAL 3.57 MHz
PCB
KEYPAD
PC MOUNTED SWITCHES
7 SEGMENT DISPLAY
MICROPHONE
22K RESISTOR
100K RESISTOR
0.0047 µF CAPACITOR
A single-chip voice recognition IC
in a 48-pin package
Manufactured by Hualon
Up to 40 words of 0.96 s each, or 20 words of up to 1.92 s each
Microphone support
5 V power supply
How does a computer convert spoken words into data?
When we speak, a microphone converts our voice into an analog electrical signal,
which is digitized into chunks of data that the computer analyzes.
From this data the computer extracts enough information to
confidently guess the word being spoken.
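A minimal sketch of the digitizing step, assuming telephone-style 8 kHz sampling, 16-bit samples and 20 ms frames (all three values are assumptions, not HM2007 settings). A synthetic 200 Hz tone stands in for the microphone's analog signal.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumed: telephone-quality sampling rate
FRAME_MS = 20           # assumed: 20 ms analysis frames, a common choice
BITS = 16               # assumed: 16-bit signed samples

def analog_voice(t: float) -> float:
    """Stand-in for the microphone's analog signal: a 200 Hz tone (hypothetical)."""
    return 0.5 * math.sin(2 * math.pi * 200 * t)

def sample_and_quantize(duration_s: float) -> list[int]:
    """Sample the 'analog' signal at SAMPLE_RATE_HZ and round to BITS-bit integers."""
    n = int(duration_s * SAMPLE_RATE_HZ)
    max_code = 2 ** (BITS - 1) - 1           # 32767 for 16-bit samples
    return [round(analog_voice(i / SAMPLE_RATE_HZ) * max_code) for i in range(n)]

def split_into_frames(samples: list[int]) -> list[list[int]]:
    """Cut the digitized stream into fixed-length chunks for analysis."""
    frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples at 8 kHz
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

samples = sample_and_quantize(0.1)   # 100 ms of "speech"
frames = split_into_frames(samples)
print(len(samples), len(frames), len(frames[0]))   # 800 5 160
```

Each 160-sample frame is then analysed on its own; short frames are used because speech sounds change quickly, so a single phoneme is roughly stable within one frame.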
To extract phonemes
Phonemes are the linguistic units of sound
Phonemes group together to form words
How a phoneme actually sounds depends on many factors, such as the speaker and the neighbouring sounds
aa - father
ae - cat
ah - cut
ao - dog
aw - foul
ng - sing
t - talk
th - thin
uh - book
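The symbols above follow ARPAbet notation, as used for example in the CMU Pronouncing Dictionary. A hypothetical mini-lexicon built from them shows how a phoneme sequence can be looked up as a word; stress marks are omitted and the word list is only illustrative.

```python
# Hypothetical mini pronunciation lexicon using the phoneme symbols listed above
# (lower-case ARPAbet, stress marks omitted).
LEXICON: dict[str, tuple[str, ...]] = {
    "father": ("f", "aa", "dh", "er"),
    "cat": ("k", "ae", "t"),
    "cut": ("k", "ah", "t"),
    "dog": ("d", "ao", "g"),
    "foul": ("f", "aw", "l"),
    "sing": ("s", "ih", "ng"),
    "talk": ("t", "ao", "k"),
    "thin": ("th", "ih", "n"),
    "book": ("b", "uh", "k"),
}

# Reverse index: phoneme sequence -> words sharing it (homophones map to several words).
PHONES_TO_WORDS: dict[tuple[str, ...], list[str]] = {}
for word, phones in LEXICON.items():
    PHONES_TO_WORDS.setdefault(phones, []).append(word)

def lookup(phones: str) -> list[str]:
    """Return the words whose pronunciation matches a space-separated phoneme string."""
    return PHONES_TO_WORDS.get(tuple(phones.split()), [])

print(lookup("k ae t"))   # ['cat']
print(lookup("k ah t"))   # ['cut']
print(lookup("k aa t"))   # []
```

cat and cut differ only in one vowel phoneme, which is why the recognizer must distinguish phonemes reliably before it can form words.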
[Figure: waveform showing the frequency characteristics of phonemes]
Phonemes are extracted by running the waveform through a Fourier
transform
They are easily visible in the frequency domain
This can be seen in a spectrogram
A spectrogram is a 3-D plot of the waveform's frequency and amplitude
versus time: time runs along the horizontal axis, frequency along the vertical axis, and amplitude is shown as shades of grey
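A short NumPy sketch of the Fourier-transform step (a short-time Fourier transform), assuming an 8 kHz signal, 256-sample frames with 50% overlap and a Hann window; all of these parameters are assumptions. Two pure tones stand in for two speech sounds, and each row of the result is one vertical slice of a spectrogram.

```python
import numpy as np

def spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude short-time Fourier transform: rows = time frames, columns = frequency bins."""
    window = np.hanning(frame_len)                      # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))         # amplitude per (time, frequency)

fs = 8000                                               # assumed sampling rate (Hz)
t = np.arange(fs) / fs                                  # 1 s of samples
# Toy "utterance": a 500 Hz tone for 0.5 s, then a 1500 Hz tone (stand-ins for two sounds).
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S = spectrogram(x)
freqs = np.fft.rfftfreq(256, d=1 / fs)                  # centre frequency of each bin
peak_hz = freqs[S.argmax(axis=1)]                       # strongest frequency in each frame
print(peak_hz[0], peak_hz[-1])                          # 500.0 1500.0
```

Plotting `S.T` as an image, with darker grey for larger values, gives the kind of spectrogram described above. Real speech shows several bands (formants) per frame rather than one peak.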
The computer generates a list of phonemes
These phonemes have to be converted into words and then into
sentences, so a Markov model is used
It compares the observed phonemes with the stored phonemes
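One standard way to compare observed sounds with stored phoneme models is Viterbi decoding of a hidden Markov model, which finds the most likely sequence of hidden phonemes behind a sequence of observations. The sketch uses the word moon with a hypothetical left-to-right model; the coarse observation labels (nasal, vowel) and all probabilities are invented for illustration.

```python
import math

def log(p: float) -> float:
    """Log-probability with log(0) = -inf, so impossible transitions are never chosen."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start, trans, emit):
    """Most likely hidden state (phoneme) sequence for an observation sequence."""
    # best[s] = best log-probability of any path ending in state s at the current step
    best = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        prev = best
        best, ptr = {}, {}
        for s in states:
            # choose the predecessor that maximizes the path score into s
            p = max(states, key=lambda q: prev[q] + log(trans[q].get(s, 0.0)))
            best[s] = prev[p] + log(trans[p].get(s, 0.0)) + log(emit[s][o])
            ptr[s] = p
        backptrs.append(ptr)
    # trace back from the best final state
    state = max(states, key=best.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical stored model for "moon": each phoneme may repeat (stay) or move on.
states = ["m", "uw", "n"]
start = {"m": 1.0, "uw": 0.0, "n": 0.0}
trans = {"m": {"m": 0.6, "uw": 0.4}, "uw": {"uw": 0.7, "n": 0.3}, "n": {"n": 1.0}}
emit = {"m": {"nasal": 0.9, "vowel": 0.1},
        "uw": {"nasal": 0.2, "vowel": 0.8},
        "n": {"nasal": 0.9, "vowel": 0.1}}

obs = ["nasal", "nasal", "vowel", "vowel", "vowel", "nasal"]   # one label per frame
print(viterbi(obs, states, start, trans, emit))  # ['m', 'm', 'uw', 'uw', 'uw', 'n']
```

The left-to-right transitions let the model tell the two nasal sounds apart: a nasal frame before the vowel is assigned to m, one after it to n.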
For example, the word tomato can be modelled with both its British and American
English pronunciations
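A minimal sketch of such a pronunciation network as a simple Markov chain, assuming the only branch is the stressed vowel (ey for the American pronunciation, aa for the British one, matching the two entries the CMU Pronouncing Dictionary lists for tomato). The 50/50 probabilities and the state names are hypothetical.

```python
# Hypothetical pronunciation network for "tomato". States carry an index only where
# the same phoneme occurs twice (t1, t2); values are illustrative transition probabilities.
NETWORK = {
    "start": {"t1": 1.0},
    "t1": {"ah": 1.0},
    "ah": {"m": 1.0},
    "m": {"ey": 0.5, "aa": 0.5},   # branch: American "ey" vs British "aa"
    "ey": {"t2": 1.0},
    "aa": {"t2": 1.0},
    "t2": {"ow": 1.0},
    "ow": {"end": 1.0},
}

def phone(state: str) -> str:
    """Phoneme spoken in a state (strip the index that keeps the two t's apart)."""
    return state.rstrip("0123456789")

def pronunciation_probability(phones: list[str]) -> float:
    """Probability that the network generates exactly this phoneme sequence."""
    state, prob = "start", 1.0
    for ph in phones:
        nxt = next((s for s in NETWORK.get(state, {}) if phone(s) == ph), None)
        if nxt is None:
            return 0.0            # phoneme not allowed here: not a pronunciation of "tomato"
        prob *= NETWORK[state][nxt]
        state = nxt
    return prob * NETWORK.get(state, {}).get("end", 0.0)   # must finish at the word end

print(pronunciation_probability("t ah m ey t ow".split()))  # 0.5 (American)
print(pronunciation_probability("t ah m aa t ow".split()))  # 0.5 (British)
print(pronunciation_probability("t ah m iy t ow".split()))  # 0.0
```

Both pronunciations are accepted as the same word, while a sequence the network cannot produce scores zero.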
The same idea is used up to the level of sentences, which improves
recognition
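At the sentence level the same Markov idea can be sketched as a bigram language model, in which each word's probability depends only on the previous word. The three-sentence training corpus is hypothetical and no smoothing is applied, so any unseen word pair gets probability zero.

```python
from collections import Counter

corpus = ["i say tomato", "you say tomato", "i say potato"]   # hypothetical training text

bigrams, contexts = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]   # sentence start/end markers
    for prev, word in zip(words, words[1:]):
        bigrams[(prev, word)] += 1
        contexts[prev] += 1

def sentence_probability(sentence: str) -> float:
    """P(sentence) under a bigram Markov model: product of P(word | previous word)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0                                  # unseen context word
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

print(round(sentence_probability("i say tomato"), 3))   # 0.444
print(sentence_probability("i tomato say"))             # 0.0
```

When the acoustic evidence is ambiguous, a recognizer can prefer the word order that scores higher here; practical systems smooth these counts so unseen pairs get a small nonzero probability.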
It can be used to translate between different forms of language
It is used in telephones
A standard landline telephone channel has a data rate of 64 kbit/s.
It uses a sampling rate of 8 kHz
In a standard desktop PC, the limiting factor is the sound card. It can
record at sampling rates between 16 kHz and 48 kHz
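The figures above follow from simple arithmetic: the data rate is sampling rate × bits per sample, and the highest frequency a sampled signal can represent is half the sampling rate (the Nyquist limit). The 8 bits per telephone sample correspond to ITU-T G.711 PCM; the 16-bit depth used for the PC figures is an assumption.

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM data rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def nyquist_khz(sample_rate_hz: int) -> float:
    """Highest frequency (kHz) that can be represented at this sampling rate."""
    return sample_rate_hz / 2 / 1000

print(bit_rate_kbps(8000, 8))     # 64.0  -> the 64 kbit/s telephone channel
print(nyquist_khz(8000))          # 4.0   -> telephone audio stops below 4 kHz
print(bit_rate_kbps(16000, 16))   # 256.0 (assumed 16-bit PC recording)
print(bit_rate_kbps(48000, 16))   # 768.0 (assumed 16-bit PC recording)
print(nyquist_khz(16000))         # 8.0
```

This is why PC recordings usually give better recognition than telephone audio: at 16 kHz or more, frequencies above 4 kHz that help distinguish sounds such as s and f are preserved.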
MILITARY
HELICOPTERS
IN MOBILE SMARTPHONES
SPEECH-CONTROLLED APPLIANCES
VOICE RECOGNITION SECURITY
Speech recognition is one of the latest technologies.
It reduces costs, such as those of training
Steps:
Fourier transform of the signal
Extraction of phonemes
Formation of words on the basis of Markov models
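A toy end-to-end sketch of the three steps in NumPy, under heavy simplifying assumptions: each hypothetical phoneme is reduced to a single pure tone, frames are classified by their strongest frequency, and step 3 is simplified to an exact lexicon lookup rather than Markov-model scoring. The template frequencies, frame size and word list are all invented.

```python
import numpy as np

FS = 8000      # assumed sampling rate (Hz)
FRAME = 400    # 50 ms frames -> 20 Hz between FFT bins

# Hypothetical "phoneme templates": each toy phoneme is reduced to one dominant frequency.
TEMPLATES_HZ = {"m": 300.0, "uw": 700.0, "n": 1200.0, "k": 2000.0, "ae": 900.0, "t": 2600.0}
LEXICON = {("m", "uw", "n"): "moon", ("k", "ae", "t"): "cat"}   # hypothetical word list

def synthesize(phones, frames_each=3):
    """Make a toy signal: each phoneme is a pure tone lasting frames_each frames."""
    t = np.arange(FRAME * frames_each) / FS
    return np.concatenate([np.sin(2 * np.pi * TEMPLATES_HZ[p] * t) for p in phones])

def recognise(signal):
    # Step 1: Fourier transform of each frame
    frames = signal[: len(signal) // FRAME * FRAME].reshape(-1, FRAME)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(FRAME), axis=1))
    peaks = np.fft.rfftfreq(FRAME, d=1 / FS)[spectra.argmax(axis=1)]
    # Step 2: extract phonemes by matching each frame's peak to the nearest template,
    # then merge consecutive frames that carry the same label
    labels = [min(TEMPLATES_HZ, key=lambda p: abs(TEMPLATES_HZ[p] - f)) for f in peaks]
    phones = [p for i, p in enumerate(labels) if i == 0 or p != labels[i - 1]]
    # Step 3 (simplified): exact lexicon lookup instead of Markov-model scoring
    return LEXICON.get(tuple(phones), "<unknown>")

print(recognise(synthesize(["m", "uw", "n"])))   # moon
print(recognise(synthesize(["k", "ae", "t"])))   # cat
```

Real phonemes have several formants and vary between speakers, which is why practical systems replace the nearest-template match and exact lookup with statistical acoustic models and Markov-model decoding.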
Charm of Simplicity
With the advent of this technology, we will hopefully see a new era of human-computer interaction.
From: Chapter 1 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin
http://en.wikipedia.org/wiki/Acoustic_model
http://en.wikipedia.org/wiki/Speech_recognition
www.wikipedia.org
www.slideshare.net
Natural Language Processing, by Rada Mihalcea
www.youtube.com