ai based character recognition and speech synthesis
Seminar on
“AI Based Character Recognition and Speech Synthesis”
Developed By:
Kalyani Hadke
Rani Kubetkar
Shreya Surjuse
Ankita Jadhao
Kruttika Sorte
Guided By
Prof. H. N. Datir
Artificial Intelligence based
Character Recognition and Speech Synthesis
NEED
• We face many problems in daily life. For example, when we capture an image, we sometimes cannot get a proper image and cannot recognize the words in it.
• Many people have the problem of illiteracy.
• So we wish that the image could be converted to text for various purposes.
• While studying, we do not read the text as a regular practice, so we wish that the text could be converted into audio.
• Apart from this, we wish that text could be captured in an image and converted into audio, as we generally prefer listening, for example to songs.
Introduction to CR and SS
• Optical Character Recognition (OCR) is the electronic or mechanical conversion of images of text into machine-encoded text.
• OCR converts scanned images of text into machine-readable character codes.
• Speech synthesis is the artificial production of human speech.
• Speech synthesizer: a computer system used for this purpose.
• A TTS engine performs:
  • Language into speech
  • Symbolic linguistic representation to speech
Overview
[Figure: Image, passed to OCR, gives recognized text; the recognized text, passed to the speech engine, gives speech.]
DFD For Character Recognition System
[Figure: stages of the system: Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network, Recognition, Network Testing.]
Pre-processing
• De-noising
• De-skew
• Binarization
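Two of the pre-processing steps above can be sketched in a few lines. This is only an illustrative sketch: the slides do not say which filter or threshold is used, so the 3x3 median filter, the fixed threshold of 128, and the function names `denoise` and `binarize` are all assumptions. De-skew is left out because it needs an angle estimate that the slides do not describe.

```python
from statistics import median


def denoise(gray):
    """Assumed de-noising step: 3x3 median filter on a grayscale image.

    `gray` is a list of rows of 0..255 values. Border pixels are copied
    unchanged to keep the sketch short.
    """
    rows, cols = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [gray[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out


def binarize(gray, threshold=128):
    """Assumed binarization: dark pixels become 1 (ink), light pixels 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```

A single dark speck on a white page is removed by `denoise`, and `binarize` then yields the 0/1 image that the later stages work on.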
SEGMENTATION
• Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of the image into something that is more meaningful and easier to analyze.
• It decomposes a sequence of characters into individual symbols.
• It locates and identifies boundaries in the image.
• It directly affects the recognition rate of the script.
1. External segmentation: determines the character lines in the text.
2. Internal segmentation: decomposes an image of a sequence of characters into images of individual symbols.
[Figure: a page split into numbered text lines (external segmentation), and the word "Image" split into the symbols I, m, a, g, e (internal segmentation).]
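One common way to realize both kinds of segmentation on a binarized page is with projection profiles: rows that contain ink form text lines, and columns that contain ink inside a line form symbols. The slides do not name a method, so this technique and the helper names `runs`, `segment_lines`, and `segment_symbols` are assumptions for illustration.

```python
def runs(flags):
    """Return (start, end) pairs for each run of true values; end is exclusive."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment_lines(binary):
    """External segmentation: row ranges that contain at least one ink pixel."""
    return runs([any(row) for row in binary])


def segment_symbols(binary, top, bottom):
    """Internal segmentation: column ranges with ink inside rows top..bottom-1."""
    cols = len(binary[0])
    has_ink = [any(binary[r][c] for r in range(top, bottom)) for c in range(cols)]
    return runs(has_ink)
```

This simple version assumes that symbols do not touch or overlap; connected script would need a more careful internal segmentation.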
Image Digitization - Matrix matching
• Mapping of the symbol image into a corresponding two-dimensional binary matrix
• Issue: deciding the size of the matrix
• A sampling strategy is needed for mapping the symbol image onto the matrix
[Figure: the input alphabet 'a' on a segmented grid, and its digitization as a matrix of 0s and 1s.]
• To feed the matrix data to the network, it must be linearized to a single dimension.
[Figure: the binary matrix of the symbol flattened into a single sequence of bits (… 0 1 1).]
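Digitization and linearization can be sketched as follows. The slides leave the matrix size and the sampling strategy open, so the nearest-neighbour sampling used here, the row-by-row flattening order, and the names `digitize` and `linearize` are assumptions.

```python
def digitize(symbol, size):
    """Map a binary symbol image onto a size x size matrix.

    Assumed sampling strategy: nearest-neighbour, i.e. each matrix cell
    takes the value of one evenly spaced source pixel.
    """
    rows, cols = len(symbol), len(symbol[0])
    return [[symbol[i * rows // size][j * cols // size] for j in range(size)]
            for i in range(size)]


def linearize(matrix):
    """Flatten the matrix row by row into one list for the network input."""
    return [bit for row in matrix for bit in row]
```

Using a fixed `size` for every symbol gives the network an input vector of constant length, whatever the size of the segmented sub-image.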
1. Image of scanned document: NAME
2. Sub-images of individual letters from the document: N, A, M, E
3. Binary representation of the sub-images, e.g. 0 is white and 1 is black: 001110100…., 111010011…., 11001100…., 000111101…..
4. A supervised neural network that has been trained to recognize images of characters.
5. The neural network outputs numeric values corresponding to the recognized characters: 14, 1, 13, 5
6. A file contains the text of the scanned document: NAME
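In the example, the outputs 14, 1, 13, 5 match the alphabet positions of N, A, M, E. Assuming that this is the intended encoding (the slides do not state it explicitly), the final step from numeric outputs to text could look like this; `decode` is a hypothetical helper name.

```python
def decode(values):
    """Map 1-based alphabet positions to uppercase letters (1 -> 'A', 26 -> 'Z')."""
    return "".join(chr(ord("A") + value - 1) for value in values)


print(decode([14, 1, 13, 5]))  # prints NAME
```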
An artificial neural network consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, analogous to the biological neurons in the brain. Neurons communicate through weighted links.
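[Figure: two neurons joined by a weighted link, and the model of one neuron: inputs X1 … Xn are multiplied by weights Wk1 … Wkp, added in a summation unit, and passed through a sigmoid function to give the output.]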
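The neuron model in the figure (weighted inputs, summation, sigmoid) can be written out directly. This is a minimal sketch without a bias term, since the figure does not show one; the function names are chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights):
    """Weighted sum of the inputs followed by the sigmoid function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)
```

With a weighted sum of zero the output is 0.5; large positive sums push it toward 1 and large negative sums toward 0.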
• Feed-forward neural network
• A multilayer perceptron
• Teaching and adaptation of the ANN
• Implementation of the ANN
[Figure: neural network. The input signal passes through the input layer, the first hidden layer, the second hidden layer, and the output layer to give the output signal.]
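A feed-forward pass through a multilayer perceptron like the one in the figure can be sketched as below. The layer sizes and weight values are not given in the slides, so the representation of the network as a list of weight matrices, the missing bias terms, and the names `layer` and `feed_forward` are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights):
    """Compute one layer; weights[k] holds the incoming weights of neuron k."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def feed_forward(inputs, layers):
    """Pass the signal through every layer in order, with no feedback loops."""
    signal = inputs
    for weights in layers:
        signal = layer(signal, weights)
    return signal
```

For the figure's topology, `layers` would hold three weight matrices: input to first hidden layer, first to second hidden layer, and second hidden layer to output layer.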
Neural Network
• Input signal: binary converted image
• Output signal: obtained text of scanned image
• Back-propagation is used for error calculation
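Back-propagation computes how much each weight contributed to the output error and adjusts the weights against that gradient. The sketch below uses a tiny network with one hidden layer, one output neuron, a squared-error loss, and no bias terms; all of these choices, and the learning rate of 0.5, are assumptions made for illustration and are not taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the single output value."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(x, target, w_hidden, w_out):
    """Squared error between the network output and the target."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


def backprop(x, target, w_hidden, w_out):
    """Gradients of the loss with respect to every weight."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    grad_hidden = []
    for w, h in zip(w_out, hidden):
        # Propagate the error back through the link to this hidden neuron.
        delta_h = delta_out * w * h * (1.0 - h)
        grad_hidden.append([delta_h * xi for xi in x])
    return grad_hidden, grad_out


def train_step(x, target, w_hidden, w_out, rate=0.5):
    """One gradient-descent update; returns the new weights."""
    grad_hidden, grad_out = backprop(x, target, w_hidden, w_out)
    new_hidden = [[w - rate * g for w, g in zip(row, grow)]
                  for row, grow in zip(w_hidden, grad_hidden)]
    new_out = [w - rate * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out
```

Training repeats `train_step` over many labeled character images until the error is small enough; a character recognizer would use one output neuron per character class rather than the single output shown here.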
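Speech Synthesis
[Figure: the input image passes through the character recognition stages (Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network) to give a file containing the text of the scanned document; the TTS engine (NLP, then DSP) turns this text into speech.]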
• TTS: Text-to-Speech engine
• A computer-based system that reads any text aloud.
• A TTS engine consists of:
  • Front-end: NLP
  • Back-end: DSP
Modules of Text-to-Speech
• Input: text
• Natural language processing: text preprocessing, text analysis, linguistic analysis
• The NLP module passes phonemes and prosody to the DSP module
• Digital signal processing: speech synthesizer
• Output: speech
Figure 1. A simple but general functional diagram of a TTS system
NLP Module
• This step is called the high-level, front-end, or text-to-phoneme step.
• It consists of the following parts:
  • Text analysis
  • Automatic phonetization
  • Prosody generation
Text Analysis
• Pre-processing
• Morphological analysis
• Contextual analysis
• Syntactic-prosodic analysis
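The pre-processing part of text analysis typically turns raw text into plain words before any linguistic analysis. The sketch below shows one possible form; the abbreviation table, the digit-by-digit reading of numbers, and the name `normalize` are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger one.
ABBREVIATIONS = {"prof.": "professor", "dr.": "doctor", "e.g.": "for example"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Turn raw text into a list of lowercase words ready for phonetization."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.extend(ABBREVIATIONS[token].split())
        elif token.isdecimal():
            # Read numbers digit by digit (a deliberate simplification).
            words.extend(DIGITS[int(d)] for d in token)
        else:
            cleaned = re.sub(r"[^a-z']", "", token)
            if cleaned:
                words.append(cleaned)
    return words
```

The morphological, contextual, and syntactic-prosodic steps are not covered by this sketch.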
Automatic Phonetization
• Rule-based
• Dictionary-based
• Hybrid approach
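The three approaches can be shown together in one sketch: a dictionary lookup first, with letter-to-sound rules as a fallback for unknown words, which is the hybrid approach. The lexicon entries, the one-rule-per-letter table, and the ARPAbet-style phoneme symbols are hypothetical and far cruder than a real system.

```python
# Hypothetical pronunciation dictionary (dictionary-based approach).
LEXICON = {
    "name": ["N", "EY", "M"],
    "text": ["T", "EH", "K", "S", "T"],
}

# Hypothetical letter-to-sound rules, one rule per letter (rule-based approach).
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def phonetize(word):
    """Hybrid approach: use the dictionary if possible, else fall back to rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    for letter in word:
        phonemes.extend(LETTER_RULES.get(letter, "").split())
    return phonemes
```

The dictionary gives correct results for irregular words such as "name", while the rules keep the system from failing on words it has never seen.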
Prosody Generation
• Pitch
• Intonation
• Rhythm
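A very simple prosody generator attaches a pitch value and a duration to each phoneme. The sketch below assumes a falling pitch contour over the utterance (intonation) and a longer final phoneme (rhythm); these rules, the default numbers, and the name `add_prosody` are hypothetical simplifications, not the method of the slides.

```python
def add_prosody(phonemes, start_pitch=220.0, end_pitch=180.0,
                duration_ms=80, final_stretch=1.5):
    """Attach a pitch (Hz) and a duration (ms) to each phoneme.

    Pitch falls linearly from start_pitch to end_pitch; the last phoneme
    is lengthened by final_stretch.
    """
    count = len(phonemes)
    plan = []
    for i, phoneme in enumerate(phonemes):
        fraction = i / (count - 1) if count > 1 else 0.0
        pitch = start_pitch + (end_pitch - start_pitch) * fraction
        length = duration_ms * (final_stretch if i == count - 1 else 1.0)
        plan.append({"phoneme": phoneme, "pitch_hz": pitch, "duration_ms": length})
    return plan
```

The resulting list of phonemes with pitch and duration is the kind of symbolic description that the NLP module hands over to the DSP module.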
DSP component
• Low-level phoneme-to-speech conversion
• There are two main technologies used for generating synthetic speech waveforms:
  • Concatenative synthesis
  • Formant synthesis
Formant Synthesis
• Formant synthesis is rule-based synthesis.
• It does not use any human speech samples at runtime.
• The waveform is created using an acoustic model of the human vocal tract.
• It generates artificial, somewhat robotic speech.
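A minimal sketch of the idea: an impulse train stands in for the vocal folds, and a cascade of second-order digital resonators stands in for the vocal tract, one resonator per formant. The formant frequencies and bandwidths in the usage note, the pitch of 120 Hz, and the 8000 Hz sample rate are assumed illustrative values, not figures from the slides.

```python
import math


def resonator(signal, freq, bandwidth, rate):
    """Second-order digital resonator modelling one formant.

    Implements y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with the coefficients
    derived from the formant frequency and bandwidth (both in Hz).
    """
    c = -math.exp(-2.0 * math.pi * bandwidth / rate)
    b = 2.0 * math.exp(-math.pi * bandwidth / rate) * math.cos(2.0 * math.pi * freq / rate)
    a = 1.0 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, pitch=120, rate=8000, seconds=0.25):
    """Filter an impulse train through one resonator per (freq, bandwidth) pair."""
    count = int(rate * seconds)
    period = rate // pitch  # samples between glottal pulses
    signal = [1.0 if n % period == 0 else 0.0 for n in range(count)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, rate)
    return signal
```

For example, `synthesize_vowel([(700, 130), (1200, 70), (2600, 160)])` returns 2000 samples of an "a"-like sound under these assumed formant values. No recorded speech is involved, which is why the result sounds artificial.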
Concatenative synthesis
• Based on the concatenation of segments of recorded speech.
• Gives the most natural-sounding synthesized speech.
SUBTYPES
• Diphone concatenation synthesis: somewhat robotic speech, sonic glitches
• Unit concatenation synthesis: natural speech
Approaches To Wave-form Generation: Concatenative
• Unit Concatenation Synthesis: Algorithm
  • Break the language down into small units (phonemes, syllables, etc.)
  • Create a large database of recorded speech
  • Each unit is labeled: pitch, duration, prosody, position in syllable, etc. Labeling is synthesizer-dependent.
  • The target utterance is selected at runtime by determining the best chain of units (HMM, decision tree)
  • Use DSP to smooth transitions between units
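Selecting the best chain of units can be sketched as a shortest-path search over the candidate recordings. The slides mention HMMs and decision trees; the sketch below swaps in a plain dynamic-programming (Viterbi-style) search with a hypothetical cost that looks only at pitch, so it shows the selection principle rather than the method named in the slides. The database layout and the name `select_units` are also assumptions.

```python
def select_units(targets, database):
    """Pick one recorded unit per target so that the total cost is minimal.

    targets:  list of (unit_name, desired_pitch) pairs.
    database: dict mapping unit_name to a list of candidate recordings,
              each a dict with an "id" and a labeled "pitch".
    Cost = |candidate pitch - desired pitch|        (target cost)
         + |pitch jump between neighbouring units|  (join cost)
    """
    if not targets:
        return []
    best = []  # best[i][k] = (cheapest cost ending in candidate k, back-pointer)
    for i, (name, want) in enumerate(targets):
        row = []
        for unit in database[name]:
            target_cost = abs(unit["pitch"] - want)
            if i == 0:
                row.append((target_cost, None))
            else:
                previous = database[targets[i - 1][0]]
                cost, back = min(
                    (best[i - 1][j][0] + abs(unit["pitch"] - prev["pitch"]), j)
                    for j, prev in enumerate(previous)
                )
                row.append((cost + target_cost, back))
        best.append(row)
    # Trace the cheapest chain back from the last unit to the first.
    k = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(database[targets[i][0]][k]["id"])
        k = best[i][k][1]
    return chain[::-1]
```

A real synthesizer would add further labels (duration, position in syllable) to the costs and then smooth the joins between the chosen units with DSP.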
Advantages
• Machine language translation
• Information retrieval
• Visual issues (difficulty seeing text)
• Motor issues (difficulty handling a book or paper)
QUESTIONS????