AI Based Character Recognition and Speech Synthesis

Seminar on “AI Based Character Recognition and Speech Synthesis”. Developed by: Kalyani Hadke, Rani Kubetkar, Shreya Surjuse, Ankita Jadhao, Kruttika Sorte

Upload: ankita-jadhao

Post on 26-Jan-2017

TRANSCRIPT

Page 1: Ai based character recognition and speech  synthesis

Seminar on

“ AI Based Character Recognition and Speech Synthesis”

Developed By:

Kalyani Hadke Rani Kubetkar

Shreya Surjuse Ankita Jadhao

Kruttika Sorte

Guided By

Prof. H. N. Datir

Page 2: Ai based character recognition and speech  synthesis

Artificial Intelligence based

Character Recognition and Speech Synthesis

Page 3: Ai based character recognition and speech  synthesis

NEED

• In daily life we face problems such as captured images that are not clear, so the words in them cannot be recognized.
• Many people cannot read because of illiteracy.
• So we want the text in an image to be converted into machine-readable text for various purposes.
• While studying, we do not always read the text as a regular practice, so we want this text to be converted into audio.
• Beyond that, we want text captured in an image to be converted into audio, since we generally prefer listening, as we do with songs.

Page 4: Ai based character recognition and speech  synthesis

Introduction to CR and SS

• Optical Character Recognition (OCR) is the electronic or mechanical conversion of scanned images of text into machine-encoded text.
• Speech synthesis is the artificial production of human speech.
• A speech synthesizer is a computer system used for this purpose.
• A TTS (text-to-speech) engine converts:
  • normal language text into speech
  • symbolic linguistic representations into speech

Page 5: Ai based character recognition and speech  synthesis

Overview

Image → OCR → Recognized text → Speech engine → Speech

Page 6: Ai based character recognition and speech  synthesis

DFD For Character Recognition System

Stages shown in the diagram:
• Pre-Processing
• Segmentation
• Image Digitization
• Network Implementation
• Training of Learning Network
• Network Testing
• Recognition

Page 7: Ai based character recognition and speech  synthesis

Pre-processing

• De-noising
• De-skew
• Binarization
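As a rough illustration of two of these steps, the sketch below binarizes a grayscale image and removes isolated specks. It is a minimal sketch under stated assumptions, not the seminar's implementation: the mean-value threshold, the 8-neighbour noise rule, and the function names are all hypothetical choices, and de-skewing is left out.

```python
def binarize(gray, threshold=None):
    """Map a grayscale image (rows of 0-255 ints) to 0/1, where 1 means dark ink."""
    pixels = [p for row in gray for p in row]
    if threshold is None:
        # Assumption: a global mean threshold; other threshold rules are possible.
        threshold = sum(pixels) / len(pixels)
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def remove_isolated_pixels(binary):
    """De-noise: clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


gray = [
    [250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250],
    [250,  25,  15, 250, 250],
    [250, 250, 250, 250,  10],  # lone dark speck in the corner
]
clean = remove_isolated_pixels(binarize(gray))
```

Here the 2x2 dark block survives while the lone speck in the corner is cleared.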

Page 9: Ai based character recognition and speech  synthesis

SEGMENTATION

• Image segmentation decomposes a sequence of characters into individual symbols.
• It directly affects the recognition rate of the script.
• It locates and identifies boundaries in the image.

Two types:
1. External segmentation
2. Internal segmentation

Page 10: Ai based character recognition and speech  synthesis

Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of the image into something that is more meaningful and easier to analyze.

External Segmentation: determines the character lines in the text.

Page 11: Ai based character recognition and speech  synthesis

Internal Segmentation: decomposes an image of a sequence of characters into images of individual symbols.

Example: the word "Image" is split into I, m, a, g, e.
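One common way to sketch both kinds of segmentation is with projection profiles: blank rows separate text lines (external), and blank columns separate symbols within a line (internal). The slides do not name a method, so the projection approach and the helper names below are assumptions; touching or slanted characters would need something more robust.

```python
def runs_of_ink(profile):
    """Return (start, end) index pairs of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """External segmentation: split the page into text lines at blank rows."""
    row_profile = [sum(row) for row in binary]
    return [binary[top:bottom] for top, bottom in runs_of_ink(row_profile)]


def segment_symbols(line):
    """Internal segmentation: split one text line into symbols at blank columns."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[left:right] for row in line]
            for left, right in runs_of_ink(col_profile)]


page = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],  # blank row between two text lines
    [0, 1, 1, 0, 1, 1],
]
lines = segment_lines(page)
symbols = [segment_symbols(line) for line in lines]
```

With this toy page, `lines` holds two text lines and each line yields two symbol sub-images.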

Page 13: Ai based character recognition and speech  synthesis

Image Digitization - Matrix matching

• Maps the symbol image into a corresponding two-dimensional binary matrix.
• Issue: deciding the size of the matrix.
• A sampling strategy is needed for mapping the symbol image onto the matrix.

Page 14: Ai based character recognition and speech  synthesis

Digitization

(Figure: input alphabet 'a' → segmented grid → binary matrix of 0s and 1s.)

Page 15: Ai based character recognition and speech  synthesis

• To feed the matrix data to the network, it must be linearized into a single dimension.

(Figure: the binary matrix of 0s and 1s is flattened into one vector, … 0 1 1.)
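The digitization and linearization steps can be sketched together as follows. This is only an illustration: the slides leave the matrix size and sampling strategy open, so the nearest-neighbour sampling at cell centres, the row-by-row flattening order, and the function names are assumptions.

```python
def digitize(symbol, rows, cols):
    """Sample a binary symbol image onto a fixed rows x cols grid.

    Assumption: nearest-neighbour sampling at the centre of each grid cell.
    """
    h, w = len(symbol), len(symbol[0])
    return [
        [symbol[(2 * r + 1) * h // (2 * rows)][(2 * c + 1) * w // (2 * cols)]
         for c in range(cols)]
        for r in range(rows)
    ]


def linearize(matrix):
    """Flatten the matrix row by row into one input vector for the network."""
    return [bit for row in matrix for bit in row]


symbol = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
grid = digitize(symbol, rows=2, cols=3)
vector = linearize(grid)
```

A fixed grid size means every symbol, whatever its original pixel size, produces an input vector of the same length.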

Page 16: Ai based character recognition and speech  synthesis

Recognition pipeline (figure, using the word NAME as the example):

1. Image of scanned document: NAME
2. Sub-images of individual letters from the document: N, A, M, E
3. Binary representation of sub-images, e.g. 0 is white and 1 is black: 001110100…., 111010011…., 11001100…., 000111101…..
4. A supervised neural network that has been trained to recognize images of characters.
5. The neural network outputs numeric values corresponding to the recognized characters: 14, 1, 13, 5
6. A file contains the text of the scanned document: NAME
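In the figure, the outputs 14, 1, 13, 5 match the alphabet positions of N, A, M, E. Assuming that encoding (1 for A through 26 for Z, inferred from the figure rather than stated), the last step can be sketched as:

```python
def decode(indices):
    """Map 1-based alphabet positions to uppercase letters (1 -> A, 26 -> Z)."""
    letters = []
    for i in indices:
        if not 1 <= i <= 26:
            raise ValueError(f"index {i} is outside 1..26")
        letters.append(chr(ord("A") + i - 1))
    return "".join(letters)


text = decode([14, 1, 13, 5])
print(text)  # NAME
```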

Page 18: Ai based character recognition and speech  synthesis

An artificial neural network (ANN) consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, analogous to the biological neurons in the brain. Neurons communicate through weighted links.

(Figure: inputs X1 … Xn reach the neuron over weighted links Wk1 … Wkp; the neuron applies a summation followed by a sigmoid function to produce the output.)
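The neuron in the figure can be sketched as a weighted sum passed through a sigmoid. The bias term below is an assumption (the figure shows only inputs and weights), and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0):
    """Summation of weighted inputs followed by the sigmoid activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# The weighted sum here is 0.5 - 0.5 = 0, so the sigmoid returns 0.5.
y = neuron_output([1, 0, 1], [0.5, -0.3, -0.5])
```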

Page 19: Ai based character recognition and speech  synthesis

• Feed-forward neural network
• A multilayer perceptron
• Teaching and adaptation of the ANN
• Implementation of the ANN

Page 20: Ai based character recognition and speech  synthesis

Neural Network

Input signal → Input layer → First hidden layer → Second hidden layer → Output layer → Output signal
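A feed-forward pass through such a network can be sketched as below. The layer sizes (9 inputs for a 3x3 grid, hidden layers of 6 and 4 neurons, 26 outputs for the letters) and the random untrained weights are hypothetical, chosen only to show how the signal moves from layer to layer.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in, n_out, rng):
    """One fully connected layer: each neuron has n_in weights plus a trailing bias."""
    return [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, signal):
    """Propagate the input signal through every layer in turn."""
    for layer in layers:
        signal = [
            sigmoid(sum(w * x for w, x in zip(neuron, signal)) + neuron[-1])
            for neuron in layer
        ]
    return signal


rng = random.Random(0)
sizes = [9, 6, 4, 26]  # input, first hidden, second hidden, output (assumed sizes)
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward(layers, [0, 1, 0, 1, 1, 1, 0, 1, 0])
```

With untrained weights the 26 output values carry no meaning yet; training is what makes the largest output correspond to the right character.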

Page 22: Ai based character recognition and speech  synthesis

Neural Network

Input signal (binary converted image) → Neural network → Output signal (obtained text of scanned image)

ERROR: back-propagation is used for the error calculation, feeding the output error back through the network.
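To make the training loop concrete, here is a minimal back-propagation sketch for a network with one hidden layer. Everything specific in it is an assumption for illustration: the squared-error loss, the learning rate, the layer sizes, and the two toy 2x2 patterns stand in for real character data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hidden, w_out, x):
    """Return hidden and output activations; each neuron's last weight is its bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(n, x)) + n[-1]) for n in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(n, hidden)) + n[-1]) for n in w_out]
    return hidden, output


def train_step(w_hidden, w_out, x, target, rate):
    """One back-propagation update; returns the squared error before the update."""
    hidden, output = forward(w_hidden, w_out, x)
    # Output deltas: error times the sigmoid derivative o * (1 - o).
    d_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
    # Hidden deltas: output deltas propagated back through the output weights.
    d_hid = [
        h * (1 - h) * sum(d * n[j] for d, n in zip(d_out, w_out))
        for j, h in enumerate(hidden)
    ]
    for n, d in zip(w_out, d_out):
        for j, h in enumerate(hidden):
            n[j] -= rate * d * h
        n[-1] -= rate * d
    for n, d in zip(w_hidden, d_hid):
        for j, v in enumerate(x):
            n[j] -= rate * d * v
        n[-1] -= rate * d
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


# Two toy 2x2 patterns (flattened) with one-hot targets.
samples = [([1, 0, 0, 1], [1, 0]), ([0, 1, 1, 0], [0, 1])]
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
errors = [
    sum(train_step(w_hidden, w_out, x, t, rate=0.5) for x, t in samples)
    for _ in range(2000)
]
```

The `errors` list records the total error per epoch, so comparing its first and last entries shows whether training is reducing the error.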

Page 24: Ai based character recognition and speech  synthesis

Speech Synthesis

Input image → Pre-Processing → Segmentation → Image Digitization → Network Implementation → Training of Learning Network → File containing text of scanned document → TEXT → TTS Engine (NLP → DSP) → SPEECH

Page 25: Ai based character recognition and speech  synthesis

Speech Synthesis

• TTS: Text-to-Speech engine.
• A computer-based system that reads any text aloud.
• A TTS engine consists of:
  • Front-end: NLP
  • Back-end: DSP

Page 26: Ai based character recognition and speech  synthesis

Modules of Text-to-Speech

Input: TEXT → Natural language processing (Text Preprocessing, Text Analysis, Linguistic Analysis) → Phonemes, Prosody → Digital signal processing (Speech Synthesizer) → Output: SPEECH

Figure 1. A simple but general functional diagram of a TTS system

Page 27: Ai based character recognition and speech  synthesis

Speech Synthesis

NLP block of the TTS engine: Text Analysis → Auto Phoneme → Prosody Generation

Page 28: Ai based character recognition and speech  synthesis

NLP Module

• This step is called the high-level, front-end, or text-to-phoneme step.
• It consists of the following parts:
  • Text analysis
  • Automatic phonetization
  • Prosody generation

Page 30: Ai based character recognition and speech  synthesis

NLP Module: Text Analysis

• Pre-processing
• Morphological analysis
• Contextual analysis
• Syntactic-prosodic parsing
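The pre-processing part of text analysis typically turns raw text into plain words that later stages can pronounce. The sketch below is one hedged illustration: the abbreviation table, the digit-by-digit reading of numbers, and the function name are hypothetical, and the other analysis stages are not covered.

```python
# Hypothetical abbreviation table; a real front-end would use a much larger one.
ABBREVIATIONS = {"dr.": "doctor", "prof.": "professor", "no.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Lower-case the text, expand abbreviations, and spell out digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        token = token.strip(".,;:!?\"'()")
        if token.isascii() and token.isdigit():
            # Assumption: numbers are read digit by digit.
            words.extend(DIGITS[int(d)] for d in token)
        elif token:
            words.append(token)
    return words


words = normalize("Prof. Smith reads page 45.")
```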

Page 32: Ai based character recognition and speech  synthesis

NLP Module: Automatic Phonetization

• Rule-based
• Dictionary-based
• Hybrid approach
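A hybrid approach can be sketched as a dictionary lookup with a rule-based fallback for unknown words. The tiny lexicon, the one-letter-at-a-time rules, and the ARPAbet-like phoneme symbols below are toy assumptions; real letter-to-sound rules depend on context.

```python
# Hypothetical pronunciation dictionary with ARPAbet-like symbols.
LEXICON = {
    "name": ["N", "EY", "M"],
    "image": ["IH", "M", "IH", "JH"],
}

# Toy letter-to-sound rules: one letter maps to one or more phonemes.
LETTER_RULES = dict(
    pair.split(":") for pair in
    "a:AE b:B c:K d:D e:EH f:F g:G h:HH i:IH j:JH k:K l:L m:M n:N o:AA p:P "
    "q:K r:R s:S t:T u:AH v:V w:W x:K-S y:Y z:Z".split()
)


def phonetize(word):
    """Return (phonemes, method): dictionary first, letter rules as a fallback."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word]), "dictionary"
    phonemes = []
    for letter in word:
        if letter in LETTER_RULES:
            phonemes.extend(LETTER_RULES[letter].split("-"))
    return phonemes, "rules"


known = phonetize("name")
unknown = phonetize("cat")
```

The dictionary handles irregular words exactly, while the rules keep the system from failing on words it has never seen.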

Page 33: Ai based character recognition and speech  synthesis

Speech Synthesis

DSP block of the TTS engine: Formant synthesis, Concatenative synthesis

Page 34: Ai based character recognition and speech  synthesis

NLP Module: Prosody Generation

• Pitch
• Intonation
• Rhythm

Page 36: Ai based character recognition and speech  synthesis

DSP component

• Low-level step: phoneme to speech.
• There are two main technologies used for generating synthetic speech waveforms:
  • Concatenative synthesis
  • Formant synthesis

Page 38: Ai based character recognition and speech  synthesis

Formant Synthesis

• Formant synthesis is rule-based synthesis.
• It does not use any human speech samples at runtime.
• The waveform is created using an acoustic model of the human vocal tract.
• It generates artificial, somewhat robotic speech.
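As a very small illustration of the idea, the sketch below excites a chain of second-order digital resonators (one per formant) with an impulse train at the pitch frequency. It is a simplified source-filter sketch, not a full synthesizer: the formant frequencies are approximate textbook values for an "a"-like vowel, and the bandwidths, pitch, and sample rate are assumed.

```python
import math


def resonator(signal, freq, bandwidth, rate):
    """Second-order digital resonator that models one formant."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, pitch=120, duration=0.2, rate=16000):
    """Impulse-train source filtered by a cascade of formant resonators."""
    n = round(duration * rate)
    period = rate // pitch
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]  # normalize to the range [-1, 1]


# (frequency Hz, bandwidth Hz) pairs; approximate, for illustration only.
VOWEL_A = [(730, 90), (1090, 110), (2440, 170)]
wave = synthesize_vowel(VOWEL_A)
```

No recorded speech is involved: the whole waveform comes from rules and filter parameters, which is also why the result tends to sound robotic.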

Page 40: Ai based character recognition and speech  synthesis

Concatenative synthesis

• Based on the concatenation of segments of recorded speech.

• Generally gives the most natural-sounding synthesized speech.

Page 41: Ai based character recognition and speech  synthesis

Concatenative Synthesis: Subtypes

• Diphone Concatenation Synthesis: somewhat robotic speech, sonic glitches
• Unit Concatenation Synthesis: natural speech

Page 42: Ai based character recognition and speech  synthesis

Approaches To Wave-form Generation: Concatenative

Unit Concatenation Synthesis: Algorithm

• Break the language down into small units (phonemes, syllables, etc.).
• Create a large database of recorded speech.
• Each unit is labeled: pitch, duration, prosody, position in syllable, etc. Labeling is synthesizer-dependent.
• The target utterance is selected at runtime by determining the best chain of units (HMM, decision tree).
• Use DSP to smooth transitions between units.
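The "best chain of units" step can be sketched as a shortest-path search. The slides mention HMMs and decision trees; the sketch below instead uses plain dynamic programming (a Viterbi-style search) over a hypothetical target cost and join cost based only on pitch, and the unit database is made up for illustration.

```python
def select_units(targets, database, join_weight=1.0):
    """Pick one recorded unit per target, minimizing target cost plus join cost.

    targets: list of (phoneme, desired_pitch)
    database: phoneme -> list of candidate units, each {"id": ..., "pitch": ...}
    """
    # best[i][k] = (cheapest chain cost ending in candidate k of target i, back-pointer)
    best = []
    for i, (phoneme, pitch) in enumerate(targets):
        column = []
        for unit in database[phoneme]:
            target_cost = abs(unit["pitch"] - pitch)
            if i == 0:
                column.append((target_cost, None))
                continue
            prev_units = database[targets[i - 1][0]]
            cost, back = min(
                (best[i - 1][j][0]
                 + join_weight * abs(prev["pitch"] - unit["pitch"]), j)
                for j, prev in enumerate(prev_units)
            )
            column.append((cost + target_cost, back))
        best.append(column)
    # Trace the cheapest chain back from the last target.
    k = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(database[targets[i][0]][k]["id"])
        k = best[i][k][1]
    return chain[::-1]


database = {
    "N": [{"id": "N_1", "pitch": 100}, {"id": "N_2", "pitch": 140}],
    "EY": [{"id": "EY_1", "pitch": 105}, {"id": "EY_2", "pitch": 150}],
    "M": [{"id": "M_1", "pitch": 110}, {"id": "M_2", "pitch": 90}],
}
targets = [("N", 120), ("EY", 120), ("M", 110)]
chain = select_units(targets, database)
print(chain)  # ['N_1', 'EY_1', 'M_1']
```

The join cost favours neighbouring units with similar pitch, which reduces how much DSP smoothing is needed at the boundaries.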

Page 44: Ai based character recognition and speech  synthesis

Advantages

• Machine language translation
• Information retrieval
• Visual issues (difficulty seeing text)
• Motor issues (difficulty handling a book or paper)

Page 45: Ai based character recognition and speech  synthesis

QUESTIONS????