Visual-speech to text conversion applicable to telephone communication for deaf individuals 30TH APRIL 2013

Uploaded by: swathi-venugopal

Posted on 13-Jul-2015



INTRODUCTION

Lip-reading technique: speech can be understood by interpreting the movements of the lips, face, and tongue.

However, the mapping between phonemes and their visible mouth shapes is not one-to-one.

It is therefore impossible to distinguish all phonemes using visual information alone.

The Cued Speech system

Cued Speech was developed by Cornett. It contains two components: the hand shape and the hand position relative to the face.

Hand shapes code consonant phonemes; hand positions code vowel phonemes.

Used together with lip-reading, Cued Speech improves speech perception to a large extent.
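The complementarity of the two channels can be illustrated with a small sketch. It assumes hypothetical phoneme groupings (they are NOT the official Cued Speech hand-shape chart): lip shape alone leaves a set of look-alike consonants, the hand shape selects another set, and because Cued Speech assigns one hand shape to consonants that look different on the lips, intersecting the two sets leaves a single candidate.

```python
# Hypothetical, simplified groupings for illustration only;
# these are NOT the official Cued Speech charts.
LIP_VISEMES = {
    "bilabial": {"p", "b", "m"},   # indistinguishable by lip-reading alone
    "labiodental": {"f", "v"},
}
HAND_SHAPES = {
    1: {"p", "d", "z"},            # each group mixes lip-distinct consonants
    4: {"b", "n", "w"},
    5: {"m", "t", "f"},
}


def decode_consonant(viseme: str, hand_shape: int) -> set:
    """Return the consonants compatible with both the lip shape and the hand shape."""
    return LIP_VISEMES[viseme] & HAND_SHAPES[hand_shape]


print(decode_consonant("bilabial", 1))  # {'p'}
print(decode_consonant("bilabial", 5))  # {'m'}
```

Vowels work the same way with hand positions instead of hand shapes; the sketch omits them for brevity.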

[Figure: the Cued Speech system]

AIM OF NEW SYSTEM

To investigate the design of a system able to automatically recognize Cued Speech and convert it to text.

Such a system would make it possible for deaf or speech-impaired individuals to communicate with each other and with normal-hearing persons, using gestures captured by devices equipped with a camera.


METHODS


Corpus, feature extraction, and statistical modeling

The data were derived from a video recording of the cuers pronouncing and coding in Cued Speech.

The speakers’ lips were painted blue, and landmarks of different colors were placed on the speakers’ fingers, allowing a faster and more accurate image processing stage.

The audio part of the video recording was synchronized with the image.
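A minimal sketch of why colored markers simplify image processing: blue lips and colored finger marks can be segmented by color alone, and the centroid of each marker yields one xy coordinate per frame. It assumes RGB frames stored as nested lists and a simple dominant-channel threshold; the function names, the `margin` value and the thresholding rule are illustrative assumptions, not the method used in the study.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), 0-255
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[bool]]

CHANNEL = {"r": 0, "g": 1, "b": 2}


def color_mask(img: Image, target: str, margin: int = 40) -> Mask:
    """Mark pixels whose target channel exceeds both other channels by `margin`."""
    k = CHANNEL[target]
    return [
        [all(px[k] - px[c] >= margin for c in range(3) if c != k) for px in row]
        for row in img
    ]


def centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """(x, y) centroid of the marked pixels, or None if nothing is marked."""
    pts = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


# Tiny 3x3 frame: a blue "lip" column and one red "finger mark".
W, B, R = (200, 200, 200), (20, 30, 220), (220, 30, 30)
frame = [
    [W, B, W],
    [W, B, R],
    [W, B, W],
]
print(centroid(color_mask(frame, "b")))  # (1.0, 1.0)
print(centroid(color_mask(frame, "r")))  # (2.0, 1.0)
```

Because the markers are chosen to have colors that rarely occur elsewhere in the frame, a per-pixel rule like this can replace more expensive shape-based lip or hand detection.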

An automatic image processing method was applied to the video frames to extract the following lip parameters: lip width (A), lip aperture (B), lip area (S), and pinching of the upper (Bsup) and lower (Binf) lip.
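Once a lip contour has been segmented, A, B and S follow from simple geometry. The sketch below assumes the contour is given as an ordered list of (x, y) vertices in image coordinates; it takes width and aperture as the horizontal and vertical extents and computes the area with the shoelace formula. `lip_parameters` is a hypothetical helper, and Bsup and Binf are not computed because the slides do not define how they are measured.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def lip_parameters(contour: List[Point]) -> Dict[str, float]:
    """Width A, aperture B and area S of a closed lip contour (ordered vertices)."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    width = max(xs) - min(xs)        # A: horizontal extent
    aperture = max(ys) - min(ys)     # B: vertical extent
    # S: polygon area via the shoelace formula (vertex order may be CW or CCW)
    twice_area = 0.0
    for i in range(len(contour)):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % len(contour)]
        twice_area += x1 * y2 - x2 * y1
    return {"A": width, "B": aperture, "S": abs(twice_area) / 2.0}


# Diamond-shaped contour: 20 wide, 10 high, area 20 * 10 / 2 = 100.
print(lip_parameters([(0, 5), (10, 0), (20, 5), (10, 10)]))
# {'A': 20, 'B': 10, 'S': 100.0}
```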


Concatenative feature fusion

The xy coordinates of the landmarks placed on the fingers are tracked and extracted in each time frame, and those values are used as hand features in the HMM modeling.

The method uses the concatenation of the synchronous lip shape and hand features as the joint feature vector, given by

$$O_t^{(LH)} = \left[ O_t^{(L)\top},\; O_t^{(H)\top} \right]^\top \in \mathbb{R}^{D},$$

where $O_t^{(L)}$ is the lip shape feature vector, $O_t^{(H)}$ is the hand feature vector, $O_t^{(LH)}$ is the joint lip-hand feature vector, and $D$ is the dimensionality of the joint feature vector.

[Figure: Parameters used for lip shape modeling]
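Concatenative fusion amounts to stacking the two synchronous streams frame by frame, so the dimensionality of the joint feature vector is the sum of the lip and hand dimensionalities. A minimal sketch, assuming both streams are already sampled at a common frame rate and stored as lists of per-frame vectors; `fuse_concatenative` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def fuse_concatenative(lip: Sequence[Sequence[float]],
                       hand: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate synchronous lip and hand feature vectors frame by frame."""
    if len(lip) != len(hand):
        raise ValueError("lip and hand streams must have the same number of frames")
    return [list(l) + list(h) for l, h in zip(lip, hand)]


# Illustrative values: lip features [A, B, S] and one hand landmark (x, y).
lip = [[10.0, 4.0, 30.0], [11.0, 5.0, 35.0]]
hand = [[120.0, 80.0], [122.0, 79.0]]
joint = fuse_concatenative(lip, hand)
print(joint)          # [[10.0, 4.0, 30.0, 120.0, 80.0], [11.0, 5.0, 35.0, 122.0, 79.0]]
print(len(joint[0]))  # 5 = 3 lip dimensions + 2 hand dimensions
```

The appeal of this scheme is its simplicity: a single HMM is trained on the joint vectors, at the cost of assuming the lip and hand streams are perfectly synchronous.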


RESULTS


Isolated word recognition

1. Recognition in the normal-hearing subject

2. Recognition in the deaf subject


3. Multi-speaker isolated word recognition: this experiment investigated whether it is possible to train speaker-independent HMMs for Cued Speech recognition.

The training data consisted of 750 words from the normal-hearing subject and 750 words from the deaf subject.

For testing, 700 words from the normal-hearing subject and 700 words from the deaf subject were used.

Each HMM state was modeled with a mixture of 4 Gaussian distributions.

For lip shape and hand shape integration, concatenative feature fusion was used.


4. Continuous phoneme recognition

[Figure: Phoneme correct for continuous phoneme recognition in the case of the normal-hearing subject.]

[Figure: Phoneme correct for continuous phoneme recognition in the case of the deaf subject.]
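The recognition experiments rely on HMMs whose states emit a mixture of 4 Gaussian distributions. The sketch below shows, under stated assumptions, how an isolated word could be scored: a diagonal-covariance Gaussian mixture gives each state's emission log-probability, the forward algorithm gives each word model's log-likelihood, and the word with the highest likelihood is returned. The two-state left-to-right topology, the 1-D toy features and all parameter values are hypothetical, and training (e.g., Baum-Welch re-estimation) is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, mean vector, variance vector); diagonal covariance.
Component = Tuple[float, List[float], List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(vals: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in vals))


def log_gmm(x: Sequence[float], mixture: List[Component]) -> float:
    """State emission log-probability under a Gaussian mixture."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixture])


def log_forward(obs, log_pi, log_a, states) -> float:
    """Log-likelihood of an observation sequence under one HMM (forward algorithm)."""
    n = len(states)
    alpha = [log_pi[j] + log_gmm(obs[0], states[j]) for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_a[i][j] for i in range(n)]) + log_gmm(x, states[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def recognize_word(obs, models: Dict[str, tuple]) -> str:
    """Isolated word recognition: return the word whose HMM scores highest."""
    return max(models, key=lambda w: log_forward(obs, *models[w]))


def toy_state(mu: float) -> List[Component]:
    """Hypothetical 1-D state: 4 equally weighted Gaussians spread around mu."""
    return [(0.25, [mu + d], [1.0]) for d in (-1.5, -0.5, 0.5, 1.5)]


# Hypothetical 2-state left-to-right topology shared by both word models.
LOG_PI = [0.0, -math.inf]
LOG_A = [[math.log(0.5), math.log(0.5)], [-math.inf, 0.0]]
models = {
    "yes": (LOG_PI, LOG_A, [toy_state(0.0), toy_state(5.0)]),
    "no": (LOG_PI, LOG_A, [toy_state(10.0), toy_state(15.0)]),
}
print(recognize_word([[0.2], [0.1], [4.8], [5.3]], models))  # yes
```

In a multi-speaker setup such as experiment 3, the same decision rule applies; only the training data pooled into each word model changes.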

CONCLUSION

Hand shapes and lip shapes were integrated using concatenative feature fusion, and HMM-based automatic recognition was conducted.

For continuous phoneme recognition, 86% phoneme correct was achieved for the normal-hearing cuer and 82.7% phoneme correct for the deaf cuer.

Isolated word recognition experiments were also conducted in both the normal-hearing and the deaf subject, obtaining 94.9% and 89% accuracy, respectively.

A multi-speaker experiment using data from both the normal-hearing and the deaf subject showed 89.6% word accuracy on average.

This result indicates that training speaker-independent HMMs for Cued Speech using a large number of subjects should not face particular difficulties.
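The results are reported as phoneme correct and accuracy. A common way to compute them, used for example by HTK's HResults tool, is to align the recognized sequence with the reference and count substitutions (S), deletions (D) and insertions (I): with N reference units, correct = (N - D - S)/N and accuracy = (N - D - S - I)/N. Whether the study used exactly these definitions is an assumption. A minimal sketch with unit edit costs (the function names and the example phoneme strings are hypothetical):

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """(substitutions, deletions, insertions) of a minimum-edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # best[i][j] = (errors, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    best: List[List[Tuple[int, int, int, int]]] = [
        [(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)
    ]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)          # all reference units deleted
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)          # all hypothesis units inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = best[i - 1][j - 1]
            diag = (e, s, d, k) if ref[i - 1] == hyp[j - 1] else (e + 1, s + 1, d, k)
            e, s, d, k = best[i - 1][j]
            up = (e + 1, s, d + 1, k)      # deletion
            e, s, d, k = best[i][j - 1]
            left = (e + 1, s, d, k + 1)    # insertion
            best[i][j] = min(diag, up, left)
    _, s, d, k = best[n][m]
    return s, d, k


def correct_and_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Percent correct (ignores insertions) and percent accuracy (penalizes them)."""
    s, d, i = align_counts(ref, hyp)
    hits = len(ref) - s - d
    return 100.0 * hits / len(ref), 100.0 * (hits - i) / len(ref)


ref = ["b", "o", "n", "j", "u", "r"]
hyp = ["b", "o", "m", "j", "u", "r", "a"]
print(align_counts(ref, hyp))                                  # (1, 0, 1)
print([round(v, 1) for v in correct_and_accuracy(ref, hyp)])   # [83.3, 66.7]
```

Because insertions are subtracted only in the accuracy figure, accuracy can never exceed percent correct. HTK weights the error types differently during alignment, so in tie cases the split between S, D and I may differ from this unit-cost version.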


REFERENCES


G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, “Recent advances in the automatic recognition of audiovisual speech,” Proceedings of the IEEE, vol. 91, no. 9, pp. 1306–1326, 2003.

S. Nakamura, K. Kumatani, and S. Tamura, “Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition,” in Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI’02), p. 305, 2002.

R. O. Cornett, “Cued speech,” American Annals of the Deaf, vol. 112, pp. 3–13, 1967.

J. Leybaert, “Phonology acquired through the eyes and spelling in deaf children,” Journal of Experimental Child Psychology, vol. 75, pp. 291–318, 2000.

Thank you!

ANY QUESTIONS?