Project 9: Automatic Fingersign to Speech Translator

Final Presentation

Lale Akarun

Oya Aran

Alexey Karpov

Milos Zeleny

Hasim Sak

Erinc Dikici

Alp Kindiroglu

Marek Hruz

Pavel Campr

Daniel Schorno

Alexander Ronzhin

Zdenek Krnoul

Finger spelling <-> Speech (F2S & S2F)
◦ Translation between Russian, English, Czech, Turkish

Multilingual fingersign alphabet database
◦ Turkish alphabet (5 subjects)
◦ Czech alphabet (4 subjects)
◦ Russian alphabet (2 subjects)
◦ Numbers and special stop signs

Semi-Automatic annotation module:
◦ 11 videos, each 15-30 minutes

Filter Images

Select Keyframes

Crop Sign-Space

Segment Hand

Locations

Skin color based hand detection
◦ Initialization of model by movement of hands
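The idea of a skin color model initialized from moving hands can be sketched as follows. This is a minimal illustration, not the project's implementation: the normalized r-g chromaticity space, the bin count, the threshold and all function names are assumptions chosen for the example.

```python
from collections import Counter

BINS = 16  # histogram resolution per chromaticity axis (assumed value)

def chroma_bin(pixel):
    """Map an (R, G, B) pixel to a bin in normalized r-g chromaticity space."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (0, 0)
    rn, gn = r / total, g / total
    return (min(int(rn * BINS), BINS - 1), min(int(gn * BINS), BINS - 1))

def train_skin_model(sample_pixels):
    """Build a normalized color histogram from pixels sampled on moving hands."""
    counts = Counter(chroma_bin(p) for p in sample_pixels)
    n = sum(counts.values())
    return {k: v / n for k, v in counts.items()}

def skin_mask(image, model, threshold=0.05):
    """Return a binary mask: 1 where the pixel's bin is frequent in the model."""
    return [[1 if model.get(chroma_bin(p), 0.0) >= threshold else 0 for p in row]
            for row in image]

# Usage: pixels collected from moving regions train the model,
# which is then applied to every pixel of a new frame.
samples = [(200, 140, 120)] * 10
model = train_skin_model(samples)
frame = [[(200, 140, 120), (20, 30, 200)]]
mask = skin_mask(frame, model)
```

Working in chromaticity rather than raw RGB makes the model less sensitive to brightness changes, which is one common reason for this choice.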

Pipeline: Video Input (Turkish or Czech) -> Skin Color Detection -> Tracking and Segmentation of hands -> Keyframe Selection -> Feature Extraction & Classification -> Text Output (UTF-8)

Tracking of the hands by Camshift
◦ Hierarchical hand and face redetection
◦ Hand segmentation: Backprojection, Double Differencing
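Double differencing can be illustrated with a small sketch on grayscale frames stored as nested lists. A pixel is marked as moving in the current frame only if it differs from both the previous and the next frame. The threshold value and the function name are assumptions for this example.

```python
def double_difference(prev, curr, nxt, threshold=20):
    """Binary motion mask for `curr` from three consecutive grayscale frames.

    A pixel is set when |curr - prev| and |nxt - curr| both exceed the threshold,
    which suppresses the "ghost" left behind by a simple two-frame difference.
    """
    rows, cols = len(curr), len(curr[0])
    mask = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            d1 = abs(curr[y][x] - prev[y][x])
            d2 = abs(nxt[y][x] - curr[y][x])
            mask[y][x] = 1 if d1 > threshold and d2 > threshold else 0
    return mask

# Usage: a bright blob moves one pixel to the right per frame.
prev = [[0, 0, 0]]
curr = [[0, 100, 0]]
nxt = [[0, 0, 100]]
motion = double_difference(prev, curr, nxt)
```

Only the blob's position in the current frame survives in the mask. The position it moves to in the next frame is rejected because it did not change between the previous and the current frame.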





Two tier classification:
◦ Keyframe Selection
◦ Gesture Recognition

Detection of Keyframes:
◦ Motion of Hands: displacement of tracked hand centers, changes in hand external contour
◦ Image Blur: strength of gradient trace around hand contours
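The motion cue for keyframe detection can be sketched as below. The example covers only the displacement of tracked hand centers (the contour and blur cues are left out), and the threshold, the "one keyframe per low-motion run" rule and the function names are assumptions made for illustration.

```python
import math

def hand_displacements(centers):
    """Euclidean displacement of the tracked hand center between consecutive frames."""
    return [math.dist(a, b) for a, b in zip(centers, centers[1:])]

def select_keyframes(centers, max_motion=2.0):
    """Pick one frame per low-motion run: the frame with the least displacement."""
    # Frame 0 has no predecessor, so it is treated as "moving".
    motion = [float("inf")] + hand_displacements(centers)
    keyframes, run = [], []
    for i, m in enumerate(motion):
        if m <= max_motion:
            run.append(i)
        elif run:
            keyframes.append(min(run, key=lambda j: motion[j]))
            run = []
    if run:  # close a run that lasts until the final frame
        keyframes.append(min(run, key=lambda j: motion[j]))
    return keyframes

# Usage: the hand pauses around frames 2-3 and again at frame 5.
centers = [(0, 0), (10, 0), (11, 0), (11, 0), (30, 0), (30, 1)]
keys = select_keyframes(centers)
```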





Hand gesture Descriptors:
◦ Radial Distance Functions
◦ Elliptic Fourier Descriptors
◦ Local Binary Patterns
◦ Hu Moments

Classification of each feature is done by KNN.
◦ Classified results for each feature are fused by voting.
◦ Optional word level fusion with Levenshtein Distance.
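The classification and fusion steps can be sketched as follows: a k-nearest-neighbor vote per descriptor, a majority vote across descriptors, and a Levenshtein-based lookup of the closest vocabulary word. The Euclidean metric, the value of k, the vocabulary and all function names are assumptions for this example, not details taken from the project.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Majority label among the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fuse_by_voting(predictions):
    """Majority vote over the per-descriptor predictions for one keyframe."""
    return Counter(predictions).most_common(1)[0][0]

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def closest_word(spelled, vocabulary):
    """Word-level fusion: replace the recognized letter string by the nearest word."""
    return min(vocabulary, key=lambda w: levenshtein(spelled, w))

# Usage with toy 2-D features and a hypothetical vocabulary.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
letter = knn_predict(train, (0.5, 0.5))
fused = fuse_by_voting(["a", "b", "a", "c"])
word = closest_word("amsterdan", ["madrid", "amsterdam", "athens"])
```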





Continuous speech recognition:
◦ A weighted finite-state transducer based speech decoder
◦ 3-gram language model
◦ 100K vocabulary size
◦ News portal based
◦ 10843 tri-phone HMM states
◦ 11 Gaussians for acoustic model
◦ 188 hours broadcast news speech data
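As a reminder of what a 3-gram language model estimates, here is a minimal sketch that computes unsmoothed maximum-likelihood trigram probabilities from a toy corpus. A real recognizer would use smoothing and back-off; the sentence markers and function names below are assumptions for the example.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word contexts, padding with sentence markers."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(2, len(tokens)):
            tri[tuple(tokens[i - 2:i + 1])] += 1
            bi[tuple(tokens[i - 2:i])] += 1
    return tri, bi

def trigram_prob(tri, bi, w1, w2, w3):
    """P(w3 | w1 w2) as a relative frequency; 0.0 for an unseen context."""
    context = bi[(w1, w2)]
    return tri[(w1, w2, w3)] / context if context else 0.0

# Usage on a two-sentence toy corpus.
tri, bi = train_trigrams(["the cat sat", "the cat ran"])
p = trigram_prob(tri, bi, "the", "cat", "sat")
```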

Voice Activity Detection (VAD)
◦ Preprocessing step on continuous ASR
◦ Identifies false voice triggers
◦ Employed Methods:
   Rabiner’s Method: Energy level and zero-crossing rates of the acoustic waveform
   Supervised learning: Energy level of the signal modeled using GMMs
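The two quantities used by the energy and zero-crossing approach can be computed per frame as sketched below. This is a simplified per-frame decision, not the full endpoint search of the original method; the thresholds are passed in as parameters and, like the function names, are assumptions for this example.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_speech(frame, energy_threshold, zcr_threshold):
    """Flag a frame as speech when either its energy or its ZCR exceeds a threshold.

    High energy typically indicates voiced speech; a high zero-crossing rate
    can indicate unvoiced sounds such as fricatives.
    """
    return (frame_energy(frame) > energy_threshold
            or zero_crossing_rate(frame) > zcr_threshold)

# Usage: a silent frame versus a square-wave-like "voiced" frame.
silence = [0.0] * 160
voiced = [0.5, 0.5, -0.5, -0.5] * 40
flags = [is_speech(f, energy_threshold=0.01, zcr_threshold=0.6) for f in (silence, voiced)]
```

In practice the thresholds would be estimated from a stretch of known background noise rather than fixed by hand.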

Isolated speech recognition:
◦ Phoneme based speech recognition
◦ Represented by HMMs using GMMs
◦ Used for out-of-vocabulary words
◦ Speech Commands allow module control

Python Based Web Service

◦ Handles Input/Output from multiple modules

◦ Users communicate using sessions

◦ All messages in utf-8 encoding or transcribed form

◦ Translation of sentences handled by Google Translate

◦ Message types: Letter, Word, Sentence
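A message exchanged through such a service could look like the sketch below: a session identifier, one of the three message types, and UTF-8 encoded text. The field names, the JSON wire format and the language codes are hypothetical; the slides do not specify the actual protocol.

```python
import json
from dataclasses import dataclass, asdict

MESSAGE_TYPES = {"letter", "word", "sentence"}

@dataclass
class Message:
    session_id: str  # hypothetical field: identifies the user session
    msg_type: str    # one of MESSAGE_TYPES
    language: str    # hypothetical field: language code of the text
    text: str

    def __post_init__(self):
        if self.msg_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.msg_type!r}")

def encode(message):
    """Serialize a message to UTF-8 bytes, keeping non-ASCII letters readable."""
    return json.dumps(asdict(message), ensure_ascii=False).encode("utf-8")

def decode(payload):
    """Parse UTF-8 bytes back into a Message."""
    return Message(**json.loads(payload.decode("utf-8")))

# Usage: a Turkish word survives the round trip unchanged.
original = Message("session-1", "word", "tr", "İstanbul")
restored = decode(encode(original))
```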

Computer speech synthesis given an arbitrary input text

Two TTS systems are applied:
◦ MARY TTS developed by DFKI (Germany)
◦ TTS engine developed by UIIP (Belarus) and SPIIRAS (Russia).

Web-based service
◦ Polls for messages from the web-server.

Visual Fingersign output provided through a 3D avatar

Available for two languages:
◦ Czech Sign Alphabet
◦ American Sign Alphabet

Module composed of:
◦ 3D animation model: 38 joints and segments (16 for hand)
◦ Trajectory generator: rotations of body parts handled with Inverse Kinematics

Head and lip motion provided by talking head system

Inputs and outputs words.
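Inverse kinematics, as used for the body part rotations, can be illustrated with the classic planar two-link arm: given a target position, solve for the two joint angles. This is a textbook simplification under assumed link lengths, not the avatar's 3D solver, and the function names are chosen for the example.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Joint angles (shoulder, elbow) in radians that place the arm tip at (x, y)."""
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle from the target distance.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here to check the inverse solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y

# Usage: solve for a reachable target and verify by forward kinematics.
shoulder, elbow = two_link_ik(1.0, 1.0, 1.0, 1.0)
tip = forward(shoulder, elbow, 1.0, 1.0)
```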

City names game
◦ Module Design:
   Visual Input (Turkish) -> Finger Spelling Recognition -> Server (Translator)
   Audio Letter Input (Russian) -> Isolated Speech Recognition -> Server (Translator)
   Server (Translator) -> Finger Spelling Synthesis -> Visual Output (Czech)
   Server (Translator) -> Speech Synthesis -> Audio Output (English)
◦ Fingerspell-> Amsterdam Speech-> Madrid
◦ Fingerspell-> Doha Speech-> Alta
◦ Fingerspell-> Athens Speech-> Sukre
◦ Fingerspell-> Eton Speech-> Nairobi
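Judging from the example turns of the city names game (Amsterdam, Madrid, Doha, Alta, Athens, Sukre, Eton, Nairobi), each reply starts with the last letter of the previous city. That rule is inferred from the examples rather than stated in the slides. Under that assumption, a turn can be validated as follows; the function names are hypothetical.

```python
def is_valid_reply(previous, reply):
    """True when `reply` starts with the last letter of `previous` (case-insensitive)."""
    if not previous or not reply:
        return False
    return previous[-1].casefold() == reply[0].casefold()

def is_valid_chain(cities):
    """Check every consecutive pair of turns in a game transcript."""
    return all(is_valid_reply(a, b) for a, b in zip(cities, cities[1:]))

# Usage: the turns listed in the demo form a valid chain.
demo = ["Amsterdam", "Madrid", "Doha", "Alta", "Athens", "Sukre", "Eton", "Nairobi"]
ok = is_valid_chain(demo)
```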


Casual Continuous Conversation
◦ Audio Sentence Input (Turkish) -> Isolated Speech Recognition -> Server (Translator)
◦ Server (Translator) -> Finger Spelling Synthesis -> Visual Output (Czech)
◦ Server (Translator) -> Speech Synthesis -> Audio Output (English)

Automated language detection for fingerspelling

Further testing

Increasing overall system speed

Addition of missing languages to underlying modules
