Project 9: Automatic Fingersign to Speech Translator
Final Presentation
Lale Akarun
Oya Aran
Alexey Karpov
Milos Zeleny
Hasim Sak
Erinc Dikici
Alp Kindiroglu
Marek Hruz
Pavel Campr
Daniel Schorno
Alexander Ronzhin
Zdenek Krnoul
Finger spelling <-> Speech (F2S & S2F)
◦ Translation between Russian, English, Czech, Turkish
Multilingual fingersign alphabet database
◦ Turkish alphabet (5 subjects)
◦ Czech alphabet (4 subjects)
◦ Russian alphabet (2 subjects)
◦ Numbers and special stop signs
Semi-automatic annotation module:
◦ 11 videos, each 15-30 minutes
Filter Images -> Select Keyframes -> Crop Sign-Space -> Segment Hand Locations
Skin color based hand detection
◦ Initialization of model by movement of hands
Video Input (Turkish or Czech) -> Skin Color Detection -> Tracking and Segmentation of hands -> Keyframe Selection -> Feature Extraction & Classification -> Text Output (UTF-8)
Tracking of the hands by Camshift
◦ Hierarchical hand and face redetection
◦ Hand segmentation
   Backprojection
   Double differencing
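As an illustration of the double differencing step named above, the sketch below marks a pixel as moving only when it differs from both the previous and the next frame. It is a minimal pure-Python sketch on toy grayscale grids; the function name `double_difference` and the threshold value are assumptions made for this example, not details taken from the project.

```python
def double_difference(prev, curr, nxt, threshold=20):
    """Return a binary motion mask for the middle frame `curr`.

    A pixel is set to 1 only if its intensity differs from BOTH the previous
    and the next frame by more than `threshold` (hypothetical default).
    Frames are lists of rows of grayscale values with identical shapes.
    """
    height, width = len(curr), len(curr[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            diff_back = abs(curr[y][x] - prev[y][x])
            diff_forward = abs(nxt[y][x] - curr[y][x])
            if diff_back > threshold and diff_forward > threshold:
                mask[y][x] = 1
    return mask


# Toy 1x5 frames: a bright blob moves one pixel to the right per frame.
prev = [[200, 0, 0, 0, 0]]
curr = [[0, 200, 0, 0, 0]]
nxt = [[0, 0, 200, 0, 0]]
mask = double_difference(prev, curr, nxt)
print(mask)
```

Requiring agreement between the two differences keeps only the object's position in the middle frame and suppresses the "ghost" left at its previous position, which a single frame difference would also mark.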
Two tier classification:
◦ Keyframe Selection
◦ Gesture Recognition
Detection of keyframes:
◦ Motion of hands
   Displacement of tracked hand centers
   Changes in hand external contour
◦ Image blur
   Strength of gradient trace around hand contours
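The keyframe criteria above (little hand motion, little blur) can be sketched as a simple per-frame test. This is a hedged illustration only: the function name `select_keyframes`, the threshold values, and the idea of a single precomputed sharpness score per frame are assumptions, and the contour-change cue is left out for brevity.

```python
import math


def select_keyframes(centers, sharpness, max_move=2.0, min_sharpness=0.5):
    """Return indices of frames where the hand is nearly still and sharp.

    centers:   list of (x, y) tracked hand centers, one per frame
    sharpness: list of gradient-strength scores around the hand contour,
               one per frame (higher means less motion blur)
    Both thresholds are hypothetical and would need tuning on real video.
    """
    keyframes = []
    for i in range(1, len(centers)):
        dx = centers[i][0] - centers[i - 1][0]
        dy = centers[i][1] - centers[i - 1][1]
        displacement = math.hypot(dx, dy)
        if displacement <= max_move and sharpness[i] >= min_sharpness:
            keyframes.append(i)
    return keyframes


centers = [(0, 0), (10, 0), (11, 0), (11, 1), (30, 5)]
sharpness = [0.9, 0.2, 0.8, 0.4, 0.9]
keyframes = select_keyframes(centers, sharpness)
print(keyframes)
```

In this toy input only frame 2 qualifies: frame 3 is still but blurred, and frames 1 and 4 follow large jumps of the hand center.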
Hand gesture descriptors:
◦ Radial Distance Functions
◦ Elliptic Fourier Descriptors
◦ Local Binary Patterns
◦ Hu Moments
Classification of each feature is done by KNN.
◦ Classified results for each feature are fused by voting.
◦ Optional word level fusion with Levenshtein Distance.
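The classification chain above (KNN per descriptor, fusion by voting, optional word-level correction with Levenshtein distance) can be sketched in a few functions. Everything specific here is an assumption for illustration: Euclidean distance, `k=3`, plain majority voting, the toy feature vectors, and the small city vocabulary.

```python
from collections import Counter


def knn_classify(query, examples, k=3):
    """Label `query` by majority vote among its k nearest examples.

    examples: list of (feature_vector, label); squared Euclidean distance.
    """
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(query, example[0]))

    nearest = sorted(examples, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fuse_by_voting(labels):
    """Return the label predicted by most descriptors."""
    return Counter(labels).most_common(1)[0][0]


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def closest_word(word, vocabulary):
    """Snap a recognized letter sequence to the nearest vocabulary word."""
    return min(vocabulary, key=lambda w: levenshtein(word, w))


# One KNN decision on toy 2-D features.
examples = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((0.0, 0.2), "A"),
            ((1.0, 1.0), "B"), ((0.9, 1.0), "B")]
letter = knn_classify((0.05, 0.05), examples)

# Per-keyframe labels from four descriptors; the third letter is misread.
votes = [["D", "D", "O", "D"], ["O", "O", "O", "Q"],
         ["N", "N", "H", "N"], ["A", "A", "A", "A"]]
fused = "".join(fuse_by_voting(v) for v in votes)
corrected = closest_word(fused, ["DOHA", "ALTA", "ATHENS"])
print(letter, fused, corrected)
```

Voting alone yields "DONA" here; the word-level step recovers "DOHA" because it is one substitution away, while the other vocabulary entries are further.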
Continuous speech recognition:
◦ A weighted finite-state transducer based speech decoder
◦ 3-gram language model
◦ 100K vocabulary size
   News portal based
◦ 10843 tri-phone HMM states
◦ 11 Gaussians for acoustic model
◦ 188 hours broadcast news speech data
Voice Activity Detection (VAD)
◦ Preprocessing step for continuous ASR
◦ Identifies false voice triggers
◦ Employed methods:
   Rabiner's method: energy level and zero-crossing rates of the acoustic waveform
   Supervised learning: energy level of the signal modeled using GMMs
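The two cues used by the energy and zero-crossing approach can be sketched as follows. This is a simplified illustration, not the full endpoint-detection procedure: the frame length, all three thresholds, and the decision rule in `detect_voice` are assumptions chosen for the toy signal.

```python
def frame_features(samples, frame_len):
    """Return (energy, zero_crossing_rate) for non-overlapping frames."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features


def detect_voice(samples, frame_len=4, high_energy=0.1,
                 low_energy=0.01, min_zcr=0.5):
    """Flag each frame as voice (True) or not (False).

    A frame counts as voice if its energy is high, or if its energy is
    moderate and its zero-crossing rate is high (a rough proxy for
    low-energy unvoiced sounds). All thresholds are hypothetical.
    """
    decisions = []
    for energy, zcr in frame_features(samples, frame_len):
        is_voice = energy >= high_energy or (
            energy >= low_energy and zcr >= min_zcr
        )
        decisions.append(is_voice)
    return decisions


samples = (
    [0.0, 0.0, 0.0, 0.0]          # silence
    + [0.5, -0.5, 0.5, -0.5]      # loud, oscillating
    + [0.01, 0.01, 0.01, 0.01]    # faint constant offset
    + [0.2, -0.2, 0.2, -0.2]      # quiet but oscillating
)
decisions = detect_voice(samples)
print(decisions)
```

Real signals would use frames of roughly 10 to 30 ms and thresholds estimated from a stretch of background noise rather than fixed constants.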
Isolated speech recognition:
◦ Phoneme based speech recognition
◦ Represented by HMMs using GMMs
◦ Used for out-of-vocabulary words
◦ Speech commands allow module control
Python based web service
◦ Handles input/output from multiple modules
◦ Users communicate using sessions
◦ All messages in UTF-8 encoding or transcribed form
◦ Translation of sentences handled by Google Translate
◦ Message types: letter, word, sentence
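A message passed between modules through the web service could look like the sketch below. The slides only state that messages belong to sessions, are UTF-8 encoded, and come in three types; the JSON layout, the field names, and the `Message` class itself are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

MESSAGE_TYPES = {"letter", "word", "sentence"}


@dataclass
class Message:
    """Hypothetical message exchanged between a module and the server."""
    session_id: str
    msg_type: str   # one of MESSAGE_TYPES
    language: str   # e.g. "tr", "cs", "ru", "en" (assumed codes)
    text: str

    def __post_init__(self):
        if self.msg_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.msg_type!r}")

    def to_bytes(self):
        # ensure_ascii=False keeps non-ASCII letters as real UTF-8 bytes.
        return json.dumps(asdict(self), ensure_ascii=False).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        return cls(**json.loads(data.decode("utf-8")))


msg = Message("session-1", "word", "tr", "İstanbul")
wire = msg.to_bytes()
restored = Message.from_bytes(wire)
print(wire)
print(restored == msg)
```

Keeping the message type explicit lets the server decide, for example, whether a payload should go to sentence translation or be forwarded letter by letter.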
Computer speech synthesis given an arbitrary input text
Two TTS systems are applied:
◦ MARY TTS developed by DFKI (Germany)
◦ TTS engine developed by UIIP (Belarus) and SPIIRAS (Russia)
Web-based service
◦ Polls for messages from the web server
Visual fingersign output provided through a 3D avatar
Available for two languages:
◦ Czech Sign Alphabet
◦ American Sign Alphabet
Module composed of:
◦ 3D animation model
   38 joints and segments (16 for hand)
◦ Trajectory generator
   Rotations of body parts handled with Inverse Kinematics
   Head and lip motion provided by talking head system
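To illustrate what inverse kinematics computes, the sketch below solves the textbook case of a planar chain with two segments (for example upper arm and forearm): given a target position for the end of the chain, it returns the two joint angles. This is only a simplified stand-in; the avatar's 38-joint model and its actual solver are not described in the slides, and the function names and segment lengths are assumptions.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Return (shoulder, elbow) angles in radians reaching the point (x, y).

    l1, l2 are the segment lengths. Raises ValueError if the target is
    out of reach. Returns the solution with a non-negative elbow angle.
    """
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow


def forward(shoulder, elbow, l1, l2):
    """Forward kinematics: end position for the given joint angles."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


shoulder, elbow = two_link_ik(1.0, 1.0, 1.0, 1.0)
reached = forward(shoulder, elbow, 1.0, 1.0)
print(shoulder, elbow, reached)
```

Running the forward kinematics on the returned angles is a convenient check that the solution actually reaches the requested target.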
Inputs and outputs words.
City names game
◦ Fingerspell -> Amsterdam, Speech -> Madrid
◦ Fingerspell -> Doha, Speech -> Alta
◦ Fingerspell -> Athens, Speech -> Sukre
◦ Fingerspell -> Eton, Speech -> Nairobi
◦ Module design:
   Visual Input (Turkish) -> Finger Spelling Recognition -> Server (Translator)
   Audio Letter Input (Russian) -> Isolated Speech Recognition -> Server (Translator)
   Server (Translator) -> Finger Spelling Synthesis -> Visual Output (Czech)
   Server (Translator) -> Speech Synthesis -> Audio Output (English)
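The city names listed for the game suggest a word-chain rule: each city starts with the last letter of the previous one, with the fingerspelling and speech players alternating. The rule is inferred from the example turns rather than stated in the slides, and the function names below are hypothetical.

```python
def follows_rule(previous_city, next_city):
    """True if `next_city` starts with the last letter of `previous_city`."""
    return previous_city[-1].lower() == next_city[0].lower()


def is_valid_chain(cities):
    """True if every consecutive pair of cities follows the chain rule."""
    return all(follows_rule(a, b) for a, b in zip(cities, cities[1:]))


# Turns as listed in the slides: (input modality, city).
turns = [
    ("fingerspell", "Amsterdam"), ("speech", "Madrid"),
    ("fingerspell", "Doha"), ("speech", "Alta"),
    ("fingerspell", "Athens"), ("speech", "Sukre"),
    ("fingerspell", "Eton"), ("speech", "Nairobi"),
]
cities = [city for _, city in turns]
chain_ok = is_valid_chain(cities)
print(chain_ok)
```

A check of this kind could sit in the server, which sees the recognized word from both the fingerspelling and the speech side before passing it on.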
Casual Continuous Conversation
   Audio Sentence Input (Turkish) -> Isolated Speech Recognition -> Server (Translator)
   Server (Translator) -> Finger Spelling Synthesis -> Visual Output (Czech)
   Server (Translator) -> Speech Synthesis -> Audio Output (English)
Automated language detection for fingerspelling
Further testing
Increasing overall system speed
Addition of missing languages to underlying modules