SIFT-BASED ARABIC SIGN LANGUAGE
RECOGNITION (ArSL) SYSTEM
By
Alaa Tharwat1,3
And
Tarek Gaber2,3
1 Faculty of Engineering, Suez Canal University, Ismailia, Egypt
2 Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt
3 Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net
AECIA 2014 – November 17-19, Addis Ababa, Ethiopia
Agenda
Introduction
Proposed Method
  General framework
  Feature extraction
  Classification
Experimental Results
Conclusions
Introduction: Why ArSL?
• Helps vocally disabled people to express themselves freely.
• Provides an easy way of communicating with non-mute people.
• ArSL is the natural language of the deaf, just as spoken language is for vocal people.
Introduction: Aim of the Work
Design a sign language recognition approach that transcribes sign gestures into meaningful text or speech, so that communication between the deaf and the hearing community can easily be made.
(Example gesture labels: الشارع "the street", السيارة "the car")
What is ArSL?
Translating ArSL into spoken language, i.e., translating hand gestures into Arabic characters.
Sign language hand formations:
• Hand shape
• Hand location
• Hand movement
• Hand orientation
Introduction: Types of ArSL
1- Vision-based Approach
Requires a camera setup, and needs some preprocessing and computation to extract the features.
Pipeline: Collect gestures → Extract features → Classification → Decision
Introduction: Types of ArSL (Continued)
2- Electronic Glove-based Approach
The electronic gloves consist of 22 sensors and are lightweight and flexible.
Gloves are inconvenient to wear, but make signal extraction easy.
Introduction
Proposed Method
General framework
Feature extraction
Classification
Experimental Results
Conclusions
Agenda
10AECIA 2014 –November17-19, Addis Ababa, Ethiopia
Proposed Method: General Framework
Training images and testing images are passed through the SIFT feature extraction method:
Difference of Gaussian Pyramid → Keypoint Detection → Unreliable Keypoint Elimination → Orientation Assignment → Descriptor Computation
The resulting feature vectors are projected with LDA and then matched.
Proposed Method: General Framework
Training phase
• Collect all training images (i.e., gestures of Arabic Sign Language).
• Extract the features using SIFT, representing each image by one feature vector.
• Apply a dimensionality reduction technique (e.g., LDA) to reduce the number of features in the vector.
Testing phase
• Collect the testing image and extract its features.
• Project the feature vector onto the LDA space.
• Apply machine learning techniques to classify the test feature vector and decide which character the gesture represents.
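The training and testing phases above can be sketched end to end. This is a minimal illustration, not the authors' implementation: random vectors stand in for the SIFT-derived feature vectors, and scikit-learn's LDA and a linear SVM (library choices assumed here, not stated in the slides) play the roles of the dimensionality-reduction and matching stages.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for SIFT feature vectors: 30 character classes,
# 5 training + 2 testing images per class, 128-dimensional descriptors.
n_classes, n_train, n_test, dim = 30, 5, 2, 128
centers = rng.normal(size=(n_classes, dim))
X_train = np.repeat(centers, n_train, axis=0) + 0.3 * rng.normal(size=(n_classes * n_train, dim))
y_train = np.repeat(np.arange(n_classes), n_train)
X_test = np.repeat(centers, n_test, axis=0) + 0.3 * rng.normal(size=(n_classes * n_test, dim))
y_test = np.repeat(np.arange(n_classes), n_test)

# Training phase: project feature vectors onto the LDA subspace
# (at most n_classes - 1 dimensions).
lda = LinearDiscriminantAnalysis(n_components=n_classes - 1)
Z_train = lda.fit_transform(X_train, y_train)

# Testing phase: project test vectors onto the same LDA space, then classify.
Z_test = lda.transform(X_test)
clf = SVC(kernel="linear").fit(Z_train, y_train)
accuracy = (clf.predict(Z_test) == y_test).mean()
print(f"accuracy: {accuracy:.3f}")
```

On well-separated synthetic data the accuracy is near 100%; the real figures depend on the actual SIFT features and dataset.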
Proposed Method: Feature Extraction
Feature Extraction: SIFT (Scale-Invariant Feature Transform)
The SIFT feature extraction algorithm consists of the following steps:
• Creating the Difference of Gaussian pyramid (scale-space peak selection)
• Extrema detection
• Unreliable keypoint elimination
• Orientation assignment
• Descriptor computation
[Figure: keypoints (extrema) extracted from one image (gesture) using the SIFT algorithm]
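The first step, the Difference-of-Gaussian pyramid, can be sketched directly. This is a simplified single-octave version for illustration (the sigma schedule and SciPy's `gaussian_filter` are assumptions, not taken from the slides); a full SIFT implementation adds downsampled octaves and scale-space extrema search.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, num_scales=4, sigma0=1.6, k=2 ** 0.5):
    """Build one octave of a Difference-of-Gaussian pyramid.

    Blurs the image at geometrically increasing sigmas and subtracts
    adjacent blurred copies; extrema of the DoG layers are the SIFT
    keypoint candidates.
    """
    blurred = [gaussian_filter(image.astype(float), sigma0 * k ** i)
               for i in range(num_scales + 1)]
    return [blurred[i + 1] - blurred[i] for i in range(num_scales)]

# A 200x200 gray-level image, matching the dataset size used later.
img = np.random.default_rng(1).random((200, 200))
dog = dog_pyramid(img)
print(len(dog), dog[0].shape)  # 4 DoG layers, each 200x200
```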
Proposed Method: Feature Extraction
Feature Extraction: SIFT (Scale-Invariant Feature Transform)
[Figure: matching between two gestures based on SIFT features]
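Matching two gestures by their SIFT descriptors typically means nearest-neighbour search with a ratio test. A minimal sketch with random descriptors standing in for real SIFT output; the ratio threshold of 0.8 follows Lowe's original heuristic and is an assumption here, not a value from the slides.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match SIFT-style descriptors between two gestures.

    For each descriptor in A, find its two nearest neighbours in B and
    accept the match only if the closest is clearly better than the
    second closest (Lowe's ratio test).
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, j2 = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[j2]:
            matches.append((i, j))
    return matches

rng = np.random.default_rng(2)
desc_b = rng.random((40, 128))                        # descriptors from gesture B
desc_a = desc_b[:10] + 0.01 * rng.random((10, 128))   # noisy copies in gesture A
print(len(match_descriptors(desc_a, desc_b)))  # 10
```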
Proposed Method: Feature Extraction
Feature Extraction: SIFT (Scale-Invariant Feature Transform)
The number of features extracted by SIFT depends on its parameters, which have been considered in our experiments:
• Peak threshold (PeakThr)
• Patch size (Psize)
• Number of angles (Nangels) and number of bins (Nbins)
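One way these parameters interact is through the descriptor length: in standard SIFT the patch is divided into a grid of subregions, each summarized by an orientation histogram. A tiny illustrative helper (the grid/bin interpretation of Psize and Nbins is an assumption about the slides' parameterization):

```python
def sift_descriptor_length(n_subregions=4, n_bins=8):
    """Length of a SIFT descriptor: an n x n grid of subregions,
    each described by an orientation histogram with n_bins bins."""
    return n_subregions * n_subregions * n_bins

print(sift_descriptor_length())      # standard SIFT: 4 * 4 * 8 = 128
print(sift_descriptor_length(4, 4))  # halving the bins halves the vector
```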
Proposed Method: Classification Techniques
We have used the following classifiers to assess their performance with our approach:
• SVM: one of the classifiers that deals well with high-dimensional datasets and gives very good results.
• k-NN: unknown patterns are classified based on their similarity to known samples.
• Nearest Neighbor (minimum distance): its idea is extremely simple, as it does not require a learning phase.
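The three classifiers can be compared side by side on toy data. A minimal sketch with scikit-learn (an assumed library choice): synthetic vectors stand in for the LDA-projected gesture features, with the dataset shape (30 classes, 7 samples each) mirroring the dataset described in the experiments.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Toy stand-in for LDA-projected gesture features: 30 classes, 7 samples each.
rng = np.random.default_rng(3)
centers = rng.normal(size=(30, 29))
X = np.repeat(centers, 7, axis=0) + 0.2 * rng.normal(size=(210, 29))
y = np.repeat(np.arange(30), 7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

classifiers = {
    "NN (k=1)": KNeighborsClassifier(n_neighbors=1),   # minimum-distance rule
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="linear"),
}
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```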
Experimental Results: Dataset
We have used 210 gray-level images of size 200×200.
These images represent 30 Arabic characters (7 images for each character).
The images were collected under different illumination, rotation, quality levels, and image partiality (occlusion).
[Figure: a sample of the collected ArSL gestures representing different characters]
Experimental Scenarios
We have designed four experimental scenarios:
1. To select the most suitable SIFT parameters.
2. To understand the effect of changing the number of training images.
3. To prove that our proposed method is robust against rotation.
4. To prove that our proposed method is robust against occlusion.
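The rotation and occlusion perturbations used in scenarios 3 and 4 can be simulated on a test image. A minimal sketch assuming SciPy for the rotation; the angles and occlusion fractions mirror the result tables below, but the masking scheme (zeroing bottom rows or right-hand columns) is an illustrative assumption, not the authors' exact procedure.

```python
import numpy as np
from scipy.ndimage import rotate

def perturb(image, angle=0, occlusion=0.0, direction="horizontal"):
    """Apply the robustness-test perturbations: rotate by `angle` degrees,
    then black out an `occlusion` fraction of the image, either the bottom
    rows ("horizontal" occlusion) or the right columns ("vertical")."""
    out = rotate(image, angle, reshape=False, mode="nearest")
    h, w = out.shape
    if direction == "horizontal":
        out[int(h * (1 - occlusion)):, :] = 0
    else:
        out[:, int(w * (1 - occlusion)):] = 0
    return out

img = np.ones((200, 200))
occluded = perturb(img, angle=45, occlusion=0.4)
print(occluded.shape)  # (200, 200): rotated in place, bottom 40% of rows zeroed
```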
Experimental Results
Experimental Results – 1st Scenario: Selecting SIFT parameters
Accuracy results (in %) of our approach based on different SIFT parameters:

Classifier |  PeakThr           |  Psize                        |  Nangels
           |  0     0.1   0.2   |  4x4   8x8   16x16  32x32     |  2     4     8
NN         |  100   97.7  94.2  |  94.2  99.2  100    93.2      |  94.2  98.9  100
k-NN       |  100   98.9  96.3  |  96.3  99.2  100    93.6      |  96.3  98.9  100
SVM        |  100   99.2  98.9  |  97.7  100   100    94.2      |  96.3  98.9  100
Experimental Results
Experimental Results – 2nd Scenario: Different numbers of training images
Accuracy results (in %) of our approach using different numbers of training images:

Classifier  |  No. of Training Images
            |  1      3      5
Min. Dist.  |  98.9   99.2   100
k-NN (k=5)  |  98.9   98.9   100
SVM         |  98.9   99     100
Experimental Results
Experimental Results – 3rd Scenario: Rotated images
Accuracy (in %) of our approach when rotated images are used:

F.E.M.  Matching    |  Angle of rotation (°)
                    |  0     45    90    135   180   225   270   315
SIFT    Min. Dist.  |  100   98.9  97.8  96.7  100   97.8  100   98.9
        k-NN (k=5)  |  100   100   100   96.7  100   98.9  100   100
        SVM         |  100   100   98.9  98.9  100   98.9  100   100
Experimental Results
Experimental Results – 4th Scenario: Occluded images
Accuracy (in %) of our ArSL recognition approach based on image occlusion:

F.E.M.  Matching          |  Horizontal occlusion  |  Vertical occlusion
                          |  20%   40%   60%       |  20%   40%   60%
SIFT    Nearest Neighbor  |  98.9  93.3  34.4      |  98.9  95.6  32.2
        k-NN (k=5)        |  97.8  95.6  38.9      |  97.8  96.7  53.3
        SVM               |  98.9  95.6  52.2      |  98.9  96.7  45.6
Experimental Results
A comparison between the proposed system and previous systems (accuracy in %):

Author                 |  Accuracy (%)
K. Assaleh et al. [1]  |  93.5
Al-Jarrah et al. [6]   |  94.4
Al-Jarrah et al. [9]   |  97.5
Mohandes et al. [12]   |  87
Our proposed approach  |  99
Conclusions
Our proposed approach for ArSL recognition:
• Achieves excellent accuracy in identifying ArSL from 2D images.
• Is robust against images rotated at different angles and against images occluded horizontally or vertically.
• Outperforms many previous ArSL approaches.
The performance of this approach was measured by:
• Using captured images with a Matlab implementation.
• Comparison with related work.
Future Work
• Improving the results in the case of image occlusion.
• Increasing the size of the dataset to check the scalability of the approach.
• Identifying characters from video frames and then implementing a real-time ArSL system.
Thanks
Acknowledgement to the respected co-authors:
Abul Ella Hassenian3,4, M. K. Shahin1, Basma Refaat1
1 Faculty of Engineering, Suez Canal University, Ismailia, Egypt
2 Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt
3 Faculty of Computers and Information, Cairo University, Egypt
4 Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net