© 2017, IJARCSMS All Rights Reserved 137 | P a g e
ISSN: 2321-7782 (Online) e-ISJN: A4372-3114 Impact Factor: 6.047
Volume 5, Issue 7, July 2017
International Journal of Advance Research in Computer Science and Management Studies
Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com
Isolated Word Recognition and Sign Language Detection using LPC and MFCC

Rubal¹
Computer Science and Engineering
Maharaja Agrasen University
Baddi – India

Vineet Mehan²
Computer Science and Engineering
Maharaja Agrasen University
Baddi – India
Abstract: Speech technology and systems in human-computer interaction have witnessed steady and important advancement over the last two decades. Today, speech technologies are commercially available for a broad and interesting range of tasks. These technologies permit machines to respond correctly and consistently to human voices and provide useful and valuable services. The aim of this paper is to investigate algorithms for speech recognition on isolated words and then link the voice data with Indian sign language gestures. The authors programmed the speech recognition algorithms and their comparative study in MATLAB, using a combination of multiple algorithms to achieve a high rate of speech detection accuracy. Two major feature extraction algorithms are used in this work: one based on Linear Predictive Coding (LPC), the other on Mel Frequency Cepstral Coefficients (MFCC). The classification techniques used are the Gaussian Mixture Model (GMM) and k-nearest neighbor (KNN). Two conclusions can be drawn from the results: first, MFCC performs poorly on short recordings; second, LPC performs better on short recordings.
Keywords: Automatic Speech Recognition (ASR), Gaussian Mixture Model (GMM), Linear Predictive Coding (LPC), Mel-Frequency Cepstral Coefficients (MFCC), k-Nearest Neighbor (KNN).
I. INTRODUCTION
Speech is the most natural way of communication and provides an efficient means of man-machine interaction. Generally, transfer of information between human and machine is accomplished via keyboard, mouse, etc., but humans can speak more quickly than they can type. Speech input offers high-bandwidth information and relative ease of use [14]. Speech recognition can be defined as the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words; it is done by means of an algorithm implemented as a computer program. Speech processing is one of the exciting areas of signal processing. In this paper, we present a speech recognition system for isolated words. Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC) are used to train and recognize the speech, and GMM is used to classify the features from the speech utterances. The core of all speech recognition systems consists of a set of statistical models representing the various sounds of the language to be recognized. Speech has temporal structure and can be encoded as a sequence of spectral vectors spanning the audio frequency range; LPC and MFCC are both feature extraction techniques [11].
Apart from the introduction in Section I, the paper is organized as follows. Section II presents the speech recognition system architecture, Section III introduces speech feature extraction, Sections IV and V describe the MFCC and LPC processors, Section VI details the implementation strategy, and Sections VII and VIII present the results and conclusions.
II. SPEECH RECOGNITION SYSTEM ARCHITECTURE
The speech recognition system architecture is shown in fig. 1. It consists of two modules, training and testing module.
Training module generates the system model which is used during testing.
A. Preprocessing:
A speech signal is an analog waveform which cannot be directly processed by digital systems, so preprocessing is necessary to transform the input speech into a form that the recognizer can process. To accomplish this, the speech input is digitized, i.e. sampled. The sampled speech signal is then passed through a first-order filter to spectrally flatten it. This process, known as pre-emphasis, increases the magnitude of the higher frequencies with respect to the magnitude of the lower frequencies. The next step is to divide the speech signal into frames, with frame sizes ranging from 10 to 25 milliseconds and an overlap of 50%-70% between consecutive frames [15].
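The pre-emphasis and framing steps above can be sketched in a few lines. The paper's implementation is in MATLAB, so the following NumPy version is only an illustrative equivalent; the filter coefficient (0.97), the 8 kHz sampling rate and the 20 ms / 50%-overlap framing are typical assumed values, not taken from the paper.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

# 1 second of audio at 8 kHz; 20 ms frames (160 samples) with 50% overlap (hop 80)
x = np.random.randn(8000)
frames = frame_signal(pre_emphasize(x), frame_len=160, hop=80)
print(frames.shape)  # (99, 160)
```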
B. Feature Extraction:
The goal of feature extraction is to find a set of properties of an utterance that have acoustic correlations to the speech signal, that is, parameters that can be computed or estimated from the signal waveform. Such parameters are termed features. The feature extraction process is expected to discard information irrelevant to the task (e.g. silence, noise) while keeping the useful information. It includes measuring some important characteristics of the signal such as energy or frequency response (signal measurement), augmenting these measurements with perceptually meaningful derived measurements (signal parameterization), and statistically conditioning these numbers to form observation vectors [16].
Fig. 1 Speech Recognition System Architecture
C. Model Generation:
The model is generated using approaches such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), Dynamic Bayesian Networks (DBN), Support Vector Machines (SVM), or hybrid methods (combinations of two or more approaches), as in this paper, which combines LPC, MFCC, GMM and KNN.
D. Speech Transcription:
The speech transcription component recognizes the test samples based on the acoustic properties of the word.
III. SPEECH FEATURE EXTRACTION
The purpose of this stage is to convert the speech waveform to a set of features (at a considerably lower information rate) for further analysis [18]. This is often referred to as the signal-processing front end. The speech signal is a slowly time-varying signal (it is called quasi-stationary). An example of a speech signal is shown in Fig. 2. When examined over a sufficiently short period of time (between 5 and 100 msec), its characteristics are fairly stationary. However, over longer periods of time (on the order of 1/5 second or more) the signal characteristics change to reflect the different speech sounds being spoken. Therefore, short-time spectral analysis is the most common way to characterize the speech signal.
Fig. 2 Example of speech signal
A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most popular, and is described in this paper.
IV. MEL-FREQUENCY CEPSTRUM COEFFICIENTS PROCESSOR
A block diagram of the structure of an MFCC processor is given in Fig. 3. The speech input is typically recorded at a sampling rate above 10000 Hz. This sampling frequency is chosen to minimize the effects of aliasing in the analog-to-digital conversion. Such sampled signals can capture all frequencies up to 5 kHz, which cover most of the energy of sounds generated by humans. As discussed previously, the main purpose of the MFCC processor is to mimic the behavior of the human ear [19]. In addition, MFCCs are shown to be less susceptible to the aforementioned variations than the speech waveforms themselves.
Fig. 3 Block diagram of the MFCC processor
Frame blocking
In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples. The second frame begins M samples after the first frame and overlaps it by N - M samples, and so on. This process continues until all the speech is accounted for within one or more frames. Typical values are N = 256 (equivalent to roughly 30 msec of windowing, and facilitating the fast radix-2 FFT) and M = 100 [14].
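With these typical values, frame l simply starts at sample l·M; a quick sketch (the 1000-sample signal length is an illustrative assumption):

```python
# Frame l covers samples [l*M, l*M + N); consecutive frames overlap by N - M samples.
N, M = 256, 100
signal_len = 1000  # illustrative signal length
starts = [s for s in range(0, signal_len - N + 1, M)]
print(starts)   # [0, 100, 200, 300, 400, 500, 600, 700]
print(N - M)    # 156 samples of overlap between adjacent frames
```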
Windowing
The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. The concept here is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame. If we define the window as w(n), 0 ≤ n ≤ N - 1, where N is the number of samples in each frame, then the result of windowing is the signal
y_l(n) = x_l(n) w(n),  0 ≤ n ≤ N - 1
Typically the Hamming window is used, which has the form:
w(n) = 0.54 - 0.46 cos(2πn / (N - 1)),  0 ≤ n ≤ N - 1
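As a quick check on the formula, the same taper can be generated directly; NumPy's built-in np.hamming uses the identical 0.54 - 0.46 cos(2πn/(N-1)) definition:

```python
import numpy as np

N = 256
n = np.arange(N)
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))  # Hamming window w(n)

frame = np.ones(N)        # dummy frame x_l(n)
windowed = frame * w      # y_l(n) = x_l(n) w(n)

# The taper pulls the frame edges down to w(0) = 0.54 - 0.46 = 0.08
print(round(float(windowed[0]), 2))  # 0.08
```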
Fast Fourier Transform (FFT)
The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain. The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT), which is defined on the set of N samples {x_n} as follows:
X_k = Σ_{n=0}^{N-1} x_n e^{-j2πkn/N},  k = 0, 1, 2, ..., N - 1
In general the X_k are complex numbers, and we only consider their absolute values (frequency magnitudes). The resulting sequence {X_k} is interpreted as follows: positive frequencies 0 ≤ f < F_s/2 correspond to values 0 ≤ n ≤ N/2, while negative frequencies -F_s/2 < f < 0 correspond to N/2 + 1 ≤ n ≤ N - 1. Here, F_s denotes the sampling frequency.
The result after this step is often referred to as the spectrum or periodogram.
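A short sketch of this step, with an assumed 8 kHz sampling rate and a synthetic 1 kHz tone so the peak magnitude bin can be checked against the bin-to-frequency mapping k·F_s/N:

```python
import numpy as np

fs = 8000                              # sampling frequency (illustrative)
N = 256
t = np.arange(N) / fs
frame = np.sin(2 * np.pi * 1000 * t)   # a 1 kHz test tone (exactly 32 cycles per frame)

X = np.fft.fft(frame, N)               # complex DFT values X_k
mag = np.abs(X)                        # keep only the frequency magnitudes

# Bin k corresponds to frequency k * fs / N; the peak should sit at 1 kHz
k_peak = int(np.argmax(mag[: N // 2]))
print(k_peak * fs / N)  # 1000.0
```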
Mel-frequency Wrapping
As mentioned above, psychophysical studies have shown that human perception of the frequency content of sounds for speech signals does not follow a linear scale. Thus for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the 'mel' scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz.
Fig. 4 An example of mel-spaced filterbank
One approach to simulating the subjective spectrum is to use a filter bank, spaced uniformly on the mel scale (see Fig. 4). Each filter has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval. The number of mel spectrum coefficients, K, is typically chosen as 20. Note that this filter bank is applied in the frequency domain, so it simply amounts to applying the triangle-shaped windows of Fig. 4 to the spectrum. A useful way of thinking about this mel-wrapping filter bank is to view each filter as a histogram bin (where bins have overlap) in the frequency domain.
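A sketch of such a mel-spaced triangular filter bank. The conversion formula used below, mel(f) = 2595·log10(1 + f/700), is one common choice consistent with the linear-below-1-kHz / logarithmic-above behavior described in the text; K = 20 filters, a 256-point FFT and an 8 kHz sampling rate are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # A common mel formula: approximately linear below 1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(K=20, n_fft=256, fs=8000):
    """K triangular filters spaced uniformly on the mel scale, applied to the
    first n_fft//2 + 1 magnitude-spectrum bins."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), K + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((K, n_fft // 2 + 1))
    for k in range(1, K + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        for i in range(left, center):       # rising edge of triangle k
            fb[k - 1, i] = (i - left) / max(center - left, 1)
        for i in range(center, right):      # falling edge of triangle k
            fb[k - 1, i] = (right - i) / max(right - center, 1)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (20, 129)
```

Multiplying this (K × bins) matrix by a magnitude spectrum yields the K mel spectrum coefficients.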
Cepstrum
In this final step, we convert the log mel spectrum back to time. The result is called the mel frequency cepstrum coefficients (MFCC). The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis. Because the mel spectrum coefficients (and so their logarithms) are real numbers, we can convert them to the time domain using the Discrete Cosine Transform (DCT). Therefore, if we denote the mel power spectrum coefficients that result from the last step as S_k, k = 1, 2, ..., K, we can calculate the MFCCs, c_n, as
c_n = Σ_{k=1}^{K} (log S_k) cos[n (k - 1/2) π / K],  n = 0, 1, ..., K - 1
Note that we exclude the first component, c_0, from the DCT since it represents the mean value of the input signal, which carries little speaker-specific information.
V. LINEAR PREDICTIVE CODING (LPC)
LPC is a powerful, robust, accurate, reliable and popular tool for speech recognition, compression and synthesis. The main objective of LPC is frame-based analysis of the input speech signal to generate observation vectors [21]. It is a very simple method and belongs to the family of spectral analysis techniques.
The LPC technique provides an estimate of the poles of the vocal tract transfer function. In LPC, each sample is approximated as a linear combination of past samples. To implement LPC and generate the features, the input speech signal is first passed through a pre-emphasizer; the output of the pre-emphasizer is then the input to frame blocking, where the signal is blocked into frames of N samples.
The next step after frame blocking is windowing, where each frame is windowed to reduce signal discontinuity at the beginning and end of every frame; the Hamming window is a typical choice. After windowing, each windowed frame is autocorrelated, the highest autocorrelation value gives the order of the LPC analysis, and finally the LPC coefficients are derived [26].
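A sketch of this autocorrelation route to the LPC coefficients, using the Levinson-Durbin recursion. The AR(1) test signal at the bottom is a hypothetical example chosen so the recovered coefficient can be sanity-checked against the known value:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a = [1, a_1, ..., a_p] from the autocorrelation
    method, solved with the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                    # remaining prediction error
    return a

# Synthetic AR(1) signal x[n] = 0.9 x[n-1] + e[n]; order-1 LPC should
# recover a_1 close to -0.9
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + e[n]
a = lpc(x, 1)
print(a[0])  # 1.0
```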
Fig. 5 Block Diagram of LPC implementation
VI. IMPLEMENTATION STRATEGIES
This section implements the theory studied so far regarding LPC, GMM, KNN and MFCC. The following algorithm summarizes the implementation strategy [27]. The steps of the proposed algorithm are as follows:
1. Start
2. Load the speech dataset
3. Apply the MFCC algorithm to extract speech features from the training dataset
4. Apply GMM for classification of the MFCC features
5. Extract MFCC features for the test dataset
6. Classify the MFCC test data using the Mahalanobis distance
7. Similarly, apply the LPC algorithm to extract speech features
8. Apply GMM for classification of the LPC features
9. Apply KNN with the Mahalanobis distance for LPC
10. Apply LPC to the test data
11. Display output
12. Stop
As described in the algorithm above, the authors load the speech dataset. This dataset comprises voice samples of individuals who spoke six words, namely A, B, C, Five, Point and V. Each word was spoken 25 times, i.e. each individual gave 150 voice samples. In this work we have voice data of three persons, so the total number of voice samples used is 450. Each person gave voice samples in continuous mode, so each sample is of its own type; because of this variation in the dataset, the system can recognize speech at a high accuracy rate. The individuals who provided data samples are named Rubal, F and Neeraj. The voice samples of the first person were recorded in a real, i.e. noisy, environment, whereas the later two persons' data were recorded in a silent, controlled environment [29].
For the training section we fed in JPEG images of Indian sign language hand gestures; using these datasets the system is trained. These gestures were picked from the internet using the Google search engine. For the testing section, the authors stored images of Indian sign language gestures made with their own hands, to maintain a clear difference between training data and testing data.
The authors then apply MFCC over the dataset and training set to extract the features of the voice samples. GMM is then used for classification of the MFCC features, after which the Mahalanobis distance between MFCC vectors is calculated. The same process is used for the testing dataset. In the next step the authors compute the LPC features, apply GMM over the LPC features for classification, and use the Mahalanobis distance for the distance calculation. The training and testing datasets both undergo the same procedure.
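A minimal sketch of the Mahalanobis-distance matching step. The two 2-D "classes" below are hypothetical stand-ins for per-word feature models; in the paper the vectors would be MFCC or LPC features, with means and covariances coming from the GMM step.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance between a feature vector and a class model."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify(x, models):
    """Nearest class by Mahalanobis distance over (mean, inverse covariance) models."""
    return min(models, key=lambda name: mahalanobis(x, *models[name]))

# Two toy 2-D classes standing in for per-word feature models
rng = np.random.default_rng(1)
a = rng.normal([0, 0], 1.0, size=(200, 2))
b = rng.normal([5, 5], 1.0, size=(200, 2))
models = {
    "A": (a.mean(axis=0), np.linalg.inv(np.cov(a.T))),
    "B": (b.mean(axis=0), np.linalg.inv(np.cov(b.T))),
}
print(classify(np.array([4.8, 5.1]), models))  # B
```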
The algorithm explained above can be understood more easily from a flow chart. Fig. 6 shows the flow chart for the implementation of the algorithm designed above.
An automatic speech recognition system enables a machine to hear, comprehend and respond to a speech signal provided by the speaker as input. The methodology of the speech recognition system can be understood from the flowchart in Fig. 6. The input given to the system is a speech signal, which is first divided into smaller components known as frames; this is the analysis part. Feature extraction is then performed and useful features are extracted from the voice signal.
After feature extraction the feature vectors are divided into two categories, training and testing [30]. In training, speech modeling is performed; in testing, pattern matching is done. During the recognition phase the test speech data is matched against the training models and the result is given according to the best match. Thus, the speech recognition system can be divided mainly into four phases:
1. Analysis 2. Feature Extraction 3. Modeling 4. Testing or Matching
Fig. 6 Flow chart for implementation strategy
VII. RESULTS
This section describes the recognition accuracy achieved by the proposed methodology. Table 1 shows the comparative study of the two feature extraction algorithms, the MFCC algorithm and the LPC algorithm. To find the recognition accuracy (R), the following formula is used:
R = (No. of times the word is recognized / Total no. of trials) * 100
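For instance, Rubal's MFCC row of Table 1 works out as:

```python
# Recognition accuracy R = (no. of times recognized / total no. of trials) * 100.
# Rubal's MFCC hits per word (A, B, C, Five, Point, V) from Table 1, 5 trials each.
recognized = [1, 1, 1, 4, 0, 1]
trials = 5 * len(recognized)        # 30 trials in total
R = sum(recognized) / trials * 100
print(round(R, 2))  # 26.67
```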
TABLE 1 Accuracy Rate Determined by the System
VIII. CONCLUSION
Automatic speech recognition has been under scrutiny for many years, but completely accurate and efficient systems have still not been created. In this paper we have studied the speech recognition system in depth, along with feature extraction techniques such as LPC and MFCC. It was observed that each technique has its own merits and demerits. The systems have many limitations: they are affected by background noise and give less efficient results, they cannot identify speech from multiple users due to speech overlap, and they also have problems detecting the accent and pronunciation of the speakers. It can also be concluded from the study that the majority of work in speech recognition has been done for the English language, and comparatively less work has been carried out for other languages such as Arabic and Indian languages. It can
User Name   Trials per word   Recognized by MFCC (A, B, C, Five, Point, V)   Recognized by LPC (A, B, C, Five, Point, V)   Accuracy Rate of MFCC, R (%)   Accuracy Rate of LPC, R (%)
Rubal       5                 1, 1, 1, 4, 0, 1                               5, 5, 5, 5, 5, 5                              26.67                          100
F           5                 0, 0, 1, 0, 1, 1                               5, 5, 5, 5, 5, 5                              10                             100
Neeraj      5                 5, 5, 5, 5, 5, 5                               5, 4, 4, 4, 5, 5                              100                            90
also be observed from the study that recognition for the English language is the most accurate, and hence its rate is higher compared to others. Instead of implementing a single feature extraction technique for ASR, there is a need to develop combinations of two or more techniques, i.e. hybrid techniques, which will make the system more reliable and robust and provide more accurate results. This paper includes a comparison between two feature extraction techniques along with a detailed study of each, and its main aim is to give researchers working in this area an understanding of the differences between these commonly used feature extraction techniques. We managed to get an accuracy of around 90% using LPC and around 40% for MFCC using GMM classification. Comparing LPC with MFCC, the performance of LPC is better.
ACKNOWLEDGEMENT
The author would first like to express her profound gratitude to her parents and friends for providing her with unfailing support and continuous encouragement throughout her years of study and through the process of researching and writing this paper. This accomplishment would not have been possible without them.
References
1. Deng, Li; Douglas O'Shaughnessy, “Speech processing: a dynamic and optimization-oriented approach”, Marcel Dekker. pp. 41–48. ISBN 0-8247-4040-8,
2003.
2. Tai, Hwan-Ching; Chung, Dai-Ting, "Stradivari Violins Exhibit Formant Frequencies Resembling Vowels Produced by Females", Savart Journal. 1 (2),
June 14, 2012.
3. Min Xu; et al., "HMM-based audio keyword generation", In Kiyoharu Aizawa; Yuichi Nakamura; Shin'ichi Satoh. Advances in Multimedia Information
Processing – PCM 2004: 5th Pacific Rim Conference on Multimedia (PDF), Springer. ISBN 3-540-23985-5, 2004.
4. Sahidullah, Md.; Saha, Goutam, "Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker
recognition Speech Communication", 54 (4): 543–565. doi:10.1016/j.specom.2011.11.004, May 2012.
5. Fang Zheng, Guoliang Zhang and Zhanjiang Song, "Comparison of Different Implementations of MFCC," J. Computer Science & Technology, 16(6):
582–589, 2001.
6. S. Furui, "Speaker-independent isolated word recognition based on emphasized spectral dynamics", 1986.
7. European Telecommunications Standards Institute, “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Front-
end feature extraction algorithm; Compression algorithms”. Technical standard ES 201 108, v1.1.3, 2003.
8. T. Ganchev, N. Fakotakis, and G. Kokkinakis, "Comparative evaluation of various MFCC implementations on the speaker verification task," in 10th
International Conference on Speech and Computer (SPECOM 2005), Vol. 1, pp. 191–194, 2005.
9. Meinard Müller, “Information Retrieval for Music and Motion”, Springer p. 65, ISBN 978-3-540-74047-6, 2007.
10. V. Tyagi and C. Wellekens, “On desensitizing the Mel-Cepstrum to spurious spectral components for Robust Speech Recognition, in Acoustics, Speech,
and Signal Processing”, 2005 Proceedings (ICASSP05). IEEE International Conference on, vol. 1, pp. 529–532, 2005.
11. Mermelstein, "Distance measures for speech recognition, psychological and instrumental," in Pattern Recognition and Artificial Intelligence, C. H. Chen,
Ed., pp. 374–388. Academic, New York, 1976.
12. S.B. Davis, and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," in
IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), pp. 357–366, 1980.
13. J. S. Bridle and M. D. Brown, "An Experimental Automatic Word-Recognition System", JSRU Report No. 1003, Joint Speech Research Unit, Ruislip,
England, 1974.
14. Nelson Morgan; Hervé Bourlard & Hynek Hermansky, "Automatic Speech Recognition: An Auditory Perspective", In Steven Greenberg & William A.
Ainsworth. Speech Processing in the Auditory System. Springer. p. 315. ISBN 978-0-387-00590-4, 2004.
15. L. C. W. Pols, "Spectral Analysis and Identification of Dutch Vowels in Monosyllabic Words," Doctoral dissertion, Free University, Amsterdam, The
Netherlands, 1966.
16. R. Plomp, L. C. W. Pols, and J. P. van de Geer, "Dimensional analysis of vowel spectra." J. Acoustical Society of America, 41(3):707–712, 1967.
17. Yuhuan Zhou, Jinming Wang, Xiongwei Zhang, “Research on Adaptive Speaker Identification Based on GMM”, 2009 International Forum on Computer
Science-Technology and Applications, IEEE, 2009.
18. Hiroshi Matsumoto and Masanora Moroto, “Evaluation of mel-lpc cepstrum in a large vocabulary continuous speech recognition”, IEEE published, 2001.
19. Kartiki Gupta, Divya Gupta, "An analysis on LPC, RASTA and MFCC techniques in Automatic Speech Recognition System", IEEE published, 2016.
20. Jose B. Trangol Curipe and Abel Herrera Camacho, “Feature Extraction Using LPC-Residual and Mel Frequency Cepstral Coefficients in Forensic
Speaker Recognition”, International Journal of Computer and Electrical Engineering, Vol. 5, No. 1, February 2013.
21. Abhishek Dixit, Abhinav Vidwans and Pankaj Sharma, "Improved MFCC and LPC Algorithm for Bundelkhandi Isolated Digit Speech Recognition", International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), 2016.
22. Anuradha P Nair, Shoba Krishnan and Zia Saquib, “MFCC Based Noise Reduction in ASR Using Kalman Filtering”, Conference on Advances in Signal
Processing (CASP), IEEE published, 2016.
23. M. K. Brown and L. R. Rabiner, “On the Use of Energy in LPC-Based Recognition of Isolated Words”, The bell system technical journal Vol. 61, No. 10,
December 1982.
24. Saptarshi Boruah and Subhash Basishtha, “A Study on HMM based Speech Recognition System”, IEEE International Conference on Computational
Intelligence and Computing Research, 2013.
25. Kuldeep Kumar and R. K. Aggarwal, “Hindi speech recognition system using HTK”, International Journal of Computing and Business Research ISSN
(Online): 2229-6166 Volume 2 Issue 2 May 2011.
26. Bishnu Prasad Das and Ranjan Parekh, “Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network
Classifiers”, International Journal of Modern Engineering Research (IJMER) Vol.2, Issue.3, May-June 2012 pp-854-858 ISSN: 2249-6645.
27. Helping website, “https://sites.google.com/site/autosignlan”, Automatic Sign Language Detection.
28. Helping website, "http://www.ifp.uiuc.edu/~minhdo/teaching/speaker_recognition", DSP Mini-Project: An Automatic Speaker Recognition System.
29. Helping website, “https://www.wikipedia.com/speechrecognition”, Speech Recognition and its history.
30. Suma Swamy and K.V Ramakrishnan, “An efficient Speech Recognition System”, Computer Science & Engineering: An International Journal (CSEIJ),
Vol. 3, No. 4, August 2013.
AUTHOR(S) PROFILE
Rubal received the B.Tech degree in Computer Science and Technology from Uttar Pradesh Technical University in 2014. She is pursuing the M.Tech in Computer Science and Technology at Maharaja Agrasen University, Baddi, Himachal Pradesh, India.
Vineet Mehan received the B.Tech degree in Information Technology from Kurukshetra University in 2003. He received the M.E. degree in Computer Science and Engineering from NITTTR, Panjab University in 2008. He completed the Ph.D. degree in Computer Science and Engineering at Dr. B.R. Ambedkar National Institute of Technology, Jalandhar in 2016. His research interests include Image Processing, Watermarking and Cryptography.