Report: Voice Recognition
7/31/2019 Report voice recognigation
ABSTRACT
Our aim is to provide the computer with a natural interface, including the ability to understand human speech. To this end, we propose a way to control a computer system, specifically Windows 8, with voice commands. First, the user speaks a command into the microphone; the software of the proposed system then takes over to recognize the command. If recognition succeeds, that is, if the utterance matches one of the defined voice commands, the system performs the operation the speaker requested. The proposed system uses the Microsoft Speech SDK for the voice recognition process and Voice-XML to define the voice grammar in the software part. It is flexible enough to work with the speech of any user.
Keywords-- Dynamic Programming Algorithm, Hidden Markov Model, Microphone, Microsoft Speech SDK, Phonemes, Speech recognition, Voice-XML, Windows 8.
1. INTRODUCTION:
In recent years, the quality and performance of speech-based human-machine interaction have improved steadily. The next generation of speech-based interface technology will enable easy-to-use automation of new and existing communication services, making human-machine interaction more natural. For disabled people, the absence of databases and the diversity of articulatory handicaps are major obstacles to building reliable speech recognition systems, which explains why so few speech recognition systems for disabled people are on the market. If a person finds it difficult or impossible to use the mouse and keyboard, or if the keyboard or mouse is faulty, there have to be other ways to operate the system. Speech can be one of them. There is a growing demand for systems capable of operating the Operating System using only the voice commands given by a person.
This paper presents a way to control the OS using voice commands. It can also prove useful to surgeons who need to retrieve a patient's previous records from the computer's database while operating. It is also applicable to consumer electronics, including games, mobile phones, vehicle navigation and speech-based ticket reservation. As Windows 8 is about to be released, we are building speech control for Windows 8.
Speech recognition (in many contexts also known as automatic speech
recognition, computer speech recognition or erroneously as voice recognition) is
the process of converting a speech signal to a sequence of words, by means of an
algorithm implemented as a computer program.
Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), domotic appliance control and content-based spoken audio search (e.g., finding a podcast where particular words were spoken).
Voice recognition or speaker recognition is a related process that attempts to
identify the person speaking, as opposed to what is being said.
PARAMETERS        RANGE
Speaking mode     Isolated words to continuous speech
Speaking style    Read speech to spontaneous speech
Enrollment        Speaker-dependent to speaker-independent
Vocabulary        Small (< 20 words) to large (> 20,000 words)
Language model    Finite-state to context-sensitive
Perplexity        Small (< 10) to large (> 100)
SNR               High (> 30 dB) to low (< 10 dB)
Transducer        Voice-cancelling microphone to telephone
Table 1.1: Typical parameters used to characterize the capability of speech recognition systems
Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in American English. At word boundaries, contextual variations can be quite dramatic, making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian.
Second, acoustic variabilities can result from changes in the environment as well
as in the position and characteristics of the transducer. Third, within-speaker
variabilities can result from changes in the speaker's physical and emotional state,
speaking rate, or voice quality. Finally, differences in sociolinguistic background,
dialect, and vocal tract size and shape can contribute to cross-speaker
variabilities.
Figure 1.1 shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10 to 20 ms. These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.
Figure 1.1: Components of a typical speech recognition system.
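A minimal sketch of the fixed-rate front end described above, splitting a digitized signal into overlapping frames so that one feature vector can be computed per frame. The 16 kHz sample rate, 25 ms window and 10 ms hop are illustrative assumptions, not settings taken from this system.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a list of samples into overlapping fixed-length frames.

    Assumed defaults: 16 kHz audio, 25 ms frames and a 10 ms hop,
    i.e. one frame (and later one feature vector) every 10 ms.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = []
    # Slide the window along the signal; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000          # 1 s of silence at 16 kHz
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```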
Speech recognition systems attempt to model the sources of variability described
above in several ways. At the level of signal representation, researchers have
developed representations that emphasize perceptually important speaker-
independent features of the signal, and de-emphasize speaker-dependent
characteristics. At the acoustic phonetic level, speaker variability is typically
modeled using statistical techniques applied to large amounts of data. Speaker
adaptation algorithms have also been developed that adapt speaker-independent
acoustic models to those of the current speaker during system use. Effects of
linguistic context at the acoustic phonetic level are typically handled by training
separate models for phonemes in different contexts; this is called context-dependent
acoustic modeling.
Word level variability can be handled by allowing alternate pronunciations of
words in representations known as pronunciation networks. Common alternate
pronunciations of words, as well as effects of dialect and accent are handled by
allowing search algorithms to find alternate paths of phonemes through these
networks. Statistical language models, based on estimates of the frequency of
occurrence of word sequences, are often used to guide the search through the
most probable sequence of words.
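To make the idea of a statistical language model concrete, the sketch below estimates bigram probabilities P(w2 | w1) from counts of word pairs and scores a word sequence with them. It is an illustrative assumption, not the model used by the Microsoft engine: there is no smoothing, so any unseen word pair gets probability 0, and the training phrases are hypothetical command strings.

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(w2 | w1) from word-pair counts (no smoothing)."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}


def sequence_probability(model, sentence):
    """Probability of a word sequence; 0.0 if any bigram was never seen in training."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for pair in zip(words[:-1], words[1:]):
        prob *= model.get(pair, 0.0)
    return prob


if __name__ == "__main__":
    # Hypothetical training phrases.
    model = train_bigram(["open notepad", "open calculator", "close notepad"])
    # P(open|<s>) * P(notepad|open) * P(</s>|notepad) = 2/3 * 1/2 * 1 = 1/3
    print(sequence_probability(model, "open notepad"))
```

A decoder can use such scores to prefer word sequences that are likely in the command language when acoustic evidence is ambiguous.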
The dominant recognition paradigm of the past fifteen years has been the hidden Markov model (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame surface acoustic realizations are both represented probabilistically as Markov processes. Neural networks have also been used to estimate the frame-based scores; these scores are then integrated into HMM-based system architectures, in what have come to be known as hybrid systems.
An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.
2. LITERATURE SURVEY:
HIDDEN MARKOV MODEL (HMM)-BASED SPEECH RECOGNITION:
Modern general-purpose speech recognition systems are generally based on hidden Markov models (HMMs). An HMM is a statistical model that outputs a sequence of symbols or quantities. One reason HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary or short-time stationary signal: over a short time, in the range of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be thought of as a Markov model over a set of such stochastic processes, known as states.
Another reason HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, in the very simplest setup possible, the hidden Markov model would output a sequence of n-dimensional real-valued vectors, with n around, say, 13, outputting one of these every 10 milliseconds. The vectors, again in the very simplest case, would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short-time window of speech, decorrelating the spectrum using a cosine transform, and then taking the first (most significant) coefficients. The hidden Markov model will tend to have, in each state, a statistical distribution called a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word, or (in more general speech recognition systems) each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.
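The cepstral feature computation described above can be sketched in plain Python. This is a simplified real cepstrum under stated assumptions: a Hamming window, a naive O(N^2) DFT, and a log of the magnitude spectrum before the cosine transform (the log step is standard in cepstral analysis although the paragraph does not spell it out). The mel filterbank used in production MFCC front ends is deliberately omitted.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude spectrum (bins 0..N/2) of a real frame via a naive DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dct_ii(values):
    """Unnormalized type-II discrete cosine transform."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(frame, num_coeffs=13):
    """Simplified cepstrum: window -> |DFT| -> log -> DCT -> keep the first coefficients."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spectrum = dft_magnitude(windowed)
    # A small floor avoids log(0) on silent frames.
    log_spectrum = [math.log(m + 1e-10) for m in spectrum]
    # The cosine transform decorrelates the log spectrum; low-order terms carry most information.
    return dct_ii(log_spectrum)[:num_coeffs]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(400)]
    print([round(c, 2) for c in cepstral_coefficients(tone)])
```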
Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phones (so phones with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or the system might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection, followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques, which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).
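Delta coefficients are commonly computed as a regression slope over neighbouring frames, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). The sketch below implements that formula; the window N = 2 and the repetition of edge frames are assumptions for illustration. Applying the same function to its own output gives delta-delta coefficients.

```python
def delta(features, window=2):
    """Delta coefficients by linear regression over +/- `window` frames.

    features: list of frames, each a list of coefficients.
    Frames beyond either end are padded by repeating the first/last frame.
    """
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = features[min(t + n, num_frames - 1)][d]
                behind = features[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


if __name__ == "__main__":
    ramp = [[float(t)] for t in range(7)]   # one coefficient rising by 1 per frame
    d = delta(ramp)
    dd = delta(d)                           # delta-delta
    print([x[0] for x in d])                # slope 1.0 in the interior, smaller at the padded edges
    print(dd[3][0])                         # 0.0: the interior slope is constant
```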
Decoding of the speech (the term for what happens when the system is presented
with a new utterance and must compute the most likely source sentence) would
probably use the Viterbi algorithm to find the best path, and here there is a
choice between dynamically creating a combination hidden Markov model which
includes both the acoustic and language model information, or combining it
statically beforehand (the finite state transducer, or FST, approach).
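A compact sketch of Viterbi decoding for a discrete-output HMM, working in log probabilities to avoid numerical underflow. The two-state model in the example (states "A" and "B", symbols "x" and "y") is hypothetical; a real recognizer scores continuous feature vectors and searches a network built from acoustic and language models.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-output HMM, computed in log space.

    start_p[s], trans_p[s][s2] and emit_p[s][o] are probabilities.
    Returns (best_path, best_log_probability).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[s]: best log-probability of any path ending in state s at the current frame
    scores = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for s at this frame.
            prev, best = max(((p, scores[p] + log(trans_p[p][s])) for p in states),
                             key=lambda item: item[1])
            new_scores[s] = best + log(emit_p[s][obs])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]


if __name__ == "__main__":
    states = ["A", "B"]
    start = {"A": 0.9, "B": 0.1}
    trans = {"A": {"A": 0.6, "B": 0.4}, "B": {"A": 0.0, "B": 1.0}}   # left-to-right
    emit = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.1, "y": 0.9}}
    print(viterbi(["x", "x", "y", "y"], states, start, trans, emit)[0])  # ['A', 'A', 'B', 'B']
```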
NEURAL NETWORK-BASED SPEECH RECOGNITION:
Another approach in acoustic modeling is the use of neural networks. They are capable of solving much more complicated recognition tasks, but do not scale as well as HMMs when it comes to large vocabularies. Rather than being used in general-purpose speech recognition applications, they are suited to handling low-quality, noisy data and speaker independence. Such systems can achieve greater accuracy than HMM-based systems, as long as there is training data and the vocabulary is limited. A more general approach using neural networks is phoneme recognition. This is an active field of research, but generally the results are better than for HMMs. There are also NN-HMM hybrid systems that use the neural network part for phoneme recognition and the hidden Markov model part for language modeling.
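One common way hybrid systems connect the two parts is to turn the network's per-frame state posteriors P(q | x) into scaled likelihoods P(q | x) / P(q), which the HMM decoder can use in place of P(x | q) (by Bayes' rule they differ only by the factor 1 / P(x), which is the same for every state). The sketch shows only that conversion; the state names and probability values are made up, and this is one formulation rather than a description of a particular system mentioned above.

```python
def scaled_likelihoods(posteriors, priors):
    """Convert neural-network state posteriors P(q | x) into scaled likelihoods.

    P(x | q) / P(x) = P(q | x) / P(q), so dividing each posterior by the
    state prior (typically the state's relative frequency in training data)
    gives a score an HMM decoder can use in place of P(x | q).
    """
    return {state: posteriors[state] / priors[state] for state in posteriors}


if __name__ == "__main__":
    # Hypothetical network output for one frame and hypothetical state priors.
    posteriors = {"s": 0.7, "iy": 0.2, "ch": 0.1}
    priors = {"s": 0.5, "iy": 0.25, "ch": 0.25}
    print(scaled_likelihoods(posteriors, priors))
```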
DYNAMIC TIME WARPING (DTW)-BASED SPEECH RECOGNITION:
Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences that may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data that can be turned into a linear representation can be analyzed with DTW.
A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) under certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
LIMITATIONS:
A natural voice recognition system faces the major difficulties of spontaneous speech, namely hesitations and out-of-vocabulary words. An efficient dialogue design can greatly improve the performance of the voice interface, and users should be trained in how the commands should be pronounced so as to get accurate results. This software will prove to be a boon to people who are physically disabled and unable to use a mouse and keyboard as external input devices. If the mouse and keyboard ports do not work properly, the operating system can still be operated using this software, which may also save the cost of buying a mouse and keyboard.
The speech recognition engine may pick up unwanted signals, i.e. noise in the environment, that are not part of the command. Because of such unwanted signals, a command is sometimes not recognized properly and is executed incorrectly. This may be the main limitation of this software.
3. PROJECT STATEMENT:
Fig 3.1:
As shown, the user provides voice commands through the microphone, which converts each command into an electrical signal. The sound card converts this electrical signal into a digital signal. The speech recognition engine then converts the digital signal into phonemes and finally into a text command, and the corresponding operation is performed. This procedure repeats for every voice command.
Modules:
1. Phonemes Extraction
2. HMM
3. SAPI
4. XML Database
5. Action Applier
1) Phonemes Extraction:
Phonemes are linguistic units: the sounds that group together to form our words, although how a phoneme is realized as sound depends on many factors, including the surrounding phonemes and the speaker's accent and age. English uses about 44 phonemes to convey the 500,000 or so words it contains, making them a relatively good data item for speech recognition engines to work with. These phonemes are extracted by the Microsoft Speech SDK.
SOME EXAMPLES OF PHONEMES USED IN WORDS
From the extracted phonemes we get the command in text format.
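As a rough illustration of going from phonemes to text, the sketch below segments a phoneme string into words using a pronunciation lexicon. The lexicon entries (ARPAbet-style symbols) and the greedy longest-match strategy are assumptions for illustration; an engine such as the one in the Microsoft Speech SDK decodes words inside its search rather than with a separate lookup like this.

```python
# Hypothetical pronunciation lexicon: phoneme tuples -> words.
LEXICON = {
    ("OW", "P", "AH", "N"): "open",
    ("K", "L", "OW", "Z"): "close",
    ("N", "OW", "T", "P", "AE", "D"): "notepad",
}


def phonemes_to_words(phonemes, lexicon=LEXICON):
    """Greedy longest-match segmentation of a phoneme list into lexicon words.

    Returns None if some stretch of phonemes matches no lexicon entry.
    """
    longest = max(len(key) for key in lexicon)
    words, i = [], 0
    while i < len(phonemes):
        # Try the longest possible match first, then shorter ones.
        for length in range(min(longest, len(phonemes) - i), 0, -1):
            word = lexicon.get(tuple(phonemes[i:i + length]))
            if word is not None:
                words.append(word)
                i += length
                break
        else:
            return None
    return words


if __name__ == "__main__":
    print(phonemes_to_words(["OW", "P", "AH", "N", "N", "OW", "T", "P", "AE", "D"]))
    # ['open', 'notepad']
```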
2) USE OF HMM (HIDDEN MARKOV MODEL):
We now have a list of phonemes extracted from the input command. These phonemes need to be combined and converted into words. The most common method is to use a Hidden Markov Model (HMM). A Markov model (in a speech recognition context) is basically a chain of phonemes that represents a word. The chain can branch, and if it does, the branches are statistically weighted. HMMs function as probabilistic finite state machines: the model consists of a set of states, and its topology specifies the allowed transitions between them. At every time frame, an HMM makes a probabilistic transition from one state to another and emits a feature vector with each transition.
Using Hidden Markov Models (HMMs) may improve word recognition accuracy, because an HMM takes into account the transition probabilities between phonemes.
Fig 3.2: Example of HMM
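The sketch below shows how HMM word models that encode phoneme transition probabilities can score an input: the forward algorithm sums over all state paths to give P(observations | word), and the word with the highest likelihood is chosen. The two word models and the symbols "a", "b", "c" are hypothetical stand-ins for quantized acoustic observations.

```python
def forward_likelihood(observations, start_p, trans_p, emit_p):
    """P(observations | word model) for a discrete-output HMM (forward algorithm).

    start_p[s], trans_p[s][s2] and emit_p[s][o] are probabilities;
    missing entries are treated as 0.
    """
    states = list(start_p)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[p] * trans_p[p].get(s, 0.0) for p in states)
            * emit_p[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())


def recognize_word(observations, word_models):
    """Return the word whose HMM gives the observations the highest likelihood."""
    return max(word_models,
               key=lambda w: forward_likelihood(observations, *word_models[w]))


# Hypothetical two-state, left-to-right word models: (start, transitions, emissions).
WORD_MODELS = {
    "yes": ({"Y": 1.0, "EH": 0.0},
            {"Y": {"Y": 0.5, "EH": 0.5}, "EH": {"EH": 1.0}},
            {"Y": {"a": 0.9, "b": 0.1}, "EH": {"a": 0.2, "b": 0.8}}),
    "no": ({"N": 1.0, "OW": 0.0},
           {"N": {"N": 0.5, "OW": 0.5}, "OW": {"OW": 1.0}},
           {"N": {"c": 0.9, "b": 0.1}, "OW": {"c": 0.3, "b": 0.7}}),
}

if __name__ == "__main__":
    print(recognize_word(["a", "a", "b"], WORD_MODELS))  # yes
```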
3) SAPI (SPEECH APPLICATION PROGRAMMING INTERFACE):
SAPI is an interface between our application platform and the Microsoft speech engine. It passes the word formed by the HMM to our programming platform, where it is compared with the Voice-XML database. The speech recognition engine used by this voice-controlled system is Microsoft's speech recognition engine with its associated development kit, Microsoft Speech SDK 5.1. The recognition rate of Microsoft's speech recognition engine is not high in continuous speech mode, but is extremely high in command-and-control mode. We use SAPI to implement the voice functions. SAPI provides a high-level interface between applications and speech engines. Controlling and managing the various speech engines requires real-time operation, and SAPI implements and hides these underlying technical details.
There are two basic types of SAPI engines: text-to-speech (TTS) systems and speech recognizers. TTS systems synthesize text strings and files into spoken audio using synthetic voices, whereas speech recognizers convert human spoken audio into readable text strings and files. The speech engine communicates with SAPI through the device driver interface (DDI) layer, and SAPI communicates with applications through its API. Using these interfaces, voice recognition and speech synthesis software can be developed.
Dynamic Programming Algorithm:
In this speech recognition technique, the recognition process consists of matching the incoming speech against stored reference commands. The command with the lowest distance measure from the input pattern is the recognized word. The best match (lowest distance measure) is found using dynamic programming. This is called a Dynamic Time Warping (DTW) word recognizer.
Two important concepts in DTW are:
a) Features: the information in each signal has to be represented in some manner.
b) Distances: some form of metric has to be used in order to obtain a match path. There are two types:
Local: a computational difference between a feature of one signal and a feature of the other.
Global: the overall computational difference between an entire signal and another signal of possibly different length.
Speech is a time-dependent process. Utterances of the same word will therefore have different durations, and even utterances of the same word with the same duration will differ in the middle, because different parts of the word are spoken at different rates. To obtain a global distance between two speech patterns, a time-versus-time comparison must be performed using a time-time matrix. As an illustration, consider the input SsPEEhH, which is a 'noisy' version of the reference word SPEECH. The time-time matrix for this illustration is as follows:
Fig 3.3: Time-Time matrix
If D(i, j) is the global distance up to (i, j) and the local distance at (i, j) is given by d(i, j), then

$$D(i, j) = \min\left[ D(i-1, j-1),\; D(i-1, j),\; D(i, j-1) \right] + d(i, j) \qquad (1)$$

where d(i, j) is calculated using the Euclidean distance metric between the input feature vector x_i and the reference feature vector y_j, with k indexing their components:

$$d(i, j) = \left( \sum_{k} (x_{i,k} - y_{j,k})^2 \right)^{1/2} \qquad (2)$$

The initial condition is D(1, 1) = d(1, 1).
The final global distance D(n, N), where n and N are the lengths of the input and reference sequences, is calculated recursively, with this base condition as the terminating condition. D(n, N) gives the overall matching score of the reference word with the input. The input word is then recognized as the reference command in the database with the lowest matching score.
This algorithm has polynomial complexity, O(n^2 v), where n is the sequence length and v is the total number of commands in our dictionary.
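A direct sketch of recurrence (1), the initial condition D(1, 1) = d(1, 1), and the lowest-score decision rule, using the Euclidean local distance of equation (2) by default. Running the O(n^2) alignment against each of the v stored commands gives the O(n^2 v) cost stated above. The command names and feature values in the example are hypothetical.

```python
import math


def euclidean(x, y):
    """Local distance d(i, j) between two feature vectors, as in equation (2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(inp, ref, local=euclidean):
    """Global distance D(n, N) via recurrence (1), with D(1, 1) = d(1, 1).

    Indices are 0-based here, so D[0][0] plays the role of D(1, 1).
    """
    n, m = len(inp), len(ref)
    inf = float("inf")
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local(inp[i], ref[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best_prev = min(
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,  # diagonal
                D[i - 1][j] if i > 0 else inf,                # input advances
                D[i][j - 1] if j > 0 else inf,                # reference advances
            )
            D[i][j] = best_prev + d
    return D[n - 1][m - 1]


def recognize_command(inp, references):
    """Return the stored command with the lowest global distance to the input."""
    return min(references, key=lambda name: dtw_distance(inp, references[name]))


if __name__ == "__main__":
    # Hypothetical 1-D feature sequences for two stored commands.
    references = {"open": [[0.0], [5.0]], "close": [[9.0], [9.0]]}
    print(recognize_command([[0.0], [0.1], [5.0]], references))  # open
```

With a 0/1 local distance on letters, dtw_distance("SsPEEhH", "SPEECH", local=lambda a, b: 0 if a == b else 1) returns 2 for the noisy example above: the two lowercase noise letters each cost one unit on the best path.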
XML Database
The grammar of the commands used in our paper is stored in an XML file, referred to as Voice-XML, which serves as our database. The reference words used in our algorithm for comparison with the input word are taken from this Voice-XML database. When the input command matches the stored grammar, the operation associated with that command is executed.
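A hedged sketch of the XML database and Action Applier steps: load command phrases from an XML grammar and dispatch a recognized phrase to a handler. The XML layout (a commands element containing command elements with phrase and action attributes), the action names and the handlers are all hypothetical; they are not the VoiceXML or SAPI grammar schema, and real handlers would launch programs or send window messages instead of appending to a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical command grammar; element and attribute names are illustrative only.
GRAMMAR_XML = """
<commands>
  <command phrase="open notepad" action="launch_notepad"/>
  <command phrase="close window" action="close_window"/>
  <command phrase="shut down" action="shutdown"/>
</commands>
"""


def load_grammar(xml_text):
    """Map each (lower-cased) command phrase to the name of its action."""
    root = ET.fromstring(xml_text.strip())
    return {cmd.get("phrase").lower(): cmd.get("action") for cmd in root.iter("command")}


def apply_action(recognized_text, grammar, handlers):
    """Run the handler for a recognized phrase; return False if it is not in the grammar."""
    action = grammar.get(recognized_text.strip().lower())
    if action is None:
        return False
    handlers[action]()
    return True


if __name__ == "__main__":
    grammar = load_grammar(GRAMMAR_XML)
    log = []
    handlers = {
        "launch_notepad": lambda: log.append("notepad"),
        "close_window": lambda: log.append("close"),
        "shutdown": lambda: log.append("shutdown"),
    }
    print(apply_action("Open Notepad", grammar, handlers), log)  # True ['notepad']
```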
4. SYSTEM REQUIREMENT AND SPECIFICATION:
UML DIAGRAM
DFD
DFD Level 0:
DFD Level 1:
DFD Level 2:
[Figure: data flow diagrams; recoverable labels: System, Speech Recognition, Text Command]
CONTROL FLOW DIAGRAM
CLASS DIAGRAM
ACTIVITY DIAGRAM
[Figure: Activity diagram. Start → initialize speech engine → receive sound → send sound to speech engine → call HMM → compare text command with XML commands → if command found (Yes), perform action command; on Stop command, stop.]
COMPONENT DIAGRAM
DEPLOYMENT DIAGRAM
[Figure: Deployment diagram. Node: Computer (deployment specs: Windows 8, .NET 4.0); device: microphone; components: Speech Recognition Engine, Text Commander.]
HARDWARE AND SOFTWARE REQUIREMENT:
Hardware Requirements:
System : Pentium IV 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 1.44 MB
Monitor : 15 VGA Colour
Mouse : Logitech
RAM : 512 MB
Software Requirements:
Operating System : Windows 8
Coding Language : C# .NET (Visual Studio 2012)
Database : MS SQL Server 2008
5. PLANNING AND SCHEDULING THE PROJECT WORK:
SOFTWARE ENGINEERING APPROACH:
Fig 5.1: Incremental Model
Incremental Development and Release:
Developing systems through incremental release requires first providing essential operating functions, then providing system users with improved and more capable versions of the system at regular intervals. This model combines the classic software life cycle with iterative enhancement at the level of system development organization. It also supports a strategy of periodically distributing software maintenance updates and services to dispersed user communities, which in turn accommodates the provision of standard software maintenance contracts. It is therefore a popular model of software evolution used by many commercial software firms and system vendors. This approach has also been extended through the use of software prototyping tools and techniques, which more directly support incremental development and iterative release for early and ongoing user feedback and evaluation. Figure 2 provides an example view of an incremental development, build, and release model for engineering large Ada-based software systems, in which software functions and/or subsystems (developed through stepwise refinement) are released incrementally to separate in-house quality assurance teams that apply statistical measures and analyses as the basis for certifying high-quality software systems.
REQUIREMENT ANALYSIS:
NORMAL REQUIREMENTS:
1. User interfaces: In our system we provide a GUI on both the server and client sides. Users of the system can communicate over the LAN and use the GUI available to them to execute commands.
2. Hardware interfaces: The system has one hardware interface:
MICROPHONE
3. Software interfaces: The system has one software interface:
MICROPHONE drivers, needed to install the MICROPHONE.
4. Communication interfaces: The system requires the following communication interface:
MICROPHONE: in order to communicate with the MICROPHONE.
Expected Requirements:
1. Performance Requirements: The microphone used to capture speech for recognition should be noise-free and have adequate bandwidth.
The switch needs to support exactly the number of clients specified for the system.
2. Safety Requirements: To keep the system safe, care should be taken to prevent theft of its components.
The input voltage for the MICROPHONE should not exceed the applicable standards.
3. Security Requirements: The server should not share any drives over the network, to avoid data theft.
4. Software Quality Attributes: Interfacing with the MICROPHONE modem should be kept flexible.
NON FUNCTIONAL REQUIREMENT STUDY
The nonfunctional requirements of the project are analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the nonfunctional requirement study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For nonfunctional requirement analysis, some understanding of the major requirements for the system is essential.
Three key considerations involved in the nonfunctional requirement analysis are:
1. ECONOMICAL NON FUNCTIONAL REQUIREMENT
2. TECHNICAL NON FUNCTIONAL REQUIREMENT
3. SOCIAL NON FUNCTIONAL REQUIREMENT
ECONOMICAL NON FUNCTIONAL REQUIREMENT
This study is carried out to check the economic impact that the system will have on the organization. The amount of funding that the company can put into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.
TECHNICAL NON FUNCTIONAL REQUIREMENT
This study is carried out to check the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, since that would in turn place high demands on the client. The developed system must have modest requirements, as only minimal or no changes are required to implement it.
SOCIAL NON FUNCTIONAL REQUIREMENT
This aspect of the study checks the level of acceptance of the system by the user. It includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by users depends solely on the methods employed to educate them about the system and make them familiar with it. Their level of confidence must be raised so that they are also able to offer constructive criticism, which is welcome, as they are the final users of the system.
Exciting Requirements:
1. The user should not enter a message or question that is not appropriate to the closed-domain FAQ.
2. The system should respond to each SMS in an appropriate manner by using a template matching algorithm.
REQUIREMENT VALIDATION
1. Organization and Completeness
   1. Are all internal cross-references to other requirements correct? Yes
   2. Are all requirements written at a consistent and appropriate level of detail? Yes
   3. Do the requirements provide an adequate basis for design? Yes
   4. Is the implementation priority of each requirement included? No
   5. Are all external hardware, software, and communication interfaces defined? Yes
   6. Have algorithms intrinsic to the functional requirements been defined? Yes
   7. Does the SRS include all of the known customer or system needs? Yes
   8. Is any necessary information missing from a requirement? If so, is it identified as TBD? No
   9. Is the expected behavior documented for all anticipated error conditions? Yes
2. Correctness
   1. Do any requirements conflict with or duplicate other requirements? No
   2. Is each requirement written in clear, concise, unambiguous language? No
   3. Is each requirement verifiable by testing, demonstration, review, or analysis? Yes
   4. Is each requirement in scope for the project? Yes
   5. Is each requirement free from content and grammatical errors? Yes
   6. Can all of the requirements be implemented within known constraints? Yes
   7. Are any specified error messages unique and meaningful? No
3. Quality Attributes
   1. Are all performance objectives properly specified? Yes
   2. Are all security and safety considerations properly specified? Yes
   3. Are other pertinent quality attribute goals explicitly documented and quantified, with the acceptable tradeoffs specified? Yes
4. Traceability
   1. Is each requirement uniquely and correctly identified? Yes
   2. Can each software functional requirement be traced to a higher-level requirement (e.g., system requirement, use case)? Yes
5. Special Issues
   1. Are all requirements actually requirements, not design or implementation solutions? Yes
   2. Are the time-critical functions identified, and timing criteria specified for them? Yes
   3. Are all significant consumers of scarce resources (memory, network bandwidth, processor capacity, etc.) identified, and is their anticipated resource consumption specified? Yes
   4. Have internationalization issues been adequately addressed? No
SYSTEM IMPLEMENTATION PLAN:
1. EFFORT ESTIMATE TABLE:
Task                                                   Effort    Deliverables               Milestones
Analysis of existing systems & comparison with
the proposed one                                       4 weeks
Literature survey                                      1 week
Designing & planning                                   2 weeks
  o System flow                                        1 week
  o Designing modules & their deliverables             2 weeks   Modules design document
Implementation                                         7 weeks   Primary system
Testing                                                4 weeks   Test reports               Formal
Documentation                                          2 weeks   Complete project report    Formal
Table 5.1: Effort Estimate Table
2. PHASE DESCRIPTION:
Phase     Task                Description
Phase 1   Analysis            Analyze the information given in the IEEE paper.
Phase 2   Literature survey   Collect raw data and elaborate on literature surveys.
Phase 3   Design              Assign the modules and design the process flow control.
Phase 4   Implementation      Implement the code for all the modules and integrate all the modules.
Phase 5   Testing             Test the code and the overall process to check whether it works properly.
Phase 6   Documentation       Prepare the document for this project with conclusion and future enhancement.
Table 5.2: Phase Description
3. PROJECT PLAN
[Chart: Phases 1 to 6 scheduled over the months Jun/11, Jul/11, Aug/11, Sep/11, Oct/11, Nov/11, Dec/11, Jan/12, Feb/12, Mar/12]
Table 5.3: Project Plan
4. ESTIMATION OF KLOC:
The number of lines required for implementation of the various modules can be estimated as follows:
Sr. No. | Module | KLOC
1. | Graphical User Interface | 0.50
2. | User authentication Code | 0.20
3. | Database Code | 0.60
4. | Web Design Code | 0.50
5. | Device Drivers | 0.40
6. | Interfacing Code | 0.20
Table 5.4: Estimation of KLOC
Thus the total number of lines required is approximately 2.40 KLOC.
Effort:
$$E = 3.2 \times (\mathrm{KLOC})^{1.02}$$
$$E = 3.2 \times (2.40)^{1.02}$$
$$E \approx 7.82 \ \text{person-months}$$
Development Time (in months):
$$D = E / N$$
$$D = 7.82 / 3$$
$$D \approx 2.61 \ \text{months}$$
Number of Persons:
Four persons are required to complete the project successfully within the given time span.
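The arithmetic above can be re-checked with a short script. This is a minimal sketch that simply re-evaluates this section's own formulas: the coefficient 3.2, the exponent 1.02 and the team size N = 3 are taken from the text, and the module sizes from Table 5.4. It is not a calibrated COCOMO model, and the function names are illustrative.

```python
# Re-evaluates the report's effort/duration formulas (coefficients taken from the text).
MODULE_KLOC = {
    "Graphical User Interface": 0.50,
    "User authentication Code": 0.20,
    "Database Code": 0.60,
    "Web Design Code": 0.50,
    "Device Drivers": 0.40,
    "Interfacing Code": 0.20,
}


def total_kloc(modules: dict[str, float]) -> float:
    """Sum of the per-module estimates in Table 5.4."""
    return sum(modules.values())


def effort_person_months(kloc: float, a: float = 3.2, b: float = 1.02) -> float:
    """E = a * KLOC^b, with a and b as used in this plan."""
    return a * kloc ** b


def duration_months(effort: float, persons: int) -> float:
    """D = E / N, as defined in this plan."""
    return effort / persons


kloc = total_kloc(MODULE_KLOC)
effort = effort_person_months(kloc)
duration = duration_months(effort, persons=3)
print(f"KLOC = {kloc:.2f}, E = {effort:.2f} person-months, D = {duration:.2f} months")
# prints: KLOC = 2.40, E = 7.82 person-months, D = 2.61 months
```

Keeping the coefficients as parameters makes it easy to see how sensitive the schedule is to the chosen model: for example, passing persons=4 gives a duration of roughly 1.95 months.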
FEASIBILITY ASSESSMENT:
What are P, NP-complete, and NP-hard? When solving a problem, we first have to determine how difficult it is. Three complexity classes are commonly used for this:
1) P Class
2) NP-hard Class
3) NP-Complete Class
A decision problem is in P if there is a known polynomial-time algorithm that solves it. A decision problem is in NP if there is a known polynomial-time algorithm for a non-deterministic machine that solves it. Problems known to be in P are trivially in NP: the non-deterministic machine simply never forks another process and behaves just like a deterministic one.
However, some problems are known to be in NP but have no known polynomial-time deterministic algorithm; in other words, we know they are in NP but do not know whether they are in P. A problem is NP-complete if you can prove that (1) it is in NP, and (2) a problem already known to be NP-complete is polynomial-time reducible to it.
A problem is NP-hard if and only if it is at least as hard as an NP-complete problem. The decision version of the Traveling Salesman Problem (is there a route of length at most k?) is NP-complete, whereas the more conventional version, finding the shortest route, is NP-hard but not strictly NP-complete, because it is not a decision problem.
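To make the distinction concrete, the sketch below contrasts the two versions of the Traveling Salesman Problem. Checking a proposed tour against a bound k takes polynomial time (this certificate-checking view is equivalent to the non-deterministic definition of NP), while finding the shortest tour by brute force examines all (n-1)! orderings. The distance matrix and function names are illustrative assumptions, not part of the proposed system.

```python
from itertools import permutations

Matrix = list[list[int]]


def tour_length(dist: Matrix, tour: list[int]) -> int:
    """Total length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def verify_tour(dist: Matrix, tour: list[int], k: int) -> bool:
    """Decision version: polynomial-time check that `tour` visits every city
    exactly once and has total length <= k."""
    if sorted(tour) != list(range(len(dist))):
        return False
    return tour_length(dist, tour) <= k


def shortest_tour(dist: Matrix) -> tuple[float, list[int]]:
    """Optimization version by brute force: tries all (n-1)! tours from city 0."""
    n = len(dist)
    best_len, best_tour = float("inf"), []
    for rest in permutations(range(1, n)):
        tour = [0, *rest]
        length = tour_length(dist, tour)
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(shortest_tour(DIST))               # (18, [0, 1, 3, 2])
print(verify_tour(DIST, [0, 1, 3, 2], 18))  # True
print(verify_tour(DIST, [0, 1, 3, 2], 17))  # False
```

The verifier's cost grows polynomially with the number of cities, while the search loop grows factorially; that gap is exactly what the P versus NP question asks about.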
For Project:
A: Voice Communication
B: Algorithmic Processing
$$\text{Time Complexity} = A^{m} + B^{n} \qquad (1)$$
So the project is feasible and belongs to the polynomial-time class (P class).
Explanation:
Processing voice into and out of our system takes some time; let us denote it m. Processing each of the algorithms also takes some time; let us denote it n. From Equation (1) and the definition of the P class, the project falls in the P class.
Type of Feasibility
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, since this would in turn place high demands on the client. The developed system must have modest requirements, as only minimal or no changes are required to implement it.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The
user must not feel threatened by the system, but must instead accept it as a
necessity. The level of acceptance by the users depends solely on the methods
that are employed to educate the user about the system and to make him familiar
with it. His level of confidence must be raised so that he is also able to make some
constructive criticism, which is welcomed, as he is the final user of the system.
RISK MITIGATION, MONITORING AND MANAGEMENT PLAN
SCOPE AND INTENT OF RMMM ACTIVITIES
The goal of the risk mitigation, monitoring and management plan is to identify as many potential risks as possible. To help determine what the potential risks are, Game Forge will be evaluated using the checklists found in section 6.3 of Roger S. Pressman's Software Engineering, A Practitioner's Approach [SEPA, 4/e]. These checklists help to identify potential risks in a generic sense. The project will then be analyzed to determine any project-specific risks.
When all risks have been identified, they will be evaluated to determine their probability of occurrence and how Game Forge will be affected if they do occur. Plans will then be made to avoid each risk, to track each risk to determine whether it is becoming more or less likely to occur, and to plan for those risks should they occur. It is the organization's responsibility to perform risk mitigation, monitoring, and management in order to produce a quality product. The quicker the risks can be identified and avoided, the smaller the chance of having to face a particular risk's consequences. The fewer consequences suffered as a result of a good RMMM plan, the better the product and the smoother the development process.
RISK MANAGEMENT ORGANIZATIONAL ROLE
Each member of the organization will undertake risk management. The development team will consistently monitor their progress and project status so as to identify present and future risks as quickly and accurately as possible. With this said, the members who are not directly involved with the implementation of the product will also need to keep their eyes open for any possible risks that the development team did not spot. The responsibility of risk management falls on each member of the organization, while William Lord maintains this document.
RISK IDENTIFICATION CHECKLIST:
Product Size Risks
Estimated size in lines of code (LOC): The project will have an estimated _______ lines of code.
Degree of confidence in estimated size: We are highly confident in our estimated size.
Estimated size in number of programs, files, and transactions:
1. We estimate 12 programs.
2. We estimate 10 large files for the engine and 5 large files for the user interface.
3. We estimate 40 or more transactions for the engine and 20 transactions for the user interface.
Percentage deviation in size from average for previous products: We allow for a 20% deviation from average.
Size of database created or used: The database that we will use will have an estimated 7 tables. The number of fields will vary per table, with an overall average of 8 fields per table. The number of records in each table will vary with the number of sprites that the user adds to the project and the number of instances of each sprite that the user creates.
Number of users: The number of users will be fairly high. There will be 5 users per instance of the software running, as the software is client/server or intended for multi-user use.
Number of projected changes to the requirements: We estimate 3 possible changes to the requirements. These will result from our realization of what is and is not required as we get further into implementation, as well as from interaction with the customer and verification of the customer's requirements.
Amount of reuse of software: Reuse will be very important to get the project started. GSM Modem code is very simple to reuse (for the most part); previous programs written for the GSM Modem will be reviewed and much of that code will be recopied.
Business Impact Risk
Amount and quality of documentation that must be produced and delivered to the customer: The customer will be supplied with a complete online help file and user's manual for Game Forge. In addition, the customer will have access to all development documents for Game Forge, as the customer will also be grading the project.
Governmental constraints in the construction of the product: None known.
Costs associated with late delivery: Late delivery will prevent the customer from issuing a letter of acceptance for the product, which will result in an incomplete grade for the course for all members of the organization.
Costs associated with a defective product: Unknown at this time.
Customer Related Risks
Have you worked with the customer in the past? Yes, all team members have completed at least one project for the customer, though none of these projects has been of the magnitude of the current project.
Does the customer have a solid idea of what is required? Yes, the customer has access to both the System Requirements Specification and the Software Requirements Specification for the Game Forge project.
Will the customer agree to spend time in formal requirements gathering meetings to identify project scope? Unknown. While the customer will likely participate if asked, the inquiry has not yet been made.
Process Risks
Does senior management support a written policy statement that emphasizes the importance of a standard process for software development? N/A. PA Software does not have a senior management. It should be noted that the structured method has been adopted. At the completion of the project, it will be determined whether the software method is acceptable as a standard process, or whether changes need to be implemented.
Has your organization developed a written description of the software process to be used on this project? Yes. It is under development using the structured method as described in Part Three of Roger S. Pressman's Software Engineering, A Practitioner's Approach.
Are staff members willing to use the software process? Yes. The software process was agreed upon before development work began.
Is the software process used for other products? N/A. PA Software currently has no other projects.
Technical Issues
Are facilitated application specification techniques used to aid in communication between the customer and the developer? The development team will hold frequent meetings directly with the customer. No formal meetings are held (all are informal). During these meetings the software is discussed and notes are taken for future review.
Are specific methods used for software analysis? Special methods will be used to analyze the software's progress and quality. These are a series of tests and reviews to ensure the software is up to speed. For more information, see the Software Quality Assurance and Software Configuration Management documents.
Do you use a specific method for data and architectural design? Data and architectural design will be mostly object oriented. This allows for a higher degree of data encapsulation and modularity of code.
Technology Risks
Is the technology to be built new to your organization? No
Does the software interface with new or unproven hardware? No
Is a specialized user interface demanded by the product requirements? Yes.
Development Environment Risks
Is a software project management tool available? No.
No software tools are to be used. Due to the existing deadline, the development team felt it would be more productive to begin implementing the project than to try to learn new software tools. After the completion of the project, software tools may be adopted for future projects.
Risk Table
Risks | Category | Probability (%) | Impact
Computer Crash | TI | 70 | 1
Late Delivery | BU | 30 | 1
Technology Will Not Meet Expectations | TE | 25 | 1
End Users Resist System | BU | 20 | 1
Changes in Requirements | PS | 20 | 2
Lack of Development Experience | TI | 20 | 2
Lack of Database Stability | TI | 40 | 2
Deviation from Software Engineering | PI | 10 | 3
Poor Comments | TI | 20 | 4
Fig 5.5: Risk Table
Impact Values:
1 Catastrophic
2 Critical
3 Marginal
4 Negligible
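Pressman's approach sorts the risk table so that high-probability, high-impact risks rise to the top, and then draws a cutoff line: risks above the line receive full RMMM attention. The sketch below applies that ordering to the values in the risk table above. The cutoff thresholds (at least 20% probability and impact 2 or more severe) are hypothetical illustrations, not values set by this plan.

```python
from typing import NamedTuple


class Risk(NamedTuple):
    name: str
    category: str
    probability: int  # percent, as in the risk table
    impact: int       # 1 = catastrophic, 2 = critical, 3 = marginal, 4 = negligible


RISKS = [
    Risk("Computer Crash", "TI", 70, 1),
    Risk("Late Delivery", "BU", 30, 1),
    Risk("Technology Will Not Meet Expectations", "TE", 25, 1),
    Risk("End Users Resist System", "BU", 20, 1),
    Risk("Changes in Requirements", "PS", 20, 2),
    Risk("Lack of Development Experience", "TI", 20, 2),
    Risk("Lack of Database Stability", "TI", 40, 2),
    Risk("Deviation from Software Engineering", "PI", 10, 3),
    Risk("Poor Comments", "TI", 20, 4),
]


def prioritize(risks: list[Risk]) -> list[Risk]:
    # Higher probability first; for equal probability, more severe impact
    # (smaller number) first. sorted() is stable, so remaining ties keep table order.
    return sorted(risks, key=lambda r: (-r.probability, r.impact))


def above_cutoff(risks: list[Risk], min_probability: int = 20, max_impact: int = 2) -> list[Risk]:
    # Hypothetical cutoff line: only these risks would get a full RMMM treatment.
    return [r for r in prioritize(risks)
            if r.probability >= min_probability and r.impact <= max_impact]


for r in prioritize(RISKS):
    print(f"{r.name:<40} {r.category}  {r.probability:>3}%  impact {r.impact}")
```

Sorting by probability first mirrors the table's emphasis on likelihood; a team that cares more about severity could swap the key to (r.impact, -r.probability).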
Risk Refinement
At various points in the checklist, lack of software tools is identified as a potential
risk. Due to time constraints, the members of the design team felt that searching
for and learning to use additional software tools could be detrimental to the
project, as it would take time away from project development. For this reason, we
have decided to forgo the use of software tools. It will not be explored as a
potential risk because all planning will be done without considering their use.
STRATEGIES TO MANAGE RISK
Risk Mitigation, Monitoring and Management
RISK: COMPUTER CRASH
Mitigation
The cost associated with a computer crash resulting in a loss of data is crucial. A computer crash itself is not crucial; the loss of data is. A loss of data will result in not being able to deliver the product to the customer, which in turn will result in not receiving a letter of acceptance from the customer. Without the letter of acceptance, the group will receive a failing grade for the course. As a result, the organization is taking steps to make multiple backup copies of the software in development, and of all documentation associated with it, in multiple locations.
Monitoring
When working on the product or documentation, staff members should always be aware of the stability of the computing environment they are working in. Any changes in the stability of the environment should be recognized and taken seriously.
Management
The lack of a stable computing environment is extremely hazardous to a software development team. In the event that the computing environment is found to be unstable, the development team should cease work on that system until the environment is made stable again, or should move to a stable system and continue working there.
RISK: LATE DELIVERY
Mitigation
The cost associated with a late delivery is critical. A late delivery will result in a late delivery of the letter of acceptance from the customer. Without the letter of acceptance, the group will receive a failing grade for the course. Steps have been taken to ensure a timely delivery by gauging the scope of the project based on the delivery deadline.
Monitoring
A schedule has been established to monitor project status. Falling behind schedule would indicate a potential for late delivery. The schedule will be followed closely during all development stages.
Management
Late delivery would be a catastrophic failure in the project development. If the project cannot be delivered on time, the development team will not pass the course. If it becomes apparent that the project will not be completed on time, the only course of action available would be to request an extension to the deadline from the customer.
RISK: TECHNOLOGY DOES NOT MEET SPECIFICATIONS
Mitigation
In order to prevent this from happening, meetings (formal and informal) will be held with the customer on a routine basis. This ensures that the product we are producing matches the specifications of the customer.
Monitoring
The meetings with the customer should ensure that the customer and our organization understand each other and the requirements for the product.
Management
Should the development team come to the realization that their idea of the product specifications differs from that of the customer, the customer should be notified immediately and whatever steps are necessary to rectify the problem should be taken. Preferably, a meeting should be held between the development team and the customer to discuss the issue at length.
RISK: END USERS RESIST SYSTEM
Mitigation
In order to prevent this from happening, the software will be developed with the end user in mind. The user interface will be designed to make use of the program convenient and pleasurable.
Monitoring
The software will be developed with the end user in mind. The development team will ask the opinion of various outside sources throughout the development phases. Specifically, the user-interface developer will be sure to get thorough feedback from others.
Management
Should the program be resisted by the end user, it will be thoroughly examined to find the reasons. Specifically, the user interface will be investigated and, if necessary, revamped.
RISK: CHANGES IN REQUIREMENTS
Mitigation
In order to prevent this from happening, meetings (formal and informal) will be held with the customer on a routine basis. This ensures that the product we are producing matches the requirements of the customer.
Monitoring
The meetings with the customer should ensure that the customer and our organization understand each other and the requirements for the product.
Management
Should the development team come to the realization that their idea of the product requirements differs from that of the customer, the customer should be notified immediately and whatever steps are necessary to rectify the problem should be taken. Preferably, a meeting should be held between the development team and the customer to discuss the issue at length.
RISK: LACK OF DEVELOPMENT EXPERIENCE
Mitigation
In order to prevent this from happening, the development team will be required to learn the languages and techniques necessary to develop this software. The team member who is most experienced in a particular facet of the development tools will need to instruct those who are not as well versed.
Monitoring
Each member of the team should watch for areas where another team member may be weak. Also, if a member is weak in a particular area, that member should bring it to the attention of the other members.
Management
The members who have the most experience in a particular area will be required to help those who do not, should it come to the attention of the team that a particular member needs help.
RISK: DATABASE IS NOT STABLE
Mitigation
In order to prevent this from happening, developers who are in contact with the database, and/or use functions that interact with the database, should keep in mind the possible errors that could be caused by poor programming or error checking. These issues should be brought to the attention of each of the other members who are also in contact with the database.
Monitoring
Each user should make sure that the database is left in the condition it was in before it was touched, so that possible problems can be identified. The first sign of database errors should be brought to the attention of the other team members.
Management
Should this occur, the organization will call a meeting to discuss the causes of the database instability, along with possible solutions.
RISK: POOR COMMENTS IN CODE
Mitigation
Poor code commenting can be minimized if commenting standards are better expressed. While standards have been discussed informally, no formal standard yet exists. A formal written standard must be established to ensure the quality of comments in all code.
Monitoring
Reviews of code, with special attention given to comments, will determine whether they are up to standard. These reviews must be done frequently enough to control comment quality. If they are not done, comment quality could drop, resulting in code that is difficult to maintain and update.
Management
Should code comment quality begin to drop, time must be made available to bring comments up to standard. Careful monitoring will minimize the impact of poor commenting. Any problems are resolved by adding and refining comments as necessary.
FUTURE DIRECTIONS:
Robustness:
In a robust system, performance degrades gracefully (rather than
catastrophically) as conditions become more different from those under which it
was trained. Differences in channel characteristics and acoustic environment
should receive particular attention.
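One long-standing technique aimed at the channel differences mentioned above is cepstral mean normalization (CMN). A fixed linear channel (microphone, telephone line) multiplies the speech spectrum, which becomes an additive constant in the cepstral domain, so subtracting each coefficient's per-utterance mean largely cancels it. The sketch below is not part of the proposed system (which relies on Microsoft Speech SDK); it assumes cepstral feature frames are already available as lists of floats, and the toy values are invented.

```python
def cepstral_mean_normalize(frames: list[list[float]]) -> list[list[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient.

    A stationary channel adds the same offset to every frame in the cepstral
    domain, so removing the long-term mean removes that offset."""
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Toy example: the same "speech" features, the second copy seen through a channel
# that adds a constant offset to each coefficient.
clean = [[1.0, -2.0, 0.5], [2.0, -1.0, 0.0], [3.0, 0.0, -0.5]]
channel_offset = [0.7, 0.3, -0.2]
shifted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

a = cepstral_mean_normalize(clean)
b = cepstral_mean_normalize(shifted)
print(a)
print(b)  # matches a up to floating-point rounding
```

CMN only removes stationary convolutional effects; additive background noise and time-varying channels need other techniques.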
Portability:
Portability refers to the goal of rapidly designing, developing and deploying
systems for new applications. At present, systems tend to suffer significant
degradation when moved to a new task. In order to return to peak performance,
they must be trained on examples specific to the new task, which is time-consuming and expensive.
Adaptation:
How can systems continuously adapt to changing conditions (new speakers,
microphone, task, etc.) and improve through use? Such adaptation can occur at
many levels in a system: subword models, word pronunciations, language models,
etc.
Language Modeling:
Current systems use statistical language models to help reduce the search space
and resolve acoustic ambiguity. As vocabulary size grows and other constraints
are relaxed to create more habitable systems, it will be increasingly important to
get as much constraint as possible from language models, perhaps by incorporating
syntactic and semantic constraints that cannot be captured by purely statistical
models.
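To illustrate how a statistical language model constrains recognition, below is a minimal bigram model with add-one (Laplace) smoothing that scores word sequences. It is a sketch only: the command corpus is hypothetical, and the proposed system itself defines its commands with a Voice-XML grammar rather than an n-gram model.

```python
import math
from collections import Counter


def train_bigram(sentences: list[str]):
    """Count history words and word pairs, with sentence start/end markers."""
    histories, pairs, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        histories.update(tokens[:-1])          # every token except </s> starts a pair
        pairs.update(zip(tokens, tokens[1:]))
    return histories, pairs, vocab


def log_prob(sentence: str, model) -> float:
    """Log probability under P(w2 | w1) = (c(w1, w2) + 1) / (c(w1) + V)."""
    histories, pairs, vocab = model
    v = len(vocab)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((pairs[(w1, w2)] + 1) / (histories[w1] + v))
               for w1, w2 in zip(tokens, tokens[1:]))


# Hypothetical command corpus.
CORPUS = ["open notepad", "close notepad", "open calculator",
          "close window", "open my computer"]
model = train_bigram(CORPUS)

lp_good = log_prob("open notepad", model)
lp_bad = log_prob("notepad open", model)
print(lp_good > lp_bad)  # True: the model prefers the word order it has seen
```

In a recognizer, such scores are combined with acoustic scores, so acoustically similar hypotheses that form unlikely word sequences are pushed down the ranking.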
Confidence Measures:
Most speech recognition systems assign scores to hypotheses for the purpose of
rank ordering them. These scores do not provide a good indication of whether a
hypothesis is correct or not, just that it is better than the other hypotheses. As we
move to tasks that require actions, we need better methods to evaluate the
absolute correctness of hypotheses.
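One common way to turn relative scores into an approximate absolute measure is to normalize the N-best scores into posterior probabilities and to reject the top hypothesis when its posterior falls below a threshold. The sketch below assumes log-domain hypothesis scores; the hypotheses, the scores and the 0.7 threshold are illustrative, and N-best posteriors only approximate the true probability of correctness.

```python
import math


def nbest_posteriors(log_scores: dict[str, float]) -> dict[str, float]:
    """Turn N-best log scores into posteriors that sum to 1 (numerically stable softmax)."""
    top = max(log_scores.values())
    weights = {h: math.exp(s - top) for h, s in log_scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def accept(log_scores: dict[str, float], threshold: float = 0.7):
    """Return (best hypothesis, or None if rejected, together with its posterior)."""
    post = nbest_posteriors(log_scores)
    best = max(post, key=post.get)
    return (best if post[best] >= threshold else None), post[best]


# Same ranking in both cases, very different confidence.
CLEAR = {"open notepad": -100.0, "open word pad": -105.0, "close notepad": -107.0}
AMBIGUOUS = {"open notepad": -100.0, "open word pad": -100.5, "close notepad": -101.0}

print(accept(CLEAR))      # accepted, posterior about 0.99
print(accept(AMBIGUOUS))  # rejected (None), posterior about 0.51
```

Both score lists rank "open notepad" first, which is exactly the limitation described above: rank alone cannot tell the confident case from the ambiguous one.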
Out-of-Vocabulary Words:
Systems are designed for use with a particular set of words, but system users may
not know exactly which words are in the system vocabulary. This leads to a
certain percentage of out-of-vocabulary words in natural conditions. Systems
must have some method of detecting such out-of-vocabulary words, or they will
end up mapping a word from the vocabulary onto the unknown word, causing an
error.
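A simple way to see the difference between forcing a match and detecting an unknown word is to compare a decoded phoneme sequence with each vocabulary entry using an edit distance (a dynamic-programming alignment) and to refuse to match when even the closest entry is too far away. The phoneme transcriptions, the vocabulary and the max_distance threshold below are hypothetical; practical systems more often rely on filler (garbage) models or confidence scores for this.

```python
from collections.abc import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def match_command(phones: Sequence[str], vocabulary: dict[str, list[str]],
                  max_distance: int = 1):
    """Closest vocabulary word, or None if the input looks out-of-vocabulary."""
    best = min(vocabulary, key=lambda w: edit_distance(phones, vocabulary[w]))
    if edit_distance(phones, vocabulary[best]) > max_distance:
        return None  # reject instead of forcing the nearest vocabulary word
    return best


# Hypothetical vocabulary with ARPAbet-style phoneme transcriptions.
VOCAB = {
    "open": ["OW", "P", "AH", "N"],
    "close": ["K", "L", "OW", "Z"],
    "notepad": ["N", "OW", "T", "P", "AE", "D"],
}

print(match_command(["OW", "P", "EH", "N"], VOCAB))              # open (one phoneme off)
print(match_command(["B", "AH", "N", "AE", "N", "AH"], VOCAB))  # None ("banana" is unknown)
```

Without the distance check, "banana" would silently be mapped onto whichever command happened to be closest, which is the error described above.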
Spontaneous Speech:
Systems that are deployed for real use must deal with a variety of spontaneous
speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical
constructions and other common behaviors not found in read speech.
Development on the ATIS (Air Travel Information System) task has resulted in progress in this area, but much work remains to be done.
Prosody:
Prosody refers to acoustic structure that extends over several segments or words.
Stress, intonation, and rhythm convey important information for word
recognition and the user's intentions (e.g., sarcasm, anger). Current systems do
not capture prosodic structure. How to integrate prosodic information into the
recognition architecture is a critical question that has not yet been answered.
Modeling Dynamics:
Systems assume a sequence of input frames which are treated as if they were
independent. But it is known that perceptual cues for words and phonemes
require the integration of features that reflect the movements of the articulators,
which are dynamic in nature. How to model dynamics and incorporate this
information into recognition systems is an unsolved problem.
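A widely used, partial answer is to append delta (time-derivative) features, computed by linear regression over neighboring frames, so that each frame carries some information about how the features are changing. The sketch below uses the regression formula d_t = sum over n of n(c_{t+n} - c_{t-n}), divided by 2 times the sum of n squared, with edge frames repeated. The window size of 2 is a conventional but arbitrary choice, the frames are assumed to be lists of floats, and deltas do not fully solve the modeling problem described above.

```python
def delta_features(frames: list[list[float]], window: int = 2) -> list[list[float]]:
    """Regression-based delta coefficients for a sequence of feature frames.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated beyond the edges."""
    if not frames:
        return []
    last = len(frames) - 1
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def at(t: int) -> list[float]:
        return frames[min(max(t, 0), last)]  # replicate edge frames

    return [
        [sum(n * (at(t + n)[k] - at(t - n)[k]) for n in range(1, window + 1)) / denom
         for k in range(dim)]
        for t in range(len(frames))
    ]


ramp = [[float(t)] for t in range(6)]  # one feature rising by 1.0 per frame
print(delta_features(ramp))            # about 1.0 in the middle, smaller at the edges
```

The deltas are usually appended to the original coefficients, and the same function can be applied to the deltas to obtain acceleration (delta-delta) features.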
6. REFERENCES:
[1] B. Ben Mosbah, "Speech Recognition for Disabilities People," Information and Communication Technologies, 2006 (ICTTA '06), 2nd, vol. 1, 24-28 April 2006, pp. 864-869.
[2] XiaoJie Yuan, Jing Fan, "Design and Implementation of Voice Controlled Tetris Game Based on Microsoft SDK," IEEE 2011, 978-1-61284-774-0/11.
[3] Mukund Padmanabhan, Michael Picheny, "Large vocabulary speech recognition algorithms," IEEE Computer magazine, 0018-9162/02, pp. 42-50, 2002.
[4] Fengyu Zhou, Guohui Tian, Yang Yang, Hairong Xiao, Jingshuai Chen, "Research and Implementation of Voice Interaction System Based On PC in Intelligent Space," Proceedings of the 2010 IEEE International Conference on Automation and Logistics, August 16-20, 2010, Hong Kong and Macau, 978-1-4244-8376-1/10, IEEE 2010.
[5] Md. Abdul Kader, Biswajit Singha, Md. Nazrul Islam, "Speech Enabled Operating System Control," Proceedings of the 11th International Conference on Computer and Information Technology (ICCIT 2008), 25-27 December 2008, Khulna, Bangladesh, 1-4244-2136-7/08, IEEE 2008.
[6] D. LeBlanc, Y. Ben Ahmed, S. Selouani, Y. Bouslimani, H. Hamam, "Computer Interface by Gesture and Voice for Users with Special Needs," 1-4244-0674-9/06, IEEE 2006.
[7] Mu-Chun Su, Mina-Tsang Chung, "Voice-controlled human computer interface for the disabled," Computing & Control Engineering Journal, October 2001.
[8] Douglas O'Shaughnessy, "Interacting With Computers by Voice: Automatic Speech Recognition and Synthesis," Proceedings of the IEEE, vol. 91, no. 9, September 2003, 0018-9219/03, IEEE 2003.
[9] Omar Florez-Choque, Ernesto Cuadros-Vargas, "Improving Human Computer Interaction Through Spoken Natural Language," 1-4244-0707-9/07, IEEE 2007.
[10] Gerhard Rigoll, "Baseform Adaptation for Large Vocabulary Hidden Markov Model Based Speech Recognition Systems," CH2847-2/90/0000-0141, IEEE 1990.
[11] http://www.johndavies.notts.sch.uk/children/documents/44PhonemesVoiced.ppt
[12] http://www.microsoft.com/speech/download/sdk51/
[13] Titus Felix Furtuna, "Dynamic Programming Algorithms in Speech Recognition," Revista Informatica Economica, nr. 2(46)/2008.