Report: Voice Recognition

7/31/2019 Report: Voice Recognition

LIST OF FIGURES:



LIST OF TABLES:



ABSTRACT

Our aim is to give the computer a natural interface, including the ability to understand human speech. To that end, we propose a way to operate a computer system, specifically Windows 8, by voice command. The user first speaks a command into the microphone; the software of the proposed system then attempts to recognize it. If the utterance matches one of the defined voice commands, the system performs the corresponding operation. The proposed system uses the Microsoft Speech SDK for the voice recognition process and Voice-XML to define the voice grammar in the software part. It is flexible enough to work with the speech of any user.

Keywords: Dynamic Programming Algorithm, Hidden Markov Model, Microphone, Microsoft Speech SDK, Phonemes, Speech recognition, Voice-XML, Windows 8.


1. INTRODUCTION:

In recent years the quality and performance of speech-based human-machine interaction have improved steadily. The next generation of speech-based interface technology will enable easy-to-use automation of new and existing communication services, making human-machine interaction more natural. For disabled people, the absence of databases and the diversity of articulatory handicaps are major obstacles to building reliable speech recognition systems, which explains why so few speech recognition systems for disabled people are on the market. If a person finds it difficult or impossible to use the mouse and keyboard, or if the keyboard or mouse is faulty, there must be other ways to operate the operating system. Speech can be one of them. There is a growing demand for systems that can operate the operating system using only the voice commands given by a person. This paper presents a way to control the OS by voice command. It can also be useful for surgeons who need to retrieve a patient's previous records from the computer's database while operating. It is also applicable to consumer electronics, including games, mobile phones, vehicle navigation, and spoken ticket reservations. Because Windows 8 is about to be released, we are creating speech control for Windows 8.

Speech recognition (in many contexts also known as automatic speech

recognition, computer speech recognition or erroneously as voice recognition) is

the process of converting a speech signal to a sequence of words, by means of an

algorithm implemented as a computer program.

Speech recognition applications that have emerged over the last few years

include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a

collect call"), simple data entry (e.g., entering a credit card number), preparation

of structured documents (e.g., a radiology report), domotic appliance control, and content-based spoken audio search (e.g., finding a podcast where particular

words were spoken).

Voice recognition or speaker recognition is a related process that attempts to

identify the person speaking, as opposed to what is being said.

PARAMETERS        RANGE

Speaking mode     Isolated words to continuous speech
Speaking style    Read speech to spontaneous speech
Enrollment        Speaker-dependent to speaker-independent
Vocabulary        Small (< 20 words) to large (> 20,000 words)
Language model    Finite-state to context-sensitive
Perplexity        Small (< 10) to large (> 100)
SNR               High (> 30 dB) to low (< 10 dB)
Transducer        Noise-cancelling microphone to telephone

Table 1.1: Typical parameters used to characterize the capability of speech recognition systems

Speech recognition is a difficult problem, largely because of the many sources of

variability associated with the signal. First, the acoustic realizations of phonemes,

the smallest sound units of which words are composed, are highly dependent on

the context in which they appear. These phonetic variabilities are exemplified by

the acoustic differences of the phoneme /t/ in "two", "true", and "butter" in American

English. At word boundaries, contextual variations can be quite dramatic, making

"gas shortage" sound like "gash shortage" in American English, and "devo andare"

sound like "devandare" in Italian.

Second, acoustic variabilities can result from changes in the environment as well

as in the position and characteristics of the transducer. Third, within-speaker

variabilities can result from changes in the speaker's physical and emotional state,

speaking rate, or voice quality. Finally, differences in sociolinguistic background,

dialect, and vocal tract size and shape can contribute to cross-speaker

variabilities.

Figure 1.1 shows the major components of a typical speech recognition system. The

digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10 to 20 msec. These measurements

are then used to search for the most likely word candidate, making use of

constraints imposed by the acoustic, lexical, and language models. Throughout

this process, training data are used to determine the values of the model

parameters.


Figure 1.1: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described

above in several ways. At the level of signal representation, researchers have

developed representations that emphasize perceptually important speaker-

independent features of the signal, and de-emphasize speaker-dependent

characteristics. At the acoustic phonetic level, speaker variability is typically

modeled using statistical techniques applied to large amounts of data. Speaker

adaptation algorithms have also been developed that adapt speaker-independent

acoustic models to those of the current speaker during system use. Effects of

linguistic context at the acoustic phonetic level are typically handled by training

separate models for phonemes in different contexts; this is called context

dependent acoustic modeling.

Word level variability can be handled by allowing alternate pronunciations of

words in representations known as pronunciation networks. Common alternate

pronunciations of words, as well as effects of dialect and accent are handled by

allowing search algorithms to find alternate paths of phonemes through these

networks. Statistical language models, based on estimates of the frequency of

occurrence of word sequences, are often used to guide the search through the

most probable sequence of words.

The dominant recognition paradigm of the past fifteen years is the hidden Markov model (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame surface acoustic realizations are both represented probabilistically as Markov processes. Neural networks have also been used to estimate the frame-based scores; these scores are then integrated into HMM-based system architectures, in what have come to be known as hybrid systems.

An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2. LITERATURE SURVEY:

HIDDEN MARKOV MODEL (HMM)-BASED SPEECH RECOGNITION:

Modern general-purpose speech recognition systems are generally based on hidden Markov models (HMMs). An HMM is a statistical model that outputs a sequence of symbols or quantities. One reason HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary or short-time stationary signal: over a short interval, on the order of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be modeled as a Markov process that moves through a sequence of such short stationary segments, known as states.

Another reason HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In the simplest possible setup for speech recognition, the hidden Markov model outputs a sequence of n-dimensional real-valued vectors (with n around 13), one every 10 milliseconds. The vectors, again in the simplest case, consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short-time window of speech, decorrelating the spectrum using a cosine transform, and then keeping the first (most significant) coefficients. Each state of the hidden Markov model typically has a statistical distribution, a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word or, for more general speech recognition systems, each phoneme has a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individually trained hidden Markov models for the separate words and phonemes.
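As a rough illustration of the feature extraction described above, the following Python sketch computes a small cepstral vector for one frame: a window, a Fourier transform, a log-magnitude spectrum, and a cosine transform whose first 13 terms are kept. It is a simplified sketch, not the front end of any particular recognizer: the function names, the 8 kHz sample rate, and the synthetic 440 Hz frame are assumptions made for the example, and the mel filterbank that practical systems usually apply before the logarithm is omitted.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum of a real frame via a naive O(n^2) DFT (bins 0..n/2)."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes


def cepstral_coefficients(frame, num_coeffs=13, floor=1e-10):
    """Log-magnitude spectrum followed by a DCT-II; keeps the first num_coeffs terms."""
    n = len(frame)
    # A Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    # The floor avoids log(0) for empty spectral bins.
    log_spec = [math.log(max(mag, floor)) for mag in dft_magnitudes(windowed)]
    bins = len(log_spec)
    # The cosine transform decorrelates the log spectrum.
    return [
        sum(log_spec[i] * math.cos(math.pi * c * (i + 0.5) / bins) for i in range(bins))
        for c in range(num_coeffs)
    ]


# Assumed input: one 10 ms frame at 8 kHz (80 samples) of a synthetic 440 Hz tone.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(80)]
features = cepstral_coefficients(frame)
```

Each frame of speech would yield one such 13-dimensional vector, and the sequence of vectors is what the HMM states score.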

Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques to improve on this basic approach. A typical large-vocabulary system would need context dependency for the phones (so that phones with different left and right contexts have different realizations as HMM states). It would use cepstral normalization to normalize for different speakers and recording conditions. For further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics, and in addition might use heteroscedastic linear discriminant analysis (HLDA); alternatively, the system might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection, followed perhaps by heteroscedastic linear discriminant analysis or a global semitied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques, which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE), and minimum phone error (MPE).

Decoding of the speech (the term for what happens when the system is presented

with a new utterance and must compute the most likely source sentence) would

probably use the Viterbi algorithm to find the best path, and here there is a

choice between dynamically creating a combination hidden Markov model which

includes both the acoustic and language model information, or combining it

statically beforehand (the finite state transducer, or FST, approach).
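The best-path search mentioned above can be sketched with a small Viterbi decoder. This is a minimal illustration for a discrete-observation HMM, worked in log probabilities to avoid underflow. The two states ("silence", "speech"), the three observation symbols, and all probabilities are made-up values for the example and do not come from any trained model.

```python
import math


def to_logs(table):
    """Convert a dict (or dict of dicts) of probabilities to log probabilities."""
    return {
        key: to_logs(value) if isinstance(value, dict) else math.log(value)
        for key, value in table.items()
    }


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for the observation sequence."""
    # best[t][s]: best log probability of any state path that ends in s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: is a frame silence or speech, given its energy level?
STATES = ["silence", "speech"]
START = to_logs({"silence": 0.6, "speech": 0.4})
TRANS = to_logs({
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
})
EMIT = to_logs({
    "silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
})

log_prob, path = viterbi(["low", "mid", "high"], STATES, START, TRANS, EMIT)
```

In a real recognizer the states would be context-dependent phone states and the emission scores would come from the Gaussian mixtures described earlier, but the recursion has the same shape.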

NEURAL NETWORK-BASED SPEECH RECOGNITION:

Another approach in acoustic modeling is the use of neural networks. They are

capable of solving much more complicated recognition tasks, but do not scale as

well as HMMs when it comes to large vocabularies. Rather than being used in

general-purpose speech recognition applications they can handle low quality,

noisy data and speaker independence. Such systems can achieve greater accuracy

than HMM based systems, as long as there is training data and the vocabulary is

limited. A more general approach using neural networks is phoneme recognition.

This is an active field of research, but generally the results are better than for

HMMs. There are also NN-HMM hybrid systems that use the neural network part

for phoneme recognition and the hidden Markov model part for language

modeling.

DYNAMIC TIME WARPING (DTW)-BASED SPEECH RECOGNITION:

Dynamic time warping is an approach that was historically used for speech

recognition but has now largely been displaced by the more successful HMM-based

approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance,

similarities in walking patterns would be detected, even if in one video the person

was walking slowly and if in another they were walking more quickly, or even if

there were accelerations and decelerations during the course of one observation.

DTW has been applied to video, audio, and graphics -- indeed, any data which can

be turned into a linear representation can be analyzed with DTW.

A well-known application has been automatic speech recognition, to cope with

different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with

certain restrictions, i.e. the sequences are "warped" non-linearly to match each

other. This sequence alignment method is often used in the context of hidden

Markov models.

LIMITATIONS:

A natural voice recognition system faces the major drawbacks of spontaneous speech, namely hesitations and out-of-vocabulary words. An efficient dialogue design can greatly improve the performance of the voice interface. Users should be trained in how to pronounce the commands so as to get accurate results. This software will be a boon to people who are physically disabled and unable to use a mouse and keyboard as external input devices. If the mouse and keyboard ports do not work properly, the operating system can still be operated using this software. It may also save the cost of purchasing a mouse and keyboard.

The Speech Recognition Engine may pick up unwanted signals, i.e. noise in the environment, that are not part of the command. Because of such unwanted signals, a command is sometimes not recognized properly and is executed incorrectly. This may be the main limitation of this software.

3. PROJECT STATEMENT:

Fig 3.1:

As shown, the user gives voice commands through the microphone. The microphone converts each command into electrical pulses, and the sound card converts these pulses into a digital signal. The Speech Recognition Engine then converts the digital signal into phonemes, from which we finally obtain the text command. The corresponding operation is then performed. This procedure repeats for every voice command.

Modules:

1. Phonemes Extraction
2. HMM
3. SAPI
4. XML Database
5. Action Applier

1) Phonemes Extraction:

Phonemes are linguistic units. They are the sounds that group together to form our words, although how a phoneme converts into sound depends on many factors, including the surrounding phonemes and the speaker's accent and age. English uses about 44 phonemes to convey the 500,000 or so words it contains, making them a relatively good data item for speech recognition engines to work with. These phonemes are extracted by the Microsoft Speech SDK.

SOME EXAMPLES OF PHONEMES USED IN WORDS

From the extracted phonemes we get the command in text format.

2) USE OF HMM (HIDDEN MARKOV MODEL):

Now we have a list of phonemes extracted from the given input command. These

phonemes need to be combined and converted into words. The most common

method is to use a Hidden Markov Model (HMM). A Markov Model (in a speech

recognition context) is basically a chain of phonemes that represent a word. The

chain can branch, and if it does, is statistically balanced. HMMs function as

probabilistic finite state machines: The model consists of a set of states, and its

topology specifies the allowed transitions between them. At every time frame, an

HMM makes a probabilistic transition from one state to another and emits a

feature vector with each transition.



The use of Hidden Markov Models (HMM) may improve the accuracy to recognize

words in view of the fact that HMM takes into account the probabilities of

transition among phonemes.
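To make the role of the transition probabilities concrete, the sketch below scores an observation sequence against a tiny left-to-right word model using the forward algorithm, which sums the probability over all state paths. Everything in it is hypothetical: the two phoneme states standing in for the word "two", the discrete symbols "burst" and "vowel" standing in for feature vectors, and the probability values.

```python
def forward_likelihood(observations, states, start, trans, emit):
    """Total probability of the observations under the HMM (forward algorithm)."""
    # alpha[s]: probability of the observations so far, ending in state s.
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit[s][obs] * sum(alpha[p] * trans[p].get(s, 0.0) for p in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical left-to-right model for "two": /t/ then /uw/, no way back to /t/.
STATES = ["t", "uw"]
START = {"t": 1.0, "uw": 0.0}
TRANS = {"t": {"t": 0.5, "uw": 0.5}, "uw": {"uw": 1.0}}
EMIT = {
    "t": {"burst": 0.8, "vowel": 0.2},
    "uw": {"burst": 0.1, "vowel": 0.9},
}

matching = forward_likelihood(["burst", "vowel", "vowel"], STATES, START, TRANS, EMIT)
mismatched = forward_likelihood(["vowel", "burst", "burst"], STATES, START, TRANS, EMIT)
```

A sequence that follows the chain in order (a burst followed by vowel frames) receives a much higher likelihood than one that does not, which is how a recognizer can rank competing word models.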

Fig 3.2: Example of HMM

3) SAPI (SPEECH APPLICATION PROGRAMMING INTERFACE):

SAPI is an interface between our application platform and the Microsoft Speech Engine. It passes the word formed by the HMM to our programming platform, where it is compared with the Voice-XML database. The speech recognition engine used by this voice-controlled system is Microsoft's speech recognition engine with its associated development kit (Microsoft Speech SDK 5.1). The recognition rate of Microsoft's speech recognition engine is not high in continuous speech mode but is very high in command-and-control mode. We use SAPI to implement the voice functions. SAPI provides a high-level interface between applications and the speech engine. Controlling and managing various speech engines requires real-time operation technology; SAPI implements this and hides the underlying technical detail.

There are two basic types of SAPI engines: text-to-speech (TTS) systems and speech recognizers. TTS systems synthesize text strings and files into spoken audio using synthetic voices, whereas speech recognizers convert human spoken audio into readable text strings and files. The speech engine communicates with SAPI through the device driver interface (DDI) layer, and SAPI communicates with applications through the API. Using these application interfaces, voice recognition and speech synthesis software can be developed.

Dynamic Programming Algorithm:

In this type of speech recognition technique the input voice data is converted into commands. The recognition process consists of matching the incoming speech against stored commands. The command with the lowest distance measure from the input pattern is the recognized word. The best match (lowest distance measure) is found using dynamic programming. This is called a Dynamic Time Warping (DTW) word recognizer.

Two important concepts in DTW are,

a) Features: The information in each signal has to be represented in some

manner.

b) Distances: Some form of metric has to be used in order to obtain a match path.

There are two types:

Local: A computational difference between a feature of one signal and a feature

of the other.

Global: The overall computational difference between an entire signal and

another signal of possibly different length.

Speech is a time-dependent process. Utterances of the same word will therefore have different durations, and utterances of the same word with the same duration will still differ in the middle, because different parts of the word are spoken at different rates. To obtain a global distance between two speech patterns, a time versus time comparison must be performed using a "time-time" matrix. As an illustration, consider the input SsPEEhH, which is a 'noisy' version of the reference word SPEECH. The time-time matrix for this illustration is as follows:

Fig 3.3: Time-Time matrix

If D(i, j) is the global distance up to (i, j) and d(i, j) is the local distance at (i, j), then

D(i, j) = \min[D(i-1, j-1), D(i-1, j), D(i, j-1)] + d(i, j)    (1)

where d(i, j) is calculated using the Euclidean distance metric between the two feature vectors x and y being compared:

d(x, y) = \left( \sum_{j} (x_j - y_j)^2 \right)^{1/2}    (2)


The initial condition is D(1, 1) = d(1, 1).

The final global distance D(n, N) is calculated recursively, with the initial condition as the terminating condition. D(n, N) gives the overall matching score of the reference word against the input. The input word is then recognized as the word corresponding to the reference command in the database with the lowest matching score.

This algorithm has polynomial complexity, O(n^2 v), where n is the sequence length and v is the total number of commands in our dictionary.
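The recurrence and the recognition rule above can be sketched in Python as follows. To keep the example small it reuses the letter illustration (input SsPEEhH) and takes the local distance to be 0 for matching letters and 1 otherwise; that letter distance, the function names, and the three-word vocabulary are assumptions for the example, whereas a real recognizer would apply the Euclidean distance to feature vectors.

```python
def dtw_distance(source, reference, local_distance):
    """Global DTW distance D(n, N) between two sequences."""
    n, m = len(source), len(reference)
    inf = float("inf")
    # D has a padding row and column so that D[1][1] reduces to d(1, 1).
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_distance(source[i - 1], reference[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def letter_distance(a, b):
    """Stand-in local distance: 0 if the letters match (ignoring case), else 1."""
    return 0.0 if a.lower() == b.lower() else 1.0


def recognize(source, vocabulary, local_distance):
    """Return the reference command with the lowest global distance."""
    return min(vocabulary, key=lambda word: dtw_distance(source, word, local_distance))


# Hypothetical command vocabulary for the illustration.
VOCABULARY = ["SPEECH", "SPEAK", "SEARCH"]
best = recognize("SsPEEhH", VOCABULARY, letter_distance)
score = dtw_distance("SsPEEhH", "SPEECH", letter_distance)
```

Each call to `dtw_distance` fills an n by N table, and `recognize` repeats that for each of the v commands, which matches the O(n^2 v) cost stated above.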

XML Database

The grammar of the commands used in our paper is stored in an XML file, referred to as Voice-XML, which serves as our database. The reference word that our algorithm compares with the input word is taken from this Voice-XML database. When the input command matches the stored grammar, the operation associated with that command is executed.
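A minimal sketch of such a lookup is shown below, using Python's standard `xml.etree.ElementTree` parser. The XML layout (a `commands` root with `command` elements carrying `phrase`, `action`, and `target` attributes) and the sample commands are invented for this illustration; they are not the project's actual grammar file and do not follow the W3C VoiceXML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar file contents; the element and attribute names are assumptions.
GRAMMAR_XML = """
<commands>
  <command phrase="open notepad" action="launch" target="notepad.exe"/>
  <command phrase="close window" action="close" target="active-window"/>
  <command phrase="stop" action="stop" target=""/>
</commands>
"""


def normalize(text):
    """Lower-case the text and collapse whitespace so that matching is forgiving."""
    return " ".join(text.lower().split())


def load_grammar(xml_text):
    """Map each normalized command phrase to its (action, target) pair."""
    root = ET.fromstring(xml_text.strip())
    return {
        normalize(cmd.get("phrase")): (cmd.get("action"), cmd.get("target"))
        for cmd in root.iter("command")
    }


def match_command(grammar, recognized_text):
    """Return the (action, target) for the recognized text, or None if not in the grammar."""
    return grammar.get(normalize(recognized_text))


grammar = load_grammar(GRAMMAR_XML)
result = match_command(grammar, "  Open   Notepad ")
unknown = match_command(grammar, "shut down")
```

Returning `None` for unmatched text lets the caller ignore utterances that are not in the grammar instead of executing a wrong operation.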



4. SYSTEM REQUIREMENT AND SPECIFICATION:

UML DIAGRAM

DFD

DFD Level 0:

DFD Level 1:

DFD Level 2:

[DFD labels: System, Speech Recognition, Text Command]

CONTROL FLOW DIAGRAM

CLASS DIAGRAM


ACTIVITY DIAGRAM

[Activity diagram steps: Start; initialize speech engine; receive sound; send sound; receive speech engine; call HMM; compare text command; compare XML; command found (yes: perform action command); stop command; Stop]

COMPONENT DIAGRAM

DEPLOYMENT DIAGRAM

[Deployment diagram: Computer (deployment spec: Windows 8; deployment spec: .NET 4.0); device: microphone; Speech Recognition Engine; Text Commander]

HARDWARE AND SOFTWARE REQUIREMENT:

Hardware Requirements:

System : Pentium IV 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 1.44 MB
Monitor : 15 VGA Colour
Mouse : Logitech
RAM : 512 MB

Software Requirements:

Operating system : Windows 8
Coding Language : C#, .NET (Visual Studio 2012)
Data Base : MS SQL Server 2008

5. PLANNING AND SCHEDULING THE PROJECT WORK:

SOFTWARE ENGINEERING APPROACH:

Fig 5.1: Incremental Model



Incremental Development and Release:

Developing systems through incremental release requires first providing essential

operating functions, then providing system users with improved and more

capable versions of a system at regular intervals. This model combines the classic

software life cycle with iterative enhancement at the level of system development

organization. It also supports a strategy to periodically distribute software

maintenance updates and services to dispersed user communities. This in turn

accommodates the provision of standard software maintenance contracts. It is

therefore a popular model of software evolution used by many commercial

software firms and system vendors. This approach has also been extended

through the use of software prototyping tools and techniques, which more

directly provide support for incremental development and iterative release for

early and ongoing user feedback and evaluation. Figure 5.1 provides an example view of an incremental development, build, and release model for engineering large Ada-based software systems. In this model, software functions and/or subsystems (developed through stepwise refinement) are released incrementally to separate in-house quality assurance teams, which apply statistical measures and analyses as the basis for certifying high-quality software systems.

REQUIREMENT ANALYSIS:

NORMAL REQUIREMENTS:

1. User interfaces:
In our system we provide a GUI on both the server and the client side. Users of the system can communicate over the LAN and use the GUI available to them to execute commands.

2. Hardware interfaces:
The system has the following hardware interface:
MICROPHONE

3. Software interfaces:
The system has the following software interface:
MICROPHONE drivers, needed in order to install the MICROPHONE.

4. Communication interfaces:
The system requires the following communication interface:
MICROPHONE: in order to communicate with the MICROPHONE.

Expected Requirements:

1. Performance Requirements:
The microphone used to capture speech for recognition should be noise free and should have sufficient bandwidth.
The switch needs to have exactly the same number of clients as mentioned in the system.

2. Safety Requirements:
To keep this system safe, care should be taken to avoid the theft of components of the system.
The input voltage for the MICROPHONE should not exceed the standards that apply to it.

3. Security Requirements:
The server need not share any drives for networking, thus avoiding data theft.

4. Software Quality Attributes:
The interface to the MICROPHONE should be kept flexible.

NON FUNCTIONAL REQUIREMENT STUDY

The nonfunctional requirements of the project are analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the nonfunctional requirement study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For nonfunctional requirement analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the nonfunctional requirement analysis are:

ECONOMICAL NON FUNCTIONAL REQUIREMENT
TECHNICAL NON FUNCTIONAL REQUIREMENT
SOCIAL NON FUNCTIONAL REQUIREMENT

ECONOMICAL NON FUNCTIONAL REQUIREMENT

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system was well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

TECHNICAL NON FUNCTIONAL REQUIREMENT

This study is carried out to check the technical nonfunctional requirement, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, since that would in turn place high demands on the client. The developed system must have modest requirements, as only minimal or no changes are required to implement it.

SOCIAL NON FUNCTIONAL REQUIREMENT

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by users depends solely on the methods employed to educate them about the system and to make them familiar with it. Their confidence must be raised so that they are also able to offer constructive criticism, which is welcomed, as they are the final users of the system.

Exciting Requirements:

1. The user should not enter a message or question that is not appropriate to the closed-domain FAQ.

2. The system should respond to each SMS in an appropriate manner by using a template matching algorithm.

REQUIREMENT VALIDATION

1. Organization and Completeness

1. Are all internal cross-references to other requirements correct? Yes
2. Are all requirements written at a consistent and appropriate level of detail? Yes
3. Do the requirements provide an adequate basis for design? Yes
4. Is the implementation priority of each requirement included? No
5. Are all external hardware, software, and communication interfaces defined? Yes
6. Have algorithms intrinsic to the functional requirements been defined? Yes
7. Does the SRS include all of the known customer or system needs? Yes
8. Is any necessary information missing from a requirement? If so, is it identified as TBD? No
9. Is the expected behavior documented for all anticipated error conditions? Yes

2. Correctness

1. Do any requirements conflict with or duplicate other requirements? No
2. Is each requirement written in clear, concise, unambiguous language? No
3. Is each requirement verifiable by testing, demonstration, review, or analysis? Yes
4. Is each requirement in scope for the project? Yes
5. Is each requirement free from content and grammatical errors? Yes
6. Can all of the requirements be implemented within known constraints? Yes
7. Are any specified error messages unique and meaningful? No

3. Quality Attributes

1. Are all performance objectives properly specified? Yes
2. Are all security and safety considerations properly specified? Yes
3. Are other pertinent quality attribute goals explicitly documented and quantified, with the acceptable tradeoffs specified? Yes

4. Traceability

1. Is each requirement uniquely and correctly identified? Yes
2. Can each software functional requirement be traced to a higher-level requirement (e.g., system requirement, use case)? Yes

5. Special Issues

1. Are all requirements actually requirements, not design or implementation solutions? Yes
2. Are the time-critical functions identified, and timing criteria specified for them? Yes
3. Are all significant consumers of scarce resources (memory, network bandwidth, processor capacity, etc.) identified, and is their anticipated resource consumption specified? Yes
4. Have internationalization issues been adequately addressed? No

SYSTEM IMPLEMENTATION PLAN:

1. EFFORT ESTIMATE TABLE:

Task                                                          Effort    Deliverables              Milestones

Analysis of existing systems & comparison with proposed one   4 weeks
Literature survey                                             1 week
Designing & planning                                          2 weeks
  o System flow                                               1 week
  o Designing modules & its deliverables                      2 weeks   Modules design document
Implementation                                                7 weeks   Primary system
Testing                                                       4 weeks   Test Reports              Formal
Documentation                                                 2 weeks   Complete project report   Formal

Table 5.1: Effort Estimate Table

2. PHASE DESCRIPTION:

Phase     Task                Description

Phase 1   Analysis            Analyze the information given in the IEEE paper.
Phase 2   Literature survey   Collect raw data and elaborate on literature surveys.
Phase 3   Design              Assign the modules and design the process flow control.
Phase 4   Implementation      Implement the code for all the modules and integrate all the modules.
Phase 5   Testing             Test the code and the overall process to check whether the process works properly.
Phase 6   Documentation       Prepare the document for this project with conclusion and future enhancement.

Table 5.2: Phase Description

3. PROJECT PLAN

Phase     Jun/11  Jul/11  Aug/11  Sep/11  Oct/11  Nov/11  Dec/11  Jan/12  Feb/12  Mar/12

Phase 1
Phase 2
Phase 3
Phase 4
Phase 5
Phase 6

Table 5.3: Project Plan

4. ESTIMATION OF KLOC:

The number of lines required for implementation of various modules can

be estimated as follows:

Sr.No. Modules KLOC

1. Graphical User Interface 0.50

2. User authentication Code 0.20

3. Database Code 0.60

4. Web Design Code 0.50

5. Device Drivers 0.40

6. Interfacing Code 0.20

Table 5.4: Estimation of KLOC

Thus the total number of lines required is approximately 2.40 KLOC.



Efforts:

E = 3.2 * (KLOC)^1.02

E = 3.2 * (2.40)^1.02

E = 7.82 person-months

Development Time (In Months):

D = E / N

D = 7.82 / 3

D = 2.61 months

Number of Persons:

4 persons are required to complete the project successfully within the given time span.
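The arithmetic of this estimate can be checked with a few lines of Python. The sketch below uses the coefficients given above (3.2 and 1.02), the module sizes from Table 5.4, and N = 3 as in the development-time formula; the function and variable names are chosen only for this illustration.

```python
def effort_person_months(kloc, a=3.2, b=1.02):
    """Effort E = a * KLOC^b, with the coefficients used in this report."""
    return a * kloc ** b


def development_months(effort, team_size):
    """Development time D = E / N."""
    return effort / team_size


# Module sizes in KLOC, as listed in Table 5.4.
MODULE_KLOC = {
    "Graphical User Interface": 0.50,
    "User authentication Code": 0.20,
    "Database Code": 0.60,
    "Web Design Code": 0.50,
    "Device Drivers": 0.40,
    "Interfacing Code": 0.20,
}

total_kloc = sum(MODULE_KLOC.values())
effort = effort_person_months(total_kloc)
months = development_months(effort, team_size=3)

print(f"KLOC = {total_kloc:.2f}, E = {effort:.2f} person-months, D = {months:.2f} months")
```

Changing `team_size` shows how the development time scales with the number of persons assigned.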



FEASIBILITY ASSESSMENT:

What are P, NP-Complete, and NP-Hard? When solving a problem we have to decide its difficulty level. Three complexity classes are commonly used for this:

1) P Class

2) NP-hard Class

3) NP-Complete Class

A decision problem is in P if there is a polynomial-time algorithm for a deterministic machine to get the answer. A decision problem is in NP if there is a polynomial-time algorithm for a non-deterministic machine to get the answer. Problems in P are trivially in NP: the non-deterministic machine just never troubles itself to fork another process, and acts just like a deterministic one.

But there are some problems which are known to be in NP for which no poly-time deterministic algorithm is known; in other words, we know they're in NP, but don't know if they're in P. A problem is NP-complete if you can prove that (1) it's in NP, and (2) a problem already known to be NP-complete is poly-time reducible to it.

A problem is NP-hard if and only if it's at least as hard as an NP-complete problem. The more conventional Traveling Salesman Problem of finding the shortest route is NP-hard, not strictly NP-complete.



For Project:

A: Voice Communication

B: Algorithmic Processing

Time Complexity = Am + Bn ------------ (1)

So the project is feasible and falls under the polynomial-time class (P class).

Explanation:

Processing voice in and out of our system takes some time; let us call that time m. Processing each of the algorithms also takes some time; let us call that time n. From Equation (1) and the definition of the P class, the project is in the P class.

Type of Feasibility

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.



RISK MITIGATION, MONITORING AND MANAGEMENT PLAN

SCOPE AND INTENT OF RMMM ACTIVITIES

The goal of the risk mitigation, monitoring and management plan is to identify as many potential risks as possible. To help determine what the potential risks are, Game Forge will be evaluated using the checklists found in section 6.3 of Roger S. Pressman's Software Engineering: A Practitioner's Approach [SEPA, 4/e]. These checklists help to identify potential risks in a generic sense. The project will then be analyzed to determine any project-specific risks.

When all risks have been identified, they will then be evaluated to determine their probability of occurrence, and how Game Forge will be affected if they do occur. Plans will then be made to avoid each risk, to track each risk to determine if it is more or less likely to occur, and to plan for those risks should they occur. It is the organization's responsibility to perform risk mitigation, monitoring, and management in order to produce a quality product. The quicker the risks can be identified and avoided, the smaller the chances of having to face that particular risk's consequence. The fewer consequences suffered as a result of a good RMMM plan, the better the product and the smoother the development process.

RISK MANAGEMENT ORGANIZATIONAL ROLE

Each member of the organization will undertake risk management. The development team will consistently monitor its progress and project status so as to identify present and future risks as quickly and accurately as possible. That said, the members who are not directly involved with the implementation of the product will also need to keep their eyes open for any possible risks that the development team did not spot. The responsibility of risk management falls on each member of the organization, while William Lord maintains this document.



RISK IDENTIFICATION CHECKLIST:

Product Size Risks

Estimated size in lines of code (LOC): The project will have an estimated _______ lines of code.

Degree of confidence in estimated size: We are highly confident in our estimated size.

Estimated size in number of programs, files, and transactions:
1. We estimate 12 programs.
2. We estimate 10 large files for the engine, 5 large files for the user interface.
3. We estimate 40 or more transactions for the engine, and 20 transactions for the user interface.

Percentage deviation in size from average for previous products: We allow for a 20% deviation from average.

Size of database created or used: The size of the database that we will use will be an estimated 7 tables. The number of fields will vary per table and will have an overall average of 8 fields per table. The number of records in each table will vary with the number of sprites that the user adds to the project, and the number of instances of each sprite that the user creates.

Number of users: The number of users will be fairly high. There will be 5 users per instance of the software running, as the software is client/server or intended for multi-user use.

Number of projected changes to the requirements: We estimate 3 possible projected changes to the requirements. These will be a result of our realization of what is required and not required as we get further into implementation, as well as a result of interaction with the customer and verification of the customer's requirements.

Amount of reuse of software: Reuse will be very important to get the project started. GSM Modem is very simple to reuse (for the most part); previous programs written for the GSM Modem will be reviewed and much GSM Modem code will be copied.



Business Impact Risk

Amount and quality of documentation that must be produced and delivered to customer: The customer will be supplied with a complete online help file and user's manual for Game Forge. Coincidentally, the customer will have access to all development documents for Game Forge, as the customer will also be grading the project.

Governmental constraints in the construction of the product: None known.

Costs associated with late delivery: Late delivery will prevent the customer from issuing a letter of acceptance for the product, which will result in an incomplete grade for the course for all members of the organization.

Costs associated with a defective product: Unknown at this time.

Customer Related Risks

Have you worked with the customer in the past? Yes, all team members have completed at least one project for the customer, though none of them have been of the magnitude of the current project.

Does the customer have a solid idea of what is required? Yes, the customer has access to both the System Requirements Specification and the Software Requirements Specification for the Game Forge project.

Will the customer agree to spend time in formal requirements gathering meetings to identify project scope? Unknown. While the customer will likely participate if asked, the inquiry has not yet been made.

Process Risks

Does senior management support a written policy statement that emphasizes the importance of a standard process for software development? N/A. PA Software does not have a senior management. It should be noted that the structured method has been adopted. At the completion of the project, it will be determined if the software method is acceptable as a standard process, or if changes need to be implemented.

Has your organization developed a written description of the software process to be used on this project? Yes. It is under development using the structured method as described in part three of Roger S. Pressman's Software Engineering: A Practitioner's Approach.

Are staff members willing to use the software process? Yes. The software process was agreed upon before development work began.

Is the software process used for other products? N/A. PA Software has no other projects currently.

Technical Issues

Are facilitated application specification techniques used to aid in communication between the customer and the developer? The development team will hold frequent meetings directly with the customer. No formal meetings are held (all informal). During these meetings the software is discussed and notes are taken for future review.

Are specific methods used for software analysis? Special methods will be used to analyze the software's progress and quality. These are a series of tests and reviews to ensure the software is up to speed. For more information, see the Software Quality Assurance and Software Configuration Management documents.

Do you use a specific method for data and architectural design? Data and architectural design will be mostly object oriented. This allows for a higher degree of data encapsulation and modularity of code.

Technology Risks

Is the technology to be built new to your organization? No.

Does the software interface with new or unproven hardware? No.

Is a specialized user interface demanded by the product requirements? Yes.

Development Environment Risks

Is a software project management tool available? No. No software tools are to be used. Due to the existing deadline, the development team felt it would be more productive to begin implementing the project than trying to learn new software tools. After the completion of the project, software tools may be implemented for future projects.

Risk Table

Risks                                    Category   Probability (%)   Impact
Computer Crash                           TI         70                1
Late Delivery                            BU         30                1
Technology will not Meet Expectations    TE         25                1
End users Resist System                  BU         20                1
Changes in Requirement                   PS         20                2
Lack of Development Experience           TI         20                2
Lack of Database Stability               TI         40                2
Deviation from Software Engi.            PI         10                3
Poor Comments                            TI         20                4

Fig 5.5: Risk Table



Impact Values:

1 Catastrophic

2 Critical

3 Marginal

4 Negligible
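The risk table and impact values can be held in a small data structure and ranked so that the most serious risks are reviewed first. The sketch below is illustrative only: the ordering rule (impact first, then probability) is an assumption and is not stated in the plan, and the names `Risk` and `prioritize` are hypothetical.

```python
from typing import NamedTuple


class Risk(NamedTuple):
    name: str
    category: str
    probability: int  # percent, from the risk table
    impact: int       # 1 = catastrophic ... 4 = negligible


# Rows copied from the risk table.
RISKS = [
    Risk("Computer Crash", "TI", 70, 1),
    Risk("Late Delivery", "BU", 30, 1),
    Risk("Technology will not Meet Expectations", "TE", 25, 1),
    Risk("End users Resist System", "BU", 20, 1),
    Risk("Changes in Requirement", "PS", 20, 2),
    Risk("Lack of Development Experience", "TI", 20, 2),
    Risk("Lack of Database Stability", "TI", 40, 2),
    Risk("Deviation from Software Engi.", "PI", 10, 3),
    Risk("Poor Comments", "TI", 20, 4),
]

IMPACT_LABELS = {1: "Catastrophic", 2: "Critical", 3: "Marginal", 4: "Negligible"}


def prioritize(risks):
    """Sort by impact (most severe first), then by probability (highest first)."""
    return sorted(risks, key=lambda r: (r.impact, -r.probability))


ranked = prioritize(RISKS)
for r in ranked:
    print(f"{r.name:40s} {r.category}  {r.probability:3d}%  {IMPACT_LABELS[r.impact]}")
```

Other ranking rules are equally defensible, for example ranking by probability alone; the sort key is the only line that would change.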

Risk Refinement

At various points in the checklist, lack of software tools is identified as a potential

risk. Due to time constraints, the members of the design team felt that searching

for and learning to use additional software tools could be detrimental to the

project, as it would take time away from project development. For this reason, we

have decided to forgo the use of software tools. It will not be explored as a

potential risk because all planning will be done without considering their use.

STRATEGIES TO MANAGE RISK

Risk Mitigation, Monitoring and Management

RISK: COMPUTER CRASH

Mitigation

The cost associated with a computer crash resulting in a loss of data is crucial. A computer crash itself is not crucial, but rather the loss of data. A loss of data will result in not being able to deliver the product to the customer. This will result in not receiving a letter of acceptance from the customer. Without the letter of acceptance, the group will receive a failing grade for the course. As a result, the organization is taking steps to make multiple backup copies of the software in development and all documentation associated with it, in multiple locations.

Monitoring

When working on the product or documentation, the staff member should always be aware of the stability of the computing environment they are working in. Any changes in the stability of the environment should be recognized and taken seriously.

Management

The lack of a stable computing environment is extremely hazardous to a software development team. In the event that the computing environment is found unstable, the development team should cease work on that system until the environment is made stable again, or should move to a system that is stable and continue working there.

RISK: LATE DELIVERY

Mitigation

The cost associated with a late delivery is critical. A late delivery will result in a late delivery of a letter of acceptance from the customer. Without the letter of acceptance, the group will receive a failing grade for the course. Steps have been taken to ensure a timely delivery by gauging the scope of the project based on the delivery deadline.

Monitoring

A schedule has been established to monitor project status. Falling behind schedule would indicate a potential for late delivery. The schedule will be followed closely during all development stages.

Management

Late delivery would be a catastrophic failure in the project development. If the project cannot be delivered on time, the development team will not pass the course. If it becomes apparent that the project will not be completed on time, the only course of action available would be to request an extension to the deadline from the customer.

RISK: TECHNOLOGY DOES NOT MEET SPECIFICATIONS

Mitigation

In order to prevent this from happening, meetings (formal and informal) will be held with the customer on a routine basis. This ensures that the product we are producing and the specifications of the customer are equivalent.

Monitoring

The meetings with the customer should ensure that the customer and our organization understand each other and the requirements for the product.

Management

Should the development team come to the realization that their idea of the product specifications differs from that of the customer, the customer should be immediately notified and whatever steps are necessary to rectify this problem should be taken. Preferably a meeting should be held between the development team and the customer to discuss this issue at length.

RISK: END USERS RESIST SYSTEM

Mitigation

In order to prevent this from happening, the software will be developed with the end user in mind. The user interface will be designed in a way that makes use of the program convenient and pleasurable.

Monitoring

The software will be developed with the end user in mind. The development team will ask the opinion of various outside sources throughout the development phases. Specifically, the user-interface developer will be sure to get a thorough opinion from others.

Management

Should the program be resisted by the end user, the program will be thoroughly examined to find the reasons that this is so. Specifically, the user interface will be investigated and, if necessary, revamped.

RISK: CHANGES IN REQUIREMENTS

Mitigation

In order to prevent this from happening, meetings (formal and informal) will be held with the customer on a routine basis. This ensures that the product we are producing and the requirements of the customer are equivalent.

Monitoring

The meetings with the customer should ensure that the customer and our organization understand each other and the requirements for the product.

Management

Should the development team come to the realization that their idea of the product requirements differs from that of the customer, the customer should be immediately notified and whatever steps are necessary to rectify this problem should be taken. Preferably a meeting should be held between the development team and the customer to discuss this issue at length.

RISK: LACK OF DEVELOPMENT EXPERIENCE

Mitigation

In order to prevent this from happening, the development team will be required to learn the languages and techniques necessary to develop this software. The member of the team who is the most experienced in a particular facet of the development tools will need to instruct those who are not as well versed.

Monitoring

Each member of the team should watch for areas where another team member may be weak. Also, if one of the members is weak in a particular area, that member should bring it to the attention of the other members.

Management

The members who have the most experience in a particular area will be required to help those who do not, should it come to the attention of the team that a particular member needs help.

RISK: DATABASE IS NOT STABLE



Mitigation

In order to prevent this from happening, developers who are in contact with the database, and/or use functions that interact with the database, should keep in mind the possible errors that could be caused by poor programming or error checking. These issues should be brought to the attention of each of the other members who are also in contact with the database.

Monitoring

Each user should be sure that the database is left in the condition it was in before it was touched, to identify possible problems. The first notice of database errors should be brought to the attention of the other team members.

Management

Should this occur, the organization would call a meeting and discuss the causes of the database instability, along with possible solutions.

RISK: POOR COMMENTS IN CODE

Mitigation

Poor code commenting can be minimized if commenting standards are better expressed. While standards have been discussed informally, no formal standard yet exists. A formal written standard must be established to ensure the quality of comments in all code.

Monitoring

Reviews of code, with special attention given to comments, will determine if they are up to standard. This must be done frequently enough to control comment quality. If reviews are not done, comment quality could drop, resulting in code that is difficult to maintain and update.

Management

Should code comment quality begin to drop, time must be made available to bring comments up to standard. Careful monitoring will minimize the impact of poor commenting. Any problems are resolved by adding and refining comments as necessary.



FUTURE DIRECTIONS:

Robustness:

In a robust system, performance degrades gracefully (rather than

catastrophically) as conditions become more different from those under which it

was trained. Differences in channel characteristics and acoustic environment

should receive particular attention.

Portability:

Portability refers to the goal of rapidly designing, developing and deploying

systems for new applications. At present, systems tend to suffer significant

degradation when moved to a new task. In order to return to peak performance,

they must be trained on examples specific to the new task, which is time-consuming and expensive.

Adaptation:

How can systems continuously adapt to changing conditions (new speakers,

microphone, task, etc.) and improve through use? Such adaptation can occur at

many levels in systems: subword models, word pronunciations, language models,

etc.

Language Modeling:

Current systems use statistical language models to help reduce the search space

and resolve acoustic ambiguity. As vocabulary size grows and other constraints

are relaxed to create more habitable systems, it will be increasingly important to

get as much constraint as possible from language models; perhaps incorporating

syntactic and semantic constraints that cannot be captured by purely statistical

models.
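As a minimal illustration of how a statistical language model constrains the search, the sketch below trains an add-one smoothed bigram model and uses it to choose between two acoustically similar candidates. The training commands, candidate strings and function names are hypothetical; a deployed recognizer would use a far larger corpus and a more refined smoothing method.

```python
import math
from collections import Counter

# Hypothetical training commands; a real system would use a much larger corpus.
CORPUS = [
    "open my computer",
    "open control panel",
    "close window",
    "open window",
]


def train_bigrams(sentences):
    """Count context words and word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])                 # counts of each context word
        bigrams.update(zip(words[:-1], words[1:]))  # counts of each word pair
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab


def log_prob(sentence, unigrams, bigrams, vocab):
    """Add-one smoothed bigram log-probability of a word sequence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words[:-1], words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab)))
    return total


uni, bi, vocab = train_bigrams(CORPUS)
candidates = ["open window", "window open"]
best = max(candidates, key=lambda c: log_prob(c, uni, bi, vocab))
print(best)
```

The model prefers word orders it has seen in training, which is how it prunes unlikely hypotheses; it captures no syntax or semantics beyond adjacent word pairs.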

Confidence Measures:

Most speech recognition systems assign scores to hypotheses for the purpose of

rank ordering them. These scores do not provide a good indication of whether a

hypothesis is correct or not, just that it is better than the other hypotheses. As we

move to tasks that require actions, we need better methods to evaluate the

absolute correctness of hypotheses.
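One simple, commonly used approximation is to normalize the scores of the N-best hypotheses into a probability distribution and to reject the command when the top hypothesis does not clearly dominate. The sketch below assumes the recognizer exposes log scores; the scores, the threshold of 0.6 and the function names are hypothetical. It remains a relative measure over the listed hypotheses, so it only partly addresses the problem described above.

```python
import math


def posterior_confidences(log_scores):
    """Softmax over N-best log scores: each hypothesis's share of the total score mass."""
    peak = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {h: math.exp(s - peak) for h, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {h: v / total for h, v in exp_scores.items()}


def decide(log_scores, threshold=0.6):
    """Return (hypothesis, confidence); hypothesis is None when confidence is too low."""
    conf = posterior_confidences(log_scores)
    best = max(conf, key=conf.get)
    if conf[best] >= threshold:
        return best, conf[best]
    return None, conf[best]


# Hypothetical recognizer log scores for two utterances.
clear = {"open window": -10.0, "open windows": -14.0, "close window": -15.0}
ambiguous = {"open window": -10.0, "open windows": -10.2, "close window": -10.5}

print(decide(clear))      # top hypothesis dominates, so it is accepted
print(decide(ambiguous))  # scores are close, so the command is rejected
```

For a system that executes operating system commands, rejecting an uncertain command and asking the user to repeat it is usually safer than acting on the best-ranked guess.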



Out-of-Vocabulary Words:

Systems are designed for use with a particular set of words, but system users may

not know exactly which words are in the system vocabulary. This leads to a

certain percentage of out-of-vocabulary words in natural conditions. Systems

must have some method of detecting such out-of-vocabulary words, or they will

end up mapping a word from the vocabulary onto the unknown word, causing an

error.

Spontaneous Speech:

Systems that are deployed for real use must deal with a variety of spontaneous

speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical

constructions and other common behaviors not found in read speech.

Development on the ATIS task has resulted in progress in this area, but much work remains to be done.

Prosody:

Prosody refers to acoustic structure that extends over several segments or words.

Stress, intonation, and rhythm convey important information for word

recognition and the user's intentions (e.g., sarcasm, anger). Current systems do

not capture prosodic structure. How to integrate prosodic information into the

recognition architecture is a critical question that has not yet been answered.

Modeling Dynamics:

Systems assume a sequence of input frames which are treated as if they were

independent. But it is known that perceptual cues for words and phonemes

require the integration of features that reflect the movements of the articulators,

which are dynamic in nature. How to model dynamics and incorporate this

information into recognition systems is an unsolved problem.



