Spik v1.0 Voice Commands Execution in a Windows Environment Dekel Abelson Eliran Dahan Instructor: Ari Todtfeld

Post on 21-Dec-2015


Page 1

Spik v1.0

Voice Commands Execution in a Windows Environment

Dekel Abelson, Eliran Dahan

Instructor: Ari Todtfeld

Page 2

Objectives

• Analysis and exploration of voice-recognition systems, their abilities and their limitations

• Understanding the Windows architecture and programming concepts

• Development and implementation of a tool that enables users to execute voice commands in a Windows environment, including building a graphical user interface (GUI) for the tool

• Learning the Microsoft Speech SDK 5.1 (Software Development Kit) and its speech engine

Page 3

Project skills

• C++ programming skills

• XML (Extensible Markup Language) programming skills

• Programming in a Windows environment, including API (Application Programming Interface) calls

Page 4

Brief history

• 1994 - Release of Dragon Systems' “DragonDictate” for Windows 1.0, using discrete speech recognition technology

• 1996 - Introduction of IBM’s “MedSpeak”, the first continuous speech recognition software

• 1997 - Dragon Systems’ “NaturallySpeaking”, the first general-purpose continuous speech software program. Two months later IBM released its “ViaVoice”

• 2005 - Thanks to improvements in PC processing time and in the algorithms used, several speech recognition programs are now on the market.

Page 5

Voice recognition

• Voice recognition follows these steps:

1. Spoken words enter a microphone

2. Audio is processed by the computer's sound card

3. The software discriminates between lower-frequency vowels and higher-frequency consonants and compares the results with phonemes, the smallest building blocks of speech. The software then compares results to groups of phonemes, and then to actual words, determining the most likely match

4. The sentence is transferred to a word processing application

Page 6

Architecture

Voice command by the user

SAPI 5.1 (Speech Application Programming Interface)

Processing of the recognized commands by C++/XML code

Command execution using API functions

Page 7

GUI

• Executable file - spik.exe

• The GUI - a window that receives the voice commands from the user. The GUI was built in C++ using the basic “Window” class.

Page 8

SAPI 5.1

• SAPI provides a high-level interface between the application and the speech engine

• The TTS (Text-To-Speech) system synthesizes text strings and files into spoken audio

• Speech recognizers convert human spoken audio into readable text strings

Page 9

Processing

Main function contains the infinite loop waiting for messages to process

Main window procedure that handles the messages sent to the window

Execute commands that have been identified by the speech engine

Microsoft Speech Engine

API functions
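The flow above, where a phrase identified by the speech engine triggers an action, can be sketched as a lookup table. This is a minimal illustration and not Spik's actual code: the class `CommandTable` and its methods are hypothetical names, and a real SAPI 5.1 application would typically receive grammar rule IDs from the engine rather than raw strings.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical dispatch table: maps a recognized phrase to the action
// that should run when the speech engine reports that phrase.
class CommandTable {
public:
    // Register (or replace) the action for a phrase.
    void add(const std::string& phrase, std::function<void()> action) {
        actions_[phrase] = std::move(action);
    }

    // Run the action for a recognized phrase.
    // Returns false when the phrase is not a known command.
    bool dispatch(const std::string& phrase) const {
        auto it = actions_.find(phrase);
        if (it == actions_.end()) {
            return false;
        }
        it->second();
        return true;
    }

private:
    std::map<std::string, std::function<void()>> actions_;
};
```

In a window procedure, `dispatch` would be called from the handler of the message that signals a recognition event, so unknown phrases can simply be ignored.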

Page 10

Commands Execution

• The Windows API is a set of application programming interfaces available in the Microsoft Windows operating systems, which developers use to create software

• The API consists of C functions implemented in dynamic-link libraries (DLLs), mainly the core DLLs: kernel32.dll, user32.dll and gdi32.dll

• Main API functions we have used:

CreateProcess() – runs executable files

WinExec() – runs Windows procedures from a command line

ShellExecute() – opens URLs and files

ShowWindow() – sets the specified window's show state

SendMessage() – sends the specified message to a window or windows

keybd_event() – synthesizes a keystroke

PostMessage() – places (posts) a message in the message queue associated with the thread that created the specified window
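As an illustration of the first function in the list, here is a hedged sketch of launching a program with `CreateProcessW`. The helper names `makeCommandLine` and `launchProgram` are hypothetical and do not come from the Spik source. The sketch highlights one detail that is easy to miss: the Unicode version of `CreateProcess` may modify the command-line buffer, so it must be writable rather than a string literal. The Windows call is guarded so the helper still compiles on other platforms.

```cpp
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Build a writable, NUL-terminated command line.
// The executable path is quoted when it contains spaces.
std::vector<wchar_t> makeCommandLine(const std::wstring& exe,
                                     const std::wstring& args) {
    std::wstring line;
    if (exe.find(L' ') != std::wstring::npos) {
        line = L"\"" + exe + L"\"";
    } else {
        line = exe;
    }
    if (!args.empty()) {
        line += L" " + args;
    }
    std::vector<wchar_t> buffer(line.begin(), line.end());
    buffer.push_back(L'\0');
    return buffer;
}

// Launch a program. Returns true if the process was created.
// On non-Windows builds this is a stub that always returns false.
bool launchProgram(const std::wstring& exe, const std::wstring& args = L"") {
#ifdef _WIN32
    std::vector<wchar_t> cmd = makeCommandLine(exe, args);
    STARTUPINFOW si{};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi{};
    if (!CreateProcessW(nullptr, cmd.data(), nullptr, nullptr, FALSE, 0,
                        nullptr, nullptr, &si, &pi)) {
        return false;
    }
    // The handles are not needed once the process is running.
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return true;
#else
    (void)exe;
    (void)args;
    return false;
#endif
}
```

On Windows, `launchProgram(L"calc.exe")` would start the calculator used in the voice command example.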

Page 11

The Code

Source code files in C++

Header files of the program

Executable program file

Text file in XML format, used by the speech recognition engine

Text file containing strings, used by the program

Compiled file, used by the speech recognition engine

Header files of the speech recognition engine

Page 12

Adaptation & Training

• The speech recognition engine adapts itself to the user’s voice, vocabulary and speech style in order to improve speech recognition accuracy

• After adaptation, only about ¼ of the recognition errors remain, so accuracy rises

• As more training is done, accuracy rises to around 95%.

Page 13

Voice command example

• Calculator usage:

Say the voice command “Open Calculator” to run the calc.exe program

Say a simple exercise, and then say “Equal” or “Result” to show the solution
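One plausible way to drive the calculator from spoken words is to translate each word into a virtual-key code and synthesize the keystroke with `keybd_event()`. The sketch below is an assumption about how this could be done, not the project's code: the function names and the word list ("plus", "times" and so on) are hypothetical. The key codes themselves are the standard Windows virtual-key values, written as numbers so the mapping compiles on any platform.

```cpp
#include <map>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical mapping from a spoken word to a Windows virtual-key code.
// Returns 0 when the word is not recognized as a calculator key.
int virtualKeyFor(const std::string& word) {
    static const std::map<std::string, int> keys = {
        {"zero", 0x30},  {"one", 0x31},   {"two", 0x32},   {"three", 0x33},
        {"four", 0x34},  {"five", 0x35},  {"six", 0x36},   {"seven", 0x37},
        {"eight", 0x38}, {"nine", 0x39},
        {"plus", 0x6B},    // VK_ADD
        {"minus", 0x6D},   // VK_SUBTRACT
        {"times", 0x6A},   // VK_MULTIPLY
        {"divide", 0x6F},  // VK_DIVIDE
        {"equal", 0x0D},   // VK_RETURN
        {"result", 0x0D},  // VK_RETURN
    };
    auto it = keys.find(word);
    if (it == keys.end()) {
        return 0;
    }
    return it->second;
}

// Press and release the key for a spoken word.
// Returns true if the word maps to a key; the keystroke itself is only
// sent on Windows builds, to whichever window has keyboard focus.
bool pressKeyFor(const std::string& word) {
    int vk = virtualKeyFor(word);
    if (vk == 0) {
        return false;
    }
#ifdef _WIN32
    keybd_event(static_cast<BYTE>(vk), 0, 0, 0);                // key down
    keybd_event(static_cast<BYTE>(vk), 0, KEYEVENTF_KEYUP, 0);  // key up
#endif
    return true;
}
```

With this mapping, the spoken sequence "two", "plus", "three", "equal" would send the keys 2, +, 3 and Enter to the focused calculator window.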

Page 14

Voice command example

• Run programs - notepad, command line

• Internet usage - search google

• Windows navigation - my documents, system properties, start menu, screen saver
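For the "search google" command, one possible approach is to build a search URL from the spoken query and hand it to `ShellExecute()`, which opens it in the default browser. The sketch below is an assumption, not taken from the Spik source: `buildSearchUrl` and `openSearch` are hypothetical names, and the slides do not say how the query is obtained or which URL form the tool uses.

```cpp
#include <cctype>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>
#endif

// Build a search URL, percent-encoding every byte of the query that is
// not an unreserved URL character (letters, digits, '-', '_', '.', '~').
std::string buildSearchUrl(const std::string& query) {
    static const char hex[] = "0123456789ABCDEF";
    std::string url = "https://www.google.com/search?q=";
    for (unsigned char c : query) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            url += static_cast<char>(c);
        } else {
            url += '%';
            url += hex[c >> 4];
            url += hex[c & 0x0F];
        }
    }
    return url;
}

// Open the search in the default browser.
// On non-Windows builds this is a stub that always returns false.
bool openSearch(const std::string& query) {
#ifdef _WIN32
    const std::string url = buildSearchUrl(query);
    HINSTANCE result = ShellExecuteA(nullptr, "open", url.c_str(),
                                     nullptr, nullptr, SW_SHOWNORMAL);
    // ShellExecute reports success with a value greater than 32.
    return reinterpret_cast<INT_PTR>(result) > 32;
#else
    (void)query;
    return false;
#endif
}
```

Encoding the query matters because spoken phrases contain spaces, which are not valid inside a URL.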

Page 15

Added value of the project

• Advanced versions based on Spik v1.0 will be a helpful tool for physically challenged users who want to use the computer and the web

Page 16

Future Development

• Advanced OS navigation in order to eliminate the use of the keyboard

• Adding Speech-to-Text capabilities

• Improved GUI to let users enter their own voice commands

Page 17

Q&A