irobotrock: a speech recognition mobile applicationltahvild/courses/ece750-11/material… ·...


iROBOTROCK: A Speech Recognition Mobile Application

Reema Pimpale, Prabhat Narayan, Anand Kamath


Outline
• Introduction
• Technologies
• Current Approaches
• Our Solution
• Users (Application Domain)
• Our Approach
• Pending Functionality
• Future Enhancements
• References


Introduction
Remotely accessing a machine to do tasks such as:
• Play music
• Type e-mail
• Take notes
• Browse web

Motivation
To increase user friendliness by using speech recognition.


Technologies
• iOS – Operating system developed by Apple, used on its mobile devices such as the iPhone
• Python – Cross-platform programming language
• CMUSphinx – A speech recognition toolkit with a number of packages for different tasks and applications
• PocketSphinx – A package of CMUSphinx
• SphinxBase – A package of CMUSphinx
• Acoustic Model – Used by a speech recognition engine to recognize speech
• LMTool – Web-based tool for generating the language model


Current Approaches
Latest technologies:
• iPhone – Siri
• Android – Iris
Features supported by the above technologies:
• Send text messages
• Listen to music
• Call contacts
• Send e-mail
• View a map
• Visit websites


Our Solution
• Recognize the speech and connect to your computer.

• Open a website on your computer by giving a voice input on your phone.

• Type and send an e-mail from your computer using voice input from your phone.

• In short, the mobile device acts as a remote control to connect to your computer and carry out tasks on it.


Users (Application Domain)

• Extremely useful for handicapped people.

• Useful for people on the go.


Our Approach
• Researched the available speech recognition libraries
• Designed the high-level architecture
• Designed the sequence diagram to understand the data flow
• Built the application for the iPhone (client application)
• Built the server application
• Tested the application


• Client side – Uses the open source components of CMUSphinx to provide very complicated voice-recognition functionality.

• Server side – Uses Python to dynamically load additional classes at run-time and allow enhanced functionality without requiring future users to touch any previously developed code.


The client uses the CMUSphinx toolkit to implement its architecture.

The main components are:
• PocketSphinx: lightweight recognizer library written in C.

• SphinxBase: support library required by PocketSphinx (part of the PocketSphinx component).

• Dictionary: used together with the language model and the acoustic model.


• Written in Python
• Basic TCP server that listens for messages from the client on an assigned port
• Waits for keywords to activate specific functionality
• Ex. "MESSAGE" triggers the e-mail composition component
– All subsequent messages received will be appended to a message string
– Waits for the "BYE" keyword to end the message and send the e-mail
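
The keyword flow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual server: only the "MESSAGE" and "BYE" keywords come from the slide, while the newline-delimited framing, the port number, the class names and the `send_email` callback are assumptions made for the example.

```python
import socketserver

START_KEYWORD = "MESSAGE"  # keyword named on the slide
END_KEYWORD = "BYE"        # keyword named on the slide


class EmailSession:
    """Collects recognized phrases received between MESSAGE and BYE."""

    def __init__(self, send_email):
        self.send_email = send_email  # hypothetical callback that sends the text
        self.parts = None             # None means no e-mail is being composed

    def handle(self, text):
        phrase = text.strip()
        if self.parts is None:
            # Idle: only the start keyword has any effect.
            if phrase.upper() == START_KEYWORD:
                self.parts = []
            return
        if phrase.upper() == END_KEYWORD:
            # End keyword: hand the accumulated message string to the sender.
            self.send_email(" ".join(self.parts))
            self.parts = None
        elif phrase:
            self.parts.append(phrase)


class PhraseHandler(socketserver.StreamRequestHandler):
    """Assumes the client sends one recognized phrase per line."""

    def handle(self):
        session = EmailSession(send_email=print)  # print stands in for real e-mail
        for raw in self.rfile:
            session.handle(raw.decode("utf-8", errors="replace"))


def serve(host="127.0.0.1", port=5000):
    # The port is an arbitrary choice for the sketch.
    with socketserver.TCPServer((host, port), PhraseHandler) as server:
        server.serve_forever()
```

Keeping the keyword logic in `EmailSession`, separate from the socket handling, lets it be exercised without opening a connection.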


Live Demonstration


Pending Functionality
• Dynamically load additional components enabled on the server.

• Each new component will be written as a Python class with a defined set of callbacks.

• Each class defines its trigger messages and how the following messages should be handled to perform any required task.
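
One possible shape for such a component system is sketched below. The slides do not define the callback set, so the `Component` base class, the `on_trigger` and `on_message` callbacks, the "NOTE" trigger and the `load_component` helper are all hypothetical names chosen for illustration; only the standard-library `importlib.import_module` call is real API.

```python
import importlib


class Component:
    """Hypothetical base class: a set of trigger messages plus two callbacks."""

    triggers = ()

    def on_trigger(self, keyword):
        """Called once when one of `triggers` is received."""

    def on_message(self, text):
        """Called for each following message; return True when finished."""
        return True


class NoteTaker(Component):
    """Example component: collects messages until "BYE" arrives."""

    triggers = ("NOTE",)  # assumed keyword, not from the slides

    def __init__(self):
        self.lines = []

    def on_trigger(self, keyword):
        self.lines = []

    def on_message(self, text):
        if text == "BYE":
            return True
        self.lines.append(text)
        return False


def load_component(dotted_path):
    """Import "package.module.ClassName" at run-time and instantiate it."""
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not (isinstance(cls, type) and issubclass(cls, Component)):
        raise TypeError(f"{dotted_path} is not a Component")
    return cls()


class Dispatcher:
    """Routes each incoming message to the component that owns its trigger."""

    def __init__(self, components):
        self.by_trigger = {t: c for c in components for t in c.triggers}
        self.active = None

    def feed(self, text):
        if self.active is None:
            self.active = self.by_trigger.get(text)
            if self.active is not None:
                self.active.on_trigger(text)
        elif self.active.on_message(text):
            self.active = None
```

With this layout, adding functionality means writing a new class and listing its dotted path in the server configuration; existing components stay untouched.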


Future Enhancements
• The application currently uses an unencrypted TCP connection.

• Security can be enabled using TLS (Transport Layer Security) to encrypt messages.

• The plugin-based system can be extended in any way that the user desires.
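
As a rough sketch of the TLS enhancement on the Python server side, the standard `ssl` module can wrap the listening socket. The function names and the certificate and key paths are placeholders assumed for the example; the deck does not say how certificates would be provisioned, and the phone client would need matching TLS support.

```python
import socket
import ssl


def new_tls_context():
    """Server-side TLS context that refuses protocol versions older than TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def secure_listener(host, port, cert_path, key_path):
    """Open a listening socket whose accepted connections are encrypted.

    cert_path and key_path are hypothetical paths to a PEM certificate
    and its private key.
    """
    context = new_tls_context()
    context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    listener = socket.create_server((host, port))
    # Connections returned by accept() on the wrapped socket speak TLS.
    return context.wrap_socket(listener, server_side=True)
```

The rest of the server can keep reading and writing messages as before, since the wrapped socket exposes the same interface as a plain one.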


RESULT

Successfully implemented a mobile application using speech recognition which connects to the computer remotely and is able to send an e-mail via the voice input given through the phone.


References
• Carnegie Mellon University, "CMU Sphinx – Open Source Toolkit for Speech Recognition," March 2011. http://cmusphinx.sourceforge.net/wiki/

• D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black, M. Ravishankar, and A. I. Rudnicky, "Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices," Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 1, pp. I, 14-19 May 2006.

• M. Gulic, D. Lucanin, and A. Simic, "A digit and spelling speech recognition system for the Croatian language," MIPRO 2011, Proceedings of the 34th International Convention, pp. 1673-1678, 23-27 May 2011.

• K. F. Lee, H. W. Hon, and R. Reddy, "An overview of the SPHINX speech recognition system," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 1, pp. 35-45, Jan. 1990.

• J. Cohen, "Embedded speech recognition applications in mobile phones: status, trends and challenges," Proc. ICASSP 2008, IEEE Press, pp. 5352-5355, 2008.

• Python Software Foundation, "The Python Standard Library," November 2011. http://docs.python.org/library/index.html


Thank You

Questions?