Developing with Speech and Voice Recognition in Mobile Apps

Nick Landry, MVP App Artisan Nokia Developer Champion & Ambassador [email protected] @ActiveNick – www.mobility42.com Developing with Speech and Voice Recognition in Mobile Apps talk2me M3Conference

Upload: nick-landry

Post on 14-Jan-2015


DESCRIPTION

Can you hear me now? Move over Siri, here comes an army of speech-enabled mobile applications on Windows Phone. Mobile applications are not always easy to work with due to the small screen and small on-screen keyboard. Using our voice is a natural form of communication among humans, and ever since 2001: A Space Odyssey, we’ve been dreaming of computers that can converse with us like HAL 9000. Or maybe you’re part of the new generation of geeks dreaming of Cortana? Thanks to the new Microsoft SDKs for voice recognition and speech synthesis (aka text-to-speech), we are now several steps closer to this reality. This session explores the development techniques you can use to add voice recognition to your Windows Phone applications, including in-app commands, standard & custom grammars, and voice commands usable outside your app. We’ll also see how your apps can respond to the user via speech synthesis, opening up a new world of hands-free scenarios. This reality is here: you’ll see actual live demos with speech, and you can now learn how to do it.

TRANSCRIPT

Page 1: Developing with Speech and Voice Recognition in Mobile Apps

Nick Landry, MVP

App Artisan

Nokia Developer Champion & Ambassador

[email protected]

@ActiveNick – www.mobility42.com

Developing with Speech and

Voice Recognition in Mobile Apps

talk2me

M3Conference

Page 2: Developing with Speech and Voice Recognition in Mobile Apps

Who is ActiveNick?

• App Artisan – Mobile Development Consultant – Mobility42

• Microsoft MVP: Windows Phone Development

• Mobile Publisher – Big Bald Apps: http://www.bigbaldapps.com

• Nokia Developer Champion and Ambassador

• Speaker. Blogger. Author. Tweeter. Gamer

• 20+ Years of Professional Experience

• Specialties:

• Mobile Development

• Location Intelligence & Geospatial Systems

• Data Visualization, HPC, Cloud

• Mobile Game Development

• Blog: www.ActiveNick.net

• Twitter: @ActiveNick

Page 3: Developing with Speech and Voice Recognition in Mobile Apps

Agenda

• Speech on Windows Phone 8

• Speech synthesis

• Controlling applications using speech

• Voice command definition files

• Building conversations

• Selecting application entry points

• Simple speech input

• Speech input and grammars

• Using Grammar Lists

Page 4: Developing with Speech and Voice Recognition in Mobile Apps

Speech on

Windows Phone 8

Page 5: Developing with Speech and Voice Recognition in Mobile Apps

Windows Phone Speech Support

• Windows Phone 7.x had voice support built into the operating system

• Programs and phone features could be started by voice commands, e.g. “Start MyApp”

• Incoming SMS messages could be read to the user

• The user could compose and send SMS messages

• Windows Phone 8 builds on this to allow applications to make use of speech

• Applications can speak messages using the Speech Synthesis feature

• Applications can be started and given commands

• Applications can accept commands using voice input

• Speech recognition with the generic dictation grammar requires an internet connection; speech synthesis and recognition against custom grammars do not

Page 6: Developing with Speech and Voice Recognition in Mobile Apps

Speech

Synthesis

Page 7: Developing with Speech and Voice Recognition in Mobile Apps

Enabling Speech Synthesis

• If an application wishes to use speech output, the ID_CAP_SPEECH_RECOGNITION capability must be enabled in WMAppManifest.xml

• The application can also reference the Synthesis namespace

using Windows.Phone.Speech.Synthesis;

Page 8: Developing with Speech and Voice Recognition in Mobile Apps

Simple Speech

• The SpeechSynthesizer class provides a simple way to produce speech

• The SpeakTextAsync method speaks the content of the string using the default voice

• Note that the method is asynchronous, so the calling method must use the async modifier

• Speech output does not require a network connection

async void CheeseLiker()
{
    SpeechSynthesizer synth = new SpeechSynthesizer();
    await synth.SpeakTextAsync("I like cheese.");
}

Page 9: Developing with Speech and Voice Recognition in Mobile Apps

Selecting a language

• The default speaking voice is selected automatically from the locale set for the phone

• The InstalledVoices class provides a list of all the voices available on the phone

• The code below selects a French voice

// Query for a voice that speaks French.
var frenchVoices = from voice in InstalledVoices.All
                   where voice.Language == "fr-FR"
                   select voice;

// Set the voice as identified by the query.
synth.SetVoice(frenchVoices.ElementAt(0));
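The query on this slide assumes that at least one French voice is installed; ElementAt(0) throws if the query comes back empty. A minimal sketch of a more defensive variant is shown here. The class and method names (VoicePicker, SpeakInLanguageAsync) are hypothetical, and the fallback to the default voice is an assumption about what an app would want, not something the slides prescribe.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Phone.Speech.Synthesis;

// Hypothetical helper, not part of the SDK.
public static class VoicePicker
{
    // Speaks the text with the first installed voice for the given language
    // tag (for example "fr-FR"). If no such voice is installed, the
    // synthesizer keeps the phone's default voice.
    public static async Task SpeakInLanguageAsync(string text, string languageTag)
    {
        SpeechSynthesizer synth = new SpeechSynthesizer();

        // FirstOrDefault returns null instead of throwing when nothing matches.
        VoiceInformation voice = InstalledVoices.All
            .FirstOrDefault(v => v.Language == languageTag);

        if (voice != null)
        {
            synth.SetVoice(voice);
        }

        await synth.SpeakTextAsync(text);
    }
}
```

A call such as await VoicePicker.SpeakInLanguageAsync("Bonjour", "fr-FR"); would then speak with a French voice when one is available and with the default voice otherwise.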

Page 10: Developing with Speech and Voice Recognition in Mobile Apps

Demo 1: Speech Synthesis

and Voice Selection

talk2me - http://bit.ly/wpt2m

Page 11: Developing with Speech and Voice Recognition in Mobile Apps

Speech Synthesis Markup Language

• You can use Speech Synthesis Markup Language (SSML) to control the spoken output

• Change the voice, pitch, rate, volume, pronunciation and other characteristics

• Also allows the inclusion of audio files into the spoken output

• You can also use the Speech synthesizer to speak the contents of a file

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p>
    Your <say-as interpret-as="ordinal">1st</say-as> request was for
    <say-as interpret-as="cardinal">1</say-as> room on
    <say-as interpret-as="date" format="mdy">10/19/2010</say-as>,
    arriving at <say-as interpret-as="time" format="hms12">12:35pm</say-as>.
  </p>
</speak>
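The slide shows only the markup, so here is a minimal sketch of how an SSML string might be handed to the synthesizer with SpeakSsmlAsync. The class and method names (SsmlSpeaker, SpeakBookingAsync) are hypothetical, and the markup is a shortened version of the booking example.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Synthesis;

// Hypothetical helper, not part of the SDK.
public static class SsmlSpeaker
{
    public static async Task SpeakBookingAsync()
    {
        // A shortened form of the SSML on this slide, built as a string.
        string ssml =
            "<speak version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" +
            "Your <say-as interpret-as=\"ordinal\">1st</say-as> request was for " +
            "<say-as interpret-as=\"cardinal\">1</say-as> room." +
            "</speak>";

        SpeechSynthesizer synth = new SpeechSynthesizer();

        // SpeakSsmlAsync interprets the markup instead of reading it aloud as text.
        await synth.SpeakSsmlAsync(ssml);
    }
}
```

To speak the contents of a file instead, SpeakSsmlFromUriAsync takes the URI of an SSML document packaged with the app.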

Page 12: Developing with Speech and Voice Recognition in Mobile Apps

Controlling

Applications

using Voice

Commands

Page 13: Developing with Speech and Voice Recognition in Mobile Apps

Application Launching using Voice command

• The Voice Command feature of Windows Phone 7 allowed users to start applications

• In Windows Phone 8 the feature has been expanded to allow the user to request data from the application in the start command

• The data will allow a particular application page to be selected when the program starts and can also pass request information to that page

• To start using Voice Commands you must create a Voice Command Definition (VCD) file that defines all the spoken commands

• The application then calls a method to register the words and phrases the first time it is run

Page 14: Developing with Speech and Voice Recognition in Mobile Apps

The Fortune Teller Program

• The Fortune Teller program will tell your future

• You can ask it questions and it will display replies

• It could also speak them

• Some of the spoken commands activate different pages of the application and others are processed by the application when it starts running

Page 15: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

<CommandPrefix> Fortune Teller </CommandPrefix>
<Example> Will I find money </Example>
<Command Name="showMoney">
  <Example> Will I find money </Example>
  <ListenFor> [Will I find] {futureMoney} </ListenFor>
  <Feedback> Showing {futureMoney} </Feedback>
  <Navigate Target="/money.xaml"/>
</Command>
<PhraseList Label="futureMoney">
  <Item> money </Item>
  <Item> riches </Item>
  <Item> gold </Item>
</PhraseList>

• This is the “money” question: “Fortune Teller Will I find money”

Page 16: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• <CommandPrefix> is the phrase the user says to trigger the command

• All of the Fortune Teller commands start with this phrase

Page 17: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• The top-level <Example> is example text that will be displayed by the help for this app as an example of the commands the app supports

Page 18: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• The Name attribute of <Command> (here showMoney) is the command name

• This can be obtained from the URL by the application when it starts

Page 19: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• The <Example> inside <Command> is the example for this specific command

Page 20: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• <ListenFor> is the trigger phrase for this command

• It can be a sequence of words; words in square brackets, such as [Will I find], are optional

• The user must prefix this sequence with the words “Fortune Teller”

Page 21: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• {futureMoney} inside <ListenFor> refers to the phrase list for the command

• The user can say any of the words in the phrase list to match this command

• The application can determine the phrase used

• The phrase list can be changed by the application dynamically

Page 22: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• <Feedback> is the spoken feedback from the command

• The feedback will insert the phrase item used to activate the command

Page 23: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• The Target attribute of <Navigate> is the URL for the page to be activated by the command

• Commands can go to different pages, or all go to MainPage.xaml if required

Page 24: Developing with Speech and Voice Recognition in Mobile Apps

The Voice Command Definition (VCD) file

• The <Item> elements of <PhraseList> are the phrases that can be used at the end of the command

• The application can modify the phrase list of a command dynamically

• It could give movie times for films by name

Page 25: Developing with Speech and Voice Recognition in Mobile Apps

Installing a Voice Command Definition (VCD) file

• The VCD file can be loaded from the application or from any URI

• In this case it is just a file that has been added to the project and marked as Content

• The VCD can also be changed by the application when it is running

• The voice commands for an application are loaded into the voice command service when the application runs

• The application must run at least once to configure the voice commands

async void setupVoiceCommands()
{
    await VoiceCommandService.InstallCommandSetsFromFileAsync(
        new Uri("ms-appx:///VCDCommands.xml", UriKind.RelativeOrAbsolute));
}

Page 26: Developing with Speech and Voice Recognition in Mobile Apps

Launching Your App With a Voice Command

• If the user now presses and holds the Windows button, and says:

Fortune Teller, Will I find gold?

the phone displays “Showing gold”

• It then launches your app and navigates to the page associated with this command, which is /money.xaml

• The query string passed to the page looks like this:

"/?voiceCommandName=showMoney&futureMoney=gold&reco=Fortune%20Teller%20Will%20I%20find%20gold"

• voiceCommandName=showMoney: the command name

• futureMoney=gold: the phrase list name and the recognized phrase

• reco=...: the whole phrase as it was recognized
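To make the structure of that query string concrete, here is a small sketch that splits such a URI into key/value pairs and decodes the %20 escapes. On the phone, NavigationContext.QueryString already provides the decoded values, so this helper (VoiceQuery.Parse, a hypothetical name) is purely illustrative and uses only standard .NET calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class VoiceQuery
{
    // Splits the part after '?' into decoded key/value pairs.
    public static Dictionary<string, string> Parse(string uri)
    {
        var result = new Dictionary<string, string>();

        int queryStart = uri.IndexOf('?');
        if (queryStart < 0)
        {
            return result; // no query string at all
        }

        foreach (string pair in uri.Substring(queryStart + 1).Split('&'))
        {
            int equals = pair.IndexOf('=');
            if (equals <= 0)
            {
                continue; // skip empty or malformed pairs
            }

            string key = Uri.UnescapeDataString(pair.Substring(0, equals));
            string value = Uri.UnescapeDataString(pair.Substring(equals + 1));
            result[key] = value;
        }

        return result;
    }
}
```

Parsing a URI of the form shown on this slide yields three entries: voiceCommandName, futureMoney and reco, with reco holding the full recognized phrase.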

Page 27: Developing with Speech and Voice Recognition in Mobile Apps

Handling Voice Commands

• This code runs in the OnNavigatedTo method of a target page

• Can also check for the voice command phrase that was used

if (e.NavigationMode == System.Windows.Navigation.NavigationMode.New)
{
    if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))
    {
        string command = NavigationContext.QueryString["voiceCommandName"];
        switch (command)
        {
            case "tellJoke":
                messageTextBlock.Text = "Insert really funny joke here";
                break;
            // Add cases for other commands.
            default:
                messageTextBlock.Text = "Sorry, what you said makes no sense.";
                break;
        }
    }
}

Page 28: Developing with Speech and Voice Recognition in Mobile Apps

Identifying phrases

• The navigation context can be queried to determine the phrase used to trigger the navigation

• In this case the program reads which item of the futureMoney phrase list was spoken in the “money” question

<PhraseList Label="futureMoney">
  <Item> money </Item>
  <Item> riches </Item>
  <Item> gold </Item>
</PhraseList>

string moneyPhrase = NavigationContext.QueryString["futureMoney"];

Page 29: Developing with Speech and Voice Recognition in Mobile Apps

Demo 2:

Fortune Teller

Page 30: Developing with Speech and Voice Recognition in Mobile Apps

Modifying the phrase list

• An application can modify a phrase list when it is running

• It cannot add new commands however

• This would allow a program to implement behaviours such as:

“Movie Planner tell me showings for Batman”

// The dictionary is keyed by the name of the installed command set.
VoiceCommandSet fortuneVcs = VoiceCommandService.InstalledCommandSets["en-US"];

await fortuneVcs.UpdatePhraseListAsync("futureMoney",
    new string[] { "money", "cash", "millions", "piles of dough" });
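Indexing InstalledCommandSets directly throws if the named command set has not been installed yet, for example before the VCD file has been registered. A minimal sketch of a guarded update follows. The class and method names (PhraseListUpdater, TryUpdateAsync) are hypothetical, and the command set name and phrase list label are whatever the app's own VCD file defines.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Phone.Speech.VoiceCommands;

// Hypothetical helper, not part of the SDK.
public static class PhraseListUpdater
{
    // Replaces the items of a phrase list, returning false when the
    // command set has not been installed.
    public static async Task<bool> TryUpdateAsync(
        string commandSetName,
        string phraseListLabel,
        IEnumerable<string> phrases)
    {
        VoiceCommandSet commandSet;
        if (!VoiceCommandService.InstalledCommandSets.TryGetValue(commandSetName, out commandSet))
        {
            return false;
        }

        await commandSet.UpdatePhraseListAsync(phraseListLabel, phrases);
        return true;
    }
}
```

A movie app could call this with the current list of film titles each time its listings are refreshed, which is the “Movie Planner” scenario mentioned on this slide.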

Page 31: Developing with Speech and Voice Recognition in Mobile Apps

Simple Speech Input

Page 32: Developing with Speech and Voice Recognition in Mobile Apps

Recognizing Free Speech

• A Windows Phone application can recognize words and phrases and pass them to your program

• From my experiments it seems quite reliable

• Note that a network connection is required for this feature if you use the generic dictation grammar

• Your application can just use the speech string directly

• The standard “Listening” interface is displayed over your application

Page 33: Developing with Speech and Voice Recognition in Mobile Apps

Simple Speech Recognition

• The method below checks for a successful response

• By default the system uses the language settings on the phone

SpeechRecognizerUI recoWithUI;

private async void ListenButton_Click(object sender, RoutedEventArgs e)
{
    this.recoWithUI = new SpeechRecognizerUI();

    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

    if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        MessageBox.Show(string.Format("You said {0}.", recoResult.RecognitionResult.Text));
}

Page 34: Developing with Speech and Voice Recognition in Mobile Apps

Customizing Speech Recognition

• InitialSilenceTimeout

• The time that the speech recognizer will wait until it hears speech

• The default setting is 5 seconds

• BabbleTimeout

• The time that the speech recognizer will listen while it hears background noise

• The default setting is 0 seconds (the feature is not activated)

• EndSilenceTimeout

• The time interval during which the speech recognizer will wait before finalizing the recognition operation

• The default setting is 150 milliseconds

Page 35: Developing with Speech and Voice Recognition in Mobile Apps

Customizing Speech Recognition

• A program can also select whether or not the speech recognition echoes back the user input and displays it in a message box

• The code below also sets timeout values

recoWithUI.Settings.ReadoutEnabled = false;   // don't read the saying back
recoWithUI.Settings.ShowConfirmation = false; // don't show the confirmation

recoWithUI.Recognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6.0);
recoWithUI.Recognizer.Settings.BabbleTimeout = TimeSpan.FromSeconds(4.0);
recoWithUI.Recognizer.Settings.EndSilenceTimeout = TimeSpan.FromSeconds(1.2);

Page 36: Developing with Speech and Voice Recognition in Mobile Apps

Handling Errors

• An application can bind to events which indicate problems with the audio input

• There is also an event fired when the state of the capture changes

recoWithUI.Recognizer.AudioProblemOccurred += Recognizer_AudioProblemOccurred;
recoWithUI.Recognizer.AudioCaptureStateChanged += Recognizer_AudioCaptureStateChanged;
...

void Recognizer_AudioProblemOccurred(SpeechRecognizer sender, SpeechAudioProblemOccurredEventArgs args)
{
    MessageBox.Show("Please speak more clearly");
}
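The handler on this slide shows the same message for every problem. The event arguments also carry a Problem value, so a handler can give more specific advice. The sketch below maps each value to a hint. The class name (AudioProblemAdvice) and the wording of the messages are assumptions, and the enumeration members are listed from memory of the Windows Phone 8 SDK, so check them against the API reference.

```csharp
using Windows.Phone.Speech.Recognition;

// Hypothetical helper, not part of the SDK.
public static class AudioProblemAdvice
{
    // Returns a user-facing hint for the reported audio problem.
    public static string For(SpeechRecognitionAudioProblem problem)
    {
        switch (problem)
        {
            case SpeechRecognitionAudioProblem.TooQuiet:
                return "Please speak a little louder.";
            case SpeechRecognitionAudioProblem.TooLoud:
                return "Please speak a little more softly.";
            case SpeechRecognitionAudioProblem.TooFast:
                return "Please speak more slowly.";
            case SpeechRecognitionAudioProblem.TooSlow:
                return "Please speak a little faster.";
            case SpeechRecognitionAudioProblem.TooNoisy:
                return "It is noisy here. Please try somewhere quieter.";
            case SpeechRecognitionAudioProblem.NoSignal:
                return "I cannot hear anything. Please check the microphone.";
            default:
                return "Please speak more clearly.";
        }
    }
}
```

Inside Recognizer_AudioProblemOccurred, the message could then be obtained with AudioProblemAdvice.For(args.Problem).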

Page 37: Developing with Speech and Voice Recognition in Mobile Apps

Using Grammars

Page 38: Developing with Speech and Voice Recognition in Mobile Apps

Grammars and Speech input

• The simple speech recognition we have seen so far uses the “Short Dictation” grammar, which just captures the text and returns it to the application

• You can add your own grammars that will structure the conversation between the user and the application

• Grammars can be created using the Speech Recognition Grammar Specification (SRGS) Version 1.0 and stored as XML files loaded when the application runs

• This is a little complex, but worth the effort if you want to create applications with rich language interaction with the user

• If the application just needs to identify particular commands you can use a grammar list to achieve this

• Custom grammars can be handled on the client without any network access
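The slides do not show how an SRGS file is loaded, so here is a minimal sketch using AddGrammarFromUri. The class name (SrgsRecognition), the grammar key "cheeseOrder" and the file name CheeseOrder.grxml are all hypothetical; the file is assumed to be an SRGS XML document added to the project as Content.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

// Hypothetical helper, not part of the SDK.
public static class SrgsRecognition
{
    // Listens using a custom SRGS grammar and returns the recognized text,
    // or null if recognition did not succeed.
    public static async Task<string> AskAsync()
    {
        SpeechRecognizerUI recoWithUI = new SpeechRecognizerUI();

        // Register the SRGS file under a key so it can be referenced later.
        recoWithUI.Recognizer.Grammars.AddGrammarFromUri(
            "cheeseOrder",
            new Uri("ms-appx:///CheeseOrder.grxml", UriKind.Absolute));

        SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

        if (recoResult.ResultStatus != SpeechRecognitionUIStatus.Succeeded)
        {
            return null;
        }

        return recoResult.RecognitionResult.Text;
    }
}
```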

Page 39: Developing with Speech and Voice Recognition in Mobile Apps

Using Grammar Lists

• To create a Grammar List an application defines an array of strings that form the words in the list

• The Grammar can then be added to the recognizer and given a name

• Multiple grammar lists can be added to a grammar recognizer

• The recognizer will now resolve any of the words in the lists that have been supplied

string [] strengthNames = { "weak", "mild", "medium", "strong", "english"};

recoWithUI.Recognizer.Grammars.AddGrammarFromList("cheeseStrength", strengthNames);

Page 40: Developing with Speech and Voice Recognition in Mobile Apps

Enabling and Disabling Grammar Lists

• An application can enable or disable particular grammars before a recognition action

• It is also possible to set relative weightings of grammar lists

• The text displayed as part of the listen operation can also be set, as shown below

recoWithUI.Settings.ListenText = "How strong do you like your cheese?";

recoWithUI.Recognizer.Grammars["cheeseStrength"].Enabled = true;

SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

Page 41: Developing with Speech and Voice Recognition in Mobile Apps

Determining the confidence in the result

• An application can determine the confidence that the speech system has in the result that was obtained

• Result values are High, Medium, Low, Rejected

SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

if (recoResult.RecognitionResult.TextConfidence == SpeechRecognitionConfidence.High)
{
    // select cheese based on strength value
}
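The check on this slide only handles High. One possible policy covering all four values is sketched below: accept High and Medium, ask the user to confirm Low, and retry on Rejected. The NextStep enumeration, the ConfidencePolicy class and the policy itself are assumptions for illustration, not a recommendation from the SDK.

```csharp
using Windows.Phone.Speech.Recognition;

// Hypothetical outcome type for illustration.
public enum NextStep
{
    Accept,
    Confirm,
    Retry
}

// Hypothetical helper, not part of the SDK.
public static class ConfidencePolicy
{
    // Maps the recognizer's confidence to what the app should do next.
    public static NextStep Decide(SpeechRecognitionConfidence confidence)
    {
        switch (confidence)
        {
            case SpeechRecognitionConfidence.High:
            case SpeechRecognitionConfidence.Medium:
                return NextStep.Accept;
            case SpeechRecognitionConfidence.Low:
                return NextStep.Confirm;
            default:
                // Rejected, or any value this sketch does not know about.
                return NextStep.Retry;
        }
    }
}
```

The result of ConfidencePolicy.Decide(recoResult.RecognitionResult.TextConfidence) can then drive whether the app acts, asks “Did you say ...?”, or listens again.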

Page 42: Developing with Speech and Voice Recognition in Mobile Apps

Matching Multiple Grammars

• If the spoken input matches multiple grammars a program can obtain a list of the alternative results using recoResult.RecognitionResult.GetAlternates

• The list is supplied in order of confidence

• The application can then determine the best fit from the context of the voice request

• This list is also provided if the request used a more complex grammar

var alternatives = recoResult.RecognitionResult.GetAlternates(3);
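As a sketch of how “best fit from the context” might look in code, the helper below walks the alternates in the order they are supplied and returns the first one whose text is in a set of words the app currently expects. The class and method names (AlternatePicker, PickFirstKnown) and the matching rule are assumptions; a real app may need a looser comparison than an exact match.

```csharp
using System.Collections.Generic;
using Windows.Phone.Speech.Recognition;

// Hypothetical helper, not part of the SDK.
public static class AlternatePicker
{
    // Returns the text of the first alternate that the app expects,
    // or null when none of the alternates match.
    public static string PickFirstKnown(
        SpeechRecognitionResult result,
        ICollection<string> expectedWords,
        uint maxAlternates)
    {
        foreach (SpeechRecognitionResult alternate in result.GetAlternates(maxAlternates))
        {
            // Exact, case-sensitive match; adjust to suit the grammar in use.
            if (expectedWords.Contains(alternate.Text))
            {
                return alternate.Text;
            }
        }

        return null;
    }
}
```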

Page 43: Developing with Speech and Voice Recognition in Mobile Apps

Profanity

• Words that are recognised as profanities are not displayed in the response from a recognizer command

• The speech system will also not repeat them

• They are enclosed in <Profanity> </Profanity> when supplied to the program that receives the speech data
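Since the recognized text reaches the program with these tags in place, the app has to decide what to do with them before displaying or storing the text. The sketch below masks each tagged span using a regular expression. The class name (ProfanityFilter), the default replacement and the case-insensitive match on the tag name are assumptions; it uses only standard .NET calls.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ProfanityFilter
{
    // Matches an opening tag, the shortest possible content, and the closing tag.
    private static readonly Regex TaggedSpan = new Regex(
        @"<profanity>.*?</profanity>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each tagged span, tags included, with the replacement text.
    public static string Mask(string recognizedText, string replacement = "****")
    {
        return TaggedSpan.Replace(recognizedText, replacement);
    }
}
```

For example, Mask("What the <Profanity>heck</Profanity> is that") yields "What the **** is that".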

Page 44: Developing with Speech and Voice Recognition in Mobile Apps

Summary

• Applications in Windows Phone 8 can use speech generation and recognition to interact with users

• Applications can produce speech output from text files which can be marked up with Speech Synthesis Markup Language (SSML) to include sound files

• Applications can be started and provided with initial commands by registering a Voice Command Definition File with the Windows Phone

• The commands can be picked up when a page is loaded, or the commands specify a particular page to load

• An application can modify the phrase part of a command to change the activation commands

• Applications can recognise speech using complex grammars or simple word lists

Page 45: Developing with Speech and Voice Recognition in Mobile Apps

Summary and Next Steps…

Get Ready to Become a Windows Phone Developer

Download the SDK at dev.windowsphone.com

Explore the Microsoft samples and start building apps in Visual Studio

Learn More About Windows Phone Development via Official Microsoft Videos

Windows Phone 8 Jump Start Training: http://bit.ly/wp8jump

Windows Phone 8 Dev for Absolute Beginners: http://bit.ly/wp8devAB

Check Out Additional Learning Resources

Pluralsight WP Training: www.pluralsight.com/training/Courses#windows-phone

Nokia Developer: www.developer.nokia.com

Download Additional Resources & Become an Expert

Download the Windows Phone Toolkit: phone.codeplex.com

Nokia Developer Offers: http://bit.ly/nokiadevoffers


Page 46: Developing with Speech and Voice Recognition in Mobile Apps

Windows Phone Resources

• Windows Phone Developer Blog: blogs.windows.com/windows_phone/b/wpdev

• Windows Phone Consumer Blog: blogs.windows.com/windows_phone/b/windowsphone

• Nokia WP Wiki: www.developer.nokia.com/Community/Wiki/Category:Windows_Phone

• Nokia Dvlup Challenges & Rewards: www.dvlup.com

• Nokia Conversations Blog: http://conversations.nokia.com

• Microsoft App Studio: http://apps.windowsstore.com

• Nick Landry’s Blog: ActiveNick.net

• Windows Phone Developer Magazine (online): http://flip.it/95YFG

• GeekChamp (WP & Win8 dev): www.geekchamp.com

• Windows Phone Central (News): www.wpcentral.com

Page 47: Developing with Speech and Voice Recognition in Mobile Apps

Thank You!

Slides and demos will be posted on SlideShare (see links below)

Let me know how you liked this session. Your feedback is important and appreciated.

Slideshare: www.slideshare.net/ActiveNick

Blog: www.ActiveNick.net

Twitter: @ActiveNick

Mobile Apps: www.bigbaldapps.com

LinkedIn: www.linkedin.com/in/activenick

Website: www.mobility42.com

Email: [email protected]