Cloud Services SDK for WP7 1.0.7

How to Use the Speech to Text (STT) Service in a WP7 Client Application

Disclaimer: This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved.

Microsoft, Visual Basic, Visual Studio, and Windows are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Introduction

This document describes the steps a developer follows to invoke the Hawaii STT API from a Windows Phone 7 application. To highlight specific points, it refers to the SpeechRecognitionTestClient sample application provided as part of the Hawaii Cloud Services SDK for WP7 1.0.7. You can download the SDK here (http://research.microsoft.com/en-us/um/redmond/projects/hawaii/students/). The source code for this sample is in the download location, in the folder Cloud Services SDK for WP7\1.0.7\Samples\SpeechRecognitionTestClient.

The Speech Recognition Client Library

The simplest way to communicate with the Hawaii STT service is to use the Speech Recognition Client Library. This library is included in the SDK as source code, and it provides a set of simple APIs that allow a client WP7 application to communicate with the Hawaii STT service. The source code for this library is under the SDK download folder at:

    \Cloud Services SDK for WP7\Clients\SpeechRecognitionClientLibrary

As an example, you can look at the Visual Studio solution that implements the SpeechRecognitionTestClient application. As shown in the following screenshot, the client library is included in this solution as a class library project.

Create an Application Using the SpeechRecognitionClientLibrary

When writing an application that uses the SpeechRecognitionClientLibrary, follow these steps.

1. When creating your own application, you can include the SpeechRecognitionClientLibrary and the HawaiiBaseClientProxy projects in your Visual Studio solution (SpeechRecognitionClientLibrary has a dependency on HawaiiBaseClientProxy).

Alternatively, you can build those libraries separately and reference the resulting DLLs from your Visual Studio solution.



2. In your client application, at the point where you want to initiate the STT process, create an instance of the Speech Recognition Client Library class:

    SpeechRecognitionClient service = new SpeechRecognitionClient("stt.hawaii-services.net", clientId);

The first parameter of the SpeechRecognitionClient constructor specifies the URI of the Hawaii STT service, "stt.hawaii-services.net". The second parameter, clientId, is a Guid that should uniquely identify the client.

In the sample SpeechRecognitionTestClient application, this part is implemented in the MainPage.xaml.cs file.

3. As an optional step, you can query the server for the list of available grammars. The user can then select a particular grammar from this list to use as the context in which the speech-to-text process runs. To obtain the list of available grammars, call service.GetSpeechGrmmarsAsync(). You must also provide a callback method that the client library calls when the asynchronous call to GetSpeechGrmmarsAsync completes.

    service.SpeechGrammarsReceived += this.OnSpeechGrammarsReceived;
    service.GetSpeechGrmmarsAsync();

The Speech Recognition Client Library calls OnSpeechGrammarsReceived when the asynchronous service call completes. Note, however, that the library makes this call on a worker thread, and in Silverlight you can access UI elements only on the main UI thread. Because OnSpeechGrammarsReceived will most likely set UI elements, directly or indirectly, you must make sure that it executes on the main UI thread. One simple solution is to attach to the service.SpeechGrammarsReceived event a method that invokes OnSpeechGrammarsReceived through Dispatcher.BeginInvoke, which ensures that OnSpeechGrammarsReceived is executed on the main UI thread. The following code illustrates this process:

    service.SpeechGrammarsReceived += (s, e) =>
    {
        // This section defines the body of what is known
        // as an anonymous method.
        // This anonymous method is the event handler method
        // for the service.SpeechGrammarsReceived event.

        // Using Dispatcher.BeginInvoke ensures that
        // OnSpeechGrammarsReceived is invoked on the main UI thread.
        this.Dispatcher.BeginInvoke(() => OnSpeechGrammarsReceived(s, e));
    };

    ...

    private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
    {
        ...
    }

The syntax

    (s, e) => { statement; }

that you see in the code is a simple example of a lambda expression. It can be confusing when seen for the first time, but it is a simple way to write an inline delegate. Think of the content inside the curly brackets as the content of a method. This is called an anonymous method because it does not have a declaration in which you provide a name for it. It is equivalent to the following code:

    service.SpeechGrammarsReceived += OnSpeechGrammarsReceivedDispatcher;

    ...

    private void OnSpeechGrammarsReceivedDispatcher(object sender, SpeechGrammarsReceivedEventArgs e)
    {
        this.Dispatcher.BeginInvoke(() => OnSpeechGrammarsReceived(sender, e));
    }

    private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
    {
        ...
    }
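The thread hand-off described in this step can be tried outside a phone project. The following is a minimal sketch, not SDK code: QueueDispatcher, FakeService, and MarshalingDemo are hypothetical stand-ins invented for illustration (the real application uses the Silverlight Dispatcher and SpeechRecognitionClient). It shows an event raised on a worker thread whose handler only re-posts the real work, so that the work runs on the thread that owns the queue.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the UI dispatcher: actions are queued from any
// thread and run later on whichever thread calls RunPending().
public class QueueDispatcher
{
    private readonly Queue<Action> pending = new Queue<Action>();
    private readonly object gate = new object();

    public void BeginInvoke(Action action)
    {
        lock (gate) { pending.Enqueue(action); }
    }

    // Runs every queued action on the calling thread; returns how many ran.
    public int RunPending()
    {
        int count = 0;
        while (true)
        {
            Action next;
            lock (gate)
            {
                if (pending.Count == 0) { return count; }
                next = pending.Dequeue();
            }
            next();
            count++;
        }
    }
}

// Hypothetical stand-in for a service client that raises its completion
// event on a worker thread.
public class FakeService
{
    public event EventHandler<EventArgs> Completed;

    // Starts the worker and returns immediately; the Thread is returned
    // only so the demo can wait for it.
    public Thread StartAsync()
    {
        Thread worker = new Thread(() =>
        {
            EventHandler<EventArgs> handler = Completed;
            if (handler != null) { handler(this, EventArgs.Empty); }
        });
        worker.Start();
        return worker;
    }
}

public static class MarshalingDemo
{
    // Returns true when the re-posted work ran on the thread that called Run().
    public static bool Run()
    {
        int callerThreadId = Thread.CurrentThread.ManagedThreadId;
        int handlerThreadId = -1;
        QueueDispatcher dispatcher = new QueueDispatcher();
        FakeService service = new FakeService();

        // Same shape as the lambda above: the event fires on the worker
        // thread, and the real work is re-posted through the dispatcher.
        service.Completed += (s, e) =>
            dispatcher.BeginInvoke(() => handlerThreadId = Thread.CurrentThread.ManagedThreadId);

        service.StartAsync().Join();   // wait for the worker to raise the event
        dispatcher.RunPending();       // the owning thread drains its queue
        return handlerThreadId == callerThreadId;
    }
}
```

Calling MarshalingDemo.Run() from any thread should report that the queued work executed on that same thread, which is the guarantee Dispatcher.BeginInvoke gives for the UI thread in the real application.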

4. You also need to implement the event handler that is called when the asynchronous GetSpeechGrmmarsAsync method completes. This means implementing the body of the previously mentioned OnSpeechGrammarsReceived method. Inside this completion event handler you must do the following:

   a. Check whether the call completed successfully or had an error.

   b. On successful completion, do the appropriate processing. This can be as simple as showing a list of all available grammars.

   c. In the case of an error, take care of the error handling. This could be as simple as displaying an error message.

The following code illustrates the OnSpeechGrammarsReceived method implementation.

    private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
    {
        if (!e.IsErrored)
        {
            // Use the response from the service. In our case the useful
            // data is in e.Grammars.
            // Each item is a string containing the name of a grammar.
        }
        else
        {
            // Display the error state.
        }
    }
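Once the grammar names have arrived, the application has to decide which one to pass to the recognition call. The following is a minimal sketch of one way to do that, assuming the names are plain strings as described above. GrammarPicker is a hypothetical helper, not part of the SDK, and the case-insensitive comparison is an assumption; the "Dictation" fallback mirrors the default grammar named in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the SDK) for choosing a grammar name.
public static class GrammarPicker
{
    // "Dictation" is the default grammar named in this document.
    public const string DefaultGrammar = "Dictation";

    // Returns the listed grammar that matches the requested name, or the
    // default when the request is missing or not in the list.
    public static string Choose(IEnumerable<string> available, string requested)
    {
        if (available != null && requested != null)
        {
            foreach (string name in available)
            {
                // Assumption: grammar names can be compared ignoring case.
                if (string.Equals(name, requested, StringComparison.OrdinalIgnoreCase))
                {
                    return name;
                }
            }
        }
        return DefaultGrammar;
    }
}
```

Returning the name exactly as the service listed it, rather than the user's spelling, keeps the value passed to the constructor consistent with what the server reported.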

5. Next, you can create another instance of SpeechRecognitionClient to trigger an asynchronous call that does the actual speech-to-text processing.

    // The grammar parameter is optional. The default is "Dictation" grammar.
    SpeechRecognitionClient service = new SpeechRecognitionClient("stt.hawaii-services.net", clientId, grammar);

    service.SpeechRecognitionCompleted += (s, e) =>
        this.Dispatcher.BeginInvoke(() => OnSpeechRecognitionCompleted(s, e));

    service.RecognizeSpeechAsync(audioBuffer);

    ...

    private void OnSpeechRecognitionCompleted(object sender, SpeechRecognitionCompletedEventArgs e)
    {
        ...
    }

The audioBuffer parameter is a byte array containing the PCM audio data you want to process. In the SpeechRecognitionTestClient sample, this is the audio data returned by Microphone.GetData. When the RecognizeSpeechAsync statement is executed, a call to the Hawaii STT service is made. Because the call is performed asynchronously, RecognizeSpeechAsync returns immediately, and the client application continues to execute in parallel with the asynchronous service call. At some point that call completes and the completion handler is invoked.

6. You also need to implement the event handler that is called when the asynchronous Hawaii STT service call completes. This means implementing the body of the previously mentioned OnSpeechRecognitionCompleted method. Inside this completion event handler you must do the following:

   a. Check whether the call completed successfully or had an error.

   b. On successful completion, do the appropriate processing. This can be as simple as updating a list with the text options provided by the speech-to-text translation.

   c. In the case of an error, take care of the error handling. This could be as simple as displaying an error message.

The following code illustrates the OnSpeechRecognitionCompleted method implementation.

    private void OnSpeechRecognitionCompleted(object sender, SpeechRecognitionCompletedEventArgs e)
    {
        if (!e.IsErrored)
        {
            // Use the response from the service. In this case the relevant
            // data is in e.RecognitionResults.
            // Each item is a string that represents one possible text translation.
        }
        else
        {
            // Display the error state.
        }
    }

7. On successful completion, the result of the STT process is provided by the e.RecognitionResults property.

    // Gets or sets the list of recognized texts.
    List<string> SpeechRecognitionCompletedEventArgs.RecognitionResults

RecognitionResults is a list of 10 strings, each representing a possible text string for the speech identified by the STT service. The strings are listed in descending order of recognition confidence, with the first string having the highest confidence level. For more information on the classes and properties, see the Cloud Services SDK for WP7 help file, Cloud Services SDK for WP7.chm, at ..\Microsoft Research\Cloud Services SDK for WP7\1.0.7\Documentation or wherever you downloaded the SDK. The following class diagram shows the STT service results.

[Class diagram: STT service results]
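Because the candidates arrive ordered by confidence, picking the text to display is a matter of taking the first entry and, optionally, offering the next few as alternatives. The following is a minimal sketch under that assumption. RecognitionResultHelper is a hypothetical helper, not part of the SDK, and it works on any list of strings, so it can be exercised without calling the service.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of the SDK) for reading a result list that
// is ordered from highest to lowest recognition confidence.
public static class RecognitionResultHelper
{
    // Returns the highest-confidence candidate (the first entry),
    // or null when the list is missing or empty.
    public static string BestCandidate(IList<string> results)
    {
        if (results == null || results.Count == 0)
        {
            return null;
        }
        return results[0];
    }

    // Returns up to 'count' candidates that follow the best one,
    // preserving the service's ordering.
    public static List<string> Alternatives(IList<string> results, int count)
    {
        List<string> alternatives = new List<string>();
        if (results == null)
        {
            return alternatives;
        }
        for (int i = 1; i < results.Count && alternatives.Count < count; i++)
        {
            alternatives.Add(results[i]);
        }
        return alternatives;
    }
}
```

Inside OnSpeechRecognitionCompleted, a handler might pass e.RecognitionResults to BestCandidate to fill a text box and show Alternatives in a pick list so the user can correct a misrecognition.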

Audio Tips and Guidelines

Use the following tips and guidelines for the STT service.

• Limit speech input to a maximum of 10 seconds. Up to 10 seconds of speech is supported by the Speech to Text service. Audio streams longer than this will result in the error "Null/Invalid response object from server."

• You may experience lower-quality results on Speech-to-Text services with the Dell Venue.
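The 10-second limit can be checked on the client before the buffer is sent. The following is a minimal sketch, assuming the buffer holds raw, uncompressed PCM samples with no file header. AudioLimit is a hypothetical helper, not part of the SDK, and the caller supplies the capture format, since this document does not state it.

```csharp
// Hypothetical helper (not part of the SDK) that converts the 10-second
// limit into a byte count for raw, uncompressed PCM audio.
public static class AudioLimit
{
    // Maximum supported speech length, as stated in this document.
    public const int MaxSeconds = 10;

    // Number of bytes of PCM audio needed for the given duration.
    public static int PcmByteCount(int sampleRate, int bitsPerSample, int channels, int seconds)
    {
        return sampleRate * (bitsPerSample / 8) * channels * seconds;
    }

    // True when a buffer of the given length is no longer than MaxSeconds.
    public static bool IsWithinLimit(int bufferLength, int sampleRate, int bitsPerSample, int channels)
    {
        return bufferLength <= PcmByteCount(sampleRate, bitsPerSample, channels, MaxSeconds);
    }
}
```

For example, if the microphone captured 16-bit mono audio at 16,000 samples per second (an assumed format), the limit works out to 16,000 × 2 × 1 × 10 = 320,000 bytes, and a longer buffer could be trimmed or rejected before calling RecognizeSpeechAsync.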

Conclusion

Your client application can now call the Hawaii STT service, and your event handler will do the appropriate processing when the asynchronous Hawaii STT service call completes.
