8/2/2019 How to Use the Speech to Text Service
1/7
Cloud Services SDK for WP7 1.0.7 Page 1
How to Use the Speech to Text (STT) Service in a WP7 Client Application
Disclaimer: This document is provided as-is. Information and views expressed in this document, including URL and
other Internet Web site references, may change without notice. You bear the risk of using it.
This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You
may copy and use this document for your internal, reference purposes.
© 2011 Microsoft Corporation. All rights reserved.
Microsoft, Visual Basic, Visual Studio, and Windows are trademarks of the Microsoft group of companies. All other
trademarks are property of their respective owners.
Introduction
This document describes the steps a developer follows to invoke the Hawaii STT API from a Windows
Phone 7 application. To highlight specific points, this article references the
SpeechRecognitionTestClient sample application provided as part of the Hawaii Cloud Services SDK for
WP7 1.0.7. You can download the SDK from the Hawaii project site at
http://research.microsoft.com/en-us/um/redmond/projects/hawaii/students/. The source code for this
sample can be found in the download location in the folder Cloud Services SDK for
WP7\1.0.7\Samples\SpeechRecognitionTestClient.
The Speech Recognition Client Library
The simplest way to communicate with the Hawaii STT service is to use the Speech Recognition Client
Library. This library is included in the SDK as source code, and it provides a set of simple APIs that allow
a client WP7 application to communicate with the Hawaii STT service. The source code for this library
is located under the SDK download folder at:
\Cloud Services SDK for WP7\Clients\SpeechRecognitionClientLibrary
As an example, look at the Visual Studio solution that implements the
SpeechRecognitionTestClient application. As shown in the following screenshot, the client library is
included in this solution as a class library project.
Create an Application Using the SpeechRecognitionClientLibrary
When writing an application using the SpeechRecognitionClientLibrary you should include the following
steps.
1. When creating your own application, include the SpeechRecognitionClientLibrary and the
HawaiiBaseClientProxy projects in your Visual Studio solution (SpeechRecognitionClientLibrary depends on
HawaiiBaseClientProxy).
Alternatively, you can build those libraries separately and add references to the resulting DLLs in
your Visual Studio solution.
2. In your client application, at the point where you want to start the STT process, create an
instance of the Speech Recognition Client Library class:
SpeechRecognitionClient service = new SpeechRecognitionClient("stt.hawaii-services.net", clientId);
The first parameter of the SpeechRecognitionClient constructor specifies the URI of the Hawaii
STT service, "stt.hawaii-services.net". The second parameter, clientId, is a Guid that
uniquely identifies the client.
In the sample SpeechRecognitionTestClient application, this part is implemented in the
MainPage.xaml.cs file.
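The document only requires clientId to be a Guid that identifies the client. If you want the same ID on every run of your application, one option is to generate it once and keep it in the application's settings. The sketch below assumes that approach; ClientIdStore and the "HawaiiClientId" key are hypothetical names, not part of the SDK. It accepts an IDictionary<string, object>, which IsolatedStorageSettings.ApplicationSettings implements on Windows Phone.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Hawaii SDK): returns a client ID that
// stays the same across application runs by keeping it in a settings store.
public static class ClientIdStore
{
    // Hypothetical settings key; any unique key name works.
    public const string SettingsKey = "HawaiiClientId";

    public static Guid GetOrCreate(IDictionary<string, object> settings)
    {
        object stored;
        if (settings.TryGetValue(SettingsKey, out stored) && stored is Guid)
        {
            // Reuse the ID generated on a previous run.
            return (Guid)stored;
        }

        // First run (or missing/invalid value): generate and remember a new ID.
        Guid clientId = Guid.NewGuid();
        settings[SettingsKey] = clientId;
        return clientId;
    }
}
```

On the device you could call ClientIdStore.GetOrCreate(IsolatedStorageSettings.ApplicationSettings) and then call Save() on the settings object so the value is persisted, and pass the result as the clientId argument.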
3. As an optional step, you can query the server for the list of available grammars. The user can
then select a grammar from this list to use as the context for the speech to text process. To
obtain the list of available grammars, call service.GetSpeechGrmmarsAsync(). You also have to
provide a callback method that the client library calls when the asynchronous call to
GetSpeechGrmmarsAsync completes.
service.SpeechGrammarsReceived += this.OnSpeechGrammarsReceived;
service.GetSpeechGrmmarsAsync();
The Speech Recognition Client Library calls OnSpeechGrammarsReceived when the asynchronous
service call completes. However, it is important to note that the STT client library makes this call
on a worker thread. In Silverlight you can only access UI elements on the main UI thread. Because
OnSpeechGrammarsReceived will most likely update the UI, directly or indirectly, you must make sure
that it runs on the main UI thread. One simple solution is, instead of subscribing
OnSpeechGrammarsReceived directly, to handle the service.SpeechGrammarsReceived event with a method
that invokes OnSpeechGrammarsReceived through Dispatcher.BeginInvoke. Dispatcher.BeginInvoke queues
the call so that OnSpeechGrammarsReceived is executed on the main UI thread. The following code
illustrates this approach:
service.SpeechGrammarsReceived += (s, e) =>
{
    // This section defines the body of what is known
    // as an anonymous method.
    // This anonymous method is the event handler method
    // for the service.SpeechGrammarsReceived event.

    // Using Dispatcher.BeginInvoke ensures that
    // OnSpeechGrammarsReceived is invoked on the main UI thread.
    this.Dispatcher.BeginInvoke(() => OnSpeechGrammarsReceived(s, e));
};
...
private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
{
    ...
}
The syntax
(s, e) => { statement; }
that you see in the code is a simple example of a lambda expression. It can be confusing the first
time you see it, but it is simply a concise way to write an inline delegate. Think of the content
inside the curly brackets as the body of a method. This is called an anonymous method because
it has no declaration that gives it a name. It is equivalent to the
following code:
service.SpeechGrammarsReceived += OnSpeechGrammarsReceivedDispatcher;
...
private void OnSpeechGrammarsReceivedDispatcher(object sender, SpeechGrammarsReceivedEventArgs e)
{
    this.Dispatcher.BeginInvoke(() => OnSpeechGrammarsReceived(sender, e));
}

private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
{
    ...
}
4. You also need to implement the event handler that is called when the asynchronous
GetSpeechGrmmarsAsync method completes, that is, the body of the OnSpeechGrammarsReceived
method mentioned above. Inside this completion handler you must do the following:
a. Check whether the call completed successfully or returned an error.
b. On successful completion, do the appropriate processing. This can be as simple as showing a
list of all available grammars.
c. In the case of an error, handle it. This could be as simple as displaying an error message.
The following code illustrates the OnSpeechGrammarsReceived method implementation.
private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
{
    if (!e.IsErrored)
    {
        // Use the response from the service. In our case the useful
        // data is in e.Grammars.
        // Each item is a string containing the name of a grammar.
    }
    else
    {
        // Display the error state.
    }
}
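When the user picks a grammar from the list returned in e.Grammars, that choice becomes the optional grammar argument used in step 5. The following sketch shows one way to turn a list selection into that argument. GrammarChoice is a hypothetical helper, and it assumes the grammar names have been copied into a List<string> (for example with new List<string>(e.Grammars), assuming e.Grammars is an IEnumerable<string>). A selectedIndex of -1 is what a Silverlight ListBox reports when nothing is selected.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of the Hawaii SDK): converts the user's
// selection from the grammar list into the optional grammar argument.
public static class GrammarChoice
{
    // Returns the selected grammar name, or null when nothing valid is selected.
    // A null result means "use the constructor overload without a grammar",
    // which, per this document, falls back to the "Dictation" grammar.
    public static string Select(IList<string> grammars, int selectedIndex)
    {
        if (grammars == null || selectedIndex < 0 || selectedIndex >= grammars.Count)
        {
            return null;
        }

        string name = grammars[selectedIndex];
        return string.IsNullOrEmpty(name) ? null : name;
    }
}
```

When Select returns null, use the two-parameter SpeechRecognitionClient constructor shown in step 2; otherwise pass the returned name as the third constructor argument.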
5. Next, you can create another instance of SpeechRecognitionClient to trigger an asynchronous call
that performs the actual speech to text processing.
// The grammar parameter is optional. The default is the "Dictation" grammar.
SpeechRecognitionClient service = new SpeechRecognitionClient("stt.hawaii-services.net", clientId, grammar);
service.SpeechRecognitionCompleted += (s, e) =>
    this.Dispatcher.BeginInvoke(() => OnSpeechRecognitionCompleted(s, e));
service.RecognizeSpeechAsync(audioBuffer);
...
private void OnSpeechRecognitionCompleted(object sender, SpeechRecognitionCompletedEventArgs e)
{
    ...
}
The audioBuffer parameter is a byte array containing the PCM audio you want to process. In the
SpeechRecognitionTestClient sample, this is the audio data returned by Microphone.GetData. When
this statement executes, the client library calls the Hawaii STT service. Because the call is
asynchronous, RecognizeSpeechAsync returns immediately, and the client application continues
to run in parallel with the
execution of the asynchronous service call. When that call completes, the completion handler
is invoked.
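Microphone.GetData delivers audio in chunks, while RecognizeSpeechAsync takes a single byte array. A minimal way to bridge the two, assuming you append each chunk as it arrives (for example from a Microphone.BufferReady handler), is to collect the chunks in a MemoryStream. AudioBufferBuilder is a hypothetical helper, not part of the SDK or the sample application.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Hawaii SDK): collects the chunks that
// Microphone.GetData returns into one byte array for RecognizeSpeechAsync.
public sealed class AudioBufferBuilder
{
    private readonly MemoryStream stream = new MemoryStream();

    // Appends the first 'count' bytes of 'chunk'; GetData returns how many
    // bytes it actually copied into the buffer.
    public void Append(byte[] chunk, int count)
    {
        if (chunk == null) throw new ArgumentNullException("chunk");
        if (count < 0 || count > chunk.Length) throw new ArgumentOutOfRangeException("count");
        stream.Write(chunk, 0, count);
    }

    // Number of bytes collected so far.
    public int Length
    {
        get { return (int)stream.Length; }
    }

    // Returns a copy of everything recorded so far.
    public byte[] ToArray()
    {
        return stream.ToArray();
    }

    // Discards the recorded audio so the builder can be reused.
    public void Clear()
    {
        stream.SetLength(0);
        stream.Position = 0;
    }
}
```

In a BufferReady handler you would call int count = microphone.GetData(chunk); followed by builder.Append(chunk, count);, and when recording stops, pass builder.ToArray() to RecognizeSpeechAsync. The helper does not enforce any duration, so the 10-second limit described under Audio Tips and Guidelines still applies.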
6. You also need to implement the event handler that is called when the asynchronous Hawaii STT
service call completes, that is, the body of the OnSpeechRecognitionCompleted method mentioned
above. Inside this completion handler you must do the following:
a. Check whether the call completed successfully or returned an error.
b. On successful completion, do the appropriate processing. This can be as simple as
updating a list with the text alternatives returned by the speech to text conversion.
c. In the case of an error, handle it. This could be as simple as displaying an error message.
The following code illustrates the OnSpeechRecognitionCompleted method implementation.
private void OnSpeechRecognitionCompleted(object sender, SpeechRecognitionCompletedEventArgs e)
{
    if (!e.IsErrored)
    {
        // Use the response from the service. In this case the relevant
        // data is in e.RecognitionResults.
        // Each item is a string that represents one possible text translation.
    }
    else
    {
        // Display the error state.
    }
}
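Because RecognitionResults lists its alternatives in descending order of confidence (see step 7), the success branch of the handler can show the first entry as the answer and the rest as alternatives. The sketch below assumes RecognitionResults is a list of strings that can be passed as an IList<string>; RecognitionResultFormatter is a hypothetical helper, not part of the SDK.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the Hawaii SDK): turns the strings from
// e.RecognitionResults into display text. The list is assumed to be ordered
// from highest to lowest confidence, as this document describes.
public static class RecognitionResultFormatter
{
    // Returns the highest-confidence text, or null if there are no results.
    public static string Best(IList<string> results)
    {
        return (results != null && results.Count > 0) ? results[0] : null;
    }

    // Returns one numbered line per alternative, best first, e.g. "1. hello".
    public static string ToNumberedList(IList<string> results)
    {
        if (results == null)
        {
            return string.Empty;
        }

        var builder = new StringBuilder();
        for (int i = 0; i < results.Count; i++)
        {
            builder.AppendFormat("{0}. {1}", i + 1, results[i]);
            builder.AppendLine();
        }
        return builder.ToString();
    }
}
```

Since the handler is invoked through Dispatcher.BeginInvoke, the success branch can safely assign the output of these helpers to UI elements.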
7. On successful completion, the result of the STT process is provided by the e.RecognitionResults
property.
// Gets or sets the list of recognized texts.
List<string> SpeechRecognitionCompletedEventArgs.RecognitionResults
RecognitionResults is a list of 10 strings, each representing a possible text for the speech
identified by the STT service. The strings are listed in descending order of recognition
confidence, with the first string having the highest confidence. For more information
on the classes and properties, see the Cloud Services SDK for WP7 help file, Cloud Services SDK
for WP7.chm, at ..\Microsoft Research\Cloud Services SDK for WP7\1.0.7\Documentation or
wherever you downloaded the SDK. The following class diagram shows the STT service results.
Audio Tips and Guidelines
Use the following tips and guidelines for the STT service.
Limit speech input to a maximum of 10 seconds. The Speech to Text service supports up to 10 seconds
of speech. Longer audio streams result in the error "Null/Invalid response object from server".
You may get lower-quality Speech to Text results on the Dell Venue.
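To stay within the 10-second limit, you can check the buffer before calling RecognizeSpeechAsync. The sketch below assumes 16-bit PCM samples and takes the sample rate and channel count as parameters; on the device you can read the rate from Microphone.Default.SampleRate. SpeechLengthLimit is a hypothetical helper, and trimming is only one possible policy.

```csharp
using System;

// Hypothetical helper (not part of the Hawaii SDK): keeps a PCM buffer within
// the 10-second limit. Assumes 16-bit (2-byte) samples.
public static class SpeechLengthLimit
{
    public const int MaxSeconds = 10;

    // Number of bytes in 'seconds' of 16-bit PCM audio at the given rate.
    public static int MaxBytes(int sampleRate, int channels, int seconds)
    {
        const int bytesPerSample = 2; // 16-bit PCM
        return sampleRate * channels * bytesPerSample * seconds;
    }

    // Returns the buffer unchanged if it fits; otherwise returns a copy of its
    // first MaxSeconds of audio. The limit is always a whole number of sample
    // frames because it is a multiple of channels * bytesPerSample.
    public static byte[] Trim(byte[] pcm, int sampleRate, int channels)
    {
        if (pcm == null) throw new ArgumentNullException("pcm");

        int limit = MaxBytes(sampleRate, channels, MaxSeconds);
        if (pcm.Length <= limit)
        {
            return pcm;
        }

        byte[] trimmed = new byte[limit];
        Array.Copy(pcm, trimmed, limit);
        return trimmed;
    }
}
```

Trimming silently drops any speech after the limit. An alternative design is to stop recording with Microphone.Stop() once the collected audio reaches MaxBytes, so the user knows recording has ended.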
Conclusion
Your client application can now call the Hawaii STT service, and your event handler will do the
appropriate processing when the asynchronous Hawaii STT service call completes.