how to use the speech to text service

Upload: ayman-farhat

Post on 06-Apr-2018


  • 8/2/2019 How to Use the Speech to Text Service


    Cloud Services SDK for WP7 1.0.7 Page 1

    How to Use the Speech to Text (STT) Service in a WP7 Client Application

    Disclaimer: This document is provided as-is. Information and views expressed in this document, including URL and

    other Internet Web site references, may change without notice. You bear the risk of using it.

    This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You

    may copy and use this document for your internal, reference purposes.

    © 2011 Microsoft Corporation. All rights reserved.

    Microsoft, Visual Basic, Visual Studio, and Windows are trademarks of the Microsoft group of companies. All other

    trademarks are property of their respective owners.

    Introduction

    This document describes the steps a developer follows to invoke the Hawaii STT API from a Windows
    Phone 7 application. To highlight specific points, this article references the
    SpeechRecognitionTestClient sample application provided as part of the Hawaii Cloud Services SDK for
    WP7 1.0.7. You can download the SDK here. The source code for this sample can be found in the
    download location in the folder Cloud Services SDK for
    WP7\1.0.7\Samples\SpeechRecognitionTestClient.

    The Speech Recognition Client Library

    The simplest way to communicate with the Hawaii STT service is to use the Speech Recognition Client
    Library. This library is included in the SDK as source code and provides a set of simple APIs that allow
    a client WP7 application to communicate with the Hawaii STT service. The source code
    for this library is in the SDK download folder at:

    \Cloud Services SDK for WP7\Clients\SpeechRecognitionClientLibrary

    As an example, you can look at the Visual Studio solution that implements the
    SpeechRecognitionTestClient application. As shown in the following screenshot, the client library is
    included in this solution as a class library project.

    http://research.microsoft.com/en-us/um/redmond/projects/hawaii/students/

    Create an Application Using the SpeechRecognitionClientLibrary

    When writing an application that uses the SpeechRecognitionClientLibrary, follow these steps.

    1. When creating your own application, you can include the SpeechRecognitionClientLibrary and
    the HawaiiBaseClientProxy projects in your Visual Studio solution (SpeechRecognitionClientLibrary
    has a dependency on HawaiiBaseClientProxy).
    Alternatively, you can build those libraries separately and reference the resulting DLLs in
    your Visual Studio solution.


    2. In your client application, at the point where you want to initiate the STT process, create
    an instance of the Speech Recognition Client Library class:

    SpeechRecognitionClient service = new SpeechRecognitionClient("stt.hawaii-services.net", clientId);

    The first parameter of the SpeechRecognitionClient constructor specifies the URI of the Hawaii
    STT service, "stt.hawaii-services.net". The second parameter, clientId, is a Guid that should
    uniquely identify the client.

    In the sample SpeechRecognitionTestClient application, this part is implemented in the

    MainPage.xaml.cs file.

    3. As an optional step, you can query the server for the list of available grammars. The user can
    then select a particular grammar from this list to use as the context in which the
    speech to text process runs. To obtain the list of available grammars, call
    service.GetSpeechGrmmarsAsync(). You must also provide a callback method that the client library
    calls when the asynchronous call to GetSpeechGrmmarsAsync completes.

    service.SpeechGrammarsReceived += this.OnSpeechGrammarsReceived;
    service.GetSpeechGrmmarsAsync();

    The Speech Recognition client library calls OnSpeechGrammarsReceived when the asynchronous
    service call completes. However, the STT client library makes this call on a worker thread, and
    in Silverlight you can access UI elements only on the main UI thread. Because
    OnSpeechGrammarsReceived will most likely set elements in the UI, directly or indirectly, you
    must make sure that it executes on the main UI thread. One simple solution is to attach to the
    service.SpeechGrammarsReceived event a method that invokes OnSpeechGrammarsReceived via
    Dispatcher.BeginInvoke, which ensures that OnSpeechGrammarsReceived is executed on
    the main UI thread. The following code illustrates this process:

    service.SpeechGrammarsReceived += (s, e) =>
    {
        // This section defines the body of what is known
        // as an anonymous method.
        // This anonymous method is the event handler method
        // for the service.SpeechGrammarsReceived event.

        // Using Dispatcher.BeginInvoke ensures that
        // OnSpeechGrammarsReceived is invoked on the main UI thread.
        this.Dispatcher.BeginInvoke(() => OnSpeechGrammarsReceived(s, e));
    };


    ...

    private void OnSpeechGrammarsReceived(object sender,
        SpeechGrammarsReceivedEventArgs e)
    {
        ...
    }

    The syntax

    (s, e) => { statement ;}

    that you see in the code is a simple example of a lambda expression. It can be confusing when

    seen for the first time but is a simple way to write an inline delegate. Think of the content

    inside the curly brackets as the content of a method. This is called an anonymous method since

    it does not have a declaration in which you provide a name for it. It is equivalent to the

    following code:

    service.SpeechGrammarsReceived += OnSpeechGrammarsReceivedDispatcher;
    ...

    private void OnSpeechGrammarsReceivedDispatcher(
        object sender, SpeechGrammarsReceivedEventArgs e)
    {
        this.Dispatcher.BeginInvoke(() => OnSpeechGrammarsReceived(sender, e));
    }

    private void OnSpeechGrammarsReceived(object sender,
        SpeechGrammarsReceivedEventArgs e)
    {
        ...
    }

    4. You also need to implement the event handler that is called when the asynchronous
    GetSpeechGrmmarsAsync method completes. This means implementing the body of the
    previously mentioned OnSpeechGrammarsReceived method. Inside the completion event
    handler you must do the following:

    a. Check whether the call completed successfully or had an error.


    b. On successful completion, do the appropriate processing. This can be as simple as
    showing a list of all available grammars.

    c. In the case of an error, handle the error. This could be as simple as
    displaying an error message. The following code illustrates the
    OnSpeechGrammarsReceived method implementation.

    private void OnSpeechGrammarsReceived(object sender, SpeechGrammarsReceivedEventArgs e)
    {
        if (!e.IsErrored)
        {
            // Use the response from the service. In our case the useful
            // data is in e.Grammars.
            // Each item is a string containing the name of a grammar.
        }
        else
        {
            // Display the error state.
        }
    }

    5. Next, you can create another instance of SpeechRecognitionClient to trigger an asynchronous
    call that does the actual speech to text processing.

    // The grammar parameter is optional. The default is the "Dictation" grammar.
    SpeechRecognitionClient service = new SpeechRecognitionClient("stt.hawaii-services.net", clientId, grammar);

    service.SpeechRecognitionCompleted += (s, e) =>
        this.Dispatcher.BeginInvoke(() => OnSpeechRecognitionCompleted(s, e));

    service.RecognizeSpeechAsync(audioBuffer);

    ...

    private void OnSpeechRecognitionCompleted(object sender, SpeechRecognitionCompletedEventArgs e)
    {
        ...
    }

    The audioBuffer parameter is a byte array containing the PCM audio data you
    want to process. In the SpeechRecognitionTestClient sample this is the audio data
    returned by Microphone.GetData. When the RecognizeSpeechAsync statement is executed, a call to the
    Hawaii STT service is made. Because the call is performed asynchronously, RecognizeSpeechAsync
    returns immediately. The execution of the client application will continue in parallel with the


    execution of the asynchronous service call. At some point that call will complete and the
    completion handler will be invoked.

    6. You also need to implement the event handler that is called when the asynchronous Hawaii STT
    service call completes. This means implementing the body of the previously mentioned
    OnSpeechRecognitionCompleted method. Inside the completion event handler you must do
    the following:

    a. Check whether the call completed successfully or had an error.

    b. On successful completion, do the appropriate processing. This can be as simple as
    updating a list with the text options returned by the speech to text process.

    c. In the case of an error, handle the error. This could be as simple as
    displaying an error message. The following code illustrates the
    OnSpeechRecognitionCompleted method implementation.

    private void OnSpeechRecognitionCompleted(object sender,
        SpeechRecognitionCompletedEventArgs e)
    {
        if (!e.IsErrored)
        {
            // Use the response from the service. In this case the relevant
            // data is in e.RecognitionResults.
            // Each item is a string that represents one possible text translation.
        }
        else
        {
            // Display the error state.
        }
    }

    7. On successful completion, the result of the STT process is provided by the e.RecognitionResults
    property.

    // Gets or sets the list of recognized texts.
    List<string> SpeechRecognitionCompletedEventArgs.RecognitionResults

    RecognitionResults is a list of 10 strings each representing a possible text string for the speech

    identified by the STT service. The strings are listed in descending order of their recognition

    confidence level with the first string having the highest confidence level. For more information

    on the classes and properties see the Cloud Services SDK for WP7 help file, Cloud Services SDK

    for WP7.chm, at ..\Microsoft Research\Cloud Services SDK for WP7\1.0.7\Documentation or


    wherever you downloaded the SDK. The following class diagram shows the STT service results.
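To make steps 6 and 7 concrete, here is a minimal sketch of the result-selection logic a completion handler might use. It is not SDK code: FakeRecognitionEventArgs is a hypothetical stand-in for SpeechRecognitionCompletedEventArgs that models only the two members this document mentions (IsErrored and RecognitionResults), and RecognitionResultPicker is an illustrative helper name. It assumes the results arrive in descending confidence order, as step 7 states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SpeechRecognitionCompletedEventArgs.
// Only the members named in this document are modeled.
public class FakeRecognitionEventArgs : EventArgs
{
    public bool IsErrored { get; set; }
    public List<string> RecognitionResults { get; set; }
}

public static class RecognitionResultPicker
{
    // Returns the first (highest-confidence) text, or null when the call
    // failed or produced no candidates. Relies on the descending confidence
    // ordering described in step 7.
    public static string BestResult(FakeRecognitionEventArgs e)
    {
        if (e == null || e.IsErrored)
        {
            return null;
        }
        if (e.RecognitionResults == null || e.RecognitionResults.Count == 0)
        {
            return null;
        }
        return e.RecognitionResults[0];
    }
}
```

In a real application the same checks would sit inside OnSpeechRecognitionCompleted, operating on the SDK's own event args type, with a null result routed to whatever error display the application uses.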

    Audio Tips and Guidelines

    Use the following tips and guidelines for the STT service.

    Limit speech input to a maximum of 10 seconds. The Speech to Text service supports up to
    10 seconds of speech. Audio streams longer than this result in the error "Null/Invalid
    response object from server".

    You may experience lower-quality results from the Speech to Text service with the Dell Venue.
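One way to respect the 10-second limit is to estimate the duration of the PCM buffer before sending it. The sketch below is illustrative only: PcmDurationCheck is a hypothetical helper, and the capture format (16 kHz, 16-bit, mono) is an assumption that this document does not state. A real application should take the sample rate from its capture device instead of a constant.

```csharp
using System;

public static class PcmDurationCheck
{
    // Assumed capture format; not stated in this document.
    public const int AssumedSampleRateHz = 16000;
    public const int AssumedBytesPerSample = 2; // 16-bit samples
    public const int AssumedChannels = 1;       // mono

    // Limit taken from the tip above.
    public const double MaxSpeechSeconds = 10.0;

    // Duration of raw PCM data: bytes divided by bytes per second.
    public static double DurationSeconds(int byteCount)
    {
        int bytesPerSecond = AssumedSampleRateHz * AssumedBytesPerSample * AssumedChannels;
        return (double)byteCount / bytesPerSecond;
    }

    // True when the buffer is short enough to send to the service.
    public static bool IsWithinLimit(byte[] audioBuffer)
    {
        if (audioBuffer == null)
        {
            throw new ArgumentNullException("audioBuffer");
        }
        return DurationSeconds(audioBuffer.Length) <= MaxSpeechSeconds;
    }
}
```

Under these assumptions one second of audio is 32,000 bytes, so the limit corresponds to a buffer of 320,000 bytes. Checking IsWithinLimit(audioBuffer) before calling RecognizeSpeechAsync lets the application prompt the user to record again before the service returns an error.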

    Conclusion

    Your client application can now call the Hawaii STT service, and your event handler will do the
    appropriate processing when the asynchronous Hawaii STT service call completes.