
Log Analysis and Tuning Using Analytics and Tuning Studio

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted in examples herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

© 2007 Microsoft Corporation. All rights reserved.

Microsoft, SQL Server, and Windows are trademarks of the Microsoft group of companies.

All other trademarks are property of their respective owners.


Table of Contents

1 Overview: Log Analysis and Tuning
1.1 Log Analysis and Tuning Scenarios
1.2 Trace Logging Configuration
1.2.1 Logging Grammars
1.3 Log Analysis Framework
1.4 Introducing Analytics and Tuning Studio
2 Developing Applications for Easier Log Analysis
2.1 Define Tasks and Set Task Status
2.2 Label Sessions for Later Reporting
2.3 Skip Logging for Confidential Data
2.4 Use Application Log Events for Debugging
2.5 Follow Naming Guidelines
3 Log Analysis and Tuning: Analytics and Tuning Studio
3.1 Reducing No-Recognitions Due to Out-of-Grammar Utterances
3.1.1 Measuring Improvement with a New Grammar
3.2 Session Analysis
3.2.1 Find Repeat Callers
3.2.2 Errors Affecting User Calls
3.2.3 Locating Calls with a Custom Label
3.3 Task Analysis
3.3.1 Find the Most Popular Task and Task Flow
3.3.2 Find Tasks on Which User Has Most Trouble
3.4 Turn Analysis
3.4.1 Finding Recognition Issues
3.4.2 Finding the Most Popular Response to a Turn
3.4.3 Turns with Unexpected TTS
4 Best Practices for Speech Application Tuning


1 Overview: Log Analysis and Tuning

This section provides an overview of analysis and tuning scenarios and summarizes the Microsoft Office Communications Server 2007 Speech Server trace logging configurations and Analytics and Tuning Studio.

1.1 Log Analysis and Tuning Scenarios

Log analysis and tuning is part of the speech application development cycle. Successful voice response applications in deployment are those that have been tuned on data samples that represent the general caller population and that reflect how real end users behave.

For example, during the design phase developers can try to predict the full range of input that users are likely to speak in response to a given prompt. After collecting data from a user trial, developers can examine what users really said to the system and update the grammars accordingly.

Predeployment Trials

Many developers roll out their applications in gradual phases to ensure the highest possible quality when it is time for full deployment. After design, debugging, and initial installation are complete, a phased deployment model such as the following is typically used:

1. Initial pilot. The phone number of the system is given to colleagues, friends, and family with instructions to try out some tasks.

2. Trial phase. A broader set of trial users is selected (either a subset of the expected caller set or a more extensive range of acquaintances) and more data is gathered.

3. Further trials. One or more trials can be carried out with a broader or different set of users.

4. Full rollout. This is followed by continual monitoring. During the full rollout, set the trace logging configuration to Default, and enable verbose logging periodically during the monitoring period (see Trace Logging Configuration in this document).

After each phase, different parts of the application can be tuned based on the user behavior observed, explicit user feedback about the system, or both.

What Should Be Analyzed and Tuned?

Log analysis offers the developer firsthand experience of users’ interactions with the system. All user-facing components of the application can be tuned to improve the system. Not only can big problems be uncovered, such as bugs that did not surface during the developer's own testing, but considerable improvements can be made to the user experience:

Grammars can be tuned to cover the most likely input.

Prompts can be made clearer.

Dialog flow can be tuned toward more efficient task completion.

Confidence thresholds can be optimized to minimize unnecessary confirmations.

Timeouts can be optimized for efficient collection of user input.

Commands can be enabled where users naturally expect them (and removed where they do not).

1.2 Trace Logging Configuration

Speech Server logs data in the form of binary ETL files. By default, logging is limited to the minimal set of events needed to generate the reports provided with Speech Server.


Logging Tuning Events and Audio for a Tuning Cycle

Use the following steps to enable logging for Speech Server log analysis:

1. In the Speech Server Administrator console, right-click the server name.

2. Click Properties.

3. Click the Trace Logging tab.

4. Make sure the Enable trace logging check box is selected.

5. Select the Analytics and tuning events and Application events check boxes.

6. Select the All audio for check box, and then enter 100 in the percent of calls box.

Note: Logging audio can make ETL files larger. Decide whether to log audio, and for what percentage of calls, based on the available disk space.

7. Click OK to close the properties window.

1.2.1 Logging Grammars

Grammars are logged automatically at load time when logging of Analytics and tuning events is enabled. Speech Server loads grammars at two points: at startup, if they are specified in the application manifest file, and at runtime, when they are required. Grammar Tuning Advisor can use the logged grammars together with the recognition audio from the ETL files; logged grammars are needed when the original grammars are not available. If the service was started before logging of tuning events was enabled, you might need to refresh the cache so that the grammars are logged correctly in the ETL files. To refresh the cache:


1. In the Speech Server Administrator console, right-click the server name.

2. Click All Tasks, and then click Update Cached Resources.

1.3 Log Analysis Framework

Importing Log File Data

To view log data using Analytics and Tuning Studio, it is necessary to import it into a SQL Server database. The import process is executed using the command-line utility MSSLogToDatabase.


By default, MSSLogToDatabase imports only the user input audio. During a tuning cycle, pass the /audio:session parameter on the command line to import the complete session audio.

When log data is needed only for the reports, pass the /filter:reports parameter on the command line.

1.4 Introducing Analytics and Tuning Studio

Analytics and Tuning Studio provides an easy way to see log information sorted into different views based on the granularity of information. It has three levels of granularity: session, task, and turn. Typically, a session contains one or more tasks, and a task contains one or more turns.

Session. A session is generally equivalent to a call.

Tasks. A task is a focused dialog in the call whose aim is to get a specific set of information to achieve a goal. In a voice response workflow application, it is analogous to a SpeechSequenceActivity as defined by the developer. A session can contain multiple tasks. A task can contain multiple subtasks.

Turns. A turn is usually associated with a prompt or with a prompt and response. A task or session can contain multiple turns.
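The session, task, and turn hierarchy can be pictured as a simple nested data structure. The following Python sketch is a hypothetical model for illustration only: the class and field names are assumptions, not the Analytics and Tuning Studio database schema. It counts every turn in a session, including turns inside tasks and subtasks.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical model of the session > task > turn hierarchy (illustration only).
@dataclass
class Turn:
    name: str  # name of the dialog element, e.g. a QuestionAnswer activity


@dataclass
class Task:
    name: str
    status: str = "Unset"  # Success, Failure, or Unset (Unset is the default)
    turns: List[Turn] = field(default_factory=list)
    subtasks: List["Task"] = field(default_factory=list)

    def total_turns(self) -> int:
        # Turns directly in this task plus, recursively, turns in its subtasks.
        return len(self.turns) + sum(t.total_turns() for t in self.subtasks)


@dataclass
class Session:  # generally equivalent to one call
    session_id: str
    tasks: List[Task] = field(default_factory=list)
    turns: List[Turn] = field(default_factory=list)  # turns outside any task

    def total_turns(self) -> int:
        return len(self.turns) + sum(t.total_turns() for t in self.tasks)


call = Session(
    "call-1",
    turns=[Turn("Welcome")],
    tasks=[
        Task(
            "GetDate",
            turns=[Turn("AskDate"), Turn("ConfirmDate")],
            subtasks=[Task("GetYear", turns=[Turn("AskYear")])],
        )
    ],
)
print(call.total_turns())  # 4
```

Counting subtask turns as part of the parent task mirrors the way a task's total turn count is used later in this document, where turns in subtasks are included.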

Analytics and Tuning Studio provides three levels of detail for each session, task, or turn view: report, list, and detail.

Report. Gives a quick reference to the most important details for the item in the form of colored graphs and charts.

List. Gives one line of data per session, task, or turn.

Detail. Gives detailed information for each session, task, and turn and any tasks or turns within them.


2 Developing Applications for Easier Log Analysis

Making the correct decisions at application development time makes tuning the application from the logs much easier. The following are suggestions a developer can use during application development.

2.1 Define Tasks and Set Task Status

Defining tasks during voice response application development is useful for tracking business metrics, such as transaction completion rates, and for gauging the usability of parts of the application. The developer can use tasks to measure characteristics (for example, the average duration and number of repetitions) of different dialog paths in the application.

The use of tasks allows the developer to:

Structure the application into a hierarchy of tasks and subtasks.

Provide a means to signal the success or failure of the task.

The Task Report view in Analytics and Tuning Studio provides a snapshot of the performance of the various tasks defined in the application. The Task List view and Task Detail view can be used for detailed analysis.

In Voice Response Workflow Applications

SpeechSequenceActivity can be used to define a task. Task logging is performed automatically for this activity. The default status associated with a task is Unset.

Some of the activities, such as FormFillingDialog and GetAndConfirm, have task logging enabled by default. If task logging on these activities is not required for business metrics or performance tuning, the developer can turn off task logging for these activities. This allows the developer to better focus task analysis on tasks important for tracking of business metrics.

SetTaskStatusActivity is used to set the status of a task on completion. You can also use this activity to associate a message with the status. The Task Reason report in Analytics and Tuning Studio uses this message text to show the reason for the success or failure of individual tasks, as well as the number of tasks associated with each message. SetTaskStatusActivity is critical to tracking the success or failure of a task: make sure that every possible exit from a task sets the task status, unless you explicitly want the status to remain Unset.

Change the task status based on execution.

Do this when the default task status is set to Failure and completing the task changes the status to Success. This approach suits tasks that have a bailout, because the bailout then leaves the task status as Failure. For example, suppose a bailout is set to occur after three consecutive no-inputs for a task. If the user does not provide any input three times in a row, execution exits the task. Because the status started as Failure, the bailout is correctly recorded as a failure.

The previous task sequence has a bailout set as shown by the icon on the top right corner. The task status is set to Failure to start. If the user provides the information, the task status is changed to Success. If the user bails out in between, the task status is recorded as Failure by default.

Set the task status appropriately for all the exit paths for a given task.


In the previous example, two SetTaskStatusActivity instances, SetUserInputSucceeded and SetUserInputFailed, are used in the branches that follow confirmation of the user input to set the task status accordingly.
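The two patterns above (start with Failure so that a bailout is recorded correctly, and set the status explicitly in every confirmation branch) can be sketched in plain Python. This is a hypothetical analogue, not workflow code: the function, constant, and callback names are assumptions, and in a Voice Response Workflow application the same logic is expressed with SpeechSequenceActivity, a bailout, and SetTaskStatusActivity.

```python
from typing import Callable, Optional, Tuple

MAX_NO_INPUTS = 3  # bailout after three consecutive no-inputs


def run_collect_user_input(
    ask: Callable[[], Optional[str]],
    confirm: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (TaskStatus, StatusMessage) for one execution of the task.

    ask() returns the user's answer, or None for a no-input.
    confirm(answer) returns True if the user confirms the answer.
    """
    # Start with Failure so that the bailout exit is recorded correctly
    # without any extra status-setting code on that path.
    status, message = "Failure", "Bailout after repeated no-input"
    for _ in range(MAX_NO_INPUTS):
        answer = ask()
        if answer is None:
            continue  # no input: prompt again
        # Both confirmation branches set the status explicitly,
        # like the SetUserInputSucceeded / SetUserInputFailed activities.
        if confirm(answer):
            return "Success", "User input confirmed"
        return "Failure", "User rejected input"
    return status, message  # bailout: status is still Failure


answers = iter([None, None, "May 5"])
print(run_collect_user_input(lambda: next(answers), lambda a: True))
# ('Success', 'User input confirmed')
print(run_collect_user_input(lambda: None, lambda a: True))
# ('Failure', 'Bailout after repeated no-input')
```

The status message strings play the role of the message that SetTaskStatusActivity associates with a status for the Task Reason report; their wording here is arbitrary.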

In VXML Applications

VoiceXML does not implement tasks with a completion state. However, with the Speech Server implementation of VoiceXML, developers can specify task states in code. To do so, signal the start and end of each task using the standard VoiceXML logging mechanism. See the following examples.

1. Signaling the Start of a Task

Syntax:
<log label="__EnterTask">TaskName</log>

Example:
<log label="__EnterTask">CollectUserInput</log>

2. Signaling the Completion of a Task

Syntax:
<log label="__ExitTask">TaskName:TaskStatus:StatusMessage</log>

Example:
<log label="__ExitTask">CollectUserInput:Success:User Completed Task</log>

In the previous examples, TaskName is a string that names the task, for example GetDate; TaskStatus is a string that should be set as Success, Failure, or Unset; and StatusMessage is a string that defines a reason for the completion status, for example User Hang-up.
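To make the TaskName:TaskStatus:StatusMessage layout concrete, the following hypothetical Python helper splits the text of an __ExitTask log entry into its fields, for example when checking application output offline. It is an illustration only and is not how Speech Server itself reads the entry; treating any extra colons as part of the status message is an assumption. The layout does suggest keeping colons out of task names.

```python
from typing import NamedTuple

VALID_STATUSES = {"Success", "Failure", "Unset"}


class ExitTask(NamedTuple):
    task_name: str
    task_status: str
    status_message: str


def parse_exit_task(text: str) -> ExitTask:
    """Split the text of an __ExitTask <log> element into its three fields."""
    # maxsplit=2: the first two colons delimit the fields, so a colon inside
    # the status message is kept (an assumption, not documented behavior).
    parts = text.strip().split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected TaskName:TaskStatus:StatusMessage, got {text!r}")
    name, status, message = (p.strip() for p in parts)
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown task status {status!r}")
    return ExitTask(name, status, message)


print(parse_exit_task("CollectUserInput:Success:User Completed Task"))
# ExitTask(task_name='CollectUserInput', task_status='Success', status_message='User Completed Task')
```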

For more information, see the "Design VoiceXML Applications for Easy Reporting and Tuning" topic in the product documentation.

2.2 Label Sessions for Later Reporting

An application might need to label or otherwise annotate a call at runtime to support a particular kind of business analysis. For example, a distributor might want to classify its customers into different types, such as Gold and Platinum. This type of labeling, which provides information about the caller, gives more specific insight into the application’s performance and user experience.

Use the LogSessionLabel method to custom label a call at runtime. See the following example.

ApplicationHost.TelephonySession.LoggingManager.LogSessionLabel("CustomerSubscriptionPlan", "Gold");

The default trace logging configuration is sufficient to log this event in the ETL file.

The calls can then be queried in Analytics and Tuning Studio by filtering on the custom labels defined in the application, so that only calls meeting developer-defined criteria are shown. Analytics and Tuning Studio provides a Session Label filter for this purpose (see Locating Calls with a Custom Label in this document).

The custom label data exists in the UserEvents and UserEventTypes tables of the database.


2.3 Skip Logging for Confidential Data

Some applications must ask for sensitive information such as a social security number or credit card number. This information must be kept completely confidential, and recording it in the logs might be a security or privacy violation. To address these privacy concerns, the developer can turn off logging for the dialog elements that collect confidential data. When logging is disabled, prompt text, recognized text, and the associated audio are not stored in the logs. Of course, data that is not logged is also not available for tuning. See the following screen capture from the Turn List view, which shows how the absence of confidential data is indicated in Analytics and Tuning Studio.

In Voice Response Workflow Applications

Also, be sure to turn off logging for any turns that repeat confidential information, such as a statement or a separate confirmation question. For higher-level controls, such as GetAndConfirm and NavigableList, turning off logging for the parent activity is sufficient to turn off logging for the confirmation phase inside these activities. This precaution avoids accidental data disclosure.

In VXML Applications

Standard VoiceXML does not support controlling the logging of turn data to protect privacy, but this feature is available for VoiceXML applications on Speech Server. This is platform-specific functionality, added by Speech Server through a proprietary <property>. See the general form in the following example.

<property name="com.microsoft.[component].LogResults" value="false" />

In the previous example, [component] is reco, dtmf, or prompt.

2.4 Use Application Log Events for Debugging

Another important tracking facility is the use of application events. Application events can be used in exception handlers and application code to track application activity, for application diagnostics, and for call flow tracing. They make it possible to log finer-grained, application-specific information.

In Voice Response Workflow Applications

Use the four methods described in the following table.

Method Description

LogApplicationData This method logs an ApplicationDataEvent and can be used to log any useful data from the application.

LogApplicationError This method logs an ApplicationErrorEvent and must be used for any error that the application encounters.

LogApplicationWarning This method logs an ApplicationWarningEvent and can be used for logging information in cases of unexpected but non-fatal results.

LogApplicationInformation This method logs an ApplicationInformationEvent and can be used for informational logging. To log application information events, the Application events check box must be selected in the trace logging configuration.

See the following code example.


In VXML Applications

Use the VoiceXML <log> tag to log data that is important at analysis time. Speech Server records this data in the trace logs as an ApplicationDataEvent. However, <log> tags that specify __EnterTask or __ExitTask are treated as task events and are not mapped to ApplicationDataEvent.

2.5 Follow Naming Guidelines

It is important to appropriately name the various application activities (tasks and turns). Appropriate and consistent naming makes it easier to locate the necessary tasks or turns when analyzing data in Analytics and Tuning Studio.

For suggested naming conventions, see the “Workflow Activity Naming Guidelines” topic in the product documentation.

3 Log Analysis and Tuning: Analytics and Tuning Studio

3.1 Reducing No-Recognitions Due to Out-of-Grammar Utterances

Grammar Tuning Advisor provides an easy way to find out-of-grammar utterances. It is most helpful to run Grammar Tuning Advisor on a single turn at a time. This ensures that only the grammars associated with the turn (an application dialog element) are used to find out-of-grammar utterances. Running Grammar Tuning Advisor on every turn in a single execution can cause low-confidence utterances from one turn to be recognized successfully by a grammar that belongs to another turn, which decreases the chance of finding useful clustering information.

Executing Grammar Tuning Advisor on every turn in a single pass can be slow. Restricting the input to one turn helps, but with thousands of calls this can still be slow. The solution is to run Grammar Tuning Advisor only on the turns with low confidence scores. Note that running it only on the no-recognitions is not enough; a larger sample is needed to achieve good results. One suggestion is to use a confidence score of 0.75 as the threshold. You can also experiment with other thresholds to find the optimal value for a different data set. In one case, a threshold of 0.75 gave results identical to those obtained by running on all instances of the turn.

You can add a Custom Turn SQL filter in the Turn List view and then execute the query to refresh the data based on the confidence score threshold.

Custom turn SQL: Confidence<0.75
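The filter selects the turns of one turn name whose confidence falls below the threshold. The following Python sketch shows the same selection on in-memory records; the TurnRecord type and its fields are hypothetical stand-ins for rows in the Turn List view, not the actual database columns.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TurnRecord:  # hypothetical stand-in for a row in the Turn List view
    turn_name: str
    confidence: Optional[float]  # None plays the role of a NULL Confidence


def low_confidence_turns(
    turns: Iterable[TurnRecord], turn_name: str, threshold: float = 0.75
) -> List[TurnRecord]:
    # Restrict to one turn name so that only that turn's utterances are analyzed,
    # then apply the same test as "Confidence < 0.75". As in SQL, a missing
    # (NULL) confidence does not satisfy the comparison.
    return [
        t
        for t in turns
        if t.turn_name == turn_name
        and t.confidence is not None
        and t.confidence < threshold
    ]


sample = [
    TurnRecord("GetCity", 0.42),
    TurnRecord("GetCity", 0.91),
    TurnRecord("GetDate", 0.30),
    TurnRecord("GetCity", None),
]
print(low_confidence_turns(sample, "GetCity"))
# [TurnRecord(turn_name='GetCity', confidence=0.42)]
```

Making the threshold a parameter reflects the suggestion above to try other values on a different data set.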

Grammar Tuning Advisor can be started from the Analytics and Tuning Studio toolbar.

1. In the Grammar Tuning Advisor Settings dialog box, pick a specific turn name, and then make sure the check box next to the original grammar for that turn is selected. Original grammars are detected if the grammar has been logged into the ETL files and imported to the database. For more information about logging grammars, see the Trace Logging Configuration section earlier in this document.

Select only the grammars in the Original grammars in database list that are associated with the selected turn. The grammars that are not associated with the turn should not be selected.


This helps Grammar Tuning Advisor improve the accuracy of its estimated out-of-grammar results.

2. Click OK and wait until the Display Grammar Tuning Advisor Results link appears in the Results pane.

3. Click the Display Grammar Tuning Advisor Results link.

4. Evaluate the information in the Estimated out-of-grammar input column.

Each row in the Estimated out-of-grammar input pane represents a cluster of similar results. The clusters are labeled with a phrase that best represents the utterance. An asterisk in the label represents a wildcard. The clusters are listed in order of frequency.

5. Examine the clusters to find out-of-grammar utterances to add to the application grammar. Click one of the possible input utterances to start an audio player to verify the input and the results. Add the suggested input words to the application grammar using the grammar editing tool.
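The idea of clusters labeled with a wildcard and listed by frequency can be illustrated with a toy example. The following Python sketch is not the algorithm Grammar Tuning Advisor uses; it simply groups utterances by their first word, labels each group with the words its members share followed by an asterisk, and sorts the groups by size.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def common_word_prefix(word_lists: List[List[str]]) -> List[str]:
    prefix = []
    for column in zip(*word_lists):  # stops at the shortest utterance
        if all(w == column[0] for w in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def cluster_utterances(utterances: List[str]) -> List[Tuple[str, int]]:
    """Toy clustering: group by first word, label with a '*' wildcard,
    and list clusters in order of frequency."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for u in utterances:
        words = u.lower().split()
        if words:
            groups[words[0]].append(words)
    clusters = []
    for members in groups.values():
        if all(m == members[0] for m in members):
            label = " ".join(members[0])  # every member is identical
        else:
            label = " ".join(common_word_prefix(members)) + " *"
        clusters.append((label, len(members)))
    # Most frequent first; ties broken alphabetically for a stable order.
    clusters.sort(key=lambda c: (-c[1], c[0]))
    return clusters


heard = ["I want to fly", "I need a ticket", "operator", "Operator", "i want help"]
print(cluster_utterances(heard))  # [('i *', 3), ('operator', 2)]
```

Real utterance clustering has to cope with recognition errors and differing word order, which this first-word grouping ignores.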

3.1.1 Measuring Improvement with a New Grammar

This procedure provides an objective measure of the relative increase in recognition success after you edit the application grammar to add the input words suggested by Grammar Tuning Advisor. It is important to validate grammar changes before deployment. The Re-recognizer shows what is recognized with both the original and new grammars, as well as the relative success of the new grammar compared to the original grammar or to the human transcription.

1. Enter transcriptions while listening to the user input audio. This step is optional, but the transcriptions are useful for comparison with the results of the Re-recognizer.

a. Switch to the Turn List view in Analytics and Tuning Studio.

b. Choose a turn name, and then filter by that turn name.


c. Highlight one turn in the results pane.

d. Click the Play Recognized Audio button on the toolbar to listen to the audio.

e. Enter transcription in the Transcription column of the Turn List view.

f. Repeat steps c, d, and e for the remaining turns.

2. Modify the application grammar based on the Grammar Tuning Advisor suggestions for out-of-grammar utterances or based on the transcriptions.

3. Start the Re-recognizer from the toolbar for that turn.

4. Select the turn that needs to be verified using the Re-recognizer. Make sure the check box next to the original grammar for that turn is selected, and then click the File button to add the new grammars by selecting an active rule in the Rule Browser.

5. Click OK to start the run.


6. After execution completes, click the Display Re-Recognition Results Comparison link.

7. Review the results in the Re-Recognition Comparison view to evaluate the differences in the recognition results, presumably due to the changes you made to grammars.

The amount of improvement can be judged by the Semantics columns. Look at the new re-recognition cycle Semantics column to see which utterances are the most and least accurate. Also, look for any utterances that had been recognized or partially recognized correctly with the old grammar and that are no longer recognized or have lower accuracy now.

8. See the Reco Accuracy Reports view for a snapshot of improvement or degradation due to the change made in the new grammar.
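The comparison in steps 7 and 8 amounts to scoring both grammars against the same reference and listing regressions. The following Python sketch is a hypothetical illustration of that arithmetic, not the Re-recognizer's implementation: the reference values stand in for semantic results derived from human transcriptions, and the utterance IDs and city names are made up.

```python
from typing import Dict, List, Optional, Tuple


def compare_grammars(
    reference: Dict[str, str],
    original: Dict[str, Optional[str]],
    revised: Dict[str, Optional[str]],
) -> Tuple[float, float, List[str]]:
    """Return (original accuracy, revised accuracy, regressed utterance IDs).

    reference maps an utterance ID to the expected semantic result; original
    and revised map the same IDs to each grammar's semantic result, or None
    for a no-recognition.
    """
    if not reference:
        raise ValueError("no reference results to compare against")

    def correct(results: Dict[str, Optional[str]], uid: str) -> bool:
        value = results.get(uid)
        return value is not None and value.lower() == reference[uid].lower()

    old_ok = {uid for uid in reference if correct(original, uid)}
    new_ok = {uid for uid in reference if correct(revised, uid)}
    # Utterances the old grammar got right but the new grammar gets wrong.
    regressions = sorted(old_ok - new_ok)
    n = len(reference)
    return len(old_ok) / n, len(new_ok) / n, regressions


ref = {"u1": "Seattle", "u2": "Boston", "u3": "Denver", "u4": "Austin"}
old = {"u1": "Seattle", "u2": None, "u3": "Denver", "u4": None}
new = {"u1": "Seattle", "u2": "Boston", "u3": "Dallas", "u4": "Austin"}
print(compare_grammars(ref, old, new))  # (0.5, 0.75, ['u3'])
```

An overall gain can hide regressions, which is why the sketch reports them separately, in the spirit of step 7's advice to look for utterances the old grammar recognized but the new one does not.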

3.2 Session Analysis

The Session Reports view is useful to gather general characteristics of a user call. It provides a snapshot showing the:

Session ending status. Occurrences of session ending status, such as User Hang-up, can suggest users disconnecting a call in the middle due to a frustrating experience. In outbound calls, the session ending status can reflect the number of sessions encountering a gateway error or the end user being unreachable.

Turn counts and session duration. A high turn count or a long session duration can indicate calls in which users got stuck. Calls that combine a high session duration or turn count with a user hang-up are the ones that most need investigation.

To find such calls, define a value for session duration or turn count that you do not expect to be typical. One suggestion is to use values greater than or equal to the average plus one standard deviation.

A Custom Session SQL filter can be used to find such calls.


Custom session SQL:

SessionDuration > (SELECT AVG(SessionDuration) FROM SessionInfo) + (SELECT STDEV(SessionDuration) FROM SessionInfo)
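The threshold in this filter is the mean plus one sample standard deviation of all session durations. The following Python sketch reproduces that arithmetic on a few made-up durations so the cutoff can be checked by hand; the session IDs and values are illustrative only.

```python
import statistics
from typing import Dict, List


def long_sessions(durations: Dict[str, float]) -> List[str]:
    """Session IDs whose duration exceeds the mean plus one standard deviation.

    statistics.stdev is the sample standard deviation, which is what SQL
    Server's STDEV returns; at least two sessions are required.
    """
    values = list(durations.values())
    threshold = statistics.mean(values) + statistics.stdev(values)
    # Strict '>' matches the Custom Session SQL filter above.
    return sorted(sid for sid, d in durations.items() if d > threshold)


durations = {"s1": 60, "s2": 70, "s3": 65, "s4": 300}  # seconds
print(long_sessions(durations))  # ['s4']
```

Here the mean is 123.75 seconds and the sample standard deviation is about 117.6, so only the 300-second session clears the roughly 241-second cutoff. Note that a single very long call raises both terms, which is one reason to treat the threshold as a starting point rather than a fixed rule.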

To navigate to the Session List view and Session Detail view from the Session Reports view, click the section of the sessions of interest in a particular report.

The Session List view and Session Detail view are useful for detailed analysis of a call. If All audio trace logging is enabled (see Trace Logging Configuration in this document) and the session audio from the ETL files has been imported to the database (see Log Analysis Framework in this document), the complete audio of the call can be played back using the Play Session Audio button on the Analytics and Tuning Studio toolbar.


3.2.1 Find Repeat Callers

To track users who called back within a certain period of time, you can use the following Custom Session SQL filter, which defines the period within which the callback must occur. In the following example, the period is set to 3 days.

Add the following to the Custom Session SQL filter in Analytics and Tuning Studio.

Custom session SQL:

SessionInfo.SessionInstanceId IN (
    SELECT B.SessionInstanceId
    FROM SessionInfo AS A
    INNER JOIN SessionInfo AS B
        ON A.SourceDeviceNumber = B.SourceDeviceNumber   -- The SourceDeviceNumbers must match
        AND A.ApplicationID = B.ApplicationID            -- The sessions must be for the same application
        AND A.SessionInstanceId <> B.SessionInstanceId   -- Session A is a different session from session B
        AND A.TimeStamp <= B.TimeStamp                   -- Session A must have occurred before session B
        AND (B.TimeStamp - A.TimeStamp) < 3              -- Session A must have occurred within 3 days before B
)
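The self-join keeps every session B that has an earlier (or simultaneous) session A from the same number to the same application within the window. The following Python sketch applies the same conditions to a handful of in-memory rows; the SessionRow type is a hypothetical subset of the SessionInfo columns, and the phone numbers and application name are made up.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Set


class SessionRow(NamedTuple):  # hypothetical subset of SessionInfo columns
    session_id: str
    source_device_number: str
    application_id: str
    timestamp: datetime


def repeat_call_sessions(
    rows: List[SessionRow], window: timedelta = timedelta(days=3)
) -> Set[str]:
    """IDs of sessions B that were preceded, within `window`, by a different
    session A from the same number to the same application."""
    repeats = set()
    for a in rows:
        for b in rows:  # O(n^2), like the self-join; fine for a sketch
            if (
                a.source_device_number == b.source_device_number
                and a.application_id == b.application_id
                and a.session_id != b.session_id
                and a.timestamp <= b.timestamp
                and b.timestamp - a.timestamp < window
            ):
                repeats.add(b.session_id)
    return repeats


t0 = datetime(2007, 6, 1, 9, 0)
rows = [
    SessionRow("s1", "5550100", "Banking", t0),
    SessionRow("s2", "5550100", "Banking", t0 + timedelta(days=1)),
    SessionRow("s3", "5550100", "Banking", t0 + timedelta(days=10)),
    SessionRow("s4", "5550199", "Banking", t0 + timedelta(hours=1)),
]
print(repeat_call_sessions(rows))  # {'s2'}
```

Only the callback (s2) is returned, not the first call (s1); to list the first calls instead, the result would collect A rather than B, just as the SQL would select A.SessionInstanceId.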

3.2.2 Errors Affecting User Calls

Application errors or Speech Server exceptions can terminate a call in the middle of a transaction. It is important to eliminate errors that lead to termination of a call or a bad user experience.

The Error Reports view provides a snapshot of the errors in the user sessions.

To locate the sessions with errors in the Session views:

1. Using a Custom Session SQL filter

The following query can be used to filter on sessions that are affected by the ErrorMessageEvent.

Custom session SQL:

SessionInfo.SessionInstanceId IN (SELECT SessionErrorCounts.InstanceId FROM SessionErrorCounts INNER JOIN EventTypes ON EventTypes.EventTypeId=SessionErrorCounts.EventTypeId WHERE EventTypes.EventTypeName='ErrorMessageEvent')


You can replace ErrorMessageEvent in the previous query with ApplicationErrorEvent or any other error event that needs to be tracked. A list of the events that can be logged by Speech Server can be found in the EventTypes table in the database.

In SQL Server Management Studio, you can execute the following query to determine the distinct event types that have been logged.

SELECT DISTINCT(EventTypeName) FROM EventTypes INNER JOIN Messages ON EventTypes.EventTypeId = Messages.EventTypeId

2. Using the Session List view. Sort the data by the Errors column, select the sessions with errors, and navigate to the Session Detail view to investigate them.
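The ErrorMessageEvent filter from step 1 can be tried outside the studio against a tiny in-memory database. The following Python sketch uses sqlite3 with minimal, hypothetical versions of the SessionInfo, EventTypes, and SessionErrorCounts tables (only the columns the query uses; the real schema has more). It assumes, without confirmation from the product documentation, that the Custom Session SQL text acts as a WHERE condition on SessionInfo.

```python
import sqlite3

# Minimal, hypothetical stand-ins for three tables in the imported log
# database; only the columns used by the filter are created here.
SCHEMA = """
CREATE TABLE SessionInfo (SessionInstanceId TEXT PRIMARY KEY);
CREATE TABLE EventTypes (EventTypeId INTEGER PRIMARY KEY, EventTypeName TEXT);
CREATE TABLE SessionErrorCounts (InstanceId TEXT, EventTypeId INTEGER);
"""

# The Custom Session SQL predicate, with the event name as a parameter.
FILTER = """
SELECT SessionInstanceId FROM SessionInfo
WHERE SessionInfo.SessionInstanceId IN (
    SELECT SessionErrorCounts.InstanceId
    FROM SessionErrorCounts
    INNER JOIN EventTypes ON EventTypes.EventTypeId = SessionErrorCounts.EventTypeId
    WHERE EventTypes.EventTypeName = ?)
ORDER BY SessionInstanceId
"""


def sessions_with_event(conn: sqlite3.Connection, event_name: str) -> list:
    return [row[0] for row in conn.execute(FILTER, (event_name,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO SessionInfo VALUES (?)", [("s1",), ("s2",), ("s3",)])
conn.executemany(
    "INSERT INTO EventTypes VALUES (?, ?)",
    [(1, "ErrorMessageEvent"), (2, "ApplicationErrorEvent")],
)
conn.executemany("INSERT INTO SessionErrorCounts VALUES (?, ?)", [("s1", 1), ("s3", 2)])
print(sessions_with_event(conn, "ErrorMessageEvent"))      # ['s1']
print(sessions_with_event(conn, "ApplicationErrorEvent"))  # ['s3']
```

Passing the event name as a parameter shows how swapping ErrorMessageEvent for ApplicationErrorEvent changes only one value in the query.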

3.2.3 Locating Calls with a Custom Label

Locating calls with predefined custom labels in an application (see Label Sessions for Later Reporting in this document) can be achieved using the Session Label filter in Analytics and Tuning Studio.

Custom labels can be used to aggregate calls according to business requirements and to examine the characteristics of each group of calls.

In this example, the application is using the custom label AccountBalance to track the user call flow through the application to mark sessions where the user chose to request an account balance after a transaction.

3.3 Task Analysis

Task analysis is an important part of tuning an application. Tasks provide more details about specific parts of the application and about dialog flow. Bottlenecks can be identified and business metrics can be tracked. Defining proper and useful tasks is an important part of the application development process. See the Define Tasks and Set Task Status section of this document.


The Task Reports view provides a high-level overview of the different tasks defined in the application. Business metrics can be tracked by using the Task Reports view to provide a breakdown of the number of tasks that were successful or resulted in a failure.

The Task Reason report associates the message defined by the developer with a success or failure and the respective totals.

The Task Shape report is useful for tracking the performance of different parts (or subsections) of the application and the dialog flow. A high average duration can suggest bottlenecks in the application. Performance analysis should be performed to determine whether there are any time-consuming operations occurring. A high turn count along with task repetitions suggests usability issues with the application. Analysis of the turns contained in the task is suggested to determine whether users faced problems with certain turns (such as QuestionAnswer activities).

To find atypical tasks, first define a threshold above which task duration or turn count is not expected to be typical. One suggestion is to use values greater than or equal to the average plus one standard deviation.

A Custom task SQL filter can be used to find calls with a high turn count for a particular task. For example, to count all the turns contained in the task ‘ticketChooser.ticketChooserQuestions’, including turns contained in its subtasks, use the TotalCount field of the TaskInfo view in the database.

Custom task SQL for a particular Task Name:

TotalCount >= (SELECT AVG(TotalCount) FROM TaskInfo WHERE TaskName='<Task Name>') + (SELECT STDEV(TotalCount) FROM TaskInfo WHERE TaskName='<Task Name>')

Replace <Task Name> in the previous query with an appropriate task name in your application.
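The average-plus-one-standard-deviation rule can also be reproduced outside the tool, for example to sanity-check a threshold against exported data. The following Python sketch is illustrative only: it assumes the TaskName and TotalCount values have already been exported as (name, count) pairs, and the sample counts are made up. Like T-SQL STDEV, statistics.stdev computes the sample standard deviation.

```python
import statistics


def outlier_threshold(values):
    """Average plus one sample standard deviation, mirroring
    AVG(TotalCount) + STDEV(TotalCount) in the Custom task SQL filter.
    Requires at least two values (statistics.stdev raises otherwise)."""
    return statistics.mean(values) + statistics.stdev(values)


def tasks_above_threshold(task_rows, task_name):
    """task_rows: iterable of (task_name, total_count) pairs, a hypothetical
    export of the TaskName and TotalCount fields of the TaskInfo view."""
    counts = [total for name, total in task_rows if name == task_name]
    threshold = outlier_threshold(counts)
    return threshold, [total for total in counts if total >= threshold]


# Hypothetical sample data for illustration.
rows = [
    ("ticketChooser.ticketChooserQuestions", 4),
    ("ticketChooser.ticketChooserQuestions", 5),
    ("ticketChooser.ticketChooserQuestions", 5),
    ("ticketChooser.ticketChooserQuestions", 6),
    ("ticketChooser.ticketChooserQuestions", 10),
    ("accountStatus", 3),
]
threshold, flagged = tasks_above_threshold(rows, "ticketChooser.ticketChooserQuestions")
print(f"threshold={threshold:.2f} flagged={flagged}")
```

Only the task instance with 10 turns exceeds the threshold here; with real data, those instances are the ones to open in the Task List view.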

To navigate from the Task Reports view to the Task List view or Task Detail view, click a particular task and status.


The Task List view and Task Detail view can be used for detailed analysis of the tasks and the turns associated with them.

When performing turn analysis for a given task (see Turn Analysis in this document), switch to the Turn List view and select the task name from the Task Name filter.

3.3.1 Find the Most Popular Task and Task Flow

Locating the most popular task or a task flow in an application is important for the following reasons:

Business metrics can be tracked by analyzing the task flow and new business opportunities might surface based on the behavior of application users.

Analyzing the task flow and the most popular task from the user perspective can be useful to determine whether there is a need to refactor the application to move the most popular tasks higher in the application session hierarchy (for example, move the popular tasks to the main menu or the application’s entry point).

Optimizing the performance of the most popular task and the task flow and analyzing the popular tasks for user issues (see Find Tasks on Which User Has Most Trouble for troubleshooting tips) can yield an immediate improvement in user experience for the majority of the application users.

To Locate the Most Popular Task Flow

Use the Task Flow report in the Session Reports view.


For illustration purposes, in this table it might be possible to infer that most users are more interested in buying a ticket for a concert than in checking their account status.
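The Task Flow report performs this aggregation itself. As a rough illustration of what it counts, the following Python sketch groups sessions by their ordered sequence of task names and ranks the sequences by frequency. The session-to-task mapping and the task names are hypothetical.

```python
from collections import Counter


def task_flow_counts(session_tasks):
    """session_tasks: mapping of session id -> ordered list of task names
    (a hypothetical export). Returns (flow, count) pairs, most frequent first;
    flows with equal counts keep the order in which they were first seen."""
    return Counter(tuple(tasks) for tasks in session_tasks.values()).most_common()


# Hypothetical sessions for illustration.
sessions = {
    1: ["mainMenu", "buyTicket"],
    2: ["mainMenu", "buyTicket"],
    3: ["mainMenu", "accountStatus"],
    4: ["mainMenu", "buyTicket", "accountStatus"],
    5: ["mainMenu", "buyTicket"],
}
for flow, n in task_flow_counts(sessions):
    print(n, " -> ".join(flow))
```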

To Find the Most Popular Task

One way to make calls more efficient is to move the more popular tasks closer to the main menu in the call structure or otherwise make them easily accessible to the caller. Analyzing the most popular task and then optimizing its performance or removing any perceived negative user experience can improve the experience for more users and raise the rate of successful task completions.

Analyzing the Task Volume report in the Task Reports view helps find the tasks with the highest number of occurrences; the task with the highest count is probably the most popular task. However, use care: some occurrences might be repeated attempts after earlier unsuccessful ones, or might occur in the normal execution flow when the user re-enters the task for another transaction.

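The most popular task can also be found by analyzing the Task Shape report. Because a task that occurs in a call has a minimum repetition of one, an Average Repetition greater than 1 indicates that the task was repeated within calls.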

If it is necessary to eliminate the repetitions:

1. Find the number of sessions containing a particular task:

a. Switch to the Session Reports view.

b. Add the following Custom Session SQL filter, and then execute the query to get the count of the sessions containing a particular task name.


Custom session SQL:

SessionInfo.SessionInstanceId IN (SELECT SessionInfo.SessionInstanceId FROM SessionInfo INNER JOIN TaskInfo ON TaskInfo.SessionInstanceId=SessionInfo.SessionInstanceId WHERE TaskInfo.Taskname='<Task Name>')

2. Repeat step 1 for all the tasks that have Average Repetition > 1 in the Task Shape report.

3. Switch to the Task Reports view, clear the Custom session SQL check box, and then execute the query.

4. The task volume for a particular task can be calculated as:

True Task Volume = Task Count in Task Shape report − (Average Repetition − 1) × Count of Sessions
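The steps above can be sketched against a toy database. The following Python sketch uses an in-memory SQLite database with hypothetical, heavily simplified SessionInfo and TaskInfo tables (the real logging database runs on SQL Server and has many more columns), runs the same session-count query as step 1, and applies the step 4 formula. It assumes that Average Repetition equals the number of task occurrences divided by the number of sessions containing the task; under that assumption the formula reduces to the session count, which is the de-duplicated volume the steps aim for.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the SessionInfo and TaskInfo views.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SessionInfo (SessionInstanceId INTEGER PRIMARY KEY);
    CREATE TABLE TaskInfo (SessionInstanceId INTEGER, TaskName TEXT);
""")
conn.executemany("INSERT INTO SessionInfo VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO TaskInfo VALUES (?, ?)",
    [
        (1, "buyTicket"), (1, "buyTicket"),  # task repeated within session 1
        (2, "buyTicket"),
        (3, "accountStatus"),
        (4, "buyTicket"), (4, "buyTicket"), (4, "buyTicket"),
    ],
)


def count_sessions_with_task(conn, task_name):
    """Step 1: count the sessions matched by the Custom session SQL filter."""
    sql = """
        SELECT COUNT(*) FROM SessionInfo
        WHERE SessionInfo.SessionInstanceId IN (
            SELECT SessionInfo.SessionInstanceId FROM SessionInfo
            INNER JOIN TaskInfo
                ON TaskInfo.SessionInstanceId = SessionInfo.SessionInstanceId
            WHERE TaskInfo.TaskName = ?)
    """
    return conn.execute(sql, (task_name,)).fetchone()[0]


def true_task_volume(task_count, average_repetition, session_count):
    """Step 4: Task Count - (Average Repetition - 1) * Count of Sessions."""
    return task_count - (average_repetition - 1) * session_count


task_count = conn.execute(
    "SELECT COUNT(*) FROM TaskInfo WHERE TaskName = ?", ("buyTicket",)
).fetchone()[0]
sessions = count_sessions_with_task(conn, "buyTicket")
average_repetition = task_count / sessions  # assumed definition, see above
print(task_count, sessions, true_task_volume(task_count, average_repetition, sessions))
```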

3.3.2 Find Tasks on Which User Has Most Trouble

There are several ways to see whether many users are having trouble with certain tasks. A low success count for a given task indicates that users might be having trouble completing it.
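A low success count is easiest to interpret relative to how often the task runs. As a sketch, assuming the Success, Failure, and Unset totals for each task have been copied from the Task Ending report (the task names and numbers below are hypothetical), tasks can be ordered by success rate so that the weakest ones are reviewed first:

```python
from dataclasses import dataclass


@dataclass
class TaskEnding:
    """Hypothetical per-task totals, as read from the Task Ending report."""
    task_name: str
    success: int
    failure: int
    unset: int

    @property
    def total(self) -> int:
        return self.success + self.failure + self.unset

    @property
    def success_rate(self) -> float:
        # Guard against tasks that never ran.
        return self.success / self.total if self.total else 0.0


def hardest_tasks(endings, limit=3):
    """Return up to `limit` tasks, ordered from lowest to highest success rate."""
    return sorted(endings, key=lambda e: e.success_rate)[:limit]


endings = [
    TaskEnding("buyTicket", success=180, failure=15, unset=5),
    TaskEnding("accountStatus", success=40, failure=30, unset=10),
    TaskEnding("transferToOperator", success=12, failure=20, unset=0),
]
for e in hardest_tasks(endings):
    print(f"{e.task_name}: {e.success_rate:.0%} of {e.total}")
```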

Using the Task Reports View

1. Open the Task Reports view.

2. On the Task Ending report, click a segment to open the context menu for navigation.


3. Click the Failure or Unset section, and then select the Task Reports view to perform a focused analysis.

4. If a message has been set for a TaskStatusActivity during application development, the Task Reason report can be used to find the reason for a failure.

In this example, it seems that a transfer to the operator failed.


5. You can drill down in the Task List view and the Task Detail view for further analysis. Click the segment of interest to open the context menu for navigation.

Using the Turn View

Another approach is to use the Turn Reports view and Turn List view. Also see the Turn Analysis section of this document.

1. Select a task from the Task Reports view that has a low success count.

2. Switch to the Turn Reports view, select the particular task in the Task Name filter, and then execute the query.

3. In the Turn Reports view, analyze the Turn Disposition report to get the counts for issues related to silence, errors, and hang-ups.

Note the turns with a high count of invalid speech or DTMF, silence, errors, and hang-ups.

4. Switch to the Turn List view to analyze the reasons for the failures.


Recognizer and prompt audio playback can be used to complete the analysis.
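When a task has many turns, it can help to rank them before opening the Turn List view. The following Python sketch assumes the Turn Disposition counts have been copied into simple records, that each turn instance has exactly one disposition, and that a 20 percent problem rate is worth reviewing; the turn names, counts, and cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnDisposition:
    """Hypothetical per-turn counts, as read from the Turn Disposition report."""
    turn_name: str
    total: int
    invalid_input: int  # invalid speech or DTMF
    silence: int
    errors: int
    hang_ups: int

    @property
    def problem_rate(self) -> float:
        # Assumes each turn instance has exactly one disposition,
        # so the problem counts can be summed without double counting.
        problems = self.invalid_input + self.silence + self.errors + self.hang_ups
        return problems / self.total if self.total else 0.0


def turns_to_review(dispositions, min_rate=0.2):
    """Return turns at or above an arbitrary problem-rate cutoff, worst first."""
    flagged = [d for d in dispositions if d.problem_rate >= min_rate]
    return sorted(flagged, key=lambda d: d.problem_rate, reverse=True)


report = [
    TurnDisposition("mainMenu", total=100, invalid_input=5, silence=3, errors=1, hang_ups=1),
    TurnDisposition("ticketChooserQuestions", total=50, invalid_input=10, silence=4, errors=0, hang_ups=1),
    TurnDisposition("concertDate", total=40, invalid_input=6, silence=2, errors=1, hang_ups=1),
]
for d in turns_to_review(report):
    print(f"{d.turn_name}: {d.problem_rate:.0%}")
```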

3.4 Turn Analysis

Having identified the sessions or tasks containing the problematic events, the developer can then look at the prompts, listen to the audio, and determine exactly where the problem lies using the Turn views of Analytics and Tuning Studio.

3.4.1 Finding Recognition Issues

Use a combination of the Turn Reports view and Turn List view to analyze recognition issues and general problems.

Using the Turn Disposition report, note the turns that have high counts of invalid speech or DTMF input, silence, hang-ups, and errors. Then switch to the Turn List view to analyze these turns for recognition issues.

1. Audio playback can be used to determine whether the recorded speech is faint and inaudible to a human listener or contains a lot of background noise. Background noise might be specific to one user's environment, so check whether the issue recurs for a majority of users before treating it as an application problem.

2. Out-of-grammar input can be found and analyzed using Grammar Tuning Advisor (see Reducing No-Recognitions due to Out-of-Grammar Utterances in this document).

3. If a phrase is clearly audible and "in grammar," yet the turn still registers no recognition or low-confidence recognition, the developer should check the following:

Pronunciation. The developer should check whether a word might be missing from the Recognizer's default dictionary (for example, the name of a person, place, or company). If a word is not in the dictionary, the Recognizer might use its own pronunciation, which might not match the pronunciation of some callers. To fix this, the developer can add a custom pronunciation to the grammar item in question. Use custom application lexicon (.cal) files to customize recognized pronunciations in speech applications running on Speech Server. Lexicons are supported by Speech Grammar Editor and Conversational Grammar Builder in the application development tools.

Phonetic confusability. The developer should check whether any of the grammars used at the problem point in the dialog contain similar-sounding words or phrases ("replay" and "reply" in a unified messaging application, for example). If so, misrecognitions are more likely, and the similar-sounding words might receive lower confidence. If possible, the developer should remove one of the phrases from the grammar, or replace or extend it with a phrase that has a similar meaning (such as "play message again" instead of "replay"). Any prompts that encourage the use of that word should be amended to remove the phrase or encourage the new one (for example, "You can also play the message again"). To make these grammar changes, use Speech Grammar Editor and Conversational Grammar Builder, which are part of the application development tools.

Confidence thresholds. If the Recognizer repeatedly recognizes words correctly but rejects them or asks the user to confirm them unnecessarily, the confidence threshold might be set too high. Listening to the audio of rejected recognitions can help determine an appropriate threshold. A Custom Turn SQL filter can be used to list the rejected recognitions for a selected turn.

Custom turn SQL: SpeechCompletionState='Rejected'

Also, use the Re-recognition tool with a different confidence threshold and compare the results with the original recognitions to determine whether changing the threshold is beneficial.

General problems. Sometimes users can be confused by prompts that do not provide a clear idea of how to respond or are otherwise unclear about what state the dialog is in. It is usually a good idea to use the prompt to set clear expectations about what users can say (for example, "Please give me your phone number, beginning with the area code," or "Please say a city name, or 'help' for more instructions"). Clear prompts that encourage constrained user responses are essential to successful speech applications.

Many barge-ins on a prompt can indicate that the prompt is either too long or confusing. In the Turn Shape report, the barge-in counts associated with a turn help determine whether the prompt needs to be rephrased. False barge-ins are also possible, for example when echo causes the system to recognize its own prompt as user input. Use audio playback to narrow down this issue.


Dialog problems. In some cases, users have expectations about the dialog and what is possible to do at any point. If these expectations are not met by the grammars and the possible dialog flow, misrecognitions occur, users might be taken down paths of the dialog that they do not want to follow, or both. If the logs show a large number of tasks that do not end in successful completion or show turns with greater numbers of Help or Cancel commands, then confusing dialog flow might be the problem.

The Speech Input report can be used to determine whether the user is having problems. A high count of commands for a specific turn can suggest that the dialog needs to be simplified to help the user to a successful completion.

4. Confirmation Turn Analysis

Confirmation is a very important aspect of a speech application, and important turns in an application are often followed by a confirmation turn. Confirmation is triggered based on the confirmation threshold: any recognition whose confidence is below this threshold triggers a confirmation. For this reason, it is important to set the confirmation threshold to its optimum value.

For general turns that are followed by a separate confirmation turn, check whether the confirmation turn has a high count of negative answers from users, such as “No” (see Finding the Most Popular Response to a Turn). If it does, analyze the turn preceding the confirmation as outlined in Finding Recognition Issues.

Higher level activities such as GetAndConfirm and NavigableList have built-in confirmation turns. If such a confirmation turn has a high count of positive responses, such as “Yes,” use the Turn List view to analyze the confidence values of the speech input in the turn that gathers the user input. This analysis helps set an appropriate confirmation threshold, so that users are not asked to confirm input that was recognized with high confidence.
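The threshold checks described above come down to a trade-off between confirming (or rejecting) correct results and accepting incorrect ones without confirmation. The following Python sketch assumes confidence scores between 0 and 1 and a small sample of recognitions labeled correct or incorrect by someone listening to the audio; the sample values are hypothetical. It does not replace the Re-recognition tool; it only shows one way to reason about candidate thresholds.

```python
def threshold_tradeoff(recognitions, threshold):
    """Count both kinds of cost for one candidate confirmation threshold.

    recognitions: (confidence, is_correct) pairs, where is_correct was judged
    by a person listening to the audio (hypothetical labels).
    Results below the threshold trigger confirmation."""
    needless_confirmations = sum(
        1 for conf, correct in recognitions if correct and conf < threshold
    )
    unconfirmed_errors = sum(
        1 for conf, correct in recognitions if not correct and conf >= threshold
    )
    return needless_confirmations, unconfirmed_errors


def sweep(recognitions, candidates):
    """Evaluate several candidate thresholds on the same labeled sample."""
    return {t: threshold_tradeoff(recognitions, t) for t in candidates}


samples = [(0.92, True), (0.85, True), (0.71, True), (0.66, False), (0.58, True), (0.40, False)]
for t, (needless, missed) in sweep(samples, [0.6, 0.7, 0.8]).items():
    print(f"threshold {t}: {needless} needless confirmations, {missed} unconfirmed errors")
```

Raising the threshold trades unconfirmed errors for needless confirmations; which balance is acceptable depends on the cost of a wrong result at that point in the dialog.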

3.4.2 Finding the Most Popular Response to a Turn

Another useful indicator is the most popular response to a turn. In cases where there is a long list of options in the prompt (for example, a Menu being prompted to the user), the most used options should be at the beginning of the list to maximize total call efficiency.

1. Go to the Turn List view.

2. In the left panel, select the Turn name check box, and then select the turn name in the Turn Name filter.

3. To find the rows with a particular recognition, add a Custom turn SQL filter.

Custom turn SQL: SpeechResult='<Recognized Text>'


Replace <Recognized Text> with the recognition whose count you are interested in.

4. Execute the query.

5. The number of turns with this recognition is displayed as Rows n in the Visual Studio status bar at the bottom.

Repeat these steps for each possible user input choice.
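Repeating the filter for each choice is equivalent to counting recognitions per distinct SpeechResult value. As a sketch only, the following Python code does this with a single GROUP BY query against an in-memory SQLite table; the TurnInfo table, its TurnName column, and the sample rows are hypothetical stand-ins, not the actual logging schema.

```python
import sqlite3

# Hypothetical stand-in for the turn data behind the Turn List view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TurnInfo (TurnName TEXT, SpeechResult TEXT)")
conn.executemany(
    "INSERT INTO TurnInfo VALUES (?, ?)",
    [
        ("mainMenu", "buy tickets"), ("mainMenu", "buy tickets"),
        ("mainMenu", "buy tickets"), ("mainMenu", "help"),
        ("mainMenu", "help"), ("mainMenu", "account status"),
        ("mainMenu", None),  # no recognition
        ("confirmTicket", "yes"),
    ],
)


def response_counts(conn, turn_name):
    """One GROUP BY query instead of one SpeechResult filter per choice."""
    sql = """
        SELECT SpeechResult, COUNT(*) AS n
        FROM TurnInfo
        WHERE TurnName = ? AND SpeechResult IS NOT NULL
        GROUP BY SpeechResult
        ORDER BY n DESC, SpeechResult
    """
    return conn.execute(sql, (turn_name,)).fetchall()


for text, n in response_counts(conn, "mainMenu"):
    print(f"{n:3d}  {text}")
```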

3.4.3 Turns with Unexpected TTS

Unexpected TTS can occur if a turn has an associated recorded prompt in the prompt database, but TTS is used to play the prompt to the user. This indicates incomplete coverage of the recorded prompts in the prompt database. The developer needs to add the correct recorded prompts to the prompt database using the prompt editor application authoring tool.

Start with Error Reports to determine whether unexpected TTS was played to the user.

If there are sessions containing unexpected TTS, switch to the Turn List view and add a Custom turn SQL filter.

Custom turn SQL: PromptIsTTS=1

The query results show turns whose prompts were played using TTS instead of the recorded prompts in the prompt database. To listen to a prompt, select a turn row and use prompt audio playback.
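To prioritize which recordings to add, the TTS turns can be grouped by prompt text. The following Python sketch assumes the filtered turn rows have been exported as dictionaries with a hypothetical PromptText field alongside PromptIsTTS; prompts that are expected to use TTS (for example, prompts containing dynamic data) will also appear and should be excluded by hand.

```python
from collections import Counter


def prompts_missing_recordings(turn_rows):
    """Count TTS-rendered prompts by text, most frequent first.

    turn_rows: iterable of dicts; 'PromptText' is a hypothetical field,
    'PromptIsTTS' mirrors the Custom turn SQL filter column."""
    return Counter(
        row["PromptText"] for row in turn_rows if row["PromptIsTTS"] == 1
    ).most_common()


# Hypothetical exported rows for illustration.
rows = [
    {"PromptText": "Welcome to the ticket line.", "PromptIsTTS": 0},
    {"PromptText": "Please say a city name.", "PromptIsTTS": 1},
    {"PromptText": "Please say a city name.", "PromptIsTTS": 1},
    {"PromptText": "Goodbye.", "PromptIsTTS": 1},
]
for text, n in prompts_missing_recordings(rows):
    print(f"{n}  {text}")
```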

4 Best Practices for Speech Application Tuning

When conducted on a regular basis, log analysis can provide significant insights into caller behavior, and tuning on the basis of such analysis can provide considerable enhancement of the user experience. Use the following guidelines to get the most out of analysis and tuning:

Phase the rollout to end users. Many big problems with the user experience are discovered during the initial phases of a deployment. It is important to minimize the number of real users affected by these problems. For an example of a rollout plan, see the predevelopment trials in the Log Analysis and Tuning Scenarios section. This methodology should also be followed as an application evolves to take on new services or when big changes are made to the dialog flow or other major parts of the user interface.

Ensure the quality and quantity of the data collected. The more data that is gathered, the more likely it is that big problems will surface and that the data on which the analysis is conducted is representative of the end-user population. It is also important to distinguish between motivated end users (those who attempt to carry out real tasks) and curious, or test, users (those who have less interest in carrying out real tasks) when applying updates. It is more useful to give test users a range of specific tasks to carry out than to have them perform blind testing.

Prioritize the components to analyze and tune. In general, solving higher level problems achieves much greater gains than tweaking lower level components. High-level problems include confusing prompts, poor-coverage grammars, flawed dialog flow, and in some cases incomplete pronunciation lexicons and inappropriate confidence thresholds. Lower level issues include the speech recognition engine itself, certain configuration parameters, and tweaking the prompt database.

Verify that each update improves the system. In some cases, changes made on the basis of data collection can make the system worse. For example, as noted earlier, broadening the coverage of a grammar to accept different ways of saying a particular command or item can increase confusion with other phrases in the grammar and thereby introduce new recognition errors where there were none before. Use Grammar Tuning Advisor for suggestions, and measure the improvements using the Re-recognition tool (see Reducing No-Recognitions Due to Out-of-Grammar Utterances). If running offline tests on previous data is impossible, the developer should consider applying tuning updates only when the same input occurs for multiple users rather than for a single occurrence or user, running small trials with the update in place to check that performance is no worse, or both.

Problems encountered by a single user or during a single call might not be representative of the typical user experience. In general, a problem should be observed by several callers before a change is made, because fixes based on one caller can make the experience worse for typical users. For example, enabling a rich set of actions that users can take at every point of the dialog, but that are only rarely (or never) used by callers, can increase grammar confusion and reduce overall task completion rates. Such rarely used options should be eliminated.

Consider using transcriptions of utterances. It might be more convenient and efficient for the application developer investigating misrecognitions to use written transcriptions of what the callers said. Transcriptions can be added in the Turn List view.

Monitor logs regularly. As the deployment proceeds, continuous monitoring of the logs is necessary to ensure that all the components of the application (prompts, grammars, and confidence thresholds, for example) are performing well. Any changes to the application (for example, the addition or removal of services or a change in dialog flow) should be accompanied by a critical examination of the log data.
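One simple way to check that an update is no worse is to compare recognition accuracy against transcriptions before and after the change. The following Python sketch assumes the original and re-recognized results have been exported as lists aligned with the transcriptions, and it uses case-insensitive exact match as the accuracy measure; both the data and the metric are illustrative assumptions, not the output format of the Re-recognition tool.

```python
def accuracy(results, transcripts):
    """Fraction of utterances whose recognized text matches the transcription
    (case-insensitive exact match). None means no recognition and counts as a miss."""
    if len(results) != len(transcripts):
        raise ValueError("results and transcripts must be aligned one-to-one")
    if not transcripts:
        raise ValueError("need at least one transcribed utterance")
    hits = sum(
        1
        for result, spoken in zip(results, transcripts)
        if result is not None and result.strip().lower() == spoken.strip().lower()
    )
    return hits / len(transcripts)


def update_is_no_worse(original, rerecognized, transcripts):
    """True if the re-recognized results are at least as accurate as the originals."""
    return accuracy(rerecognized, transcripts) >= accuracy(original, transcripts)


# Hypothetical data for illustration.
transcripts = ["buy tickets", "account status", "help", "account status"]
original = ["buy tickets", None, "help", "help"]  # before the update
rerecognized = ["buy tickets", "account status", "help", "help"]  # after the update
print(accuracy(original, transcripts), accuracy(rerecognized, transcripts))
```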

