spoken dialogue systems sig-ai fall 2003 by: sachin kamboj

Spoken Dialogue Systems

SIG-AI Fall 2003

By: Sachin Kamboj

Spoken Dialogue Systems October 6, 2003

Slide 2

Outline

Introduction to Spoken Dialogue Systems (SDS)

Applications of SDS

Components of SDS

Classification of SDS

On the basis of dialogue control

On the basis of initiative

On the basis of the verification strategy

Dialogue Manager Components

Challenges in the Design of an SDS

Speech Recognition

Language Understanding

Dialogue Manager

Response Generation

Speech Synthesis

Domain Specific Components


Slide 3

Introduction

Any computer system that interacts with a human using natural language.

Computer systems with which humans interact on a turn-by-turn basis and in which spoken natural language plays an important part in the communication. [Fraser 1997]

Spoken Dialogue Systems provide an interface between the user and a computer-based application that permits spoken interaction with the application in a relatively natural manner. [McTear 2002]


Slide 4

Applications

Automated reservation systems

CU Communicator System

TOOT

Mercury Flight Reservation System

NL email interfaces

ELVIS (EmaiL Voice Interactive System)

MailSec

Planning & Problem Solving Systems

TRIPS & TRAINS

Circuit-Fix-It Shop System

Virtual Immersive Worlds (Steve)

Automated Banking Systems (Nuance)

Multimodal Information Systems (MATCH)


Slide 5

Components

[Figure: architecture of a spoken dialogue system. Components shown: Speech Recognizer, Language Understanding, Dialogue Manager, Response Generator, Text-to-Speech System, Domain Specific Components.]


Slide 6

Speech Recognition

Involves the conversion of Spoken Sounds (user utterances) to Text (a string of words)

Requires knowledge of Phonetics and Phonology

Basic Idea:

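$\hat{W} = \arg\max_{W} P(O \mid W)\, P(W)$

Here O is the observed acoustic signal and W ranges over candidate word sequences.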
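The decoding rule above can be illustrated with a minimal Python sketch. The candidate transcriptions and their probabilities are invented for illustration; a real recognizer would obtain P(O | W) from an acoustic model and P(W) from a language model, and would search a far larger hypothesis space.

```python
import math

# Hypothetical toy scores for three candidate transcriptions W of one utterance O.
# log P(O | W): how well each word sequence explains the audio (acoustic model).
acoustic_logprob = {
    "recognize speech": math.log(0.30),
    "wreck a nice beach": math.log(0.35),
    "recognize peach": math.log(0.05),
}
# log P(W): how plausible each word sequence is a priori (language model).
language_logprob = {
    "recognize speech": math.log(0.020),
    "wreck a nice beach": math.log(0.001),
    "recognize peach": math.log(0.0005),
}


def decode(candidates):
    """Return the candidate W that maximizes log P(O | W) + log P(W)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + language_logprob[w])


best = decode(list(acoustic_logprob))
print(best)
```

Working in log space turns the product P(O | W) P(W) into a sum, which avoids numerical underflow when many small probabilities are multiplied. In this toy data the language model overrides the slightly better acoustic score of the second candidate.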

Challenges:

Variability in speech signal due to the language, speaker and channel.

Handling continuous spontaneous speech.

Handling large vocabularies.

Providing a Speaker Independent Recognition System


Slide 7

Language Understanding


Converts a sequence of words into a Semantic Representation that can be used by the Dialogue Manager.

Involves the use of Morphology, Syntax and Semantics.

Example:

I want to fly to California

want(speaker, fly(_x, California))

Need robust parsing mechanisms to account for errors in speech recognition and ungrammatical utterances.
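As a rough illustration of robust understanding, the sketch below maps an utterance to the semantic form used in the example by spotting key words rather than requiring a full grammatical parse. The pattern, the function name `understand` and the tuple encoding are assumptions made for this sketch, not part of any particular system.

```python
import re

# Hypothetical keyword-spotting pattern for a single "fly" intent.
# Lazy wildcards let it skip fillers and missing function words.
FLY_PATTERN = re.compile(
    r"\b(?:want|need|like)\b.*?\bfly\b.*?\bto\s+([A-Za-z ]+)",
    re.IGNORECASE,
)


def understand(utterance):
    """Return a nested tuple mirroring want(speaker, fly(_x, Destination)), or None."""
    match = FLY_PATTERN.search(utterance)
    if match is None:
        return None
    destination = match.group(1).strip().title()
    return ("want", "speaker", ("fly", "_x", destination))


print(understand("I want to fly to California"))
print(understand("uh want fly um to boston"))
```

The second call shows the point of robustness: an ungrammatical, disfluent input still yields a usable representation. Everything after "to" is taken as the destination, so trailing words would need further handling in practice.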


Slide 8

Dialogue Manager

“Manages” all the aspects of the dialogue.

It takes a semantic representation of the user’s utterance, figures out how the utterance fits into the overall context and creates a semantic representation of the system’s response.

Performs all of the following:

Interprets the user's utterance within the current context.

Deals with malformed or unrecognized utterances.

Creates a user model.

Performs grounding so that the user and the system have a common set of beliefs.

Manages initiative and system responses.

Handles issues of pragmatics in generation.


Slide 9

Response Generation

Involves constructing the message that is to be spoken to the user.

Requires making decisions about:

What information should be included.

How the information should be structured.

The form of the message

The choice of words

The syntactic structure

Current systems use simple methods such as the insertion of retrieved data into predefined slots in a template.
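A minimal sketch of template-based generation, using Python's standard `string.Template`. The template names and slot names are assumptions chosen for illustration.

```python
from string import Template

# Hypothetical predefined templates with named slots.
TEMPLATES = {
    "new_mail": Template("You have $count new emails."),
    "flight": Template("I have a flight to $city on $day at $time."),
}


def generate(kind, **slots):
    """Insert retrieved data into the slots of a predefined template.

    substitute() raises KeyError if a slot is missing, so an incomplete
    message is never spoken to the user.
    """
    return TEMPLATES[kind].substitute(**slots)


print(generate("new_mail", count="fifteen"))
print(generate("flight", city="London", day="Friday", time="9 am"))
```

This approach makes none of the decisions listed above (content selection, structuring, word choice) at run time; they are all fixed by whoever writes the templates.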


Slide 10

Speech Generation

Translates the message constructed by the response generation component into spoken form.

Two approaches may be used:

Prerecorded canned speech may be used with spaces to be filled by retrieved or previously recorded samples.

You have fifteen new emails.

Text-to-speech synthesis

Often implemented as concatenative speech synthesis.

Text-to-phoneme conversion, e.g. (spēch, dī′ə-lôg′).

Phoneme-to-speech conversion.


Slide 11

Domain Specific Components

The dialogue manager usually needs to interface with some external software such as a database or an expert system.

Queries or plans therefore have to be converted from the internal representation used by the dialogue manager to the format used by the external domain specific system (e.g., SQL or STRIPS-style goals).

This interfacing is handled by the domain specific components.
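The sketch below shows one way such a conversion might look for a database back end: a filled frame is turned into a parameterized SQL query. The table, column names, slot names and flight numbers are all invented for this example, and an in-memory SQLite database stands in for the external system.

```python
import sqlite3

# Hypothetical mapping from the dialogue manager's slot names to database columns.
SLOT_TO_COLUMN = {"destination": "dest_city", "day": "depart_day"}


def frame_to_query(frame):
    """Translate a filled frame into a parameterized SQL query and its arguments."""
    conditions = [f"{SLOT_TO_COLUMN[slot]} = ?" for slot in frame]
    sql = "SELECT flight_no FROM flights WHERE " + " AND ".join(conditions)
    return sql, list(frame.values())


# Stand-in for the external domain database (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight_no TEXT, dest_city TEXT, depart_day TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("F100", "London", "Friday"), ("F200", "Paris", "Friday")],
)

sql, params = frame_to_query({"destination": "London", "day": "Friday"})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

User-supplied values are passed as query parameters rather than spliced into the SQL string, which matters when the values come from speech recognition output.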


Slide 12

Classification of SDS

Based on the method used to control the dialogue with the user:

Finite state (or graph) based systems

Frame based systems

Agent based systems

Type of initiative

User Initiative

System Initiative

Mixed Initiative

Type of verification

Explicit Verification

Implicit Verification


Slide 13

Finite State Based Systems

The user is taken through a dialogue consisting of a sequence of predetermined steps or stages.

The dialogue flow is specified as a set of dialogue states with transitions denoting various alternative paths through the dialogue graph.

[Figure: dialogue graph with the states Get Destination → Verify Destination → Get Travel Day → Verify Travel Day.]

System: What is your destination?
User: London
System: Was that London?
User: Yes
System: What day do you want to travel?
User: Friday
System: Was that Sunday?
User: No
System: What day do you want to travel?
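The dialogue graph above can be sketched as a small table-driven state machine. The state names, the table layout and the function `run_dialogue` are assumptions for this sketch; user turns are supplied as already-recognized text, so a misrecognition such as "Sunday" for "Friday" is simulated by simply passing the wrong word.

```python
# "Get" states: state -> (slot to fill, system prompt, next state).
GET_STATES = {
    "get_destination": ("destination", "What is your destination?", "verify_destination"),
    "get_travel_day": ("day", "What day do you want to travel?", "verify_travel_day"),
}
# "Verify" states: state -> (slot to confirm, next state on yes, next state on no).
VERIFY_STATES = {
    "verify_destination": ("destination", "get_travel_day", "get_destination"),
    "verify_travel_day": ("day", "done", "get_travel_day"),
}


def run_dialogue(recognized_turns):
    """Walk the dialogue graph, consuming one recognized user turn per state."""
    turns = iter(recognized_turns)
    state, slots, prompts = "get_destination", {}, []
    while state != "done":
        if state in GET_STATES:
            slot, prompt, next_state = GET_STATES[state]
            prompts.append(prompt)
            slots[slot] = next(turns)
            state = next_state
        else:
            slot, on_yes, on_no = VERIFY_STATES[state]
            prompts.append(f"Was that {slots[slot]}?")
            answer = next(turns).strip().lower()
            state = on_yes if answer == "yes" else on_no
    return slots, prompts


slots, prompts = run_dialogue(["London", "Yes", "Sunday", "No", "Friday", "Yes"])
print(slots)
```

Because every state expects exactly one kind of answer, the recognizer could be given a tiny vocabulary per state; the cost is that the user can never say more than the current state asks for.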


Slide 14

Finite State Based System (2)

Advantages:

Simple to construct

The required vocabulary and grammar for each state can be specified in advance

Results in more constrained speech recognition and language understanding.

Disadvantages:

Inhibits the user’s ability to ask questions and take initiative.

Does not allow over-informative answers.

The resulting dialogues are not very natural.

Example: Nuance demo banking system.


Slide 15

Frame Based System

User is asked questions that enable the system to fill slots in a template in order to perform tasks.

Dialogue flow is not predetermined but depends on:

the contents of the user’s input

the information that the system has to elicit.

System: What is your destination?
User: London
System: What day do you want to travel?
User: Friday

System: What is your destination?
User: London on Friday, October 10 around 9 in the morning.
System: I have the following connection…

Destination City: London

Departure Day: Friday

Departure Date: October 10

Departure Time: 09 am
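A minimal sketch of frame-based control: each user utterance may fill any number of slots, and the next system prompt depends on which slots are still empty. The slot names, the small city list and the regular expressions are assumptions for illustration only.

```python
import re

# Hypothetical slot extractors; a real system would use the language understanding component.
SLOT_PATTERNS = {
    "destination": re.compile(r"\b(London|Paris|Boston)\b", re.IGNORECASE),
    "day": re.compile(
        r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b",
        re.IGNORECASE,
    ),
}
# Question to ask for each slot; each applies only while its slot is empty.
PROMPTS = {
    "destination": "What is your destination?",
    "day": "What day do you want to travel?",
}


def update_frame(frame, utterance):
    """Fill every still-empty slot that the utterance provides a value for."""
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match and frame.get(slot) is None:
            frame[slot] = match.group(1).title()
    return frame


def next_prompt(frame):
    """Return the question for the first empty slot, or None when the frame is full."""
    for slot, prompt in PROMPTS.items():
        if frame.get(slot) is None:
            return prompt
    return None


terse = update_frame({}, "London")
verbose = update_frame({}, "London on Friday, October 10 around 9 in the morning.")
print(next_prompt(terse))
print(next_prompt(verbose))
```

The terse user is asked for the travel day next, while the over-informative user skips that question entirely, matching the two dialogues above.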


Slide 16

Frame Based Systems (2)

Act like rule-based systems, taking a particular action based on the current state of affairs.

Questions and other prompts that the system can ask should be listed along with conditions that have to be true for that particular question.

Advantages:

User can provide over-informative answers.

Allows more natural dialogues.

Disadvantages:

Cannot handle complex dialogues.

Range of applications is limited to systems that elicit information from users and act on that information.

Example: Philips train timetable information system


Slide 17

Agent Based Systems

Allow complex communication between the system, the user and the underlying application in order to solve some problem or task.

Many variations depending on the application.

User: I’m looking for a job in the Calais area. Are there any servers?

System: No, there aren’t any employment servers for Calais. However, there is an employment server for Pas-de-Calais and an employment server for Lille. Are you interested in one of these?

User: What time does the bank open?
System: 9 am but they only accept job applications at noon.

User: What time does the bank open?
System: 9 am but the guards come around 8.


Slide 18

Agent Based Systems (2)

Communication is viewed as interaction between two agents, each of which is capable of reasoning about its own actions and beliefs.

The dialogue model takes the preceding context into account

The dialogue evolves dynamically as a sequence of related steps that build on top of each other.

Advantages:

Allow natural dialogue in complex domains.

Disadvantages:

Such agents are usually very complex.

Hard to build.


Slide 19

Dialogue Manager Components

Dialogue Model: contains information about:

Whether the system or the user should take the initiative

Whether explicit or implicit confirmation should be used

The kinds of speech acts that need to be generated.

User Model: contains the system's beliefs about:

What the user knows

The user's expertise, experience and ability to understand the system's utterances.

Knowledge Base: contains information about the world and the domain.

Discourse Context: contains the dialogue history and current discourse.

Reference Resolver: performs reference resolution and handles ellipsis.

Plan Recognizer and Grounding Module:

Interprets the user's utterance given the current context

Reasons about the user's goals and beliefs.

Domain Reasoner/Planner: generates plans to achieve the shared goals.

Discourse Manager: manages the flow of information between all of the above modules.


Slide 20

Challenges in the Design of an SDS

Recovery from errors

Understanding pragmatically ill-formed utterances

Design of system prompts

Reference resolution

Understanding inter-sentential ellipsis

Plan recognition

Detection of conflicts

Performing grounding

And many more…


Slide 21

Recovery From Errors

An SDS should be able to detect errors or misunderstandings and recover from them.

Errors may be of the following types:

Uncertainties – the speech recognition output has a low confidence score.

Inconsistencies – the utterance conflicts with the domain model or with previous utterances.

Ambiguities – a sentence has more than one interpretation.

Luperfoy proposes a recovery strategy based on the following four-stage algorithm:

Detection

Diagnosis (Classification of the error)

Repair plan selection

Interactive plan execution
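The four stages can be sketched roughly as follows. This is an illustrative outline only, not Luperfoy's actual algorithm: the confidence threshold, the order in which error types are checked and the repair plans are all assumptions made for this sketch.

```python
# Hypothetical repair plans, one per error type named above.
REPAIR_PLANS = {
    "uncertainty": "Ask the user to repeat or confirm the utterance.",
    "inconsistency": "Point out the conflict and ask which value is correct.",
    "ambiguity": "List the interpretations and ask the user to choose one.",
}


def diagnose(confidence, interpretations, conflicts_with_context, threshold=0.5):
    """Stages 1 and 2: detect an error and classify it; None means no error."""
    if confidence < threshold:
        return "uncertainty"
    if len(interpretations) > 1:
        return "ambiguity"
    if conflicts_with_context:
        return "inconsistency"
    return None


def select_repair(error_type):
    """Stage 3: pick a repair plan for the diagnosed error type."""
    return REPAIR_PLANS.get(error_type)


# Stage 4 (interactive plan execution) would carry out the plan in dialogue
# with the user; here the selected plan is only printed.
error = diagnose(confidence=0.9, interpretations=["to Boston", "from Boston"],
                 conflicts_with_context=False)
print(error, "->", select_repair(error))
```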


Slide 22

Pragmatically Ill-formed Utterances

Listeners assume their beliefs of the world match the speaker’s

Hence, listeners interpret the utterances with respect to their beliefs

However, the speaker’s view of the world may differ from that of the listener.

As a result, the speaker’s utterance may be syntactically and semantically correct, yet violate the pragmatic rules.

Pragmatically ill-formed utterances are of two types:

Extensional failures

How many women on the UD wrestling team are CIS majors?

Intensional failures

Which apartments are for sale?

What advanced placement courses did Bob take in high school?

What is Dr. Smith’s home address?


Slide 23

Design of System Prompts

Prompt design is important for:

Achieving natural, flowing conversations

Overcoming shortcomings in speech recognition technology

One of the most challenging aspects is implicitly letting the user know what they can say. Without this knowledge, users may:

Try to go beyond the functionality of the system

Fail to use the system as fully as they could

Prompt design is related to initiative

This is AZ Banking. How may I help you?

This is AZ banking. Say ‘check balance’ to check your balance, ‘pay bill’ to pay a bill or ‘transfer funds’ to transfer funds…

Prompts should be more explicit in the case of recognition errors and less explicit as the user shows greater familiarity with the system.
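The adaptation rule in the last point could be sketched like this. The two prompts are the ones quoted above; the counters, the thresholds and the function name `choose_prompt` are assumptions for illustration.

```python
OPEN_PROMPT = "This is AZ Banking. How may I help you?"
DIRECTED_PROMPT = (
    "This is AZ Banking. Say 'check balance' to check your balance, "
    "'pay bill' to pay a bill or 'transfer funds' to transfer funds…"
)


def choose_prompt(recent_errors, completed_calls, max_errors=1, expert_after=3):
    """Pick the open prompt only for familiar users who are being recognized well.

    recent_errors: recognition errors in the current call (hypothetical counter).
    completed_calls: earlier calls this user finished successfully (hypothetical).
    """
    if recent_errors > max_errors:
        return DIRECTED_PROMPT  # recognition is struggling: be explicit
    if completed_calls >= expert_after:
        return OPEN_PROMPT      # familiar user: allow user initiative
    return DIRECTED_PROMPT      # new user: say what can be said


print(choose_prompt(recent_errors=0, completed_calls=5))
print(choose_prompt(recent_errors=2, completed_calls=5))
```

The open prompt hands initiative to the user, while the directed prompt keeps it with the system, which is the link between prompt design and initiative noted above.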


Slide 24

Reference Resolution

Reference is the process by which speakers use expressions like he and it to refer to entities salient in the discourse.

Reference resolution is the process of determining the referent entity of a referring expression.

For example:

John went to Bill’s car dealership to check out an Acura Integra. He looked at it for about an hour.

Before he bought it, John checked over the Integra very carefully.


Slide 25

Inter-sentential Ellipsis

Is the use of a syntactically incomplete sentence fragment, along with the context in which the fragment occurs, to communicate a complete thought and accomplish a speech act.

Examples:

I want to cash this check. Small bills only please.

Speaker 1: Who are the candidates for the consultants?

Speaker 2: Mary Smith, Bob Jones and Ann Doe.

Speaker 1: Tom’s recommendations?


Slide 26

References

Carberry, Sandra: “Plan Recognition in Natural Language Dialogue”, ACL-MIT Press Series on Natural Language Processing, MIT Press, 1990.


Slide 27

Questions?
