Spoken Dialogue Systems
Prof. Alexandros Potamianos
Dept. of Electrical & Computer EngineeringTechnical University of Crete, Greece
May 2003
Outline
Discourse Research Issues
Spoken Dialogue Systems Pragmatics (dialogue acts)
Dialogue management
Multimodal Systems
Examples
Discourse: Research Issues
Reference resolution, e.g., “That was a lie”
Anaphora, e.g., “John left…. He was bored.”
Co-reference, e.g., “John” and “He” refer to the same entity
Text coherence:
  Coherent: “John left early. He was tired”
  Incoherent: “John left early. He likes spinach”
Spoken Dialogue Systems: Concepts
Turn-taking
Dialogue segmentation
Grounding: backchannel (e.g., “Mm Hmm”), acknowledgment, explicit/implicit confirmation
Implicature, e.g.,
  “What time are you flying?”
  “Well, I have a meeting at three”
Initiative, e.g.,
  “What time are you flying?”
  “I don’t feel like booking the flight right now. Let’s look at hotels”
Speech, Dialogue and Application Acts
Speech Acts (Austin 1962, Searle 1975):
  Assertive (conclude), Directive (ask, order), Commissive (promise), Expressive (apologize, thank), Declarations
Dialogue Acts:
  Statement, Info-Request, Wh-Question, Yes-No Question, Opening, Closing, Open-Option, Action-Directive, Offer, Commit, Agree, etc.
Application Acts:
  Domain specific but general, e.g., Info-Request into the system’s semantic state, Info-Request into the database, Info-Request into the database results
Dialogue/Application Act Classification
Semantic parsing followed by deterministic rules, e.g., ‘what’, ‘when’, ‘where’, ‘who’ at the start of an utterance signals a Wh-Question
Bayesian formulation: given a sentence W, the most probable dialogue act is
  A* = argmax_A P(A|W) = argmax_A P(W|A) P(A)
P(W|A) can be an n-gram language model, one per dialogue act
P(A) can also be an n-gram model over the dialogue-act sequence
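The Bayesian formulation above can be sketched as a small classifier. This is a minimal sketch assuming unigram likelihoods with add-one smoothing (the slides allow general n-grams) and a tiny illustrative training set; all names and data here are made up for the example:

```python
import math
from collections import Counter, defaultdict

def train(labeled_utterances):
    """labeled_utterances: list of (act, words) pairs."""
    act_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for act, words in labeled_utterances:
        act_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return act_counts, word_counts, vocab

def classify(words, act_counts, word_counts, vocab):
    """A* = argmax_A P(W|A) P(A), in log space."""
    total_acts = sum(act_counts.values())
    best_act, best_score = None, float("-inf")
    for act in act_counts:
        score = math.log(act_counts[act] / total_acts)  # log P(A)
        denom = sum(word_counts[act].values()) + len(vocab)
        for w in words:
            # add-one smoothed log P(w|A)
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act

# Illustrative training data, echoing utterances from the slides.
data = [
    ("wh-question", ["what", "time", "are", "you", "flying"]),
    ("wh-question", ["where", "are", "you", "flying"]),
    ("statement", ["i", "am", "flying", "to", "boston"]),
    ("statement", ["i", "have", "a", "meeting", "at", "three"]),
]
model = train(data)
print(classify(["when", "are", "you", "flying"], *model))  # -> wh-question
```

Even the unseen word “when” is handled by smoothing; the shared context words (“are you flying”) pull the decision toward the Wh-Question act.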
Dialogue Management 1
Frame-based, e.g.:
  DeptCity  “From what city are you leaving?”    GRM_CITY
  ArrCity   “Where are you flying to?”           GRM_CITY
  DeptTime  “What time would you like to fly?”   GRM_TIME
  DeptDate  “When are you flying?”               GRM_DATETIME
Finite-state-machine dialogue manager
Mostly system-initiated dialogue
VXML-like dialogue structure (forms and frames)
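A minimal sketch of the frame-based manager above. The slot names and prompts come from the slide; the `next_prompt` helper is illustrative, and the grammar labels (GRM_CITY etc.) are treated as plain strings rather than real grammars:

```python
# Each slot pairs a prompt with a grammar name, as in the slide's frame.
FRAME = [
    ("DeptCity", "From what city are you leaving?",  "GRM_CITY"),
    ("ArrCity",  "Where are you flying to?",         "GRM_CITY"),
    ("DeptTime", "What time would you like to fly?", "GRM_TIME"),
    ("DeptDate", "When are you flying?",             "GRM_DATETIME"),
]

def next_prompt(filled):
    """System initiative: prompt for the first slot not yet filled."""
    for slot, prompt, grammar in FRAME:
        if slot not in filled:
            return slot, prompt, grammar
    return None  # frame complete -> proceed to the database query

print(next_prompt({"DeptCity": "Boston"}))
# -> ('ArrCity', 'Where are you flying to?', 'GRM_CITY')
```

This is the VXML-style form-filling loop in miniature: the active grammar is scoped to the current slot, which is what makes the dialogue mostly system-initiated.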
Dialogue Management 2
Application-independent flow-chart structure
Generic dialogue/application manager (really this is more like a controller)
Dialogue Management 3
Generalized Finite State Machine Dialogue Management
Application Dependent but General Dialogue Superstates
Fill: adaptive dialogue module, uses dynamic e-forms to elicit AV pairs from the user; resolves value and tree-position ambiguities
Navigate: presents database results and lets the user select the appropriate ones
[Flow chart: Fill -> “Is Full?” (No: back to Fill; Yes: Verify) -> “Is Correct?” (No: back to Fill; Yes: Create Query) -> Navigate]
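The superstate flow above can be sketched as a driver loop. The state names follow the slides; the callbacks and their wiring are assumptions, stubbed out so the control flow is visible:

```python
def run_dialogue(fill, is_full, verify, is_correct, create_query, navigate):
    """Drive the Fill -> Verify -> Query -> Navigate superstate loop."""
    state = "Fill"
    while True:
        if state == "Fill":
            fill()                       # elicit AV pairs via dynamic e-forms
            state = "Verify" if is_full() else "Fill"
        elif state == "Verify":
            verify()                     # confirm the collected values
            state = "Query" if is_correct() else "Fill"
        elif state == "Query":
            create_query()               # build and run the database query
            state = "Navigate"
        elif state == "Navigate":
            navigate()                   # user selects among the results
            return

# Trivial stubs that record the path taken through the superstates.
log = []
filled = {"n": 0}
run_dialogue(
    fill=lambda: (log.append("fill"), filled.update(n=filled["n"] + 1)),
    is_full=lambda: filled["n"] >= 1,
    verify=lambda: log.append("verify"),
    is_correct=lambda: True,
    create_query=lambda: log.append("query"),
    navigate=lambda: log.append("navigate"),
)
print(log)  # -> ['fill', 'verify', 'query', 'navigate']
```

Because the superstates are application-dependent but general, only the callbacks change per application; the loop itself is reusable.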
Advanced Dialogue Systems
Mixed initiative:
  Allow the user to say anything (global grammar active at all states), e.g.,
    Sys: “What date are you flying?”
    Usr: “I am flying next Tuesday in the morning”
  Allow the user to navigate the system’s state machine, e.g., “I would like to look at hotels first”
  Open prompts give the user the initiative, e.g., “What next?”
Advanced dialogue features:
  Corrections, e.g., “No, not Boston, Atlanta”
  Negation, e.g., “Anything but Olympic”
  Complex semantic expressions, e.g., “tomorrow evening or Sunday morning”
  Ambiguity resolution and representation, e.g., “next Tuesday”
  Persistent semantics, e.g., “Info about his organization”
Ambiguity Representation
[Figure: semantic tree. Trip branches into Flight and Car; Flight has Leg 1 and Leg 2; a leg has Departure and Arrival nodes with City and Date attributes (e.g., City: Atlanta, Date: June 1). Position ambiguity: a value such as City: New York may attach under either Departure or Arrival. Value ambiguity: one tree position holds competing values, e.g., City: Atlanta vs. New York.]
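The two ambiguity types can be represented as scored candidate lists. The paths, values, scores, and the `resolve` threshold below are illustrative examples, not output of a real parse:

```python
# Value ambiguity: one tree position, several candidate values with scores.
value_ambiguity = {
    ".trip.flight.leg1.departure.city": [("Atlanta", 0.5), ("New York", 0.5)],
}

# Position ambiguity: one value, several candidate tree positions with scores.
position_ambiguity = {
    "New York": [(".trip.flight.leg1.departure.city", 0.5),
                 (".trip.flight.leg1.arrival.city",   0.5)],
}

def resolve(candidates, threshold=0.7):
    """Keep the top candidate only if it clearly wins; otherwise return
    None so the dialogue manager can ask a disambiguation question."""
    best = max(candidates, key=lambda c: c[1])
    return best if best[1] >= threshold else None

print(resolve(value_ambiguity[".trip.flight.leg1.departure.city"]))  # -> None
```

Keeping all candidates with scores, rather than forcing an early decision, is what lets later evidence (or a clarification question) resolve the ambiguity.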
Error Correction Mechanisms
Sys: How can I help you?
Usr: I’d like to fly from Austin to Newark on August fifth in the morning
Asr: I’d like to fly from Boston to Newark on August fifth in the morning

  Attribute                          Value           Score
  .trip.flight.leg1.departure.city   BOS             0.5
  .leg1.departure.date               8/5/2003        0.5
  .leg1.departure.time               6:00 - 11:59    0.5
  .leg1.arrival.city                 EWR             0.5

Sys: I've got you leaving Boston on Sunday, August fifth in the morning and arriving in Newark airport. What is your preferred airline?
Usr: Leaving from Austin, Texas

  Attribute                          Value           Score
  .leg1.departure.city               BOS             0.44
                                     AUS             0.44

Sys: Sorry …, are you leaving from Austin, Texas, or from Boston?
Usr1: Austin, Texas

  Attribute                          Value           Score
  .leg1.departure.city               AUS             0.72
                                     BOS             0.38

Sys: Leaving from Austin, Texas.
Usr2: Change the departure city to Austin, Texas

  Attribute                          Value           Score
  .leg1.departure.city               AUS             0.6

Alternate: use error correction
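One way to sketch the score bookkeeping behind these tables. The combination rule here (discount competing values, reinforce the value just heard, drop alternatives on an explicit correction) and all constants are illustrative assumptions, not the system's actual formula:

```python
def update(scores, heard, conf, decay=0.88, explicit=False):
    """scores: value -> pragmatic score for one attribute, e.g. departure.city."""
    if explicit:                 # "Change the departure city to ..."
        return {heard: conf}     # drop competing hypotheses entirely
    new = {v: round(s * decay, 2) for v, s in scores.items() if v != heard}
    prev = scores.get(heard, 0.0)
    new[heard] = round(prev + (1.0 - prev) * conf, 2)   # reinforce
    return new

def system_move(scores, accept=0.7):
    """Decide between implicit confirmation and a disambiguation question."""
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] >= accept and (len(ranked) == 1 or ranked[0] - ranked[1] >= 0.2):
        return "confirm implicitly"
    return "ask disambiguation question"

city = {"BOS": 0.5}                      # after the misrecognized first turn
city = update(city, "AUS", 0.5)          # "Leaving from Austin, Texas"
print(system_move(city))                 # scores close -> ask the user
city = update(city, "AUS", 0.5)          # "Austin, Texas"
print(system_move(city))                 # AUS now clearly ahead
```

With these made-up constants the trajectory roughly mirrors the tables: after one correction BOS and AUS are nearly tied, triggering the “are you leaving from Austin, Texas, or from Boston?” question; a second consistent answer separates them.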
Spoken Dialogue System Architecture
[Architecture diagram showing the system components: Controller, Telephony, ASR, Parser, Interpreter/Context Tracker, DM/Initiative, Generation, TTS Platform, App. Controller, AI modules, Database, …]
System Architecture and Portability
[Diagram: modules split into application-independent and application-dependent layers, grouped under Semantics, Pragmatics, and Generation: Parser, Semantic Interpreter, Context Tracker, Pragmatic Interpreter, Domain Expert Knowledge, Initiative Tracking, Dialogue Manager, Utterance Planner, Surface Realizer, and Controller; ambiguity representations and pragmatic confidence scores are passed between the layers.]
Advantages of application-centric system design:
  Increased modularity
  Flexible multi-stage data collection
  Extensible to multi-modal input (universal access)
Multimodal Systems
Definition Input Modalities/Output Media Research Issues
User Interface Design Semantic Module
Examples
Input Modalities/Output Media
Unimodal:
  Speech input / speech output
Multimodal:
  Speech+DTMF input / speech output
  Speech input / speech and GUI output
  Speech and pen input / speech and GUI output
Definitions:
  Pen input: buttons, pull-down menus, graffiti, pen gestures
  GUI output: text and graphics
[Diagram: modality combinations, where S = speech, D = DTMF, P = pen, G = GUI; inputs S, S+D, S+P; outputs S, G, S+G]
Issues
Semantic/Pragmatic module:
  Merging semantic information from different modalities, e.g., “Draw a line from here to there”
  Ambiguity representation and resolution
User interface:
  Synergies between input modalities
  Turn-taking and the appropriate mix of modalities
  Maintaining interface consistency
  Focus/context visualization
System issues:
  Synchronization and latency
Example: the user says “July fifth” while entering “7/10” in the GUI.
  NL Parser: “July” <month>, “fifth” <day> -> <date>
  GUI Parser: “7” <number>, “/”, “10” <number> -> <date>
  NL Interpreter: {“date”, “Jul 5, 2002”}; GUI Interpreter: {“date”, “Jul 10, 2002”}
  Pragmatic Analysis: {“travel.flight.leg1.departure.date”, “Jul 5, 2002”} and {“travel.flight.leg1.departure.date”, “Jul 10, 2002”}
  Context Tracking updates the semantic tree and pragmatic scores:
    {“travel.flight.leg1.departure.date”, “Jul 5, 2002”, 0.4}
    {“travel.flight.leg1.departure.date”, “Jul 10, 2002”, 0.9}
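The merge step in this example can be sketched as follows. The modality weights (0.4 for speech, 0.9 for the GUI) mirror the scores above, but the merge function itself is an assumption about how the context tracker combines hypotheses:

```python
def merge(nl_hyps, gui_hyps, nl_weight=0.4, gui_weight=0.9):
    """Each hyps list holds (attribute, value) pairs from one modality's
    interpreter; the weight models trust in that modality for this input."""
    scored = [(attr, val, nl_weight) for attr, val in nl_hyps]
    scored += [(attr, val, gui_weight) for attr, val in gui_hyps]
    # Keep all hypotheses, best first; the ambiguity survives until resolved.
    return sorted(scored, key=lambda h: h[2], reverse=True)

tree = merge(
    nl_hyps=[("travel.flight.leg1.departure.date", "Jul 5, 2002")],
    gui_hyps=[("travel.flight.leg1.departure.date", "Jul 10, 2002")],
)
print(tree[0])  # highest-scoring hypothesis: the GUI date, Jul 10
```

Note that the losing hypothesis is kept with its lower score rather than discarded, which is what allows a later correction (“no, July fifth”) to flip the ranking.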
Semantic and Pragmatic Module
[Semantic tree: travel -> flight -> leg 1, with departure (city, date) and arrival (city) nodes; scored values: departure.city {“BOS”, 0.5}; departure.date {“Jul 5, 2002”, 0.4} and {“Jul 10, 2002”, 0.9}; arrival.city {“NYC”, 0.5}]
Multi-Modal User Interface
Emphasis on synergies between modalities:
  Values of attributes are displayed graphically
  Erroneous values can be easily corrected via the GUI
  Focus (aka context) of the speech modality is highlighted
  Position and value ambiguity are shown (and typically resolved) via the GUI
  Voice prompts are significantly shorter and mostly used to emphasize information that is already displayed graphically
  GUI takes full advantage of the intelligence of the voice UI, e.g., ‘round trip’ speech input will ‘gray out’ the third leg button in the GUI
  Seamless integration of semantics from the two modalities using modality-specific pragmatic scores
Example 1: Flight First Leg
ASR: I want to fly from Boston to New York on September 6th.
[Screenshot annotations: new focus, field disabled, navigation buttons]
Mixing the Modalities: Turn-Taking
“Click to talk” vs. “open mike”:
  “Click to talk” can be restrictive
  “Open mike” can be confusing (falling out of turn)
  Both have limitations
Often there is a dominant modality, based on:
  Type of input, e.g., “select from menu” vs. entering free text
  Recent input history
  User preferences
The system automatically selects the dominant modality, and the user can click to change it
The dominant-modality selection algorithm is adaptive
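The dominant-modality heuristic described above might look like the sketch below; the feature names, modality labels, and tie-breaking order are assumptions, not the actual adaptive algorithm:

```python
def dominant_modality(input_type, history, preference=None):
    """Pick a default modality; history lists recent choices, newest last."""
    if preference:                        # explicit user preference wins
        return preference
    if input_type == "select_from_menu":  # constrained input suits the GUI
        return "gui"
    if input_type == "free_text":         # open-ended input suits speech
        return "speech"
    if history:                           # otherwise stick with recent habit
        return max(set(history), key=history.count)
    return "speech"

print(dominant_modality("free_text", ["gui", "speech"]))     # -> speech
print(dominant_modality("other", ["gui", "gui", "speech"]))  # -> gui
```

Adaptivity would come from updating the history and preference inputs over time; the user can always override the selection with a click, as the slide notes.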