Language Technology II
Language-Based Interaction: Dialogue design and evaluation
Manfred Pinkal
Course website: www.coli.uni-saarland.de/courses/late2
21.04.23 LaTeII: Language-based Interaction, Manfred Pinkal
Outline
• The Software Development Cycle
• Dialogue Design
• Wizard-of-Oz Experiments
• Dialogue System Evaluation
The Software Development Cycle
• Requirements Analysis
• Design
• Implementation
• Testing and Evaluation
• Integration
• Maintenance
Dialogue Design: Overall Aims
• Effectiveness (Task Success)
• Efficiency
• User Satisfaction
Dialogue Design: General Steps
• 1. Make sure you understand what you are trying to achieve (use scenarios and build a conceptual model).
• 2. See if you can decompose the task into smaller meaningful subtasks.
• 3. Identify the information tokens you need for each task or subtask.
• 4. Decide how you will obtain this information from the user.
• 5. Sketch a dialogue model that captures this information.
• 6. Test your dialogue model.
• 7. Revise the dialogue model and repeat Step 6 …
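Steps 2 to 5 can be made concrete with a small form-filling sketch. The Python below is illustrative only: the funds-transfer subtask, the slot names, and the prompts are assumptions introduced for this example, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Subtask:
    """A subtask together with the information tokens (slots) it needs."""
    name: str
    prompts: Dict[str, str]                      # slot name -> prompt that asks for it
    values: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None if done."""
        for slot, prompt in self.prompts.items():
            if slot not in self.values:
                return prompt
        return None

    def fill(self, slot: str, value: str) -> None:
        """Store a value the user supplied for a known slot."""
        if slot not in self.prompts:
            raise KeyError(f"unknown slot: {slot}")
        self.values[slot] = value

    def is_complete(self) -> bool:
        return all(slot in self.values for slot in self.prompts)


# Hypothetical subtask: slot names and prompts are invented for illustration.
transfer = Subtask(
    name="transfer_funds",
    prompts={
        "source_account": "Which account do you want to transfer from?",
        "target_account": "Which account do you want to transfer to?",
        "amount": "How much do you want to transfer?",
    },
)
```

Because slots may be filled in any order, a user who volunteers several tokens in one utterance is simply asked for whatever is still missing.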
The following slides are compiled from slides by Rolf Schwitters and Bernd Plannerer.
Dialogue Design: Principal Decisions
• Specification of Target Group and Supported Languages
  – Frequency of usage
  – Regional / National
  – Monolingual / multilingual / foreign language speakers
  – Age
• Environment
  – Quiet Environment: Home, Office
  – Noisy Environment: Car, Outdoor, Noisy Working Environments
• Choice of Persona and Voice
• Dialogue Structure
Dialogue Design: Practical Tips
• Guide the user towards responses that maximize
  – clarity and
  – unambiguousness.
• Allow for the user
  – not knowing the active vocabulary,
  – not knowing the answer to a question, or
  – not understanding a question.
• Guide users toward natural ‘in vocabulary’ responses.
  – Version 1: Welcome to ABC Bank. How can I help you?
  – Version 2: Welcome to ABC Bank. What would you like to do?
  – Version 3: Welcome to ABC Bank. You can check an account balance, transfer funds, or pay a bill. What would you like to do?
More Practical Tips
• Do not give too many options at once (maximum 5).
• Keep prompts brief to encourage the user to be brief.
• Supply confirmation messages frequently, especially when the cost or likelihood of a recognition error is high.
• Prefer implicit over explicit grounding.
• Use recognizer confidence values to avoid unnecessary grounding steps.
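The last three tips can be combined into a simple decision rule. The sketch below is one possible reading, not a prescribed method: the function name, the three thresholds, and the `high_cost` flag are all assumptions chosen for illustration, and real thresholds would have to be tuned for a given recognizer.

```python
def choose_grounding(confidence: float, high_cost: bool = False,
                     accept_at: float = 0.9, implicit_at: float = 0.6,
                     reject_below: float = 0.3) -> str:
    """Pick a grounding move from a recognizer confidence value in [0, 1].

    Threshold values are illustrative assumptions, not recommended settings.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < reject_below:
        return "reprompt"      # too unreliable to confirm; ask again
    if high_cost:
        return "explicit"      # costly errors always get an explicit check
    if confidence >= accept_at:
        return "none"          # skip the grounding step entirely
    if confidence >= implicit_at:
        return "implicit"      # e.g. echo the value inside the next prompt
    return "explicit"          # e.g. "Did you say ...?"
```

With these assumed defaults, a confidently recognized value passes without any confirmation unless the action is marked as costly.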
More Practical Tips
• Assume a frequent user will have a rapid learning curve.
• Allow shortcuts:
  – Switch to expert mode / command level.
  – Combine different steps in one.
  – Barge-in
• Assume errors are the fault of the recognizer, not the user.
• Allow the user to access (context-sensitive) help at any state.
• Provide escape commands.
• Design graceful recovery when the recognizer makes an error.
Wizard-of-Oz Experiments
• Central parts of the system are simulated by a human "wizard".
• Experimental WoZ systems make it possible to test a dialogue system (to some extent) before it has been (fully) implemented, thus uncovering basic problems of the dialogue model.
• They also make it possible to collect, at an early stage:
  – data about the dialogue behavior of subjects
  – the syntax and lexicon used (to hand-code language models)
  – speech data (to train statistical language models)
Wizard-of-Oz Experiments
The WoZ is not just a person in a box. The WoZ system must:
• perform as poorly as a computer: "artificial" speech output by typing and a TTS system; simulation of shortcomings in recognition: the wizard sees typed input (no prosody), maybe even with simulated recognition failure (e.g., by randomly overwriting words in the typed input).
• perform as efficiently as a computer: support for quick database access and complex real-time decisions, e.g., by displaying a dialogue-flow diagram, marking the current state, and offering menus with contextually appropriate dialogue moves and system prompts.
• impose constraints on the options of the wizard (to support the impression of artificiality), and allow those constraints to be varied (to test different dialogue strategies).
• log all kinds of data in an appropriate and easily accessible form.
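Simulated recognition failure by randomly overwriting words could look roughly like the following. This is a minimal sketch under stated assumptions: the function name, the per-word `error_rate` parameter, and the idea of drawing replacements from a fixed confusion vocabulary are invented here and do not describe any particular WoZ tool.

```python
import random
from typing import Optional, Sequence


def simulate_recognition_errors(text: str, error_rate: float,
                                vocabulary: Sequence[str],
                                rng: Optional[random.Random] = None) -> str:
    """Overwrite each word of the typed input with probability error_rate.

    Replacement words are drawn uniformly from `vocabulary` (an assumed,
    hand-picked list of plausible misrecognitions).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    rng = rng or random.Random()
    corrupted = [
        rng.choice(vocabulary) if rng.random() < error_rate else word
        for word in text.split()
    ]
    return " ".join(corrupted)
```

Passing a seeded `random.Random` makes a session reproducible, which helps when the corrupted input has to be matched against the logged original afterwards.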
Wizard-of-Oz Experiments
• Ideally, a WoZ system is set up in a modular way, so that functions contributed by humans can be replaced step by step in the course of system implementation.
• Gradual transition between WoZ and fully artificial system.
• An example: The DiaMant tool, run in WoZ mode.
Motivations for WoZ experiments
• The original motivation:
  – Early testing, avoiding time-consuming and expensive programming.
  – Studying dialogues while disregarding the bottleneck of unreliable speech recognition.
• Changing conditions:
  – Configuration and design of dialogue systems is becoming comfortable, and recognizers are becoming pretty reliable: Are WoZ experiments necessary?
  – Dialogue interaction is becoming increasingly flexible, adaptive, and complex: Are WoZ experiments feasible?
• A shift in motivation:
  – From: exploration of the user's behavior, given constrained and schematic system behavior
  – To: exploration of alternative behaviors of the wizard, who is given a range of freedom for his/her reactions.
An example
• A WoZ study in the TALK project, Spring 2005
• MP3 Player
• Multi-modal dialogue, language German
• In-car/in-home scenario
• Saarland University, DFKI, CLT
Tasks for the Subjects
• MP3 domain
  – “in-car” with primary task Lane Change Task (LCT)
  – “in-home” domain without LCT
• Tasks for the subject:
  – Play a song from the album "New Adventures in Hi-Fi" by REM.
  – Find a song with “believe” in the title and play it.
• Task for the wizard:
  – Help the user reach their goals (deliberately vague!)
Goals of WOZ MP3 Experiment
• Gather pilot data on human multi-modal turn planning
• Collect wizard dialogue strategies
• Collect wizard media allocation decisions
• Collect wizard speech data
• Collect user data (speech signals and spontaneous speech)
User View
• Primary task: driving
• Secondary task on second screen: MP3 player
Video Recording
DFKI/USAAR WOZ system
• System features:
  – 14 components, communicating via OAA, distributed over 5 machines (3 Windows, 2 Linux)
  – Plus the LCT on a separate machine
• People involved in running an experiment: 5
  – 1 experiment leader
  – 1 wizard
  – 1 subject
  – 2 typists
Data Flow
[Figure: data flow between Wizard, Subject, and the two Typists; channels labeled text, synthesized audio data, audio data, and graphics]
A Walk Through the Final Turns
• Wizard: “Ich zeige Ihnen die Liste an.”
  I am displaying the list.
• User: “Ok. Zeige mir bitte das Lied aus dem ausgewählten Album und spiel das vor.”
  Ok. Please show me that song (“Believe”) from the selected album and play it.
A Walk Through the Final Turns
• Wizard's actions:
  – Database search
  – Select “album presentation” (vs. songs or artists)
  – Select “list presentation” (vs. tables or textual summary)
  – Utterance: “Ich zeige Ihnen die Liste an.”
    I am displaying the list.
  – Audio is sent to typist
  – Text is sent to speech synthesis
• User: “Ok. Zeige mir bitte das Lied aus dem ausgewählten Album und spiel das vor.”
  Ok. Please show me that song (“Believe”) from the selected album and play it.
Example (1): Wizard
Says “Ich zeige Ihnen die Liste an.” (I am displaying the list.) and clicks on the list presentation.
Options presenter with User-Tab
Data Flow
[Figure: data flow diagram (Wizard, Subject, two Typists) showing the audio data and graphics channels]
Example (2): Wizard Typist
Types the wizard’s spoken text:
I am displaying the list.
Data Flow
[Figure: data flow diagram (Wizard, Subject, two Typists) showing the synthesized audio data, audio data, and graphics channels]
Example (3): User
Listens to the wizard's text, synthesized by Mary, and receives the selected list presentation.
Example (4): User
• Selects one album and says: “Ok. Zeige mir bitte das Lied aus dem ausgewählten Album und spiel das vor.”
  Ok. Please show me that song (“Believe”) from the selected album and play it.
Automatically updated wizard screen with check
Data Flow
[Figure: data flow between Wizard, Subject, and the two Typists; channels labeled text, synthesized audio data, audio data, and graphics]
Example (5): User Typist
• Types the user’s spoken text:
  Ok. Please show me that song (“Believe”) from the selected album and play it.
Data Flow
[Figure: data flow between Wizard, Subject, and the two Typists; channels labeled text, synthesized audio data, audio data, and graphics]
Example (6): Wizard
Gets a correspondingly updated TextBox window.
© TALK Consortium, 2006
The current experimental setup
Usability Lab, Building C7 4
GUI Development
Old:
New:
Different levels of evaluation
• Technical evaluation
• Usability evaluation
• Customer evaluation
• According to: L. Dybkjaer, N. Bernsen, W. Minker, "Overview of evaluation and usability", in: W. Minker et al., Spoken multimodal human-computer dialogue in mobile environments, Springer 2005
Different levels of evaluation
• Technical evaluation
  – Typically component evaluation (ASR, TTS, grammar), but also, e.g., system robustness
  – Quantitative and objective, to some extent
• Usability evaluation
• Customer evaluation
Evaluation of ASR Systems
• WER (word error rate)
• Speed (real-time performance)
• Size of lexicon
• Perplexity
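WER is conventionally computed as the word-level edit distance between the reference transcript and the recognizer hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The function below is a minimal sketch of that standard definition; the function name and the whitespace tokenization are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a percentage of the reference that was misrecognized.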
Evaluation of TTS
• Intuitive evaluation by users with respect to
  – intelligibility
  – pleasantness
  – naturalness
• No objective (though quantitative) criteria, but extremely important for user satisfaction
Different levels of evaluation
• Technical evaluation
• Usability evaluation
  – Evaluation of user satisfaction
  – Typically end-to-end evaluation
  – Mostly subjective and qualitative measures
• Customer evaluation
Different levels of evaluation
• Technical evaluation
• Usability evaluation
• Customer evaluation, including aspects like:
  – Costs
  – Platform compatibility
  – Maintenance
Usability Evaluation
• Mostly soft criteria:
  – "Usability Guidelines", best-practice rules, form the basis of expert evaluation or user questionnaires.
Usability Guidelines
• … from Dybkjaer et al.:
  – Feedback adequacy: The user must feel confident that the system has understood the information input in the way it was intended …
  – Naturalness of the dialogue structure
  – Sufficiency of interaction guidance
  – Sufficiency of adaptation to user differences
  – …
Usability Evaluation
• Mostly soft criteria:
  – "Usability Guidelines", best-practice rules, form the basis of expert evaluation or user questionnaires.
• Hard, measurable criteria often contradict each other: Systems with high task success may lack efficiency, and vice versa.
• Is it possible to evaluate usability in an objective, predictive, and general way?
• Is there one (maybe parametrized) measure for User Satisfaction?
PARADISE
• An attempt to provide an objective, quantitative, operational basis for qualitative user assessments
• M. Walker, D. Litman, C. Kamm, A. Abella: "PARADISE: A framework for evaluating spoken dialogue agents", Proc. of ACL 1997
PARADISE: The Idea
• The top criterion for usability evaluation is user satisfaction. It is an intuitive criterion that cannot be measured directly, but is only accessible through qualitative user judgments.
• User satisfaction is
  – positively correlated with task success (effectiveness)
  – inversely correlated with the dialogue costs.
• There are features that can be easily and objectively extracted from dialogue log files, which approximate both task success and dialogue costs.
PARADISE: The Idea
• Take a set of dialogues produced by the interaction of a dialogue system A with different subjects.
• Let the users assess their satisfaction with the dialogue.
• Calculate the task success, and read the different measures for dialogue costs off the log files.
• Compute the correlation between satisfaction assessment and quantitative measures (via multiple linear regression).
• Results:
  – Prediction of user satisfaction for new individual dialogues with system A, or for dialogues with a modified system A'.
  – Comparison of different dialogue systems A and B with respect to user satisfaction.
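The regression step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PARADISE implementation: the two predictors (a task-success score and the number of user turns), the helper names, and the toy numbers are invented, and the normalization of predictors used in the original framework is omitted for brevity.

```python
from typing import List, Sequence


def fit_linear_regression(features: Sequence[Sequence[float]],
                          targets: Sequence[float]) -> List[float]:
    """Least-squares fit; returns [intercept, w_1, ..., w_k]."""
    rows = [[1.0, *row] for row in features]   # prepend the intercept column
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Predicted satisfaction for one dialogue's feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Toy data, constructed for illustration only (not measured results):
# each row is (task-success score, number of user turns).
logged_features = [(1.0, 10), (0.8, 14), (0.5, 20), (0.9, 8), (0.3, 25)]
satisfaction = [25.0, 21.0, 15.0, 25.0, 10.5]

weights = fit_linear_regression(logged_features, satisfaction)
```

Once fitted, `predict(weights, row)` estimates satisfaction for a new dialogue from its log-file features alone, which is what makes comparison with a modified system A' possible without a new questionnaire round.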
PARADISE: The Structure
Maximize user satisfaction
• Maximize task success
• Minimize costs
  – Efficiency measures
  – Qualitative measures
Efficiency and Quality Measures
• Efficiency measures
  – Elapsed time
  – System turns
  – User turns
• Quality measures
  – # of timeout prompts
  – # of rejects
  – # of helps
  – # of cancels
  – # of barge-ins
  – Mean ASR score
A Measure for Task Success
• Option 1: Yes/No evaluation for the complete dialogue
• Option 2, available for dialogue systems using the form-filling paradigm: Let task success be determined by the fields in the form filled with correct values.
This and the following 3 slides will not be part of the exam
Tasks as Attribute-Value Matrices
An Instance
A Measure for Task Success
• Identify task success with the $\kappa$ value for agreement between actual and intended values for the AVM ($\kappa$ is usually employed for measuring inter-annotator agreement).
$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$
P(A) is the actual relative frequency of coincidence between values, P(E) the expected frequency.
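The formula can be sketched in a few lines of Python. The sketch assumes the agreement data are collected in a square confusion matrix whose diagonal counts the cases where actual and intended values coincide, and it estimates P(E) from the column totals. That estimate is one common choice and is an assumption here, since the slide does not say how P(E) is obtained. The function name `kappa` is hypothetical.

```python
def kappa(confusion):
    """Kappa for a square confusion matrix (list of rows of counts).

    confusion[i][j] counts cases where value i was observed while
    value j was intended (assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # P(A): relative frequency of coincidence (the diagonal).
    p_a = sum(confusion[i][i] for i in range(n)) / total
    # P(E): expected agreement, estimated here from column totals (assumption).
    col_totals = [sum(row[j] for row in confusion) for j in range(n)]
    p_e = sum((t / total) ** 2 for t in col_totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when P(E) equals 1")
    return (p_a - p_e) / (1 - p_e)


# Two possible values, 100 observations, 90 of them on the diagonal:
# P(A) = 0.9, P(E) = 0.5.
k = kappa([[45, 5], [5, 45]])
```

A value of 1 means perfect agreement between actual and intended values; a value of 0 means agreement no better than expected.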
![Page 63: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/63.jpg)
PARADISE: The Structure
• Maximise user satisfaction
– Maximise task success
– Minimise costs: efficiency measures, qualitative measures
![Page 64: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/64.jpg)
User Satisfaction
• Measured by summing the scores that the subjects assign to 8 questions.
![Page 65: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/65.jpg)
A user satisfaction questionnaire
• Was the system easy to understand?
• Did the system understand what you said?
• Was it easy to find the information you wanted?
• Was the pace of interaction with the system appropriate?
• Did you know what you could say at each point in the dialogue?
![Page 66: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/66.jpg)
A user satisfaction questionnaire
• How often was the system sluggish and slow to reply to you?
• Did the system work the way you expected it to?
• From your current experience with using the system, do you think you would use the system regularly?
![Page 67: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/67.jpg)
A hypothetical example
This and the following slide will not be part of the exam
![Page 68: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/68.jpg)
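The Performance Function
$$\text{Performance} = \alpha \cdot N(\kappa) - \sum_{i=1}^{n} w_i \cdot N(c_i)$$
• $N$ is a normalisation function, based on standard deviation,
• $N(\kappa)$ is normalised task success,
• $N(c_i)$ are the normalised cost factors,
• and $\alpha$ and $w_i$ are weights on $\kappa$ and the $c_i$, respectively.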
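A minimal sketch of how such a performance score could be computed per dialogue is given below. It assumes N is z-score normalisation (value minus mean, divided by the population standard deviation over all dialogues); the slide only states that N is based on standard deviation. The function names, the cost factor `elapsed_time`, and the weight values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalise values to zero mean and unit standard deviation (assumed N)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # All values identical: no dialogue deviates from the mean.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def performance(kappas, costs, alpha, weights):
    """Per-dialogue performance: alpha * N(kappa) - sum_i w_i * N(c_i).

    kappas:  list of task success values, one per dialogue
    costs:   dict mapping a cost name to a list of values, one per dialogue
    weights: dict mapping the same cost names to their weights w_i
    """
    n_kappa = z_scores(kappas)
    n_costs = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * n_kappa[d] - sum(weights[name] * n_costs[name][d] for name in costs)
        for d in range(len(kappas))
    ]


# Two hypothetical dialogues: the second has higher task success and lower cost.
scores = performance(
    kappas=[0.2, 0.8],
    costs={"elapsed_time": [200.0, 100.0]},
    alpha=1.0,
    weights={"elapsed_time": 1.0},
)
```

Normalisation puts task success and the cost factors on a common scale, so that the weights can be compared even though the raw measures have different units.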
![Page 69: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/69.jpg)
Comments on PARADISE
• The criterion for feature selection is the easy availability of features through log files. Are the selected features really the interesting ones?
• There is no strong theoretical foundation for the choice of questions in the user questionnaire.
• Does the methodology extend to more complex dialogue applications in real-world environments?
![Page 70: Language Technology II Language-Based Interaction: Dialogue design and evaluation Manfred Pinkal Course website:](https://reader031.vdocuments.us/reader031/viewer/2022012916/56649c765503460f9492b56d/html5/thumbnails/70.jpg)
General Comments
A trade-off between precision/objectivity and usefulness:
• PARADISE: (More or less) Precise and objective, but of limited practical use.
• Evaluation Guidelines: Of some practical use, but not really objective.
• The most useful device is intuition, provided it is, at least in part, an artist's intuition: dialogue design is art as well as technology.