POSTECH Dialog-Based Computer Assisted Language Learning System
Intelligent Software Lab. POSTECH
Prof. Gary Geunbae Lee
Contents
Introduction
Methods
  DB-CALL System: Example-based Dialog Modeling, Feedback Generation, Translation Assistance, Comprehension Assistance
  Language Learner Simulation: User Simulation, Grammar Error Simulation
Discussion
RESEARCH BACKGROUND
• Globalization makes English ever more important as a world language
• Native-speaker tutors are extremely expensive
• Most language learning software is dedicated to pronunciation practice
• Dialog-based Computer-Assisted Language Learning (DB-CALL) would be an excellent solution
ISSUES
• A DB-CALL system should be able to understand students' poor, non-native expressions
• A DB-CALL system should have high domain scalability to support various practical scenarios
• A DB-CALL system should provide educational functionality that helps students improve their linguistic ability
PREVIOUS WORKS ON DB-CALL Let’s Go (CMU, 02-04)
Provides bus schedule information to non-native CMU students
Adapts the acoustic model and language model to non-native speakers
Edit-distance based corrective feedback
PREVIOUS WORKS ON DB-CALL
SPELL (Edinburgh, 05)
Restaurant domain
Scenario-based virtual space
Incorporates mal-rules into the ASR grammar
PREVIOUS WORKS ON DB-CALL
DEAL (KTH, 07)
Trade domain
Finite-state-network-based limited dialog management
When learners get stuck, the system provides hints
POSTECH DB-CALL System
[Architecture diagram] A web Crawler feeds a Description Extractor and a Parallel Sentence Extractor, which populate an example DB of (example, description) entries stored as XML: <parallel><source>…</source><target>…</target></parallel>, plus alignment info (<s2t>, <t2s>, <composition>) and the source <url>.
[ESL Dialog Tutoring] During tutoring, the system matches each user input against the DB and responds with expression/description pairs and Korean/English example expressions, prompting "Try this expression".
DB-CALL System
1. Example-based Dialog Modeling
INTRODUCTION
Spoken Dialog Systems
Applications: Human-Robot Interface, Telematics, Tutoring, ...
PROBLEM & GOAL PROBLEM
How to determine the next system action Knowledge-based approach
Plan recipe / ISU rule / Agenda Data-driven approach
Statistical approach Supervised Learning based on state approximation Reinforcement Learning based on MDP/POMDP
Example-based approach
GOAL To develop a simple and practical approach to dia-
log modeling for multi-domain dialog systems
IDEA
Dialog State Space
Domain = Building_Guidance
Dialog Act = WH-QUESTION, Main Goal = SEARCH-LOC
ROOM-TYPE = 1 (filled); ROOM-NAME = 0 (unfilled); LOC-FLOOR = 0; PER-NAME = 0; PER-TITLE = 0
Previous Dialog Act = <s>; Previous Main Goal = <s>; Discourse History Vector = [1,0,0,0,0]
Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ? ("Where is ROOM_TYPE?")
System Action = inform(Floor)
Dialog Corpus
USER: 회의실이 어디지? ("Where is the meeting room?") [Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = 회의실 (meeting room)]
SYSTEM: 3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다. ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.") [System Action = inform(Floor)]
Turn #1 (Domain=Building_Guidance)
Dialog Example
Indexed using semantic & discourse features
Retrieving the example having the most similar state
e* = argmax_{e_i ∈ E} S(e_i, h), where E is the set of dialog examples and h is the current dialog state
Lee et al., (2006), A Situation-based Dialogue Management using Dialogue Examples, IEEE ICASSP
ALGORITHM
Query Generation: build an SQL statement from the discourse history and SLU results.
Example Search: search the example DB for semantically close dialog examples given the current dialog state.
Example Selection: select the best example by maximizing an utterance similarity measure based on lexical and discourse information.
[Pipeline diagram] Noisy input (from ASR/SLU) → Query Generation (using the Discourse History) → Example Search (over the Example DB and Content DB, with a Relaxation Strategy) → Example Selection → NLG (System Template)
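A minimal sketch of this loop in Python, assuming a dict-backed example DB and a toy token-overlap similarity standing in for the lexical/discourse measure (all names here are illustrative, not the actual POSTECH code):

def similarity(example_utt, user_utt):
    # Toy stand-in for the lexical + discourse similarity measure:
    # token-overlap ratio between the example and the user utterance.
    a, b = set(example_utt.split()), set(user_utt.split())
    return len(a & b) / max(len(a | b), 1)

def generate_query(slu, prev_act):
    # Query generation: key the example DB on semantic and discourse
    # features (the real system builds an SQL statement instead).
    return (slu["domain"], slu["dialog_act"], slu["main_goal"],
            tuple(sorted(slu["filled_slots"])), prev_act)

def next_system_action(slu, user_utt, prev_act, example_db):
    query = generate_query(slu, prev_act)
    examples = example_db.get(query, [])
    while not examples and len(query) > 1:
        query = query[:-1]                    # relaxation strategy: drop
        examples = example_db.get(query, [])  # constraints until a match
    if not examples:
        return "ask_clarification"            # no usable example found
    best = max(examples, key=lambda e: similarity(e["utt"], user_utt))
    return best["system_action"]              # e.g. "inform(Floor)"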
EXPERIMENTAL RESULTS
Real user evaluation: 10 undergraduates
Evaluation Metrics
  STR (Success Turn Rate) = # of successful turns / # of total turns
  TCR (Task Completion Rate) = # of successful dialogs / # of total dialogs
  AvgUserTurn = average number of user turns per dialog
Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM
System               #Dialogs  AvgUserTurn  STR(%)  TCR(%)
Car Navigation       50        4.54         86.25   92.00
Weather Information  50        4.46         89.01   94.00
EPG                  50        4.50         83.99   90.00
Chatbot              50        5.60         64.31   -
Multi-domain         15        6.08         78.77   86.67
EXPERIMENTAL RESULTS
Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM
System               Exact match(%)  Partial match(%)  No example(%)
Car Navigation       50.22           44.49             5.29
Weather Information  69.49           25.00             5.51
EPG                  58.33           37.22             4.45
Chatbot              50.71           14.29             35.00
Multi-domain         69.23           24.62             6.15
Example match rate of each dialog system
ROBUST DIALOG MANAGEMENT PROBLEM
How to overcome errors in the real world
ROBUST DIALOG MANAGEMENT
Error handling: recovering from ASR/SLU errors by interacting with the user at the conversational level
N-best support: estimating the current state under uncertainty
[Diagram] Errors enter between ASR → SLU → DM. Countermeasures: noise reduction, adaptation, and n-best / lattice / confusion-network output at the ASR level; robust parsing and data-driven approaches at the SLU level; error handling and n-best support at the DM level.
Lee et al., (2008), Robust Dialog Management with N-best Hypotheses Using Dialog Examples and Agenda, ACL
GOAL & IDEA To increase the robustness of EBDM with prior
knowledge1) Error Handling
If the system knows what the user will do next
Dynamic Help Generation
[Agenda graph] Focus node: LOCATION; next subtasks: ROOM ROLE, OFFICE PHONE NUMBER, GUIDE.
AgendaHelp S: Next, you can do the subtask 1) asking the room's role, or 2) asking the office phone number, or 3) selecting the desired room for navigation.
UtterHelp S: Next, you can say 1) "What is it?", or 2) "What's the phone number of [ROOM_NAME]?", or 3) "Let's go there."
GOAL & IDEA To increase the robustness of EBDM with prior
knowledge2) N-best supportIf the system knows which subtask will be more probable next
Rescoring N-best hypotheses (h1~hn)
[Agenda graph] Focus node: LOCATION; candidate next subtasks OFFICE PHONE NUMBER, FLOOR, and ROOM NAME are matched against hypotheses h1–h4.
Subtask    System Utterance                      System Action
LOCATION   The director's room is Room No. 201.  Inform(RoomNumber)

N-best   User Utterance                            Subtask              P(hi|S)
U1 (h1)  What are office rooms in this building?   ROOM NAME            0.2
U2 (h2)  What is the floor?                        FLOOR                0.4
U3 (h3)  Where is it?                              LOCATION             0.3
U4 (h4)  What is the phone number?                 OFFICE PHONE NUMBER  0.5 (most probable)
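A hedged sketch of the rescoring idea: each hypothesis's ASR confidence is interpolated with an agenda-derived prior over next subtasks. The weight and the confidence numbers below are illustrative, not from the paper:

def rescore(nbest, subtask_prior, alpha=0.3):
    # Re-rank n-best hypotheses by interpolating ASR confidence with
    # the agenda graph's prior over likely next subtasks, P(hi|S).
    def score(h):
        return alpha * h["asr_conf"] + (1 - alpha) * subtask_prior.get(h["subtask"], 0.0)
    return sorted(nbest, key=score, reverse=True)

nbest = [
    {"utt": "What are office rooms in this building?", "subtask": "ROOM_NAME", "asr_conf": 0.30},
    {"utt": "What is the floor?", "subtask": "FLOOR", "asr_conf": 0.28},
    {"utt": "Where is it?", "subtask": "LOCATION", "asr_conf": 0.25},
    {"utt": "What is the phone number?", "subtask": "OFFICE_PHONE_NUMBER", "asr_conf": 0.17},
]
prior = {"ROOM_NAME": 0.2, "FLOOR": 0.4, "LOCATION": 0.3, "OFFICE_PHONE_NUMBER": 0.5}
print(rescore(nbest, prior)[0]["utt"])   # the phone-number reading wins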
ALGORITHM
[Diagram] From the user, the ASR produces n-best word sequences (w1…wn); the SLU maps them to semantic frames (u1…un); discourse interpretation places the resulting states (s1…sn) on the agenda graph (nodes V1–V9) using the focus stack; the EBDM then selects the argmax node and the argmax example (e* among candidate examples e1…ek).
EXPERIMENT SET-UP
Simulated user evaluation
  Test set: 1000 simulated dialogs (<20 user turns)
  Domain: intelligent robot for building guidance
  Using 5-best recognition hypotheses
Evaluation Metrics
  TCR = # of successful dialogs / # of total dialogs
  AvgUserTurn = average number of user turns per dialog
  AvgScore = 20 × TCR − 1 × AvgUserTurn
EXPERIMENTAL RESULTS
Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)
[Plot: average score vs. WER (0–50%) for the four methods P-E, P-ER, P-EA, P-EAR]
Legend  Method
P-E     Using only Examples
P-ER    Using Examples + Recovery
P-EA    Using Examples + Agenda Graph
P-EAR   Using Examples + Agenda Graph + Recovery
The average score of different methods
EXPERIMENTAL RESULTS
Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)
[Plot: average score of the P-EAR system vs. n-best size (1–100) at WER 0, 10, and 20%]
The average score of the P-EAR system according to n-best size
DEMO VIDEO PC demo
DEMO VIDEO Robot demo
2. Feedback Generation
INTRODUCTION
Recast Feedback
[Tutoring process]
Tutor: What is the purpose of your trip?
User: My purpose business
Tutor: Sorry, I don't understand. What did you say?   (Clarification Request)
System: Try this expression: "I am here on business"   (Recast Feedback)
User: I am here on business   (Learner Uptake)
INTRODUCTION
Expression Suggestion
[Tutoring process]
Tutor: What is the purpose of your trip?
(TIMEOUT — the user does not answer)
Tutor: Sorry, I can't hear you.
System: Try this expression: "I am here on business"   (Expression Suggestion)
User: I am here on business   (Learner Uptake)
PROBLEMS
How to recognize user intentions despite numerous errors in their utterances
  The mal-rule based technique used in previous studies doesn't work for low-level learners because their utterances contain multiple errors
  Some utterances even seem to have a meaning that differs from what the learner intended to say
    Intended meaning: When does the bus leave?  Learner's utterance: Which time I have to leave?
How to choose appropriate user intentions to suggest when a timeout expires
  The system should take the dialog context into consideration, as human tutors do
  Perform intention-based soft pattern matching to generate correct feedback
METHODS
Context-aware & level-specific intention recognition; intention-based pattern matching
[Architecture] The learner's utterance is scored by level-specific utterance models (Level 1…N, each trained on level-specific data) and by a dialog-state-based model; together these yield the learner's intention. The dialog manager updates the dialog state, searches the example expression DB for example expressions, and pattern-matches them against the recognized intention to generate feedback.
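As a sketch, the hybrid recognizer can be seen as interpolating a level-specific utterance model with the dialog-state-based model; the interpolation weight and the predict_proba interface below are assumptions for illustration, not the actual POSTECH models:

def recognize_intention(utterance, dialog_state, level, utt_models, state_model, lam=0.6):
    # Level-specific utterance model: P(intention | utterance, learner level).
    p_utt = utt_models[level].predict_proba(utterance)    # assumed: dict intention -> prob
    # Dialog-state-based model: P(intention | dialog state).
    p_state = state_model.predict_proba(dialog_state)
    # Hybrid: linear interpolation of the two distributions.
    intentions = set(p_utt) | set(p_state)
    mixed = {i: lam * p_utt.get(i, 0.0) + (1 - lam) * p_state.get(i, 0.0)
             for i in intentions}
    return max(mixed, key=mixed.get)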
EXPERIMENT SET-UP
Primitive data set
  Immigration domain: 192 dialogs, 3517 utterances (18.32 utterances/dialog)
Annotation
  Each utterance manually annotated with the speaker's intention and component slot-values
  Each utterance automatically annotated with discourse information
EXPERIMENTAL RESULTS
[Chart: intention recognition accuracy of the Utterance Model vs. the Hybrid Model]
EXPERIMENTAL RESULTS
[Chart: accuracy of level-specific vs. level-ignorant variants of the Hybrid and Utterance models]
Demo: POSTECH DB-CALL initial version 2008
3. Translation Assistance
Architecture & Example Format
[Architecture] Web → Parallel Sentence Example Extraction → example store → Search Engine → interface (function call) → ESL dialog system / other applications. Queries are expressions.
<parallel><source>~~~~~~~</source><target>~~~~~~~~</target></parallel>
<Alignment Info><s2t>~~~~~~~~</s2t><t2s>~~~~~~~~</t2s><composition>~~~~<composition>
<Additional><url>~~~~~~</url>
<parallel><source>~~~~~~~</source><target>~~~~~~~~</target></parallel>
<Alignment Info><s2t>~~~~~~~~</s2t><t2s>~~~~~~~~</t2s><composition>~~~~<composition>
<Additional><url>~~~~~~</url>
<parallel><source>~~~~~~~</source><target>~~~~~~~~</target></parallel>
<Alignment><s2t>~~~~~~~~</s2t><t2s>~~~~~~~~</t2s><composition>~~~~</composition></Alignment>
<Additional><url>~~~~~~</url></Additional>
Analysis
Building bilingual examples via word alignment, as widely used in Statistical Machine Translation (IBM Models 1–5, symmetrization heuristics)
A word alignment gives the correspondence of each word/phrase across a bilingual example pair
Example word alignments produced with GIZA++
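The slides use GIZA++; as a lightweight stand-in, NLTK ships an IBM Model 1 implementation, which is enough to illustrate the alignment step (the toy bitext below is made up):

from nltk.translate import AlignedSent, IBMModel1

bitext = [
    AlignedSent(["나는", "출장", "중이다"], ["i", "am", "on", "a", "business", "trip"]),
    AlignedSent(["출장", "입니다"], ["it", "is", "a", "business", "trip"]),
]
ibm1 = IBMModel1(bitext, 5)     # train with 5 EM iterations
print(bitext[0].alignment)      # word-index correspondences, source -> target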
4. Comprehension Assistance
INTRODUCTION
[Architecture] An ESL podcast website is mined into an expression-description DB used by the dialog system; the Description Suggestion System detects the expression in question and recommends an example sentence with its description.
English Expression-Description Example Suggestion System: when the user asks about an unfamiliar English expression, the system presents its description to aid understanding.
INTRODUCTION
Expression-Description Pair Extraction System
To present expression examples and their descriptions, the system extracts expression-description pairs from the ESL podcast site
Phrase        Description
routine test  ... we mean it's a normal, regular test that the doctor runs many, many different times with different patients, not a special test.
treatment     "Treatment" is another word for what the doctor gives you or does to you to help you.
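One simple extraction heuristic such a system might use is matching the recurring cue phrase '"X" is another word for …' in podcast scripts. A hedged regex sketch; the real extractor is surely richer than this single pattern:

import re

CUE = re.compile(r'["“](?P<phrase>[^"”]+)["”]\s+is another word for\s+(?P<desc>[^.]+\.)')

def extract_pairs(script):
    # Return (phrase, description) pairs found via the cue pattern.
    return [(m.group("phrase"), m.group("desc")) for m in CUE.finditer(script)]

script = '"Treatment" is another word for what the doctor gives you or does to you to help you.'
print(extract_pairs(script))
# [('Treatment', 'what the doctor gives you or does to you to help you.')]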
[Example slides: ESL podcast scripts shown alongside their extracted descriptions]
Language Learner Simulation
1. User Simulation
INTRODUCTION
User simulation for spoken dialog systems: developing a 'simulated user' that can replace real users
Applications
  Automated evaluation of spoken dialog systems: detecting potential flaws, predicting overall system behavior
  Learning dialog strategies in a reinforcement learning framework
PROBLEM & GOAL PROBLEM
How to model real user User Intention simulation User Surface simulation ASR channel simulation
GOAL Natural Simulation Diverse Simulation Controllable Simulation
IDEA – User Intention Simulation
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
Dialog is sequential behavior, especially user intention: across alternating user/system turns, intention simulation should take various discourse information into account (discourse factors, knowledge, events, …).
User Intention Simulation: linear-chain Conditional Random Field model
Assumption: a user utterance has only one intention
At each turn, the user intention state UI = [dialog_act, main_goal, named_entities] is conditioned on the previous discourse information DI = system response + discourse history.
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
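A sketch of how such a linear-chain CRF could be trained, using sklearn-crfsuite as a stand-in toolkit; the feature set and the toy corpus structure are assumptions, not the paper's:

import sklearn_crfsuite

def turn_features(dialog, t):
    # DI features for turn t: system response plus discourse history.
    di = dialog[t]
    return {"sys_act": di["sys_act"],
            "prev_ui": dialog[t - 1]["ui"] if t > 0 else "<s>",
            "unfilled": str(di["unfilled_slots"])}

# Toy corpus: each turn carries DI features and a UI label.
dialogs = [[
    {"sys_act": "<s>", "unfilled_slots": ["loc"], "ui": "WH-QUESTION/SEARCH-LOC"},
    {"sys_act": "inform(Floor)", "unfilled_slots": [], "ui": "THANK/NONE"},
]]
X = [[turn_features(d, t) for t in range(len(d))] for d in dialogs]
y = [[turn["ui"] for turn in d] for d in dialogs]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
# predict() gives the most likely UI sequence; a simulator would instead
# sample from the model to get diverse user behavior.
print(crf.predict(X[:1]))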
ALGORITHM
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
User Surface Simulation
PROBLEM: how to generate a user surface utterance that expresses a given user intention
APPROACH: two-phase utterance generation — phase 1: candidate generation; phase 2: rescoring
[Diagram] The user utterance model generates many candidate utterances (phase 1); rescoring then keeps the selected utterances (phase 2).
Phase 1 – Generation
Conditioned on the [Dialog_Act × Main_Goal] space, a structure-tag sequence S1…Sn is generated via structure-tag transition probabilities, and each tag Si emits a word Wi via emission probabilities.
Structure tags = component slot names + part-of-speech tags; S is a member of the structure tags of the given space, W a member of its vocabulary.
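A toy version of this generator, with made-up transition and emission tables for one [dialog_act × main_goal] space:

import random

TRANS = {"<s>": {"CITY_NAME": 0.7, "NOUN": 0.3},      # structure-tag transitions
         "CITY_NAME": {"VERB": 0.8, "</s>": 0.2},
         "NOUN": {"VERB": 1.0},
         "VERB": {"</s>": 1.0}}
EMIT = {"CITY_NAME": {"seoul": 0.6, "pohang": 0.4},   # per-tag word emissions
        "NOUN": {"route": 1.0},
        "VERB": {"find": 0.7, "show": 0.3}}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate_candidate():
    # Sample a structure-tag sequence, emitting one word per tag.
    tag, words = "<s>", []
    while True:
        tag = sample(TRANS[tag])
        if tag == "</s>":
            return " ".join(words)
        words.append(sample(EMIT[tag]))

print([generate_candidate() for _ in range(3)])   # phase-1 candidates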
Phase 2 – Rescoring
PROBLEM: rescoring and selecting the good utterances. Criteria: human-like utterances, natural word transitions.
APPROACH: Structure- and Word-interpolated BLEU (SWB) score. Note that evaluating system-generated utterances in utterance simulation and in machine translation is essentially the same task.
SWB = β * Structure_Sequence_BLEU + (1 − β) * Word_Sequence_BLEU, where 0 ≤ β ≤ 1
We set β to 0.2: since Korean is an agglutinative language with relatively free word order, the structural component is weighted less.
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
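A sketch of the SWB computation using NLTK's sentence-level BLEU for both the structure-tag sequence and the word sequence (smoothing added so short sequences do not zero out; the reference sets are assumed inputs):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def swb(struct_refs, word_refs, struct_hyp, word_hyp, beta=0.2):
    # SWB = beta * Structure_Sequence_BLEU + (1 - beta) * Word_Sequence_BLEU
    smooth = SmoothingFunction().method1
    s = sentence_bleu(struct_refs, struct_hyp, smoothing_function=smooth)
    w = sentence_bleu(word_refs, word_hyp, smoothing_function=smooth)
    return beta * s + (1 - beta) * w

# Phase-2 keeps the highest-SWB candidates, e.g.:
# candidates.sort(key=lambda c: swb(S_REFS, W_REFS, c.tags, c.words), reverse=True)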
ALGORITHM
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
ASR Channel Simulation
PROBLEM: how to simulate the ASR channel
  Knowledge-based vs. statistical approaches: collecting speech data for the target domain is difficult, so we want WER-controllable simulation
APPROACH: linguistic-knowledge-based simulation
  Step 1: determine the error positions
  Step 2: choose an error type for each error-marked word
  Step 3: generate the ASR errors (substitutions, deletions, insertions)
  Step 4: rescore and select the erroneous utterance
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
Error Type Distribution
Error types are determined based on reported English speech recognition results (Greenberg et al., 2000); we assume Korean speech recognition generally has a similar error distribution.
Error Generation
Insertion error: insert a random word before the insertion-error mark
Deletion error: simply delete the marked word
Substitution error: based on a sequence alignment algorithm
  Syllable- and phoneme-based alignment: select candidate words from a dictionary with a dynamic-programming alignment algorithm (Needleman and Wunsch, 1970) and take the similarity score
  Similarity = α * Syllable_Alignment_Score + (1 − α) * Phoneme_Alignment_Score, where 0 ≤ α ≤ 1
[Figure: example vowel confusion matrix]
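The four steps lend themselves to a compact, WER-controllable sketch; the error-type proportions and confusion candidates below are placeholders, not the cited distribution:

import random

ERROR_TYPES, TYPE_WEIGHTS = ["sub", "del", "ins"], [0.55, 0.25, 0.20]  # illustrative

def corrupt(words, target_wer=0.2, confusions=None):
    out = []
    for w in words:
        if random.random() >= target_wer:         # step 1: mark error positions
            out.append(w)
            continue
        etype = random.choices(ERROR_TYPES, weights=TYPE_WEIGHTS)[0]   # step 2
        if etype == "del":                        # step 3: realize the error
            continue                              # deletion: just drop the word
        if etype == "ins":                        # insertion: random word first
            out.append(random.choice(words))
        # substitution: pick an acoustically similar word if we have one
        out.append((confusions or {}).get(w, [w])[0] if etype == "sub" else w)
    return " ".join(out)

print(corrupt("which floor is the office on".split(), 0.3,
              {"floor": ["four"], "office": ["offices"]}))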
EXPERIMENT SET-UP
Korean car navigation dialog system: SLU from Jeong and Lee (2006), DM from Lee et al. (2009)
Word error rate: 0.0–0.4; 5000 dialog samples at each WER setting
Intention Simulation Results
D-BLEU (Discourse BLEU) is a metric for measuring the naturalness of simulated dialogs as n-gram precision over intention sequences, based on the BLEU calculation.
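Since D-BLEU is essentially BLEU over intention sequences, it can be sketched directly with NLTK, using real dialogs as references and simulated dialogs as hypotheses (the toy sequences are made up):

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

real = [[["greeting", "search_loc", "confirm", "bye"]]]   # references per dialog
simulated = [["greeting", "search_loc", "bye"]]           # simulated UI sequences
print(corpus_bleu(real, simulated,
                  smoothing_function=SmoothingFunction().method1))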
[Results figures: intention simulation (D-BLEU), utterance simulation, ASR channel simulation, and overall prediction of system performance]
Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
2. Grammar Error Simulation
INTRODUCTION
Language learner simulation requires us to add grammar error simulation on top of general user simulation
[Architecture] The language learner simulator chains a user intention simulator, a grammar errors simulator, a user utterance simulator, and an ASR errors simulator; it talks to the dialog system (non-native ASR, SLU, dialog manager, system utterance generator, TTS).
REALISTIC ERRORS
Source: He wants to go to a movie theater
Random corruption: He wants to to a movie theater
vs. realistic learner errors: He want go to movie theater
PROBLEMS
How to incorporate expert knowledge about the error characteristics of Korean language learners into the statistical model
  Subject-verb agreement errors
  Omission errors of the preposition of prepositional verbs
  Omission errors of articles
  Etc.
MARKOV LOGIC NETWORK
Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using markov logic. ACL 2009
METHOD
The generation procedure involves three steps:
  1. Generate a probability distribution over error types for each word through MLN inference
  2. Determine an error type for each word by sampling the generated distribution
  3. Create the ill-formed output sentence by realizing the chosen error types
[Worked example] Input: "He wants to go to a movie theater".
Step 1 (inference): MLN inference yields, for each word, a distribution over the error types {v_agr_sub, prp_lex_del, at_del, none} (e.g., "wants" gets 0.371 for v_agr_sub, the first "to" 0.284 for prp_lex_del, "a" 0.355 for at_del, with most of the remaining mass on none).
Step 2 (sampling): none, v_agr_sub, prp_lex_del, none, none, at_del, none, none.
Step 3 (realization): "He want go to movie theater".
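Steps 2 and 3 are straightforward to sketch once step 1's per-word distributions exist; the tiny realization rules here are illustrative stand-ins for the paper's realizer:

import random

def sample_error_types(distributions):
    # Step 2: one error type per word, sampled from the MLN's distribution.
    return [random.choices(list(d), weights=list(d.values()))[0]
            for d in distributions]

def realize(words, error_types):
    # Step 3: apply naive realizations for the sampled error types.
    out = []
    for w, e in zip(words, error_types):
        if e in ("prp_lex_del", "at_del"):          # drop preposition / article
            continue
        if e == "v_agr_sub":                        # break subject-verb agreement
            out.append(w[:-1] if w.endswith("s") else w + "s")
        else:                                       # "none": keep the word
            out.append(w)
    return " ".join(out)

words = "He wants to go to a movie theater".split()
types = ["none", "v_agr_sub", "prp_lex_del", "none", "none", "at_del", "none", "none"]
print(realize(words, types))   # -> "He want go to movie theater"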
EXPERIMENT SET-UP
Data set: NICT JLE Corpus
  The 167 error-annotated files divided into 3 level groups: Beginner (levels 1–4): 2,905; Intermediate (5–6): 3,296; Advanced (7–9): 2,752
Evaluation
  10-fold cross validation performed for each group; the validation results were added together across the rounds
EXPERIMENTAL RESULTS Advanced
DKL(Real || Proposed)=0.068 vs. DKL(Real || Baseline)=0.122
EXPERIMENTAL RESULTS Intermediate
DKL(Real || Proposed)=0.075 vs. DKL(Real || Baseline)=0.142
EXPERIMENTAL RESULTS Beginner
DKL(Real || Proposed)=0.075 vs. DKL(Real || Baseline)=0.092
EXPERIMENTAL RESULTS
Human judgment
  Evaluated 100 randomly chosen sentences: 50 each from the real and the simulated data
  The test sentences were shuffled so that the human judges could not tell whether a sentence was real or simulated
  Two-level scale (0: Unrealistic, 1: Realistic)
Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using markov logic. ACL 2009
Q & A