Student simulation and evaluation
DOD meeting
Hua Ai ([email protected])
03/03/2006


Outline

- Motivations
- Backgrounds
- Corpus
- Student Simulation Model
- Comparisons
- Conclusions & Future Work

Motivations

- For a larger corpus
  - Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically
  - The best strategy may often not even be present in a small dataset
- For a cheaper corpus
  - Human subjects are expensive

[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Diagram components: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Reinforcement Learning, Strategy.]

Backgrounds (1)

- Education community
  - Focuses on changes in the student's inner-brain knowledge representation forms
  - Usually not dialogue based
  - Simulated students (VanLehn et al., 1994) for
    - Tutor training
    - Collaborative learning

Backgrounds (2)

- Dialogue community
  - Focuses on interactions and dialogue behaviors
  - Simulated users have limited actions to take (Schatzmann et al., 2005)
  - Simulating at the dialogue act (DA) level

Corpus (1)

- Spoken dialogue physics tutor (ITSPOKE)

Corpus (2)

- Tutoring procedure

[Figure: tutoring procedure, repeated over 5 problems. Sequence shown per problem: (T) Question, (S) Answer, Dialogue of (T) Q / (S) A turns, Essay revision, Dialogue, ...]

Corpus (3)

- Tutor's behaviors
  - Defined in KCDs (Knowledge Construction Dialogues)

[Figure: KCD diagram with branches labeled "Correct" and "Incorrect/Partially Correct".]

Corpus (4)

Corpus                #dialogues  Stat   stuWord   stuTurn   tutorWord  tutorTurn
f03 (synthesized)     100         avg    57.16     23.35     1256.92    29.64
                                  stdev  45.57638  17.44334  849.8195   19.76351
05syn (synthesized)   136         avg    91.0963   30.78519  1655.467   38.06667
                                  stdev  53.82931  14.42551  757.8744   16.32469
05pre (pre-recorded)  135         avg    87.34559  30.11765  1597.206   37.33088
                                  stdev  55.48004  16.96972  832.9845   18.20096

- f03 vs. s05: different groups of subjects

Simulation Models (1)

- Simulating at the word level
  - Students have more complex behaviors
  - DA info alone isn't enough for the system
- Two models trained on two corpora

Corpus  ProbCorrect    Random
f03     03ProbCorrect  03Random
s05     05ProbCorrect  05Random

Simulation Models (2)

- ProbCorrect Model
  - Simulates the average knowledge level of real students
  - Simulates meaningful dialogue behaviors
- Random Model
  - Nonsense
  - Serves as a contrast

ProbCorrect Model

Real corpus (c = correct, ic = incorrect):
  question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
  question2: Answer2_1 (c), Answer2_2 (ic)

Candidate answers:
  For question1, c:ic = 1:2
    c:  Answer1_1
    ic: Answer1_2, Answer1_3
  For question2, c:ic = 1:1
    c:  Answer2_1
    ic: Answer2_2

ProbCorrect Model, answering Question 1:
  1) Choose to give a c/ic answer with the same average probability as real students
  2) Randomly choose one answer from the corresponding answer set
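The two ProbCorrect steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the talk: the corpus format (question, answer, correctness triples) and the names `train_prob_correct` and `simulate_answer` are hypothetical, and the probability of a correct answer is estimated from simple occurrence counts.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus mirroring the slide: (question, answer, is_correct).
CORPUS = [
    ("question1", "Answer1_1", True),
    ("question1", "Answer1_2", False),
    ("question1", "Answer1_3", False),
    ("question2", "Answer2_1", True),
    ("question2", "Answer2_2", False),
]


def train_prob_correct(corpus):
    """Estimate P(correct) per question and collect the c/ic answer sets."""
    counts = defaultdict(lambda: {"c": 0, "ic": 0})
    answers = defaultdict(lambda: {"c": [], "ic": []})
    for question, answer, is_correct in corpus:
        label = "c" if is_correct else "ic"
        counts[question][label] += 1
        if answer not in answers[question][label]:
            answers[question][label].append(answer)
    return {
        q: {
            "p_correct": counts[q]["c"] / (counts[q]["c"] + counts[q]["ic"]),
            "c": answers[q]["c"],
            "ic": answers[q]["ic"],
        }
        for q in counts
    }


def simulate_answer(model, question, rng):
    """Step 1: pick c or ic with the observed probability.
    Step 2: pick uniformly from the matching answer set."""
    entry = model[question]
    is_correct = rng.random() < entry["p_correct"]
    pool = entry["c"] if is_correct else entry["ic"]
    return rng.choice(pool), is_correct


model = train_prob_correct(CORPUS)
rng = random.Random(0)  # seeded only to make the sketch repeatable
answer, is_correct = simulate_answer(model, "question1", rng)
```

For question1 the estimated probability of a correct answer is 1/3, matching the c:ic = 1:2 ratio on the slide.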

Random Model

Real corpus (HC03&05):
  Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
  Question2: Answer2_1, Answer2_2

Candidate answers:
  1) Answer1_1
  2) Answer1_2
  3) Answer1_3
  4) Answer1_4
  5) Answer2_1
  6) Answer2_2

Big random Model, answering Question i:
  Answer: any of the 6 answers with the same probability (regardless of the question!)
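As a contrast to ProbCorrect, the Random model described above reduces to a uniform draw from one pooled answer list. The sketch below is a minimal illustration, assuming a hypothetical corpus of (question, answer) pairs; the function names are not from the talk.

```python
import random

# Hypothetical toy corpus mirroring the slide: (question, answer) pairs.
CORPUS = [
    ("Question1", "Answer1_1"),
    ("Question1", "Answer1_2"),
    ("Question1", "Answer1_3"),
    ("Question1", "Answer1_4"),
    ("Question2", "Answer2_1"),
    ("Question2", "Answer2_2"),
]


def build_random_pool(corpus):
    """Pool every distinct answer, ignoring which question it belonged to."""
    pool = []
    for _question, answer in corpus:
        if answer not in pool:
            pool.append(answer)
    return pool


def simulate_random_answer(pool, question, rng):
    """Return any pooled answer with equal probability.
    The question argument is deliberately ignored."""
    return rng.choice(pool)


pool = build_random_pool(CORPUS)
rng = random.Random(0)  # seeded only to make the sketch repeatable
answer = simulate_random_answer(pool, "Question2", rng)
```

Because the question is ignored, an answer to Question1 can be returned for Question2, which is what makes this model a nonsense baseline.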

Experiments

- Comparisons between real corpora
- Comparisons between real & simulated corpora
- Comparisons between simulated corpora

Real Corpora Comparisons (1)

- Evaluation metrics
  - High-level dialog features
  - Dialog style and cooperativeness
  - Dialog success rate and efficiency
  - Learning gains

Real corpora comparisons (2)

- High-level dialog features
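To make the high-level dialog features concrete, here is a rough sketch of how features with the names used in this deck's charts (tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate, correctRate) might be computed for one dialogue. The definitions are assumptions, since the slides do not spell them out: word counts use whitespace tokenization, the word rates are taken as words per turn, and correctRate as the fraction of graded student turns marked correct. The turn format and the example dialogue are hypothetical.

```python
def dialogue_features(turns):
    """Compute assumed high-level features for one dialogue.

    turns: list of dicts with 'speaker' ('tutor' or 'student'), 'text',
    and, for graded student turns, a boolean 'correct'.
    """
    tutor = [t for t in turns if t["speaker"] == "tutor"]
    student = [t for t in turns if t["speaker"] == "student"]
    tutor_words = sum(len(t["text"].split()) for t in tutor)
    student_words = sum(len(t["text"].split()) for t in student)
    graded = [t for t in student if "correct" in t]
    n_correct = sum(1 for t in graded if t["correct"])
    return {
        "tutorTurn": len(tutor),
        "tutorWord": tutor_words,
        # Assumed definition: average words per tutor turn.
        "tWordRate": tutor_words / len(tutor) if tutor else 0.0,
        "stuTurn": len(student),
        "stuWord": student_words,
        # Assumed definition: average words per student turn.
        "sWordRate": student_words / len(student) if student else 0.0,
        # Assumed definition: share of graded student turns that are correct.
        "correctRate": n_correct / len(graded) if graded else 0.0,
    }


# Hypothetical four-turn dialogue, for illustration only.
EXAMPLE = [
    {"speaker": "tutor", "text": "What forces act on the ball?"},
    {"speaker": "student", "text": "gravity", "correct": True},
    {"speaker": "tutor", "text": "Good. What is its acceleration?"},
    {"speaker": "student", "text": "I do not know", "correct": False},
]

features = dialogue_features(EXAMPLE)
```

Averaging these per-dialogue values over a corpus would give per-corpus figures comparable to the avg rows in the Corpus (4) table, again assuming these definitions.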

Real corpora comparisons (3)

- Dialogue style features

Real corpora comparisons (3)

- Dialogue success rate

Real corpora comparisons (4)

- Learning gains features

Results

- Differences captured by these simple metrics can't help us conclude whether a corpus is real or not (Schatzmann et al., 2005)
- Differences could be due to different user populations

Real vs. Simulated Corpora Comparisons

[Figure: chart comparing the corpora f03, 03smooth, 03random, s05, and 05smooth on the features tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate, and correctRate; vertical axis from 0 to 2.]

Results (1)

- Most of the measurements are able to distinguish between the Random and ProbCorrect models
  - The ProbCorrect model generates more realistic behaviors
- We can't draw conclusions about the power of these metrics, since the two simulated corpora are really different

Results (2)

- Differences between the real corpora and the Random model are captured clearly, but differences between the real corpora and the ProbCorrect model are not clear
  - We don't expect this simple model to give a very realistic corpus. It's surprising that the differences are small.

Results (3)

- s05 variety > f03 variety
- 05ProbCorrect variety > 03ProbCorrect variety
- However, we don't get significantly more variety in the simulated corpora than in the real ones
  - Could be that the computer tutor is simple (c/ic)
  - We're using the same candidate answer set

Results (4)

- ProbCorrect models trained on different real corpora are quite different
- The ProbCorrect model is more similar to the real corpus it is trained on than to the other real corpus

Comparisons between simulated dialogues with different dialogue structures

[Figure: two chart panels, f03problem34 and f03problem7, each comparing 03prob, 03smoothed, and 03random. Feature labels preserved for f03problem7: tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate, correctRate.]

Results

- Larger differences between the two simulated corpora in prob7 than in prob34
  - Dialogue structure of prob34 is more restricted
- The power of these simple metrics is restricted by the dialogue structure

Conclusions

- The simple measurements can distinguish between
  - real corpora
    - Different populations
  - simulated and real corpora
    - To different extents
  - simulated corpora
    - Different models
    - Trained on different corpora
    - Limited to different dialogue structures

Future work

- Explore "deep" evaluation metrics
- Test simulated corpus on policy
- More simulation models
  - More human features
    - Emotion, learning
  - Special cases
    - Quick learners, slow learners