educational data mining in a computer tutor that listens


Post on 04-Jan-2016


Page 1: Educational data mining in a computer tutor that listens

1

Educational data mining in a computer tutor that listens

Joseph E. Beck

Acknowledgements: NSF, Heinz

Page 2: Educational data mining in a computer tutor that listens

2

Take away point

Computer tutors provide a gold mine of fine-grained interaction data

Standardized tests

Creates the ability to assess students and improve capabilities of computer tutors

Page 3: Educational data mining in a computer tutor that listens

3

What is educational data mining?

• Using data to learn about students and instruction
  – E.g. predict student behavior, assess students, evaluate the tutor’s teaching, etc.

• Motivation: computer tutors provide large samples of fine-grained, longitudinal data that are a powerful (unique?) source of knowledge to improve educational outcomes

Page 4: Educational data mining in a computer tutor that listens

4

Difference between educational and standard data mining

• Data collected with purpose in mind
  – Have control over schema

• Describe more interactive phenomena

• Generally smaller datasets

Page 5: Educational data mining in a computer tutor that listens

Project LISTEN’s Reading Tutor

Page 6: Educational data mining in a computer tutor that listens

6

Data we’ve collected

• Record several items in database
  – Student’s speech (as recognized by ASR)
  – Student’s help requests
  – Tutor’s teaching actions
  – (among other things)
• Scale of DB from 2002-2003 school year
  – 456 students
  – 423,149 student clicks for help
  – 4.1 million words heard by speech recognizer

Page 7: Educational data mining in a computer tutor that listens

7

Outline

Predicting student behavior
• E.g. will the student click for help on this word?

Inferring student’s skills
• E.g. does the student know “ch” can make a K sound (e.g. “chaos”)?

Future work

Page 8: Educational data mining in a computer tutor that listens

8

Predicting student help requests

Page 9: Educational data mining in a computer tutor that listens

9

Why predict help requests?

• Goal is to understand the student
  – In two senses
    • A good model of the student should be able to predict future actions (e.g. outcome measure)
    • Help requests provide window into student’s reading proficiency (e.g. source of knowledge)
• Non-speech events are less noisy
• Applications of help requests
  – Avoid overly complex material
  – Provide help ahead of time

Page 10: Educational data mining in a computer tutor that listens

10

Learning curves in students’ help requests (with Peng Jia)

[Figure: mean help request rate (y-axis, 0.0 to 0.4) versus number of previous encounters (x-axis, 0 to 15), one curve per reading level (Grade 1 to Grade 4).]
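A learning curve of this kind can be tabulated from a time-ordered log of word encounters. The sketch below is only an illustration: the event tuple layout and the function name are assumptions, not Project LISTEN's schema, and the split by reading level is omitted for brevity.

```python
from collections import defaultdict

def help_rate_by_prior_encounters(events, max_encounters=15):
    """events: iterable of (student, word, requested_help), in time order.

    Returns {number of previous encounters: mean help request rate}.
    The tuple layout is a hypothetical stand-in for the tutor's database.
    """
    seen = defaultdict(int)    # (student, word) -> encounters so far
    asked = defaultdict(int)   # prior-encounter count -> help requests
    total = defaultdict(int)   # prior-encounter count -> encounters
    for student, word, requested_help in events:
        n = seen[(student, word)]
        if n <= max_encounters:
            total[n] += 1
            asked[n] += int(requested_help)
        seen[(student, word)] += 1
    return {n: asked[n] / total[n] for n in sorted(total)}

# Made-up events for two students reading the same word.
events = [
    ("s1", "chaos", True),
    ("s2", "chaos", True),
    ("s1", "chaos", False),
    ("s2", "chaos", True),
    ("s1", "chaos", False),
]
rates = help_rate_by_prior_encounters(events)
```

Grouping the same counts by reading level would give one curve per grade, as in the figure.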

Page 11: Educational data mining in a computer tutor that listens

11

Region of focus

• “When students need help”
  – 1st and 2nd grade ability
  – 1-6 prior word encounters
• Selected data
  – 53 students
  – 175,961 words
  – 29,278 help requests
• # of cases per student: 1392 - 7783
• Help rate excluding common words: 0.54%–54%
  – A few novice readers requested substantial amounts of help

[Figure: mean help request rate versus number of previous encounters (0 to 15), by reading level (Grade 1 to Grade 4).]

Page 12: Educational data mining in a computer tutor that listens

12

How to predict help requests

• Approach: treat as classifier learning problem
  – Inputs: features about the word and the student
  – Output: whether the student will ask for help
• Need to decide:
  – Features describing word and student
  – What data to use to train model

Page 13: Educational data mining in a computer tutor that listens

Abbreviated example of features (20 features were used)

Word                Student on this word          Student overall     Requested help?
Length  Frequency   Seen before?  Helped before?  Help rate  Grade
6       1189        Yes           No              0.5        1        Yes
11      22255       No            No              0.5        1        Yes
3       826         Yes           Yes             0.1        3        No
5       1537        No            No              0.05       2        No
...
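One way to hold such cases for a classifier is a small record type with a method that flattens the inputs into numbers. This is a sketch only: the class and field names are hypothetical, and it covers just the six abbreviated features shown on the slide, not the 20 that were used.

```python
from dataclasses import dataclass, astuple

@dataclass
class HelpCase:
    # Field names are illustrative; they mirror the abbreviated slide table.
    word_length: int
    word_frequency: int
    seen_before: bool
    helped_before: bool
    student_help_rate: float
    student_grade: int
    requested_help: bool  # the label to predict

    def features(self):
        """Numeric feature vector (everything except the label)."""
        return [float(v) for v in astuple(self)[:-1]]

# The four example rows from the slide.
cases = [
    HelpCase(6, 1189, True, False, 0.5, 1, True),
    HelpCase(11, 22255, False, False, 0.5, 1, True),
    HelpCase(3, 826, True, True, 0.1, 3, False),
    HelpCase(5, 1537, False, False, 0.05, 2, False),
]
X = [c.features() for c in cases]
y = [c.requested_help for c in cases]
```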

Page 14: Educational data mining in a computer tutor that listens

14

Grouped student prediction

• Predict whether student will request help by using other students’ data

• Leave one student out cross validation:
  – Training data: randomly select 25% of all other students and pool their data together
    • (Using all data crashed the machine.)
  – Testing data: student’s data
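The split described above can be sketched as a generator. The function name, the dictionary layout, and the rounding of "25%" to a whole number of students are assumptions made for illustration.

```python
import random

def leave_one_student_out(cases_by_student, train_fraction=0.25, seed=0):
    """Yield (held_out, train_students, train_cases, test_cases).

    For each student, a random fraction of the OTHER students is sampled
    and their cases are pooled for training; the held-out student's own
    cases form the test set.
    """
    rng = random.Random(seed)
    students = sorted(cases_by_student)
    for held_out in students:
        others = [s for s in students if s != held_out]
        k = max(1, round(train_fraction * len(others)))
        train_students = rng.sample(others, k)
        train = [c for s in train_students for c in cases_by_student[s]]
        yield held_out, train_students, train, cases_by_student[held_out]

# Made-up data: 9 students with 3 cases each.
cases_by_student = {f"s{i}": [(f"s{i}", j) for j in range(3)] for i in range(9)}
splits = list(leave_one_student_out(cases_by_student))
```

Sampling students rather than individual cases keeps every training case from a student other than the one being tested.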

Page 15: Educational data mining in a computer tutor that listens

15

Grouped model prediction results

• Used J48 (version of C4.5) and NBC
• Evaluation criteria: weighted accuracy
  – Weigh cases where student asked for help 5 times more heavily
  – Not providing help when needed worse than extra help
• Performance (averaged per-student)
  – J48: 71%
  – NBC: 75%
• How to (possibly) do better?
  – Build individualized models for each student
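The slide does not give a formula for weighted accuracy. One plausible reading, assumed here, is the weight of correctly classified cases divided by the total weight, with help-request cases weighted 5.

```python
def weighted_accuracy(actual, predicted, help_weight=5.0):
    """Accuracy in which cases where the student asked for help count
    help_weight times as much as the others (assumed definition)."""
    correct = 0.0
    total = 0.0
    for a, p in zip(actual, predicted):
        w = help_weight if a else 1.0
        total += w
        if a == p:
            correct += w
    return correct / total

# A classifier that never predicts help: 80% plain accuracy here,
# but it misses the single heavily weighted help request.
actual = [True, False, False, False, False]
predicted = [False, False, False, False, False]
score = weighted_accuracy(actual, predicted)
```

Under this definition the always-"no help" predictor scores 4/9 rather than 0.8, which reflects the point that withholding needed help is the costlier error.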

Page 16: Educational data mining in a computer tutor that listens

16

Building individual models

• Incrementally construct models as data are seen
• Same features as grouped student prediction
• Performance (averaged per-student)
  – J48: 81%
  – NBC: 75%
  – Better to use data about individuals than population
• Obvious extension: combine grouped and individual modeling approaches

[Diagram: all data ordered by time, with a training span followed by a testing span, shown at the beginning and in the middle of the data.]
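A minimal sketch of incremental evaluation, assuming that "incrementally" means training on everything a student has produced so far and testing on the next case. The majority-vote predictor is a deliberately trivial stand-in for J48 or NBC, and all names are hypothetical.

```python
def incremental_splits(cases, min_train=1):
    """Yield (training_cases, test_case) pairs over time-ordered cases."""
    for i in range(min_train, len(cases)):
        yield cases[:i], cases[i]

def majority_predict(training):
    """Stand-in model: predict help if at least half of past cases had help."""
    labels = [label for _, label in training]
    return sum(labels) * 2 >= len(labels)

# Made-up (word, requested_help) cases for one student, in time order.
cases = [("w1", True), ("w2", True), ("w3", False), ("w4", True)]
predictions = [majority_predict(train) for train, _ in incremental_splits(cases)]
actual = [test[1] for _, test in incremental_splits(cases)]
accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(actual)
```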

Page 17: Educational data mining in a computer tutor that listens

17

Using subword properties to help predict help requests

(with June Sison)

• If student is predicted to need help on “chord,” he would probably need help on “chords” as well
  – Word roots?
  – But what about “chaos?” “chemical?”
• CH→/K/ is common across items
• Model letter→sound mappings in words
  – Called grapheme→phoneme (gp) mappings

Page 18: Educational data mining in a computer tutor that listens

18

Which gp mappings to use?

• Chemical
  – CH→K
  – E→EH
  – M→M
  – I→AH
  – C→K
  – A→AH
  – L→L

Page 19: Educational data mining in a computer tutor that listens

19

Which gp mappings to use?

• Chemical
  – CH→K
  – E→EH
  – M→M
  – I→AH
  – C→K
  – A→AH
  – L→L
• First and last parts of a word are most important for children’s decoding (Perfetti)
  – And adults’ decoding (recent email message floating around)
    >Aoccdrnig to rsereach at an Elingsh uinervtisy, it deosn't mttaer in waht
    >oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and
    >lsat ltteres are in the rghit pclae…Tihs is bcuseae we do not raed ervey
    >lteter by istlef, but the wrod as a wlohe.

Page 20: Educational data mining in a computer tutor that listens

20

Features describing a gp

• P(g): How common is this grapheme?

• P(p|g): How likely is it to generate this sound given the letters?

• Compute above two features for
  – First gp in a word
  – Rarest gp in a word
  – Average of all gp in a word

• Add to classifier’s set of features
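These features can be estimated by counting over a corpus of words already split into gp mappings. The sketch below makes two assumptions the slide does not state: the "rarest" gp is taken to be the one with the lowest P(g)·P(p|g), and the tiny corpus (including the phoneme labels for "chip") is invented for illustration.

```python
from collections import Counter

def gp_statistics(corpus):
    """corpus: list of words, each a list of (grapheme, phoneme) pairs.
    Returns (P(g), P(p|g)) as dictionaries estimated by counting."""
    g_counts, gp_counts = Counter(), Counter()
    for word in corpus:
        for g, p in word:
            g_counts[g] += 1
            gp_counts[(g, p)] += 1
    n = sum(g_counts.values())
    p_g = {g: c / n for g, c in g_counts.items()}
    p_p_given_g = {gp: c / g_counts[gp[0]] for gp, c in gp_counts.items()}
    return p_g, p_p_given_g

def word_features(word, p_g, p_p_given_g):
    """Six features: P(g) and P(p|g) for the first, rarest and average gp."""
    pairs = [(p_g[g], p_p_given_g[(g, p)]) for g, p in word]
    rarest = min(pairs, key=lambda t: t[0] * t[1])  # assumed notion of "rarest"
    return {
        "first_p_g": pairs[0][0],
        "first_p_p_given_g": pairs[0][1],
        "rarest_p_g": rarest[0],
        "rarest_p_p_given_g": rarest[1],
        "mean_p_g": sum(a for a, _ in pairs) / len(pairs),
        "mean_p_p_given_g": sum(b for _, b in pairs) / len(pairs),
    }

chemical = [("CH", "K"), ("E", "EH"), ("M", "M"), ("I", "AH"),
            ("C", "K"), ("A", "AH"), ("L", "L")]
chip = [("CH", "CH"), ("I", "IH"), ("P", "P")]  # hypothetical second word
p_g, p_p_given_g = gp_statistics([chemical, chip])
features = word_features(chemical, p_g, p_p_given_g)
```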

Page 21: Educational data mining in a computer tutor that listens

21

Results

• Used individual models with J48
• Improved accuracy by 0.7% absolute (P=0.013) over not using gp features
  – However, already using many features about student
  – Suggests students are sensitive to gp properties
• Can we do better?
  – These gp properties are static
  – Only describe words, not students
  – Perhaps modeling a student’s skills would work better?
    • Infer what is in student’s head rather than just predict actions

Page 22: Educational data mining in a computer tutor that listens

22

Outline

Predicting student behavior
• E.g. will the student click for help on this word?

Inferring student’s skills
• E.g. does the student know “ch” can make a K sound (e.g. “chaos”)?

Future work

Page 23: Educational data mining in a computer tutor that listens

23

Automated assessment (with Peng Jia and June Sison)

• We gather lots of data; use it to assess students
  – “Knowing What Students Know” provides metaphor
• Why perform automated assessment?
  – Drawbacks of paper tests:
    • Expensive
    • Lack of ongoing results
    • Costly to report to teachers and computer tutors
• Problem: our data are (literally) noisy
  – But we have a lot of it: students attempt over 300 words per day

Page 24: Educational data mining in a computer tutor that listens

24

Converting speech input to usable data

Speech recognizer output: I’LL HAVE TO MOP UP MUTTERED DENNIS…
Text being read: “I'll have to mop it all up,” muttered Dennis…

Pipeline: Speech input (Sphinx) → Align text (Multimatch) → Assess subword knowledge
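The alignment step can be illustrated with Python's standard `difflib.SequenceMatcher`. This is a swapped-in stand-in, not the Multimatch algorithm the slide names; the function names and the punctuation handling are assumptions.

```python
from difflib import SequenceMatcher

def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.strip('".,!?…“”').lower()

def mark_words_heard(text, heard):
    """Align recognizer output against the story text.

    Returns a list of (text_word, accepted) pairs, where accepted is True
    if the word was matched in the recognizer output.
    """
    text_words = [normalize(w) for w in text.split()]
    heard_words = [normalize(w) for w in heard.split()]
    matcher = SequenceMatcher(a=text_words, b=heard_words, autojunk=False)
    accepted = [False] * len(text_words)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            accepted[i] = True
    return list(zip(text_words, accepted))

text = '"I\'ll have to mop it all up," muttered Dennis'
heard = "I'LL HAVE TO MOP UP MUTTERED DENNIS"
result = mark_words_heard(text, heard)
missed = [word for word, ok in result if not ok]
```

For the slide's example, the two words absent from the recognizer output ("it" and "all") are the ones left unmatched.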

Page 25: Educational data mining in a computer tutor that listens

25

Assessing subword knowledge

• Interested in student proficiency in individual gp mappings
  – Maintain knowledge estimate, P(knows), for each mapping
• “Hidden subskill problem” (latent variable)
  – Cannot assess directly
• Credit/blame first and last gp of every word attempted
  – But how?

Page 26: Educational data mining in a computer tutor that listens

26

What is knowledge tracing? (Corbett et al.)

Two Learning Parameters
  p(L0): Probability the rule is in the learned state at time 0 (prior to the first opportunity to apply the rule in problem solving).
  p(T): Probability the rule will make the transition from the unlearned state to the learned state at each opportunity to apply the rule

Two Performance Parameters
  p(G): Probability the student will guess correctly if the rule is in the unlearned state
  p(S): Probability the student will slip (make a mistake) if the rule is in the learned state

[Diagram: two states, Unlearned and Learned. p(L0) is the initial probability of the Learned state; p(T) labels the transition from Unlearned to Learned; a correct response has probability p(G) from the Unlearned state and 1-p(S) from the Learned state.]
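With these four parameters, a single knowledge tracing step is a Bayesian update on the observed response followed by the learning transition. The sketch below is the standard form of that update; the function name and the parameter values in the example are illustrative, not taken from the slides.

```python
def knowledge_tracing_update(p_known, correct, p_guess, p_slip, p_transit):
    """One knowledge tracing step for a single rule.

    p_known   -- current P(rule is in the learned state)
    correct   -- whether the observed response was correct
    p_guess   -- p(G), p_slip -- p(S), p_transit -- p(T)
    """
    if correct:
        evidence_known = p_known * (1.0 - p_slip)
        evidence_unknown = (1.0 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_unknown = (1.0 - p_known) * (1.0 - p_guess)
    # Posterior probability the rule was already learned, given the response.
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Chance the rule becomes learned at this opportunity.
    return posterior + (1.0 - posterior) * p_transit

# Hypothetical parameter values, starting from p(L0) = 0.3.
p = 0.3
for outcome in [True, True, False, True]:
    p = knowledge_tracing_update(p, outcome, p_guess=0.2, p_slip=0.1, p_transit=0.1)
```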

Page 27: Educational data mining in a computer tutor that listens

27

Modifying knowledge tracing

• Problem: noisy speech recognition
• Solution: broaden notion of slip and guess
  – P(slip) = chance a skilled student makes a mistake + chance ASR fails to hear correct reading
  – P(guess) = chance a novice pronounces word correctly + chance ASR incorrectly credits student
• Very different semantics of slip/guess
• Knowledge tracing equations unchanged
• Estimate slip/guess from students working with Reading Tutor

Page 28: Educational data mining in a computer tutor that listens

28

Applying knowledge tracing

• E.g. Student reads “Dennis” correctly → update D→D, S→S
• Assume student had P(knows) = 0.1 for both
• Update P(knows D→D)
  – P(guess D→D) = 0.81
  – P(slip D→D) = 0.13
  – New P(knows D→D) = 0.107
• Update P(knows S→S)
  – P(guess S→S) = 0.80
  – P(slip S→S) = 0.12
  – New P(knows S→S) = 0.109
• Slow updates
  – A good thing
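The slide's numbers can be checked by hand. Assuming the learning transition p(T) contributes negligibly (the slide does not give its value), the update after a correct reading is just the Bayesian evidence step of knowledge tracing:

```latex
P(\text{knows} \mid \text{correct})
  = \frac{P(\text{knows})\,\bigl(1 - P(\text{slip})\bigr)}
         {P(\text{knows})\,\bigl(1 - P(\text{slip})\bigr)
          + \bigl(1 - P(\text{knows})\bigr)\,P(\text{guess})}

D \to D:\quad
  \frac{0.1 \times 0.87}{0.1 \times 0.87 + 0.9 \times 0.81}
  = \frac{0.087}{0.816} \approx 0.107

S \to S:\quad
  \frac{0.1 \times 0.88}{0.1 \times 0.88 + 0.9 \times 0.80}
  = \frac{0.088}{0.808} \approx 0.109
```

Because P(guess) is so high, a correct reading is only weak evidence of knowledge, which is why the estimate moves slowly.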

Page 29: Educational data mining in a computer tutor that listens

29

Evaluation of gp mappings

• Data from 2002-2003
• N=259 (1st through 4th graders)
• Goal: predict performance on fluency posttest
  – Standardized test is scored by humans
  – (Not our final goal)
• Construct 2 linear models for all students
  – Mean P(knows) for all gp
    • Fluency posttest = 133.3 * mean – 42.8
  – Pretest paper-test score
    • Fluency posttest = 0.809 * fluency pretest + 20.5
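The two fitted models are simple enough to apply directly. The coefficients below are the ones on the slide; the function names and the example inputs are made up for illustration.

```python
def posttest_from_mean_knows(mean_p_knows):
    """Fluency posttest predicted from a student's mean P(knows) over all gp."""
    return 133.3 * mean_p_knows - 42.8

def posttest_from_pretest(fluency_pretest):
    """Fluency posttest predicted from the paper fluency pretest score."""
    return 0.809 * fluency_pretest + 20.5

# Hypothetical student: mean P(knows) of 0.6, pretest score of 40.
estimate_from_tutor = posttest_from_mean_knows(0.6)
estimate_from_pretest = posttest_from_pretest(40.0)
```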

Page 30: Educational data mining in a computer tutor that listens

30

Results

• All results are leave-one-out cross validation
  – Correlation of 0.862 for P(knows) for all gp
  – Correlation of 0.808 for pretests
• Look at within-grade correlation
  – Reduce heterogeneity
  – E.g. shoe size and spelling ability

Grade    Mean P(knows)  Pretest  N
1        0.878          0.561    115
2        0.808          0.881    80
3        0.813          0.883    42
4        0.859          0.898    22
Average  0.840          0.806
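Leave-one-out cross validation for a one-variable linear model can be sketched as follows: each student's score is predicted by a line fitted to all the other students, and the predictions are then correlated with the actual scores. The helper names and the data are invented; the data are exactly linear so the result is easy to sanity-check.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def leave_one_out_predictions(xs, ys):
    """Predict each y from a line fitted to all the other points."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions

# Made-up, exactly linear data (mean P(knows) -> posttest score).
xs = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [10.0 * x + 5.0 for x in xs]
predictions = leave_one_out_predictions(xs, ys)
correlation = pearson(predictions, ys)
```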

Page 31: Educational data mining in a computer tutor that listens

31

Using mean of P(knows) to predict GORT posttest

[Figure: actual GORT posttest score (y-axis, 0 to 100) plotted against predicted GORT posttest score (x-axis, 0 to 100).]

Page 32: Educational data mining in a computer tutor that listens

32

Outline

Predicting student behavior
• E.g. will the student click for help on this word?

Inferring student’s skills
• E.g. does the student know “ch” can make a K sound (e.g. “chaos”)?

Future work

Page 33: Educational data mining in a computer tutor that listens

33

Near-term goals

• Construct more general tools
  – Crosstabs
  – View a student’s growth in reading
• Automated assessment
  – Validate gp mappings
  – Model latent variables
  – Improve ASR

Page 34: Educational data mining in a computer tutor that listens

34

Model of student knowledge

[Diagram: layers labeled Speech, GP knowledge (371 items!) with example grapheme→phoneme nodes, and Reading proficiency.]

Page 35: Educational data mining in a computer tutor that listens

35

Model of student knowledge: adding latent variables

[Diagram: layers labeled Speech, GP knowledge with example grapheme→phoneme nodes, “Higher level” knowledge (e.g. short vowels, rare use, etc.), and Reading proficiency.]

Page 36: Educational data mining in a computer tutor that listens

Improving ASR

• Cannot listen for all mistakes
• Bias ASR based on student proficiencies
• E.g. student encounters “thugs”
  – th→th, u→ah, g→g, s→z

P(say “Thugs”) = 0.90
P(say “Tugs”) = 0.02

Page 38: Educational data mining in a computer tutor that listens

Improving ASR

• Cannot listen for all mistakes
• Bias ASR based on student proficiencies
• E.g. student encounters “thugs”
  – th→th, u→ah, g→g, s→z

P(say “Thugs”) = 0.90
P(say “Tugs”) = 0.02

Knows theta:
  P(say “Thugs”) = 0.95
  P(say “Tugs”) = 0.01

Doesn’t know theta:
  P(say “Thugs”) = 0.40
  P(say “Tugs”) = 0.40
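The slide does not say how the bias would be computed. One plausible reading, assumed here, is to mix the two conditional distributions by the student's current P(knows) for the relevant mapping. The function name and the P(knows) values are hypothetical; the conditional probabilities are the slide's example figures.

```python
def expected_utterance_probs(p_knows, if_known, if_unknown):
    """Mix per-utterance probabilities by the student's P(knows).

    This weighting scheme is an assumption, not the tutor's method.
    """
    return {
        utterance: p_knows * if_known[utterance]
        + (1.0 - p_knows) * if_unknown[utterance]
        for utterance in if_known
    }

if_known = {"thugs": 0.95, "tugs": 0.01}    # student knows the "th" sound
if_unknown = {"thugs": 0.40, "tugs": 0.40}  # student does not

strong_reader = expected_utterance_probs(0.9, if_known, if_unknown)
weak_reader = expected_utterance_probs(0.1, if_known, if_unknown)
```

Under this scheme the recognizer would expect "tugs" far more often from a student with a low estimate for the mapping than from one with a high estimate.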

Page 39: Educational data mining in a computer tutor that listens

39

Longer-term goals

• Improving ASR good goal due to ability to evaluate changes offline
• However, would like to improve educational outcomes
  – Problem: harder to evaluate learning since human trials are expensive
  – Solution: construct a simulation of the student and tutor and use reinforcement learning (RL)
  – Approach used in my dissertation work (at UMass) on ADVISOR in the AnimalWatch system

Page 40: Educational data mining in a computer tutor that listens

40

ADVISOR overview

[Diagram components: Data from prior users of tutor; Predict student behavior in state s; Pedagogical Agent; Teaching goal; Teaching policy. Example exchange: teaching action “try again”; result “correct answer, took 15 sec.”]

Page 41: Educational data mining in a computer tutor that listens

41

Why is applying RL harder in the Reading Tutor than in AnimalWatch?

• Reading Tutor
  – Built by others
  – Still building student model
  – Domain is reading, hard to measure outcomes
  – Greater variety of points to intervene
• AnimalWatch
  – I designed
  – Started with student model
  – Domain was math, easy to measure outcomes
  – Built from ground up with ADVISOR in mind

Page 42: Educational data mining in a computer tutor that listens

42

Conclusions

• Can assess students despite noisy data
• For predicting student behavior, data are plentiful
  – Can examine models and features
• For predicting student test scores, data are scarce
  – Restricted to simple models and need good features
• Educational data mining offers many opportunities to improve efficacy of teaching
• Big data is a “secret weapon” but…
  – We still don’t have enough to do everything we want