TRANSCRIPT
Ground Truth for Behavior Detection
Week 3 Video 1
Welcome to Week 3
Over the last two weeks, we’ve discussed prediction models
This week, we focus on a type of prediction model called behavior detectors
Behavior Detectors
Automated models that can infer from log files whether a student is behaving in a certain way
We discussed examples of these: the off-task behavior and gaming detectors
in the San Pedro et al. case study in Week 1
Behaviors people have detected
Disengaged Behaviors
Gaming the System (Baker et al., 2004, 2008, 2010; Cheng & Vassileva, 2005; Walonoski & Heffernan, 2006; Beal, Qu, & Lee, 2007)
Off-Task Behavior (Baker, 2007; Cetintas et al., 2010)
Carelessness (San Pedro et al., 2011; Hershkovitz et al., 2011)
WTF Behavior (Rowe et al., 2009; Wixon et al., 2012)
Meta-Cognitive Behaviors
Help Avoidance (Aleven et al., 2004, 2006)
Unscaffolded Self-Explanation (Shih et al., 2008; Baker, Gowda, & Corbett, 2011)
Exploration Behaviors (Amershi & Conati, 2009)
If you’re not interested in behavior detectors…
Feel free to take the week off and rejoin us in a week
Although there may still be stuff that’s interesting to you this week
Ground Truth
The first big issue with developing behavior detectors is…
Where do you get the prediction labels?
Behavior Labels are Noisy
No perfect way to get indicators of student behavior
It’s not truth
It’s ground truth
(Truthiness?)
Behavior Labels are Noisy
Another way to think of it
In some areas, there are “gold-standard” measures
As good as gold
With behavior detection, we have to work with “bronze-standard” measures
Gold’s less expensive cousin
Is this a problem?
Not really
It does limit how good we can realistically expect our models to be
If your training labels have inter-rater agreement of Kappa = 0.62
You probably should not expect (or want) your detector to have Kappa = 0.75
A detector that agrees with noisy labels better than the human coders agree with each other is probably fitting the noise in those labels, not the behavior
Ultimately
“Anything worth doing is worth doing badly.” – Herb Simon
Picture taken from CMU tribute page
Sources of Ground Truth for Behavior
Self-report
Field observations
Text replays
Video coding
Self-report
Fairly common for constructs like affect and self-efficacy
Not as common for labeling behavior
Are you gaming the system right now?
Yes No
Was recommended to me by an IRB compliance officer in 2003
Field Observations
One or more observers watch students and take systematic notes on student behavior
Takes some training to do right (Ocumpaugh et al., 2012) http://www.columbia.edu/~rsb2162/bromp.html
Free Android App developed by my group (Baker et al., 2011) http://www.columbia.edu/~rsb2162/HART8.6.zip
Text replays
Pretty-prints of student interaction behavior from the logs (a small code sketch follows at the end of this section)
Examples
Major Advantages
Blazing fast to conduct
8 to 40 seconds per observation
Notes
Decent inter-rater reliability is possible
Agree with other measures of constructs
Can be used to train behavior detectors
Major Limitations
Limited range of constructs you can code
Gaming the System – yes
Collaboration in online chat – yes (Prata et al., 2008)
Frustration, Boredom – sometimes
Off-Task Behavior outside of software – no
Collaborative Behavior outside of software – no
Major Limitations
Lower precision (because of the lower bandwidth of observation: you only see what was logged)
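To make the idea of a text replay concrete, here is a minimal Python sketch of what a replay generator might look like. The log schema, field names, and sample rows are invented for illustration; real tutor logs will have their own format.

```python
# Minimal sketch of a text replay: pretty-printing a short clip of
# logged actions so a human coder can label it (e.g., GAMING vs. NOT GAMING).
# The schema and sample rows below are hypothetical.

sample_clip = [
    {"time": 0.0, "problem": "eq-12", "step": "x+3=7", "action": "ATTEMPT", "input": "10", "outcome": "WRONG"},
    {"time": 2.1, "problem": "eq-12", "step": "x+3=7", "action": "ATTEMPT", "input": "11", "outcome": "WRONG"},
    {"time": 3.4, "problem": "eq-12", "step": "x+3=7", "action": "HELP",    "input": "",   "outcome": "HINT-1"},
    {"time": 4.0, "problem": "eq-12", "step": "x+3=7", "action": "HELP",    "input": "",   "outcome": "HINT-2"},
    {"time": 4.5, "problem": "eq-12", "step": "x+3=7", "action": "ATTEMPT", "input": "4",  "outcome": "RIGHT"},
]

def text_replay(clip):
    """Render one clip of logged actions as a compact, human-readable transcript."""
    lines = []
    for row in clip:
        lines.append(
            f"{row['time']:6.1f}s  {row['problem']:>6}  {row['step']:<8}"
            f"{row['action']:<8} {row['input']:<4} -> {row['outcome']}"
        )
    return "\n".join(lines)

print(text_replay(sample_clip))
# A coder reads a clip like this in seconds and assigns a label,
# which becomes one training observation for the detector.
```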
Video Coding
Can be videos of live behavior in classrooms or screen replay videos
Slowest method
Replicable and precise
Some challenges in camera positioning
If you don’t get this right, it can be difficult to code your data
Inter-Rater Reliability
Good to check for inter-rater reliability when using expert coding
Researchers typically try to shoot for Kappa = 0.6 or higher for the expert coding
But ignore magic numbers…
Any real signal is something you can try to re-capture
I’d rather have 1,000 data points with Kappa = 0.5
Than 100 data points with Kappa = 0.7
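For concreteness, Cohen’s Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two coders and p_e is the agreement expected by chance from each coder’s base rates. Below is a minimal, self-contained Python sketch; the coder labels are hypothetical.

```python
# Cohen's kappa between two coders' behavior labels.
# Kappa = (p_o - p_e) / (1 - p_e).

def cohens_kappa(coder_a, coder_b):
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)

    # Observed agreement: fraction of observations where the coders match.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: product of the coders' marginal rates, per label.
    p_e = sum(
        (coder_a.count(lab) / n) * (coder_b.count(lab) / n)
        for lab in labels
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical field-observation labels for ten students.
a = ["GAMING", "ON-TASK", "ON-TASK", "GAMING", "OFF-TASK",
     "ON-TASK", "ON-TASK", "GAMING", "ON-TASK", "OFF-TASK"]
b = ["GAMING", "ON-TASK", "OFF-TASK", "GAMING", "OFF-TASK",
     "ON-TASK", "ON-TASK", "ON-TASK", "ON-TASK", "OFF-TASK"]
print(round(cohens_kappa(a, b), 2))  # roughly 0.68 for this sample
```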
Once you have ground truth…
You can build your detector!
Once you’ve synchronized your ground truth to the log data…
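As a preview, here is a hedged sketch of one simple synchronization scheme: attaching each observation’s label to the log actions that fall inside its observation window. The field layout and the 20-second window are assumptions for illustration; the next lecture treats synchronization and grain-size properly.

```python
# Attaching ground-truth labels to log data by time window.
# Data layout and the 20-second window are hypothetical.

observations = [  # (start_time_seconds, label) from a human coder
    (0.0,  "ON-TASK"),
    (20.0, "GAMING"),
]
log_actions = [  # (time_seconds, action) from the tutor logs
    (3.2, "ATTEMPT"), (18.9, "HELP"), (21.5, "ATTEMPT"), (24.0, "ATTEMPT"),
]

WINDOW = 20.0  # assumed length of one observation

# Each log action inherits the label of the observation window it falls in.
labeled = []
for start, label in observations:
    for t, action in log_actions:
        if start <= t < start + WINDOW:
            labeled.append((t, action, label))

for row in labeled:
    print(row)
```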
Next Lecture
Data Synchronization and Grain-Sizes