Boredom Across Activities, and Across the Year, within Reasoning Mind

Posted on 06-Jan-2016

Boredom Across Activities, and Across the Year, within Reasoning Mind
William L. Miller, Ryan Baker, Mathew Labrum, Karen Petsche, Angela Z. Wagner

In recent years
Increasing interest in modeling more about students than just what they know
Can we assess a broad range of constructs, in a broad range of contexts?

Boredom
A particularly important construct to measure

Boredom is
Common in real-world learning (D'Mello, 2013)
Associated with worse learning outcomes in the short term (Craig et al., 2004; Rodrigo et al., 2007)
Associated with worse course grades and standardized exam performance (Pekrun et al., 2010; Pardos et al., 2013)
Associated with lower probability of going to college, years later (San Pedro et al., 2013)

Online learning environments
Offer great opportunities to study boredom in context
Very fine-grained interaction logs record everything the student did in the system
Goal: infer affective state from the interaction logs alone, when no classroom observations are available

Automated boredom detection
Can we detect boredom in real time, while a student is learning?
Can we detect boredom retrospectively, from log files?

Would allow us to study affect at large scale
Figure out which content is most boring, in order to improve it

Affect Detection: Physical Sensors?
Lots of work shows that affect can be detected using physical sensors
Tone of voice (Litman & Forbes-Riley, 2005)
EEG (Conati & McLaren, 2009)
Posture sensor and video (D'Mello et al., 2007)

It's hypothesized, but not yet conclusively demonstrated, that using physical sensors may lead to better performance than interaction logs alone

Sensor-free affect detection
Easier to scale to the millions of students who use online learning environments
Works in settings that do not have cameras, microphones, and other physical sensors
Home settings: have parents bought equipment? Can they set it up and maintain it?
Classroom settings: can the school maintain equipment? Do students intentionally destroy equipment? Parent concerns and political climate

Sensor-free boredom detection
Has been developed for multiple learning environments
Problem-solving tutors (Baker et al., 2012; Pardos et al., 2013)
Dialogue tutors (D'Mello et al., 2008)
Narrative virtual learning environments (Sabourin et al., 2011; Baker et al., 2014)
Science simulations (Paquette et al., 2014)
The principles of affect detection are largely the same across environments
But the behaviors associated with boredom differ considerably between environments

This talk
We discuss our work to develop sensor-free boredom detection for Reasoning Mind Genie 2 (Khachatryan et al., 2014)
Self-paced blended learning mathematics curriculum for elementary school students
Youngest population for sensor-free affect detection so far
Used by approximately 100,000 students a year
No action-by-action assessment; this is deliberate, since we want students to be able to explore
No on-demand scaffolding (at least not in our dataset)

Reasoning Mind Genie 2
Combines Guided Study with a pedagogical agent, the Genie
Speed Games that support development of fluency

Used in schools 3-5 days a week for 45-90 minutes per day

Reasoning Mind Genie 2
[Figure: screenshots (a), (b), (c) of the Genie 2 environment]
Better affect and more on-task behavior than most pedagogies, online or offline (Ocumpaugh et al., 2013)

Still a substantial amount of boredom

Reducing boredom is a key goal

Role for affect detection
If we can detect boredom in log files, we can determine which content is more boring, and improve that content

Related Work
Evidence that specific design features are associated with boredom in Cognitive Tutors for high school algebra (Doddannara et al., 2013)

Evidence that some disengaged behaviors increase during the year (Beck, 2005)
Important to verify that differences in affect are due to actual content/design, not time of year

Approach to Boredom Detection
Collect ground-truth data on student boredom, using field observations
Synchronize log data to the field observations
Distill meaningful features of the log data, hypothesized to relate to boredom
Develop an automated detector using a classification algorithm
Validate the detector for new students/new lessons/new populations

BROMP 2.0 Field Observations (Ocumpaugh et al., 2012)
Conducted through the Android app HART (Baker et al., 2012)
Protocol designed to reduce disruption to students
Some features of the protocol: observe with peripheral vision or side glances; hover over a student not being observed; 20-second round-robin observations of several students; bored-looking people are boring
Inter-rater reliability around 0.8 for behavior, 0.65 for affect
64 coders now certified in the USA, the Philippines, and India
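The 20-second round-robin observation schedule can be sketched as a simple rotation over the students in a class; this is an illustrative sketch, not part of BROMP or HART:

```python
from itertools import cycle

def observation_schedule(students, n_rounds):
    """Round-robin observation order: each 20-second slot observes the next
    student in a fixed order, looping back to the start of the list."""
    order = cycle(students)
    return [next(order) for _ in range(n_rounds)]

# Three students observed over seven 20-second slots
print(observation_schedule(["A", "B", "C"], 7))  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```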

Data collection
408 elementary school students
Diverse sample important for model generalizability (Ocumpaugh et al., 2014)
11 different 8th grade classes
6 schools
2 urban in Texas, predominantly African-American
1 urban in Texas, predominantly Latino
1 suburban in Texas, predominantly White
1 suburban in Texas, mixed ethnicity/race
1 rural in West Virginia, predominantly White

Affect coding
3 expert coders observed each student using BROMP

Coded 5 categories of affect:
Engaged Concentration
Boredom
Confusion
Frustration
?

4891 observations collected in RM classrooms

Building detectors
Observations were synchronized with the logs of the students' interactions with RM, using the HART app and an internet time server
For each observation, a set of 93 meaningful features describing the student's behavior was engineered
Computed on actions occurring during or preceding an observation (up to 20 seconds before)
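The windowing step above (features computed over actions during or up to 20 seconds before an observation) can be sketched as follows; the log field names are illustrative assumptions, not the actual Reasoning Mind schema:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=20)  # actions up to 20 s before the observation count

def actions_in_window(actions, obs_time):
    """Return the log actions that occur during or up to 20 s before an
    observation. `actions` is a list of dicts with a 'time' key (datetime)."""
    return [a for a in actions if obs_time - WINDOW <= a["time"] <= obs_time]

# Example: one action is too early; two fall inside the window
obs = datetime(2013, 10, 1, 9, 30, 0)
log = [
    {"time": datetime(2013, 10, 1, 9, 29, 30), "correct": 1},  # 30 s before: outside
    {"time": datetime(2013, 10, 1, 9, 29, 50), "correct": 0},  # 10 s before: inside
    {"time": datetime(2013, 10, 1, 9, 30, 0), "correct": 1},   # at observation: inside
]
window = actions_in_window(log, obs)
print(len(window))  # 2
```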

Features: Examples
Individual action features
Whether an action was correct or not
How long the action took
Features across all past activity
Fraction of previous attempts on the current skill the student has gotten correct
Other known models applied to logs
Probability the student knows the skill (Bayesian Knowledge Tracing)
Carelessness
Moment-by-Moment Learning Graph

Automated detector of boredom
Detectors were built using RapidMiner 5.3
For each algorithm, the best features were selected using forward selection/backward elimination
Data was re-sampled to have more equal class frequencies; models were evaluated on the original class distribution
Detectors were validated using 10-fold student-level cross-validation
Detectors were built using 4 machine learning algorithms that have been successful for building affect detectors in the past:
J48
JRip
Step Regression
Naïve Bayes

Machine learning
Performance of the detectors was evaluated using A'
Given two observations, the probability of correctly identifying which one is an example of a specific affective state and which one is not
An A' of 0.5 is chance level and 1 is perfect
Identical to the Wilcoxon statistic
Very similar to AUC ROC (Area Under the Receiver-Operating Characteristic Curve)

Results
A' = 0.64
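A' as defined above (the probability of correctly ranking a positive example over a negative one, with ties counting half) can be computed directly from detector confidences; a minimal sketch with made-up predictions:

```python
def a_prime(scores, labels):
    """Probability that a randomly chosen positive example receives a higher
    detector confidence than a randomly chosen negative one (ties count 0.5).
    Equivalent to the Wilcoxon statistic, and to AUC ROC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking gives 1.0; fully reversed ranking gives 0.0
print(a_prime([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
print(a_prime([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0]))  # 0.0
```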

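The student-level cross-validation mentioned above requires that no student's observations appear in both training and test sets; a sketch of the fold assignment (the observation tuple layout is invented):

```python
def student_level_folds(observations, n_folds=10):
    """Assign each student wholly to one fold, so a detector is always tested
    on students it was not trained on. `observations` is a list of
    (student_id, features, label) tuples; the layout is illustrative."""
    students = sorted({obs[0] for obs in observations})
    fold_of = {s: i % n_folds for i, s in enumerate(students)}
    folds = [[] for _ in range(n_folds)]
    for obs in observations:
        folds[fold_of[obs[0]]].append(obs)
    return folds

# 5 students with 4 observations each, split into 5 folds
obs = [("s%d" % (i % 5), None, 0) for i in range(20)]
folds = student_level_folds(obs, n_folds=5)
print([len(f) for f in folds])  # [4, 4, 4, 4, 4]
```

Each fold then serves once as the held-out test set while the others form the training set.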
Compared to similar detectors in other systems, validated in a similarly stringent fashion:

System / A'
Cognitive Tutor Algebra (Baker et al., 2012): 0.69
ASSISTments (Pardos et al., 2013): 0.63
EcoMUVE (Baker et al., 2014): 0.65
Inq-ITS (Paquette et al., 2014): 0.72

Coefficient / Feature
+0.212  The standard deviation, across the clip, of student correctness (1 or 0) on each action.
-0.013  The number of actions in the clip that occurred on Speed Game items.
-0.070  The fraction of the total clip duration spent on Speed Game items.
-0.073  The number of actions in the clip on items where the answer input was made by selecting an item from a drop-down list.
+0.290  The minimum slip parameter (P(S) in Bayesian Knowledge Tracing) on skills in the clip.
-0.260  The standard deviation, across the clip, of the action duration, normalized across all students, times the presence (1) or absence (0) of a hint request on the previous action.
+0.123  Y-intercept.

Using detectors
Model applied to the entire year of data from these classrooms
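The coefficient table describes a linear model; applying it to a clip can be sketched as below. The shorthand feature names and the example feature values are invented, and the linear form is an assumption based on the coefficient/intercept structure:

```python
# Coefficients from the table; keys are shorthand names for the features.
COEFS = {
    "sd_correctness": +0.212,
    "n_speed_game_actions": -0.013,
    "frac_speed_game_time": -0.070,
    "n_dropdown_actions": -0.073,
    "min_bkt_slip": +0.290,
    "sd_duration_x_prev_hint": -0.260,
}
INTERCEPT = 0.123

def boredom_score(features):
    """Linear combination of clip features plus intercept; higher values mean
    the detector is more confident the clip shows boredom."""
    return INTERCEPT + sum(COEFS[k] * v for k, v in features.items())

# Invented example clip
clip = {
    "sd_correctness": 0.5,
    "n_speed_game_actions": 0,
    "frac_speed_game_time": 0.0,
    "n_dropdown_actions": 2,
    "min_bkt_slip": 0.1,
    "sd_duration_x_prev_hint": 0.0,
}
print(round(boredom_score(clip), 3))  # 0.112
```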

2,974,944 actions by 462 students
Includes 54 additional students not present during observations

Aggregation over pseudo-confidences rather than binary predictions
Retains more information

Apparent downward trend
Is it statistically significant?
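Aggregating the detector's pseudo-confidences, rather than thresholding each clip to a 0/1 prediction first, can be sketched as follows (the per-clip confidence values are invented):

```python
def mean_boredom(confidences):
    """Aggregate detector pseudo-confidences by averaging them directly,
    instead of binarizing each clip at 0.5 first. Averaging keeps the graded
    information that thresholding throws away."""
    return sum(confidences) / len(confidences)

clips = [0.45, 0.55, 0.40, 0.52]  # invented per-clip confidences

# Binarizing at 0.5 first would give (0 + 1 + 0 + 1) / 4 = 0.5, hiding how
# close each clip was to the threshold; the mean preserves that information.
print(round(mean_boredom(clips), 2))  # 0.48
```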

Yes. Students are less bored later in the year
F-test controlling for student, p