Enhancing Student Models in Game-based Learning
with Facial Expression Recognition
Robert Sawyer Department of Computer Science
North Carolina State University
Andy Smith Department of Computer Science
North Carolina State University
Jonathan Rowe Department of Computer Science
North Carolina State University
Roger Azevedo Department of Psychology
North Carolina State University
James Lester Department of Computer Science
North Carolina State University
ABSTRACT
Recent years have seen a growing recognition of the role that
affect plays in learning. Because game-based learning
environments elicit a wide range of student affective states,
affect-enhanced student modeling for game-based learning holds
considerable promise. This paper introduces an affect-enhanced
student modeling framework that leverages facial expression
tracking for game-based learning. The affect-enhanced student
modeling framework was used to generate predictive models of
student learning and student engagement for students who
interacted with CRYSTAL ISLAND, a game-based learning
environment for microbiology education. Findings from the study
reveal that the affect-enhanced student models significantly
outperform baseline predictive student models that utilize the
same gameplay traces but do not use facial expression tracking.
The study also found that models based on individual facial action
coding units are more effective than composite emotion models.
The findings suggest that introducing facial expression tracking
can improve the accuracy of student models, both for predicting
student learning gains and also for predicting student
engagement.
KEYWORDS Student Modeling, Affect, Game-based Learning
1 INTRODUCTION
Affect plays a central role in learning [3, 8, 9, 48]. Positive
emotions, such as curiosity and joy, can lead students to engage
more deeply in learning, while emotions such as boredom and
frustration can lead students to resort to unproductive behaviors
including hint abuse and gaming the system, or even to
completely disengage [37, 48]. Students exhibit a range
of emotions when interacting with adaptive learning
environments [13]. Building on recent advances in our
understanding of affect, which stem from both manually coded
protocols [15, 35] and automated techniques [7, 10, 36], affect-
enhanced student models that track and dynamically respond to
affect on a moment-by-moment basis offer significant potential
for enhancing adaptive learning environments.
While student modeling has long been a goal of the user
modeling community [19, 33], student modeling work to date has
focused primarily on cognitive aspects of learning [1]. Recent
student modeling work has begun to consider how to augment
student models with affect [12], as well as to tailor support based
on students’ affective states, such as confusion, frustration, and
boredom [14, 17]. Models of affect have also been used to predict
students’ reactions to virtual agents [25, 29] and better understand
student motivation and off-task behavior [40].
Game-based learning has been studied in a broad range of
subject matters and student populations [11, 20, 42, 49]. Because
game-based learning environments are designed to provide
students with engaging learning experiences, they offer a rich
“laboratory” for investigating affect-enhanced student
models. This paper reports on an investigation of affect-enhanced
student modeling for game-based learning. Specifically, using a
game-based learning environment for microbiology education,
CRYSTAL ISLAND [38], we compare two approaches to student
modeling: an affect-enhanced student model that uses facial
expression tracking in addition to gameplay data, and a baseline
student model that uses only gameplay data. We compare the
accuracy of the models for predicting both student learning
(measured with normalized learning gain) and student presence
(a facet of engagement measured with participant perception of
transportation into a virtual environment through a standard self-
report instrument [4]). In addition, we compare two competing
approaches to affect recognition—composite emotion models and
models based on individual facial action coding units—to explore
how different types of facial expression tracking can support
affect-enhanced student modeling.
2
2 RELATED WORK
Learning and affect are inextricably linked. It has been found, for
example, that positively valenced emotions tend to support
student learning [37]. Confusion has been shown to be beneficial
to learning in certain contexts [16], yet students experiencing
anxiety or anger tend to be less efficient learners [37]. Student
affect correlates with problem-solving performance [39] and is
predictive of user interactions with virtual agents [25].
Recognizing the potential for affect to inform student modeling
in learning environments, recent work with automated affect
detectors has found that integrating detectors into Bayesian
knowledge tracing models can improve predictive accuracy of
student performance [12].
Although there are many open questions about which
emotions or sets of emotions offer the greatest potential for
adaptive learning environments, as well as which sets of
multimodal data streams can best support adaptivity in learning
environments, facial expression tracking has emerged as a
promising means for inferring learning-centric emotions [10, 28].
Arroyo et al. [2] used facial expressions in combination with other
sensors to trigger interventions, which appeared to increase on-
task behavior and reduce hint-abuse. Bosch et al. [7] created
models that accurately predict student affect with facial action
units in a classroom setting that generalized across multiple days
and multiple classrooms. Grafsgaard et al. [24] found an additive
relationship between facial expressions, dialogue acts, and task
performance in a tutoring system, and more recent work in this
line of investigation found that male and female students exhibit
significant differences in facial expressions [43]. Collectively,
this work suggests that facial expressions provide a window into
affect and learning that can be utilized by student models to
support adaptive learning environments.
Upon detecting student affective states, adaptive learning
environments can deliver a broad range of possible supports for
student learning and emotion regulation. For example, D’Mello
et al. [14, 17] devised a modified version of AutoTutor that
delivered affect-sensitive dialogue moves, facial displays of
emotion, and synthesized speech patterns in response to student
affect. Results indicated that the affect-sensitive interventions had
a positive impact on student learning among low prior knowledge
students, but a neutral or even harmful impact on student learning
in other conditions. DeFalco et al. [18] found that self-efficacy-
oriented feedback messages, which were delivered in response to
detected learner frustration, helped to enhance student learning in
a simulation-based training environment for military combat
medics. However, the feedback messages were not found to
mitigate occurrences of frustration itself during training.
Although student modeling has been investigated for decades
in the intelligent tutoring systems community [44], only recently
has student modeling research turned to game-based
learning. Baker et al. [6] analyzed game-trace logs in the Virtual
Performance Assessments system to create assessment models of
students’ ability to design controlled experiments and apply
inquiry skills. Using evidence-centered design, Bayesian
network-based stealth assessments of student knowledge have
been explored in both the Physics Playground and a modified
version of the popular game, Plants vs. Zombies [26, 41], and
other work in this vein has explored deep learning-based models
of stealth assessment for game-based learning [31]. Student goal
recognition has also been investigated in game-based learning,
with approaches spanning Bayesian networks [34], Markov logic
networks [5], and long short-term memory networks [32]. The
affect-enhanced student modeling approach presented here
extends earlier work in student modeling for game-based learning
to exploit facial expression data to improve student models’
predictive accuracy.
Figure 1. Screenshots from CRYSTAL ISLAND illustrating a virtual textbook, non-player characters, the virtual
scanner, and game environment.
3
3 CRYSTAL ISLAND
We explore affect-enhanced student modeling in a study
conducted with an intelligent game-based learning environment
for science problem solving in microbiology, CRYSTAL
ISLAND [38]. Intelligent game-based learning environments
integrate the personalized learning supports of intelligent tutoring
systems [27, 45] with the engaging interactions afforded by game
technologies. In CRYSTAL ISLAND, students take on the role of a
medical field agent tasked with investigating an epidemic on a
remote island research station. The student must discover the
source and identity of the disease as well as recommend a
treatment plan to aid the infected members of the island’s research
team.
In CRYSTAL ISLAND, students engage in a range of problem-
solving actions to gather information, test hypotheses, and
diagnose a solution. Examples of student activities in the game
include exploring the island environment, conversing with non-
player characters, running tests on potentially contaminated items
in a virtual laboratory, and completing an in-game diagnosis
worksheet to report findings and solve the mystery. The open-
world game environment features several buildings, including an
infirmary, dining hall, laboratory, and various residences where
the student can gather information by talking to characters.
Relevant microbiology concepts about viruses, bacteria,
immunization, and how diseases spread are found in virtual books
and articles scattered throughout the buildings of the island.
Students use knowledge gained from both the informational texts
and dialogue with non-player characters to diagnose the illness
and solve the mystery. Students successfully complete the game
by submitting a worksheet to the camp nurse with the correct
diagnosis, identified transmission object, and treatment solution.
4 METHODS AND DATA
To investigate the potential contributions of affect to student
modeling, we compare two approaches. First, we created an
affect-enhanced student model that uses facial expression
tracking in combination with gameplay. As students played
CRYSTAL ISLAND, video was captured of their facial expressions.
We created predictive student models that used both the facial
expression data stream and the gameplay data stream to predict
students’ learning and students’ sense of presence, a core
component of engagement. We then created a baseline student
model that also predicts students’ learning and presence but only
uses the gameplay data stream. We compare the performance of
these two models and explore the impact of two approaches to
facial recognition tracking for affect-enhanced student modeling:
(1) composite emotion models and (2) models based on individual
facial action coding units.
4.1 Participants and Experimental Setup
The study involved 37 college age students that interacted with
the CRYSTAL ISLAND game-based learning environment in a lab
setting. Four students were removed due to partial or missing
data, resulting in 33 students (M = 19.9 years old, SD = 1.30), of
which 17 (51.5%) were female. All remaining students played the
1 The iMotions software was previously commercially available as FACET and the
research-focused toolbox CERT [30].
game to completion, and time spent playing the game ranged from
26.4 to 105.1 minutes (M = 63.7, SD = 18.4). Prior to interacting
with the game, students completed a 20-question multiple-choice
test assessing their conceptual and application-based
understanding of microbiology. At conclusion of gameplay,
students completed a Presence Questionnaire [46] and again
completed the same microbiology assessment.
4.2 Survey Measures
In order to assess student learning, we examined students’
performance on the microbiology content test administered
before and after students’ interactions with CRYSTAL ISLAND.
Using the difference between the pre-test score (Pre-Test) and
post-test score (Post-Test), Normalized Learning Gain was
calculated for each student participating in the study. Normalized
Learning Gain is the difference between Post-Test and Pre-Test,
standardized by the total amount of improvement or decline
possible from the Pre-Test. Students achieved positive
Normalized Learning Gain on average (M = 0.226, SD = 0.315),
with 25 of the 33 (75.8%) students achieving positive Normalized
Learning Gain (Post-Test greater than Pre-Test).
Normalized Learning Gain =
{
Post − Pre
1 − Pre Post > Pre
Post − Pre
Pre Post ≤ Pre
We use presence as a proxy for engagement in the game
environment. In order to assess student presence, the Presence
Questionnaire was used to measure players’ self-reported
perceptions of presence [47], which refers to a participant’s
perception of transportation into a virtual environment. The
questionnaire contains a series of 30 questions that students
answer on a 7-point scale to characterize their in-game experience
with a virtual environment. The questionnaire consists of several
subscales for measuring presence, such as involvement, sensory
fidelity, adaptation/immersion, and interface quality. To measure
presence, we summed the numeric responses to all 30 questions
to obtain a presence score for each student, which ranged from
116 to 209 (M = 155, SD = 23.4).
4.3 Facial Expression Recognition
Facial expression features were extracted automatically through a
video-based facial expression tracking system, iMotions.1 The
iMotions system extracts facial features that correspond to the
Facial Action Coding System (FACS) [21, 22]. It uses an
objective three-phase framework of facial detection from image
input, feature detection (i.e., locating position of facial
landmarks), and feature classification to report an evidence score
representing the likelihood of a particular affective measure being
present. A separate classifier is used for each affective measure,
ranging from low-level facial expression features such as Action
Units (AUs) to more complex, composite affective measures,
such as Surprise and Joy. We used the facial expression tracking
system to monitor students during gameplay to provide real-time
4
evidence scores for 20 AUs and 9 composite affective measures.
The 9 composite affective measures include Anger, Surprise,
Frustration, Joy, Confusion, Fear, Disgust, Sadness, and
Contempt. Table 1 displays the relationships between composite
affective measures and AUs as defined by iMotions.
Table 1. Breakdown of Composites by contributing AU.
Composite Measure Contributing AUs
Joy 6 + 12
Sadness 1 + 4 + 15
Surprise 1 + 2 + 5 + 26
Fear 2 + 4 + 5 + 7 + 20 + 26
Anger 4 + 5 + 7 + 23
Disgust 9 + 15 + 16
Contempt 12 + 14
Predictive models require higher-level features than are
provided by the evidence scores. We performed feature
engineering to transform the evidence scores to a representation
that could be effectively used by the student models. We used a
method with relative thresholding by amplitude and summed over
durations to minimize the effect of micro expressions. First, the
evidence scores were standardized for each student by subtracting
the mean evidence and dividing by the standard deviation
evidence over their entire episode. While iMotions calibrates the
evidence scores over the first few observations, it does not
account for the potential variability of evidence for students.
Thus, dividing by the standard deviation of a student’s evidence
score over the episode accounts for potential variability in
expressiveness of individuals. Events were added to the affect log
for the duration that the standardized evidence scores rose above
a threshold of one. These events represent moments when
evidence scores rose one standard deviation above the mean
evidence score for a particular student. The duration of these
events in each affective measure (i.e., the 20 AUs and 9
Composites) is summed over the gameplay episode of a student.
The final feature used in student modeling is this sum (i.e., the
duration of the evidence score was above one standard deviation
above the mean evidence score) divided by the total time the
student spent playing the game. These affective features thus
represent the proportion of gameplay duration that the respective
affective evidence score was one standard deviation above the
mean evidence score for a particular student.
4.3 Gameplay Behavior Logging
Gameplay interactions with CRYSTAL ISLAND were recorded in a
granular timestamped game log for each student. From these
game interaction logs, several measures were calculated to
summarize student behavior in the game-based learning
environment. The actions recorded include how students gather
information (e.g., reading books and articles, speaking with non-
player characters), select and organize information (e.g., editing
the diagnosis worksheet), and test their hypotheses (e.g., scanning
objects, submitting the worksheet as the final solution). The
measures used as features in predictive modeling include both the
number of actions performed and the duration of actions with
varying lengths. The set of actions reported includes
conversations with non-player characters (ConversationCount
and ConversationDuration), items scanned in the virtual
laboratory (ScannerCount), books and articles read
(BooksReadCount and ReadingDuration), edits to the diagnosis
worksheet (WorksheetCount and WorksheetDuration), and the
number of times the worksheet was submitted for final evaluation
(SubmitCount). These measures were also standardized by the
total time the student spent playing the game to represent rate-
per-minute for counts and proportion of time spent performing
each of the actions over a given duration (Table 2). The units for
counts are shown in counts-per-minute, while the units for
durations are in seconds-per-minute, representing a proportion
out of 60.
Table 2. Mean and standard deviations of gameplay
behaviors.
Gameplay Behaviors Mean
(per minute)
Standard
Deviation
ConversationCount 0.797 0.216
BooksReadCount 0.352 0.119
WorksheetCount 0.352 0.159
SubmitCount 0.029 0.017
ScannerCount 0.389 0.223
ConversationDuration 7.900 2.070
ReadingDuration 24.100 5.280
WorksheetDuration 5.150 1.970
5 RESULTS
We used three different subsets of the full feature set in order to
compare predictive student models of Normalized Learning Gain
and Presence. The first feature subset (Gameplay) consisted of
only the 8 gameplay features (actions and durations of select
actions) to represent a baseline model without affect information.
The second feature subset consisted of the gameplay actions and
AU-7 Lid Tightener
AU-12 Lip Corner Puller
AU-14 Dimpler
AU-15 Lip Corner Depressor
Figure 2. Examples of AUs used.
5
the 9 different composite affective features
(Gameplay+Composites). The third feature subset consisted of
the gameplay actions with the 20 different AU affective features
(Gameplay+AUs). The composite features and AUs are treated
separately due to multicollinearity issues stemming from the fact
that composites are combinations of the AUs. The AUs provide a
more granular measure while the composites provide a higher
level, more interpretable measure of affect. We compare the
performance of each feature subset predicting Normalized
Learning Gain and Presence for a total of 6 models (3 feature
subsets x 2 response variables), with 2 being baseline models
using only gameplay features, and the other 4 being affect-
enhanced models of student learning and presence.
Variables for a linear model were chosen using a forward
stepwise linear regression where features were selected from their
respective subset to optimize leave-one-out cross-validation
(LOOCV) R2. The reported models are the result of using the
selected features in an ordinary least squares linear regression
model.
5.1 Baseline Models
Baseline predictive models for Normalized Learning Gain and
Presence were created using the stepwise feature selection
process using the subset of 8 gameplay features. In addition to
standard linear regression measures, (coefficients, standard error
of coefficients, standardized coefficients and t-statistic for
significance of feature), the leave-one-out cross validation R2 is
reported as the measure of the accuracy of the model.
The highest performing Normalized Learning Gain model
using gameplay features for leave-one-out cross validation R2 is
shown in Table 1. This linear model uses two gameplay features
to predict Normalized Learning Gain: ScannerCount and
ReadingDuration. The model achieves a leave-one-out cross
validation R2 of 0.268. The negative coefficient of ScannerCount
indicates that the more often students scan items in the virtual
laboratory, the lower Normalized Learning Gain they achieve.
Conversely, the longer students read, the higher their Normalized
Learning Gain, an intuitive result since much of the microbiology
assessment focuses on content found in the game-based learning
environment’s virtual books and articles.
Table 3. Resulting linear model predicting Normalized
Learning Gain using gameplay features.
Baseline Model for Normalized Learning Gain
B SE B p-value
Scanner
Count
-0.318 0.242 -0.224 0.198
Reading
Duration 0.0279 0.0102 0.466 0.010*
Leave-One Out CV R2 = 0.268
Note: ** - p < 0.01, * - p < 0.05
The optimal Presence model using gameplay features under
leave-one-out cross validation R2 yields ConversationCount and
WorksheetCount as the only two gameplay features used to
predict Presence. This model achieves a leave-one-out cross
validation R2 of 0.147. The negative coefficients for each of the
features selected suggests that more conversations with non-
player characters and more edits of the worksheet lead to lower
Presence outcomes. The LOOCV R2 for the Presence model is
lower than the Gameplay model for Normalized Learning Gain,
revealing that predictions for Presence using only gameplay
features are worse relative to the total sum of squares for Presence
than similar predictions using only gameplay features for
Normalized Learning Gain. This indicates that Presence is more
challenging to predict from Gameplay features than Normalized
Learning Gain.
Table 4. Resulting linear model for predicting Presence
from gameplay features.
Baseline Model for Presence
B SE B p-value
Conversation
Count -31.9 17.5 -0.294 0.078
Worksheet
Count -52.6 23.6 -0.359 0.034*
Leave-One Out CV R2 = 0.147
Note: ** - p < 0.01, * - p < 0.05
5.2 Affect-Enhanced Composite Models
Next, we explored the potential of affect-enhanced predictive
models for Normalized Learning Gain and Presence by
introducing the 9 composite affective measures with the previous
8 gameplay features denoted Affect-Enhanced Composite
models. From this set of 17 features, we performed the same
forward stepwise feature selection technique optimizing for
LOOCV R2 and report the resulting models.
Table 5. Resulting linear model for predicting
Normalized Learning Gain from gameplay and
composite-affect features.
Affect-Enhanced Composite Model for
Normalized Learning Gain
B SE B p-value
Scanner
Count -0.369 0.237 -0.261 0.130
Reading
Duration 0.0212 0.0106 0.364 0.049*
Surprise -0.0183 0.0112 -0.246 0.111
Leave-One Out CV R2 = 0.324
Note: ** - p < 0.01, * - p < 0.05
The optimal Affect-Enhanced Composite model for predicting
Normalized Learning Gain uses three features and achieves a
LOOCV R2 of 0.324. Of the three features selected from a
potential 17, two are gameplay features (ScannerCount and
ReadingDuration), and one is a Composite feature (Surprise). It
is important to note that these two gameplay features are the same
6
for the Baseline model for Normalized Learning Gain, indicating
that adding Surprise to that model improves the predictive
accuracy for Normalized Learning Gain from an LOOCV R2 of
0.268 to 0.324. The coefficients of the two gameplay features are
also similar to those of the Baseline model, indicating that the
incorporation of Surprise improves the former model without
making drastic changes to the former model parameter estimates.
The negative coefficient of Surprise suggests that, holding other
variables constant, more surprise leads to reduced Normalized
Learning Gain.
The optimal Affect-Enhanced Composite model for Presence
achieves a LOOCV R2 of 0.245, improving on the Baseline
LOOCV R2 of 0.147. This model uses five features, with the
same two gameplay features (ConversationCount and
WorksheetCount) as the Baseline model and three features from
the set of Composite measures (Disgust, Confusion, and Sadness).
The negative coefficients of Disgust and Confusion indicate that
students experiencing more duration with elevated standardized
evidence score for these composite emotions exhibited lower
Presence. Conversely, students with a higher proportion of
expressed Sadness reported higher Presence. This Affect-
Enhanced Composite model also indicates an additive effect in
the sense that the same gameplay features were maintained, but
the predictive accuracy is improved by incorporating several
Composite features.
5.3 Affect-Enhanced AU Models
While the Affect-Enhanced Composite models improve upon the
Baseline models, the Action Units (AUs) measured by iMotions
provide low-level feature evidence scores that offer a more
granular level of facial expression features than the Composite
measures. Creating predictive models from this more granular
measure of affect may produce a more accurate affect-enhanced
model at the cost of interpretability since the AUs constitute a
lower-level facial expression feature than the Composite
measures. There are 20 AUs measured by iMotions, yielding a
total feature set of 28 when including the gameplay features that
can potentially be used in modeling students’ normalized learning
gain and presence.
Using the forward stepwise feature selection optimizing
LOOCV R2 for predicting Normalized Learning Gain from the
feature set of 20 AUs and 8 Gameplay features generates a model
with six features and a LOOCV R2 of 0.475. The two gameplay
features selected (ScannerCount and ReadingDuration) are the
same as the Affect-Enhanced Composite model and Baseline
model for Normalized Learning Gain. The other four features
were AUs, including AU1 Inner Brow Raiser, AU14 Dimpler,
AU15 Lip Corner Depressor, and AU43 Eyes Closed. Because the
Sadness measure is partially composed of AU1 Inner Brow Raiser
(as indicated by Table 1), the inclusion of AU1 Inner Brow Raiser
shows the similarity between the affect-enhanced models because
Sadness was included in the Composite-Affect Normalized
Learning Gain model. The coefficients for both Gameplay
features remain similar to both the Affect-Enhanced Composite
Normalized Learning Gain model and the Baseline Normalized
Learning Gain model. The addition of the AUs into this linear
model provides a greater LOOCV R2 than either the baseline
Normalized Learning Gain model or Affect-Enhanced Composite
Normalized Learning Gain model. Thus, the Affect-Enhanced
AU Normalized Learning Gain model exemplifies the additive
value of more granular affective measures in the predictive
models for Normalized Learning Gain, compared to the Affect-
Enhanced Composite Normalized Learning Gain model.
Table 7. Resulting linear model from predicting
Normalized Learning Gain.
Affect-Enhanced AU Model for
Normalized Learning Gain
B SE B p-value
Scanner
Count -0.294 0.203 -0.207 0.159
Reading
Duration 0.0353 0.0087 0.590 < 0.001**
AU1 -0.0290 0.0085 -0.413 0.002**
AU14 -0.0548 0.0157 -0.534 0.002**
AU15 0.0470 0.0186 0.390 0.018*
AU43 -0.0340 0.0163 -0.265 0.048*
Leave-One Out CV R2 = 0.475
Note: ** - p < 0.01, * - p < 0.05
An Affect-Enhanced AU Presence model was created using
the feature set of 20 AUs and 8 gameplay features and the same
feature selection techniques as the previous models. This resulted
in five selected features, the same two gameplay features from the
previous Presence models (ConversationCount and
WorksheetCount) along with three AUs. The included AUs were
AU7 Lid Tightener, AU12 Lip Corner Puller, and AU28 Lip Suck.
While the gameplay features are the same as the Affect-Enhanced
Composite and Baseline Presence models, none of the AUs
included in the Affect-Enhanced AU Presence model contribute
to the composite features included in the Affect-Enhanced
Composite model. The Affect-Enhanced AU Presence model
results in an increased LOOCV R2 over both the Affect-Enhanced
Composite model and Baseline model. Since this model preserves
similar coefficients for the same features used in the Baseline
Presence model, it appears the AUs improve the predictive
accuracy of the Baseline in terms of LOOCV R2.
Table 6. Resulting linear model for predicting Presence
from gameplay and composite-affect features.
Affect-Enhanced Composite Model for Presence
B SE B p-value
Conversation
Count -39.3 16.4 -0.362 0.024*
Worksheet
Count -64.0 21.6 -0.437 0.006**
Disgust -4.56 1.95 -0.369 0.027*
Confusion -2.20 1.29 -0.325 0.099
Sadness 3.44 1.20 0.560 0.008**
Leave-One Out CV R2 = 0.245
Note: ** - p < 0.01, * - p < 0.05
7
Table 8. Resulting linear model from predicting
Presence from gameplay and AU-affect features.
Affect-Enhanced AU Model for Presence
B SE B p-value
Conversation
Count -26.5 15.9 -0.244 0.107
Worksheet
Count -58.3 21.5 -0.398 0.011*
AU7 -4.89 1.58 -0.581 0.005**
AU12 3.88 1.61 0.440 0.023*
AU28 -2.01 0.822 -0.367 0.022*
Leave-One Out CV R2 = 0.296
Note: ** - p < 0.01, * - p < 0.05
The predictive accuracy of the three families of student
models is shown in Figure 3. The results show that including
facial expression features improved the predictive power of the
models for both learning gains and presence. Similarly, models
utilizing action unit features achieved higher predictive accuracy
than models using the composite emotion labels. This superior
performance was achieved for both learning gains and
presence. While the models of learning gains outperform their
counterparts for presence for each model structure, the Affect-
Enhanced Composite Presence model shows a larger increase
from the Baseline than the Affect-Enhanced Composite
Normalized Learning Gain model, while the Affect-Enhanced
AU Normalized Learning Gain model shows a larger increase
from utilizing AU features than the Presence model.
The Affect-Enhanced models use the same gameplay features
as the Baseline models, with the addition of several affective
features. Thus, the Baseline models are nested within the Affect-
Enhanced models, with the Affect-Enhanced models acting as
“full models” and the Baselines as “reduced models.” Since the
Baseline models are nested within their respective Affect-
Enhanced Composite and Affect-Enhanced AU models, an F-test
provides a measure of the contributions of the additional features
included in the full models over the reduced models. Results from
this test show that the Affect-Enhanced Composite Normalized
Learning Gain model’s addition of one additional feature
(Surprise) does not show significant improvement over the
Baseline Normalized Learning Gain model (F(1,29) = 2.70, p =
0.11). The Affect-Enhanced AU Normalized Learning Gain
model’s addition of four features shows significant improvement
over the Baseline Normalized Learning Gain model (F(4, 26) =
5.12, p = 0.004) in terms of reduction in residual sum of squares.
For the Presence models, results indicate that both the Affect-
Enhanced Composite (F(3, 27) = 3.37, p = 0.033) and Affect-
Enhanced AU (F(3, 27) = 3.95, p = 0.019) Presence models’
incorporation of three affective features show significant
improvement over the Baseline Presence model in terms of
reduction in sum of squared errors.
6 DISCUSSION
The results of the study suggest that affect-enhanced student
models can improve upon predictive accuracy compared to
models based exclusively on student behavior in the learning
environment. By augmenting gameplay data with facial
expression data, affect-enhanced student models significantly
outperformed models using only gameplay
features. Additionally, the models using the more granular action
unit data significantly outperformed models using the composite
emotions. These results highlight the importance of decisions
about the granularity of affect data representation.
Across multiple feature sets, predictive models for
Normalized Learning Gain outperformed models for Presence, as
measured by leave-one-out cross validation R2. In all of the
Normalized Learning Gain models, ReadingDuration is included
with high significance. The virtual books and articles contain
information relevant to questions presented in the microbiology
assessment, making it a strong candidate for predicting the
learning gain assessed by the microbiology test. For both learning
Figure 3. Summary of leave-one-out cross validation R2 for Baseline and Affect-Enhanced models.
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
NLG Presence
LO
OC
V R
2
Model Accuracy Summary
Baseline Affect-Enhanced Composite Affect-Enhanced AU
8
gains and presence, the models utilizing the more granular AUs
achieved significantly higher predictive accuracy than those
utilizing composite emotions. This could be due to a variety of
factors. First, the automatic tagging of AUs may be more
accurate, which is plausible as action units represent more
concretely represented phenomena than the composite emotions.
In addition, the facial expression tracking framework underlying
the emotion recognition system was trained on a library of posed
faces, which are potentially more expressive than emotions that
appear in a learning context. It could also be the case that
important information is being captured by AUs that is not
captured by the composite emotion labels. For example, an AU
may be conveying a subtle expression of an emotion not captured
by the composite labeling. It is also possible that the AUs are
representing a contextually-specific response not captured by the
composite labels, such as signaling increased cognitive load.
Analysis of the specific emotions present in the composite
models show that less Surprise is more predictive of learning
gains. The result highlights the need to further investigate where
in the gameplay activity the periods of Surprise tend to occur.
Similarly, for presence measures, the model shows less Disgust
and less Contempt and increased Sadness to be predictive of
presence. While the relationship for Disgust and Contempt align
with what one would expect for increased presence, increased
Sadness does not appear to align, though could potentially be
related to students’ empathy for the sick patients in the mystery
or emotional investment in solving the mystery.
Many of the AUs found to be significant in this work have
previously been found as commonly occurring AUs that have
predictive significance for tutoring outcomes [23]. The presence
of AU14 Dimpler being negatively associated with learning is a
result also found in a study of affect in tutoring [24], which found
AU14 Dimpler to be an indicator of lower engagement and higher
frustration. AU1 Inner Brow Raiser is associated with surprise,
which was the only emotion present in the composite model,
further strengthening the need to further investigate which game
events are triggering the reaction. For the presence models, AU7
Lid Tightener and AU28 Lip Suck were negatively associated with
presence. These AUs have been associated with increased mental
effort, which may negatively impact student perceptions of
“transportation” into the virtual environment.
While AUs may produce better models, using them sacrifices
some of the interpretability found in the composite models. For
example, one can imagine designing an adaptive intervention to
induce confusion, or perhaps reduce frustration, but it is not as
clear what the design implications would be for inducing or
decreasing a specific AU. While some AUs can be related to
composite emotions, others may be more difficult to categorize or
require additional information to contextualize. On the other
hand, composite emotions may not always be more interpretable,
as shown by the features in the presence model. The trade-off
between predictive ability and interpretability is one that requires
further investigation, and may depend on the types of adaptivity
that a game-based learning environment is designed to
provide. One approach to improving interpretability would be to
investigate how these models differ as students move through a
game. While the data presented here was accumulated through
complete playthroughs of a game, examining how facial
expressions evolve as students transition between affective states
throughout different phases of the gameplay session may help
identify specific game events or contexts that produce the most
desirable reactions. For example, expressions of frustration may
be distributed evenly through the session, or they may be
triggered by specific events. A more granular analysis of the data
may help identify such scenarios, allowing for targeted and
effective supports to be incorporated into the game-based learning
environment.
7 CONCLUSION
Affect is integral to learning and problem solving in adaptive
learning environments. Creating affect-enhanced student models
has the potential to provide the foundation for the next generation
of adaptive learning environments that are more effective and
more engaging. Because game-based learning environments are
specifically designed for the dual goals of increasing learning
effectiveness and increasing student engagement, the prospect of
introducing affect-enhanced student models into game-based
learning environments may be a particularly fruitful line of
investigation. Creating affect-enhanced student models that
leverage facial expression analysis could potentially yield student
models that can more accurately predict student learning and
engagement and, ultimately, contribute to improved student
experiences through more informed adaptation.
In this work we have investigated affect-enhanced student
models with a game-based learning environment. Baseline
models using gameplay features were created to predict
normalized learning gain and presence using a forward stepwise
feature selection methodology. Two sets of affect-enhanced
student models, one based on composite emotion models and one
based on action units, were created. Both affect-enhanced student
models augmented gameplay data with these features sets.
Results show that the affect-enhanced student models improve
upon the predictive accuracy of baseline models. The additional
features incorporated in the affect-enhanced models were also
found to significantly contribute to predictive accuracy,
suggesting that the affective features provide additive value. The
action-unit-based affect-enhanced models performed better than
the composite-based affect-enhanced models, achieving
predictive accuracy improvements for both student learning (as
measured by normalized learning gain) and student engagement
(as measured by presence). In future work, it will be important to
investigate the integration of affect-enhanced student models into
runtime game-based learning environments. Explorations of run-
time integration should include early prediction modeling and
game-based adaptation that uses the affect-enhanced student
models to guide both cognitive and affective scaffolding and to
understand their impact on learning processes and outcomes.
Acknowledgements
We thank our collaborators in the Center of Educational
Informatics and the SMART Lab at North Carolina State
University. This study was supported by funding from the Social
Sciences and Humanities Research Council of Canada. Any
opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not
necessarily reflect the views of the Social Sciences and
Humanities Research Council of Canada.
9
REFERENCES [1] Aleven, V., Mclaughlin, E., Glenn, R. and Koedinger, K.
2016. Instruction Based on Adaptive Learning
Technologies. Handbook of Research on Learning and
Instruction. R. Mayer and P. Alexander, eds. Routledge,
London, UK.
[2] Arroyo, I., Ferguson, K., Johns, J., Dragon, T.,
Mehranian, H., Fisher, D., Barto, A., Mahadevan, S. and
Woolf, B. 2007. Repairing Disengagement with Non-
Invasive Interventions. In Proceedings of the 13th
International Conference on Artificial Intelligence in
Education. Los Angeles, CA, 195–202.
[3] Azevedo, R. and Aleven, V. eds. 2013. International
Handbook of Metacognition and Learning
Technologies. Springer New York.
[4] Azevedo, R., Taub, M., Mudrick, N.V., Martin, S.A. and
Grafsgaard, J. 2017. Using Multi-Channel Trace Data to
Infer and Foster Self-Regulated Learning Between
Humans and Advanced Learning Technologies.
Handbook of self-regulation of learning and
performance. D.H. Schunk and J.A. Greene, eds.
Routledge, London, UK.
[5] Baikadi, A., Rowe, J., Mott, B. and Lester, J. 2014.
Generalizability of Goal Recognition Models in
Narrative-Centered Learning Environments. In
Proceedings of the 22nd Conference on User Modeling,
Adaptation, and Personalization. Springer, Aalborg,
Denmark, 278–289.
[6] Baker, R. and Clarke-Midura, J. 2013. Predicting
Successful Inquiry Learning in a Virtual Performance
Assessment for Science. In Proceedings of the 21st
International Conference on User Modeling, Adaptation
and Personalization. Springer, Rome, Italy, 203–214.
[7] Bosch, N., D’Mello, S., Ocumpaugh, J., Baker, R. and
Shute, V. 2016. Using Video to Automatically Detect
Learner Affect in Computer-Enabled Classrooms. ACM
Transactions on Interactive Intelligent Systems, 6 (2),
17:1-26.
[8] Calvo, R. and D’Mello, S. 2011. New Perspectives on
Affect and Learning Technologies. Springer. New York,
New York, USA.
[9] Calvo, R., D’Mello, S., Gratch, J. and Kappas, A. 2015.
The Oxford Handbook of Affective Computing. Oxford
University Press. New York, New York, USA.
[10] Calvo, R. and Mello, S. 2010. Affect Detection : An
Interdisciplinary Review of Models, Methods, and Their
Applications. IEEE Transactions on affective
computing, 1 (1), 18–37.
[11] Clark, D., Tanner-Smith, E. and Killingsworth, S. 2016.
Digital Games, Design, and Learning: A Systematic
Review and Meta-Analysis. Review of Educational
Research, 86 (1), 79–122.
[12] Corrigan, S., Barkley, T. and Pardos, Z. 2015. Dynamic
Approaches to Modeling Student Affect and Its
Changing Role in Learning and Performance. In
Proceedings of the 23rd International Conference on
User Modeling, Adaptation and Personalization.
Springer, Dublin, Ireland, 92–103.
[13] D’Mello, S. 2013. A Selective Meta-Analysis on the
Relative Incidence of Discrete Affective States during
Learning with Technology. Journal of Educational
Psychology, 105 (4), 1082–1099.
[14] D’Mello, S. and Graesser, A. 2012. AutoTutor and
Affective AutoTutor: Learning by Talking with
Cognitively and Emotionally Intelligent Computers that
Talk Back? ACM Transactions on Interactive Intelligent
Systems, 15 (212), 434–442.
[15] D’Mello, S. and Graesser, A. 2014. Confusion and Its
Dynamics during Device Comprehension with
Breakdown Scenarios. Acta Psychologica, 151, 106–
116.
[16] D’Mello, S., Lehman, B., Pekrun, R. and Graesser, A.
2014. Confusion can be Beneficial for Learning.
Learning and Instruction, 29, 153–170.
[17] D’Mello, S., Lehman, B., Sullins, J., Daigle, R., Combs,
R., Vogt, K., Perkins, L. and Graesser, A. 2010. A Time
for Emoting: When Affect-Sensitivity Is and Isn’t
Effective at Promoting Deep Learning. In Proceedings
of the International Conference on Intelligent Tutoring
Systems. Springer, Pittsburgh, PA, 245–254.
[18] Defalco, J., Georgoulas-Sherry, V., Paquette, L., Baker,
R., Rowe, J., Mott, B. and Lester, J. 2016. Motivational
Feedback Messages as Interventions to Frustration in
GIFT. In Proceedings of the Fourth GIFT User
Symposium(GIFTSym4). Princeton, NJ, 25-35.
[19] Desmarais, M. and Baker, R. 2011. A Review of Recent
Advances in Learner and Skill Modeling in Intelligent
Learning Environments. User Modeling and User-
Adapted Interaction, 22 (1–2), 9–38.
[20] Easterday, M., Aleven, V., Scheines, R. and Carver, S.
2016. Using Tutors to Improve Educational Games: A
Cognitive Game for Policy Argument. Journal of the
Learning Sciences, 26 (2), 226–276.
[21] Ekman, P. and Friesen, W. 1977. Facial Action Coding
System.
[22] Ekman, P. and Rosenberg, E. 1997. What the Face
Reveals: Basic and Applied Studies of Spontaneous
Expression Using the Facial Action Coding System
(FACS). Oxford University Press. New York, New
York, USA.
[23] Grafsgaard, J., Wiggins, J., Boyer, K., Wiebe, E. and
Lester, J. 2013. Automatically Recognizing Facial
Expression: Predicting Engagement and Frustration. In
Proceedings of the 6th International Conference on
Educational Data Mining. Memphis, TN, 43-50.
[24] Grafsgaard, J., Wiggins, J., Vail, A., Boyer, K., Wiebe,
E. and Lester, J. 2014. The Additive Value of
Multimodal Features for Predicting Engagement,
Frustration, and Learning during Tutoring. In
Proceedings of the 16th International Conference on
Multimodal Interaction (ICMI ’14). Istanbul, Turkey,
42–49.
[25] Harley, J., Carter, C., Papaionnou, N., Bouchet, F.,
Landis, R., Azevedo, R. and Karabachian, L. 2016.
Examining the Predictive Relationship between
Personality and Emotion Traits and Students Agent-
Directed Emotions: Towards Emotionally-Adaptive
10
Agent-Based Learning Environments. User Modelling
and User-Adapted Interaction, 26 (2–3) 177–219.
[26] Kim, Y., Almond, R. and Shute, V. 2016. Applying
Evidence-Centered Design for the Development of
Game-Based Assessments in Physics Playground.
International Journal of Testing, 16 (2), 142–163.
[27] Koedinger, K. and Aleven, V. 2016. An Interview
Reflection on “Intelligent Tutoring Goes to School in the
Big City.” International Journal of Artificial
Intelligence in Education, 26 (1), 13–24.
[28] De la Torre, F. and Cohn, J. 2011. Facial Expression
Analysis. Springer London.
[29] Lallé, S., Mudrick, N., Taub, M., Grafsgaard, J., Conati,
C. and Azevedo, R. 2016. Impact of Individual
Differences on Affective Reactions to Pedagogical
Agents Scaffolding Related Work on Students’
Reactions to Pedagogical Agents. In Proceedings of
International Conference on Intelligent Virtual Agents.
Springer, Los Angeles, CA, 269-282.
[30] Littlewort, G., Whitehill, J., Wu, T., Fasel, I., Frank, M.,
Movellan, J. and Bartlett, M. 2011. The Computer
Expression Recognition Toolbox (CERT). In
Proceedings of the 11th International Conference on
Automatic Face & Gesture Recognition and Workshops.
IEEE, Santa Barbara, CA, 298–305.
[31] Min, W., Frankosky, M., Mott, B., Rowe, J., Wiebe, E.,
Boyer, K. and Lester, J. 2015. DeepStealth: Leveraging
Deep Learning Models for Stealth Assessment in Game-
Based Learning Environments. In Proceedings of the
Seventeenth International Conference on Artificial
Intelligence in Education. Springer, Madrid, Spain, 277–
286.
[32] Min, W., Mott, B., Rowe, J., Liu, B. and Lester, J. 2016.
Player Goal Recognition in Open-World Digital Games
with Long Short-Term Memory Networks. International
Joint Conference on Artificial Intelligence. AAAI Press,
Burlingame, CA, 197-203.
[33] Mitrovic, A. 2012. Fifteen Years of Constraint-Based
Tutors: What We Have Achieved and Where We Are
Going. User Modeling and User-Adapted Interaction,
22 (1–2) 39–72.
[34] Mott, B., Lee, S. and Lester, J. 2006. Probabilistic Goal
Recognition in Interactive Narrative Environments. In
Proceedings of the Twenty-First National Conference on
Artificial Intelligence. AAAI Press, Boston, MA, 187–
192.
[35] Ocumpaugh, J., Baker, R. and Rodrigo, M. 2015. Baker
Rodrigo Ocumpaugh Monitoring Protocol (BROMP)
2.0 Technical and Training Manual.
[36] Paquette, L., Rowe, J., Baker, R., Mott, B., Lester, J.,
DeFalco, J., Brawner, K., Sottilare, R. and Georgoulas,
V. 2015. Sensor-Free or Sensor-Full: A Comparison of
Data Modalities in Multi-Channel Affect Detection.
Proceedings of the 8th International Conference on
Educational Data Mining. Madrid, Spain, 93–100.
[37] Pekrun, R. 2011. Emotions as Drivers of Learning and
Cognitive Development. New Perspectives on Affect and
Learning Technologies. R.A. Calvo and S.K. D’Mello,
eds. Springer. 23–39.
[38] Rowe, J., Shores, L., Mott, B. and Lester, J. 2011.
Integrating Learning, Problem Solving, and Engagement
in Narrative-Centered Learning Environments.
International Journal of Artificial Intelligence in
Education, 21 (1-2), 115–133.
[39] Sabourin, J. and Lester, J. 2014. Affect and Engagement
in Game-Based Learning Environments. IEEE
Transactions on Affective Computing, 5 (1), 45–56.
[40] Sabourin, J., Rowe, J., Mott, B. and Lester, J. 2013.
Considering Alternate Futures to Classify Off-Task
Behavior as Emotion Self-Regulation: A Supervised
Learning Approach. Journal of Educational Data
Mining, 5 (1), 9–38.
[41] Shute, V., Wang, L., Greiff, S., Zhao, W. and Moore, G.
2016. Measuring Problem Solving Skills via Stealth
Assessment in an Engaging Video Game. Computers in
Human Behavior, 63, 106–117.
[42] Taub, M., Mudrick, N. V., Azevedo, R., Millar, G.C.,
Rowe, J. and Lester, J. 2017. Using Multi-Channel Data
with Multi-Level Modeling to Assess In-Game
Performance during Gameplay with Crystal Island.
Computers in Human Behavior,
DOI:https://doi.org/10.1016/j.chb.2017.01.038.
[43] Vail, A.K., Grafsgaard, J.F., Boyer, K.E., Wiebe, E.N.
and Lester, J.C. 2016. Gender Differences in Facial
Expressions of Affect During Learning. In Proceedings
of the 24th Conference on User Modeling, Adaptation
and Personalization (UMAP 2016). ACM, Halifax,
Canada, 65–73.
[44] VanLehn, K. 2006. The Behavior of Tutoring Systems.
International Journal of Artificial Intelligence in
Education, 16 (3), 227–265.
[45] VanLehn, K. 2011. The Relative Effectiveness of
Human Tutoring, Intelligent Tutoring Systems, and
Other Tutoring Systems. Educational Psychologist, 46
(4), 197–221.
[46] Witmer, B., Jerome, C. and Singer, M. 2005. The Factor
Structure of the Presence Questionnaire. Presence:
Teleoperators and Virtual Environments, 14 (3), 298–
312.
[47] Witmer, B. and Singer, M. 1998. Measuring Presence in
Virtual Environments: A Presence Questionnaire.
Presence: Teleoperators Virtual Environments, 7 (3),
225–240.
[48] Woolf, B., Burleson, W., Arroyo, I., Dragon, T., Cooper,
D. and Picard, R. 2009. Affect-Aware Tutors:
Recognising and Responding to Student Affect.
International Journal of Learning Technology, 4 (3-4),
129–164.
[49] Wouters, P., van Nimwegen, C., van Oostendorp, H. and
van der Spek, E. 2013. A Meta-Analysis of the Cognitive
and Motivational Effects of Serious Games. Journal of
Educational Psychology, 105 (2), 249–265.