the effects of exam frames on student effort and
TRANSCRIPT
THE EFFECTS OF EXAM FRAMES ON STUDENT EFFORT AND
PERFORMANCE
Briana Ballisa, Lester Lusherb∗, Paco Martorellc
aUniversity of California, MercedbUniversity of Hawaii at Manoa and IZA
cUniversity of California, Davis, United States
This draft: April 20, 2021
Abstract
We investigate how the framing of the point structure on an exam can influence test-taking effort andperformance. Under traditional choice theory, students may choose to skip more questions if the framinginduces loss averse preferences. We conduct a large-scale online experiment and a classroom experimentto instead find that loss-framed exams lead to less skipping. Further evidence finds that students inducedgreater effort, as measured through several proxies, under loss frames, and this greater effort led them toanswer more questions. Consequently, student performance was improved under loss frames. We findlittle evidence of a gender differential response to frames.
Keywords: Framing, Loss Aversion, Test-Taking, Field ExperimentJEL Classification: C93, D01, D81, D91
∗Corresponding author: Lester Lusher, Department of Economics, University of Hawaii at Manoa, Saunders Hall 542, 2424Maile Way, Honolulu, HI 96822 United States. This material is based upon work supported by a grant from University of HawaiiCollege of Social Sciences and the University of California, Davis Institute for Social Sciences. Email: [email protected]; BrianaBallis, Department of Economics, University of California at Merced, 5200 N. Lake Rd., Merced, CA 95343, United States.Email:[email protected]; Paco Martorell, School of Education, University of California at Davis, 1 Shields Ave., Davis CA, 95616,United States. Email: [email protected]
1
1 Introduction
Standardized tests are widely used in high stakes settings for students, educators, and educational sys-
tems. For example, exam scores are used to compare academic skills across countries, in teacher pay and
promotion decisions, as higher education gatekeepers, and in school accountability programs. Exam scores
are used in these ways because they purportedly measure an exam-taker’s proficiency in important skills.
A growing literature has investigated whether this is true (Wise and DeMars, 2005; Finn, 2015), and has
identified numerous determinants impacting exam performance, such as the roles of non-financial incen-
tives, grading and ranking among peers, intrinsic motivation, and student effort (e.g. Duckworth et al., 2011;
Jalava et al., 2015; Levitt et al., 2016; Gneezy et al., 2019).
This study investigates how the framing of exam scoring affects exam-taker behavior and performance.
We do so in a context where participants take a standardized test and incorrect answers are penalized more
than skipped items. Our experimental design randomly assigns exam-takers into one of two possible frames.
The first frame, which mirrors exam scoring on a number of important tests in the U.S. and abroad,1 we call
a “loss frame.” In this frame, participants start off with an “endowment” of points that they can lose by
answering questions incorrectly, but then gain points for answers questions correctly. The second frame we
generate in our experiment removes the “losing” of points, but holds the mapping between the number of
items answered correctly, incorrectly, and skipped constant. In this “gain frame,” respondents do not lose
points for incorrect answers but instead gain points for skipped items and correct items.
Since differential scoring of incorrect and skipped items implies that a test-taker bears a risk in answering
test items, the framing of this scoring can interact with risk and loss preferences in a way that affects exam
performance. In particular, the theory of loss aversion (Tversky and Kahneman, 1979) suggests that in
lottery decisions, individuals are willing to bear more risk in order to reduce the likelihood of experiencing
a “loss.”2 However, loss aversion theory has ambiguous predictions for how this type of exam score framing
affects exam-taker behavior. On one hand, the more the test-taker feels an incorrect answer triggers a “loss,”
the less willing they will be to risk an attempt on a question. On the other hand, a loss framing might
induce more effort to avoid losing the “endowment” with which they began the exam. This effort effect
arises because the probabilities in the “lottery” of an exam question are partly determined by student effort.
For instance, test-takers may (unconsciously) read questions with greater care or spend more time thinking1In the U.S., examples include the Scholastic Aptitude Test (SAT) prior to 2016, the Graduate Management Admission Test
(GMAT), the middle and upper Secondary School Admission Tests (used for private school admission). Internationally, examplesinclude college admissions exams (Pekkarinen, 2015; Coffman and Klinowski, 2020) and professional licensing exams (Iriberri andRey-Biel, 2019).
2See e.g. Rabin (2000); Tom et al. (2007); Gachter et al. (2007).
2
about correct answers, causing an increase the number of items they can confidently answer. This may
be especially relevant in lower-stakes settings, where exam-taker effort plays a role in exam performance
(Gneezy et al., 2019). In total, these two inputs, student effort and choice, pull in opposing directions: the
fear of incurring a loss with a wrong answer tempts the student to skip the question, but this fear also drives
effort to answer test items correctly, increasing the probability of answering correctly (if they attempt) and
thus incentivizing the student to attempt the question.
To date, the literature on how point structure and frames could influence student choice on exams is
limited. Furthermore, to our knowledge, no study has investigated how exam frames could impact student
effort. In an experimental setting, Baldiga (2014) finds that on exams that penalize wrong answers (relative
to omitting an answer), women skip significantly more than men, leading to worse scores overall. Similar
findings come from Pekkarinen (2015) in the context of Finnish university entrance examinations: women
skip more than men, leading to worse test scores. A working paper from Iriberri and Rey-Biel (2019) uses a
within-exam (across question) approach along with a different frame (+points for skipping and ++points for
correct answer) to uncover a similar finding: women skip more than men, harming their scores.3 Importantly,
these studies do not explicitly test the impact of different score-equivalent frames on student outcomes;4
instead, they identify gender differences within a specific frame. Perhaps most similar to our study, a
working paper from Balart et al. (2020) tests two score-equivalent loss frames against each other, one where
skipping encodes a loss and another where it does not: confirming the classical Tversky and Kahneman
(1979) loss aversion in choice findings, the authors find that students skip more questions when there’s no
“loss” in skipping.5
We carry out our investigation by analyzing findings from two experiments designed to investigate how
choices and effort respond to differential frames of the exam, one using Amazon’s Mechanical Turk (MTurk)
platform and the other from a college classroom. First, on MTurk, each of 1,904 subjects took an exam of
15 multiple choice questions. For each question, participants had two potential answers to select from, all
of which invoked English vocabulary knowledge via sentence completion. Participants were randomly as-3A working paper from Hernandez and Hershaff (2015) finds that middle-schoolers in Michigan skip questions on the statewide
standardized test despite there being no penalty for incorrect responses and no time limit on the exam. The authors link thisincidence to longer run outcomes such as high school dropout.
4More specifically, these studies do not present two separate exams where the relative scoring is held constant while the framingof the scoring is manipulated (e.g. -points for incorrect, 0 points for skipping +points for correct vs. 0 points for incorrect, +pointsfor skipping and ++points for correct).
5Outside of the exam setting, frames have been shown to have meaningful effects on individual performance. Looking at overallcourse performance, studies from Apostolova-Mihaylova et al. (2015) and McEvoy et al. (2016) find that student performance isimproved when the point structure of the entire course utilizes a loss frame. Findings from other settings have also shown that theframing of monetary contracts can induce greater worker and teacher effort (Fryer Jr et al., 2012; Hossain and List, 2012; Imas et al.,2017). A recent working paper from Pierce et al. (2020) carefully investigates how loss framed monetary contracts can actuallylead to reductions in worker output.
3
signed one of two possible scoring structures. In one treatment (“gain frame”), participants were awarded
a point for skipping and two points for each correct answer. In the second treatment (“loss frame”), par-
ticipants were endowed with 15 points, and would gain (lose) a point for each (in)correct response, while
skipping resulted in no change in points.6 Crucially, under this scoring system, the total number of points
awarded is the same with either framing. Monetary payments were strictly a function of the number of
points scored on the test. Thus, participants who perceive the frames to be equivalent should have no effect
of treatment assignment. If, instead, participants in the “loss frame” are more likely to perceive incorrect
answers as losses, then the two countervailing forces emerge: “loss frame” participants exert more effort in
order to reduce the probability of a wrong answer, but they also will be more likely to skip a question to avoid
an incorrect answer. Since our exams were carried out via Qualtrics, we can observe how much time was
spent by each participant answering the questions, and on non-exam components as well (i.e. reading the
instructions and completing the post-experiment survey). This grants us an additional proxy for participant
effort, on top of evaluating their actual performance on the test.
We find that the loss framing increased exam scores. The test score gains arose from a reduction in
skipping questions and an increase in answering questions correctly, with the number of incorrect answers
left unchanged. We interpret this as evidence consistent with increased effort being exerted under the loss
frame. The loss frame appears to have increased participants’ willingness to exert the effort needed to
answer questions correctly that they were particularly uncertain about and would have skipped under the
gain frame. If instead the loss framing simply induced less skipping but did not affect effort, then we would
have expected to find more incorrect answers (as a fraction of all questions answered) since the marginal
test items are presumably ones participants find more difficult and are more likely answer incorrectly for a
fixed level of effort.7
More direct measures of effort also point to the loss frame inducing more effort. We first find that loss
frame participants spent significantly more time completing the exam questions. These participants also
spent more time on non-exam portions of the experiment (reading instructions and completing the post-
exam survey) suggesting the loss frame induced greater effort overall in the experiment. Additionally, loss
frame participants self-reported exerting significantly higher levels of effort on the exam.
Furthermore, much like prior studies, we find that women were significantly more likely to skip across
both treatment arms, which generated a gender gap with women having lower average test scores (Baldiga,6Note that the expected relative points of guessing is equivalent to the certain amount of points from skipping. The first
treatment mimics the framing in the study from Iriberri and Rey-Biel (2019), while the second is more similar to Baldiga (2014)and Pekkarinen (2015).
7These results remain robust to a specification estimated at the student-question level with question fixed effects.
4
2014; Pekkarinen, 2015; Iriberri and Rey-Biel, 2019; Coffman and Klinowski, 2020). This finding holds
conditioning on education level, and is in spite of female participants spending significantly more time on
the exam and self-reportedly exerting greater effort on the exam. Thus, it appears that female participants
were significantly more risk averse than their male counterparts, and this risk aversion induced greater effort
but lowered overall test scores due to “too much” skipping. We also explore whether there’s a gender
differential in effort and behavior across the two frames. We do not find any evidence of loss versus gain
frame effects on the gender gap in test scores.
To extend our results beyond the MTurk setting, we conducted a subsequent experiment in an online
introductory microeconomics course. The course had two midterms and a final, each of which consisted
entirely of multiple choice questions. Approximately one week prior to each exam, students were randomly
assigned into a loss or gain frame for scoring on the exam. Though the sample of students is relatively
small (N=66), consistent with our MTurk results, we first find that loss frame students skipped significantly
fewer questions. Point estimates on test score gains are similar in magnitude to the MTurk experiments,
but imprecisely estimated. Since students could be randomized to differing frames across exams, our data
also allow us to show robustness to models with student fixed effects. Overall, the classroom experiment
reaffirms the initial findings that loss frames can induce greater effort, which subsequently decreases the
likelihood the student skips a question and boosts test scores.
The remainder of this paper proceeds as follows: In section 2, we describe our experiments, in section 3
we present our results, and in section 4 we conclude.
2 Experimental Design and Hypotheses
2.1 Hypotheses
We set out to test two hypotheses about how test framing affects behavior. First, we hypothesize that
a setting where students begin with an “endowment” of points that they lose when answering questions
incorrectly will induce greater effort on the exam. This effect would materialize if test takers are loss averse
and if they think they can increase the likelihood of answering questions correctly by exerting more effort.
In this context, greater effort might take the form of trying harder to remember the definitions of vocabulary
words to help decide which option completes the sentence best. A subsidiary hypothesis is that overall test
performance will be higher for those in the loss frame as a result of the increased effort.
We also hypothesize that students in the loss frame will skip more test items. This is because incorrect
answers are framed as resulting in the student losing points in the loss frame, and loss averse students might
5
skip test items rather than risk incurring a loss if they are uncertain of an answer. We refer to this type
of effect as the item-specific loss aversion mechanism. However, there might also be a second mechanism
affecting skip rates, which is the aforementioned change in overall effort on the exam. If effort increases, it
could allow exam takers to realize they can correctly answer a question they might not have skipped if they
expended less effort. The effort mechanism and the item-specific loss aversion mechanism have opposing
effects on item skipping, and the relative strength of these competing forces determines the net effect of the
loss frame on item skipping.
2.2 Experiment with Amazon MTurk
This study presents results from two separate experiments. Our primary experiment was conducted via
Amazon’s MTurk platform. MTurk is an online labor market that is popular among social scientists as a
means to conduct online experiments. Given the ubiquity of MTurk, numerous studies have assessed the
validity of MTurk for conducting online experiments, and provided guidance to researchers for recruiting a
subject pool (see e.g. Horton et al., 2011; Paolacci et al., 2010; Buhrmester et al., 2016; Hamby and Taylor,
2016; Thomas and Clifford, 2017; Cheung et al., 2017). The vast majority of these studies suggest that
online labor markets such as MTurk serve as adequate, cost-efficient substitutes for in-person experiments.
Due to the cost-effectiveness of recruitment online via MTurk, the primary benefit of utilizing MTurk is the
ability to recruit a large number of subjects.
MTurk primarily serves as an online labor market where workers can complete tasks called Human
Intelligence Tasks (HITs). Generally, HITs consist of low-skill tasks that are better done by humans versus
a computer, such as reviewing images, reading documents, and transcribing audio. Researchers, instead,
post information on their experiment, such as the expected length of the experiment and how much the
subject could earn for participating. The researcher can also set filters on the subject pool. Participants can
only access a HIT if they have a bank account linked to an Amazon account. Moreover, participants can
only create one account, and researchers can limit the number of times a participant completes a HIT (to
avoid repeat subjects).
Our experiment recruited subjects through a HIT where only users from the US could participate.
MTurkers were told that they could earn up to $3 for participating in the study, and that the study would
take no more than 15 minutes. Upon joining the experiment, subjects were instructed that they would be
presented with 15 multiple choice questions, with each question having two potential answers. Each ques-
tion invoked English vocabulary knowledge via sentence completion. Subjects had three minutes and thirty
seconds to complete the exam portion of the experiment. At the end of the exam, subjects were asked
6
to complete a post-exam survey. The full text from the recruitment message, instructions, questions and
answers, and post-exam survey are available in the appendix.
Participants were randomly assigned to one of two possible scoring structures. In one treatment (“gain
frame”), participants were awarded a point for skipping and two points for each correct answer. In the
second treatment (“loss frame”), participants were endowed with 15 points, and would lose a point for each
incorrect response, while skipping resulted in no change in points and answering correctly led to one point.
Thus, the treatment groups can be summarized as:
• Gain frame: Incorrect answer is 0 points, skip question is 1 point, correct answer is 2 points
• Loss frame: Start with 15 points, incorrect answer is -1 point, skip question is 0 points, correct answer
is 1 point
Note that for any combination of correct, incorrect, and skipped questions, the total score is the same
across the two treatments. Also note that since each question has only two answer choices, the expected
point value of guessing is equivalent to the point value of skipping. Thus, a risk neutral test-taker would
be indifferent between answering a question and skipping if they have no intuition as to the correct answer.
Risk averse test-takers may skip questions to avoid the uncertainty of answering a question, so long as they
have some doubt about the correct answer. Subjects were awarded $0.10 per point, for a maximum of $3 in
earnings.
2.2.1 Summary statistics and balance test
A total of 2,147 MTurkers clicked the link to our HIT, but only 1,940 fully participated in the experiment.
An analysis of differential persistence rates suggests that gain frame participants were 2.9 percentage points
more likely to complete the experiment. This offers some suggestive evidence that the loss frame participants
did find their exams more unattractive and that the loss frame does induce loss preferences. Alternatively, it
may be that loss frame participants were simply more confused by their frame, and this confusion led them
to quit.
Table 1 uses responses from the post-exam survey to present summary statistics for our analytic sample,
and to test for whether there are observable differences between those who completed the experiment by
treatment group. Roughly 60% of subjects were female with an average age of 38 years. Approximately
two-thirds of subjects completed some post-secondary education. Two survey questions asked subjects to
rate their own risk preference and patience (on a scale from 1 to 7), while a series of other hypothetical
7
questions were asked in order to roughly proxy for the subject’s loss aversion, risk aversion, present bias,
and patience.8
Despite the differential attrition, baseline covariates of particiants are mostly balanced between the two
experimental conditions. The one exception is that the loss frame group has a higher fraction of women.
Mitigating this concern, when regressing an indicator for treatment on all covariates, a joint-significance
test fails to reject the null hypothesis that the coefficients are jointly equal to zero (F-stat of 1.05, p-value of
0.40). Nonetheless, the gender imbalance raises the possibility that there are unobserved differences across
the groups (that correlate with gender) and ultimately test performance.9
We will take two approaches to address this potential concern. First, we will test the sensitivity of
our results to the inclusion of our observable covariates. Shown later, we find that female subjects skip
more questions and answer more questions incorrectly, leading to lower scores. Since the loss frame had
disproportionately more women, this suggests that in the absence of controlling for gender, the loss frame
group will have relatively more skipping with lower test scores. We find, however, that the loss frame
experienced less skipping with higher test scores, and so if there is any unobserved correlate with gender,
then the gender imbalance is likely pushing in the opposite direction of our results and attenuating the
estimated treatment effects.
Alternatively, the potential participants who decided not to take the test may have been those who would
have exerted low effort, skipping at a relatively high rate and scoring poorly. This type of selection could
account for at least some of our estimated effects. To address this concern, a second approach will be to
bound our estimates using Lee (2009) bounds. Using this approach, we trim the gain frame sample so that
the share of observations is equal across frames, and then estimate treatment effects on the trimmed sample.
Specifically, we trim from above by dropping observations from the gain frame with the highest values of
the relevant outcomes, and then from below by dropping observations from the gain frame with the lowest
values of the relevant outcomes. As will be discussed further in Section 3.1.1, we find that for our skipping
and score outcomes, the relevant treatment effect bounds do not overlap with the value of zero. Thus, we
view it as unlikely that attrition from the loss frame can explain our treatment effects.8Please see the appendix for the exact wording of the questions. A subject is flagged as loss averse if the subject answered “An
additional $10 added to your winnings” to question #2 and “Yes, I’d risk losing the bonus $10 and flip a coin to win an additional$10 (for $20 total)” to question #3. A subject is flagged as risk averse if the subject answered “An additional $10 added to yourwinnings” to question #2. A subject is flagged as present biased if the subject answered “$10 given today” to question #4 and“$11 given in 6 months” to question #5. Lastly, a subject is flagged as patient if the subject answered “$11 given in one month” toquestion #4 and “$11 given in 6 months” to question #5.
9As previously noted, evidence from prior studies on gender differences in test performance when incorrect answers are treateddifferently than skipped questions.
8
2.3 Experiment in a college classroom
As a follow-up to our MTurk experiment, we randomly assigned loss frame and gain frame exams as
part of an online introductory microeconomics course. Treatment was administered across two midterms
and a final. Students were given 60 minutes to complete the midterm, each of which consisted of 50 multiple
choice questions. The final took two hours and consisted of 100 multiple choice questions. Each multiple
choice question had four potential answers.
We randomly assigned each student-exam either a gain frame or a loss frame (so that some students
were exposed to both frames while others may have been exposed to only one frame). We emailed students
approximately one week prior to their exam with which treatment they were assigned. Below we present
the relevant emailed text, which also appeared on their midterm (final) instructions:
Gain frame: The maximum number of points you can score on this midterm (final) is 200 (400). The
midterm (final) will be graded out of 200 (400) points. On this test, you will have the opportunity to skip
questions, if you desire.
• If you answer a question incorrectly, you will receive 0 points
• If you skip a question, you will receive 1 point
• If you answer a question correctly, you will receive 4 points
Loss frame: The maximum number of points you can score on this midterm (final) is 200 (400). The
midterm (final) will be graded out of 200 (400) points. On this test, you will have the opportunity to skip
questions, if you desire.
• First, you will start with 50 (100) points
• If you answer a question incorrectly, you will lose 1 point
• Nothing will happen if you skip a question (0 point change)
• If you answer a question correctly, you will receive 3 points
A notable shortcoming from the classroom experiment relative to the MTurk experiment is the lack of
sample size and corresponding statistical power. On the other hand, because treatment was re-randomized
for each exam, our data allow us to estimate models with student fixed effects. These in turn control for
9
all unobservable factors at the student level that may happen to correlate with treatment-assignment. For
instance, student fixed effects will account for any scenario of differential consent based on treatment. After
restricting our sample to students who consented to release their information and to those who experienced
both treatments across exams (to estimate student fixed effects), we are left with a sample of 66 students
(198 student-exam observations).
3 Results
3.1 Results from the Mturk Experiment
We begin by examining the raw data. Table 2 shows average performance and effort on the exam by
frame. Three results emerge from comparing average outcomes across loss and gain participants. First, loss
frame participants skip fewer questions. Relative to gain participants, loss participants skip about one fewer
question. Second, loss-frame participants have higher average scores (about 0.16 standard deviations higher)
and get about one more question correct relative to gain participants. Finally, loss frame participants exert
more effort on the exam, as measured by the total time that they took on the exam and their self-reported
effort. Relative to gain participants, loss participants spend about 6 seconds more on the multiple choice
questions, 20 seconds more on reading the instructions, and self-report exerting slightly more effort on the
exam.
In Table 3, we estimate the effects of the loss frame on question skipping and overall exam performance
using Ordinary Least Squares (OLS). Our outcome variables include the number of omitted questions, the
final score, the number of questions answered incorrectly, and the number of questions answered correctly,
which are all standardized to have a mean 0 and standard deviation of 1. For each outcome, we report the
coefficient on an indicator for loss frame, without controls (in odd numbered columns) and with controls for
demographics and other test-taker characteristics (in even numbered columns).10 These results show that
loss frame participants skip fewer questions (0.20 standard deviations of the mean), receive a higher overall
score (0.16 standard deviations of the mean), and significantly answering more questions correctly (0.21
standard deviations of the mean). There is little difference in the number of incorrect answers by frame.
This suggests that the improved test scores of loss participants are driven by skipping fewer questions and
answering more questions correctly, which is consistent with the theory that loss framing induces more effort10We report the coefficients on all covariates in the even numbered columns as well. Unsurprisingly, we first see that those with
higher education levels skip fewer questions and score higher on the exam. Moreover, those who are risk averse skip more questions(less willing to bear a risk in answering a question). In Appendix Table A1, we report pairwise correlations between individualdemographics and the outcome variables to uncover a similar pattern as the conditional results of 3.
10
on the exam. Additionally, the estimates are remarkably stable in response to the inclusion of controls; any
bias would need to come from unobserved differences that are not correlated with the included covariates.
To more directly test whether loss framing induced greater effort on the exam, we look at the amount
of time participants spent on the MTurk survey and self-reported effort on the exam. Table 4 shows OLS
estimates, where the outcomes include the total time spent on the multiple choice questions, the total time
spent on the other parts of the MTurk survey (reading the instructions and completing the post-exam survey),
an indicator for whether participants took the maximum time on the multiple choice questions (which was
180 seconds), and self-reported effort. Again, we standardize all outcome variables (except for the indicator
of whether participants took the maximum time on the exam). These results show that loss frame partici-
pants spend significantly more time on answering the multiple choice questions (0.15 standard deviations of
the mean) and on other aspects of the MTurk survey (0.08 standard deviations of the mean), are 9 percent-
age points more likely to take the maximum amount of time on the multiple choice questions, and report
significantly higher levels of effort exerted on the exam (0.08 standard deviations of the mean).
Next we examine how these patterns vary by gender. The evidence in Table 5 shows that women are
more likely to skip questions, receive lower scores, and answer more questions incorrectly. These findings
are consistent with those of prior studies. To investigate whether female participants have different strategies
for leaving a question unanswered by frame, we estimated the effect of the treatment interacted with gender.
Columns 2 and 5 of Table 5 show the estimation results for the OLS specification plus an interaction term
between Female and Loss. Female participants skip more questions on average (0.192 standard deviations
of the mean). This difference decreases (by 0.117 standard deviations of the mean) for loss-frame partic-
ipants, although this effect is not statistically significant at conventional levels. Female participants score
on average worse than male participants (0.202 standard deviations of the mean), but this difference does
not change for loss-frame participants. These results suggest that loss framing may, if anything, reduce the
gender gap in skip-rates, but has little effect on ultimate scores.
Similarly, we investigate how these patterns differ by education levels. Assuming less educated par-
ticipants possess weaker English vocabulary skills, we should observe greater amounts of skipping and/or
more incorrect answers for those with less education. Indeed, our results in Table 4 demonstrate that as ed-
ucation levels increase, skipping rates decrease and the number of questions answered correctly increases,
leading to higher test scores.11 To investigate whether loss framing differentially impacted participants with
different education levels, Columns 3 and 6 of Table 5 show the estimation results for the OLS specifica-
tion plus an interaction term between Loss and each of the following education categories: AA, BA, and11Those with higher education also exerted less effort on the test, suggesting they found the exam easier.
11
Postgrad (where HS or less is the omitted category). The differences in skipping and final scores across
education levels do not change significantly for loss frame participants. This suggests that loss framing did
not differentially impact test-taking strategy by education level.
3.1.1 Robustness
We next test the robustness of these results by first estimating a question-level analysis. Table 6 shows
OLS estimates for a question-level analysis, where the outcomes include whether a question was skipped
or answered correctly. We include question fixed effects and cluster our standard errors by individual. We
draw similar conclusions from the individual level regressions presented in Table 3. Loss frame participants
are significantly less likely to skip a question (5 percentage points less likely) and significantly more likely
to answer a question correctly (5 percentage points more likely). Moreover, the results are stable to the
inclusion of additional controls and question-order fixed effects.
As discussed earlier, we find that gain frame participants were slightly more likely to complete the
experiment. To assess whether this differential attrition is impacting our estimates, we perform a Lee (2009)
bounding exercise. Specifically, since we observe more attrition from the loss frame, we drop observations
from the gain frame with either the highest or lowest scores by a trimming factor of 3%.12 Then, we identify
treatment effects on the trimmed sample. Results from this exercise are shown in Table 7. The first column
shows our baseline estimates. The second column shows the upper and lower bound from this bounding
exercise. Importantly, for both of our primary outcomes, the relevant treatment-effect bounds do not overlap
the value of zero. Standard confidence intervals on these bounds, shown in Column 3, demonstrate that the
lower bound for the skipping outcome continues to differ from zero, whereas the lower bound for the score
outcome no longer differs from zero. However, Imbens and Manski (2004) propose a method for computing
smaller (and preferred) confidence interval for treatment effects which we present in Column 4. This time,
both treatment-effects differ from zero. As a result, we conclude that it is unlikely that our results can be
fully explained away by attrition from the loss frame by either very low or high preforming participants.
3.2 Results from the Classroom Experiment
Largely, the conclusions we draw from our classroom experiment are similar to those from the Mturk
experiment. Table 8 shows OLS estimates where the outcome is the total number of questions skipped
and the final score on the classroom exams. We show the results for all exams combined (Columns 1-21292% do not attrit from the gain frame and 89% do not attrit from the loss frame. The trimming factor of 3% is computed as
follows: (92-89)/92.
12
and 6-7), for the final exam (in Columns 3 and 8), and for the midterm exams (in Columns 4-5 and 9-10).
When we have multiple exams in the sample, we show the results with and without student fixed effects.
These results show that loss frame participants are less likely to skip questions than gain participants across
most samples and specifications. The effect is generally significant, except when we focus on all exams
combined, including student fixed effects (Column 2) and for the results that focus on the final exam only
(Column 3). However, even in the cases where the point estimate is not significant, it is negative, suggesting
that loss framing leads to fewer questions skipped even for these samples. While we do not find that loss
framing led to changes in final scores for any of our samples or specifications, the point estimates are all
positive suggesting that final scores may have increased under the loss framing in the classroom setting.
Although the estimates are less precise due to the large reduction in sample size, these classroom results
help corroborate our findings from the field experiment in a higher-stakes environment.
4 Discussion and Conclusions
Given the prevalence of high-stakes standardized testing throughout the world, it is important that these
tests accurately measure underlying student performance. Prior research has shown that seemingly innocu-
ous decisions about exam scoring can have sizable effects on measured performance, with these effects
contributing to test score gender gaps. This paper explores the role of loss aversion as a mechanism through
which the framing of how tests are scored can lead to differences in test performance. The effects are theo-
retically ambiguous. Under a “loss frame” where incorrect answers are scored as losing points, loss aversion
could induce exam takers to skip more questions and possibly obtain a lower overall score. Alternatively,
however, the loss frame could encourage an effort response whereby exam takers exert more effort which
makes them able to correctly answer questions about which they are uncertain and with lower effort would
just skip.
To empirically test between these alternatives, we use two experiments - one with a sample of workers
participating in the Amazon MTurk employment platform and a second in an undergraduate economics
course. In both, participants in the “gain frame” received points for skipping questions and did not lose
any points for incorrect answers. In the “loss frame” students lost points for incorrect answers and did
not receive points for skipped questions. Crucially, only the framing of the scoring differed across the two
conditions; total scores were always identical under the loss and gain frames.
We find strong evidence that the loss frame improved overall test scores and reduced question skipping.
We find no effect on the number of questions answered incorrectly. This suggests that the loss frame induced
13
correct answers that would have been skipped under the gain frame, which we interpret as evidence the
loss frame increased effort. More direct evidence in favor of effects on effort include positive effects on
time spent on the exam and self-reported effort on the exam. Consistent with prior research that women
perform worse when skipped and incorrect answers are scored differently, we find sizable gender gaps in
test performance. However, we do not find evidence of differential exam framing effects by gender.
Altogether, these results suggest that loss aversion triggered through exam framing can have important
effects on standardized test performance, and that these appear to operate through effects on effort. This has
important implications for how scores from tests that use differential scoring for skipped and unanswered
questions ought to be interpreted. Consistent with Gneezy et al. (2019), these results suggest that effort on
the exam task, and not aptitude alone, plays a significant role in test performance. Our results build on this
finding by highlighting the role of loss aversion plays in affecting exam effort.
Our results also relate to a greater literature on behavioral biases in the lab versus in the field. A common
concern from earlier literature was that loss aversion and reference dependence were figments of the lab, and
that in alternative settings with sufficiently high stakes and/or experienced decision makers, these behavioral
biases would cease to exist (Levitt and List, 2008). A growing literature has, however, uncovered loss averse
preferences in several field settings with high stakes, including in gambling (Lien and Zheng, 2015), game
shows (Post et al., 2008), and professional sports (Pope and Schweitzer, 2011; Lusher et al., 2018). Our
study uncovers effort effects from loss aversion in two settings with differing stakes: one where the stakes
are experimental earnings, and the other which impacts a student’s actual grade in the course. Thus, though
the stakes of our experiments may pale in comparison to those of tests like the SAT or GMAT, the prior
literature suggests there’s still ample reason to believe that even in higher stakes settings, effort can be
impacted by the framing of the test.
14
References
APOSTOLOVA-MIHAYLOVA, M., W. COOPER, G. HOYT, AND E. C. MARSHALL (2015): “Heterogeneousgender effects under loss aversion in the economics classroom: A field experiment,” Southern EconomicJournal, 81, 980–994.
BALART, P., L. EZQUERRA, AND I. HERNANDEZ-ARENAZ (2020): “Framing Effects on Risk-TakingBehavior: Evidence from a Field Experiment,” Available at SSRN 3556710.
BALDIGA, K. (2014): “Gender differences in willingness to guess,” Management Science, 60, 434–448.
BUHRMESTER, M., T. KWANG, AND S. D. GOSLING (2016): “Amazon’s Mechanical Turk: A new sourceof inexpensive, yet high-quality data?” .
CHEUNG, J. H., D. K. BURNS, R. R. SINCLAIR, AND M. SLITER (2017): “Amazon Mechanical Turkin organizational psychology: An evaluation and practical recommendations,” Journal of Business andPsychology, 32, 347–361.
COFFMAN, K. B. AND D. KLINOWSKI (2020): “The impact of penalties for wrong answers on the gendergap in test scores,” Proceedings of the National Academy of Sciences, 117, 8794–8803.
DUCKWORTH, A. L., P. D. QUINN, D. R. LYNAM, R. LOEBER, AND M. STOUTHAMER-LOEBER (2011):“Role of test motivation in intelligence testing,” Proceedings of the National Academy of Sciences, 108,7716–7720.
FINN, B. (2015): “Measuring motivation in low-stakes assessments,” ETS Research Report Series, 2015,1–17.
FRYER JR, R. G., S. D. LEVITT, J. LIST, AND S. SADOFF (2012): “Enhancing the efficacy of teacherincentives through loss aversion: A field experiment,” Tech. rep., National Bureau of Economic Research.
GACHTER, S., E. J. JOHNSON, AND A. HERRMANN (2007): “Individual-level loss aversion in riskless andrisky choices,” .
GNEEZY, U., J. A. LIST, J. A. LIVINGSTON, X. QIN, S. SADOFF, AND Y. XU (2019): “Measuring successin education: the role of effort on the test itself,” American Economic Review: Insights, 1, 291–308.
HAMBY, T. AND W. TAYLOR (2016): “Survey satisficing inflates reliability and validity measures: Anexperimental comparison of college and Amazon Mechanical Turk samples,” Educational and Psycho-logical Measurement, 76, 912–932.
HERNANDEZ, M. AND J. HERSHAFF (2015): “Skipping Questions in School Exams: The Role of Non-Cognitive Skills on Educational Outcomes,” .
HORTON, J. J., D. G. RAND, AND R. J. ZECKHAUSER (2011): “The online laboratory: Conductingexperiments in a real labor market,” Experimental economics, 14, 399–425.
HOSSAIN, T. AND J. A. LIST (2012): “The behavioralist visits the factory: Increasing productivity usingsimple framing manipulations,” Management Science, 58, 2151–2167.
IMAS, A., S. SADOFF, AND A. SAMEK (2017): “Do people anticipate loss aversion?” Management Sci-ence, 63, 1271–1284.
15
IMBENS, G. W. AND C. F. MANSKI (2004): “Confidence intervals for partially identified parameters,”Econometrica, 72, 1845–1857.
IRIBERRI, N. AND P. REY-BIEL (2019): “Brave Boys and Play-it-Safe Girls: Gender Differences in Will-ingness to Guess in a Large Scale Natural Field Experiment,” .
JALAVA, N., J. S. JOENSEN, AND E. PELLAS (2015): “Grades and rank: Impacts of non-financial incen-tives on test performance,” Journal of Economic Behavior & Organization, 115, 161–196.
LEE, D. S. (2009): “Training, wages, and sample selection: Estimating sharp bounds on treatment effects,”The Review of Economic Studies, 76, 1071–1102.
LEVITT, S. D. AND J. A. LIST (2008): “Homo economicus evolves,” Science, 319, 909–910.
LEVITT, S. D., J. A. LIST, S. NECKERMANN, AND S. SADOFF (2016): “The behavioralist goes to school:Leveraging behavioral economics to improve educational performance,” American Economic Journal:Economic Policy, 8, 183–219.
LIEN, J. W. AND J. ZHENG (2015): “Deciding When to Quit: Reference-Dependence over Slot MachineOutcomes,” American Economic Review, 105, 366–70.
LUSHER, L., C. HE, AND S. FICK (2018): “Are professional basketball players reference-dependent?”Applied Economics, 50, 3937–3948.
MCEVOY, D. M. ET AL. (2016): “Loss aversion and student achievement,” Economics Bulletin, 36, 1762–1770.
PAOLACCI, G., J. CHANDLER, AND P. G. IPEIROTIS (2010): “Running experiments on amazon mechanicalturk,” Judgment and Decision making, 5, 411–419.
PEKKARINEN, T. (2015): “Gender differences in behaviour under competitive pressure: Evidence on omis-sion patterns in university entrance examinations,” Journal of Economic Behavior & Organization, 115,94–110.
PIERCE, L., A. REES-JONES, AND C. BLANK (2020): “The Negative Consequences of Loss-FramedPerformance Incentives,” Tech. rep., National Bureau of Economic Research.
POPE, D. G. AND M. E. SCHWEITZER (2011): “Is Tiger Woods loss averse? Persistent bias in the face ofexperience, competition, and high stakes,” The American Economic Review, 101, 129–157.
POST, T., M. J. VAN DEN ASSEM, G. BALTUSSEN, AND R. H. THALER (2008): “Deal or no deal?Decision making under risk in a large-payoff game show,” The American Economic Review, 38–71.
RABIN, M. (2000): “Risk Aversion and Expected-Utility Theory: A Calibration Theorem,” Econometrica,68, 1281–1292.
THOMAS, K. A. AND S. CLIFFORD (2017): “Validity and Mechanical Turk: An assessment of exclusionmethods and interactive experiments,” Computers in Human Behavior, 77, 184–197.
TOM, S. M., C. R. FOX, C. TREPEL, AND R. A. POLDRACK (2007): “The neural basis of loss aversion indecision-making under risk,” Science, 315, 515–518.
TVERSKY, A. AND D. KAHNEMAN (1979): “Prospect theory: An analysis of decision under risk,” Econo-metrica, 47, 263–291.
16
WISE, S. L. AND C. E. DEMARS (2005): “Low examinee effort in low-stakes assessment: Problems andpotential solutions,” Educational assessment, 10, 1–17.
17
Tables and Figures
Table 1: Summary statistics and balance test
(1) (2) (3) (4)All Gain frame Loss frame Difference: (2)-(3)
Female 0.616 0.591 0.642 -0.051**
Age 37.859 38.026 37.689 0.337[0.270] [0.376] [0.389]
Education:-HS or less 0.330 0.339 0.322 0.017
-AA 0.165 0.155 0.176 -0.022
-BA 0.367 0.375 0.358 0.017
-Postgrad 0.138 0.132 0.143 -0.012
Willingness to Take Risks (1-7) 2.799 2.782 2.816 -0.034[0.036] [0.050] [0.052]
Self-Reported Patience (1-7) 3.176 3.171 3.182 -0.010[0.030] [0.042] [0.044]
Loss Averse 0.056 0.055 0.057 -0.002
Risk Averse 0.880 0.881 0.880 0.001
Present Biased 0.255 0.253 0.256 -0.003
Patient 0.149 0.139 0.159 -0.020
N 1904 963 941Joint-significance test (p-value) 0.40
Note: Standard deviations for non-binary covariates presented in brackets. Tests for statistically significant differences between thetwo frames are conducted in column (4). The joint-significance test is conducted by regressing an indicator for gain frame on allcovariates (testing for whether coefficients are jointly equal to zero). *p<0.10, ** p<0.05, *** p<0.01
18
Table 2: Pairwise differences in outcomes across frames
(1) (2) (3) (4)All Loss frame Gain frame t-test: (2)-(3)
Loss 0.494 1.000 0.000
Total Skipped 3.607 3.241 3.965 -0.723***[0.082] [0.108] [0.121]
Score 20.189 20.563 19.823 0.740***[0.108] [0.152] [0.152]
Total Incorrect 3.102 3.098 3.106 -0.008[0.051] [0.073] [0.070]
Total Correct 8.291 8.661 7.929 0.732***[0.081] [0.110] [0.118]
Think Correctly Answered 8.591 8.678 8.505 0.173[0.082] [0.113] [0.118]
Time (MC) - Seconds 185.048 187.830 182.330 5.499***[0.825] [1.095] [1.226]
Time (Other) - Seconds 297.832 308.204 287.698 20.507*[6.010] [9.350] [7.588]
Self-Reported Effort (1-7) 4.191 4.233 4.151 0.082*[0.023] [0.032] [0.032]
N 1904 963 941
Note: Standard deviations for non-binary covariates presented in brackets. Tests for statistically significant differences between thetwo frames are conducted in column (4). *p<0.10, ** p<0.05, *** p<0.01
19
Table 3: Effect of loss frame on skipping and performance
Total Skipped (std) Score (std) Total Incorrect (std) Total Correct (std)(1) (2) (3) (4) (5) (6) (7) (8)
Loss -0.203*** -0.204*** 0.157*** 0.159*** -0.00369 -0.00464 0.206*** 0.208***(0.0455) (0.0450) (0.0457) (0.0428) (0.0459) (0.0439) (0.0456) (0.0435)
Female 0.115** -0.172*** 0.0914** -0.172***(0.0481) (0.0454) (0.0466) (0.0463)
Age -0.00222 0.0112*** -0.0101*** 0.00853***(0.00199) (0.00181) (0.00188) (0.00187)
AA -0.00999 -0.00976 0.0185 -0.00146(0.0698) (0.0639) (0.0712) (0.0640)
BA -0.126** 0.352*** -0.274*** 0.297***(0.0551) (0.0508) (0.0527) (0.0524)
Postgrad -0.260*** 0.541*** -0.367*** 0.490***(0.0715) (0.0720) (0.0700) (0.0725)
Loss Averse 0.00802 0.0539 -0.0640 0.0318(0.127) (0.128) (0.142) (0.121)
Risk Averse 0.148 0.309*** -0.449*** 0.131(0.0969) (0.0913) (0.100) (0.0905)
Present Biased -0.0886* 0.241*** -0.185*** 0.204***(0.0527) (0.0507) (0.0506) (0.0517)
Patient -0.203*** 0.501*** -0.371*** 0.435***(0.0628) (0.0623) (0.0596) (0.0636)
Mean Y (not std) 3.607 3.607 20.19 20.19 3.102 3.102 8.291 8.291N 1904 1904 1904 1904 1904 1904 1904 1904
Note: This table contains OLS estimates, where the outcomes include the number of omitted questions, the final score, the numberof questions answered incorrectly, and the number of questions answered correctly. All outcomes are standardized to have a meanof 0 and standard deviation of 1. Odd numbered columns do not include controls, while even number columns include controlsfor gender, age, education level (with a high school degree or less as the omitted category), loss aversion, risk aversion, presentbiased, and a measure of patience. Observations are at the individual level. Robust standard errors in parenthesis, where *p<0.10,** p<0.05, *** p<0.01
20
Table 4: Effect of loss frame on effort
Time (MC only) Time (Full) Self-Reported EffortIn Seconds (std) In Seconds (std) Spent Max Time on MC (std)(1) (2) (3) (4) (5) (6) (7) (8)
Loss 0.153*** 0.144*** 0.0782* 0.0778* 0.0869*** 0.0822*** 0.0823* 0.0822*(0.0457) (0.0451) (0.0459) (0.0445) (0.0227) (0.0227) (0.0458) (0.0452)
Female 0.0988** 0.153*** 0.0632*** 0.132***(0.0500) (0.0496) (0.0238) (0.0481)
Age 0.00277 0.0113*** 0.000452 0.0104***(0.00201) (0.00178) (0.000982) (0.00196)
AA 0.0593 -0.0311 0.0403 -0.122*(0.0675) (0.0596) (0.0344) (0.0683)
BA -0.0769 -0.0342 -0.0198 -0.114**(0.0549) (0.0575) (0.0273) (0.0537)
Postgrad -0.128* -0.151*** -0.0745** -0.125*(0.0724) (0.0583) (0.0362) (0.0729)
Loss Averse 0.399*** 0.0202 0.130** 0.0678(0.154) (0.192) (0.0638) (0.145)
Risk Averse 0.336*** -0.197 0.113*** 0.189*(0.126) (0.137) (0.0440) (0.107)
Present Biased 0.149*** 0.0288 0.0244 0.00888(0.0514) (0.0569) (0.0271) (0.0533)
Patient 0.137** -0.0984 0.0471 -0.0467(0.0634) (0.0645) (0.0333) (0.0703)
Mean, dept. var 185.0 185.0 297.8 297.8 0.442 0.442 4.191 4.191N 1904 1904 1904 1904 1904 1904 1904 1904
Note: This table contains OLS estimates, where the outcomes include the total time spent on the multiple choice questions, the totaltime spent on the other parts of the MTurk survey (reading the instructions and completing the post-exam survey), an indicator forwhether participants took the maximum time on the multiple choice questions (which was 180 seconds), and self-reported effort.All outcomes, except for the indicator of whether participants took the maximum time on the exam, are standardized to have a meanof 0 and standard deviation of 1. Odd numbered columns do not include controls, while even number columns include controls forgender, age and education level (with a high school degree or less as the omitted category). Observations are at the individual level.Robust standard errors in parenthesis, where *p<0.10, ** p<0.05, *** p<0.01
21
Table 5: Treatment effect interacted with gender and education
Total Skipped (std) Score (std)(1) (2) (3) (4) (5) (6)
Loss -0.210*** -0.137* -0.216*** 0.170*** 0.159** 0.101(0.0451) (0.0724) (0.0821) (0.0437) (0.0705) (0.0716)
Female 0.136*** 0.192*** 0.140*** -0.194*** -0.202*** -0.196***(0.0474) (0.0696) (0.0475) (0.0458) (0.0647) (0.0459)
Age -0.00212 -0.00210 -0.00206 0.0107*** 0.0107*** 0.0107***(0.00199) (0.00198) (0.00199) (0.00185) (0.00185) (0.00185)
AA -0.00813 -0.00638 -0.00112 -0.0153 -0.0156 -0.0557(0.0702) (0.0701) (0.106) (0.0657) (0.0658) (0.0938)
BA -0.140** -0.139** -0.115 0.393*** 0.393*** 0.309***(0.0551) (0.0552) (0.0816) (0.0516) (0.0516) (0.0729)
Postgrad -0.284*** -0.287*** -0.385*** 0.611*** 0.611*** 0.641***(0.0712) (0.0712) (0.102) (0.0725) (0.0726) (0.105)
Female * Loss -0.117 0.0179(0.0929) (0.0902)
AA * Loss -0.0137 0.0834(0.140) (0.131)
BA * Loss -0.0514 0.173*(0.110) (0.103)
Postgrad*Loss 0.197 -0.0539(0.141) (0.144)
Mean Y (not std) 3.607 3.607 3.607 20.19 20.19 20.19N 1904 1904 1904 1904 1904 1904Controls Yes Yes Yes Yes Yes Yes
Note: This table contains OLS estimates, where the outcomes include the number of omitted questions, the final score, the numberof questions answered incorrectly, and the number of questions answered correctly. All outcomes are standardized to have a meanof 0 and standard deviation of 1. All columns include controls for gender, age, education level (with a high school degree or lessas the omitted category). Columns 2 and 5 include an interaction term between Female and Loss. Columns 3 and 6 include aninteraction term between Loss and each of the following education categories: AA, BA, and Postgrad (where HS or less is theomitted category). Observations are at the individual level. Robust standard errors in parenthesis, where *p<0.10, ** p<0.05, ***p<0.01
22
Table 6: Robustness to individual-question level analysis
Skipped Correct(1) (2) (3) (4) (5) (6)
Loss -0.0482*** -0.0498*** -0.0498*** 0.0488*** 0.0516*** 0.0516***(0.0108) (0.0107) (0.0107) (0.0108) (0.0104) (0.0104)
Female 0.0324*** 0.0324*** -0.0466*** -0.0466***(0.0113) (0.0113) (0.0109) (0.0109)
Age -0.000504 -0.000504 0.00194*** 0.00194***(0.000471) (0.000471) (0.000447) (0.000447)
AA -0.00193 -0.00193 -0.00144 -0.00144(0.0166) (0.0167) (0.0154) (0.0154)
BA -0.0332** -0.0332** 0.0783*** 0.0783***(0.0131) (0.0131) (0.0125) (0.0125)
Postgrad -0.0674*** -0.0674*** 0.130*** 0.130***(0.0169) (0.0169) (0.0171) (0.0171)
Mean, dept. var 0.240 0.240 0.240 0.553 0.553 0.553N 28560 28560 28560 28560 28560 28560FE Question ID Question ID Question ID & Question ID Question ID Question ID &
Question Order Question Order
Note: This table contains OLS estimates, where the outcomes include whether a question was skipped or answered correctly. Allcolumns include question fixed effects. Columns 2 and 5 additionally control for gender, age, education level (with a high schooldegree or less as the omitted category). Columns 3 and 6 additionally include question order fixed effects. Observations are at thequestion level. Robust standard errors clustered at the individual level in parenthesis, where *p<0.10, ** p<0.05, *** p<0.01
Table 7: Bounding exercises on treatment effects
95% CIOutcome Baseline Estimates Lee Bounds Standard Imbens and Manski
(1) (2) (3) (4)
Skipping (Std) -0.203*** [-0.558, -0.111] [-0.660, -0.001] [-0.644, -0.018](0.0455)
Score (Std) 0.157*** [0.092, 0.230] [-0.009, 0.336] [0.0072, 0.319](0.0457)
N 1904 2099N - Selected Obs - 1904Trimming Proportion - 0.033
Note: The first column presents our baseline coefficients on the effect of loss framing on the number of omitted questions and thefinal score (standardized to have a mean of 0 and a standard deviation of 1). See Table 3 for the full set of controls used. Robuststandard errors in parenthesis, where *p<0.10, ** p<0.05, *** p<0.01. The second column shows the upper and lower boundsfrom the Lee (2009) bounding exercise. The third column shows 95% confidence intervals for these bounds. The fourth columnsshows 95% confidence intervals for these bounds suggested by Imbens and Manski (2004).
23
Table 8: Results from experiment in introductory microeconomics course
Total skipped Score
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)Loss framing -1.664∗ -0.820 -0.440 -2.117∗∗ -2.098∗∗ 2.506 1.340 1.011 3.278 1.810
(0.850) (0.819) (1.795) (0.851) (0.834) (1.675) (1.395) (2.098) (2.156) (1.960)Female -0.065 -1.279 0.528 0.414 0.887 0.220
(0.797) (1.638) (0.885) (2.664) (3.186) (2.795)Semester units -0.055 -0.121 -0.030 0.544∗∗ 0.233 0.710∗∗
(0.121) (0.237) (0.153) (0.251) (0.292) (0.324)Cumulative GPA -0.816 0.015 -1.251 8.514∗∗∗ 10.161∗∗∗ 7.734∗∗∗
(0.868) (1.835) (0.865) (2.088) (2.968) (2.132)Observations 198 198 66 132 132 198 198 66 132 132Skip rate 0.059 0.059 0.045 0.066 0.066Mean score 74.381 74.381 77.614 72.765 72.765Other controls X X X X X XStudent FE X X X XSample MT+Final MT+Final Final Midterms Midterms MT+Final MT+Final Final Midterms Midterms
Notes: Unit of observation at the student-exam level. All regressions include dummies for exam. “Other controls” includeindicators for college of major, year, admission term, degree type (BA vs. BS vs. Other), citizenship, Pell recipiency, andgender, and linear controls for term units, cumulative units, and cumulative GPA. Standard errors clustered at the studentlevel. One, two, and three asterisks indicate statistical significance at the 10, 5, and 1 percent levels, respectively.
24
5 Online Appendix: Additional Tables and Figures
Table A1: Correlations between Outcomes and Participant Characteristics
Outcome Variable Total Skipped Score Mean Char.
Female 0.450*** -0.808*** 0.616(0.167) (0.221)
Age -0.006 0.050*** 37.86(0.007) (0.009)
Education:- AA -0.0301 -0.0192
(0.251) (0.310) 0.165
- BA -0.513*** 1.912*** 0.367(0.197) (0.246)
- Postgrad -1.070*** 3.086*** 0.138(0.252) (0.345)
Willingness to 0.001 -0.081 2.799Take Risks (1-7) (0.056) (0.070)
Self-Reported -0.132** 0.038 3.176Patience (1-7) (0.064) (0.079)
Loss Averse -0.495 -1.111** 0.0562(0.326) (0.482)
Risk Averse 0.545** 1.296*** 0.880(0.245) (0.339)
Present Biased -0.173 0.607** 0.255(0.180) (0.242)
Patient -0.838*** 2.559*** 0.149(0.211) (0.291)
N 1904 1904 1904
Note: Each row shows results from a separate regression where we regress each outcome (total skipped or score) on each baselinecharacteristic. For education level, each outcome is regressed on a dummy variable for educational attainment (where the omittedcategory is high school degree or less). The third column shows the mean of the given characteristic across all Mturk participants.Robust SE. *p<0.10, ** p<0.05, *** p<0.01
25
Figure A1: Differences Between Loss and Gain Frame in Skipping by Question
Figure A2: Differences Between Loss and Gain Frame in Skipping - Order Question was Viewed
Note: The 15 Questions were randomized. This figure plots differences in the likelihood of skipping the questions in the order they
were viewed (i.e. The first, second, third,.... question viewed differs across participants).
26
Figure A3: Histogram - Self-Reported Effort by Frame
27
6 Online Appendix: Text from MTurk experiment
6.1 Recruitment text posted on MTurk
As part of this study, you will take a standardized test with sentence completion questions. After completing
the test, you will be asked to do a short survey. Your total monetary reward will be determined by your
performance on the standardized test.
Your total monetary reward will be determined by your performance on the standardized test. The details of
the rewards and scoring of the test will be shown once you begin this experiment. We expect this study to
take 15 minutes. Just for participating you will receive $0.01. The maximum additional payment based on
your performance that you will receive is $3.00.
At the end of the survey you will be provided a validation code. Please enter that validation code here where
it says “Provide the survey code here.”
6.2 MTurk experiment instructions, questions and answers, and post-exam survey
Thank you for your interest in our study! In order to participate, please read through the consent form on
the next page and check the box acknowledging your willingness to participate in the study.
As part of this study, you will take a standardized test with sentence completion questions. After completing
the test, you will be asked to do a short survey.
Your monetary reward will be determined by your performance on the standardized test. The details of the
rewards and scoring of the test will be shown later.
[Page break]
[Informed consent]
28
[Page break]
Scoring the test
The test will have 15 multiple-choice questions. You will have 3 minutes and 30 seconds to complete the
test, or 14 seconds per question on average.
Your performance on each section of the test will be combined into a single point total. Your monetary
reward for participating in the study will depend on this point total. You will be paid $0.10 for every point
you receive.
[Loss frame only:] You will start out with 15 points ($1.50). You will receive 1 point ($.10) for each correct
answer. You will receive 0 points for questions you skip/do not answer. You will lose 1 point (- $0.10) for
each incorrect answer.
[Loss frame only:] For instance, if you answer 8 questions correctly, skip 4 questions, and answer 3 incor-
rectly, your total number of points will be (15 + 8*1 - 3*1) = 20 points, and you will earn $2.00 at the end
of the survey.
[Gain frame only:] You will start out with 0 points ($0.00). You will receive 2 points ($.20) for each correct
answer. You will receive 1 point ($0.10) for questions you skip/do not answer. You will receive 0 points
($0.00) for each incorrect answer.
[Gain frame only:] For instance, if you answer 8 questions correctly, skip 4 questions, and answer 3 incor-
rectly, your total number of points will be (8*2 + 4*1) = 20 points, and you will earn $2.00 at the end of the
survey.
Before starting the exam, we want to make sure you understand how your exam performance determines the
amount of money you will receive. To demonstrate your understanding, please enter the amount of money
you would receive if your exam performance was as described below.
29
Question: How much money would you earn if you answered 9 questions correctly, 2 incorrectly, and you
skipped 4 questions? (do not include the ”$” sign)
[Page break]
Correct! By answering 9 questions correctly, 2 incorrectly, and skipping 4 questions, you would earn $2.20.
Each question will have exactly ONE correct answer. You can skip a question either by leaving it blank OR
by clicking on the ”Skip this question (or leave all choices blank)” option.
[Page break]
When you are ready to start the test, click on the arrow below to start the test. Once you do that, the timer
will begin. Once time has expired, you will not be able to work on the test any more.
[Loss frame only:] Remember you will start out with 15 points ($1.50). You will receive 1 point ($.10) for
each correct answer. You will receive 0 points for questions you skip/do not answer. You will lose 1 point (-
$0.10) for each incorrect answer.
[Gain frame only:] Remember you will start out with 0 points ($0.00). You will receive 2 points ($0.20)
for each correct answer. You will receive 1 point ($0.10) for questions you skip/do not answer. You will
receive 0 points ($0.00) for each incorrect answer.
GOOD LUCK!!
[Page break]
[Note: Order of questions randomized across participants]
Question: The professor became increasingly (blank) in later years, flying into a rage whenever he was
30
opposed. [taciturn / irascible / Skip this question (or leave all choices blank)]
Question: Many so-called social playwrights are distinctly (blank); rather than allowing the members of the
audience to form their own opinions, these writers force a viewpoint on the viewer. [didactic / conciliatory /
Skip this question (or leave all choices blank)]
Question: The flood of innovation that has engendered many of last decade’s technological breakthroughs
has also claimed some victims in its wake: companies once at the (i) (blank) of such innovation have now
become (ii) (blank). [(i) forefront - (ii) mostly obsolete / (i) periphery - (ii) increasingly relevant / Skip this
question (or leave all choices blank)]
Question: In a fit of (blank) she threw out the valuable statue simply because it had belonged to her ex-
husband. [contrition / pique / Skip this question (or leave all choices blank)]
Question: For a writer with a reputation for both prolixity and inscrutability, Thompson, in this slim col-
lection of short stories, may finally be intent on making his ideas more (blank) to a readership looking for
quick edification. [palatable / transcendent / Skip this question (or leave all choices blank)]
Question: Vain and prone to violence, Caravaggio could not handle success: the more his (i)(blank) as an
artist increased, the more (ii)(blank) his life became. [(i) notoriety - (ii) dispassionate / (i) eminence -(ii)
tumultuous / Skip this question (or leave all choices blank)]
Question: During their famous clash, Jung was ambivalent about Freud, so he attacked the father of modern
psychoanalysis even as he (blank) him. [understood / revered / Skip this question (or leave all choices blank)]
Question: Rubens, for all his high-flown rhetoric, churns out book reviews that have come to seem (blank):
from decades of critiquing other’s prose, he now relies on a familiar and tired formula. [perfunctory /
scathing / Skip this question (or leave all choices blank)]
Question: The government has taken exclusively ceremonial steps to alleviate illiteracy, poverty, and disease
31
in the countryside, (blank) the assumptions that the leadership only cares about its wealthy urban population.
[strengthening / disproving / Skip this question (or leave all choices blank)]
Question: Lacking sacred scriptures or (blank), Shinto is more properly regarded as a legacy of traditional
religious practices and basic values than as a formal system of belief. [dogma / relics / Skip this question
(or leave all choices blank)]
Question: Much of the consumer protection movement is predicated on the notion that routine exposure to
seemingly (blank) products can actually have long-term deleterious consequences. [banal / benign / Skip
this question (or leave all choices blank)]
Question: Fred often used (blank) to achieve his professional goals, even though such artful subterfuge
alienated his colleagues. [chicanery / bombast / Skip this question (or leave all choices blank)]
Question: Inured to the intense work engendered by the deadlines they normally faced, the production man-
agers felt somewhat (blank) by the temporary hiatus in orders. [wizened / disoriented / Skip this question
(or leave all choices blank)]
Question: In parts of the Arctic, the land grades into the land fast ice so (blank) that you can walk off the
coast and not know you are over the hidden sea. [imperceptibly / precariously / Skip this question (or leave
all choices blank)]
Question: The formerly (blank) waters of the lake have been polluted so that the fish are no longer visible
from the surface. [tranquil / pellucid / Skip this question (or leave all choices blank)]
[Page break]
Question #1: Out of the 15 questions, how many do you think you answered correctly?
Question #2: Suppose you were offered one of the following additional rewards for participating in the
32
study. Which do you prefer? [An additional $10 added to your winnings / A lottery where a coin is flipped
and if the coin comes up Tails, then an additional $20 are added to your winnings. If the coin comes up
Heads, then you receive no additional winnings]
Question #3: Suppose instead that you got a bonus $10 for participating in the study. Then, you were offered
a chance to flip a coin where if the coin comes up Tails, then you win an additional $10, but if the coin comes
up Heads, then you lose the bonus $10 (for $0 total). Would you take the chance and flip the coin? [Yes, I’d
risk losing the bonus $10 and flip a coin to win an additional $10 (for $20 total) / No, I don’t want to flip a
coin and I’ll take the bonus $10]
Question #4: Suppose instead you were offered one of the following additional rewards for participating in
the experiment. Which do you prefer? [$10 given today / $11 given in one month]
Question #5: Suppose instead you were offered one of the following additional rewards for participating in
the experiment. Which do you prefer? [$10 given in 5 months / $11 given in 6 months]
Question #6: What is your age (in years)?
Question #7: What gender do you identify yourself as?[Male / Female / Non-Binary]
Question #8: Is English your first language? [Yes No]
Question #9: What is the highest degree or level of school that you have COMPLETED? [Did not complete
high school / High school diploma (including GED or alternative credential) / Associate’s degree / Bache-
lor’s degree / Postgraduate degree]
Question #10: How willing are you to take risks in general? (On a scale from 1, unwilling to 7, very willing)
Question #11: How patient are you in general? (On a scale from 1, extremely impatient, to 7, extremely
patient)
33
Question #12: How much effort did you exert on this exam? (On a scale from 1, very little effort, to 7, a lot
of effort)
Question #13: Have you participated in previous research studies on MTurk? [Yes / No]
Please enter your Mechanical Turk Id (this will be needed to compute your payment).
34