THE EFFECTS OF EXAM FRAMES ON STUDENT EFFORT AND

PERFORMANCE

Briana Ballis (a), Lester Lusher (b,∗), Paco Martorell (c)

(a) University of California, Merced

(b) University of Hawaii at Manoa and IZA

(c) University of California, Davis, United States

This draft: April 20, 2021

Abstract

We investigate how the framing of the point structure on an exam can influence test-taking effort and performance. Under traditional choice theory, students may choose to skip more questions if the framing induces loss averse preferences. We conduct a large-scale online experiment and a classroom experiment and instead find that loss-framed exams lead to less skipping. Further evidence indicates that students exerted greater effort, as measured through several proxies, under loss frames, and this greater effort led them to answer more questions. Consequently, student performance improved under loss frames. We find little evidence of a gender differential response to frames.

Keywords: Framing, Loss Aversion, Test-Taking, Field Experiment

JEL Classification: C93, D01, D81, D91

∗Corresponding author: Lester Lusher, Department of Economics, University of Hawaii at Manoa, Saunders Hall 542, 2424 Maile Way, Honolulu, HI 96822 United States. This material is based upon work supported by a grant from University of Hawaii College of Social Sciences and the University of California, Davis Institute for Social Sciences. Email: [email protected]; Briana Ballis, Department of Economics, University of California at Merced, 5200 N. Lake Rd., Merced, CA 95343, United States. Email: [email protected]; Paco Martorell, School of Education, University of California at Davis, 1 Shields Ave., Davis CA, 95616, United States. Email: [email protected]

1 Introduction

Standardized tests are widely used in high stakes settings for students, educators, and educational systems.

For example, exam scores are used to compare academic skills across countries, in teacher pay and

promotion decisions, as higher education gatekeepers, and in school accountability programs. Exam scores

are used in these ways because they purportedly measure an exam-taker’s proficiency in important skills.

A growing literature has investigated whether this is true (Wise and DeMars, 2005; Finn, 2015), and has

identified numerous determinants impacting exam performance, such as the roles of non-financial incentives,

grading and ranking among peers, intrinsic motivation, and student effort (e.g. Duckworth et al., 2011;

Jalava et al., 2015; Levitt et al., 2016; Gneezy et al., 2019).

This study investigates how the framing of exam scoring affects exam-taker behavior and performance.

We do so in a context where participants take a standardized test and incorrect answers are penalized more

than skipped items. Our experimental design randomly assigns exam-takers into one of two possible frames.

The first frame, which mirrors exam scoring on a number of important tests in the U.S. and abroad,1 we call

a “loss frame.” In this frame, participants start off with an “endowment” of points that they can lose by

answering questions incorrectly, but then gain points for answering questions correctly. The second frame we

generate in our experiment removes the “losing” of points, but holds the mapping between the number of

items answered correctly, incorrectly, and skipped constant. In this “gain frame,” respondents do not lose

points for incorrect answers but instead gain points for skipped items and correct items.

Since differential scoring of incorrect and skipped items implies that a test-taker bears a risk in answering

test items, the framing of this scoring can interact with risk and loss preferences in a way that affects exam

performance. In particular, the theory of loss aversion (Tversky and Kahneman, 1979) suggests that in

lottery decisions, individuals are willing to bear more risk in order to reduce the likelihood of experiencing

a “loss.”2 However, loss aversion theory has ambiguous predictions for how this type of exam score framing

affects exam-taker behavior. On one hand, the more the test-taker feels an incorrect answer triggers a “loss,”

the less willing they will be to risk an attempt on a question. On the other hand, a loss framing might

induce more effort to avoid losing the “endowment” with which they began the exam. This effort effect

arises because the probabilities in the “lottery” of an exam question are partly determined by student effort.

For instance, test-takers may (unconsciously) read questions with greater care or spend more time thinking

about correct answers, causing an increase in the number of items they can confidently answer. This may

be especially relevant in lower-stakes settings, where exam-taker effort plays a role in exam performance

(Gneezy et al., 2019). In total, these two inputs, student effort and choice, pull in opposing directions: the

fear of incurring a loss with a wrong answer tempts the student to skip the question, but this fear also drives

effort to answer test items correctly, increasing the probability of answering correctly (if they attempt) and

thus incentivizing the student to attempt the question.

1 In the U.S., examples include the Scholastic Aptitude Test (SAT) prior to 2016, the Graduate Management Admission Test (GMAT), and the middle and upper Secondary School Admission Tests (used for private school admission). Internationally, examples include college admissions exams (Pekkarinen, 2015; Coffman and Klinowski, 2020) and professional licensing exams (Iriberri and Rey-Biel, 2019).

2 See e.g. Rabin (2000); Tom et al. (2007); Gachter et al. (2007).

To date, the literature on how point structure and frames could influence student choice on exams is

limited. Furthermore, to our knowledge, no study has investigated how exam frames could impact student

effort. In an experimental setting, Baldiga (2014) finds that on exams that penalize wrong answers (relative

to omitting an answer), women skip significantly more than men, leading to worse scores overall. Similar

findings come from Pekkarinen (2015) in the context of Finnish university entrance examinations: women

skip more than men, leading to worse test scores. A working paper from Iriberri and Rey-Biel (2019) uses a

within-exam (across question) approach along with a different frame (+points for skipping and ++points for

correct answer) to uncover a similar finding: women skip more than men, harming their scores.3 Importantly,

these studies do not explicitly test the impact of different score-equivalent frames on student outcomes;4

instead, they identify gender differences within a specific frame. Perhaps most similar to our study, a

working paper from Balart et al. (2020) tests two score-equivalent loss frames against each other, one where

skipping encodes a loss and another where it does not: confirming the classical Tversky and Kahneman

(1979) loss aversion in choice findings, the authors find that students skip more questions when there’s no

“loss” in skipping.5

3 A working paper from Hernandez and Hershaff (2015) finds that middle-schoolers in Michigan skip questions on the statewide standardized test despite there being no penalty for incorrect responses and no time limit on the exam. The authors link this incidence to longer run outcomes such as high school dropout.

4 More specifically, these studies do not present two separate exams where the relative scoring is held constant while the framing of the scoring is manipulated (e.g. -points for incorrect, 0 points for skipping, +points for correct vs. 0 points for incorrect, +points for skipping and ++points for correct).

5 Outside of the exam setting, frames have been shown to have meaningful effects on individual performance. Looking at overall course performance, studies from Apostolova-Mihaylova et al. (2015) and McEvoy et al. (2016) find that student performance is improved when the point structure of the entire course utilizes a loss frame. Findings from other settings have also shown that the framing of monetary contracts can induce greater worker and teacher effort (Fryer Jr et al., 2012; Hossain and List, 2012; Imas et al., 2017). A recent working paper from Pierce et al. (2020) carefully investigates how loss framed monetary contracts can actually lead to reductions in worker output.

We carry out our investigation by analyzing findings from two experiments designed to investigate how

choices and effort respond to differential frames of the exam, one using Amazon’s Mechanical Turk (MTurk)

platform and the other from a college classroom. First, on MTurk, each of 1,904 subjects took an exam of

15 multiple choice questions. For each question, participants had two potential answers to select from, all

of which invoked English vocabulary knowledge via sentence completion. Participants were randomly assigned

one of two possible scoring structures. In one treatment (“gain frame”), participants were awarded

a point for skipping and two points for each correct answer. In the second treatment (“loss frame”),

participants were endowed with 15 points, and would gain (lose) a point for each (in)correct response, while

skipping resulted in no change in points.6 Crucially, under this scoring system, the total number of points

awarded is the same with either framing. Monetary payments were strictly a function of the number of

points scored on the test. Thus, treatment assignment should have no effect on participants who perceive

the frames to be equivalent. If, instead, participants in the “loss frame” are more likely to perceive incorrect

answers as losses, then the two countervailing forces emerge: “loss frame” participants exert more effort in

order to reduce the probability of a wrong answer, but they also will be more likely to skip a question to avoid

an incorrect answer. Since our exams were carried out via Qualtrics, we can observe how much time was

spent by each participant answering the questions, and on non-exam components as well (i.e. reading the

instructions and completing the post-experiment survey). This grants us an additional proxy for participant

effort, on top of evaluating their actual performance on the test.

We find that the loss framing increased exam scores. The test score gains arose from a reduction in

skipping questions and an increase in answering questions correctly, with the number of incorrect answers

left unchanged. We interpret this as evidence consistent with increased effort being exerted under the loss

frame. The loss frame appears to have increased participants’ willingness to exert the effort needed to

answer questions correctly that they were particularly uncertain about and would have skipped under the

gain frame. If instead the loss framing simply induced less skipping but did not affect effort, then we would

have expected to find more incorrect answers (as a fraction of all questions answered) since the marginal

test items are presumably ones participants find more difficult and are more likely to answer incorrectly for a

fixed level of effort.7

More direct measures of effort also point to the loss frame inducing more effort. We first find that loss

frame participants spent significantly more time completing the exam questions. These participants also

spent more time on non-exam portions of the experiment (reading instructions and completing the post-exam

survey) suggesting the loss frame induced greater effort overall in the experiment. Additionally, loss

frame participants self-reported exerting significantly higher levels of effort on the exam.

6 Note that the expected relative points of guessing is equivalent to the certain amount of points from skipping. The first treatment mimics the framing in the study from Iriberri and Rey-Biel (2019), while the second is more similar to Baldiga (2014) and Pekkarinen (2015).

7 These results remain robust to a specification estimated at the student-question level with question fixed effects.

Furthermore, much like prior studies, we find that women were significantly more likely to skip across

both treatment arms, which generated a gender gap with women having lower average test scores (Baldiga,

2014; Pekkarinen, 2015; Iriberri and Rey-Biel, 2019; Coffman and Klinowski, 2020). This finding holds

conditioning on education level, and is in spite of female participants spending significantly more time on

the exam and self-reportedly exerting greater effort on the exam. Thus, it appears that female participants

were significantly more risk averse than their male counterparts, and this risk aversion induced greater effort

but lowered overall test scores due to “too much” skipping. We also explore whether there’s a gender

differential in effort and behavior across the two frames. We do not find any evidence of loss versus gain

frame effects on the gender gap in test scores.

To extend our results beyond the MTurk setting, we conducted a subsequent experiment in an online

introductory microeconomics course. The course had two midterms and a final, each of which consisted

entirely of multiple choice questions. Approximately one week prior to each exam, students were randomly

assigned into a loss or gain frame for scoring on the exam. Though the sample of students is relatively

small (N=66), consistent with our MTurk results, we first find that loss frame students skipped significantly

fewer questions. Point estimates on test score gains are similar in magnitude to the MTurk experiments,

but imprecisely estimated. Since students could be randomized to differing frames across exams, our data

also allow us to show robustness to models with student fixed effects. Overall, the classroom experiment

reaffirms the initial findings that loss frames can induce greater effort, which subsequently decreases the

likelihood the student skips a question and boosts test scores.

The remainder of this paper proceeds as follows: In section 2, we describe our experiments, in section 3

we present our results, and in section 4 we conclude.

2 Experimental Design and Hypotheses

2.1 Hypotheses

We set out to test two hypotheses about how test framing affects behavior. First, we hypothesize that

a setting where students begin with an “endowment” of points that they lose when answering questions

incorrectly will induce greater effort on the exam. This effect would materialize if test takers are loss averse

and if they think they can increase the likelihood of answering questions correctly by exerting more effort.

In this context, greater effort might take the form of trying harder to remember the definitions of vocabulary

words to help decide which option completes the sentence best. A subsidiary hypothesis is that overall test

performance will be higher for those in the loss frame as a result of the increased effort.

We also hypothesize that students in the loss frame will skip more test items. This is because incorrect

answers are framed as resulting in the student losing points in the loss frame, and loss averse students might

skip test items rather than risk incurring a loss if they are uncertain of an answer. We refer to this type

of effect as the item-specific loss aversion mechanism. However, there might also be a second mechanism

affecting skip rates, which is the aforementioned change in overall effort on the exam. If effort increases, it

could allow exam takers to realize they can correctly answer a question they might have skipped if they

expended less effort. The effort mechanism and the item-specific loss aversion mechanism have opposing

effects on item skipping, and the relative strength of these competing forces determines the net effect of the

loss frame on item skipping.

2.2 Experiment with Amazon MTurk

This study presents results from two separate experiments. Our primary experiment was conducted via

Amazon’s MTurk platform. MTurk is an online labor market that is popular among social scientists as a

means to conduct online experiments. Given the ubiquity of MTurk, numerous studies have assessed the

validity of MTurk for conducting online experiments, and provided guidance to researchers for recruiting a

subject pool (see e.g. Horton et al., 2011; Paolacci et al., 2010; Buhrmester et al., 2016; Hamby and Taylor,

2016; Thomas and Clifford, 2017; Cheung et al., 2017). The vast majority of these studies suggest that

online labor markets such as MTurk serve as adequate, cost-efficient substitutes for in-person experiments.

Due to the cost-effectiveness of recruitment online via MTurk, the primary benefit of utilizing MTurk is the

ability to recruit a large number of subjects.

MTurk primarily serves as an online labor market where workers can complete tasks called Human

Intelligence Tasks (HITs). Generally, HITs consist of low-skill tasks that are better done by humans versus

a computer, such as reviewing images, reading documents, and transcribing audio. Researchers, instead,

post information on their experiment, such as the expected length of the experiment and how much the

subject could earn for participating. The researcher can also set filters on the subject pool. Participants can

only access a HIT if they have a bank account linked to an Amazon account. Moreover, participants can

only create one account, and researchers can limit the number of times a participant completes a HIT (to

avoid repeat subjects).

Our experiment recruited subjects through a HIT where only users from the US could participate.

MTurkers were told that they could earn up to $3 for participating in the study, and that the study would

take no more than 15 minutes. Upon joining the experiment, subjects were instructed that they would be

presented with 15 multiple choice questions, with each question having two potential answers. Each question

invoked English vocabulary knowledge via sentence completion. Subjects had three minutes and thirty

seconds to complete the exam portion of the experiment. At the end of the exam, subjects were asked

to complete a post-exam survey. The full text from the recruitment message, instructions, questions and

answers, and post-exam survey are available in the appendix.

Participants were randomly assigned to one of two possible scoring structures. In one treatment (“gain

frame”), participants were awarded a point for skipping and two points for each correct answer. In the

second treatment (“loss frame”), participants were endowed with 15 points, and would lose a point for each

incorrect response, while skipping resulted in no change in points and answering correctly earned one point.

Thus, the treatment groups can be summarized as:

• Gain frame: Incorrect answer is 0 points, skip question is 1 point, correct answer is 2 points

• Loss frame: Start with 15 points, incorrect answer is -1 point, skip question is 0 points, correct answer is 1 point

Note that for any combination of correct, incorrect, and skipped questions, the total score is the same

across the two treatments. Also note that since each question has only two answer choices, the expected

point value of guessing is equivalent to the point value of skipping. Thus, a risk neutral test-taker would

be indifferent between answering a question and skipping if they have no intuition as to the correct answer.

Risk averse test-takers may skip questions to avoid the uncertainty of answering a question, so long as they

have some doubt about the correct answer. Subjects were awarded $0.10 per point, for a maximum of $3 in

earnings.
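The score equivalence and the guessing indifference described above can be checked mechanically. The following is a minimal sketch rather than the authors' code: the function names `gain_score`, `loss_score`, and `frames_agree` are hypothetical, and the point values are taken directly from the two frame descriptions.

```python
from itertools import product

N_QUESTIONS = 15  # length of the MTurk exam


def gain_score(correct, incorrect, skipped):
    """Gain frame: 0 per incorrect answer, 1 per skip, 2 per correct answer."""
    return 0 * incorrect + 1 * skipped + 2 * correct


def loss_score(correct, incorrect, skipped):
    """Loss frame: start with 15 points, -1 per incorrect, 0 per skip, +1 per correct."""
    return N_QUESTIONS - 1 * incorrect + 0 * skipped + 1 * correct


def frames_agree():
    """Check every feasible split of the 15 questions into correct/incorrect/skipped."""
    for correct, incorrect in product(range(N_QUESTIONS + 1), repeat=2):
        skipped = N_QUESTIONS - correct - incorrect
        if skipped < 0:
            continue  # not a feasible combination
        if gain_score(correct, incorrect, skipped) != loss_score(correct, incorrect, skipped):
            return False
    return True


# Expected points from a pure guess between two options (probability 0.5 each),
# compared with the certain points from skipping in the same frame.
guess_gain = 0.5 * 2 + 0.5 * 0     # skipping in the gain frame is worth 1
guess_loss = 0.5 * 1 + 0.5 * (-1)  # skipping in the loss frame is worth 0
```

Because correct, incorrect, and skipped items always sum to 15, the loss-frame total 15 - incorrect + correct reduces to skipped + 2 × correct, which is the gain-frame total.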

2.2.1 Summary statistics and balance test

A total of 2,147 MTurkers clicked the link to our HIT, but only 1,940 fully participated in the experiment.

An analysis of differential persistence rates suggests that gain frame participants were 2.9 percentage points

more likely to complete the experiment. This offers some suggestive evidence that the loss frame participants

did find their exams more unattractive and that the loss frame does induce loss preferences. Alternatively, it

may be that loss frame participants were simply more confused by their frame, and this confusion led them

to quit.

Table 1 uses responses from the post-exam survey to present summary statistics for our analytic sample,

and to test for whether there are observable differences between those who completed the experiment by

treatment group. Roughly 60% of subjects were female with an average age of 38 years. Approximately

two-thirds of subjects completed some post-secondary education. Two survey questions asked subjects to

rate their own risk preference and patience (on a scale from 1 to 7), while a series of other hypothetical

questions were asked in order to roughly proxy for the subject’s loss aversion, risk aversion, present bias,

and patience.8

Despite the differential attrition, baseline covariates of participants are mostly balanced between the two

experimental conditions. The one exception is that the loss frame group has a higher fraction of women.

Mitigating this concern, when regressing an indicator for treatment on all covariates, a joint-significance

test fails to reject the null hypothesis that the coefficients are jointly equal to zero (F-stat of 1.05, p-value of

0.40). Nonetheless, the gender imbalance raises the possibility that there are unobserved differences across

the groups (that correlate with gender) and ultimately test performance.9

We will take two approaches to address this potential concern. First, we will test the sensitivity of

our results to the inclusion of our observable covariates. Shown later, we find that female subjects skip

more questions and answer more questions incorrectly, leading to lower scores. Since the loss frame had

disproportionately more women, this suggests that in the absence of controlling for gender, the loss frame

group will have relatively more skipping with lower test scores. We find, however, that the loss frame

experienced less skipping with higher test scores, and so if there is any unobserved correlate with gender,

then the gender imbalance is likely pushing in the opposite direction of our results and attenuating the

estimated treatment effects.

Alternatively, the potential participants who decided not to take the test may have been those who would

have exerted low effort, skipping at a relatively high rate and scoring poorly. This type of selection could

account for at least some of our estimated effects. To address this concern, a second approach will be to

bound our estimates using Lee (2009) bounds. Using this approach, we trim the gain frame sample so that

the share of observations is equal across frames, and then estimate treatment effects on the trimmed sample.

Specifically, we trim from above by dropping observations from the gain frame with the highest values of

the relevant outcomes, and then from below by dropping observations from the gain frame with the lowest

values of the relevant outcomes. As will be discussed further in Section 3.1.1, we find that for our skipping

and score outcomes, the relevant treatment effect bounds do not overlap with the value of zero. Thus, we

view it as unlikely that attrition from the loss frame can explain our treatment effects.

8 Please see the appendix for the exact wording of the questions. A subject is flagged as loss averse if the subject answered “An additional $10 added to your winnings” to question #2 and “Yes, I’d risk losing the bonus $10 and flip a coin to win an additional $10 (for $20 total)” to question #3. A subject is flagged as risk averse if the subject answered “An additional $10 added to your winnings” to question #2. A subject is flagged as present biased if the subject answered “$10 given today” to question #4 and “$11 given in 6 months” to question #5. Lastly, a subject is flagged as patient if the subject answered “$11 given in one month” to question #4 and “$11 given in 6 months” to question #5.

9 As previously noted, prior studies provide evidence of gender differences in test performance when incorrect answers are treated differently than skipped questions.
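The trimming logic behind the bounding exercise can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `lee_bounds` and its arguments are hypothetical, the gain frame is assumed to be the arm with the higher completion rate, and no covariates or standard errors are computed.

```python
import numpy as np


def lee_bounds(y_loss, y_gain, n_assigned_loss, n_assigned_gain):
    """Trimming bounds on the loss-minus-gain difference in mean outcomes.

    y_loss, y_gain: outcomes for subjects who completed the experiment.
    n_assigned_*:   number of subjects originally assigned to each frame.
    """
    y_loss = np.asarray(y_loss, dtype=float)
    y_gain = np.sort(np.asarray(y_gain, dtype=float))

    rate_loss = len(y_loss) / n_assigned_loss
    rate_gain = len(y_gain) / n_assigned_gain
    if rate_gain < rate_loss:
        raise ValueError("this sketch assumes the gain frame has the higher completion rate")

    # Share of gain-frame completers to drop so completion shares match.
    trim_share = (rate_gain - rate_loss) / rate_gain
    n_trim = int(round(trim_share * len(y_gain)))
    n_keep = len(y_gain) - n_trim

    gain_mean_trim_above = y_gain[:n_keep].mean()  # highest values dropped
    gain_mean_trim_below = y_gain[n_trim:].mean()  # lowest values dropped

    lower = y_loss.mean() - gain_mean_trim_below
    upper = y_loss.mean() - gain_mean_trim_above
    return lower, upper
```

If both bounds lie on the same side of zero, differential attrition alone cannot account for the sign of the estimated effect under the monotonicity assumption of this approach.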

2.3 Experiment in a college classroom

As a follow-up to our MTurk experiment, we randomly assigned loss frame and gain frame exams as

part of an online introductory microeconomics course. Treatment was administered across two midterms

and a final. Students were given 60 minutes to complete each midterm, each of which consisted of 50 multiple

choice questions. The final took two hours and consisted of 100 multiple choice questions. Each multiple

choice question had four potential answers.

We randomly assigned each student-exam either a gain frame or a loss frame (so that some students

were exposed to both frames while others may have been exposed to only one frame). We emailed students

approximately one week prior to their exam to tell them which treatment they were assigned. Below we present

the relevant emailed text, which also appeared on their midterm (final) instructions:

Gain frame: The maximum number of points you can score on this midterm (final) is 200 (400). The

midterm (final) will be graded out of 200 (400) points. On this test, you will have the opportunity to skip

questions, if you desire.

• If you answer a question incorrectly, you will receive 0 points

• If you skip a question, you will receive 1 point

• If you answer a question correctly, you will receive 4 points

Loss frame: The maximum number of points you can score on this midterm (final) is 200 (400). The

midterm (final) will be graded out of 200 (400) points. On this test, you will have the opportunity to skip

questions, if you desire.

• First, you will start with 50 (100) points

• If you answer a question incorrectly, you will lose 1 point

• Nothing will happen if you skip a question (0 point change)

• If you answer a question correctly, you will receive 3 points
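As a quick check of the classroom scoring rules, the sketch below computes a score under each frame and the expected payoff from answering relative to skipping. The function names are hypothetical, and the assumption that a pure guess among four options is correct with probability 0.25 is ours, not a statement from the emailed instructions.

```python
def classroom_gain(correct, incorrect, skipped):
    """Gain frame: 0 per incorrect answer, 1 per skip, 4 per correct answer."""
    return 0 * incorrect + 1 * skipped + 4 * correct


def classroom_loss(correct, incorrect, skipped):
    """Loss frame: start with one point per question, -1 per incorrect, +3 per correct."""
    n_questions = correct + incorrect + skipped  # 50 on a midterm, 100 on the final
    return n_questions - 1 * incorrect + 0 * skipped + 3 * correct


def answer_minus_skip(p_correct, frame):
    """Expected points from answering minus the certain points from skipping."""
    if frame == "gain":
        return p_correct * 4 + (1 - p_correct) * 0 - 1
    if frame == "loss":
        return p_correct * 3 + (1 - p_correct) * (-1) - 0
    raise ValueError("frame must be 'gain' or 'loss'")
```

Both expressions simplify to 4p - 1, so under either frame a risk neutral student is indifferent between guessing and skipping at p = 0.25 and prefers to answer whenever the chance of being correct exceeds that.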

A notable shortcoming of the classroom experiment relative to the MTurk experiment is its smaller

sample size and correspondingly lower statistical power. On the other hand, because treatment was re-randomized

for each exam, our data allow us to estimate models with student fixed effects. These in turn control for

all unobservable factors at the student level that may happen to correlate with treatment-assignment. For

instance, student fixed effects will account for any scenario of differential consent based on treatment. After

restricting our sample to students who consented to release their information and to those who experienced

both treatments across exams (to estimate student fixed effects), we are left with a sample of 66 students

(198 student-exam observations).

3 Results

3.1 Results from the MTurk Experiment

We begin by examining the raw data. Table 2 shows average performance and effort on the exam by

frame. Three results emerge from comparing average outcomes across loss and gain participants. First, loss

frame participants skip fewer questions. Relative to gain participants, loss participants skip about one fewer

question. Second, loss-frame participants have higher average scores (about 0.16 standard deviations higher)

and get about one more question correct relative to gain participants. Finally, loss frame participants exert

more effort on the exam, as measured by the total time that they took on the exam and their self-reported

effort. Relative to gain participants, loss participants spend about 6 seconds more on the multiple choice

questions, 20 seconds more on reading the instructions, and self-report exerting slightly more effort on the

exam.

In Table 3, we estimate the effects of the loss frame on question skipping and overall exam performance

using Ordinary Least Squares (OLS). Our outcome variables include the number of omitted questions, the

final score, the number of questions answered incorrectly, and the number of questions answered correctly,

which are all standardized to have a mean of 0 and standard deviation of 1. For each outcome, we report the

coefficient on an indicator for loss frame, without controls (in odd numbered columns) and with controls for

demographics and other test-taker characteristics (in even numbered columns).10 These results show that

loss frame participants skip fewer questions (0.20 standard deviations of the mean), receive a higher overall

score (0.16 standard deviations of the mean), and answer significantly more questions correctly (0.21

standard deviations of the mean). There is little difference in the number of incorrect answers by frame.

This suggests that the improved test scores of loss participants are driven by skipping fewer questions and

answering more questions correctly, which is consistent with the theory that loss framing induces more effort

on the exam. Additionally, the estimates are remarkably stable in response to the inclusion of controls; any

bias would need to come from unobserved differences that are not correlated with the included covariates.

10 We report the coefficients on all covariates in the even numbered columns as well. Unsurprisingly, we first see that those with higher education levels skip fewer questions and score higher on the exam. Moreover, those who are risk averse skip more questions (less willing to bear a risk in answering a question). In Appendix Table A1, we report pairwise correlations between individual demographics and the outcome variables to uncover a similar pattern as the conditional results of Table 3.
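To make the specification concrete, the sketch below standardizes an outcome and regresses it on a loss-frame indicator, with and without one control. Everything here is illustrative: the data are simulated with arbitrary parameters, the helper names `standardize` and `ols` are hypothetical, and the single `female` control stands in for the fuller set of covariates used in the paper.

```python
import numpy as np


def standardize(x):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def ols(y, regressors):
    """OLS coefficients with an intercept prepended; returns [const, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated data only; these are not the experimental results.
rng = np.random.default_rng(0)
n = 2000
loss = rng.integers(0, 2, size=n)    # 1 = loss frame, 0 = gain frame
female = rng.integers(0, 2, size=n)  # illustrative control
skipped = rng.poisson(3.0 - 0.5 * loss + 0.4 * female)

z_skipped = standardize(skipped)
b_no_controls = ols(z_skipped, loss)[1]
b_with_controls = ols(z_skipped, np.column_stack([loss, female]))[1]
```

With only the frame indicator on the right-hand side, the coefficient equals the difference in standardized means between the two frames, which is why it can be read directly in standard deviation units.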

To more directly test whether loss framing induced greater effort on the exam, we look at the amount

of time participants spent on the MTurk survey and self-reported effort on the exam. Table 4 shows OLS

estimates, where the outcomes include the total time spent on the multiple choice questions, the total time

spent on the other parts of the MTurk survey (reading the instructions and completing the post-exam survey),

an indicator for whether participants took the maximum time on the multiple choice questions (which was

180 seconds), and self-reported effort. Again, we standardize all outcome variables (except for the indicator

of whether participants took the maximum time on the exam). These results show that loss frame participants

spend significantly more time on answering the multiple choice questions (0.15 standard deviations of

the mean) and on other aspects of the MTurk survey (0.08 standard deviations of the mean), are 9 percentage

points more likely to take the maximum amount of time on the multiple choice questions, and report

significantly higher levels of effort exerted on the exam (0.08 standard deviations of the mean).

Next we examine how these patterns vary by gender. The evidence in Table 5 shows that women are

more likely to skip questions, receive lower scores, and answer more questions incorrectly. These findings

are consistent with those of prior studies. To investigate whether female participants have different strategies

for leaving a question unanswered by frame, we estimated the effect of the treatment interacted with gender.

Columns 2 and 5 of Table 5 show the estimation results for the OLS specification plus an interaction term

between Female and Loss. Female participants skip more questions on average (0.192 standard deviations

of the mean). This difference decreases (by 0.117 standard deviations of the mean) for loss-frame participants,

although this effect is not statistically significant at conventional levels. Female participants score

on average worse than male participants (0.202 standard deviations of the mean), but this difference does

not change for loss-frame participants. These results suggest that loss framing may, if anything, reduce the

gender gap in skip-rates, but has little effect on ultimate scores.

Similarly, we investigate how these patterns differ by education levels. Assuming less educated participants

possess weaker English vocabulary skills, we should observe greater amounts of skipping and/or

more incorrect answers for those with less education. Indeed, our results in Table 3 demonstrate that as

education levels increase, skipping rates decrease and the number of questions answered correctly increases,

leading to higher test scores.11 To investigate whether loss framing differentially impacted participants with

different education levels, Columns 3 and 6 of Table 5 show the estimation results for the OLS specification

plus an interaction term between Loss and each of the following education categories: AA, BA, and

Postgrad (where HS or less is the omitted category). The differences in skipping and final scores across

education levels do not change significantly for loss frame participants. This suggests that loss framing did

not differentially impact test-taking strategy by education level.

11 Those with higher education also exerted less effort on the test, suggesting they found the exam easier.

3.1.1 Robustness

We next test the robustness of these results by first conducting a question-level analysis. Table 6 shows

OLS estimates for a question-level analysis, where the outcomes include whether a question was skipped

or answered correctly. We include question fixed effects and cluster our standard errors by individual. We

draw conclusions similar to those from the individual-level regressions presented in Table 3. Loss frame participants

are significantly less likely to skip a question (5 percentage points less likely) and significantly more likely

to answer a question correctly (5 percentage points more likely). Moreover, the results are stable to the

inclusion of additional controls and question-order fixed effects.

As discussed earlier, we find that gain frame participants were slightly more likely to complete the

experiment. To assess whether this differential attrition is impacting our estimates, we perform a Lee (2009)

bounding exercise. Specifically, since we observe more attrition from the loss frame, we drop observations

from the gain frame with either the highest or lowest scores by a trimming factor of 3%.12 Then, we identify

treatment effects on the trimmed sample. Results from this exercise are shown in Table 7. The first column

shows our baseline estimates. The second column shows the upper and lower bound from this bounding

exercise. Importantly, for both of our primary outcomes, the relevant treatment-effect bounds do not overlap

the value of zero. Standard confidence intervals on these bounds, shown in Column 3, demonstrate that the

lower bound for the skipping outcome continues to differ from zero, whereas the lower bound for the score

outcome no longer differs from zero. However, Imbens and Manski (2004) propose a method for computing

narrower (and preferred) confidence intervals for treatment effects, which we present in Column 4. In this case,

both treatment effects differ from zero. As a result, we conclude that it is unlikely that our results can be

fully explained by the attrition of either very low- or very high-performing participants from the loss frame.
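The trimming logic behind this exercise can be sketched in a few lines. The example below is a simplified illustration, not the estimation code used for Table 7: it ignores covariates and confidence intervals, the function names are hypothetical, and the score lists are made up for demonstration. It assumes, as in the text, that the gain frame retained more participants and is therefore the group that is trimmed.

```python
from statistics import mean


def trimming_proportion(retained_high: float, retained_low: float) -> float:
    """Share of the less-attrited group to trim: (q_high - q_low) / q_high."""
    return (retained_high - retained_low) / retained_high


def lee_bounds(treated, control, p):
    """Bounds on mean(treated) - mean(control) when the control group
    retained more participants and is therefore the group that is trimmed."""
    ordered = sorted(control)
    k = int(p * len(ordered))  # number of control observations to drop (rounded down)
    drop_lowest = ordered[k:]                   # control mean rises -> lower bound
    drop_highest = ordered[: len(ordered) - k]  # control mean falls -> upper bound
    lower = mean(treated) - mean(drop_lowest)
    upper = mean(treated) - mean(drop_highest)
    return lower, upper


# Retention rates reported in the paper: 92% (gain frame) and 89% (loss frame)
p = trimming_proportion(0.92, 0.89)

# Made-up scores for illustration only (not the experimental data)
loss_scores = [18, 20, 21, 22, 23, 19, 24, 20, 21, 22]
gain_scores = [15] + [19, 20, 21, 22] * 10  # 41 hypothetical scores
lower, upper = lee_bounds(loss_scores, gain_scores, p)
print(f"trimming proportion = {p:.3f}, bounds = [{lower:.3f}, {upper:.3f}]")
```

With the reported retention rates the trimming proportion is about 0.033, which matches the value shown in Table 7.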

12 Of gain frame participants, 92% do not attrit; of loss frame participants, 89% do not attrit. The trimming factor of 3% is computed as follows: (92-89)/92.

3.2 Results from the Classroom Experiment

Largely, the conclusions we draw from our classroom experiment are similar to those from the MTurk

experiment. Table 8 shows OLS estimates where the outcomes are the total number of questions skipped

and the final score on the classroom exams. We show the results for all exams combined (Columns 1-2

and 6-7), for the final exam (in Columns 3 and 8), and for the midterm exams (in Columns 4-5 and 9-10).

When we have multiple exams in the sample, we show the results with and without student fixed effects.

These results show that loss frame participants are less likely to skip questions than gain frame participants across

most samples and specifications. The effect is generally significant, except when we focus on all exams

combined, including student fixed effects (Column 2) and for the results that focus on the final exam only

(Column 3). However, even in the cases where the point estimate is not significant, it is negative, suggesting

that loss framing leads to fewer questions skipped even for these samples. While we do not find that loss

framing led to statistically significant changes in final scores for any of our samples or specifications, the point estimates are all

positive, suggesting that final scores may have increased under the loss framing in the classroom setting.

Although the estimates are less precise due to the large reduction in sample size, these classroom results

help corroborate our findings from the field experiment in a higher-stakes environment.

4 Discussion and Conclusions

Given the prevalence of high-stakes standardized testing throughout the world, it is important that these

tests accurately measure underlying student performance. Prior research has shown that seemingly innocu-

ous decisions about exam scoring can have sizable effects on measured performance, with these effects

contributing to test score gender gaps. This paper explores the role of loss aversion as a mechanism through

which the framing of how tests are scored can lead to differences in test performance. The effects are theo-

retically ambiguous. Under a “loss frame” where incorrect answers are scored as losing points, loss aversion

could induce exam takers to skip more questions and possibly obtain a lower overall score. Alternatively,

however, the loss frame could encourage an effort response whereby exam takers exert more effort which

makes them able to correctly answer questions about which they are uncertain and with lower effort would

just skip.
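The skipping channel can be illustrated with a stylized decision rule. The sketch below is an assumption-laden illustration rather than the model used in this paper: it takes the point values from the MTurk instructions in the Online Appendix, assumes value is linear in points, treats the starting endowment as the reference point, and weights points lost relative to that reference by a hypothetical loss-aversion parameter. It abstracts entirely from effort.

```python
def answer_threshold(frame: str, loss_aversion: float = 1.0) -> float:
    """Smallest belief of being correct at which answering beats skipping.

    Hypothetical model: value is linear in points, and points lost relative
    to the starting endowment are weighted by `loss_aversion` (lambda >= 1).
    """
    if frame == "gain":
        # Outcomes of 2 / 1 / 0 points are all gains from a 0-point start:
        # answer when 2 * p > 1, regardless of lambda.
        return 0.5
    if frame == "loss":
        # Outcomes of +1 / 0 / -1 points relative to the 15-point start:
        # answer when p - lambda * (1 - p) > 0.
        return loss_aversion / (1.0 + loss_aversion)
    raise ValueError(f"unknown frame: {frame!r}")


for lam in (1.0, 1.5, 2.0):
    gain_cutoff = answer_threshold("gain", lam)
    loss_cutoff = answer_threshold("loss", lam)
    print(f"lambda={lam}: gain cutoff={gain_cutoff:.3f}, loss cutoff={loss_cutoff:.3f}")
```

In this sketch a test taker with a loss-aversion parameter above 1 needs more confidence to answer under the loss frame than under the gain frame, which is the mechanism that would produce more skipping. An effort response works in the opposite direction by raising the belief of being correct.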

To empirically test between these alternatives, we use two experiments - one with a sample of workers

participating in the Amazon MTurk employment platform and a second in an undergraduate economics

course. In both, participants in the “gain frame” received points for skipping questions and did not lose

any points for incorrect answers. In the “loss frame” students lost points for incorrect answers and did

not receive points for skipped questions. Crucially, only the framing of the scoring differed across the two

conditions; total scores were always identical under the loss and gain frames.
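The equivalence of the two scoring rules can be checked directly. The sketch below uses the point values from the MTurk instructions reproduced in the Online Appendix (a 15-question test, paid at $0.10 per point) and enumerates every possible split of questions into correct, skipped, and incorrect; the function names are illustrative.

```python
N_QUESTIONS = 15
DOLLARS_PER_POINT = 0.10


def loss_frame_points(correct: int, skipped: int, incorrect: int) -> int:
    """Start with 15 points; +1 per correct, 0 per skip, -1 per incorrect."""
    return 15 + 1 * correct + 0 * skipped - 1 * incorrect


def gain_frame_points(correct: int, skipped: int, incorrect: int) -> int:
    """Start with 0 points; +2 per correct, +1 per skip, 0 per incorrect."""
    return 0 + 2 * correct + 1 * skipped + 0 * incorrect


# Enumerate every way to split the 15 questions across the three outcomes
mismatches = 0
for correct in range(N_QUESTIONS + 1):
    for skipped in range(N_QUESTIONS + 1 - correct):
        incorrect = N_QUESTIONS - correct - skipped
        if loss_frame_points(correct, skipped, incorrect) != gain_frame_points(
            correct, skipped, incorrect
        ):
            mismatches += 1

example_points = loss_frame_points(8, 4, 3)  # worked example from the instructions
print(mismatches, example_points, round(example_points * DOLLARS_PER_POINT, 2))
```

Because the number of incorrect answers equals 15 minus the number correct and skipped, the loss-frame total 15 + correct - incorrect reduces to 2 × correct + skipped, which is the gain-frame total.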

We find strong evidence that the loss frame improved overall test scores and reduced question skipping.

We find no effect on the number of questions answered incorrectly. This suggests that the loss frame induced


correct answers that would have been skipped under the gain frame, which we interpret as evidence the

loss frame increased effort. More direct evidence in favor of effects on effort includes positive effects on

time spent on the exam and self-reported effort on the exam. Consistent with prior research that women

perform worse when skipped and incorrect answers are scored differently, we find sizable gender gaps in

test performance. However, we do not find evidence of differential exam framing effects by gender.

Altogether, these results suggest that loss aversion triggered through exam framing can have important

effects on standardized test performance, and that these appear to operate through effects on effort. This has

important implications for how scores from tests that use differential scoring for skipped and incorrectly answered

questions ought to be interpreted. Consistent with Gneezy et al. (2019), these results suggest that effort on

the exam task, and not aptitude alone, plays a significant role in test performance. Our results build on this

finding by highlighting the role that loss aversion plays in affecting exam effort.

Our results also relate to a broader literature on behavioral biases in the lab versus in the field. A common

concern from earlier literature was that loss aversion and reference dependence were figments of the lab, and

that in alternative settings with sufficiently high stakes and/or experienced decision makers, these behavioral

biases would cease to exist (Levitt and List, 2008). A growing literature has, however, uncovered loss averse

preferences in several field settings with high stakes, including in gambling (Lien and Zheng, 2015), game

shows (Post et al., 2008), and professional sports (Pope and Schweitzer, 2011; Lusher et al., 2018). Our

study uncovers effort effects from loss aversion in two settings with differing stakes: one where the stakes

are experimental earnings, and the other which impacts a student’s actual grade in the course. Thus, though

the stakes of our experiments may pale in comparison to those of tests like the SAT or GMAT, the prior

literature suggests there is still ample reason to believe that even in higher stakes settings, effort can be

impacted by the framing of the test.


References

APOSTOLOVA-MIHAYLOVA, M., W. COOPER, G. HOYT, AND E. C. MARSHALL (2015): “Heterogeneous gender effects under loss aversion in the economics classroom: A field experiment,” Southern Economic Journal, 81, 980–994.

BALART, P., L. EZQUERRA, AND I. HERNANDEZ-ARENAZ (2020): “Framing Effects on Risk-Taking Behavior: Evidence from a Field Experiment,” Available at SSRN 3556710.

BALDIGA, K. (2014): “Gender differences in willingness to guess,” Management Science, 60, 434–448.

BUHRMESTER, M., T. KWANG, AND S. D. GOSLING (2016): “Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality data?” .

CHEUNG, J. H., D. K. BURNS, R. R. SINCLAIR, AND M. SLITER (2017): “Amazon Mechanical Turk in organizational psychology: An evaluation and practical recommendations,” Journal of Business and Psychology, 32, 347–361.

COFFMAN, K. B. AND D. KLINOWSKI (2020): “The impact of penalties for wrong answers on the gender gap in test scores,” Proceedings of the National Academy of Sciences, 117, 8794–8803.

DUCKWORTH, A. L., P. D. QUINN, D. R. LYNAM, R. LOEBER, AND M. STOUTHAMER-LOEBER (2011): “Role of test motivation in intelligence testing,” Proceedings of the National Academy of Sciences, 108, 7716–7720.

FINN, B. (2015): “Measuring motivation in low-stakes assessments,” ETS Research Report Series, 2015, 1–17.

FRYER JR, R. G., S. D. LEVITT, J. LIST, AND S. SADOFF (2012): “Enhancing the efficacy of teacher incentives through loss aversion: A field experiment,” Tech. rep., National Bureau of Economic Research.

GACHTER, S., E. J. JOHNSON, AND A. HERRMANN (2007): “Individual-level loss aversion in riskless and risky choices,” .

GNEEZY, U., J. A. LIST, J. A. LIVINGSTON, X. QIN, S. SADOFF, AND Y. XU (2019): “Measuring success in education: the role of effort on the test itself,” American Economic Review: Insights, 1, 291–308.

HAMBY, T. AND W. TAYLOR (2016): “Survey satisficing inflates reliability and validity measures: An experimental comparison of college and Amazon Mechanical Turk samples,” Educational and Psychological Measurement, 76, 912–932.

HERNANDEZ, M. AND J. HERSHAFF (2015): “Skipping Questions in School Exams: The Role of Non-Cognitive Skills on Educational Outcomes,” .

HORTON, J. J., D. G. RAND, AND R. J. ZECKHAUSER (2011): “The online laboratory: Conducting experiments in a real labor market,” Experimental economics, 14, 399–425.

HOSSAIN, T. AND J. A. LIST (2012): “The behavioralist visits the factory: Increasing productivity using simple framing manipulations,” Management Science, 58, 2151–2167.

IMAS, A., S. SADOFF, AND A. SAMEK (2017): “Do people anticipate loss aversion?” Management Science, 63, 1271–1284.


IMBENS, G. W. AND C. F. MANSKI (2004): “Confidence intervals for partially identified parameters,” Econometrica, 72, 1845–1857.

IRIBERRI, N. AND P. REY-BIEL (2019): “Brave Boys and Play-it-Safe Girls: Gender Differences in Willingness to Guess in a Large Scale Natural Field Experiment,” .

JALAVA, N., J. S. JOENSEN, AND E. PELLAS (2015): “Grades and rank: Impacts of non-financial incentives on test performance,” Journal of Economic Behavior & Organization, 115, 161–196.

LEE, D. S. (2009): “Training, wages, and sample selection: Estimating sharp bounds on treatment effects,” The Review of Economic Studies, 76, 1071–1102.

LEVITT, S. D. AND J. A. LIST (2008): “Homo economicus evolves,” Science, 319, 909–910.

LEVITT, S. D., J. A. LIST, S. NECKERMANN, AND S. SADOFF (2016): “The behavioralist goes to school: Leveraging behavioral economics to improve educational performance,” American Economic Journal: Economic Policy, 8, 183–219.

LIEN, J. W. AND J. ZHENG (2015): “Deciding When to Quit: Reference-Dependence over Slot Machine Outcomes,” American Economic Review, 105, 366–70.

LUSHER, L., C. HE, AND S. FICK (2018): “Are professional basketball players reference-dependent?” Applied Economics, 50, 3937–3948.

MCEVOY, D. M. ET AL. (2016): “Loss aversion and student achievement,” Economics Bulletin, 36, 1762–1770.

PAOLACCI, G., J. CHANDLER, AND P. G. IPEIROTIS (2010): “Running experiments on amazon mechanical turk,” Judgment and Decision making, 5, 411–419.

PEKKARINEN, T. (2015): “Gender differences in behaviour under competitive pressure: Evidence on omission patterns in university entrance examinations,” Journal of Economic Behavior & Organization, 115, 94–110.

PIERCE, L., A. REES-JONES, AND C. BLANK (2020): “The Negative Consequences of Loss-Framed Performance Incentives,” Tech. rep., National Bureau of Economic Research.

POPE, D. G. AND M. E. SCHWEITZER (2011): “Is Tiger Woods loss averse? Persistent bias in the face of experience, competition, and high stakes,” The American Economic Review, 101, 129–157.

POST, T., M. J. VAN DEN ASSEM, G. BALTUSSEN, AND R. H. THALER (2008): “Deal or no deal? Decision making under risk in a large-payoff game show,” The American Economic Review, 38–71.

RABIN, M. (2000): “Risk Aversion and Expected-Utility Theory: A Calibration Theorem,” Econometrica, 68, 1281–1292.

THOMAS, K. A. AND S. CLIFFORD (2017): “Validity and Mechanical Turk: An assessment of exclusion methods and interactive experiments,” Computers in Human Behavior, 77, 184–197.

TOM, S. M., C. R. FOX, C. TREPEL, AND R. A. POLDRACK (2007): “The neural basis of loss aversion in decision-making under risk,” Science, 315, 515–518.

TVERSKY, A. AND D. KAHNEMAN (1979): “Prospect theory: An analysis of decision under risk,” Econometrica, 47, 263–291.


WISE, S. L. AND C. E. DEMARS (2005): “Low examinee effort in low-stakes assessment: Problems and potential solutions,” Educational assessment, 10, 1–17.


Tables and Figures

Table 1: Summary statistics and balance test

                                    (1)       (2)          (3)          (4)
                                    All       Gain frame   Loss frame   Difference: (2)-(3)
Female                              0.616     0.591        0.642        -0.051**
Age                                 37.859    38.026       37.689       0.337
                                    [0.270]   [0.376]      [0.389]
Education:
  -HS or less                       0.330     0.339        0.322        0.017
  -AA                               0.165     0.155        0.176        -0.022
  -BA                               0.367     0.375        0.358        0.017
  -Postgrad                         0.138     0.132        0.143        -0.012
Willingness to Take Risks (1-7)     2.799     2.782        2.816        -0.034
                                    [0.036]   [0.050]      [0.052]
Self-Reported Patience (1-7)        3.176     3.171        3.182        -0.010
                                    [0.030]   [0.042]      [0.044]
Loss Averse                         0.056     0.055        0.057        -0.002
Risk Averse                         0.880     0.881        0.880        0.001
Present Biased                      0.255     0.253        0.256        -0.003
Patient                             0.149     0.139        0.159        -0.020
N                                   1904      963          941
Joint-significance test (p-value): 0.40

Note: Standard deviations for non-binary covariates presented in brackets. Tests for statistically significant differences between the two frames are conducted in column (4). The joint-significance test is conducted by regressing an indicator for gain frame on all covariates (testing for whether coefficients are jointly equal to zero). * p<0.10, ** p<0.05, *** p<0.01


Table 2: Pairwise differences in outcomes across frames

                             (1)       (2)          (3)          (4)
                             All       Loss frame   Gain frame   t-test: (2)-(3)
Loss                         0.494     1.000        0.000
Total Skipped                3.607     3.241        3.965        -0.723***
                             [0.082]   [0.108]      [0.121]
Score                        20.189    20.563       19.823       0.740***
                             [0.108]   [0.152]      [0.152]
Total Incorrect              3.102     3.098        3.106        -0.008
                             [0.051]   [0.073]      [0.070]
Total Correct                8.291     8.661        7.929        0.732***
                             [0.081]   [0.110]      [0.118]
Think Correctly Answered     8.591     8.678        8.505        0.173
                             [0.082]   [0.113]      [0.118]
Time (MC) - Seconds          185.048   187.830      182.330      5.499***
                             [0.825]   [1.095]      [1.226]
Time (Other) - Seconds       297.832   308.204      287.698      20.507*
                             [6.010]   [9.350]      [7.588]
Self-Reported Effort (1-7)   4.191     4.233        4.151        0.082*
                             [0.023]   [0.032]      [0.032]
N                            1904      963          941

Note: Standard deviations for non-binary covariates presented in brackets. Tests for statistically significant differences between the two frames are conducted in column (4). * p<0.10, ** p<0.05, *** p<0.01


Table 3: Effect of loss frame on skipping and performance

                  Total Skipped (std)     Score (std)             Total Incorrect (std)   Total Correct (std)
                  (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Loss              -0.203***   -0.204***   0.157***    0.159***    -0.00369    -0.00464    0.206***    0.208***
                  (0.0455)    (0.0450)    (0.0457)    (0.0428)    (0.0459)    (0.0439)    (0.0456)    (0.0435)
Female                        0.115**                 -0.172***               0.0914**                -0.172***
                              (0.0481)                (0.0454)                (0.0466)                (0.0463)
Age                           -0.00222                0.0112***               -0.0101***              0.00853***
                              (0.00199)               (0.00181)               (0.00188)               (0.00187)
AA                            -0.00999                -0.00976                0.0185                  -0.00146
                              (0.0698)                (0.0639)                (0.0712)                (0.0640)
BA                            -0.126**                0.352***                -0.274***               0.297***
                              (0.0551)                (0.0508)                (0.0527)                (0.0524)
Postgrad                      -0.260***               0.541***                -0.367***               0.490***
                              (0.0715)                (0.0720)                (0.0700)                (0.0725)
Loss Averse                   0.00802                 0.0539                  -0.0640                 0.0318
                              (0.127)                 (0.128)                 (0.142)                 (0.121)
Risk Averse                   0.148                   0.309***                -0.449***               0.131
                              (0.0969)                (0.0913)                (0.100)                 (0.0905)
Present Biased                -0.0886*                0.241***                -0.185***               0.204***
                              (0.0527)                (0.0507)                (0.0506)                (0.0517)
Patient                       -0.203***               0.501***                -0.371***               0.435***
                              (0.0628)                (0.0623)                (0.0596)                (0.0636)
Mean Y (not std)  3.607       3.607       20.19       20.19       3.102       3.102       8.291       8.291
N                 1904        1904        1904        1904        1904        1904        1904        1904

Note: This table contains OLS estimates, where the outcomes include the number of omitted questions, the final score, the number of questions answered incorrectly, and the number of questions answered correctly. All outcomes are standardized to have a mean of 0 and standard deviation of 1. Odd numbered columns do not include controls, while even number columns include controls for gender, age, education level (with a high school degree or less as the omitted category), loss aversion, risk aversion, present biased, and a measure of patience. Observations are at the individual level. Robust standard errors in parenthesis, where * p<0.10, ** p<0.05, *** p<0.01


Table 4: Effect of loss frame on effort

                  Time (MC only)          Time (Full)             Spent Max Time on MC    Self-Reported Effort
                  In Seconds (std)        In Seconds (std)                                (std)
                  (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Loss              0.153***    0.144***    0.0782*     0.0778*     0.0869***   0.0822***   0.0823*     0.0822*
                  (0.0457)    (0.0451)    (0.0459)    (0.0445)    (0.0227)    (0.0227)    (0.0458)    (0.0452)
Female                        0.0988**                0.153***                0.0632***               0.132***
                              (0.0500)                (0.0496)                (0.0238)                (0.0481)
Age                           0.00277                 0.0113***               0.000452                0.0104***
                              (0.00201)               (0.00178)               (0.000982)              (0.00196)
AA                            0.0593                  -0.0311                 0.0403                  -0.122*
                              (0.0675)                (0.0596)                (0.0344)                (0.0683)
BA                            -0.0769                 -0.0342                 -0.0198                 -0.114**
                              (0.0549)                (0.0575)                (0.0273)                (0.0537)
Postgrad                      -0.128*                 -0.151***               -0.0745**               -0.125*
                              (0.0724)                (0.0583)                (0.0362)                (0.0729)
Loss Averse                   0.399***                0.0202                  0.130**                 0.0678
                              (0.154)                 (0.192)                 (0.0638)                (0.145)
Risk Averse                   0.336***                -0.197                  0.113***                0.189*
                              (0.126)                 (0.137)                 (0.0440)                (0.107)
Present Biased                0.149***                0.0288                  0.0244                  0.00888
                              (0.0514)                (0.0569)                (0.0271)                (0.0533)
Patient                       0.137**                 -0.0984                 0.0471                  -0.0467
                              (0.0634)                (0.0645)                (0.0333)                (0.0703)
Mean, dept. var   185.0       185.0       297.8       297.8       0.442       0.442       4.191       4.191
N                 1904        1904        1904        1904        1904        1904        1904        1904

Note: This table contains OLS estimates, where the outcomes include the total time spent on the multiple choice questions, the total time spent on the other parts of the MTurk survey (reading the instructions and completing the post-exam survey), an indicator for whether participants took the maximum time on the multiple choice questions (which was 180 seconds), and self-reported effort. All outcomes, except for the indicator of whether participants took the maximum time on the exam, are standardized to have a mean of 0 and standard deviation of 1. Odd numbered columns do not include controls, while even number columns include controls for gender, age and education level (with a high school degree or less as the omitted category). Observations are at the individual level. Robust standard errors in parenthesis, where * p<0.10, ** p<0.05, *** p<0.01


Table 5: Treatment effect interacted with gender and education

                  Total Skipped (std)                 Score (std)
                  (1)         (2)         (3)         (4)         (5)         (6)
Loss              -0.210***   -0.137*     -0.216***   0.170***    0.159**     0.101
                  (0.0451)    (0.0724)    (0.0821)    (0.0437)    (0.0705)    (0.0716)
Female            0.136***    0.192***    0.140***    -0.194***   -0.202***   -0.196***
                  (0.0474)    (0.0696)    (0.0475)    (0.0458)    (0.0647)    (0.0459)
Age               -0.00212    -0.00210    -0.00206    0.0107***   0.0107***   0.0107***
                  (0.00199)   (0.00198)   (0.00199)   (0.00185)   (0.00185)   (0.00185)
AA                -0.00813    -0.00638    -0.00112    -0.0153     -0.0156     -0.0557
                  (0.0702)    (0.0701)    (0.106)     (0.0657)    (0.0658)    (0.0938)
BA                -0.140**    -0.139**    -0.115      0.393***    0.393***    0.309***
                  (0.0551)    (0.0552)    (0.0816)    (0.0516)    (0.0516)    (0.0729)
Postgrad          -0.284***   -0.287***   -0.385***   0.611***    0.611***    0.641***
                  (0.0712)    (0.0712)    (0.102)     (0.0725)    (0.0726)    (0.105)
Female * Loss                 -0.117                              0.0179
                              (0.0929)                            (0.0902)
AA * Loss                                 -0.0137                             0.0834
                                          (0.140)                             (0.131)
BA * Loss                                 -0.0514                             0.173*
                                          (0.110)                             (0.103)
Postgrad * Loss                           0.197                               -0.0539
                                          (0.141)                             (0.144)
Mean Y (not std)  3.607       3.607       3.607       20.19       20.19       20.19
N                 1904        1904        1904        1904        1904        1904
Controls          Yes         Yes         Yes         Yes         Yes         Yes

Note: This table contains OLS estimates, where the outcomes include the number of omitted questions, the final score, the number of questions answered incorrectly, and the number of questions answered correctly. All outcomes are standardized to have a mean of 0 and standard deviation of 1. All columns include controls for gender, age, education level (with a high school degree or less as the omitted category). Columns 2 and 5 include an interaction term between Female and Loss. Columns 3 and 6 include an interaction term between Loss and each of the following education categories: AA, BA, and Postgrad (where HS or less is the omitted category). Observations are at the individual level. Robust standard errors in parenthesis, where * p<0.10, ** p<0.05, *** p<0.01


Table 6: Robustness to individual-question level analysis

                  Skipped                                      Correct
                  (1)            (2)            (3)            (4)            (5)            (6)
Loss              -0.0482***     -0.0498***     -0.0498***     0.0488***      0.0516***      0.0516***
                  (0.0108)       (0.0107)       (0.0107)       (0.0108)       (0.0104)       (0.0104)
Female                           0.0324***      0.0324***                     -0.0466***     -0.0466***
                                 (0.0113)       (0.0113)                      (0.0109)       (0.0109)
Age                              -0.000504      -0.000504                     0.00194***     0.00194***
                                 (0.000471)     (0.000471)                    (0.000447)     (0.000447)
AA                               -0.00193       -0.00193                      -0.00144       -0.00144
                                 (0.0166)       (0.0167)                      (0.0154)       (0.0154)
BA                               -0.0332**      -0.0332**                     0.0783***      0.0783***
                                 (0.0131)       (0.0131)                      (0.0125)       (0.0125)
Postgrad                         -0.0674***     -0.0674***                    0.130***       0.130***
                                 (0.0169)       (0.0169)                      (0.0171)       (0.0171)
Mean, dept. var   0.240          0.240          0.240          0.553          0.553          0.553
N                 28560          28560          28560          28560          28560          28560
FE                Question ID    Question ID    Question ID &  Question ID    Question ID    Question ID &
                                                Question Order                               Question Order

Note: This table contains OLS estimates, where the outcomes include whether a question was skipped or answered correctly. All columns include question fixed effects. Columns 2 and 5 additionally control for gender, age, education level (with a high school degree or less as the omitted category). Columns 3 and 6 additionally include question order fixed effects. Observations are at the question level. Robust standard errors clustered at the individual level in parenthesis, where * p<0.10, ** p<0.05, *** p<0.01

Table 7: Bounding exercises on treatment effects

                                                              95% CI
Outcome               Baseline Estimates   Lee Bounds         Standard           Imbens and Manski
                      (1)                  (2)                (3)                (4)
Skipping (Std)        -0.203***            [-0.558, -0.111]   [-0.660, -0.001]   [-0.644, -0.018]
                      (0.0455)
Score (Std)           0.157***             [0.092, 0.230]     [-0.009, 0.336]    [0.0072, 0.319]
                      (0.0457)
N                     1904                 2099
N - Selected Obs      -                    1904
Trimming Proportion   -                    0.033

Note: The first column presents our baseline coefficients on the effect of loss framing on the number of omitted questions and the final score (standardized to have a mean of 0 and a standard deviation of 1). See Table 3 for the full set of controls used. Robust standard errors in parenthesis, where * p<0.10, ** p<0.05, *** p<0.01. The second column shows the upper and lower bounds from the Lee (2009) bounding exercise. The third column shows 95% confidence intervals for these bounds. The fourth column shows 95% confidence intervals for these bounds suggested by Imbens and Manski (2004).


Table 8: Results from experiment in introductory microeconomics course

                 Total skipped                                          Score
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)
Loss framing     -1.664∗    -0.820     -0.440     -2.117∗∗   -2.098∗∗   2.506      1.340      1.011      3.278      1.810
                 (0.850)    (0.819)    (1.795)    (0.851)    (0.834)    (1.675)    (1.395)    (2.098)    (2.156)    (1.960)
Female           -0.065                -1.279     0.528                 0.414                 0.887      0.220
                 (0.797)               (1.638)    (0.885)               (2.664)               (3.186)    (2.795)
Semester units   -0.055                -0.121     -0.030                0.544∗∗               0.233      0.710∗∗
                 (0.121)               (0.237)    (0.153)               (0.251)               (0.292)    (0.324)
Cumulative GPA   -0.816                0.015      -1.251                8.514∗∗∗              10.161∗∗∗  7.734∗∗∗
                 (0.868)               (1.835)    (0.865)               (2.088)               (2.968)    (2.132)
Observations     198        198        66         132        132        198        198        66         132        132
Skip rate        0.059      0.059      0.045      0.066      0.066
Mean score                                                              74.381     74.381     77.614     72.765     72.765
Other controls   X                     X          X                     X                     X          X
Student FE                  X                     X                                X                                X
Sample           MT+Final   MT+Final   Final      Midterms   Midterms   MT+Final   MT+Final   Final      Midterms   Midterms

Notes: Unit of observation at the student-exam level. All regressions include dummies for exam. “Other controls” include indicators for college of major, year, admission term, degree type (BA vs. BS vs. Other), citizenship, Pell recipiency, and gender, and linear controls for term units, cumulative units, and cumulative GPA. Standard errors clustered at the student level. One, two, and three asterisks indicate statistical significance at the 10, 5, and 1 percent levels, respectively.


5 Online Appendix: Additional Tables and Figures

Table A1: Correlations between Outcomes and Participant Characteristics

Outcome Variable                  Total Skipped   Score       Mean Char.
Female                            0.450***        -0.808***   0.616
                                  (0.167)         (0.221)
Age                               -0.006          0.050***    37.86
                                  (0.007)         (0.009)
Education:
  - AA                            -0.0301         -0.0192     0.165
                                  (0.251)         (0.310)
  - BA                            -0.513***       1.912***    0.367
                                  (0.197)         (0.246)
  - Postgrad                      -1.070***       3.086***    0.138
                                  (0.252)         (0.345)
Willingness to Take Risks (1-7)   0.001           -0.081      2.799
                                  (0.056)         (0.070)
Self-Reported Patience (1-7)      -0.132**        0.038       3.176
                                  (0.064)         (0.079)
Loss Averse                       -0.495          -1.111**    0.0562
                                  (0.326)         (0.482)
Risk Averse                       0.545**         1.296***    0.880
                                  (0.245)         (0.339)
Present Biased                    -0.173          0.607**     0.255
                                  (0.180)         (0.242)
Patient                           -0.838***       2.559***    0.149
                                  (0.211)         (0.291)
N                                 1904            1904        1904

Note: Each row shows results from a separate regression where we regress each outcome (total skipped or score) on each baseline characteristic. For education level, each outcome is regressed on a dummy variable for educational attainment (where the omitted category is high school degree or less). The third column shows the mean of the given characteristic across all MTurk participants. Robust SE. * p<0.10, ** p<0.05, *** p<0.01


Figure A1: Differences Between Loss and Gain Frame in Skipping by Question

Figure A2: Differences Between Loss and Gain Frame in Skipping - Order Question was Viewed

Note: The order of the 15 questions was randomized. This figure plots differences in the likelihood of skipping the questions in the order they

were viewed (i.e., the first, second, third, ... question viewed differs across participants).


Figure A3: Histogram - Self-Reported Effort by Frame


6 Online Appendix: Text from MTurk experiment

6.1 Recruitment text posted on MTurk

As part of this study, you will take a standardized test with sentence completion questions. After completing

the test, you will be asked to do a short survey. Your total monetary reward will be determined by your

performance on the standardized test.

Your total monetary reward will be determined by your performance on the standardized test. The details of

the rewards and scoring of the test will be shown once you begin this experiment. We expect this study to

take 15 minutes. Just for participating you will receive $0.01. The maximum additional payment based on

your performance that you will receive is $3.00.

At the end of the survey you will be provided a validation code. Please enter that validation code here where

it says “Provide the survey code here.”

6.2 MTurk experiment instructions, questions and answers, and post-exam survey

Thank you for your interest in our study! In order to participate, please read through the consent form on

the next page and check the box acknowledging your willingness to participate in the study.

As part of this study, you will take a standardized test with sentence completion questions. After completing

the test, you will be asked to do a short survey.

Your monetary reward will be determined by your performance on the standardized test. The details of the

rewards and scoring of the test will be shown later.

[Page break]

[Informed consent]


[Page break]

Scoring the test

The test will have 15 multiple-choice questions. You will have 3 minutes and 30 seconds to complete the

test, or 14 seconds per question on average.

Your performance on each section of the test will be combined into a single point total. Your monetary

reward for participating in the study will depend on this point total. You will be paid $0.10 for every point

you receive.

[Loss frame only:] You will start out with 15 points ($1.50). You will receive 1 point ($.10) for each correct

answer. You will receive 0 points for questions you skip/do not answer. You will lose 1 point (- $0.10) for

each incorrect answer.

[Loss frame only:] For instance, if you answer 8 questions correctly, skip 4 questions, and answer 3 incor-

rectly, your total number of points will be (15 + 8*1 - 3*1) = 20 points, and you will earn $2.00 at the end

of the survey.

[Gain frame only:] You will start out with 0 points ($0.00). You will receive 2 points ($.20) for each correct

answer. You will receive 1 point ($0.10) for questions you skip/do not answer. You will receive 0 points

($0.00) for each incorrect answer.

[Gain frame only:] For instance, if you answer 8 questions correctly, skip 4 questions, and answer 3 incor-

rectly, your total number of points will be (8*2 + 4*1) = 20 points, and you will earn $2.00 at the end of the

survey.

Before starting the exam, we want to make sure you understand how your exam performance determines the

amount of money you will receive. To demonstrate your understanding, please enter the amount of money

you would receive if your exam performance was as described below.


Question: How much money would you earn if you answered 9 questions correctly, 2 incorrectly, and you

skipped 4 questions? (do not include the “$” sign)

[Page break]

Correct! By answering 9 questions correctly, 2 incorrectly, and skipping 4 questions, you would earn $2.20.

Each question will have exactly ONE correct answer. You can skip a question either by leaving it blank OR

by clicking on the “Skip this question (or leave all choices blank)” option.

[Page break]

When you are ready to start the test, click on the arrow below to start the test. Once you do that, the timer

will begin. Once time has expired, you will not be able to work on the test any more.

[Loss frame only:] Remember you will start out with 15 points ($1.50). You will receive 1 point ($.10) for

each correct answer. You will receive 0 points for questions you skip/do not answer. You will lose 1 point (-

$0.10) for each incorrect answer.

[Gain frame only:] Remember you will start out with 0 points ($0.00). You will receive 2 points ($0.20)

for each correct answer. You will receive 1 point ($0.10) for questions you skip/do not answer. You will

receive 0 points ($0.00) for each incorrect answer.

GOOD LUCK!!

[Page break]

[Note: Order of questions randomized across participants]

Question: The professor became increasingly (blank) in later years, flying into a rage whenever he was

opposed. [taciturn / irascible / Skip this question (or leave all choices blank)]

Question: Many so-called social playwrights are distinctly (blank); rather than allowing the members of the

audience to form their own opinions, these writers force a viewpoint on the viewer. [didactic / conciliatory /

Skip this question (or leave all choices blank)]

Question: The flood of innovation that has engendered many of last decade’s technological breakthroughs

has also claimed some victims in its wake: companies once at the (i) (blank) of such innovation have now

become (ii) (blank). [(i) forefront - (ii) mostly obsolete / (i) periphery - (ii) increasingly relevant / Skip this

question (or leave all choices blank)]

Question: In a fit of (blank) she threw out the valuable statue simply because it had belonged to her

ex-husband. [contrition / pique / Skip this question (or leave all choices blank)]

Question: For a writer with a reputation for both prolixity and inscrutability, Thompson, in this slim

collection of short stories, may finally be intent on making his ideas more (blank) to a readership looking for

quick edification. [palatable / transcendent / Skip this question (or leave all choices blank)]

Question: Vain and prone to violence, Caravaggio could not handle success: the more his (i) (blank) as an

artist increased, the more (ii) (blank) his life became. [(i) notoriety - (ii) dispassionate / (i) eminence - (ii)

tumultuous / Skip this question (or leave all choices blank)]

Question: During their famous clash, Jung was ambivalent about Freud, so he attacked the father of modern

psychoanalysis even as he (blank) him. [understood / revered / Skip this question (or leave all choices blank)]

Question: Rubens, for all his high-flown rhetoric, churns out book reviews that have come to seem (blank):

from decades of critiquing others’ prose, he now relies on a familiar and tired formula. [perfunctory /

scathing / Skip this question (or leave all choices blank)]

Question: The government has taken exclusively ceremonial steps to alleviate illiteracy, poverty, and disease

in the countryside, (blank) the assumptions that the leadership only cares about its wealthy urban population.

[strengthening / disproving / Skip this question (or leave all choices blank)]

Question: Lacking sacred scriptures or (blank), Shinto is more properly regarded as a legacy of traditional

religious practices and basic values than as a formal system of belief. [dogma / relics / Skip this question

(or leave all choices blank)]

Question: Much of the consumer protection movement is predicated on the notion that routine exposure to

seemingly (blank) products can actually have long-term deleterious consequences. [banal / benign / Skip

this question (or leave all choices blank)]

Question: Fred often used (blank) to achieve his professional goals, even though such artful subterfuge

alienated his colleagues. [chicanery / bombast / Skip this question (or leave all choices blank)]

Question: Inured to the intense work engendered by the deadlines they normally faced, the production

managers felt somewhat (blank) by the temporary hiatus in orders. [wizened / disoriented / Skip this question

(or leave all choices blank)]

Question: In parts of the Arctic, the land grades into the landfast ice so (blank) that you can walk off the

coast and not know you are over the hidden sea. [imperceptibly / precariously / Skip this question (or leave

all choices blank)]

Question: The formerly (blank) waters of the lake have been polluted so that the fish are no longer visible

from the surface. [tranquil / pellucid / Skip this question (or leave all choices blank)]

[Page break]

Question #1: Out of the 15 questions, how many do you think you answered correctly?

Question #2: Suppose you were offered one of the following additional rewards for participating in the

study. Which do you prefer? [An additional $10 added to your winnings / A lottery where a coin is flipped

and if the coin comes up Tails, then an additional $20 are added to your winnings. If the coin comes up

Heads, then you receive no additional winnings]

Question #3: Suppose instead that you got a bonus $10 for participating in the study. Then, you were offered

a chance to flip a coin where if the coin comes up Tails, then you win an additional $10, but if the coin comes

up Heads, then you lose the bonus $10 (for $0 total). Would you take the chance and flip the coin? [Yes, I’d

risk losing the bonus $10 and flip a coin to win an additional $10 (for $20 total) / No, I don’t want to flip a

coin and I’ll take the bonus $10]

Question #4: Suppose instead you were offered one of the following additional rewards for participating in

the experiment. Which do you prefer? [$10 given today / $11 given in one month]

Question #5: Suppose instead you were offered one of the following additional rewards for participating in

the experiment. Which do you prefer? [$10 given in 5 months / $11 given in 6 months]

Question #6: What is your age (in years)?

Question #7: What gender do you identify yourself as? [Male / Female / Non-Binary]

Question #8: Is English your first language? [Yes / No]

Question #9: What is the highest degree or level of school that you have COMPLETED? [Did not complete

high school / High school diploma (including GED or alternative credential) / Associate’s degree /

Bachelor’s degree / Postgraduate degree]

Question #10: How willing are you to take risks in general? (On a scale from 1, unwilling to 7, very willing)

Question #11: How patient are you in general? (On a scale from 1, extremely impatient, to 7, extremely

patient)

Question #12: How much effort did you exert on this exam? (On a scale from 1, very little effort, to 7, a lot

of effort)

Question #13: Have you participated in previous research studies on MTurk? [Yes / No]

Please enter your Mechanical Turk Id (this will be needed to compute your payment).
