Ambiguity in Performance Pay: A Real Effort Experiment
April 20, 2016
David J. Cooper Florida State University
David B. Johnson
University of Central Missouri
Abstract: Many incentive contracts are inherently ambiguous, lacking an explicit mapping between
performance and pay. Using an online labor market, Amazon Mechanical Turk, we study the effect of
ambiguous incentives on willingness to accept contracts to do a real-effort task, the probability of
completing the task, and performance at the task. Even modest levels of ambiguity about the relationship
between performance and pay are sufficient to eliminate the positive selection effect associated with piece
rates, as high ability individuals are no more likely than low ability individuals to accept a contract. Piece
rate contracts significantly improve performance relative to fixed wages, primarily due to selection, but this
positive effect is not present with ambiguous incentive contracts. Modest levels of ambiguity reduce the
probability that subjects accept an incentive contract and all types of ambiguous incentive contracts increase
the probability of quitting after having accepted an incentive contract. Information about individual ability
at the task reduces the probability that subjects choose and complete the task.
Acknowledgements: We would like to thank Javier Cano-Urbina, Yan Chen, Emel Filiz Ozbay, Anastasia
Semykina, and seminar participants at Chapman University, EWEBE, Florida State University, and the
University of Maryland for helpful comments. We would like to thank Joe Stinn for his help as a research
assistant and Sue Isaac for attempting (unsuccessfully) to improve our shaky grasp on the English language.
Keywords: ambiguity, incentives, performance pay, quitting, experiment
JEL Classification: C9, J33, D81
1. Introduction
Incentive contracts tying pay to performance are ubiquitous in both the public and
private sectors. In some cases, these contracts are extremely explicit, making the relationship between
performance and pay unambiguous. Sales commissions and piece rate systems are well-known examples
of this sort of incentive contract. At the opposite end of the spectrum are incentive schemes where
employees are completely uninformed about the relationship between their performance and pay. The
“black box” compensation used by some large law firms (e.g. Jones Day) provides a striking example of
this type of incentive contract. Herrmann (2012) describes black box compensation as follows:
“Under this system, the managing partner (or a small committee) sets compensation for each partner in the firm. There is no specific formula for allocating the spoils, and partners are forbidden from discussing their compensation with each other. Each partner is told what he’ll make in the coming year … and the process is over.”
Most incentive schemes fall somewhere between these extremes. While employees have an idea about
what constitutes good performance and what the relationship is between performance and pay, the precise
definitions of good performance and the relationship between performance and pay may be ambiguous. An
example that academics are painfully familiar with is the tenure system. While senior faculty can usually
give general guidance about what is necessary for tenure, could they state a precise rule? More publications
are better than fewer, but exactly how many are needed to get tenure? Better quality publications count for
more, but exactly how is quality measured?
Continuing the tenure example, novice hires often face uncertainty not just about the incentive system,
but also about their likely performance. Suppose a newly hired economics professor is told that they need
six publications in top field journals or better. It seems unlikely that a newly minted PhD knows precisely
how difficult this task is or exactly how good they are at this task.
The preceding examples suggest that workers face ambiguity as well as risk. If they know neither their
ability at their job nor precisely how their performance will translate into pay, it is hard to imagine that
workers know the probability distribution generating their performance pay. Our primary goal is to study
how ambiguity about incentive contracts affects workers. Specifically, how does ambiguity affect the
willingness of workers to take a job, the likelihood that they quit a job, and their performance at this job?
To address the effects of ambiguous incentive plans, we conduct an online experiment, recruiting
subjects from Amazon Mechanical Turk (AMT). Subjects choose between two real effort tasks: coding
messages drawn from Charness and Dufwenberg (2006) or transcribing printed words into a text box.
Coding messages is a challenging task where it is difficult for a subject to tell how well they are doing,
making it a good task for inducing ambiguity about ability and performance. Subjects who choose the
coding task earn a bonus which varies based on their performance. The transcription task is a sure thing.
It takes a fixed amount of time to complete, identical to the time for the coding task, and yields a fixed and
known bonus.
We vary subjects’ information about the coding task along two dimensions: (1) the relationship between
their performance and their bonus and (2) their likely performance. There are five treatments along the first
dimension, two involving no ambiguity and three varying the level of ambiguity. The true bonus system
for all treatments (except the fixed bonus treatment) is a 9¢ piece rate for each message coded correctly.
1. Fixed (FIXED): Subjects receive a fixed and known amount for the coding task.
2. Piece Rate (PR): Subjects receive a known piece rate (9¢) for each message coded correctly. They
know the rule for determining whether a coding is correct.
3. Low Ambiguity (LA): Subjects know the rule for determining whether a message is coded
correctly. They know the minimum ($0.00) and maximum ($1.35) possible bonus and that their
bonus depends on how well they do at coding, but do not know the relationship between the number
of messages coded correctly and the bonus (low ambiguity).
4. High Ambiguity, $1.35 Maximum (HA135): Subjects do not know the rule for determining whether
a message is coded correctly. They know the minimum ($0.00) and maximum ($1.35) possible
bonus but have no information about how this bonus is determined.
5. High Ambiguity, $1.00 Maximum (HA100): This is the same as the preceding treatment, except
subjects are told that the maximum possible bonus is $1.00.
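For concreteness, the true bonus rule shared by the performance-based treatments can be written in a few lines (a sketch in Python; the function and variable names are ours, and amounts are in cents to avoid rounding):

```python
# True bonus rule in all treatments except FIXED (amounts in cents):
# a 9-cent piece rate per correctly coded message, 15 messages total.
PIECE_RATE_CENTS = 9
N_MESSAGES = 15

def piece_rate_bonus_cents(n_correct):
    """Bonus earned for coding n_correct of the 15 messages correctly."""
    assert 0 <= n_correct <= N_MESSAGES
    return PIECE_RATE_CENTS * n_correct

# The stated $1.35 maximum in PR, LA, and HA135 is simply the piece rate
# applied to all fifteen messages:
assert piece_rate_bonus_cents(N_MESSAGES) == 135  # cents, i.e. $1.35
```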
All subjects code five practice messages prior to selecting a task, providing a good measure of their
ability at the coding task. In the fixed, piece rate, and low ambiguity treatments we vary whether subjects
receive feedback about their performance on the practice messages (i.e. how many practice messages they
code correctly). We never give feedback in the high ambiguity treatments as this could undo some of the
ambiguity about the incentive contract in these treatments.
Our experiment is conducted within a real labor market where quitting is a natural feature. Our subjects
are not in a lab, where leaving is difficult and carries psychological discomfort.1
If subjects do not like the experiment or think they can make more money doing something else, they
can switch to another webpage or another task on AMT. This behavior was not uncommon. About 11%
of the subjects who chose the coding task quit before completing it even though the instructions and
materials heavily stress that this means forfeiting their pay. A number of our results involve the effects of
our treatments on the likelihood of quitting.
1 Subjects in lab experiments are routinely told that they can leave at any time with no consequence other than foregone earnings, but few subjects do so. The procedures in lab experiments that come closest to quitting in AMT are real effort experiments where subjects can leave the lab as a natural part of the experiment (e.g. Abeler, Falk, Götte, and Huffman, 2011; Rosaz, Slonim, and Villeval, 2012) or can earn money by not doing the experimental task (e.g. Cooper, D’Adda, and Weber, 2013).
Drawing on the smooth ambiguity model of Klibanoff, Marinacci, and Mukerji (2005), adapted to our
experiment, we make several predictions about the experimental results.
1) The probability of choosing and completing the coding task will decrease as ambiguity of the bonus
system increases (holding the maximum possible bonus fixed).
2) The probability of choosing and completing the coding task will decrease in the high ambiguity
treatments when the maximum possible bonus is reduced from HA135 to HA100.
3) The predicted effect of feedback will depend on whether its primary effect is to reduce uncertainty
or to reduce subjects’ overconfidence about their ability at the coding task. In the former case the
effect of feedback on the probability of choosing and completing the coding task is predicted to be
positive while in the latter case a negative effect is predicted.
4) Selection of more able individuals into the coding task will occur for all treatments except FIXED,
but the selection effect will be weakened by ambiguity relative to the piece rate treatment.
5) The number of messages coded correctly, conditional on choosing and completing the coding task, will decrease with higher ambiguity.
The data provides mixed support for our predictions about choosing and completing the coding task.
On the positive side, subjects are significantly less likely to select and complete the coding task in the low
ambiguity (LA) treatment than the piece rate (PR) treatment. Also as predicted, willingness to select and
complete the coding task is significantly lower under high ambiguity when the maximum possible bonus is
reduced from HA135 to HA100. Contrary to our prediction, subjects are significantly more likely to
select and complete the coding task in HA135 than LA.
Providing subjects with feedback about their performance on the practice messages significantly lowers
willingness to select and complete the coding task, although only in the cases where performance is relevant
for the bonus (i.e. not FIXED). This suggests the primary effect of giving feedback is to reduce subjects’
overconfidence about their abilities.
High ability individuals (as measured by performance on the practice questions) are significantly more
likely than low ability individuals to choose and complete the coding task under piece rate incentives (PR).
This contrasts with the fixed bonus treatment (FIXED) where there is no selection based on ability. The
positive selection occurring in PR vanishes with ambiguity about the incentive contract. At best we find no
selection with ambiguity (LA and HA135), and in one case (HA100) the effect reverses with high ability
subjects significantly less likely to select and complete the coding task than low ability subjects. If we do
not control for selection, performance on the coding task by subjects who choose and complete it is
significantly higher in PR than in FIXED. This is not true for any of the treatments with ambiguous
incentive contracts. All treatment differences in performance disappear once we control for selection.
Our paper contributes to several existing literatures. There is a sizable literature using experiments
with real effort tasks to study the effects of incentive contracts on selection and/or performance. Notable
recent examples include Cadsby, Song, and Tapon (2007), Niederle and Vesterlund (2007), Eriksson,
Teyssier, and Villeval (2009), Dohmen and Falk (2011), Gill, Prowse, and Vlassopoulos (2012), Buser and
Dreber (2015), and Johnson and Weinhardt (2015). Our innovation is to focus on how changing subjects’
information about the relationship between performance and pay affects behavior. Ambiguity about
performance pay affects subjects’ choices along a number of dimensions, and on one critical dimension our
results differ considerably from the standard findings. A common result in the literature is that piece rate
incentives improve performance relative to fixed wages. This is largely due to selection as high ability
workers who benefit more from a piece rate system are more likely to take such contracts (e.g. Lazear,
2000; Barro and Beaulieu, 2003; Cadsby, Song, and Tapon, 2007; Dohmen and Falk, 2011). These results
change when there is ambiguity about the relationship between performance and pay. The selection of high
ability types into the coding task vanishes with ambiguity, and the impact of performance pay on
productivity (relative to fixed pay) is no longer significant.
Another innovation of our experiments is to examine an aspect of performance that has received little
previous attention in the experimental literature: quitting the task.2 Quits matter a great deal for real firms.3
An advantage of the piece rate contract relative to a fixed bonus is a significant reduction of the quit rate.
This advantage over fixed pay is weaker with low ambiguity about the relationship between performance
and pay and largely vanishes with high ambiguity.
While not the primary purpose of our work, we contribute to the literature studying choice and
ambiguity. Numerous theories have been developed to explain ambiguity aversion,4 and a number of
experiments have tried to discriminate among the various theories. The results indicate a great deal of
heterogeneity, with no one theory explaining the behavior of all subjects.5 The reactions of our subjects to
ambiguity depend on the nature of the ambiguity. The negative effect of low ambiguity on the probability
2 There exist a number of papers in the labor literature on quitting (e.g. Griffeth, Hom, and Gaertner, 2000; Abowd, Kramarz, and Roux, 2006; Abraham, Spletzer, and Harper, 2010; Farber, 2010; Buchinsky, Fougere, Kramarz, and Tchernis, 2010). There also exists a related experimental literature on quitting in tournament settings (Eriksson, Poulsen and Villeval, 2009; Fershtman and Gneezy, 2011; Rosaz, Slonim, and Villeval, 2012).
3 In the United States, somewhat more than two million individuals quit their jobs each month (Bureau of Labor Statistics, 2013). One recent survey reported that “… the number of employees making a New Year's resolution to get a new job jumped to 38 percent [for 2013].” (Perman, 2013). Further, a recent news article reports that employee turnover costs the employer approximately 16 to 20 percent of the employee’s annual salary, with the percentage being larger for high salaried workers (Lucas, 2012).
4 For examples, see Hurwicz, 1951; Segal, 1987; Gilboa and Schmeidler, 1989; Schmeidler, 1989; Ergin and Gul, 2004; Halevy and Feltkamp, 2005; Ahn, 2008; Klibanoff, Marinacci, and Mukerji, 2005 and 2009; Nau, 2006; Seo, 2009; and Cerreia-Vioglio, Maccheroni, and Marinacci, 2011.
5 See Halevy (2007), Ahn, Choi, Gale, and Kariv (2009), Chen, Katuščák, and Ozdenoren (2007), Binmore, Stewart, and Voorhoeve (2012), Charness, Karni, and Levin (2013), Charness and Gneezy (2010), and Abdellaoui, Baillon, Placido, and Wakker (2011).
of choosing and completing the coding task relative to the piece rate treatment is consistent with our
adaptation of the smooth ambiguity model, but the positive effect of moving from low to high ambiguity
(LA vs. HA135) does not fit well with any standard model. We conjecture that the type of extreme
ambiguity subjects encounter in the high ambiguity treatments leads to the adoption of simple rules of
thumb such as valuing the coding task at the midpoint of the high and low possible bonuses (as implied by
the principle of insufficient reason). The high ambiguity treatment with a low maximum possible bonus
(HA100) is included in the experimental design to examine this hypothesis. The significant difference
between HA135 and HA100 is consistent with our conjecture.
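The rule-of-thumb conjecture implies simple back-of-envelope values (our arithmetic, in cents): a subject valuing the coding task at the midpoint of the possible bonuses would place it well above the 30–50 cent outside option in HA135 but right at the top of that range in HA100.

```python
# Midpoint valuation implied by the principle of insufficient reason,
# in cents (our illustration of the conjecture in the text).
def midpoint_value_cents(lo, hi):
    return (lo + hi) / 2

assert midpoint_value_cents(0, 135) == 67.5  # HA135: above the 30-50c outside option
assert midpoint_value_cents(0, 100) == 50.0  # HA100: at the top of that range
```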
There are a number of potential advantages of ambiguous incentive contracts, including those generally
advanced for incomplete contracts (e.g. reducing the cost of writing complex contracts, flexibility to respond
to unforeseen circumstances) as well as reasons related to other-regarding behavior (e.g. black box
compensation is intended to limit infighting over compensation). The results of our paper suggest that there
are costs to ambiguous incentive contracts as well. The low ambiguity treatment is more realistic than the
high ambiguity treatments since workers typically have an idea about the basis of evaluation, but often
don’t know the precise mapping between performance and pay. Our results imply that low levels of
ambiguity impose costs on firms. The unsurprising cost is that contracts with low ambiguity are
unattractive, implying that it will be more difficult to hire and retain workers. Less obvious, ambiguity
about performance pay removes one of the greatest advantages of piece rate systems as there is no longer
selection in favor of high ability workers. This reduces the quality of workers a firm hires and weakens the
positive effect of performance pay on performance. It remains an open question whether the benefits of
ambiguous incentive contracts outweigh the costs.
2. Procedures, Experimental Design, and Treatments
A. Amazon Mechanical Turk: Our experiment is conducted using the online labor market Amazon
Mechanical Turk (AMT). Workers (our subjects) on AMT complete short jobs, referred to as “Human
Intelligence Tasks” (HITs), and are paid through PayPal. We are not the first researchers to use AMT or
other online labor markets (e.g. Odesk, Elance, Crowdflower, and Guru) as a platform for conducting
experiments, and although this remains relatively rare within economics the platform is being used more
often.6 Using AMT has several advantages for us, listed in ascending order of importance. Our description
of AMT reflects the population, rules, and norms at the time the experiments were run in 2012. As current
users may notice, some details have changed over time.
First, AMT allows us to access a more diverse set of subjects than are typically present in laboratory
experiments. Table 1 summarizes the characteristics of our subjects. Our subjects’ ages range from 18 to
6 Economics studies using online labor markets include Chen, Ho, and Kim (2010), Liu, Yang, Adamic, and Chen (2011), Paolacci, Chandler, and Ipeirotis (2010), Horton, Rand, and Zeckhauser (2011), and Dean and McNeill (2014).
69. A total of 31 countries are represented, with the majority coming from India and a large minority from
the USA. Most are college educated and 40% report incomes in excess of $25,000 per year, a figure which
jumps to 71% for American subjects.7
[Insert Table 1]
Second, we take advantage of AMT’s low costs to run an unusually large number of subjects. This is
especially important for us because we build an instrument into our experimental design and then use a
Heckman correction to control for the effects of selection on performance in contracts with ambiguity.
Without correcting for selection we have little hope of accurately measuring the effect of contract types on
performance, but using the Heckman correction makes high demands on the power of our study. Using
AMT makes it possible to do this without taking out a second mortgage.
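To fix ideas, a minimal version of the two-step selection correction looks as follows. This is our illustrative sketch on simulated data, not the paper's actual specification; the randomized outside-option payoff plays the role of the instrument, shifting task selection without entering the performance equation.

```python
# Two-step Heckman selection sketch on simulated data (illustrative only).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
ability = rng.normal(size=n)                # practice-question score (standardized)
outside = rng.choice([30, 40, 50], size=n)  # Task B payoff in cents (the instrument)

# Correlated errors: selection and performance shocks share a common component.
u = rng.normal(size=n)
e = 0.5 * u + rng.normal(scale=np.sqrt(0.75), size=n)

# Selection: choose (and complete) the coding task.
select = (0.5 + 0.8 * ability - 0.03 * outside + u) > 0
# Performance, observed only for subjects who selected the coding task.
perf = 5.0 + 2.0 * ability + e

# Stage 1: probit of selection on ability and the outside option.
X1 = np.column_stack([np.ones(n), ability, outside])

def negll(g):
    p = np.clip(norm.cdf(X1 @ g), 1e-10, 1 - 1e-10)
    return -np.sum(select * np.log(p) + (~select) * np.log(1 - p))

gamma = minimize(negll, np.zeros(3), method="BFGS").x

# Inverse Mills ratio for the selected subsample.
xb = X1[select] @ gamma
imr = norm.pdf(xb) / norm.cdf(xb)

# Stage 2: OLS of performance on ability plus the Mills ratio term.
X2 = np.column_stack([np.ones(select.sum()), ability[select], imr])
beta, *_ = np.linalg.lstsq(X2, perf[select], rcond=None)
print(beta)  # intercept, ability effect (true value 2.0), selection term
```

Without the `imr` column, the OLS estimate of the ability effect would be contaminated by the correlation between the selection and performance errors; the exclusion restriction (the outside option appears only in stage 1) is what identifies the correction.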
The final and most important advantage of using AMT is that our experiments take place within an
existing labor market. This lets us take advantage of a naturally occurring feature of this market that is
present in many real labor markets, the ability of subjects to quit after accepting a contract. Like many
experiments, our work does not fit neatly into a classification of lab versus field experiments. Our subjects
know that they are participating in a study, but we intentionally use the above feature of the underlying
labor market that was naturally occurring rather than part of the experimental design.
There are several concerns with experiments run on AMT. The scale of payoffs is much lower than
experimenters typically use in laboratory studies. The average subject who completed the experiment only
earned $0.62. This was reasonable compensation by AMT standards at the time, but low compared to what
a college student in the US or Europe typically gets paid for participating in an experiment. Even given
that this was a fast experiment (subjects averaged 23 minutes to complete the entire study), the average
hourly earnings were only $1.62. This raises the possibility that the incentives were not salient for
the subjects. However, virtually all of our subjects (97%) reported that earning money was one of their
motivations for taking HITs on AMT8 and our subjects responded strongly to changes in incentives (see
Result 3). These two observations indicate that the incentives were salient.
Another concern is loss of control due to conducting the study online. Unlike a laboratory study, we
cannot be sure that subjects were not consulting outside sources. We intentionally choose a task where
7 Indian subjects had significantly lower incomes than the other subjects (t = 12.19; p < .01). Only 24% of Indian subjects report an income in excess of $25,000 as opposed to 62% of other subjects.
8 The pre-experimental survey asked subjects why they complete tasks on AMT. Subjects chose from six options (Ipeirotis, 2010). Subjects could check as many or as few items as desired. We classify a subject as motivated by earning money if they choose “fruitful way to spend free time and get some cash,” “for ‘primary’ income purposes (e.g. gas, bills, groceries, credit cards)”, “for ‘secondary’ income purposes, pocket change (for hobbies, gadgets, going out),” or “I am currently unemployed, or have only a part time job.” For the final category, a subject is only classified as motivated by earning money if they did not also check a category inconsistent with a desire to earn money (i.e. “to kill time” or “I find the tasks to be fun”).
consulting the internet would not be helpful. Subjects are not told that Charness and Dufwenberg (2006)
is the source of the game and are not given any information that would make it easy to find
this information via a Google search.
AMT has a number of specific features that differ from those faced in a typical lab setting. Many
elements of our design are intended to prevent a loss of control due to these features. These AMT specific
modifications are described in Section 2.D.
A number of studies compare AMT and laboratory experiments. In one well-known example,
Horton, Rand, and Zeckhauser (2011) replicate a number of standard experiments on AMT and report
results that look little different from the typical results of laboratory experiments. A number of subsequent
papers report similar findings when comparing lab and online studies (Suri and Watts, 2011; Amir and
Rand, 2012; De Winter, Kyriakidis, Dodou, and Happee, 2015; Wang, Huang, Yao, and Chan, 2015). We
doubt that the use of AMT has a major effect, one way or the other, on the conclusions we reach from our
experiments.
B. Tasks: Subjects have the option of participating in two tasks: coding messages (Task A) or a simple
transcription task (Task B).
Figure 1: Trust Game with Hidden Actions
Task A uses data from Charness and Dufwenberg (2006). In the Charness and Dufwenberg
experiments, subjects played the trust game with hidden actions shown in Figure 1. Player A starts the
game by choosing whether to trust Player B (by choosing In) or not (by choosing Out). If Player A chooses
In, Player B can either be trustworthy (by choosing Roll) or not (by choosing Don’t).
The critical feature of the Charness and Dufwenberg data for our experiment is the pre-game messages
sent by Player Bs. These can be used to persuade Player A to choose “In” by promising to choose “Roll”.
Charness and Dufwenberg report all of the messages verbatim in the published version of their paper. Task
A asks subjects to code fifteen of these messages. For each message, the subject is shown a JPG image of
a hand-written copy of the text (see Figure 2). We use hand-written copies of the messages as a form of
human verification.9 This makes it more difficult to copy and paste the text into a search engine or a
translation program, as well as making it impossible for an internet bot to complete the task. Subjects are
asked to classify messages into six possible categories, shown below the text of the message (see Figure 2).
These categories are drawn from a pre-experimental coding of the fifteen messages done by the co-authors.
Subjects are free to check multiple categories for a message, a possibility that is stressed in the instructions.
They also have the option to indicate that none of the six categories apply. All subjects who choose Task
A see the same fifteen messages in the same order. The fifteen messages are chosen to vary the content,
length, and difficulty of coding.
Figure 2: Task A Example
We score a message’s coding as correct if it matches the modal coding for subjects who code that
particular message.10 This scoring rule is based on the coding methodology developed by Houser and Xiao
(2011). Coding is an inherently subjective exercise, but for all fifteen questions the modal coding is similar
to what we would choose as informed experts. We think a neutral expert would agree that subjects credited
with more correctly coded messages had performed better.
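The scoring rule can be made concrete with a short sketch (ours; the category names and toy data are invented for illustration): a subject's coding of a message counts as correct when it matches the most frequent set of categories checked for that message.

```python
# Modal-coding scoring rule (illustrative; category names are invented).
from collections import Counter

def modal_coding(codings):
    """Most frequent coding of one message; a coding is a set of categories."""
    return Counter(map(frozenset, codings)).most_common(1)[0][0]

def score_subject(subject_codings, codings_by_message):
    """Number of messages where the subject matches the modal coding."""
    return sum(
        frozenset(c) == modal_coding(codings_by_message[m])
        for m, c in enumerate(subject_codings)
    )

# Toy data: two messages, three coders each.
codings_by_message = [
    [{"promise"}, {"promise"}, {"empty talk"}],
    [{"promise", "appeal"}, {"appeal"}, {"promise", "appeal"}],
]
print(score_subject([{"promise"}, {"appeal"}], codings_by_message))  # prints 1
```

In the toy data the subject matches the modal coding of the first message but not the second, so one of two codings is scored as correct.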
We choose this particular task, coding messages from experimental data, for several reasons. We want
a real effort task that can easily be done in an online environment, but not one where the subjects can use
access to the internet or their computer to get help. Part of the experimental design involves giving subjects
information (based on practice questions) about their ability to perform the task prior to choosing a task.
9 The messages were hand copied by multiple RAs, so the handwriting varied across messages. Human verification such as CAPTCHAs is designed to verify that information is being entered by human users rather than internet bots.
10 For providing feedback on the practice questions, we used data from an initial set of the sessions without feedback to determine whether a coding was correct. The modal coding of the five practice questions for this subset of the data matches the modal coding for all experimental participants, so the feedback was accurate.
We therefore need a task where performance can be evaluated in a clear fashion, but a subject will not be
able to determine their performance without the experimenters giving them information. Given the
subjective nature of coding messages, there is no way a subject could know their performance without being
told. This inherent ambiguity matches the type of situation we have in mind in the real world, where it isn’t
necessarily obvious how well somebody is doing at their job.
The Charness and Dufwenberg game is specifically chosen for several reasons. The messages are
publicly available and de-identified, easing IRB approval of the study. The game is relatively simple and
easy to explain. The messages are short enough to be easily read and coded, but categorization is
sufficiently difficult that subjects were not faced with a trivial task.
In Task B, the transcription task, subjects are shown a series of fifteen words. Each word has to be
typed into a text box. The words are shown as JPG images to prevent copying and pasting or use of bots.
We use typed text for the JPG images rather than hand-written versions to facilitate transcription. The
transcription task is intended to be a sure thing, and we observed no cases of incorrect transcription. Task B
takes the same amount of time to complete as Task A, a point which is emphasized to the subjects, and
requires some effort (albeit less than Task A) as the subject must pay attention to new words appearing and
type them correctly into the text box.
C. Procedures and Experimental Design: Experimental “sessions” are posted on AMT in small batches
of 15 to 25 HITs. We do not hide the fact that the HITs are part of a study. Given the large number of
studies posted on AMT and the nature of the tasks subjects are asked to complete, it seems unlikely that
subjects would fail to guess that they are participating in a study.
Subjects are told up front that the experiment consists of two phases lasting 5 minutes and 8 – 10
minutes respectively, with the first phase of the experiment paying a 20 cent flat wage for completing a
survey and the second phase of the experiment paying an unspecified bonus for doing a task.11 It is stressed
to the subjects that they will only be paid if they complete both phases of the experiment.
Figure 3: Experiment Design
Figure 3 illustrates the timing of the experiment. After indicating consent, subjects begin the first phase
of the experiment by filling out a survey. This collects demographic information (age, gender, nationality,
11 The time estimates were optimistic as subjects spent more time than we expected for the second phase.
[Figure 3 timeline: Survey → Practice Task → Feedback/No-Feedback → Task Selection]
income, and educational background) and their reasons for working on AMT. The survey also contains a
questionnaire measuring their risk attitudes, and a measure of ambiguity aversion. Seven of the questions
measuring risk attitudes are taken from the German Socio-Economic Panel (GSOEP). These questions ask
about subjects’ willingness to take risks in general as well as in a number of narrower domains (e.g. driving
a car). Dohmen, Falk, Hoffman, Sunde, Schupp, and Wagner (2011) validate these questions as a measure
of risk attitudes. We also measure risk by asking subjects about their willingness to make a risky investment
if they had a financial windfall. This is a hypothetical version of the investment decision used by Gneezy
and Potters (1997). To measure ambiguity aversion, we present subjects with the standard pair of
hypothetical questions used to generate the Ellsberg paradox (Ellsberg, 1961):
“Suppose there is a bag containing 90 balls. You know that 30 are red and the other 60 are a mix
of black and yellow in unknown proportion. One ball is to be drawn from the bag at random. You
are offered a choice to (a) win $100 if the ball is red and nothing if otherwise, or (b) win $100 if
it's black and nothing if otherwise. Which do you prefer?”
“The bag is refilled as before, and a second ball is to be drawn from the bag at random. You are
offered a choice to (c) win $100 if the ball is red or yellow, or (d) win $100 if the ball is black or
yellow. Which do you prefer?”
Choices of (a) and (d) are inconsistent with subjective expected utility and can be interpreted as
evidence of ambiguity aversion (e.g. Hogarth and Villeval, 2010; Dominiak, Dürsch, and Lefort, 2012).
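The classification of responses described above reduces to a one-line check (our helper function; the choice labels follow the lettering in the questions):

```python
# Ellsberg-paradox classification: the (a, d) pattern is inconsistent with
# subjective expected utility and is read as evidence of ambiguity aversion.
def ambiguity_averse(first_choice, second_choice):
    """True iff the subject chose (a) in the first question and (d) in the second."""
    return (first_choice, second_choice) == ("a", "d")

assert ambiguity_averse("a", "d")      # the ambiguity-averse pattern
assert not ambiguity_averse("b", "d")  # consistent with believing P(black) > P(red)
```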
Copies of all experimental materials, including the survey, can be seen in Appendix A.
After completing the survey, subjects are given descriptions of Task A (coding) and Task B
(transcription). The description of Task A includes a description of Charness and Dufwenberg’s trust game
with hidden actions, how to perform the coding, and the coding categories. Subjects view several examples
of messages and coding. We intentionally do not give subjects a citation to Charness and Dufwenberg or
the name of the game to make it difficult to search for additional information on the web. The description
of Task B (transcription) is shorter given the trivial nature of the task, and focuses on making it clear that
Task B is a sure thing. Before selecting a task, subjects code five messages for practice. Subjects are then
asked how they thought they did at coding the practice messages in comparison to other subjects. The
wording of this question is deliberately vague to avoid reducing the ambiguity in the high ambiguity
treatments (“How well do you think you performed the practice task in comparison to other Turkers?”).
Responses are on a Likert scale ranging from 0 (“Well below average”) to 4 (“Well above average”), with
2 representing “average” performance. The mean response is 2.42, indicating subjects on average think
they are better than average. Regardless of treatment, subjects do not have information about their
performance on the practice questions prior to answering this question.
At this point, subjects begin the second phase of the experiment by choosing between Task A or Task
B. Task B serves as an outside option. It has a fixed and known payoff. We randomly vary the payoff for
Task B between 30, 40, and 50 cents. This is resolved in advance and subjects know the value of the outside
option.12 Based on calibration exercises we expect the average bonus for Task A to be about 40 cents,13 so
the average payoff for Task B is set to roughly match the average for Task A. Varying the payoffs for Task
B gives us an exogenous source of variation that can be used to identify a selection equation. The
instructions stress that Task B is a sure thing and takes the same amount of time as Task A. In other words,
subjects are given no reason to base their task choice on one task taking less time to complete than the other.
The treatments, described in Section 2.E, vary what information subjects have about Task A prior to
choosing a task.
Task A consists of coding 15 messages. Subjects are forced to spend at least 30 seconds on each
message before they can continue. This parallels Task B, where subjects have to spend at least 30 seconds
on each screen before the next word appears. There are 15 words in Task B, so both tasks require 7½
minutes to complete if subjects do not go beyond the required 30 seconds. Although subjects consistently
ran over the required 7½ minutes, the average time to complete the experiment was the same for Task A
and Task B. Consistent with what subjects are told when choosing a task, the two tasks take the same
amount of time.
This is a between-subjects design, so each subject participates in a single treatment, and subjects cannot
participate in the experiment more than once. We drop 19% of the observations (167/863) as unusable.
Most of these observations (151/167) are dropped due to subjects failing the screen for English
comprehension as described in Section 2.D.14
Table 2 summarizes the number of subjects recruited for each cell of each treatment and how many
usable observations we have for each cell. With a few exceptions described below, 35 subjects were
recruited for each cell.
[Insert Table 2]
After selecting a task, subjects finish the experiment by performing their selected task. To get paid,
subjects have to complete the task. Unlike a lab experiment, subjects can quit the experiment by leaving
the website or prematurely submitting the HIT. We have 106 observations (out of 696 usable observations)
where subjects quit after starting one of the tasks. The instructions and materials feature prominent
12 Subjects are not told that other values were possible for the outside option to avoid any mistaken impression that the outside option was a gamble.
13 By pure luck, the average bonus for Task A in our data is exactly as calibrated.
14 Eleven subjects are dropped due to filling out less than half of the survey, making their data unusable. Responses to the survey are bimodal: subjects either fill out the entire survey or, in these eleven cases, almost nothing. Five observations were dropped from subjects who took advantage of a recruiting glitch to participate a second time.
reminders that subjects must complete the entire task to get paid, so we think it is unlikely that many quits
are due to errors. Quitting a HIT is like quitting any job, and presumably reflects the subject’s judgment
that the opportunity cost of completing the HIT outweighs the benefits. We were aware of quitting when
we designed the experiments and view it as a natural feature of this and many employment environments.
Rather than treating quits as a bug, our approach has been to try to understand how the experimental
treatments affect the likelihood of quitting.
Bonuses are paid within a few days of completing the experiment. Subjects know that payment of
the bonus will be delayed. This is not unusual for AMT and would not have surprised the subjects.
D) AMT-Specific Design Issues: Our experiments contain a number of features addressing AMT-specific
issues. Our subjects come from all over the world. The experimental materials inform subjects that English
fluency is required to complete the HIT, but this requirement is unenforceable. The survey therefore
includes two questions testing subjects’ English comprehension. Both questions have only one correct
answer and the correct answer is given as part of the question. For example, one of these questions reads,
“Over the weekend, Bob watched two football games. On the scale below, mark the number of football
games Bob watched over the weekend.” Any subject who reads this question carefully and understands
English should be able to answer it correctly.
There has been a persistent problem on AMT with workers who try to complete tasks as fast as possible
to maximize their revenues, clicking buttons rapidly without spending much time to check whether their
responses make any sense. AMT actively tries to screen out workers who have a history of providing poor
quality work. Our HIT is not particularly attractive for this type of worker as it is relatively intricate and
has built in timers that prevent subjects from clicking through as fast as possible. The English
comprehension questions also screen for this sort of behavior. The answers for the two English
comprehension questions are located in different columns. This prevents subjects who are mechanically
checking the first or last columns from passing the comprehension test without actually reading and
understanding the questions.
We drop observations from subjects who do not correctly answer the two English comprehension
questions. We suspect that most dropped subjects comprehend English but weren’t reading the questions
carefully. Given the nature of our experiment, we do not want to use data from subjects who are not actually
reading the material.
Another concern with using AMT is that subjects either use multiple accounts simultaneously to be
paid for a task more than once or collude with other AMT workers by sharing information (e.g. how to code
the messages).15 We took a number of steps to address these concerns. To limit collusion, the posting of
15 Because subjects are paid via their Amazon account (each AMT account has to be linked to a unique email address and PayPal account), the same person can only participate more than once by using multiple accounts. We explored
HITs was scattered over time with only a limited number of slots available at any point in time. This makes
it difficult for a group to collude since only a small number of slots were posted at any given time and the
pool of potential workers is large. We also monitored online forums for evidence that our HIT was being
discussed. We found no evidence that information about the HIT (specifically, information about how to
code the messages or how the bonus was determined) was being shared. Ex post, we checked our data for
autocorrelation. If subjects worked together or simultaneously completed multiple versions of the
experiment, we should see correlation across time of completion. There is no significant autocorrelation
either in task selection or performance.
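The ex post check can be sketched as follows. The data below are synthetic and purely illustrative (the paper does not publish the underlying series): observations are ordered by completion time, and collusion or duplicate accounts would show up as positive lag-1 autocorrelation in task choices.

```python
# Our sketch of the autocorrelation check; `choices` is invented data,
# not the paper's. Observations are ordered by completion time.

def lag1_autocorr(xs):
    # Sample lag-1 autocorrelation: covariance of adjacent observations
    # divided by the series variance.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

# 1 = chose Task A, 0 = chose Task B, in order of completion
choices = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
r = lag1_autocorr(choices)  # r = -0.4375 for this toy series: no positive clustering
```

Positive values of r near 1 would indicate that subjects completing the HIT close together in time made similar choices, the signature one would expect from information sharing.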
Workers in AMT can view the HTML file being used to generate the HIT. This creates the possibility
that subjects could eliminate the ambiguity or otherwise game the system by viewing the code. Several
features of the experiment limit this possibility. Each treatment of the experiment uses a different HTML
file. Subjects therefore cannot look at the HTML and gain more information than they were supposed to
have by viewing the instructions for other treatments. Bonus payments are determined manually by the
research team, so the HTML cannot be used to determine the incentive contract or the correct coding for
any particular message. In sessions with feedback, the portion of the HTML used for determining whether
or not the subject correctly coded the practice messages is intentionally convoluted.16 This makes it
extremely difficult to use the HTML to cheat on the practice messages. While there are no incentives to
cheat on the practice messages, we thought it best to eliminate this possibility.
E. Treatments
E.1 Information about Determination of Bonus
Fixed (FIXED): Task A pays 40 cents regardless of performance. This contract is a sure thing and gives
no incentives for performance. This is included to confirm that subjects are responsive to incentives. Even
though both tasks take the same amount of time, we do not expect subjects with fixed pay for Task A to
always choose the task with the higher payoff since subjects may be willing to sacrifice money to do an
easier task or one that is more interesting. Nevertheless, the willingness to do Task A should be a decreasing
function of the pay for the outside option (Task B).
Piece Rate (PR): Subjects are paid a piece rate of 9 cents for each message coded correctly. The
instructions for choosing a task tell subjects the number of messages to be coded, the 9 cent piece rate, and
the minimum ($0.00) and maximum ($1.35) possible bonuses from Task A. They are given detailed
instructions about the rule for determining whether a message is coded correctly. The goal is to eliminate
collecting the IP addresses of subjects as a way of identifying subjects using multiple accounts. This ran into two problems: (1) our IRB would not allow us to collect this information without telling subjects, which would have made many subjects unwilling to participate, and (2) IP addresses do not uniquely identify individuals.
16 A large amount of extraneous code was added to the HTML. Finding the code that determined whether a specific practice question was correct was equivalent to finding the proverbial needle in a haystack.
any ambiguity about the incentive contract. The PR treatment gives us a baseline to determine the effects
of ambiguity about the incentive contract.
Low Ambiguity (LA): Subjects are paid a piece rate of 9 cents for each message coded correctly. The
instructions include a detailed description of the rule for determining whether a message is coded correctly.
Subjects are not told the piece rate, the number of questions, or even that a piece rate system is being used.
Instead, they are informed the bonus will “depend on how well you do at coding messages correctly.” In
other words, subjects know that there is a positive relationship between how many messages are coded
correctly and their bonus, but have no information about what that relationship might be. They are told the
minimum ($0.00) and maximum ($1.35) possible bonuses from Task A, but since they do not know the
number of messages it is not possible to infer the piece rate or even that a piece rate system is being used.
The incentive contract in the LA treatment is more ambiguous than the incentive contract in the PR
treatment. In the PR treatment, a subject who knows the probability of coding a message correctly also
knows the distribution of bonuses. In the LA treatment, knowing the probability of coding a message
correctly is not sufficient to know the distribution of bonuses.
High Ambiguity (HA): Subjects are paid a piece rate of 9 cents for each message coded correctly. The
instructions include the minimum ($0.00) and maximum ($1.00 or $1.35) possible bonuses from Task A.
Subjects are given no additional information about how the bonus is determined. Specifically, subjects are
not told the piece rate (or even that a piece rate system is being used), the number of questions, or the rule
used to determine if a message has been coded correctly. The HA treatment represents a higher level of
ambiguity than the LA treatment, since subjects have no information about how the bonus is determined
beyond the minimum and maximum possible bonus. In the LA treatment subjects know how performance
is measured and that there is a positive relationship between performance and the bonus.
Initially the HA treatment only used the true maximum bonus of $1.35 (HA135). As data accumulated,
we noticed that the take-up rate for Task A was surprisingly high in HA135 given the high level of
ambiguity. We suspected subjects were using simple rules of thumb that would be sensitive to the
maximum possible bonus. To test this, we added the HA100 treatment.
In HA100, subjects are told that the maximum possible bonus is $1.00. This statement is essentially
true; while subjects could earn as much as $1.35 in theory, in reality subjects almost never earn a bonus of
more than $0.99.17 Since the purpose of HA100 is to determine the effect of changing the maximum
possible bonus in the HA treatment, it always uses the outside option of 50 cents and we over-sample the
two HA cells with an outside option of 50 cents to generate more power for making this comparison.18
17 One subject in the entire experiment earned $1.05. This happened after we started running HA100, and we had gathered hundreds of observations before beginning HA100.
18 Looking at Table 2, observant readers will notice that one cell of the PR treatment is also over-sampled. This was not intentional – a HIT was accidentally posted twice on AMT. Three other cells are off by a couple of observations,
E.2 Treatments: Information about Likely Performance
Feedback: Prior to choosing a task, subjects are told the number of practice messages they coded correctly.
The correlation between the number of practice messages coded correctly and the number of messages
coded correctly in Task A (for subjects who chose and completed Task A) is .51, so the number of practice
messages coded correctly is a good predictor of performance at Task A.19 We do not include cells with
feedback in HA treatments since giving subjects feedback requires giving them the rule for determining
whether a message is coded correctly.
Receiving feedback reduces the uncertainty faced by subjects by reducing their uncertainty about the
likelihood of coding a message correctly. For example, consider a subject in the PR treatment who
considers it equally likely that their probability of coding any given message correctly is either 0, ½, or 1.
If they code exactly one of the practice messages correctly, both the best and worst case scenarios can be
eliminated. The subject still faces risk, but the ambiguity has been eliminated. Feedback also improves subjects’ calibration about
their ability to code messages correctly, an important issue since subjects are overconfident on average.
No Feedback: Subjects do not receive feedback about the number of practice messages coded correctly.
3. Theory and Hypotheses
A. Theory: This paper is not intended to test the various theories of ambiguity aversion, but it is nonetheless
useful to develop a formal model in order to generate hypotheses. We treat subjects who don’t know the
details of how their bonus is determined or their skill at the coding task as facing ambiguity. Multiple priors
correspond in a natural way to the multiple incentive contracts a subject believes could possibly be in effect.
We therefore adapt the smooth ambiguity model of Klibanoff, Marinacci, and Mukerji (2005) to our
experiment. A subject faces a choice between an option with an unknown value (Task A) and a sure thing
(Task B). Let b ∊ [bmin, bmax] be the bonus for Task A. If a subject chooses Task A, they also must choose
an effort e ∊ [emin, emax] to put into coding messages. Let the cost of effort be given by c(e), where c’(e) >
0 and c’’(e) > 0. We assume that costs of effort are known rather than a source of uncertainty. A subject’s
payoff from Task A is given by b – c(e). Let K be the sure payoff from Task B.
A state of the world gives the probability of each possible bonus as a function of effort. We model the
state of the world as having two components, s and α, where α ∊ [0,1] captures ability to perform the task
and s ∊ Δ contains all other elements of the state of the world such as the incentive contract. The objective
probability of earning b in state (s, α) conditional on choosing e is given by f(s,α,b,e) and the expected bonus in
state (s, α) is calculated in the usual fashion. Critically, we assume that if α' > α, then f(s,α',b,e) first-order
stochastically dominates f(s,α,b,e). Holding s and e fixed, higher ability leads to a higher expected bonus. Let
which aren’t mistakes per se but instead are due to the oddities of how HITs are posted on AMT and our need to pay bonuses in a timely fashion.
19 The correlation between these variables is easily significant at the 1% level.
φ be a differentiable function with φ' > 0 and φ'' < 0 and let μ and ν be the subjective probability
distributions over s and α, respectively. The subject’s value for Task A is given by (1). Task A is chosen if
this value is greater than φ(K).

    max_{e ∊ [emin, emax]}  ∫₀¹ ∫_Δ φ( ∫_{bmin}^{bmax} b f(s,α,b,e) db − c(e) ) dμ(s) dν(α)        (1)
The assumption that φ is concave implies ambiguity aversion in the following sense: holding effort
fixed, a mean preserving spread (contraction) over the states of the world makes the coding task less (more)
attractive versus a sure outside option. This leads to the following proposition implying that changes in
information that represent a pure reduction in uncertainty (i.e. mean preserving contractions) will lead to
more choices of Task A. Note that this proposition is stated and proved in terms of mean preserving
contractions of μ, but applies equally well to mean preserving spreads of ν.
Proposition 1: Let μ and μ’ be subjective probability distributions over Δ. Suppose a subject chooses the
coding task and effort e* under μ. If μ’ is a mean preserving contraction of μ, the subject will also choose
the coding task under μ’.
Proof: See Appendix B.
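Proposition 1 can be illustrated numerically. The sketch below uses toy numbers of our own choosing (φ = √, two states), not a calibration of the experiment: with a concave φ, a mean preserving contraction of beliefs over states raises the value of the ambiguous task, holding effort fixed.

```python
from math import sqrt

# Our numeric sketch of Proposition 1; the states and weights are invented.
phi = sqrt  # phi' > 0, phi'' < 0, as the model assumes

def value(belief):
    # belief maps a state's expected net payoff from Task A (expected
    # bonus minus c(e), effort held fixed) to its subjective weight
    return sum(w * phi(x) for x, w in belief.items())

mu = {0.1: 0.5, 0.7: 0.5}   # dispersed beliefs over states
mu_prime = {0.4: 1.0}       # mean preserving contraction of mu

# Same mean, less spread: Task A is worth more under mu_prime, so a sure
# thing phi(K) that Task A beats under mu is still beaten under mu_prime.
assert value(mu_prime) > value(mu)
```

By Jensen's inequality the dispersed belief is penalized under a concave φ, which is exactly the sense in which reducing uncertainty makes the coding task more attractive.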
The HA135 and HA100 treatments (with an outside option of 50 cents) only differ in what subjects are
told was the maximum possible bonus. A simple way to model this change in information is by holding
beliefs over the state of the world fixed but changing beliefs, subject to the state of the world, by applying
Bayes Rule. Specifically, beliefs are updated so bonuses greater than $1.00 receive zero weight (see
Appendix B for details). Given this approach, we can prove Proposition 2.
Proposition 2: Assume that for all states the probability distribution of bonuses as a function of effort has
full support over the range [bmin, bmax]. Suppose a subject chooses the sure thing (Task B) with a maximum
bonus of bmax. If the maximum bonus is reduced to b < bmax and the subjective probability distribution over states
is unchanged, the subject must still choose the sure thing.
Proof: See Appendix B.
If an individual knows their value of α, it is trivial to show that if they choose Task A with ability α,
they also choose Task A with ability α' > α. More realistically for our experiment, suppose an individual
does not know their ability. Just as having greater ability in the case with known ability implies a greater
willingness to choose Task A, believing you have greater ability (in the sense of first order stochastic
dominance) also implies a greater willingness to choose Task A.
Proposition 3: Let ν and ν' be subjective probability distributions over ability. Suppose a subject chooses
the coding task and effort e* under ν. If ν' first order stochastic dominates ν, the subject will also choose
the coding task under ν'.
Proof: See Appendix B.
B. Hypotheses: Moving from HA135 to LA to PR reduces the uncertainty facing subjects. Comparing
the HA135 and LA treatments, subjects are given two additional pieces of information in the LA
treatment: (1) the rule for how performance is evaluated and (2) that the bonus is an increasing function of
performance. This eliminates extreme rules such as automatically giving all subjects the worst possible
bonus or automatically giving all subjects the highest possible bonus. If the primary effect of moving
from HA135 to LA is to reduce the subjective probability of extreme states, as seems likely, Proposition 1
implies that the coding task becomes more attractive. Likewise, under the LA treatment there are an
infinite number of monotonically increasing rules for determining the bonus. Some of these rules are
quite bad for subjects (e.g. a step function giving the maximum possible bonus if all messages are coded correctly and
zero otherwise) while others are more desirable (e.g. a step function giving the maximum possible bonus if
any message is coded correctly). If moving from LA to PR primarily affects behavior by moving weight
away from extreme states, Proposition 1 again implies that the coding task becomes more attractive.
Our discussion of the theory focused on the likelihood of initially choosing Task A, but also applies
to quitting after having chosen Task A. As each new message is presented for coding, subjects face a
decision to either continue with the coding task or take their outside option by leaving the experiment.
They have not received any additional information about the payoff rule beyond what they knew when
choosing a task, although presumably they have learned more about the difficulty of the coding task.
When making quit decisions, the payoff rules affect the ambiguity of the continuation payoff in the same
manner the payoffs from Task A were affected when choosing between Tasks A and B. Our predictions
about completing Task A, subject to choosing Task A, mirror our predictions about choosing Task A.
Hypothesis 1: Controlling for subject characteristics and the value of the outside option, the probability
of choosing and completing Task A is increasing as we shift from the HA135 treatment to the LA treatment
to the PR treatment.
Proposition 2 implies Hypothesis 2. Foreshadowing the results, Hypothesis 1 fails when we compare
LA and HA135. Even though Hypothesis 2 is generated from the model, evidence in favor of Hypothesis
2 does not validate the model. Rather, it helps us understand why Hypothesis 1 did not hold.
Hypothesis 2: Controlling for subject characteristics and holding the value of the outside option fixed, the
probability of choosing and completing Task A is decreasing as we shift from the HA135 treatment to the
HA100 treatment.
In FIXED, the bonus does not depend on performance at Task A and subjects face no ambiguity. If
incentives are salient, we expect choice of Task A to respond to the size of the outside option.
Hypothesis 3: As the payment for Task B increases in FIXED, choice of Task A is reduced.
Proposition 3 implies that subjects who believe they are more able are more likely to choose and
complete Task A. If there is positive correlation between a subject’s actual ability and their beliefs,
Hypothesis 4 follows.
Hypothesis 4: For PR, LA, HA135, and HA100, the probability of choosing and completing Task A is an
increasing function of coding task ability as measured by performance on the practice questions.
Hypothesis 4 predicts positive selection in the PR, LA, HA135, and HA100 treatments, but does not
predict how strong this relationship will be. We can develop some intuition for possible differences
between treatments by noting that positive selection relies on subjects perceiving links between ability and
bonuses – higher ability leads to higher performance leads to higher bonuses. The link between
performance and bonuses is transparent in the PR treatment. In the treatments with ambiguity (LA, HA135,
and HA100), we expect that subjects’ beliefs put weight on incentive contracts where the relationship
between performance and bonuses is weak or non-existent. This leads to the following conjecture:
Conjecture 1: The relationship between ability and choosing and completing Task A will weaken as
ambiguity becomes greater shifting from the PR to the LA to the HA135 treatment.
Giving subjects feedback has two effects. It reduces subjects’ uncertainty about their ability. Coding
is an unfamiliar task for our subjects, chosen to create ambiguity. They are unlikely to know how good
they are at this task, but giving them feedback should reduce this uncertainty. If this functions as a mean
preserving contraction of ν, reducing uncertainty without affecting expected ability, Task A becomes more
attractive as was the case when ambiguity about the incentive contract was reduced. However, feedback
may also improve the calibration of subjects’ beliefs about their ability. Recall that, on average, subjects
thought they were above average at the coding task. Feedback should reduce their overconfidence. Because
reducing the uncertainty and reducing overconfidence work in opposite directions, we make no firm
prediction about the effect of feedback on the likelihood of choosing Task A. If we assume the effect of
reducing uncertainty is smaller than the effect of reducing overconfidence, Conjecture 2 follows. This is a
placeholder, since we have no way of knowing ex ante whether the effect of reducing uncertainty or
overconfidence is larger.
Conjecture 2: Controlling for subject characteristics, the value of the outside option, and information
about the incentive contract, the probability of choosing and completing Task A should decrease when
feedback is provided in treatments where the bonus depends on performance (PR and LA).
Using the model to predict how the treatments affect the number of messages coded correctly subject
to choosing and completing Task A is equivalent to predicting how the optimal choice of effort is affected
by the treatments. Unfortunately, this requires knowing the sign of the third derivative of φ and there is no
basis for assuming the sign is either positive or negative. We therefore make no formal prediction.
That said, we conjecture that reducing ambiguity increases the number of messages coded correctly
subject to choosing Task A. Intuitively, the marginal cost of increased effort is a sure thing while the
marginal return is uncertain. A decrease in uncertainty should make the marginal return more attractive.
Conjecture 3: Controlling for selection and subject characteristics, the number of messages coded
correctly subject to choosing Task A should be increasing as we shift from the HA135 treatment to the LA
treatment to the PR treatment.
4. Results: Table 3 summarizes the data by treatment. The percentage of quits is subject to choosing
Task A and the number of messages coded correctly is subject to choosing and completing Task A.
[Insert Table 3]
A. Task Selection and Completion: Ambiguity about the incentive contract affects task selection and
completion, but not exactly as expected. Subjects are less likely to choose Task A, more likely to quit after
choosing Task A, and less likely to choose and complete Task A in the LA treatment than the PR treatment.
All of the preceding observations are consistent with Hypothesis 1, but the high rate of subjects choosing
Task A in the HA135 treatment relative to the LA treatment runs contrary to Hypothesis 1. Quitting is
more likely in HA135 than LA, consistent with Hypothesis 1, but this is sufficiently weak that subjects are
also more likely to choose and complete Task A in HA135 than in LA, contrary to Hypothesis 1.
Task selection under high ambiguity is sensitive to the maximum possible payoff, as predicted by
Hypothesis 2, with choice of Task A less common in HA100 than HA135. This difference remains large
if we limit the data to observations with an outside option of 50 cents (56.9% select Task A in HA135 vs.
36.7% in HA100). Quitting is more common in HA100, so it is also less common to choose and complete
Task A in HA100 than HA135.
Comparing outside options of 30, 40, and 50 cents, the percentages choosing Task A are 47.4%, 35.0%,
and 20.0% respectively in FIXED, the only treatment where the incentives are completely unambiguous.
This supports Hypothesis 3, providing strong evidence that the incentives are salient for subjects. It is
mildly surprising that many subjects choose Task A even when it unambiguously lowers earnings, but
presumably these subjects derive enjoyment from doing the more interesting task.
Feedback lowers the percentage of subjects who choose Task A. Quits increase with feedback, so the
probability of choosing and completing Task A is also lowered by feedback. This effect is larger in the
pooled data from the PR and LA treatments, where performance is relevant for a subject’s earnings, than in
FIXED. Conjecture 2 is supported by the data, suggesting that the primary effect of feedback is a reduction
of overconfidence.
We are surprised by the relatively high rate of quitting in FIXED. Two features of the data provide a
possible explanation for this oddity: (1) The ability (as measured by performance on the practice questions)
of subjects choosing Task A is lower in FIXED than other treatments and (2) The quits in FIXED come
disproportionately from subjects who have high outside options (40 or 50 cents).20 The latter observation
suggests that subjects who quit in FIXED often choose Task A because they think it will be more interesting
than Task B. In reality it is rather tedious, presumably more so for subjects who aren’t very good at the
task. Once they realize that Task A isn’t very interesting, subjects who have no monetary reason for
choosing it in the first place are prime candidates to quit.
Definitive conclusions about task selection and quitting should not be drawn from the raw data
summarized in Table 3. By design the value of the outside option varies between subjects. This was
intended to be balanced between treatments, but the random pattern of who fails the English screen as well
as oversampling in the HA treatment means that the sample is not truly balanced. Likewise, even with
random assignment there is variation in subjects’ demographic characteristics across treatments. To make
statistically valid statements about the effects of the treatments on task selection and completion, Table 4
reports the results of regressions examining whether subjects choose Task A, quit subject to having chosen
Task A, and, combining task selection and quits, choose and complete Task A.
[Insert Table 4]
All regressions on Table 4 include a dummy for subjects who do not indicate that earning money is
among their reasons for completing tasks on AMT. These subjects are significantly less likely to select
Task A and significantly more likely to quit if they chose Task A. These estimates are not reported to save
space, but full copies of the regression output are available from the authors upon request. Table 4 reports
parameter estimates, not marginal effects.
Models 1 and 2 are probits with a dummy for whether the subject chooses Task A as the dependent
variable. Robust standard errors are reported in parentheses. Model 1 estimates treatment effects while
controlling for the value of the outside option. The base is the PR treatment without feedback. Note that
there are two separate dummies for feedback, with the effect for the FIXED treatment estimated separately
20 Quit rates in FIXED are 18.5%, 28.6% and 36.4% for outside options of 30, 40, and 50 cents respectively. This positive correlation does not occur for the other treatments.
from the effect for the PR and LA treatments. Since performance does not affect the bonus in FIXED, we
predicted a weaker effect from feedback in this treatment.
The parameter estimate for LA is negative and significant at the 5% level in Model 1. Consistent with
Hypothesis 1, low ambiguity lowers subjects’ willingness to choose Task A. HA135 has a positive effect
relative to the PR treatment, although not significant, but the difference between the parameter estimates
for the LA and HA135 treatments is significant at the 1% level (z = 3.14; p ≤ .01). Holding the maximum
possible bonus fixed, high ambiguity raises willingness to choose Task A relative to low ambiguity. This
is not consistent with Hypothesis 1. Task selection under high ambiguity is sensitive to the maximum
possible bonus, consistent with Hypothesis 2, as the difference between the estimates for HA135 and
HA100 is significant at the 5% level (z = 2.15; p ≤ .05). Both parameter estimates for the effect of feedback
are negative, but neither is significant and the magnitudes are similar. This is inconsistent with Conjecture
2. Both dummies for the outside option are significant and the difference between the dummies for outside
options of 40 and 50 cents is also significant (z = 2.43; p < .05). If the dataset is limited to observations
from FIXED, the dummy for having an outside option of 50 cents is negative and significant at the 1% level
(z = 2.97; p < .01) as is the difference between outside options of 40 and 50 cents (z = 1.99; p < .05).
Subjects respond strongly to incentives, consistent with Hypothesis 3.
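As an illustration of the cross-coefficient z-tests reported above (this is not the authors' code, and the published statistics use the full covariance matrix of the joint estimate, which the tables do not report), a Wald-style test of the difference between two parameter estimates can be sketched as:

```python
from math import erfc, sqrt

def coef_diff_test(b1, se1, b2, se2, cov=0.0):
    """Wald z-test for H0: b1 = b2.

    b1, b2   -- two parameter estimates from the same regression
    se1, se2 -- their standard errors
    cov      -- estimated covariance between b1 and b2 (0 if unknown,
                which makes the test only an approximation)
    """
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2 - 2 * cov)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under the standard normal
    return z, p
```

Plugging in the Model 1 estimates for HA135 (0.252, s.e. 0.171) and LA (-0.287, s.e. 0.138) with cov = 0 gives z of roughly 2.45 rather than the reported 3.14, which suggests the unreported covariance term matters.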
Model 2 adds controls for subject demographics: age, gender, nationality, income, and education. For
nationality we include dummies for India and the USA, with all other nationalities serving as the base.
Given that Indians and Americans make up 88% of the sample, this captures most of the variation by
nationality. The survey questions for income and education only allow for categorical answers, so the
independent variables are dummies for the various categories.
Model 2 also adds controls for the subject’s risk and ambiguity attitudes as well as a control for their
confidence in their ability to perform Task A. The survey on risk attitude includes seven questions on risk
attitudes drawn from the GSOEP as well as a hypothetical investment question based on the investment
decision used by Gneezy and Potters (1997). The answers for the seven GSOEP questions are highly
correlated, so we combine them into a single measure using factor analysis.21 We classify subjects as
ambiguity averse, ambiguity neutral, or ambiguity loving based on their answers to the two Ellsberg
questions. Subjects who are ambiguity averse are coded as a 1 while those who are ambiguity loving are
coded as a -1. After completing the practice questions subjects are asked about their performance relative
to other Turkers.22 We use this as our measure of confidence. The number of practice questions answered
correctly is also included as an independent variable. This allows us to separate the effect of confidence
from the effects of skill at Task A. To save space, Table 4 does not report all of the added variables in
Models 2, 4, and 6, focusing on the variables of greatest economic interest. Full regression output is
available from the authors upon request.
Footnote 21: The use of a single factor is based on the Kaiser criterion, which suggests keeping only factors with eigenvalues greater than or equal to one (Yeomans and Golder, 1982).
Footnote 22: The question asked, "How well do you think you performed the practice task in comparison to other Turkers?" Subjects marked their answers on a 5-point Likert scale ranging from well below average to well above average.
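The paper does not specify the software used for the factor analysis. As a stdlib-only sketch of the underlying idea (extract the leading factor of the correlation matrix of the seven GSOEP items, keep it when its eigenvalue meets the Kaiser criterion, and score subjects by a loading-weighted sum of standardized items), one could write:

```python
from statistics import mean, pstdev

def correlation_matrix(rows):
    """Pearson correlations between the columns of `rows` (one row per subject)."""
    cols = [list(c) for c in zip(*rows)]
    stats = [(c, mean(c), pstdev(c)) for c in cols]
    def r(i, j):
        ci, mi, si = stats[i]
        cj, mj, sj = stats[j]
        return mean((x - mi) * (y - mj) for x, y in zip(ci, cj)) / (si * sj)
    k = len(cols)
    return [[r(i, j) for j in range(k)] for i in range(k)]

def leading_factor(R, iters=1000):
    """Largest eigenvalue and eigenvector of R via power iteration."""
    k = len(R)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        n = sum(x * x for x in w) ** 0.5
        v = [x / n for x in w]
    lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(k)) for i in range(k))
    return lam, v

def factor_scores(rows, loadings):
    """Loading-weighted sum of standardized items: one score per subject."""
    cols = [list(c) for c in zip(*rows)]
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [sum(l * (x - m) / s
                for l, (x, (m, s)) in zip(loadings, zip(row, stats)))
            for row in rows]
```

The Kaiser criterion (footnote 21) corresponds to keeping the leading factor whenever its eigenvalue is at least one.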
The main results are little changed between Models 1 and 2. The parameter estimates for FIXED and
LA are still negative and significant, and the difference between HA135 and LA remains significant at the
1% level (z = 2.79; p ≤ .01). The difference between HA135 and HA100 is slightly weaker, narrowly
missing significance at the 5% level (z = 1.95; p ≤ .10). Looking at the control variables, gender has a
weakly significant effect as men are more likely to choose Task A than women. Neither the risk nor
ambiguity measures are statistically significant.
Models 3 and 4 look at the decision to quit conditional on having selected Task A. These are probits with
a Heckman correction, since there is selection into the population of subjects who choose Task A. For the
Heckman model to be identified, we need an exogenous source of variation that is correlated with the decision
to select Task A but not with actions conditional on choosing Task A. By design, the value of the outside option
serves this purpose. The base is the PR treatment. The independent variables in Models 3 and 4 are the
same as those in Models 1 and 2. Model 3 is a basic regression that controls only for the treatment, while
Model 4 adds controls for demographics and the behavioral measures.
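A full maximum-likelihood probit with Heckman correction is beyond a short sketch, but the core of the two-step logic is the inverse Mills ratio computed from the first-stage selection probit (here, the probit for choosing Task A, with the outside option serving as the excluded instrument). In stdlib Python:

```python
from math import erf, exp, pi, sqrt

def norm_pdf(x):
    """Standard normal density phi(x)."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def norm_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2)))

def inverse_mills(index):
    """lambda(x'b) = phi(x'b) / Phi(x'b), evaluated at the fitted selection index.

    In the second step, this term is included as a regressor for the
    subjects who selected in; a significant coefficient on it signals
    selection on unobservables."""
    return norm_pdf(index) / norm_cdf(index)
```

The ratio is decreasing in the selection index: subjects predicted to be very likely to choose Task A contribute a small correction term.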
Looking at Model 3, the HA100 and HA135 treatments both significantly increase the frequency of
quitting (the difference between the estimates for HA100 and HA135 is not significant). The LA treatment
also increases quits, but this effect is not significant. Feedback increases the frequency of quitting, albeit
weakly, in the treatments where task performance is payoff relevant (PR and LA). The treatment with the
strongest effect on quits is FIXED, with the estimate easily significant at the 1% level.
The results are little changed by the inclusion of additional control variables in Model 4. The HA100
and HA135 treatments remain statistically significant at the 5% and 10% levels respectively. Feedback in
the PR and LA treatments is now significant at the 5% level. Looking at the control variables, subjects who
are less risk averse (as measured by the hypothetical investment question) or less ambiguity averse are
significantly more likely to quit.23 Subjects who are more confident about their ability to code messages
quit significantly less.
The dependent variable for Models 5 and 6 is a dummy for whether the subject selects and completes
Task A, combining the effects reported in Models 1 – 4. The specifications are identical to Models 1 and
2 respectively, and the results are similar to those reported in Models 1 and 2. This similarity is unsurprising
since relatively few subjects quit. One notable exception is the negative effect of feedback in the PR and
LA treatments, which is now significant at the 5% level in both models. Feedback in these two treatments
weakly decreases the proportion of subjects choosing Task A and increases the percentage who quit. The
combination of these two weak effects yields a strong effect on the probability of choosing and completing
Task A. The effects of ambiguity attitudes and overconfidence on quitting are sufficiently strong that both
variables have a weakly significant, positive effect on the probability of choosing and completing Task A.
Footnote 23: Given how the measures are constructed, a positive change on the hypothetical investment question corresponds to less risk aversion, while a positive change on the ambiguity measure corresponds to more ambiguity aversion.
Our main results on task selection and completion are summarized as follows:
Result 1: Relative to the PR treatment, the LA treatment lowers the probability of choosing and completing
Task A. This is consistent with Hypothesis 1. Relative to the LA treatment, the HA135 treatment
significantly raises the probability of choosing and completing Task A. This effect is due to a higher
probability of choosing Task A and is not consistent with Hypothesis 1.
Result 2: The HA100 treatment lowers the probability of choosing and completing Task A relative to
HA135. This is consistent with Hypothesis 2.
Result 3: Consistent with Hypothesis 3, the probability of choosing Task A decreases as the payoff from the
outside option (Task B) increases.
Result 4: For the PR and LA treatments (pooled), feedback about performance on the practice questions
lowers the probability of choosing and completing Task A. This is consistent with Conjecture 2.
B. Selection: Results 1 – 3 establish that the treatments affect whether people choose and complete Task
A. A related question is whether they affect what type of people choose and complete Task A. In particular,
a manager should be interested in whether there is selection by skill level.
A natural measure of skill in our experiments is performance on the practice coding questions. Recall
that before choosing a task, all subjects are given five practice questions. There is a strong relationship
between performance on the practice questions and performance in Task A (see regression results in
Section 4.C). Although the practice questions are not incentivized, subjects have good reason to take them
seriously since they will be choosing whether or not to do similar tasks for pay.
Figure 4 illustrates the relationship between performance on the practice questions and task selection.
Performance on the practice questions is broken into three categories (0 – 1 correct, 2 correct, 3 – 4 correct)
that roughly divide our data into thirds.24 For each treatment we display, by category of performance on
the practice questions, the percentage of subjects choosing and completing Task A.
In the piece rate treatment, low ability types (0 – 1 practice questions correct) are much less likely to
choose and complete Task A. This is true whether or not feedback is given. For FIXED, LA, and HA135,
there is no obvious relationship between ability and the likelihood of choosing and completing Task A, and
the relationship is reversed in HA100 with high ability types less likely to choose and complete Task A.
Footnote 24: No subjects got all five practice questions correct.
The regression results shown in Table 5 support the preceding observations. Both models are probits and
differ only in the dependent variable: a dummy for whether the subject chooses Task A in Model 1 versus
a dummy for choosing and completing Task A in Model 2. The independent variables are the same as in
Models 1 and 5 from Table 4 (treatment dummies, controls for the Task B payoff, and a dummy for subjects
who do not indicate an interest in earning money) plus interaction terms between the number of practice
questions answered correctly and dummies for each of the five treatments. These are the variables of
primary interest. A positive (negative) interaction term indicates positive selection, with high ability
subjects more (less) likely to choose and complete Task A. Table 5 only reports the five interaction terms
to save space, but full regression output is available from the authors upon request. We are as interested in
the differences between interaction terms as the levels, so the middle four lines show differences between
the interaction terms for FIXED versus the other four treatments (PR, LA, HA135, and HA100) and the
bottom three lines show differences between the interaction term for the PR treatment and the interaction
terms for the three treatments with ambiguity (LA, HA135, and HA100). Robust standard errors are
reported in parentheses.
[Figure 4: Skill Level and Selection. Bar chart showing the percentage of subjects choosing and completing Task A (0% to 70%) in each treatment (PR, FIXED, LA, HA135, HA100), broken out by practice-question performance (0-1 correct, 2 correct, 3-4 correct).]
[Insert Table 5]
In both models, the interaction term for the PR treatment is positive and significant. This matches our
impression from Figure 4. High ability subjects are more likely to choose (or choose and complete) Task
A in the PR treatment. In no other treatment is there a positive and significant interaction effect.
Comparing differences in interaction terms between FIXED and the other treatments measures whether
selection into Task A is improved relative to a fixed payment. As expected, PR significantly improves
selection. The only other treatment that generates a significant improvement is HA135, and this is weak
from a statistical point of view and vanishes once we consider choosing and completing Task A. Looking
at the differences in interaction effects between PR and the treatments with ambiguity, these are always
negative and almost always significant. Positive selection is significantly stronger with an explicit piece
rate than ambiguous incentives. It is particularly striking that this holds for the LA treatment, where
subjects know that higher performance yields a higher bonus.
To summarize, use of an explicit piece rate generates positive selection, with high ability types more
likely to choose (or choose and complete) Task A. This vanishes with ambiguity. At best, no selection on
skill level takes place, and in some cases selection is reversed, with low ability types more likely to choose Task A.25
Result 5: In the PR treatment, high ability types are more likely to choose and complete Task A. This
selection effect is eliminated and possibly reversed with ambiguity. Other than the PR treatment, the data
is not consistent with Hypothesis 4 but does support Conjecture 1.
C. Performance: Our measure of performance is how many messages are coded correctly in Phase 2. A
message is counted as correct if the subject’s coding matches the modal coding for the population.26 This
is the criterion that is used in the experiment to determine whether or not a subject is paid the piece rate for
coding a message correctly.
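The modal-coding criterion itself is simple to state in code. The names below are illustrative (the paper does not publish its scoring routine), but the logic matches the description above:

```python
from collections import Counter

def modal_codings(codes_by_message):
    """Map each message to the code chosen most often across the population."""
    return {msg: Counter(codes).most_common(1)[0][0]
            for msg, codes in codes_by_message.items()}

def messages_correct(subject_codes, modal):
    """A subject's score: number of messages whose code matches the population mode."""
    return sum(code == modal.get(msg) for msg, code in subject_codes.items())
```

Note that, as in the experiment, a subject's own coding contributes to the mode it is later judged against.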
Returning to Table 3, the number of messages coded correctly by subjects who choose and complete
Task A is higher in the PR treatment than in FIXED either with or without feedback. Performance in the
LA treatment falls between FIXED and PR either with or without feedback. Performance in the HA135
and HA100 treatments is only slightly higher than in FIXED without feedback (recall that feedback was
never provided in the high ambiguity treatments).
The preceding suggests that piece rate contracts and, to a lesser extent, incentive contracts with low
ambiguity improve performance. However, this may reflect the effect of selection rather than any direct
effect on subjects’ effort. As suggested by the analysis in Section 4.B, there are large differences between
treatments in the ability level of subjects who choose and complete Task A. This is documented in Table
6. The first column displays average performance by treatment on the five practice questions, limiting the
sample to subjects who chose and completed Task A. As a point of comparison, the first row reports
average performance on the practice questions by all 696 subjects (this includes all subjects, whether or not
they choose Task A). The next two columns give the results of t-tests comparing performance on the
practice questions with average performance across all subjects and with performance by subjects who choose
and complete Task A in FIXED. Among subjects who choose and complete Task A, ability (as measured
by performance on the practice questions) is highest in the PR treatment, significantly higher than either
the population average or FIXED. None of the other treatments significantly improve ability over these
two benchmarks, and ability is significantly worse in HA100. This is admittedly a crude statistical analysis,
but it nonetheless reinforces the point that the selection processes differ between treatments.
Footnote 25: We have checked whether feedback affects selection into Task A, but find no significant results.
Footnote 26: The population is subjects who are included in the dataset and coded the message in question.
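The paper does not say whether the t-tests in Table 6 pool variances, so the sketch below assumes the unequal-variance (Welch) form, with the Welch-Satterthwaite degrees of freedom:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample variances
    se2 = va / na + vb / nb                # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

Applied here, `a` would be practice scores of subjects completing Task A in one treatment and `b` the comparison group (all subjects, or FIXED completers).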
[Insert Table 6 here]
The regressions reported in Table 7 examine how the treatments affect performance, measured by the
number of questions coded correctly in Task A by subjects who choose and complete Task A, and whether
these treatment effects change when we correct for selection. Model 1 is an OLS regression.27 The base is
the FIXED treatment without feedback, and the only independent variables are treatment dummies as well
as the usual control for subjects who did not indicate an interest in earning money. This regression measures
the treatment effects without controlling for selection effects. Compared to the base, the PR treatment
yields a significant increase in performance. There are no other significant treatment effects.
[Insert Table 7 here]
To control for selection, Models 2 and 3 include the number of practice questions answered correctly
as an independent variable. Model 3 also adds controls for demographics (age, income, education, and
gender), risk and ambiguity attitudes, and confidence. As an additional means of accounting for
selection, both regressions are Heckman selection models using the value of the outside option (the fixed
payment for Task B) as an instrument.
The parameter estimate for performance on the practice messages is large, positive, and easily
significant at the 1% level in Models 2 and 3. Controlling for selection, none of the treatments have an
effect on performance relative to FIXED. The effect of the PR treatment on coding performance is due to
selection effects, similar to the results of Dohmen and Falk (2011).
Result 6: Relative to the FIXED treatment, subjects who choose and complete Task A code more messages
correctly in the PR treatment. This treatment effect is due to selection and vanishes after controlling for
performance on the practice questions. The results do not support Conjecture 3.
Footnote 27: There is little censoring in the data. Among subjects who complete Task A, only 10 of 281 (3.6%) code zero messages correctly and none code all fifteen messages correctly. Running a tobit model has little effect on the results reported in Model 1.
5. Conclusions: Ambiguity is a common feature of real world incentive contracts. We show that it has
non-trivial effects on who accepts these contracts and how they perform. The most important effect is the
impact on selection. We replicate the common finding that piece rate contracts increase performance, with
the effect due to selection of high ability individuals into the piece rate contract. This positive selection
vanishes with all of the varieties of ambiguity we study, along with the associated bump in performance.
Low levels of ambiguity are quite natural in an incentive contract. Subjects in our LA treatment know that
better performance will yield higher pay, know how performance is measured, and even know the range of
possible bonuses. This is likely more than workers know in many settings. Yet even a low level of
ambiguity is sufficient to vastly reduce the efficacy of performance pay.
Low levels of ambiguity reduce the likelihood of accepting an incentive contract. All types of
ambiguity are associated with higher rates of quitting, with the effect being significant in the two high
ambiguity treatments. Given that conducting a hiring search is expensive, and having to replace workers
is even more expensive, this is further evidence that ambiguity harms the efficacy of performance pay.
Feedback about performance on the practice questions reduces the likelihood of choosing and
completing an incentive contract, suggesting that the effect of feedback on overconfidence is more
important than any associated reduction in uncertainty. We might expect feedback to help the firm in the
long run by reducing quits (due to overconfident workers never taking up the incentive contract rather than
accepting it, learning they aren’t as good at the job as they thought, and subsequently quitting). Our data
is not consistent with this conjecture as quitting is more frequent with feedback. Feedback is apparently a
complement for learning about one’s ability from experience rather than a substitute.
The positive effect of high ambiguity on willingness to choose Task A is not predicted by our model of
ambiguity. Holding the maximum payoff fixed, removing virtually all information about how bonuses
were determined increased the willingness to accept incentive contracts. The sensitivity of this result to the
maximum payoff suggests the use of a simple rule of thumb (consistent with the principle of insufficient
reason) under high ambiguity. Testing theories of ambiguity is not a primary purpose of this paper, but this
result does suggest that how individuals react to ambiguity varies with the nature of the uncertainty.
The usual caveats about generalizing results from experiments conducted in one specific setting apply
here, and we don’t expect the effects of ambiguous incentive contracts to be the same across all labor
markets. We anticipate that stronger effects will occur in entry level labor markets. This conjecture stems
from the nature of the ambiguity. Like our subjects, we expect that many workers face uncertainty about
the relationship between performance and pay as well as uncertainty about their likely performance. As
workers gain experience, they presumably learn along both of these dimensions, reducing any effects due
to ambiguity.
Our work raises a number of questions for future research. We find clear differences between the piece
rate and low ambiguity contracts. It is straightforward to design contracts that lie between these two cases,
such as specifying that the piece rate lies within a range rather than indicating a precise value. Knowing
how much ambiguity is needed to disrupt the strong positive selection observed with piece rate contracts
would help us understand how fragile the positive selection result is. On a different dimension, there are
benefits to the firm from having an ambiguous incentive contract. For instance, preserving flexibility is an
obvious advantage of an ambiguous incentive contract. Work is needed to determine the relative costs and
benefits of ambiguity in incentive contracts.
Bibliography
Abdellaoui, M., Baillon, A., Placido, L., & Wakker, P. P. (2011). The rich domain of uncertainty: Source functions and their experimental implementation. The American Economic Review, 101(2), 695-723.
Abeler, J., Falk, A., Goette, L., and Huffman, D. (2011). Reference Points and Effort Provision. American Economic Review, 101(2), 470-92.
Abowd, J. M., Kramarz, F., & Roux, S. (2006). Wages, mobility and firm performance: Advantages and insights from using matched worker-firm data. The Economic Journal, 116(512), F245-F285.
Abraham, K. G., Spletzer, J. R., & Harper, M. (2010). Labor in the New Economy. NBER Books, National Bureau of Economic Research, Inc.
Ahn, D. S. (2008). Ambiguity without a state space. The Review of Economic Studies, 75(1), 3-28.
Ahn, D., Choi, S., Gale, D., & Kariv, S. (2009). Estimating ambiguity aversion in a portfolio choice experiment.
Amir, O., & Rand, D. G. (2012). Economic games on the internet: The effect of $1 stakes. PloS one, 7(2), e31461.
Barro, J., & Beaulieu, N. (2003). Selection and improvement: Physician responses to financial incentives (No. w10017). National Bureau of Economic Research.
Binmore, K., Stewart, L., & Voorhoeve, A. (2012). How much ambiguity aversion?. Journal of risk and uncertainty, 45(3), 215-238.
Buchinsky, M., Fougere, D., Kramarz, F., & Tchernis, R. (2010). Interfirm mobility, wages and the returns to seniority and experience in the United States. The Review of Economic Studies, 77(3), 972-1001
Buser, T., & Dreber, A. (2015). The flipside of comparative payment schemes. Management Science.
Cadsby, C. B., Song, F., & Tapon, F. (2007). Sorting and incentive effects of pay for performance: An experimental investigation. Academy of Management Journal, 50(2), 387-405.
Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., & Montrucchio, L. (2011). Uncertainty averse preferences. Journal of Economic Theory, 146(4), 1275-1330.
Charness, G., & Dufwenberg, M. (2006). Promises and partnership. Econometrica, 74(6), 1579-1601.
Charness, G., & Gneezy, U. (2010). Portfolio choice and risk attitudes: An experiment. Economic Inquiry, 48(1), 133-146.
Charness, G., Karni, E., & Levin, D. (2013). Ambiguity attitudes and social interactions: An experimental investigation. Journal of Risk and Uncertainty, 46(1), 1-25.
Chen, Y., Ho, T. H., & Kim, Y. M. (2010). Knowledge market design: A field experiment at google answers. Journal of Public Economic Theory, 12(4), 641-664.
Chen, Y., Katuščák, P., & Ozdenoren, E. (2007). Sealed bid auctions with ambiguity: Theory and experiments. Journal of Economic Theory, 136(1), 513-535.
Cooper, D., D’Adda, G., and Weber, R. (2013). Leadership in an Incentivized Real Effort Experiment. Florida State University working paper, May 2013.
De Winter, J. C. F., Kyriakidis, M., Dodou, D., & Happee, R. (2015). Using CrowdFlower to study the relationship between self-reported violations and traffic accidents. Procedia Manufacturing, 3, 2518-2525.
Dean, M., & McNeill, J. (2014). Preference for Flexibility and Random Choice: an Experimental Analysis (No. 2014-10).
Dohmen, T., & Falk, A. (2011). Performance pay and multidimensional sorting: Productivity, preferences, and gender. The American Economic Review, 101(2), 556-590.
Dohmen, T., Falk, A., Huffman, D., Sunde, U., Schupp, J., & Wagner, G. G. (2011). Individual risk attitudes: Measurement, determinants, and behavioral consequences. Journal of the European Economic Association, 9(3), 522-550.
Dominiak, A., Dürsch, P., & Lefort, J. P. (2012). A dynamic Ellsberg urn experiment. Games and Economic Behavior, 75(2), 625-638.
Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. The Quarterly Journal of Economics, 643-669.
Ergin, H., & Gul, F. (2004). A subjective theory of compound lotteries. Journal of Economic Theory, 144, 899-929.
Eriksson, T., Poulsen, A., & Villeval, M. C. (2009). Feedback and incentives: Experimental evidence. Labour Economics, 16(6), 679-688
Eriksson, T., Teyssier, S., & Villeval, M. C. (2009). Self‐Selection and the Efficiency of Tournaments. Economic Inquiry, 47(3), 530-548.
Farber, H. S. (2010). Job loss and the decline in job security in the United States. In Labor in the New Economy (pp. 223-262). University of Chicago Press.
Fershtman, C., & Gneezy, U. (2011). The Tradeoff between Performance and Quitting in High Power Tournaments. Journal of the European Economic Association, 9(2), 318-336.
Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of mathematical economics, 18(2), 141-153.
Gill, D., Prowse, V., & Vlassopoulos, M. (2012). Cheating in the workplace: An experimental study of the impact of bonuses and productivity. Available at SSRN 2109698.
Gneezy, U., & Potters, J. (1997). An experiment on risk taking and evaluation periods. The Quarterly Journal of Economics, 631-645.
Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. Journal of management, 26(3), 463-488
Halevy, Y. (2007). Ellsberg revisited: An experimental study. Econometrica, 75(2), 503-536.
Halevy, Y., & Feltkamp, V. (2005). A Bayesian approach to uncertainty aversion. The Review of Economic Studies, 72(2), 449-466.
Herrmann, M. (2012). Inside Straight: Analyzing “Black-Box” compensation. Above the Law. Retrieved 9/2012. From http://abovethelaw.com/2012/08/inside-straight-analyzing-black-box-compensation/
Hogarth, R., & Villeval, M. C. (2010). Intermittent reinforcement and the persistence of behavior: Experimental evidence.
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14(3), 399-425.
Houser, D., & Xiao, E. (2011). Classification of natural language messages using a coordination game. Experimental Economics, 14(1), 1-14
Hurwicz, L. (1951). Optimality criteria for decision making under ignorance. Cowles commission papers, 370.
Ipeirotis, P. (2010). Demographics of mechanical turk.
Job Openings and Labor Turnover Summary. (2013). Economic News Release, Bureau of Labor Statistics, Government of United States. Retrieved May 8, 2013, from http://www.bls.gov/news.release/jolts.nr0.htm.
Johnson, D., & Weinhardt, J. (2015). The effect of financial goals and incentives on labor. An experimental test in an online market (No. 2014-85). Department of Economics, University of Calgary.
Klibanoff, P., Marinacci, M., & Mukerji, S. (2005). A smooth model of decision making under ambiguity. Econometrica, 73(6), 1849-1892.
Klibanoff, P., Marinacci, M., & Mukerji, S. (2009). Recursive smooth ambiguity preferences. Journal of Economic Theory, 144(3), 930-976.
Lazear, E. (2000). Performance Pay and Productivity. American Economic Review, 90(5), 1346-1361.
Liu, T. X., Yang, J., Adamic, L. A., & Chen, Y. (2011). Crowdsourcing with all‐pay auctions: A field experiment on Taskcn. Proceedings of the American Society for Information Science and Technology, 48(1), 1-4.
Lucas, S. (2012). How much does it cost companies to lose employees?. cbsnews. Retrieved 5/2013. From http://www.cbsnews.com/8301-505125_162-57552899/how-much-does-it-cost-companies-to-lose-employees/
Nau, R. F. (2006). Uncertainty aversion with second-order utilities and probabilities. Management Science, 52(1), 136-145
Niederle, M., & Vesterlund, L. (2007). Do women shy away from competition? Do men compete too much?. The Quarterly Journal of Economics, 122(3), 1067-1101.
Paolacci, G., Chandler, J., & Ipeirotis, P. (2010). Running experiments on amazon mechanical turk. Judgment and Decision Making, 5(5), 411-419.
Perman, C. (January, 2013). 2013 May be the Year Employees say ‘I Quit!’. CNBC. Retrieved 4/2013. From http://www.cnbc.com/id/100359891
Rosaz, J., Slonim, R., & Villeval, M. C. (2012). Quitting and peer effects at work.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica: Journal of the Econometric Society, 571-587.
Segal, U. (1987). The Ellsberg paradox and risk aversion: An anticipated utility approach. International Economic Review, 175-202.
Seo, K. (2009). Ambiguity and Second‐Order Belief. Econometrica, 77(5), 1575-1605.
SOEP Group. (2001). The German Socio-Economic Panel (GSOEP) after more than 15 years-overview. Vierteljahrshefte zur Wirtschaftsforschung, 70(1), 7-14.
Suri, S., & Watts, D. J. (2011). Cooperation and contagion in web-based, networked public goods experiments. PLoS One, 6(3), e16836.
Wang, S., Huang, C. R., Yao, Y., & Chan, A. (2015). Mechanical Turk-based Experiment vs Laboratory-based Experiment: A Case Study on the Comparison of Semantic Transparency Rating Data.
Yeomans, K. A., & Golder, P. A.. (1982). The Guttman-Kaiser Criterion as a Predictor of the Number of Common Factors. Journal of the Royal Statistical Society. Series D (the Statistician), 31(3), 221–229.
Table 1: Subject Characteristics and Task Performance
Variable Obs Mean StDev
Male 696 0.57 0.50
Age 696 31.58 9.94
Indian 696 0.58 0.49
American 696 0.29 0.46
College Graduate 696 0.75 0.43
Income < $12.5K 696 0.37 0.48
Income ≥ $25K 696 0.40 0.49
Primary Income 696 0.24 0.43
Motivated by Earning Money 696 0.97 0.16
Table 2: Count of Subjects by Treatment
Treatment | Recruited ($0.30 / $0.40 / $0.50) | Usable Observations ($0.30 / $0.40 / $0.50)
Piece Rate, No Feedback | 35 / 35 / 35 | 29 / 26 / 30
Piece Rate, Feedback | 35 / 35 / 50 | 23 / 31 / 36
Fixed Pay, No Feedback | 35 / 35 / 35 | 28 / 27 / 24
Fixed Pay, Feedback | 35 / 37 / 35 | 29 / 33 / 31
Low Ambiguity, No Feedback | 35 / 34 / 35 | 30 / 24 / 32
Low Ambiguity, Feedback | 35 / 35 / 35 | 28 / 28 / 30
High Ambiguity (Max = $1.35) | 35 / 34 / 73 | 31 / 28 / 58
High Ambiguity (Max = $1.00) | 0 / 0 / 75 | 0 / 0 / 60
Table 3: Summary of Results by Treatment
Treatment | Choose Task A | Quit (if choosing Task A) | Choose and complete Task A | # Correct (if choosing and completing Task A)
Piece Rate, No Feedback | 55.3% | 0.0% | 55.3% | 4.74
Piece Rate, Feedback | 47.7% | 9.3% | 43.3% | 4.97
Fixed Pay, No Feedback | 39.2% | 22.6% | 30.4% | 3.63
Fixed Pay, Feedback | 30.1% | 28.6% | 21.5% | 4.30
Low Ambiguity, No Feedback | 46.5% | 7.5% | 43.0% | 4.41
Low Ambiguity, Feedback | 36.0% | 9.7% | 32.6% | 4.68
High Ambiguity (Max = $1.35) | 63.2% | 9.5% | 57.3% | 4.06
High Ambiguity (Max = $1.00) | 36.7% | 13.6% | 31.7% | 3.63
Table 4: Regression Analysis of Treatment Effects on Choosing and Completing Task A
Dependent variable: Select Task A (Models 1, 2); Dropout from Task A (Models 3, 4); Select and Complete Task A (Models 5, 6)
Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6
Fixed Pay | -0.469** (0.188) | -0.580*** (0.198) | 1.264*** (0.377) | 1.205*** (0.387) | -0.689*** (0.191) | -0.784*** (0.201)
Low Ambiguity | -0.287** (0.138) | -0.308** (0.143) | 0.392 (0.325) | 0.214 (0.320) | -0.324** (0.139) | -0.331** (0.144)
High Ambiguity (Max = $1.35) | 0.252 (0.171) | 0.184 (0.175) | 0.748** (0.323) | 0.650* (0.376) | 0.110 (0.169) | 0.070 (0.173)
High Ambiguity (Max = $1.00) | -0.202 (0.216) | -0.231 (0.219) | 0.921** (0.464) | 0.976** (0.468) | -0.344 (0.218) | -0.358 (0.223)
Feedback * Fixed | -0.238 (0.199) | -0.166 (0.209) | 0.149 (0.379) | 0.316 (0.391) | -0.264 (0.209) | -0.222 (0.215)
Feedback * (Piece Rate + Low Ambiguity) | -0.215 (0.138) | -0.234 (0.143) | 0.536* (0.315) | 0.614** (0.313) | -0.276** (0.138) | -0.311** (0.145)
Outside 40 | -0.338*** (0.130) | -0.378*** (0.134) | | | -0.369*** (0.131) | -0.410*** (0.134)
Outside 50 | -0.643*** (0.125) | -0.710*** (0.131) | | | -0.600*** (0.126) | -0.659*** (0.133)
Male | | 0.177* (0.105) | | 0.054 (0.234) | | 0.177* (0.106)
SOEP Risk Questions (Factor) | | 0.030 (0.060) | | 0.103 (0.125) | | 0.003 (0.061)
Hypothetical Investment Question | | 0.002 (0.002) | | 0.007* (0.004) | | 0.001 (0.002)
Ambiguity Aversion | | 0.043 (0.081) | | -0.385** (0.178) | | 0.148* (0.082)
Confidence | | 0.056 (0.062) | | -0.274** (0.128) | | 0.114* (0.065)
Log Pseudo Likelihood | -449.12 | -434.56 | -549.58 | -521.57 | -435.83 | -419.86
Notes: All regressions are based on 696 observations. Numbers in parentheses are robust standard errors. Three (***), two (**), and one (*) stars indicate statistical significance at the 1%, 5%, and 10% levels.
Table 5: Regression Analysis of Selection and Skill

                                                            Model 1          Model 2
Dependent Variable                                          Select Task A    Select and Complete Task A
Practice Questions Correct * Piece Rate                     0.233**          0.285***
                                                            (0.095)          (0.096)
Practice Questions Correct * Fixed                          -0.258**         -0.097
                                                            (0.107)          (0.110)
Practice Questions Correct * Low Ambiguity                  -0.039           0.016
                                                            (0.095)          (0.096)
Practice Questions Correct * High Ambiguity (Max = $1.35)   0.046            -0.016
                                                            (0.110)          (0.107)
Practice Questions Correct * High Ambiguity (Max = $1.00)   -0.359**         -0.468**
                                                            (0.181)          (0.202)
Difference in Interaction Effects (vs. Fixed)
Practice Questions Correct * (PR - Fixed)                   0.491***         0.382***
                                                            (0.144)          (0.146)
Practice Questions Correct * (LA - Fixed)                   0.218            0.113
                                                            (0.143)          (0.146)
Practice Questions Correct * (HA135 - Fixed)                0.304*           0.081
                                                            (0.154)          (0.154)
Practice Questions Correct * (HA100 - Fixed)                -0.101           -0.370
                                                            (0.210)          (0.230)
Difference in Interaction Effects (vs. Piece Rate)
Practice Questions Correct * (LA - PR)                      -0.273**         -0.269**
                                                            (0.134)          (0.135)
Practice Questions Correct * (HA135 - PR)                   -0.187           -0.301**
                                                            (0.145)          (0.143)
Practice Questions Correct * (HA100 - PR)                   -0.592***        -0.752***
                                                            (0.204)          (0.224)
Notes: All regressions are based on 696 observations. Numbers in parentheses are robust standard errors. Three (***), two (**), and one (*) stars indicate statistical significance at the 1%, 5%, and 10% levels.
Table 6: Ability by Treatment, Conditional on Choosing and Completing Task A

               Average Practice Correct   T-test vs. Population Average   T-test vs. Fixed
All Subjects   1.93                       ---                             ---
PR             2.26                       3.10***                         2.68***
FIXED          1.75                       1.09                            ---
LA             2.05                       0.80                            1.33
HA135          1.81                       0.93                            0.26
HA100          1.21                       2.90***                         1.79*
Notes: Average reported for “All Subjects” based on all subjects whether or not they choose Task A. Three (***), two (**), and one (*) stars indicate statistical significance at the 1%, 5%, and 10% levels.
Table 7: Analysis of Performance

                                         Model 1    Model 2    Model 3
Model Type                               OLS          Heckman Correction
Piece Rate                               1.128**    0.520      0.065
                                         (0.547)    (0.605)    (0.675)
Low Ambiguity                            0.887      0.516      0.062
                                         (0.566)    (0.552)    (0.573)
High Ambiguity (Max = $1.35)             0.455      0.308      -0.095
                                         (0.491)    (0.577)    (0.647)
High Ambiguity (Max = $1.00)             0.079      0.624      0.335
                                         (0.662)    (0.555)    (0.560)
Feedback * Fixed                         0.661      0.576      0.065
                                         (0.684)    (0.641)    (0.633)
Feedback * (Piece Rate + Low Ambiguity)  0.193      0.134      0.261
                                         (0.439)    (0.414)    (0.441)
Correct Practice                                    1.113***   0.959***
                                                    (0.110)    (0.125)
Notes: Model 1 is based on 281 observations. Models 2 and 3 are based on 696 observations. Numbers in parentheses are robust standard errors. Three (***), two (**), and one (*) stars indicate statistical significance at the 1%, 5%, and 10% levels.
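Models 2 and 3 exist because OLS on the 281 completers alone conflates treatment effects with selection. The toy simulation below (with made-up parameters, not the paper's data) shows how selection by itself can bias a completers-only comparison even when the true treatment effect on performance is zero:

```python
import random

random.seed(1)

def simulate(treated):
    """Ability drives both completing the task and performance; the
    treatment shifts only the completion margin, never the score."""
    ability = random.gauss(0, 1)
    completes = ability + (0.5 if treated else 0.0) + random.gauss(0, 1) > 0
    score = 2 + ability + random.gauss(0, 0.5)  # no true treatment effect
    return completes, score

def mean_score_of_completers(treated, n=20000):
    total, count = 0.0, 0
    for _ in range(n):
        completes, score = simulate(treated)
        if completes:
            total += score
            count += 1
    return total / count

# Treatment pulls marginal (lower-ability) workers over the completion
# threshold, so treated completers score worse on average.
gap = mean_score_of_completers(True) - mean_score_of_completers(False)
print(gap)  # negative, despite a true treatment effect of zero
```

A Heckman-style correction addresses exactly this: it models the completion decision for all 696 subjects and purges the outcome equation of the resulting selection term.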
For Online Publication
Appendix A: Full Text of HIT
Consent
Informed Consent: I voluntarily choose to participate in this research project. The purpose of this work is to study the use of crowd sourcing to perform basic data analysis tasks. I have been recruited for this study through Mechanical Turk. Only persons 18 years of age or older may participate, and I affirm that I am 18 years of age or older. This study involves reading handwritten messages in English. Only individuals who read and write English may participate. I affirm that I can read and write in English.
The initial phase of this study will take about 5 minutes to complete and will involve me answering a number of survey questions. If I complete the first phase of the study, I will be eligible for a second phase where I can earn a bonus by either completing a basic data analysis task, coding verbal responses, or performing a transcription task. The second phase of the study will last 8 - 10 minutes.
I will earn 20 cents (USA) for successfully completing the first phase of the study. I will then be given details on the second phase of the study, including an explanation of what bonuses I will be eligible to earn.
I am free to withdraw from the study at any time and without incurring the ill will of the researchers. I will only be paid if I complete both phases of the task.
There are no known risks or benefits from this study beyond those from any typical activity you might do in an online environment. This study will benefit society by helping researchers to better understand the use of crowd sourcing to analyze data.
The confidentiality of any personal information will be protected to the extent allowed by law. To the extent allowed by law, only the researcher and any research assistants conducting this experiment will have access to the data from this study. My name will not be reported with any results related to this research.
I can obtain further information from David Johnson ([email protected]). If I have questions concerning my rights as a research subject, I can call the FSU Human Subjects Committee office at 1-850-644-8836 or email them at [email protected].
I may ask questions at any time via email ([email protected]). Please feel free to contact us at this email address if you have any questions. Should new information become available during the course of this study about risks or benefits that might affect my willingness to continue in this research project, it will be given to me as soon as possible.
By clicking on the checkbox below, you indicate consent to participate in this study.
Phase 1
1. What is your gender?
2. What is your age?
3. What country do you currently live in?
4. What is your Nationality?
5. Which of the following best describes your highest achieved education level? Dropdown box. Categories are: (1) High School; (2) High School Graduate; (3) Some college, no degree; (4) Associates degree; (5) Bachelors degree; (6) Graduate degree (Masters, Doctorate, etc.).
6. Over the weekend, Bob watched 2 football games. On the scale below, mark the number of football games Bob watched over the weekend. Scale goes from 0 to 4.
7. What is the total income of your household? Dropdown box. Categories are: (1) Less than 12,500; (2) 12,500 - 24,999; (3) 25,000 - 37,499; (4) 37,500 - 49,999; (5) 50,000 - 62,499; (6) 62,500 - 74,999; (7) 75,000 - 87,499; (8) 87,500 - 99,999; (9) 100,000 or more.
8. Why do you complete tasks in Mechanical Turk? Please check any of the following that apply. Categories are: (1) Fruitful way to spend free time and get some cash; (2) for primary income purposes (e.g. gas, bills, groceries, credit cards); (3) for secondary income purposes, pocket change (for hobbies, gadgets, going out); (4) to kill time; (5) I find the task to be fun; and (6) I am currently unemployed, or have only a part time job.
9. Next year, Jack and Jill are planning on visiting Disneyland. Jill has been to Disneyland many times while Jack has never been (0 times). On the scale below, mark how often Jack has been to Disneyland. Scale goes from 0 to 4.
10. How do you see yourself: Are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please tick a box on the scale, where the value 0 means: "risk averse" and the value 10 means: "fully prepared to take risks". You can use the values in between to make your estimate.
11. People can behave differently in different situations. How would you rate your willingness to take risks in the following areas? Please tick a box on the scale, where the value 0 means: "risk averse" and the value 10 means: "fully prepared to take risks". You can use the values in between to make your estimate. The six categories are: (1) while driving; (2) in financial matters; (3) during leisure and sport; (4) in your occupation; (5) with your health; (6) your faith in other people.
12. Please consider what you would do in the following situation: Imagine that you had won 100,000 dollars in the lottery. Almost immediately after you collect the winnings, you receive the following financial offer from a reputable bank, the conditions of which are as follows: There is the chance to double the money within two years. It is equally possible that you could lose half of the amount invested. You have the opportunity to invest the full amount, part of the amount or reject the offer. What share of your lottery winnings would you be prepared to invest in this financially risky, yet lucrative investment? Please tick a box in each line of the scale
13. Suppose there is a bag containing 90 balls. You know that 30 are red and the other 60 are a mix of black and yellow in unknown proportion. One ball is to be drawn from the bag at random. You are offered a choice to (a) win $100 if the ball is red and nothing otherwise, or (b) win $100 if it's black and nothing otherwise. Which do you prefer?
14. The bag is refilled as before, and a second ball is to be drawn from the bag at random. You are offered a choice to (c) win $100 if the ball is red or yellow, or (d) win $100 if the ball is black or yellow. Which do you prefer?
Phase 2
Thank you for completing the survey. In the next part of the HIT you will have the option of completing one of two possible tasks. Completing either of the tasks will earn you a bonus beyond the 20 cents you earned for completing the survey. You will only be paid if you complete both phases of the HIT.
In the first Task (Task A) you will be asked to classify a series of short messages sent by players participating in a social science experiment involving monetary incentives, trust, and reciprocity. You will get more instructions concerning this on the next page.
In the second Task (Task B), you will be asked to type a word that appears in an image. We have calibrated the tasks so that each task takes about the same amount of time. The words are fairly simple. If you are paying attention, you should have no problem typing all of the words correctly.
Before you make your selection, we have some messages that we would like you to classify for practice. Please use these practice messages to become familiar with Task A. After you have completed the practice messages you will be asked to select between Task A and B. Your bonus will be paid in addition to the amount that you made completing the survey. When you are ready, please click the "Begin Practice" button.
As previously discussed in this HIT, you will be asked to classify short messages sent by players participating in a social science experiment. In this experiment, subjects were randomly grouped into pairs to play a single game. The game had two roles, an A player and a B player.
The game is shown in the tree below. At the start of the game the A player had the option to select IN or OUT. If the A player selected OUT, both A and B earned 5 dollars. If A selected IN, the B player was then given the option to choose ROLL or DON'T ROLL. If the B player selected DON'T ROLL, the B player would earn 14 dollars and the A player would earn 0.
Alternatively, if the B player selected ROLL, the B player would earn 10 dollars while a six-sided die was rolled to determine the A player's earnings. If the die came up 1, the A player would earn 0; if the die came up 2 through 6, the A player would earn 12 dollars. The game ended when payoffs were assigned. Subjects were paid their earnings, in cash, and had no further interaction with the other player (or any other experimental subject).
The messages you are coding were sent by B players to A players. These were sent at the beginning of the game, before the A player decided whether to select IN or OUT. The players did not know each other's identities (i.e. play was anonymous) and had no opportunity to communicate other than the pre-game message from the B player.
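As a quick arithmetic check of the payoffs just described, the A player's expected earnings from IN when B rolls can be computed directly:

```python
# Payoffs in the game described above:
#   A chooses OUT -> both players earn 5.
#   A chooses IN, B chooses DON'T ROLL -> B earns 14, A earns 0.
#   A chooses IN, B chooses ROLL -> B earns 10; a fair six-sided die
#   gives A 12 unless it shows 1.
p_win = 5 / 6
a_expected_if_roll = p_win * 12 + (1 - p_win) * 0
print(a_expected_if_roll)  # 10.0
```

So IN is worthwhile for the A player (an expected 10 versus a sure 5) exactly when the B player can be trusted to roll, which is why the pre-game messages matter.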
Before you begin coding, please look carefully at the description of the game to make certain you understand this material. If you would like clarification, you can email [email protected] with questions.
MAIN INSTRUCTIONS:
Please classify each of these handwritten messages using the key located below the handwritten text. You can classify a message by clicking the checkbox adjacent to the classification. Note, each message may have more than one possible classification. You should check all that you think apply.
A description of the available classifications for the messages follows. Please read through this material carefully. Before you begin it is important for you to know how to code the messages. If you would like clarification, you may email [email protected] with questions.
Promise to roll (explicit or implied): Check this box if the message being sent either explicitly states that the sender will choose Roll or implies that the sender will choose Roll.
Appeal for Trust: Check this box if the message sent is asking the recipient, either explicitly or implicitly, to trust the sender of the message.
Appeal for Individual Monetary Incentive: Check this box if the message being sent asks the subject to make a selection that would be in the recipient's own monetary interest.
Appeal to joint payoffs: Check this box if the message being sent by the sender suggests that the recipient make a decision that will result in both individuals making more money.
Appeal to fairness: Check this box if the message being sent by the sender suggests that the recipient make a fair decision such as a decision that results in both players earning a more equal amount.
Attempt to build rapport (e.g. jokes, friendly banter): Check this box if the message being sent is an attempt to establish a personal connection with the recipient through the use of jokes or friendly banter.
None of the Above: Check this box if none of the above classifications are applicable to the message.
Before you begin, please consider the examples below:
Example 1
This example would be coded for two categories, “Promise to roll” and “Appeal to joint payoffs.”
Example 2
This example would be coded as one category, “Promise to roll.”
Please begin when you are ready. When you are finished, please hit the "NEXT" button. If you have trouble reading the text, you can zoom in by holding "Ctrl" and pressing "+".
Practice Questions
1) I’m going to roll
2) Take a risk
3) I have laundry to do tonight and I really don’t want to do it, but I don’t have clean underwear left and I don’t want to go commando tomorrow. We’ll see what I decide tonight. This man acts funny doesn’t he? But he seems cool, he’s quite the character. All this mystery is kinda cool.
4) The fairest thing to do is if you opt “In”. Then I will proceed to choose “roll”. That way you and I have 5/6 chance to make money for the both of us. That is much better than just making $5.00 each increase both our chances. Thanks.
5) Choose in, I will roll dice, your are 5/6 likely to get 2,3,4,5, or 6 → $12. This way both of us will win something.

After the practice questions, subjects were asked: How well do you think you performed the practice task in comparison to other Turkers? Subjects marked their answer using a Likert scale ranging from well below average to well above average. Subjects in the feedback treatments then saw the following pop-up box:
-------------------------------------------------------------------
FIXED
You will now select between the two tasks. Task A consists of coding messages like what you did in the practice task. However, Task A will have more messages and will not include any messages you saw during the practice task. In Task B you will type a word in the textbox below the word. Remember, each task takes about the same amount of time to complete.
If you select Task A, you will earn a bonus of 40 cents for certain for completing the task (i.e. coding all of the messages).
If you select Task B, you will be paid a bonus of 50 cents if you successfully type all of the words. Task B is designed to be sufficiently easy that you will earn this bonus for certain as long as you pay attention. Your bonus will be paid in addition to the amount that you made
completing the survey. You will only be paid if you complete either Task A or Task B in the second phase of the HIT.
Please select the task that you wish to complete.
---------------------------------------------------------------------
HA
You will now select between the two tasks. Task A consists of coding messages like what you did in the practice task. However, Task A will have more messages and will not include any messages you saw during the practice task. In Task B you will type a word in the textbox below the word. Remember, each task takes about the same amount of time to complete.
If you select Task A, you will earn a bonus somewhere between 0 and 135 cents for completing the task (i.e. coding all of the messages).
If you select Task B, you will be paid a bonus of 30 cents if you successfully type all of the words. Task B is designed to be sufficiently easy that you will earn this bonus for certain as long as you pay attention. Your bonus will be paid in addition to the amount that you made completing the survey. You will only be paid if you complete either Task A or Task B in the second phase of the HIT.
Please select the task that you wish to complete.
-----------------------------------------------------
LA
You will now select between the two tasks. Task A consists of coding messages like what you did in the practice task. However, Task A will have more messages and will not include any messages you saw during the practice task. In Task B you will type a word in the textbox below the word. Remember, each task takes about the same amount of time to complete.
If you select Task A, you will earn a bonus somewhere between 0 and 135 cents for completing the task (i.e. coding all of the messages). The precise bonus you earn will depend on how well you do at coding messages correctly.
A message is counted as "correct" if your coding matches the coding chosen by the most people participating in this study. Consider the example shown below:
Example 1
Suppose that 50% of participants code "Promise to Roll" and "Appeal to Joint Payoffs", 20% code only "Promise to Roll", 15% code only "Appeal to Joint Payoffs", and 15% code "None of the Above". To get credit for a correct coding you must code both "Promise to Roll" and "Appeal to Joint Payoffs".
Example 2
Suppose that 35% of participants code only "Promise to Roll", 25% code both "Appeal to Joint Payoffs" and "Promise to Roll", 30% code only "Appeal to Joint Payoffs", and 10% code "None of the Above". To get credit for a correct coding you must code only "Promise to Roll."
If you select Task B, you will be paid a bonus of 30 cents if you successfully type all of the words. Task B is designed to be sufficiently easy that you will earn this bonus for certain as long as you pay attention. Your bonus will be paid in addition to the amount that you made completing the survey. You will only be paid if you complete either Task A or Task B in the second phase of the HIT.
Please select the task that you wish to complete.
----------------------------------------------------------------------------------------------
PR
You will now select between the two tasks. Task A consists of coding messages like what you did in the practice task. However, Task A will have more messages and will not include any messages you saw during the practice task. In Task B you will type a word in the textbox below the word. Remember, each task takes about the same amount of time to complete.
If you select Task A, you will earn a bonus of 9 cents for each message you code correctly. There are 15 messages, so if you select Task A, you will earn a bonus somewhere between 0 and 135 cents for completing the task (i.e. coding all of the messages).
A message is counted as "correct" if your coding matches the coding chosen by the most people participating in this study. Consider the example shown below:
Example 1
Suppose that 50% of participants code "Promise to Roll" and "Appeal to Joint Payoffs", 20% code only "Promise to Roll", 15% code only "Appeal to Joint Payoffs", and 15% code "None of the Above". To get credit for a correct coding you must code both "Promise to Roll" and "Appeal to Joint Payoffs".
Example 2
Suppose that 35% of participants code only "Promise to Roll", 25% code both "Appeal to Joint Payoffs" and "Promise to Roll", 30% code only "Appeal to Joint Payoffs", and 10% code "None of the Above". To get credit for a correct coding you must code only "Promise to Roll."
If you select Task B, you will be paid a bonus of 30 cents if you successfully type all of the words. Task B is designed to be sufficiently easy that you will earn this bonus for certain as long as you pay attention. Your bonus will be paid in addition to the amount that you made completing the survey. You will only be paid if you complete either Task A or Task B in the second phase of the HIT.
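The scoring rule in the examples above (credit for matching the modal coding) and the 9-cents-per-correct piece rate can be sketched as follows; the message data and function names here are hypothetical stand-ins, not part of the experimental software:

```python
from collections import Counter

def modal_coding(codings):
    """Most frequently chosen set of categories for one message.
    `codings` holds one frozenset of checked categories per participant."""
    counts = Counter(codings)
    return max(counts, key=counts.get)

def piece_rate_bonus(worker_codings, all_codings, cents_per_correct=9):
    """PR contract: 9 cents for each message whose coding matches the mode."""
    correct = 0
    for worker, everyone in zip(worker_codings, all_codings):
        if worker == modal_coding(everyone):
            correct += 1
    return correct * cents_per_correct

# Hypothetical codings of one message by five participants,
# mirroring the split described in Example 1.
msg = ([frozenset({"Promise to Roll", "Appeal to Joint Payoffs"})] * 3 +
       [frozenset({"Promise to Roll"})] * 2)
worker = frozenset({"Promise to Roll", "Appeal to Joint Payoffs"})
print(piece_rate_bonus([worker], [msg]))  # 9
```

With 15 messages the bonus therefore ranges from 0 to 135 cents, matching the range quoted to subjects in the PR, LA, and HA (Max = $1.35) screens.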
Messages if Worker Selected Task A
1) You can have the 2 extra dollars. I’ll be nice and choose to roll.
2) Stay IN, I really need the money.
3) Tee hee, this is kinda Twilight Zone – ism; Why not “go for it”, eh? I hope you have a lovely evening as well.
4) Please choose In so we can get paid more.
5) If you choose in I’ll roll. P In R Why? If you choose out, we walk out with $10 each. If you choose IN & I choose IN then both of us coin. So it’s a compromise. By agreeing to this I guarantee myself more $ than risking you choose out. So if you choose out I get $10 ($5 diff.) if you choose in I get $15 vs. $19 ($4 diff.). that’s why
6) Hello fair stranger, anonymous partner _ _ _ Choose whatever you want. Far be it from me to influence your decision, but I think you should choose “in” and I should choose “roll” and we should take the chance at both earning as much as we can. 5 chances out of 6 say it’ll work, and I’m totally broke, looking to rake in stray cash however I can. I feel the luck in the air. E In R I don’t really have much else to say. Hope you’re doing well, whoever you are. Yes. That’s all. Random note from random human
7) Have a Happy day
8) If you stay in, the chances of the die coming up other than 1 are 5 in 6 – pretty good. Otherwise, we’d both be stuck at $5. (If you opt out)
9) If you will choose “In”, I will choose to roll. This way, we both have an opportunity to make more than $5!
10) You’ll still be gaining more than if I had chosen Don’t roll.
11) Good luck I do not know what I’m going to do, so I have no hints on how to advise you on choosing “in” or “out.” Though it would be beneficial for me to pick don’t roll and hope you pick “in”, I also like to give you a chance to gain some cash. Who knows?
12) Choose “In” so we can both make some $$ What are the chances me rolling a 1? I’ll try my best.
13) If I roll a 2–6 (you’ll know when you receive the $, you will give $5.00 to a stranger. P In R [[[then there is a line, under which is written “Sign here if you are so kind]]] Thanks.
14) Hi, well I’m going to Roll so you have at least a shot for more money. I hope it works out.
15) CHOOSE IN, SO WE CAN ROLL AND GET $12 AND $10.
Words if worker selected Task B
1) Automobile 2) Calamari 3) Dinosaur 4) Hootenanny 5) Juxtapose 6) Luminary 7) Mallard 8) Molasses 9) Plateau 10) Rupture 11) Subliminal 12) Turtle 13) Unleaded 14) Zombie 15) Attention
Appendix B: Proofs of Propositions
Proof of Proposition 1: Let $e^{**}$ be the optimal effort if Task A is chosen under $\mu'$. Since $U$ is a concave function, (A.1) must be true.

$$\int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,b,e^{*}) - c(e^{*}) \right] db \, d\mu'(s) \;\geq\; \int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,b,e^{*}) - c(e^{*}) \right] db \, d\mu(s) \;>\; U(K) \tag{A.1}$$

By revealed preference, (A.2) must be true:

$$\int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,b,e^{**}) - c(e^{**}) \right] db \, d\mu'(s) \;\geq\; \int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,b,e^{*}) - c(e^{*}) \right] db \, d\mu'(s) \tag{A.2}$$

Combining (A.1) and (A.2), the result follows. Q.E.D.
In the setup for Proposition 2, when the bonus is restricted to be weakly less than $\bar{b}$, the probability distribution for bonuses (conditional on $s$ and $e$) is derived from the original probability distribution via Bayes' Rule. For each $s \in \Delta$, let $\bar{f}(s,b,e,\bar{b})$ be the objective probability of earning $b$ in state $s$ conditional on choosing $e$ and restricting the bonus to be weakly less than $\bar{b}$, where $\bar{b} < b_{max}$. Equation (5) derives $\bar{f}(s,b,e,\bar{b})$ from $f(s,b,e)$ via Bayes' Rule:

$$\bar{f}(s,b,e,\bar{b}) = \begin{cases} \dfrac{f(s,b,e)}{\int_{b_{min}}^{\bar{b}} f(s,\beta,e) \, d\beta} & \text{if } b \leq \bar{b} \\[4pt] 0 & \text{if } b > \bar{b} \end{cases} \tag{5}$$
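Equation (5)'s truncate-and-renormalize operation is easy to see with a discretized bonus distribution; the probabilities below are illustrative, not taken from the experiment:

```python
def truncate_bonus_dist(probs, b_bar):
    """Discrete analogue of Equation (5): zero out bonuses above the cap
    b_bar and renormalize the remaining mass via Bayes' Rule."""
    mass = sum(p for b, p in probs.items() if b <= b_bar)
    return {b: (p / mass if b <= b_bar else 0.0) for b, p in probs.items()}

# Illustrative f(s, ., e) over bonus levels in cents.
f = {0: 0.2, 67: 0.5, 135: 0.3}
f_bar = truncate_bonus_dist(f, b_bar=100)
# Bonuses above the cap get zero mass; the rest is rescaled to sum to 1.
print(f_bar)
```

Lowering the cap removes the best outcomes and shifts probability mass toward the remaining lower bonuses, which is why the expected bonus in Proposition 2 falls in every state.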
Proof of Proposition 2: Let $EB(s,e,\bar{b})$ be the objective expected bonus in state $s$ conditional on choosing $e$ and restricting the bonus to be weakly less than $\bar{b}$. For every state of the world $s$, $EB(s,e,\bar{b}) < EB(s,e)$. Suppose the subject chooses Task A with the restricted bonus. Since $U$ is an increasing function, the value of Task A must be strictly higher without the restriction. A contradiction follows. Q.E.D.
The preceding assumes that all states of the world remain possible when the maximum bonus is restricted, but having some states become impossible is easily incorporated into the theory. Assign zero subjective probability to all states of the world that never yield a bonus weakly less than $\bar{b}$. Update the subjective probabilities of the remaining states of the world using Bayes' Rule. For all remaining states, update the probability of each possible bonus using Bayes' Rule as above. Proposition 2 extends to this richer version of the model.
Proof of Proposition 3: Holding $e$ fixed, the term below is an increasing function of ability ($\alpha$).

$$\int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,\alpha,b,e) - c(e) \right] db \, d\mu(s)$$

If $\nu'$ first-order stochastically dominates $\nu$, the preceding implies that the following relationship holds, with $e$ held fixed.

$$\int_{0}^{1} \int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,\alpha,b,e) - c(e) \right] db \, d\mu(s) \, d\nu'(\alpha) \;\geq\; \int_{0}^{1} \int_{s \in \Delta} \int_{b_{min}}^{b_{max}} \left[ U(b) f(s,\alpha,b,e) - c(e) \right] db \, d\mu(s) \, d\nu(\alpha)$$

Suppose that $e$ solves an individual's optimization problem under $\nu$. The preceding implies that the individual has a weakly higher value for Task A using $e$ under $\nu'$. By revealed preference, the value of Task A must again be (weakly) higher if the individual is optimizing under $\nu'$. Since $U(K)$ is unaffected by changes in the subjective distribution over ability, the result follows. Q.E.D.
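The first-order stochastic dominance step in the proof can be checked numerically for any increasing value function; the ability grid and beliefs below are illustrative:

```python
def cdf(pmf):
    """Running cumulative sums of a probability mass function."""
    out, total = [], 0.0
    for p in pmf:
        total += p
        out.append(total)
    return out

def fosd(pmf_hi, pmf_lo):
    """pmf_hi first-order stochastically dominates pmf_lo if its CDF is
    pointwise weakly lower."""
    return all(h <= l + 1e-12 for h, l in zip(cdf(pmf_hi), cdf(pmf_lo)))

def expectation(pmf, values, g):
    return sum(p * g(v) for p, v in zip(pmf, values))

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]   # ability grid
nu = [0.3, 0.3, 0.2, 0.1, 0.1]         # baseline belief over ability
nu_prime = [0.1, 0.1, 0.2, 0.3, 0.3]   # FOSD-shifted belief

def value_of_task_a(a):
    return 2.0 * a                     # any increasing function of ability

ok = (fosd(nu_prime, nu) and
      expectation(nu_prime, alphas, value_of_task_a)
      >= expectation(nu, alphas, value_of_task_a))
print(ok)  # True
```

This mirrors the proof: once the value of Task A is increasing in ability, shifting subjective beliefs toward higher ability can only raise its expected value.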