
Paper to be presented at the DRUID Academy Conference 2018 at the University of Southern Denmark, Odense, Denmark, January 17-19, 2018

The Effect of Peer Pressure on Performance in Crowdsourcing Contests

Jonas Heite
Max Planck Institute for Innovation and Competition
Innovation and Entrepreneurship Research
[email protected]

Karin Hoisl
University of Mannheim
Chair of Organization and Innovation
[email protected]

Abstract

In the present study, we investigate whether and why performance differences exist between contestants with the same abilities but who compete against more skilled or less skilled contestants. Performance in contests is a function of the ability and the effort of the contestants. Whereas the ability of the individual contestant is exogenous, the effort can be influenced by the design of a contest. The design configurations that have been studied so far include the structure and level of prizes (Ehrenberg and Bognanno 1990, Orszag 1994), the number of contestants (Boudreau et al. 2016, Garcia and Tor 2009), or the composition of the group of contestants (Brown 2011, Konrad 2009, Tanaka and Ishino 2012).

Recently, ability configurations of groups of contestants have attracted notable attention from economists and management scholars. Competing against contestants with the same ability provides incentives to maximize effort in order to win a prize. Competing against contestants with a higher ability, on the contrary, was shown to decrease the performance of the lower-ability contestants. The negative relationship is even more pronounced in tournaments that contain "star performers", since contestants reduce their effort because they assume that it will not suffice to overcome the ability gap between them and the other contestants (Brown 2011). Competing against contestants with a lower ability should also lead to a reduction in effort caused by a feeling of superiority to the other contestants (Tanaka and Ishino 2012).

To shed light on the mechanisms causing performance differentials, we use data on crowdsourcing contests hosted on the topcoder platform. The data allow us to implement a Regression Discontinuity Design analysis. We compare two groups of contestants characterized by the same abilities. One group competes against contestants who are equally or more skilled, the other group competes against contestants who are equally or less skilled. Based on the literature above, and in case the two groups act rationally, we expect similar performance for both groups. Interestingly, our results show that bottom-performers of a high-ability group are characterized by a performance that is 36% lower than that of contestants who have the same skill level but compete as top-performers of a low-ability group.

To investigate the mechanisms causing these results, we investigate the behavior of the contestants. First evidence shows that the bottom-performers of a high-ability group put more effort into their task by trying to solve more difficult problems than individuals who compete against equally or less skilled contestants. Since rational behavior cannot explain this overinvestment in effort by the former group, we test the explanatory power of behavioral factors like the willingness to take risks (Konrad and Lommerud 1993) or mistakes (Camerer et al. 2011, Sheremeta 2014).

References

Boudreau, K.J., Lakhani, K.R., Menietti, M. (2016). Performance responses to competition across skill levels in rank-order tournaments: field evidence and implications for tournament design. The RAND Journal of Economics, 47(1), 140-165.
Brown, J. (2011). Quitters Never Win: The (Adverse) Incentive Effect of Competing with Superstars. Journal of Political Economy, 119, 982-1013.
Camerer, C.F., Loewenstein, G., Rabin, M. (2011). Advances in Behavioral Economics. Princeton University Press.
Ehrenberg, R.G., Bognanno, M.L. (1990). Do Tournaments Have Incentive Effects? Journal of Political Economy, 98, 1307-1324.
Garcia, S.M., Tor, A. (2009). The N-Effect: More Competitors, Less Competition. Psychological Science, 20, 871-877.
Konrad, K.A. (2009). Strategy and Dynamics in Contests. Oxford, UK: Oxford University Press.
Konrad, K.A., Lommerud, K.E. (1993). Relative standing comparisons, risk taking, and safety regulations. Journal of Public Economics, 51(3), 345-358.
Sheremeta, R.M. (2014). Behavior in Contests. MPRA Paper No. 57451, July 21, 2014, http://mpra.ub.uni-muenchen.de/57451.
Tanaka, R., Ishino, K. (2012). Testing the Incentive Effects in Tournaments with a Superstar. Journal of the Japanese and International Economies, 26, 393-404.


Peer Pressure in Crowdsourcing Contests

Jonas Heite
Max Planck Institute for Innovation and Competition, Munich, DE, Marstallplatz 1, 80539 Munich, Germany, [email protected]

Karin Hoisl
University of Mannheim, Mannheim, DE / Copenhagen Business School, Copenhagen, DK / Max Planck Institute for Innovation and Competition, Munich, DE, L5, 4, 68161 Mannheim, Germany, [email protected]

Abstract

We investigate whether and why performance differences exist between contestants with the same abilities but who compete against more skilled or less skilled contestants. We analyze 1,677 unique coders competing in 38 software algorithm competitions with random assignment. Part of these coders compete amongst the top-performers of a low-ability group, the others compete amongst the bottom-performers of a high-ability group. We compare the performance of the coders competing in the two groups using a Regression Discontinuity Design (RDD) and investigate to what extent the effort exerted by the coders can explain performance differentials. We find that bottom-performers of a high-ability group are characterized by a performance that is 17% lower than that of coders who have the same ability level but compete as top-performers of a low-ability group. However, a decrease in effort cannot explain the performance differentials we observe. Instead, we find that psychological factors like choking under pressure and a rational decision to take higher risks hamper the problem-solving behavior of the contestants under pressure. Our paper contributes to the literature on performance in contests by providing new and causal evidence of the mechanisms causing performance differentials.

Keywords: Contests; tournaments; pressure; performance differentials; mechanisms; risk-taking; choking under pressure; rationality; behavioral factors


1 Introduction

In the present study, we investigate whether and why performance differences exist between contestants with the same abilities but who compete against more skilled or less skilled contestants.1 Performance in contests is a function of the ability and the effort of the contestants. Whereas the ability of the individual contestant is exogenous, the effort can be influenced by the design of a contest. Design configurations that have been studied so far include the structure and level of prizes (Ehrenberg and Bognanno 1990, Orszag 1994), the number of contestants (Boudreau et al. 2016, Garcia and Tor 2009), or the composition of the group of contestants (Brown 2011, Konrad 2009, Tanaka and Ishino 2012).

1 Within this paper, we understand contests as rank-order tournaments, where the ordinal rank of output determines a contestant's compensation (Lazear and Rosen 1981).

Recently, ability configurations of groups of contestants have attracted notable attention from economists and management scholars. Competing against contestants with a similar ability provides incentives to maximize effort in order to win a prize. Competing against contestants with a higher ability, on the contrary, was shown to decrease the performance of the lower-ability contestant (Casas-Arce and Martinez-Jerez 2009). The negative relationship is even more pronounced in tournaments that contain "star performers", since contestants reduce their effort because they assume that it will not suffice to overcome the ability gap between them and the other contestants (Lallemand et al. 2008, Brown 2011). Competing against contestants with a lower ability should also lead to a reduction in effort caused by a feeling of superiority to the other contestants (Brown 2011, Tanaka and Ishino 2012).

We compare contestants characterized by the same abilities. Part of them compete against contestants who are more skilled, which represents our treatment of higher competitive pressure; the rest compete against contestants who are less skilled. If all contestants acted rationally, we would expect all of them to reduce the effort they put into the task, which should result in a similar performance across both groups. Our results, however, reveal that individuals who compete against less skilled contestants outperform individuals who compete against more skilled contestants. One possible explanation could be that the latter contestants reduced their effort even more than the contestants in the other group. A closer look at the problem-solving behavior of the two groups of contestants, however, shows that a decrease in effort cannot explain the performance differentials we observe.

Since standard economic theory cannot explain our results, we investigate two alternative explanations. First, these results may be explained by various psychological factors. Competing against contestants with a higher ability may, for instance, lead to intimidation or choking under pressure (Baumeister 1984, Baumeister and Showers 1986, DeCaro et al. 2011). Competing against contestants with a lower ability may result in a reduction in carefulness (Barrick and Mount 1991, Hurtz and Donovan 2000) or the fear of making a fool of oneself (Baumeister 1984, Schlenker 1980, Zajonc 1965). Additionally, efficacy expectations may play a role (Bandura 1977). Second, our findings may be the outcome of rational behavior, even though not related to effort. In particular, contestants may be willing to take higher risks in case they think that the risk can considerably increase their probability of winning (Konrad and Lommerud 1993, Buser 2016). Interviews with nine experienced contestants, who were asked about their perception of competition as well as their problem-solving behavior and strategies when competing in crowdsourcing contests, confirmed that psychological as well as rational factors affect their performance.

We base our predictions about whether psychological or rational factors can explain our findings on prior literature from contest theory, behavioral theory, psychology, and the literature on innovation and crowdsourcing contests. We use data on crowdsourcing contests hosted on the topcoder platform, which was created in 2001. Today, it has more than 1 million expert members who compete in design, development and data science challenges. We analyze 38 software algorithm competitions, so-called Single Round Matches (SRMs), organized between August 2001 and February 2002. SRMs are timed contests where contestants compete online. In our sample, all contestants solve the same set of three problems with increasing complexity (problem 1 = simple, problem 2 = medium difficult, problem 3 = difficult) under the same time constraints. These topcoder contests allow observing contestants' ability and performance based on objective measures.

As mentioned above, we study software coders with equal abilities. Part of these coders compete amongst the top-performers of a low-ability group (control group), the others compete amongst the bottom-performers of a high-ability group (treatment group). Thus, competitive pressure is higher for our treatment group. In a first step, we compare the performance of the coders competing in the two groups using a Regression Discontinuity Design (RDD) and investigate to what extent the effort exerted by the coders can explain performance differentials. In a second step, we further increase competitive pressure and investigate the effect of a larger number of competitors, additional to our baseline treatment of competing in a high-ability group. Our results show that bottom-performers of a high-ability group are characterized by a performance that is 61 points lower than that of coders who have the same ability level but compete as top-performers of a low-ability group. The performance differential between the two groups equals 17%. Hence, the effect is not only statistically but also economically significant. Once pressure is further increased by increasing the number of contestants, the difference between the treatment and control group increases to 93 points, which equals a performance differential amounting to 25%.

To investigate psychological factors like intimidation or carefulness and rational factors like the willingness to take risks, we analyze the choice of tasks (tasks vary in the level of difficulty), the speed of problem solving, the problem-solving experience, and the mistakes the contestants make. We find that the bottom-performers of a high-ability group tend to try to solve more difficult problems than the top-performers of a low-ability group, either because they are aware of the fact that they can only win in case they take more risks or to signal their abilities. Moreover, these contestants also make more mistakes at the easy and medium problems, which indicates lower carefulness or choking under pressure. Robustness checks show that "rating diving", i.e. deliberately dropping back into the low-ability group to increase the chance of winning in a subsequent contest, is not a relevant concern. In sum, we find first evidence that both psychological and rational factors explain performance differentials of equally skilled contestants who compete in groups of contestants characterized by different ability configurations.

Our paper contributes to the literature on performance in contests by providing new and causal evidence of the mechanisms causing performance differentials. We add to the literature on behavioral factors to explain performance differentials in contests. The project also contributes to the literature on the optimal design of crowdsourcing contests, since our study uncovers factors that potentially lead to failures of contest-based knowledge sourcing in the innovation process.

2 Theoretical Framework

The focus of contest theory is relative performance evaluation. Individuals are rewarded based on their performance relative to other contestants (Knoeber and Tsoulouhas 2013). The theory was originally developed by Lazear and Rosen (1981) to design optimal labor contracts based on differences in individual productivity. Since the early 1980s, contest theory has been applied to various fields such as sports (Bothner et al. 2007), law (Anabtawi 2005), and research and development (Dechenaux et al. 2015). The interest derives from the fact that contests incentivize individuals to exert higher effort than in non-competitive environments.

Contest models assume that individuals exert effort while competing for a prize (Boudreau et al. 2016). The probability of winning the prize for equally skilled individuals depends on the efforts of all contestants. In particular, it equals the ratio of the effort of an individual contestant to the sum of all contestants' efforts. Consequently, it increases with the individual's own effort and decreases with the effort of the other contestants (Tullock 1980). The expected payoff of an individual equals the expected benefit, i.e. the probability of winning the prize times the prize value, minus the costs of effort. Whereas the payoff (and consequently the effort) increases with the value of the prize, it decreases with the number of contestants (Skaperdas 1996, Sheremeta 2014, Boudreau et al. 2016). If we assumed contestants with equal abilities and kept the number of contestants and the prize level constant, rational individuals should exert the same amount of effort and thus perform similarly and win with the same probability.
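For concreteness, the winning probability and payoff just described can be written as a standard Tullock (1980) lottery contest; the symbols e_i (effort of contestant i), V (prize value), and c (marginal cost of effort) are introduced here only for illustration and do not appear in the original text:

```latex
% Probability that contestant i wins, given efforts e_1, ..., e_n (Tullock 1980):
p_i = \frac{e_i}{\sum_{j=1}^{n} e_j}
% Expected payoff: winning probability times the prize value, minus the cost of effort:
\pi_i = \frac{e_i}{\sum_{j=1}^{n} e_j}\, V - c\, e_i
```

In the symmetric case with n equally able contestants, the equilibrium effort is e* = (n-1)V/(cn^2), which rises with the prize value and falls as the number of contestants grows, in line with the statements above.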

Whereas contests are most effective in case all contestants have similar abilities, they are less effective if contestants have heterogeneous abilities. Competing against contestants with a higher ability decreases the effort of the lower-ability contestants, since the latter assume that they will not overcome the ability gap (Lallemand et al. 2009, Brown 2011). Competing against contestants with a lower ability should also lead to a reduction in effort caused by a feeling of superiority to the other contestants (Brown 2011, Tanaka and Ishino 2012). An extreme situation is competing against star performers, i.e. individuals who consistently show a superior performance relative to other contestants (Rosen 1981). Economic theory suggests that star performers adversely affect the incentives to exert effort in contests. Lallemand et al. (2009) and Brown (2011) show that lower-ranked professional tennis and golf players underperform in matches characterized by heterogeneity of the contestants' abilities. The effect is even larger in case a star performer is among the contestants.

However, what if performance differentials of heterogeneous groups cannot be explained by differentials in the effort put into the task? The literature provides two possible alternative explanations: psychological factors and a rational decision to take higher risks. Psychological factors can manifest in various forms. First, intimidation or choking under pressure might explain lower performance (Riley 2012). Situational pressure may cause individuals to perform below their abilities despite incentives to put effort into a task (Baumeister 1984, Baumeister and Showers 1986, DeCaro et al. 2011). Distraction theories provide an explanation for this phenomenon. According to these theories, choking occurs because of information overload or because individuals focus on task-irrelevant cues like worry. Whereas the former leads to a failure to adequately concentrate on the task, the latter results in neglecting critical characteristics of the task (Morris and Liebert 1969, Kahneman 1973, Baumeister and Showers 1986). Both types of distraction can result in errors. This is confirmed by Boudreau et al. (2012), who find that errors in logic are the negative response to an increase in competition in contests. A possible reaction of contestants under pressure may be to increase carefulness (conscientiousness) to avoid mistakes. Whereas carefulness, in general, positively affects performance (Barrick and Mount 1991, Hurtz and Donovan 2000), it might turn negative in case tasks have to be performed under time constraints. The latter is typically the case in contests.

A contestant's performance may also be influenced by her expectancy of success or failure based on earlier experience. Contestants who believe that they can win are more likely to win than contestants who are in doubt about their own abilities. Bandura (1977) refers to these beliefs as "efficacy expectations". A possible explanation of the higher likelihood of winning is that positive expectancies balance the negative effects of pressure (Carver et al. 1979).

Finally, the fear of making a fool of oneself may also affect the performance of contestants. It has been shown that an audience causes performers to be concerned (Baumeister 1984, Schlenker 1980). Whereas an audience can have a positive effect on performance in case individuals perform a well-known task, the effect is negative in case of a poorly known task (Zajonc 1965). In a contest where each contestant is aware of her competitors, their ability, and their live performance during the contest, the competitors can be considered the audience. The expected effect of the audience is negative, since, in contests, tasks are poorly known and have to be performed under time constraints. The negative effect of the audience should be even more pronounced if the same group of contestants repeatedly competes against each other. In particular, it can be assumed that the performance of known contestants attracts more attention than the performance of unknown contestants.

A rational explanation for performance differentials of contestants with the same ability competing against better or worse contestants may be their risk-taking propensity. Individuals in contests may be willing to take higher risks in case they think that the risk can considerably increase their probability of winning the prize or in case they feel that they have nothing to lose (Konrad and Lommerud 1993, Buser 2016, Mueller-Langer and Andreoli-Versbach 2017). Genakos and Pagliero (2012) study weightlifting competitions and show that contestants who are ranked behind the leader take higher risks. Chevalier and Ellison (1997) find that mutual funds adapt the riskiness of their portfolio depending on the mid-year performance, increasing the fund volatility in case the mid-year performance is lagging behind. Literature from psychology shows that individuals in a negative affective state, for instance caused by the fact that contestants expect to lose, tend to seek higher risks than those in a positive affective state (Isen and Geva 1987, Mittal and Ross 1998). Furthermore, individuals who are at risk of failing tend to try everything to avoid failure; taking a higher risk in a contest may be one way of trying to avoid the shame experienced by failing (Elliot and Thrash 2004, Elliot and Church 1997).

3 Empirical Context – Topcoder and Algorithm Contests

To answer our research question, we use crowdsourcing data. topcoder is an algorithm, development, and design platform specialized in online programming contests. The platform was founded in 2001 by Jack Hughes and Mike Lydon.2 Today, topcoder has more than 1 million expert members who compete in challenges. In total, topcoder hosts more than 7,000 contests every year. It is thus the "world's largest community of competitive designers, developers, and data scientists".3 topcoder hosts different types of competitions, amongst them algorithmic contests, software design contests, or graphic design contests.

We focus on weekly so-called algorithm contests, i.e. timed contests where all contestants compete online and are given a set of three problems to solve under time constraints. Individual contests are called "Single Round Matches" (SRMs).4 Coders must register for SRMs and are assigned randomly to so-called virtual rooms, i.e. groups of up to 8 coders who compete against each other. SRMs are split into two divisions. Division 1 (D1) contains coders with medium to high skill levels; Division 2 (D2) comprises coders with low to medium skill levels. To assign the coders to D1 and D2, the individual skill ratings are used. topcoder has developed its own rating system based on an Elo rating, which compares the predicted future rank based on past performance with the performance of all contestants. The cutoff value, i.e. the threshold that determines whether a coder competes in D1 or D2, is unknown until shortly before the start of the contest but is around 1,489 in our setting. Figure 1 summarizes the composition of SRMs.5

2 See https://www.crunchbase.com/organization/topcoder, accessed on November 26, 2017.
3 See https://herox.com/topcoder, accessed on November 26, 2017.
4 In the following, we refer to the setting and information environment of our specific sample in 2001 and 2002. topcoder has changed the rules of SRMs several times over the last years.
5 See https://apps.topcoder.com/forums/%3bjsessionid=69D6B19B12B97C75CB96F247E3679D15?module=Thread&threadID=685368&start=0&mc=5#1306267, https://www.topcoder.com/community/how-it-works/, accessed on November 26, 2017.

[Insert Figure 1 about here]
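A minimal sketch of the sharp assignment rule described above; the cutoff of 1,489 is the approximate value reported for our sample, and the function name is ours, not topcoder's:

```python
# Sketch: assign a coder to Division 1 or 2 by comparing the skill rating to the
# contest-specific cutoff announced shortly before the SRM starts.
DIVISION_CUTOFF = 1489  # approximate cutoff in the 2001-2002 sample

def assign_division(skill_rating: float, cutoff: float = DIVISION_CUTOFF) -> int:
    """Return 1 (medium-to-high-ability division) if the rating reaches the cutoff, else 2."""
    return 1 if skill_rating >= cutoff else 2

# A coder rated 1,492 lands in D1 (bottom of the high-ability group),
# a coder rated 1,486 lands in D2 (top of the low-ability group).
print(assign_division(1492), assign_division(1486))  # -> 1 2
```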

Each SRM consists of three phases: the coding phase, the challenge phase, and the system testing phase. During the coding phase, the contestants must solve three problems characterized by an increasing level of difficulty. Problem complexities are represented by the maximum reachable points, which are 250 points for the easy problem (problem 1), 500 points for the medium difficult problem (problem 2), and 1,000 points for the difficult problem (problem 3). The algorithmic problems require logical and structural thinking in order to convert a certain task into a working computer solution. As soon as a contestant selects and opens a problem, the achievable score (submission points) for that problem begins to decrease. Thus, the number of points depends on the time elapsed between opening the problem statement and submitting a solution. For each solution that successfully compiles (i.e. source code that can be transferred into an executable program), the contestants get the submission points for that problem. If the solution does not pass the challenge or system testing phase, the contestant loses all submission points for that specific problem. Hence, the attained score per problem is a function of the correctness and speed of the respective solution. In total, the coding phase, i.e. the time to solve all three problems, lasts between 60 and 90 minutes (75 minutes in most cases).
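The exact decay schedule of the submission points is not spelled out in the text; the sketch below therefore assumes a simple linear decay toward a floor, purely to illustrate how the attained score depends on both speed and correctness:

```python
# Illustrative sketch of the submission-point mechanics. The actual topcoder decay
# formula is not given in the text; a linear decay toward a 30% floor is assumed here.
MAX_POINTS = {1: 250, 2: 500, 3: 1000}  # easy, medium, difficult

def submission_points(problem: int, minutes_open: float,
                      coding_phase_minutes: float = 75.0) -> float:
    """Achievable points as a function of the time elapsed since opening the problem."""
    share_elapsed = min(minutes_open / coding_phase_minutes, 1.0)
    return MAX_POINTS[problem] * (1.0 - 0.7 * share_elapsed)  # assumed decay

# A solution to the medium problem (500 points) submitted after 30 of 75 minutes:
print(round(submission_points(2, 30)))  # -> 360 under the assumed linear decay
```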

The challenge phase takes 15 minutes. During this time, the contestants have the possibility to challenge the functionality of the solutions of the other contestants in the same room. In case any of the coders finds an error in one of the other contestants' code, the contestant who submitted the erroneous code loses all submission points earned for that specific problem at the end of the coding phase. The successful challenger gets a 50-point reward. On the contrary, if a challenge is unsuccessful, the contestant having made the challenge loses 50 points from her score and the score of the challenged solution remains unchanged.

Submitted code that has not been successfully challenged must pass through the system testing phase. In case the topcoder system test finds an error in the code, the respective coder will, again, lose all her points for that problem originally earned at the end of the coding phase. Successful challenges from the challenge phase of both divisions are added to the system test so that all contestants across divisions are treated equally at the end of the system testing phase.6

6 See https://help.topcoder.com/hc/en-us/articles/115006162527-SRM-Overview, accessed on November 26, 2017.

Generally, we distinguish between three possible outcomes: (1) opened problem (the contestant opened the problem description), (2) submitted problem (the contestant submitted a solution for the problem), and (3) passed challenge and system test phase (the submitted solution passed both the challenge and the system test phase). Only if the solution passes the third step does the contestant receive points and have the chance to win a prize. The final points awarded to a contestant equal the sum of the points received for successful submissions (outcome 3), plus the additional points earned for successful challenges of other coders' solutions, minus the points lost for unsuccessful challenges.
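Putting the three phases together, a contestant's final score can be sketched as follows; the data structure and field names are ours, chosen only to mirror the rules described above:

```python
# Sketch of the final-score aggregation: submission points count only if the solution
# survives both the challenge phase and the system test; challenges add or subtract 50 points.
from dataclasses import dataclass

@dataclass
class Submission:
    submission_points: float   # points locked in at the end of the coding phase
    passed_challenge: bool     # not successfully challenged by another coder
    passed_system_test: bool   # passed the automated system tests

def final_points(submissions: list[Submission],
                 successful_challenges: int,
                 unsuccessful_challenges: int) -> float:
    kept = sum(s.submission_points for s in submissions
               if s.passed_challenge and s.passed_system_test)
    return kept + 50 * successful_challenges - 50 * unsuccessful_challenges

# Two surviving solutions worth 180 and 360 points and one failed challenge attempt:
print(final_points([Submission(180, True, True), Submission(360, True, True)], 0, 1))  # -> 490
```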

Coders are motivated to take part in an SRM by extrinsic rewards (prize money) and intrinsic rewards (e.g., reputation). Generally, the top three performers in a room earn money. In D1, the winner is awarded 300 USD, the second place 150 USD, and the third place 75 USD. In D2, the winner gets 150 USD and the second and third place get 75 USD and 25 USD, respectively.7 All other coders do not get any prize money. The intrinsic motives of the coders are in line with motives of open source programmers that have repeatedly been discussed in the literature (e.g., Lakhani and Wolf 2005) and were confirmed by our interviewees: signaling in the job market, community recognition, improvement of programming skills, and fun.

7 Since the effort is a function of the expected payoffs, it can be assumed that contestants in D1 put more effort into solving the tasks. This may result in a higher performance of D1 coders than D2 coders. Hence, our assumption that both parties exert equal efforts is a conservative assumption.

After assignment to the different rooms, the contestants have access to information about their competitors. They can view the profiles of the other coders in the room, containing their programming skills, the time since joining topcoder, and their skill rating. A coder's name on topcoder is color-coded depending on her skill rating. The top performers are colored in red (skill rating: 2200+), coders with a skill rating between 1500 and 2199 are colored in yellow, coders with a skill rating between 1200 and 1400 are colored in blue, and bottom performers (skill rating between 900 and 1199 or below 900) are colored in green and gray. The color coding allows the coders to rapidly detect the composition of the skill distribution of contestants and thus their specific competitive environment.8 Additional live information about the current state of a contest can be obtained from the "Leader Board". Coders can click on the "Leader Board" button on their screen, which then shows the current leader in the specific room. Additionally, the window contains information about the actual points of the leader. Our interviews indicate that coders check the group composition of the room that they are randomly assigned to before the start of the contest.

8 See https://apps.topcoder.com/wiki/display/tc/Algorithm+Competition+Rating+System, accessed on November 26, 2017.

4 Data and Descriptive Statistics

We focus on SRMs 26-66, organized between August 2001 and February 2002. We restricted the sample to these SRMs since, in these contests, coders in D1 and D2 solved the same problems and faced the same time constraints. The latter ensures comparability of the scores achieved in D1 and D2. We dropped three SRMs because of missing challenge points (SRMs 28 and 33) and a missing specification of the total time (SRM 35). Our final sample consists of 38 SRMs and 1,677 unique coders. Some of the coders repeatedly competed in SRMs; hence, the regression analyses are based on 10,038 observations, i.e. the achievements of coders in different SRMs.


Table 1 describes the problem-solving behavior of the contestants. Whereas 98% of the contestants opened problem 1, only 95% of the contestants opened problem 2, and 77% opened problem 3. Once opened, a possible solution to problem 1 was submitted with a probability of 88%, to problem 2 with a probability of 59%, and to problem 3 only with a probability of 25%. In case the contestants submitted a solution to problem 1, 65% of these solutions were correct. Problem 2 solutions were correct with a probability of 50% and problem 3 solutions only with a probability of 34%. Contestants who submitted a solution for problem 1 got on average 129.8 points, contestants who solved and submitted a solution for problem 2 got on average 151.6 points, and, for a solution to problem 3, the contestants got on average 202.2 points. Again conditional on a submission, the contestants dedicated more time to problem 2 (mean = 33.4 minutes) and problem 3 (mean = 33.5 minutes) than to problem 1 (mean = 19.03 minutes).

Table 2 contains descriptive statistics of the variables used for our RDD analysis. Column 1 reports descriptive statistics for the full sample; columns 2 and 3 report descriptive statistics by division.

Test and ranking variables

The total number of contestants per SRM varies between 117 and 576 and amounts to 160 on average. The average number of contestants competing in D1 is significantly smaller than the average number of contestants competing in D2 (mean D1 = 31; mean D2 = 131). As mentioned earlier, in D1, the winner is awarded 300 USD, the second place 150 USD, and the third place 75 USD. In D2, the winner gets 150 USD and the second and third place get 75 USD and 25 USD. In D1, the contestants got an average amount of 70 USD and D2 contestants an average amount of 32 USD. The maximum skill rating of the contestants in D1 amounted to 3,111 (mean = 1,835). The maximum skill rating in D2 amounted to 1,522 (mean = 866).

Dependent variables

On average, the contestants opened 2.7 problems (2.9 in D1, 2.6 in D2). Contestants in D1 on average submitted more problems than contestants in D2 (2.3 in D1 vs. 1.4 in D2). Contestants in D1 on average received twice as many submission points (769.0 vs. 339.9). They, however, also lost more points after other contestants challenged their solutions or the solutions did not pass the system tests.9 D1 contestants lost on average 267.5 points, whereas D2 contestants lost on average 173.6 points. The overall performance of the contestants, which is defined as the final points excluding all additional points earned during the challenge phase, amounts to 501 on average for contestants in D1 and to 166 on average for contestants in D2.10 As expected, all differences turned out to be highly significant.

9 Lost points are defined as the difference between final points (performance of the contestant) and submission points and are the result of erroneous submissions detected at the challenge or system test phase. A high number of lost points is associated with fast but risky submissions.
10 In our analysis, we exclude all additional points that have been earned during the challenge phase in order to be able to compare the performance of contestants in D1 and D2. It is obvious (and can be shown) that bottom-performers of the high-ability group will earn fewer additional challenge points than top-performers of the low-ability group. This difference is excluded in our RDD analysis to compare the quality of the solutions between treatment and control group.


Control variables

To avoid biased results, we control for the experience of the contestants, i.e. the number of earlier SRMs in which the contestants took part.11 Whereas D1 contestants on average took part in 22 earlier contests, D2 contestants on average only took part in 9 earlier contests. D1 contestants on average had switched 2.7 times between D1 and D2 (in case the contestants earned no points in a contest where they competed in D1, they might have reduced their skill rating, resulting in a move back to D2; from there, they again had to work their way up to D1).12 D2 contestants on average are characterized by 0.5 switches between D1 and D2. We further control for the number of contestants per room. Rooms in D1 contain 7.5 contestants on average, rooms in D2 7.8 contestants on average. This variable will also be used in the later regression analysis to test whether additional competitive pressure increases the performance differentials between D1 and D2 contestants. Following Boudreau et al. (2016), we use the number of contestants per room as a proxy for the intensity of competitive pressure. Finally, we accounted for the number of contests of each contestant in D1 and D2 during which the contestants had not submitted any solutions for any of the problems.13 This behavior is known as "rating diving": deliberately performing worse in D1 in order to drop back to D2 and have a higher likelihood of winning in the next SRM. Again, all differences between D1 and D2 contestants are highly significant.

11 The variable "number contests participated" is a running variable that includes all SRMs that the contestant had participated in prior to the focal contest.
12 The variable "number switches between Divisions" is a running variable that includes all switches between D1 and D2 prior to the focal contest. A switch counts in both directions (from D1 to D2 and from D2 to D1).
13 The variable "No submission Div1" ("No submission Div2") is a running variable that counts all SRMs where the contestant did not submit any solution for the three problems in D1 (D2). For D1 this is referred to as "rating diving". For D2 this behavior has no benefit and might reflect inexperience or curiosity of beginners.

[Insert Tables 1 and 2 about here]
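A minimal pandas sketch of how running controls of this kind can be constructed from the contest history; the column names and the toy data are assumptions for illustration, not the authors' dataset:

```python
import pandas as pd

# Assumed long-format contest history: one row per coder per SRM, in time order.
# Column names and the toy values are illustrative only.
history = pd.DataFrame({
    "coder_id":      [7, 7, 7, 7],
    "srm":           [26, 27, 29, 30],
    "division":      [2, 2, 1, 2],
    "submitted_any": [True, False, True, True],
}).sort_values(["coder_id", "srm"])

def add_running_controls(df: pd.DataFrame) -> pd.DataFrame:
    """Running controls for one coder, measured over the contests before the focal SRM."""
    df = df.copy()
    # Number of earlier SRMs the coder took part in.
    df["n_prior_contests"] = range(len(df))
    # Switches between divisions (counted in both directions).
    switched = df["division"].ne(df["division"].shift()) & df["division"].shift().notna()
    df["n_switches"] = switched.cumsum()
    # Earlier D1 contests without any submission (candidate "rating diving").
    no_sub_d1 = (df["division"] == 1) & ~df["submitted_any"]
    df["n_prior_no_sub_d1"] = no_sub_d1.cumsum().shift(fill_value=0)
    return df

history = history.groupby("coder_id", group_keys=False).apply(add_running_controls)
print(history)
```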

5 Methodological Approach

For SRMs, contestants are assigned to two groups according to their skill rating, which is a function of the individual's performance in previous topcoder contests. As mentioned earlier, those contestants with a score above a certain threshold determined by topcoder compete in D1 and those below the threshold compete in D2. Individuals who have a score varying around the threshold (medium-skilled contestants) repeatedly move between D1 and D2. It is reasonable to assume that those who just barely pass the threshold to D1 are comparable to those who just miss out on being assigned to this group, i.e. who compete in D2. Consequently, the assignment to the two groups of individuals whose ability score is at the threshold can be assumed to be as good as random.

As indicated above, individuals at the threshold but assigned to D2 are the top-performers of the low-ability group, and contestants just assigned to the high-ability group (D1) are the bottom-performers of the high-ability group. Hence, peer pressure should be higher for individuals at the threshold but competing in D1. This higher pressure is defined as the treatment in our setting.

The organization of the contests, i.e. an assignment of the contestants to the two divisions according to a clear cutoff of the skill rating, allows us to implement a Regression Discontinuity Design (RDD) analysis to assess the causal effect of competitive pressure on the performance of contestants (Thistlethwaite and Campbell 1960, Lee 2008). In our case, the RDD is sharp, since the assignment to the two groups is a deterministic function of the skill rating of the contestants (Jacob et al. 2012). In other words, all contestants above the skill threshold set by topcoder are assigned to the treatment group (D1) and all contestants with a skill rating below the threshold are assigned to the control group (D2) (Imbens and Lemieux 2008).14

14 Of all 10,038 observations, 10 observations (from 8 different SRMs) should have been assigned to D2, because they had a skill rating that was lower than the specific threshold, but were assigned to D1. We assume that this was done to fill up competition rooms in D1. As a robustness check, we excluded these observations with no effect on the results.

Figure 2 shows visual evidence of the RDD comparing the performance of the contestants according to their skill rating and assigned division, where the rating is centered at the threshold. The functional form of the RDD is based on a polynomial fit of order 3 (cubic model) and represents the relationship between the rating variable and the outcome. We further include an interaction between the rating variable and the treatment, which accounts for the fact that the treatment impacts not only the intercept but also the slope of the regression line (Jacob et al. 2012). Restricting the analysis to contestants around the threshold, who are all characterized by the same medium-ability level, shows that contestants in D1 (bottom-performers of the high-ability group) indeed perform worse than those in D2 (top-performers of the low-ability group). Whereas Figure 2 shows performance differentials without considering challenge points, Figure 3 takes the additional challenge points into account. As expected, the latter increases the observable performance differential.

[Insert Figures 2 and 3 about here]
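A hedged sketch of how figures of this kind can be produced: binned mean performance against the centered rating, with separate cubic fits on each side of the cutoff. Variable names are ours; the authors' exact plotting choices are not reported in the text:

```python
# Sketch of a binned RDD plot: mean performance by rating bin, centered at the D1/D2
# cutoff, with separate order-3 polynomial fits below and above the threshold.
import numpy as np
import matplotlib.pyplot as plt

def rdd_plot(rating_centered, performance, n_bins=40):
    rating_centered = np.asarray(rating_centered, dtype=float)
    performance = np.asarray(performance, dtype=float)
    # Binned means for the scatter.
    edges = np.linspace(rating_centered.min(), rating_centered.max(), n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (rating_centered >= lo) & (rating_centered < hi)
        if in_bin.any():
            plt.scatter((lo + hi) / 2, performance[in_bin].mean(), s=12, color="grey")
    # Separate cubic fits on each side of the cutoff.
    treated = rating_centered >= 0  # D1: bottom-performers of the high-ability group
    for side in (~treated, treated):
        coefs = np.polyfit(rating_centered[side], performance[side], deg=3)
        grid = np.linspace(rating_centered[side].min(), rating_centered[side].max(), 200)
        plt.plot(grid, np.polyval(coefs, grid))
    plt.axvline(0, linestyle="--", color="black")
    plt.xlabel("Skill rating, centered at the D1/D2 cutoff")
    plt.ylabel("Final points (excluding challenge points)")
    plt.show()
```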

In a second step, we estimate the following equation using an RDD analysis:

Y_i = \alpha + \beta_0 T_i + \beta_1 r_i + \beta_2 r_i^2 + \beta_3 r_i^3 + \beta_4 r_i T_i + \beta_5 r_i^2 T_i + \beta_6 r_i^3 T_i + \beta_7 (X_i - \bar{X}) + \beta_8 (X_i - \bar{X}) T_i + V + \epsilon_i

where i refers to the contestant per contest (SRM), \alpha is the average value of the outcome for those in the control group after controlling for the rating variable, Y_i refers to the outcome measure, i.e. the performance of the contestant for observation i, T_i is the treatment dummy (i.e. being in D1 with higher peer pressure equals 1), r_i is the rating variable for observation i, centered at the cutoff value, X_i represents the covariate used to test heterogeneous treatment effects (centered at the mean to simplify interpretation), V is a vector including round fixed effects (SRM) and control variables for some specifications, and \epsilon_i is the error term for observation i.
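A minimal sketch of how the equation above can be estimated with statsmodels; the column names (final_points, d1, rating_c, n_room_c, srm, coder_id) are assumptions about the dataset, not the authors' variable names, and the heterogeneous-effect covariate here is the centered number of contestants per room:

```python
# Sharp RDD regression with a cubic polynomial in the centered rating, different
# slopes above the cutoff, SRM (round) fixed effects, and clustered standard errors.
import statsmodels.formula.api as smf

formula = (
    "final_points ~ d1"                                        # treatment dummy: competing in D1
    " + rating_c + I(rating_c**2) + I(rating_c**3)"            # cubic in the centered rating
    " + d1:rating_c + d1:I(rating_c**2) + d1:I(rating_c**3)"   # separate slopes above the cutoff
    " + n_room_c + d1:n_room_c"                                # centered covariate and its interaction
    " + C(srm)"                                                # round (SRM) fixed effects
)

def estimate_rdd(df, bandwidth=None):
    """Estimate the treatment effect beta_0; optionally restrict the sample to a
    window around the cutoff. Standard errors are clustered at the contestant level."""
    if bandwidth is not None:
        df = df[df["rating_c"].abs() <= bandwidth]
    model = smf.ols(formula, data=df)
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["coder_id"]})

# result = estimate_rdd(srm_data)
# print(result.params["d1"])  # estimated performance differential, in points
```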

6 Multivariate Analysis

Tables 3 and 4 show the results of an RDD analysis. In each model, we cluster the standard errors at the contestant level and control for contest (round) fixed effects. Whereas Table 3 shows the results of RDD models in which we estimate the effect of a cubic function of the skill rating with different slopes above and below the threshold, Table 4 (as a robustness check) shows the results based on a quartic function. Since the results of both models are very similar in terms of coefficient size and sign as well as level of significance, in the following, we focus on the outcomes displayed in Table 3. Models 1 and 3 only contain the treatment. Models 2 and 4 add specific control variables. Models 3 and 4 contain an interaction between the treatment and the number of contestants competing in a room (i.e. intensity of peer pressure). Results show that competing against contestants with higher ability (i.e. competing in D1) decreases performance of the contestants with a skill level around the threshold by 61 points (Model 1). Once we add the controls, the decrease in performance still amounts to 57 points (Model 2). Increasing competitive pressure (i.e. an increase in the number of contestants in a room) further increases the performance differential by 24 points (Model 4).

With respect to the control variables, we find that experience, measured by the number of earlier contests participated in, has a non-linear effect on performance. The number of contests in which the contestants did not submit any solutions at all decreases performance by 68 points (no submissions in D1) and 20 points (no submissions in D2).

[Insert Tables 3 and 4 about here]

Since we are interested in the mechanisms leading to performance differentials of individuals who are characterized by the same ability level but compete in different competitive environments, we take a closer look at the problem-solving behavior of the contestants, which is displayed in Figures 4 to 8. It has to be noted that the following results refer to contestants around the cutoff value, i.e. at the threshold of being assigned to D1 or D2. Figure 4 shows RDD estimates of the problem-solving behavior for problem 1. Figures 5 and 6 show RDD analyses describing the problem-solving behavior of the contestants for problems 2 and 3. Figures 7 and 8 display RDD analyses of the overall problem-solving behavior (all three problems combined) of the contestants.

Figure 4 shows that contestants facing higher peer pressure (having a skill rating around the cutoff but competing in D1) are characterized by a smaller likelihood of submitting the easy problem (problem 1) and by lower submission points than contestants facing lower peer pressure (having a skill rating around the cutoff but competing in D2). They, however, spend more time on easy problems than contestants facing lower peer pressure and have fewer easy problems correct and fewer final points. In other words, they make more mistakes than D2 contestants around the cutoff. These results may provide a first indication of choking under pressure. A higher willingness to take risks might also be an explanation for what we observe in the data. More mistakes at easy problems may occur due to less emphasis on the easy problem and more emphasis on the medium or difficult problem.

Figure 5 focuses on the medium difficult problem (problem 2). Results indicate that contestants facing higher peer pressure (D1) are characterized by a lower likelihood of opening and submitting medium difficult problems than contestants competing in D2. Additionally, D1 contestants earn lower submission points for medium difficult problems than D2 contestants. D1 contestants are further characterized by a lower likelihood of having medium difficult problems correct, i.e. they earn fewer final points. They also spend less time on submitted solutions. Hence, they get higher submission points (they need less time to submit a solution). However, they submit fewer correct solutions than D2 coders. As a result, they lose more points during the challenge or system test phase. In sum, the problem-solving behavior of contestants under pressure indicates that they place less emphasis on the medium problem (lower likelihood of opening problem 2) and solve the medium problems less carefully by submitting fast but erroneous solutions. Our results may also point to a higher risk-taking propensity of D1 contestants compared to D2 contestants. In particular, D1 coders might submit solutions before they are sure that their solution is correct to save time, which, in turn, leads to more submission points but lower final points. This indicates high-risk submissions.

The problem-solving behavior of contestants solving problem 3 (Figure 6) reveals that contestants who face higher peer pressure (competing in D1) are characterized by a higher likelihood of submitting difficult problems (conditional on opening the problem). The difference diminishes once competitive pressure increases even further, when considering the heterogeneous treatment effects of an increased number of competitors in a specific contest room. D1 contestants also spend more time on the most difficult problem (conditional on opening the problem). Figure 6 also indicates that D1 contestants are more likely to submit erroneous solutions. Hence, even though D1 contestants dedicate more time to difficult problems (once opened) compared to D2 contestants, they are more likely to make mistakes. A possible interpretation of this finding is that D1 contestants take a higher risk. In other words, they try to solve problem 3 even though they might not have the skills to succeed. This may well be their only option to win a prize, i.e. to beat the better peers.

Figure 7 summarizes the overall problem-solving behavior of the contestants. It indicates that contestants facing high competition (competing in D1) open fewer problems, submit fewer problems, spend more time on solving problems, submit fewer correct solutions (i.e. make more mistakes), and, consequently, receive fewer final points than D2 contestants. These results may point to choking under pressure. However, again, a higher willingness to take risks might also be an explanation.

Finally, Figure 8 shows that contestants facing high competition (D1) have a higher ratio of time elapsed to problems opened than D2 contestants. This could mean that they take more time to solve problems. However, it may also be that they open more than one problem at the same time. They also have a higher ratio of submission points to problems submitted than D2 contestants. This indicates a higher performance per submission. However, they have a lower ratio of problems correct to problems submitted. Hence, they make more mistakes, either because of choking under pressure or because of taking higher risks.

[Insert Figures 4 to 8 about here]

7 Implications, Conclusions and Limitations

The aim of our analysis was to investigate whether and why performance differences exist between contestants with the same abilities but who compete against more skilled or less skilled contestants. We observe lower performance of D1 contestants at all problems, which can be explained by more mistakes, even at the easy and medium problems. This, in turn, might indicate choking under pressure, because contestants around the cutoff should be able to solve these problems correctly but make more mistakes. We also find evidence of a higher risk-taking propensity of contestants facing higher competitive pressure. This behavior might be rational in order to overcome the ability gap to higher-skilled contestants.

In sum, the detailed analysis of the problem-solving behavior of the contestants sheds some light on what drives the lower performance of equally skilled contestants who compete in D1 compared to D2 contestants around the cutoff. However, these results still do not suffice to disentangle the different possible explanations of rational (e.g. higher risk taking) or behavioral (e.g. choking under pressure) mechanisms.

Our study contributes to the literature on performance in contests by providing new and causal evidence on performance differentials that cannot be explained by differences in ability or effort. The results also add to the literature on the design of crowdsourcing contests by relating different levels of pressure to the risk-taking and problem-solving behavior of the contestants. Furthermore, our results help platforms to optimally design their (crowdsourcing) contests. In order to disentangle the mechanisms of rational or behavioral factors that influence the performance differential, a lab or field experiment might be useful.


References

Anabtawi, I. (2005). Explaining pay without performance: The tournament alternative. Emory Law

Journal, 54, 1557-1602.

Bandura, A. (1977). Self-efficacy: toward a unifying theory of behavioral change. Psychological review,

84(2), 191.

Baumeister, R. F. (1984). Choking under pressure: self-consciousness and paradoxical effects of

incentives on skillful performance. Journal of Personality and Social Psychology, 46(3), 610.

Baumeister, R.F., Showers, C. J. (1986). A review of paradoxical performance effects: Choking under

pressure in sports and mental tests. European Journal of Social Psychology, 16(4), 361-383.

Barrick, M. R., Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta

analysis. Personnel Psychology, 44(1), 1-26.

Bothner, M.S., Kang, J., Stuart, T.E. (2007). Competitive crowding and risk taking in a tournament:

Evidence from NASCAR racing. Administrative Science Quarterly, 52, 208-247.

Boudreau, K., Helfat, C.E., Lakhani, K.R., Menietti, M.E. (2012). Field evidence on individual behavior

& performance in rank-order tournaments. Harvard Business School Working Paper # 13-016, August

9, 2012; https://dash.harvard.edu/bitstream/handle/1/9502862/13-016.pdf?sequence=1.

Boudreau, K.J., Lakhani, K.R., Menietti, M. (2016). Performance responses to competition across skill

levels in rank‐order tournaments: field evidence and implications for tournament design. The RAND

Journal of Economics, 47(1), 140-165.

Brown, J. (2011). Quitters Never Win: The (Adverse) Incentive Effect of Competing with Superstars.

Journal of Political Economy, 119, 982-1013.

Buser, T. (2016). The Impact of Losing in a Competition on the Willingness to Seek Further Challenges.

Management Science 62(12), 3439-3449.

Carver, C. S., Blaney, P. H., Scheier, M. F. (1979). Reassertion and giving up: The interactive role of self-

directed attention and outcome expectancy. Journal of Personality and Social Psychology. 37(10),

1859.

Casas-Arce, P., Martínez-Jerez, F. A. (2009). Relative performance compensation, contests, and dynamic

incentives. Management Science, 55(8), 1306-1320.

Chevalier, J., G. Ellison (1997). Risk Taking by Mutual Funds as a Response to Incentives. Journal of

Political Economy 105(6), 1167–1200.

DeCaro, M. S., Thomas, R. D., Albert, N. B., Beilock, S. L. (2011). Choking under pressure: multiple

routes to skill failure. Journal of Experimental Psychology: General, 140(3), 390.

Dechenaux, E., Kovenock, D., Sheremeta, R. M. (2015). A survey of experimental research on contests,

all-pay auctions and tournaments. Experimental Economics, 18(4), 609-669.

Ehrenberg, R.G., Bognanno, M.L. (1990). Do Tournaments Have Incentive Effects? Journal of Political

Economy, 98, 1307-1324.

Elliot, A.J., Church, M.A. (1997). A hierarchical model of approach and avoidance achievement

motivation. Journal of Personality and Social Psychology, 72(1), 218.

Elliot, A.J, Thrash, T.M. (2004). The intergenerational transmission of fear of failure. Personality and

Social Psychology Bulletin, 30(8), 957–971.

Garcia, S.M., Tor, A. (2009). The N-Effect: More Competitors, Less Competition. Psychological Science,

20, 871-877.


Genakos, C., M. Pagliero (2012), Interim Rank, Risk Taking and Performance in Dynamic Tournaments.

Journal of Political Economy 120(4), 782–813.

Hurtz, G. M., Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of

Applied Psychology, 85(6), 869-879.

Imbens, G. W., Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of

econometrics, 142(2), 615-635.

Isen, A. M., Geva, N. (1987). The influence of positive affect on acceptable level of risk: The person with

a large canoe has a large worry. Organizational Behavior and Human Decision Processes, 39(2), 145-

154.

Jacob, R., Zhu, P., Somers, M. A., Bloom, H. (2012). A Practical Guide to Regression Discontinuity.

MDRC.

Kahneman, D. (1973). Attention and Effort. Prentice Hall, Englewood Cliffs, NJ.

Knoeber, C.R., Tsoulouhas, T. (2013). Introduction to the Special Issue on Tournaments and Contests.

International Journal of Industrial Organization, 31(3), 195-197.

Konrad, K.A. (2009). Strategy and Dynamics in Contests. Oxford, UK: Oxford University Press.

Konrad, K.A., Lommerud, K.E. (1993). Relative standing comparisons, risk taking, and safety regulations.

Journal of Public Economics, 51(3), 345-358.

Lallemand, T., Plasman, R., Rycx, F. (2008). Women and competition in elimination tournaments:

evidence from professional tennis data. Journal of Sports Economics, 9(1), 3-19.

Lazear, E. P., Rosen, S. (1981). Rank-Order Tournaments as Optimum Labor Contracts. The Journal of

Political Economy, 89(5), 841-864.

Lee, D.S. (2008). Randomized Experiments from Non-random Selection in U.S. House Elections. Journal of

Econometrics, 142(2): 675-697.

Mittal, V., Ross, W. T. (1998). The impact of positive and negative affect and issue framing on issue

interpretation and risk taking. Organizational Behavior and Human Decision Processes, 76(3), 298-

324.

Morris, L.W., Liebert, R.M. (1969). Effects of anxiety on timed and untimed intelligence tests. Journal of

Consulting and Clinical Psychology, 33:240-244.

Mueller-Langer, F., Andreoli-Versbach, P. (2017). Leading-effect, risk-taking and sabotage in two-stage

tournaments: Evidence from a natural experiment. Journal of Economics and Statistics, 237(1): 1-28.

Orszag, J.M. (1994). A New Look at Incentive Effects and Golf Tournaments. Economics Letters, 46, 77-

88.

Riley, D. (2012). New tiger, old stripes. Gentlemen's Quarterly.

Rosen, S. (1981). The Economics of Superstars. American Economic Review, 71(5): 845-858.

Schlenker, B. R. (1980). Impression management: The self-concept, social identity, and interpersonal

relations. Monterey: Brooks/Cole.

Sheremeta, R.M. (2014). Behavior in Contests. MPRA Paper No. 57451, July 21, 2014,

http://mpra.ub.uni-muenchen.de/57451.

Skaperdas, S. (1996). Contest success functions. Economic Theory, 7 (2), 283-290.

Tanaka, R., Ishino, K. (2012). Testing the Incentive Effects in Tournaments with a Superstar. Journal of

the Japanese and International Economies, 26, 393-404.


Thistlethwaite, D., Campbell, D.T. (1960). Regression-Discontinuity Analysis: An alternative to the ex

post facto experiment. Journal of Educational Psychology, 51(6): 309-317.

Tullock, G. (1980). Efficient rent seeking, in J.M. Buchanan, R.D. Tollison, G. Tullock (Eds): Towards a

Theory of the Rent-Seeking Society, College Station, TX: Texas A&M University Press: 97-112.

Zajonc, R. B. (1965). Social facilitation. Science, 149(3681), 269-274.


TABLES AND FIGURES

Notation                                  Obs.     Mean      St.D.     Min.    Max.
Problem level one (easy problem)
P1 opened                                 10038    0.98                0       1
P1 submitted                              10038    0.87                0       1
P1 correct                                10038    0.57                0       1
P1 submission points                      10038    163.56    75.92     0       250
P1 final points                           10038    112.90    103.00    0       250
P1 lost points                            10038    -50.66    81.21     -250    0
P1 time elapsed                           10038    25.93     26.38     0       135
P1 submitted if opened                    9864     0.88                0       1
P1 correct if submitted                   8729     0.65                0       1
P1 submission points if opened            9864     166.45    73.38     0       250
P1 final points if submitted              8729     129.83    100.01    0       250
P1 lost points if submitted               8729     -58.26    84.51     -250    0
P1 time elapsed if submitted              8729     19.03     15.27     0       85
Problem level two (medium problem)
P2 opened                                 10038    0.95                0       1
P2 submitted                              10038    0.56                0       1
P2 correct                                10038    0.28                0       1
P2 submission points                      10038    158.61    151.72    0       499
P2 final points                           10038    84.95     142.37    0       490
P2 lost points                            10038    -73.65    123.54    -499    0
P2 time elapsed                           10038    44.56     25.72     0       135
P2 submitted if opened                    9531     0.59                0       1
P2 correct if submitted                   5622     0.50                0       1
P2 submission points if opened            9531     167.04    151.10    0       499
P2 final points if submitted              5622     151.68    161.47    0       490
P2 lost points if submitted               5622     -131.51   140.16    -499    0
P2 time elapsed if submitted              5622     33.43     15.97     1       86
Problem level three (difficult problem)
P3 opened                                 10038    0.77                0       1
P3 submitted                              10038    0.19                0       1
P3 correct                                10038    0.07                0       1
P3 submission points                      10038    108.17    231.56    0       999
P3 final points                           10038    39.06     152.85    0       972
P3 lost points                            10038    -69.11    188.83    -999    0
P3 time elapsed                           10038    34.87     27.73     0       135
P3 submitted if opened                    7771     0.25                0       1
P3 correct if submitted                   1939     0.34                0       1
P3 submission points if opened            7771     139.72    254.67    0       999
P3 final points if submitted              1939     202.22    296.61    0       972
P3 lost points if submitted               1939     -357.75   285.24    -999    0
P3 time elapsed if submitted              1939     33.53     14.94     1       85

Table 1: Descriptive Statistics (part 1)
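
Table 1 reports both unconditional statistics and statistics conditional on reaching the previous stage ("if opened", "if submitted"). The following is a minimal sketch of how such conditional summaries could be produced; the DataFrame df and the column names (p1_final_points, p1_submitted) are hypothetical placeholders, not the authors' data.

    import pandas as pd

    def conditional_summary(df: pd.DataFrame, value_col: str, condition_col: str) -> pd.Series:
        """Obs., mean, st.d., min. and max. of value_col, restricted to rows where
        condition_col equals 1 (mirroring the 'if ...' rows of Table 1)."""
        subset = df.loc[df[condition_col] == 1, value_col]
        return subset.agg(["count", "mean", "std", "min", "max"])

    # Example usage: final points on the easy problem, conditional on submission.
    # conditional_summary(df, "p1_final_points", "p1_submitted")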


                                      Overall sample (N = 10,038)           Division 1 (N = 2,116)               Division 2 (N = 7,922)
                                      Mean     Median   Min      Max        Mean     Median   Min      Max       Mean     Median   Min      Max      t-Test
Contest variables
  Number contestants in contest       159.7    133      117      576        30.7     28       31       99        130.8    108      81       477      ***
  Paid prize                          40.2     0        0        300        70.3     0        0        300       32.2     0        0        150      ***
Ranking variables
  Skill rating                        1,070.3  1,097    0        3,111      1,835.3  1,733    1,463    3,111     865.9    972      0        1,522    ***
  Skill rating (standardized)         -417.7   -391     -1,522   1,621      347.9    245      -23      1,621     -622.2   -515     -1,522   0        ***
Dependent variables
  Problems opened                     2.7      3        1        3          2.9      3        1        3         2.6      3        1        3        ***
  Problems submitted                  1.6      2        0        3          2.3      2        0        3         1.4      1        0        3        ***
  Time elapsed                        105.4    95       0        400        82.4     79       0        322       111.5    95       0        400      ***
  Submission points                   430.3    391      0        1,636      769.0    635      0        1,636     339.9    239      0        1,625    ***
  Problems correct                    0.9      1        0        3          1.6      2        0        3         0.7      1        0        3        ***
  Performance of contestant           236.9    185      0        1,635      501.5    465      0        1,635     166.3    141      0        1,508    ***
  Lost points                         -193.4   -124     -1,625   0          -267.5   -206     -1,552   0         -173.6   -113     -1,625   0        ***
Control variables
  Number contests participated        11.6     7        1        61         22.3     21       2        61        8.8      5        1        57       ***
  Number switches between Divisions   0.9      0        0        22         2.7      1        0        21        0.5      0        0        22       ***
  Number competitors in room          7.7      8        4        8          7.5      8        6        8         7.8      8        4        8        ***
  No submission Div1                  0.1      0        0        8          0.1      0        0        8         0.0      0        0        8        ***
  No submission Div2                  0.5      0        0        15         0.1      0        0        2         0.6      0        0        15       ***

Note: the t-Test column reports t-tests on differences in means between Division 1 and Division 2; *** p<0.01, ** p<0.05, * p<0.1.

Table 2: Descriptive Statistics (part 2)
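
The t-Test column of Table 2 compares the two Divisions variable by variable. The snippet below is an illustration only: the DataFrame df and its columns division and performance are hypothetical placeholders, and a Welch test is used here although the paper does not specify the exact variant.

    import pandas as pd
    from scipy import stats

    def division_ttest(df: pd.DataFrame, variable: str):
        """t-test on the difference in means between Division 1 and Division 2."""
        div1 = df.loc[df["division"] == 1, variable].dropna()
        div2 = df.loc[df["division"] == 2, variable].dropna()
        return stats.ttest_ind(div1, div2, equal_var=False)

    # Example usage, mirroring one row of Table 2:
    # result = division_ttest(df, "performance")
    # stars = "***" if result.pvalue < 0.01 else "**" if result.pvalue < 0.05 else "*" if result.pvalue < 0.1 else ""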


Performance of contestants (in points)

VARIABLES                                      (Model 1)    (Model 2)    (Model 3)      (Model 4)
                                               RDD          RDD with     RDD with       RDD with interaction
                                                            controls     interaction    and controls

Treatment                                      -60.98***    -57.01***    -66.89***      -62.06***
                                               (18.20)      (17.59)      (18.23)        (17.68)
Treatment X number of competitors                                        -26.41**       -23.79*
  in contest room (centered)                                             (12.30)        (12.20)
Number of competitors                                       -7.65*       0.05           -1.07
  in contest room (centered)                                (4.43)       (4.40)         (4.38)
Number of contests participated                             12.42***                    12.33***
                                                            (1.66)                      (1.66)
Sq(number of contests participated)                         -0.11***                    -0.11***
                                                            (0.03)                      (0.03)
ln(number of switches between Divisions)                    -0.49                       -0.51
                                                            (12.87)                     (12.85)
Number of contests with no submissions                      -68.44***                   -68.41***
  in Division 1 (high-ability group)                        (22.24)                     (22.18)
Number of contests with no submissions                      -19.83***                   -19.84***
  in Division 2 (low-ability group)                         (4.42)                      (4.39)
Constant                                       289.87***    275.60***    287.51***      276.99***
                                               (39.34)      (38.68)      (39.47)        (38.55)

Round FE                                       Yes          Yes          Yes            Yes
Observations                                   10,038       10,038       10,038         10,038
R-squared                                      0.23         0.24         0.23           0.24
Number of contests                             38           38           38             38

Note: The table reports sharp RD estimates using a cubic functional form with interaction effects to account for
different slopes around the cutoff; Model 3 and Model 4 further include heterogeneous treatment effects of the
number of competitors in a contest room, centered around the mean; all models include round fixed effects.
Treatment refers to the main variable of interest, constructed as a dummy variable equal to 1 if the skill rating is
higher than the cutoff, thus indicating that a contestant is competing in the high-ability group with intense
competitive pressure. Robust standard errors are clustered at the contestant level; *** p<0.01, ** p<0.05, * p<0.1.

Table 3: Regression Discontinuity Design (cubic function)
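
For readers who want to see the shape of this specification, the following is a minimal sketch, not the authors' code: the DataFrame df and its columns (points, rating_c, treat, round_id, contestant_id) are hypothetical placeholders, and the data generated below are purely synthetic.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in data; the actual analysis would use the topcoder sample instead.
    rng = np.random.default_rng(0)
    n = 2000
    df = pd.DataFrame({
        "rating_c": rng.normal(0, 500, n),            # skill rating centered at the cutoff
        "round_id": rng.integers(1, 39, n),           # 38 contest rounds
        "contestant_id": rng.integers(1, 800, n),     # identifier used for clustering
    })
    df["treat"] = (df["rating_c"] >= 0).astype(int)   # 1 = above the cutoff (Division 1)
    df["points"] = 280 - 60 * df["treat"] + 0.1 * df["rating_c"] + rng.normal(0, 100, n)

    # Sharp RDD: cubic polynomial in the running variable, slopes allowed to differ on
    # both sides of the cutoff, round fixed effects, contestant-clustered standard errors.
    formula = (
        "points ~ treat"
        " + rating_c + I(rating_c ** 2) + I(rating_c ** 3)"
        " + treat:rating_c + treat:I(rating_c ** 2) + treat:I(rating_c ** 3)"
        " + C(round_id)"
    )
    fit = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["contestant_id"]}
    )
    print(fit.params["treat"])  # estimated jump in performance at the cutoff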


Performance of contestants (in points)

VARIABLES                                      (Model 1)    (Model 2)    (Model 3)      (Model 4)
                                               RDD          RDD with     RDD with       RDD with interaction
                                                            controls     interaction    and controls

Treatment                                      -74.01***    -69.31***    -77.97***      -73.22***
                                               (20.90)      (20.09)      (20.93)        (20.11)
Treatment X number of competitors                                        -25.46**       -23.03*
  in contest room (centered)                                             (12.42)        (12.31)
Number of competitors                                       -7.35*       0.08           -1.01
  in contest room (centered)                                (4.46)       (4.39)         (4.37)
Number of contests participated                             12.47***                    12.39***
                                                            (1.67)                      (1.67)
Sq(number of contests participated)                         -0.11***                    -0.11***
                                                            (0.03)                      (0.03)
ln(number of switches between Divisions)                    -0.97                       -1.00
                                                            (13.00)                     (12.97)
Number of contests with no submissions                      -67.74***                   -67.79***
  in Division 1 (high-ability group)                        (22.30)                     (22.23)
Number of contests with no submissions                      -19.80***                   -19.81***
  in Division 2 (low-ability group)                         (4.40)                      (4.37)
Constant                                       291.77***    279.10***    289.40***      280.54***
                                               (40.28)      (39.68)      (40.41)        (39.55)

Round FE                                       Yes          Yes          Yes            Yes
Observations                                   10,038       10,038       10,038         10,038
R-squared                                      0.23         0.24         0.23           0.24
Number of contests                             38           38           38             38

Note: The table reports sharp RD estimates using a quartic functional form with interaction effects to account for
different slopes around the cutoff; Model 3 and Model 4 further include heterogeneous treatment effects of the
number of competitors in a contest room, centered around the mean; all models include round fixed effects.
Treatment refers to the main variable of interest, constructed as a dummy variable equal to 1 if the skill rating is
higher than the cutoff, thus indicating that a contestant is competing in the high-ability group with intense
competitive pressure. Robust standard errors are clustered at the contestant level; *** p<0.01, ** p<0.05, * p<0.1.

Table 4: Regression Discontinuity Design (quartic function)
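
Tables 3 and 4 estimate the same specification with polynomials of order three and four, respectively. As a hedged sketch in our own notation (the exact parameterization is not spelled out in the tables), the estimating equation has the form

    Y_{ir} = \alpha + \tau T_{ir} + \sum_{k=1}^{p} \beta_k \tilde{s}_{ir}^{k}
             + \sum_{k=1}^{p} \gamma_k T_{ir} \tilde{s}_{ir}^{k} + X_{ir}'\delta + \mu_r + \varepsilon_{ir},
    \qquad T_{ir} = \mathbf{1}\{\tilde{s}_{ir} > c\}

with Y_{ir} the performance of contestant i in round r, \tilde{s}_{ir} the skill rating, c the cutoff, p = 3 (Table 3) or p = 4 (Table 4), X_{ir} the controls (Models 2 and 4), \mu_r round fixed effects, and \tau the treatment effect at the cutoff.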


Figure 1: Composition of SRMs
[Schematic: contestants are assigned to two groups (= Divisions) based on their skill rating, with Division 1 (high-ability group) above the cutoff value and Division 2 (low-ability group) below it; contestants are then assigned to different rooms (Room 1 to Room 4 are shown) at random.]

Figure 2: Performance of contestants (RDD descriptive results)
[Plot: performance of contestants (in points, 0 to 1,500) against the standardized skill rating (-1,500 to 1,500); markers show sample averages within bins, with a polynomial fit of order 3.]


Figure 3: Performance of contestants with additional challenge points (RDD descriptive results)
[Plot: performance of contestants with challenge points (0 to 1,500) against the standardized skill rating (-1,500 to 1,500); markers show sample averages within bins, with a polynomial fit of order 3.]


Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous
treatment effects of the number of competitors in a specific contest room. The upper part of the figure, up to the
dashed line, represents the RDD analysis of the overall population of 10,038 observations. Below the dashed line,
the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage:
solution for problem submitted, 3rd stage: submitted solution is correct). Standard errors (in parentheses) are
clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05,
* p<0.1.

Figure 4: RDD Analysis of problem solving behavior for problem 1
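
The lower panel described in the note conditions each stage on the previous one. A minimal sketch of that conditioning step follows, reusing the formula string, the DataFrame df, and the clustering from the Table 3 sketch above; the stage indicator columns (p1_opened, p1_submitted, p1_correct) are hypothetical placeholders and would have to exist in df for the code to run.

    import statsmodels.formula.api as smf

    stages = [
        ("p1_opened",    None),            # 1st stage: problem opened (full sample)
        ("p1_submitted", "p1_opened"),     # 2nd stage: submitted, conditional on opened
        ("p1_correct",   "p1_submitted"),  # 3rd stage: correct, conditional on submitted
    ]

    for outcome, prior_stage in stages:
        # Restrict the sample to observations that reached the previous stage.
        sample = df if prior_stage is None else df[df[prior_stage] == 1]
        fit = smf.ols(formula.replace("points", outcome), data=sample).fit(
            cov_type="cluster", cov_kwds={"groups": sample["contestant_id"]}
        )
        print(outcome, round(fit.params["treat"], 3))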


Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous
treatment effects of the number of competitors in a specific contest room. The upper part of the figure, up to the
dashed line, represents the RDD analysis of the overall population of 10,038 observations. Below the dashed line,
the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage:
solution for problem submitted, 3rd stage: submitted solution is correct). Standard errors (in parentheses) are
clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05,
* p<0.1.

Figure 5: RDD Analysis of problem solving behavior for problem 2


Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous
treatment effects of the number of competitors in a specific contest room. The upper part of the figure, up to the
dashed line, represents the RDD analysis of the overall population of 10,038 observations. Below the dashed line,
the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage:
solution for problem submitted, 3rd stage: submitted solution is correct). Standard errors (in parentheses) are
clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05,
* p<0.1.

Figure 6: RDD Analysis of problem solving behavior for problem 3


Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous
treatment effects of the number of competitors in a specific contest room. The upper part of the figure, up to the
dashed line, represents the RDD analysis of the overall population of 10,038 observations. Below the dashed line,
the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage:
solution for problem submitted, 3rd stage: submitted solution is correct). Standard errors (in parentheses) are
clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05,
* p<0.1.

Figure 7: RDD Analysis of overall problem solving behavior (part 1)


Note: This figure reports the coefficients of a sharp RDD using a cubic model with interactions and heterogeneous
treatment effects of the number of competitors in a specific contest room. The upper part of the figure, up to the
dashed line, represents the RDD analysis of the overall population of 10,038 observations. Below the dashed line,
the RDD estimates are conditional on the previous stage of the problem (1st stage: problem opened, 2nd stage:
solution for problem submitted, 3rd stage: submitted solution is correct). Standard errors (in parentheses) are
clustered at the contestant level and all models include covariates and round fixed effects; *** p<0.01, ** p<0.05,
* p<0.1.

Figure 8: RDD Analysis of overall problem solving behavior (part 2)