ertan - who to punish - eer 2009



    Who to punish? Individual decisions and majority rule in mitigating the free rider problem$

    Arhan Ertan a, Talbot Page b, Louis Putterman c

    a Tufts University, MA, USA

    b Brown University (Emeritus), RI, USA

    c Department of Economics, Box B, Brown University, Providence, RI 02912, USA

    Article info

    Article history:

    Received 2 August 2006

    Accepted 21 September 2008

    Available online 17 October 2008

    JEL classification:

    C91

    C92

    D71

    H41

    Keywords:

    Public goods

    Collective action

    Punishment

    Voting

    Institutions

    Abstract

    We study a voluntary contributions mechanism in which punishment may be allowed,

    depending on subjects' voted rules. We found that out of 160 group votes, even when

    groups had no prior experience with unrestricted punishment, no group ever voted to

    allow unrestricted punishment and no group ever allowed punishment of high

    contributors. Over a series of votes and periods of learning we found a distinct

    reluctance to allow any punishment at the beginning, with a gradual but clear evolution

    toward allowing punishment of low contributors. And groups allowing punishment of

    only low contributors achieved levels of cooperation and efficiency that are among the

    highest in the literature on social dilemmas.

    © 2008 Elsevier B.V. All rights reserved.

    0. Introduction

    Organizations such as teams, firms, and military units depend on cooperative effort to succeed, and organizational

    leadership often attempts to increase cooperative contributions and/or reduce free riding by instituting rewards and

    sanctions and by building a culture or norms of cooperation. Problems of cooperation, for example, in efforts to limit

    greenhouse gases or depletion of fisheries, and free riding in efforts to provide public goods share a common characteristic:

    incentives for the individual that lead to inefficiency in the group. Such problems are often called social dilemmas and have been the focus of numerous studies using the method of the laboratory decision-making experiment.

    In one key social dilemma experiment, Ostrom et al. (1992) found, for a model of overuse of a commons, that allowing

    face-to-face communication and allowing the subjects to sanction (punish) each other led to a significant increase in

    cooperative behavior. In another influential experiment, Fehr and Gachter (2000a) found that, in a voluntary contributions

    mechanism¹ (VCM), the opportunity for punishment had a dramatic positive effect on contributions, but this finding did


    European Economic Review


    0014-2921/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.euroecorev.2008.09.007

    $ The research reported here was supported by N.S.F. Grant SES-0001769. We are indebted to two anonymous referees for helpful comments and suggestions.

    Corresponding author. Tel.: +1 401 863 3837; fax: +1 401 863 1970.

    E-mail address: Louis_Putterman@Brown.Edu (L. Putterman).

    1 The basic voluntary contributions mechanism without punishment is a particularly sharp social dilemma, in which each individual maximizes his payoff when others contribute their full endowments but he himself contributes nothing; yet when everyone contributes nothing, efficiency is minimized.

    European Economic Review 53 (2009) 495–511


    not extend to average efficiency. In both experiments, punishment was made possible by allowing a subject to pay out of

    his/her earnings to reduce by a larger amount the earnings of another. Since punishment is costly to both the punisher and

    the punished, it was not surprising to observe that punishment had a less positive effect on efficiency than on contributions

    in VCMs or overuse in commons problems.

    But at the same time practically everyone who studied the role of punishment noticed a curious phenomenon. While

    most punishment was targeted at low contributors in VCMs (and overusers in commons problems), a considerable amount

    of punishment was targeted at cooperators (high contributors in VCMs, low extractors in commons problems). The

    frequency of punishing high contributors in VCMs was too high to be explained as mistakes. Cinyabuguma et al. (2006) estimated that about 15% of punishment in several experiments² of this type was targeted at the highest contributor in a

    group, and about 25% at those who contributed more than their group's average. Researchers suggested several possible

    explanations: for example, revenge, harming others more than oneself to win relatively (tournament style), and moral

    resentment.³ These possible explanations suggested multiple preference types, including other-directed preferences

    (revenge, etc.) in addition to the self-interested preference for maximizing earnings found in most economic models. It

    seemed to us that the phenomenon of punishing high contributors in VCMs was more frequent than commonly recognized

    and likely to have adverse effects on contributions and efficiency. In practical life, if decentralized punishment of high

    contributors by resentful free riders has comparably high frequencies, it would be a serious problem.⁴ We called the

    punishment of high contributors "perverse punishment" because of its seeming inconsistency with self-interested earnings

    maximization.

    Because the directing of a significant fraction of punishment at high contributors appears to limit the usefulness of

    decentralized punishment as a mechanism or institution, we asked whether the problem might be corrected if groups

    of individuals were provided with the opportunity to choose their own rules governing the application of punishment. We conducted an experiment in which rules determining who can be punished are chosen by a series of votes, in order to

    see how the choices of rules evolved over time and how these choices affected cooperation and efficiency. In our

    experiment, subjects voted on three ballot items determining independently whether group members could reduce the

    earnings of low (below average), of average, and of high (above average) contributors to their group account (public good).

    We found that out of 160 group votes, no group ever voted to allow punishment of high contributors. Over a series of

    votes and periods of learning we found a distinct reluctance to allow any punishment at the beginning, with a gradual but

    clear evolution toward allowing punishment of low contributors. And groups adopting this rule of controlling perverse

    punishment achieved levels of contributions and efficiency that are among the highest in the literature on social dilemmas.

    Our main contributions are: to show how rules of punishment can evolve endogenously to address free rider problems,

    within the opportunities of institutional choice presented to the experimental subjects; and to show that perverse

    punishment can have strong negative effects on contributions and efficiency but is amenable to group control.

    These contributions, listed more specifically in Results 1–4, are based on the observed behaviors in the experiment and

    rely on direct counts or non-parametric tests using fully independent observations at the group or session level. Toward the end of the results section, we also discuss regressions estimated using individual-level observations, here using group and

    period fixed effects to partially address the possible interdependence among observations.

    The paper is organized as follows. Section 1 reviews the theoretical outlook that informs our own and related research,

    then discusses the related literature. Section 2 presents the experimental design, Section 3 presents the analysis, and

    Section 4 discusses interpretative issues.

    1. Theoretical intuitions and literature

    1.1. Theory

    Several social dilemmas have an iterated dominant strategy equilibrium, which implies a unique Nash equilibrium

    without any cooperation. The finitely repeated prisoner's dilemma (Kreps et al., 1982), the centipede game (McKelvey and

    Palfrey, 1992), and VCMs are examples having a unique Nash equilibrium with no cooperation. (One of the assumptions

    that leads to this result is that of a single preference type of payoff maximizers, all of whom believe that all the players are

    payoff maximizers.) Kreps et al. found this equilibrium result disturbing because many experiments on the prisoner's

    dilemma showed a pattern of substantial cooperation. A little later, McKelvey and Palfrey (1992) developed an exponential

    version of the centipede game for which there are large benefits of cooperation, a unique Nash equilibrium with no

    cooperation, and substantial cooperation in experimental observations. McKelvey and Palfrey thought the centipede game


    2 In particular, Fehr and Gachter (2000a), Page et al. (2005), and Bochet et al. (2006).

    3 A low contributing individual may be made uncomfortable by a high contributor's action, feel moral resentment and want to get even by punishing the high contributor. An experimental subject gave us this explanation in a debriefing statement.

    4 Cinyabuguma et al. find support for the idea that most punishment of high contributors by low ones may reflect retaliatory motives. For an experiment on retaliatory punishment, see Nikiforakis (2008). Recently, the on-line auction site eBay announced a clamp-down on tit-for-tat feedback to prevent sellers from leaving negative feedback on buyers. "Today, the biggest issue with the system is that buyers are more afraid than ever to leave honest, accurate feedback because of the threat of retaliation," explained eBay North America president Bill Cobb in his January 29, 2008 announcement (Bangeman, 2008).


    was an even simpler and ... more compelling example of the Nash equilibrium's predictive failure than is the prisoner's

    dilemma.

    In response to the Nash equilibrium's predictive failure under the assumption of payoff maximizing as the only preference

    type, Kreps et al. and McKelvey and Palfrey modeled the two social dilemmas as (different) games of incomplete

    information with multiple preference types. Kreps et al. used two types: payoff maximizers and tit-for-tat players.

    McKelvey and Palfrey used two types: payoff maximizers and altruists. With multiple types and incomplete information,

    iterated dominance no longer is implied. Instead, the researchers solved for Bayes–Nash equilibria that more accurately

    predicted substantial cooperation until near the end of the game, as observed experimentally.

    It is easy to check that for the VCM with a punishment opportunity and voting in our experiment, under the assumption

    of payoff maximizers as the single type, iterated dominance implies a Nash equilibrium predicting no cooperation and no

    punishment (and any voting pattern, including 100% abstentions). But this implication no longer holds when there are

    multiple preference types. This non-implication is suggestive because in numerous experiments researchers found

    substantial contributions in finitely repeated VCMs without punishment (see Davis and Holt, 1993; Ledyard, 1995, for

    surveys). And in VCM experiments with punishment but without voting, Fehr and Gachter (2000a, b, 2002), Carpenter and

    Matthews (2002), Masclet et al. (2003), Page et al. (2005), and Sefton et al. (2002) found substantial contributions and

    substantial punishment. These studies and the non-implication suggest the presence of multiple preference types in our

    experiment and other VCMs.
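    The "easy to check" dominance step for pure payoff maximizers can be sketched with the parameters this experiment uses (an endowment of 10 and a return of 0.4 to each of four members per dollar placed in the group account, as given in Section 2.2). The difference notation below is ours and is only an illustration of the single-period logic, not the authors' formal argument.

```latex
% Stage payoff of subject i in the basic VCM (Eq. (1) of Section 2.2)
y_i = (10 - C_i) + 0.4 \sum_{j=1}^{4} C_j

% Change in i's own payoff from contributing one more dollar
\Delta y_i = -1 + 0.4 = -0.6 < 0

% Change in the group's total payoff from that same dollar
\Delta \sum_{k=1}^{4} y_k = -1 + 4(0.4) = +0.6 > 0
```

    Each contributed dollar therefore lowers the contributor's own earnings while raising the group total: if all four contribute everything, each earns 0.4 × 40 = 16, against 10 each when no one contributes. Because punishing is also costly to the punisher, a pure payoff maximizer would not punish in the final period, and the same reasoning then unravels backward through the earlier periods.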

    Comparison with the prisoner's dilemma, the centipede game, and other Bayesian games points toward several

    predictions. Payoff maximizers are likely to mimic cooperators to encourage their cooperation, because this is a reasonable

    strategy for increasing their payoffs. Cooperators are likely to punish low contributors because they dislike free riding

    (see Gintis et al., 2005), and this signals and warns free riders to contribute more. Perverse punishers appear, however, to be the opposite of cooperators. Fehr and Gachter (2000a, b) interpreted their results primarily in terms of the interaction of

    two preference types: purely selfish players (what we have called payoff maximizers) and a conditional cooperator type

    (see also Hoffman et al., 1998). Fischbacher et al. (2001) and Fischbacher and Gachter (2006) used a strategy method

    protocol to estimate that about 50% of those in their subject pools were of this second type.⁵ Further, punishment of high

    contributors, observed by Gachter and Herrmann (2005), Gachter et al. (2005), and Cinyabuguma et al. (2006), suggests

    that when punishment is an available option, the presence of a third type, whom we call perverse punishers, should also

    be taken into account. Based on the work mentioned above, we expected perverse punishers to account for not more than

    25% of our subjects.⁶

    A word of caution: we believe that these preference types are somewhat stylized interpretations rather than sharply

    fixed, non-overlapping characteristics. With this in mind, intuitively the interaction of the three types in our experiment

    leads to predictions regarding voting. It seems likely that conditional cooperators would vote to allow punishment of low

    contributors and prohibit punishment of high contributors, and payoff maximizers might also vote similarly.⁷ It also seems

    likely that perverse punishers would vote to allow punishment of high contributors. But being in a minority, they would likely be outvoted, although by chance they might form a majority in a few out of a large number of randomly formed

    groups.

    Considering multiple preference types has been useful in explaining results in a large number of basic VCMs and VCMs

    with punishment. But VCMs are more complicated than the prisoner's dilemma or the centipede game, and to our

    knowledge, solving even the basic VCM for Bayes–Nash equilibria has so far been intractable. We attempt here only to use

    the intuitions developed above to guide interpretation of observed behaviors, hopefully contributing both to a practical

    understanding of social dilemmas and to future refinements of theory.

    1.2. Related literature

    While our paper is the first to directly address effects of perverse punishment by allowing or prohibiting intermediate

    restrictions on punishment, there are related papers on the endogenous choice of institutional rules that allow or prohibit

    punishment altogether, or exogenously affect the role of punishment. Gurerk et al. (2005, 2006) designed two experiments that allowed subjects to "vote with their feet" in choosing between two groups, one allowing unrestricted punishment and

    the other no punishment. Subjects initially avoided the group with punishment, but with repeated opportunities to choose,

    almost all eventually chose the group with punishment, as a result achieving high contributions and efficiency. Their


    5 In a different experimental setup, a VCM with endogenous group formation, Page et al. (2005) estimated a 59% proportion of conditional cooperators.

    6 When subjects from a population with this rough demography of types are randomly assigned to play a VCM in small groups, the groups may differ from one another in cooperation levels due to random differences in which types are represented and with what frequencies. Ones and Putterman (2007) grouped together on the one hand subjects displaying more cooperative behaviors and on the other hand subjects displaying less cooperation and more perverse punishing. They found, predictably, that the former achieved higher contributions and earnings than the latter.

    7 Incentives in voting of course differ from those in a private action. For example, a payoff maximizer may prefer free riding to contributing, but at the same time find it in his interest to vote to allow punishment of low contributors. In his calculation he may believe that by such a rule he would lose the benefit from his own free riding, but be more than compensated by many erstwhile free riders who will contribute more in response to the threatened punishment of free riding. And in a population of mixed preference types, a payoff maximizer's calculation of the net advantage from the rule depends on his beliefs on whether there will be a sufficient number willing to punish free riders and make the threat of punishment effective.


    experiments differ from ours in that their subjects choose groups with either no punishment or unrestricted punishment,

    while our subjects have fixed groups and vote over alternative restrictions on punishment.

    Botelho et al. (2005) designed an experiment that allowed subjects to choose between an institution with unrestricted

    punishment and another without any punishment. They found that the subjects voted overwhelmingly for the institution

    without punishment. In a related experiment, Sutter et al. (2005) found that subjects most often voted to allow rewards

    rather than punishment even though the latter raised contributions more. These experiments differed from ours by

    allowing only one vote for each group, and not allowing choices of partially restricted punishment.

    Botelho et al. (2005) also analyzed Fehr and Gachter's (2000a, 2002) data, finding lower earnings when punishment was allowed than when it was not allowed.⁸ In contrast, Gurerk et al. (2005, 2006) found earnings (efficiency) as high or higher

    in VCMs with unrestricted punishment than in VCMs without punishment opportunities. Masclet et al. (2003) also found

    higher earnings with unrestricted punishment compared with no punishment allowed. By varying the ratio of

    punishment's cost to the punisher versus the target of punishment, Nikiforakis and Normann (2008) and Egas and Riedl

    (2005) shed light on the conditions under which the unrestricted opportunity to punish does and does not increase

    efficiency.

    Noting the detrimental effects of the punishment of high contributors, Cinyabuguma et al. (2006) designed a procedure

    they believed might reduce its incidence. The first two stages of the experiment were an ordinary VCM followed by a

    punishment opportunity. But in a third stage, each subject learned the frequency of each other subject's punishment of

    high, average, and low contributors, and each was given an opportunity to punish on the basis of this information. The

    authors found that this incentive system led to less perverse punishment in the second stage, but fairly frequent perverse

    punishment in the third stage; for example, subjects who punished free riders in the second stage were then severely

    punished in the third stage, undermining the incentives in the first stages.

    Gachter and Herrmann (2005) used population groupings (young rural Russians, older rural Russians, young urban

    Russians, older urban Russians) to study the effects of unrestricted punishment. They found large variations among the

    groups in frequency of punishing high contributors and the harmful effects of this perverse punishment which, they wrote,

    "can undermine the positive impact of punishment for cooperation and thereby limit the success of self-governance." Like

    Cinyabuguma et al. and our paper, Gachter and Herrmann emphasized the detrimental effect of perverse punishment on

    efficiency.

    Casari and Luini (2005) compared effects of exogenously imposed punishment rules, including a rule requiring a subject

    to be targeted for punishment by at least two group members (in a group of five) before the punishment takes effect. They

    found that the restriction decreased punishment of high contributors and raised efficiency, but in this treatment the

    average contribution was quite low, not exceeding half of the endowment.

    2. Design and predictions

    2.1. Basic design

    Our design extends the basic VCM in which subjects are randomly assigned to groups that remain fixed (a partners

    design) for a finite and known number of periods. Each subject in a group is provided with an initial endowment that he or

    she is asked to divide between a private account and a group account. Any funds placed in the group account are scaled up

    by the experimenter and divided equally among the subjects in the group without regard to individual contribution. To this

    basic VCM we added punishment and voting opportunities in two designs to study how rules restricting or allowing

    punishment might emerge initially and evolve over a series of votes. In the experiment, individuals act anonymously and

    without communication.

    We initially conducted a pilot experiment in which there were four partner groups with four subjects in each group. At

    the beginning of the 1st period, the subjects received instructions for playing a basic VCM without punishment, and each

    group played 10 periods of this repeated game (details of the basic VCM and its payoff function (1) are shown below). At the

    beginning of the 11th period the subjects received instructions for playing a VCM with unrestricted punishment, and each

    group played 10 periods of this repeated game (details and payoff function (2) shown below). So far, this design is similar to

    Fehr and Gachter (2000a). But following these first 20 periods, each group voted on who if anyone could be punished in a

    final 10 periods (details of the ballot process are shown below). Of the four group votes, all four voted to prohibit punishment

    of higher-than-average contributors; one group prohibited all punishment and the other three groups voted to allow

    punishment of low contributors.⁹


    8 Cinyabuguma et al. (2004) found similar results for Fehr and Gachter (2000a) and in public goods and sanctions experiments by Carpenter and Matthews (2002), Sefton et al. (2002), Page et al. (2005), and Bochet et al. (2006). In their working paper, Cinyabuguma et al. (2004) used regression to study the impact of punishment upon changes in the punished subject's contribution, and found that each dollar of punishment of a group's highest contributor substantially decreased his or her next period contribution. The authors concluded that a major reason why punishment reduces efficiency in the experiments mentioned is the punishment of high contributors. Their calculations showed that in the related public goods and sanctions experiments by Bochet et al. and Page et al., earnings would have been higher with punishment than without it but for the presence of perverse punishment.

    9 Due to a computer problem, the voted rules were not properly implemented; nonetheless, decisions up to and including the vote remain uncompromised, allowing us to make inferences from this pilot experiment occasionally in what follows.


    Following this pilot, we wanted to see not only what rules are chosen initially but also what voting patterns would

    emerge with further experience. In the first of two designs, we increased the number of votes to three votes for each group,

    and correspondingly shortened the number of periods under which a voted rule governed before the next vote from

    10 periods to 8. To keep the total number of periods to 30, we shortened the introductory experiences of VCMs with and

    without punishment from 10 periods each to 3 periods each. This became our 3-Vote design (see Fig. 1A).

    As in the pilot treatment, subjects in the 3-Vote design were given instructions describing the basic VCM, and then

    participated in the basic VCM (this time for 3 periods), then received their second instructions about the opportunity of

    voluntary punishment, unrestricted except for some budgetary constraints (see below), then played for 3 periods under

    this condition, all before learning of the voting opportunities and items to be voted on. At the beginning of the 7th period,

    the subjects received their third instructions, which explained the voting process, and took their first vote on the rules governing who, if anyone, could be punished for the next 8 periods. At the beginning of the 15th period a second vote was

    taken and new rules regulating punishment were chosen. Then the subjects participated in 8 periods of the VCM with

    punishment (if any) governed by the second chosen rules. At the beginning of the 23rd period the third and final vote was

    taken, and the remaining 8 periods were conducted with possible punishment governed by this last vote. As in the pilot, we

    included practice exercises in each of the three sets of instructions.

    Surprised to find that out of 60 group votes none allowed punishment of high contributors and that the majority of

    groups seemed to be converging towards allowing punishment of low contributors, we added a 5-Vote design (Fig. 1B)

    which differed from the 3-Vote design in that (a) there was no play, whether with or without punishment, before the

    determination of rules by vote, and (b) the sequences of play between votes were reduced from 8 to 6 periods, to allow for

    five votes and play phases in a session of similar duration. As Fig. 1B shows, the first and only instructions were given at the

    beginning of the experiment. They explained the basic VCM mechanism without punishment, possible rules governing

    punishment, and the opportunity to vote on them. Subjects then voted to allow or restrict punishment (without any hands-on

    experience of punishment or its restrictions). Then they participated for 6 periods in the VCM, governed by the chosen rules of punishment. At the beginning of the 7th period, the subjects voted again, and then participated in 6 periods of the

    VCM, governed by the chosen rules of punishment. The same process was repeated three more times, as shown in the figure.

    The 5-Vote design had the same number of periods (30) as the 3-Vote design.
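    The two timelines can be checked with a few lines of arithmetic. The sketch below is ours, not part of the experimental software; the function and variable names are hypothetical. It assumes, as the text describes, that each vote is taken at the beginning of the first period it governs.

```python
def vote_periods(pre_vote_periods, periods_per_rule, n_votes):
    """Return the period at whose start each vote is taken."""
    return [pre_vote_periods + 1 + k * periods_per_rule for k in range(n_votes)]


def total_periods(pre_vote_periods, periods_per_rule, n_votes):
    """Return the length of a session in periods."""
    return pre_vote_periods + periods_per_rule * n_votes


# 3-Vote design: 3 periods without and 3 with punishment, then 3 blocks of 8.
three_vote = vote_periods(pre_vote_periods=6, periods_per_rule=8, n_votes=3)

# 5-Vote design: no play before the first vote, then 5 blocks of 6.
five_vote = vote_periods(pre_vote_periods=0, periods_per_rule=6, n_votes=5)

print(three_vote, total_periods(6, 8, 3))
print(five_vote, total_periods(0, 6, 5))
```

    Under these assumptions the 3-Vote design votes at the start of periods 7, 15, and 23, and the 5-Vote design at the start of periods 1, 7, 13, 19, and 25, with both designs running 30 periods.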

    The 5-Vote design functioned as a stress test for the results of the 3-Vote design in several ways. First, the task of

    learning and familiarization was harder, since the first choice of rules occurred before subjects had any experience

    interacting in a VCM with or without punishment. Second, the possibility that experiences such as annoyance with free

    riders or with receiving punishment could influence the first vote was eliminated. These differences permitted a test of

    whether the 3-Vote design's results were driven by the 3-Vote design's more gradual, hands-on learning. Third, in the 5-Vote

    design there were 100 group votes; thus, with 160 group votes in total, unanimity in prohibiting perverse punishment

    would be very unlikely unless there were strong factors leading in this direction. Finally, with each group voting 5 times

    instead of 3, the monotone increase in votes for the rule allowing punishment of low but not high contributors would be less likely unless there

    were strong factors leading to this pattern.

    [Fig. 1. (A) The 3-Vote design and (B) the 5-Vote design. Both panels are timelines over Periods 1–30. (A) 3-Vote design: 1st instructions, 3 periods with no punishment, 2nd instructions, 3 periods with punishment, 3rd instructions and 1st vote, then three blocks of 8 periods with the chosen rule, with the 2nd and 3rd votes between blocks. (B) 5-Vote design: instructions and 1st vote, then five blocks of 6 periods with the chosen rule, with the 2nd through 5th votes between blocks.]

    In both the 3- and 5-Vote designs, sessions had 16 subjects assigned randomly to four groups of four subjects who remained together throughout the session. Each subject knew there were 16 subjects in the experiment room but could not

    tell which among the others in the session belonged to her group. Contribution and punishment choices (if any) were

    announced to other group members under randomly changing labels "B," "C," and "D," for one's fellow members, so that the

    behaviors of individuals could not be tracked from period to period, except by conjecture. A subject learned the total

    amount of punishment she had received, but not which group members punished her or by how much.
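    The anonymity device described above, in which fellow members appear under labels that change at random each period, can be illustrated as follows. This is a hypothetical sketch: the paper does not describe the randomization procedure, and the function name, the subject identifiers, and the use of a seeded generator are all our assumptions.

```python
import random


def relabel_others(other_ids, rng):
    """Assign the labels B, C, D to a subject's three fellow members in random order."""
    labels = ["B", "C", "D"]
    shuffled = rng.sample(other_ids, k=len(other_ids))
    return dict(zip(labels, shuffled))


rng = random.Random(0)       # seeded only so the illustration is repeatable
fellow_members = [2, 3, 4]   # hypothetical identifiers of the other three members

# A fresh assignment is drawn each period, so label "B" need not refer to the
# same person from one period to the next.
period_1 = relabel_others(fellow_members, rng)
period_2 = relabel_others(fellow_members, rng)
print(period_1, period_2)
```

    Because the mapping is redrawn every period, a subject sees what each of the three others did in the current period but cannot follow one individual across periods except by conjecture.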

    Just before the second and later votes of both designs, each subject was informed of the punishment rule chosen in the

    preceding votes of each of the four groups in their session along with each group's average contributions and earnings

    during the periods the rule governed (the information was new for the most recently taken vote, and was repeated for the

    earlier votes). This information was included to speed the adjustment process, if there is one, and of course learning from the examples of others occurs in many real-world settings. The downside of providing this information, in terms of the

    number of fully independent observations, is substantial, but our main results are statistically significant, in spite of this.

    Also, the first vote of each group remains a strictly independent observation, since no information about other groups was

    shared until immediately before the second vote.

    2.2. Payoffs

    All periods shared the same underlying structure. In each period, each subject had to decide on a division of 10

    experimental dollars, in integer amounts, between a private account and a group account, before observing the choices of

    fellow group members. In a period, subject i earned

    \[ y_i = (10 - C_i) + 0.4 \sum_{j=1}^{4} C_j \tag{1} \]

    where C_i is i's contribution to the public account and the summation is taken over all members of i's group, including i. After

    all four made their decisions, each was informed of the contribution choices of the others. When punishment was

    permitted, it cost a subject 0.25 experimental dollars to reduce the earnings of another person by 1.00 experimental dollar.

    Subject i's earnings after punishment were thus

    $$y_i = 10 - C_i + 0.4 \sum_{j=1}^{4} C_j - 0.25 \sum_{j \neq i} R_{ij} - \sum_{j \neq i} R_{ji} \tag{2}$$

    where $R_{ij}$ is the number of dollars by which i reduced j's earnings, and conversely for $R_{ji}$. General constraints on punishment in all treatments were: (i) a subject could not spend more than her/his pre-punishment earnings for the period on reducing the earnings of other subjects, (ii) a subject's post-punishment earnings for a period would be set to zero if earnings $y_i$ in Eq. (2) were negative, and (iii) a subject i could not spend more on reducing the earnings of a subject j in any period than would single-handedly reduce j's earnings according to Eq. (2) to less than zero.10 The Appendix shows the screen design for entering an individual's contribution and punishment decisions.
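As an illustration of Eqs. (1) and (2), the following Python sketch computes one period's earnings for a four-person group. The function names, the matrix layout `reductions[i][j]` (standing for R_ij), and the sample period are illustrative assumptions and not the experiment's software. Constraint (ii) is applied by flooring earnings at zero; constraints (i) and (iii), which limit what a subject may enter, are not enforced here.

```python
ENDOWMENT = 10.0      # experimental dollars available each period
GROUP_RETURN = 0.4    # return to each member per dollar in the group account
PUNISH_COST = 0.25    # cost to the punisher per dollar of reduction imposed


def earnings_eq1(contributions):
    """Eq. (1): pre-punishment earnings for every member of one group."""
    total = sum(contributions)
    return [ENDOWMENT - c + GROUP_RETURN * total for c in contributions]


def earnings_eq2(contributions, reductions):
    """Eq. (2): earnings after punishment, floored at zero per constraint (ii).

    reductions[i][j] is R_ij, the dollars by which i reduces j's earnings.
    """
    base = earnings_eq1(contributions)
    members = range(len(contributions))
    after = []
    for i in members:
        spent = PUNISH_COST * sum(reductions[i][j] for j in members if j != i)
        received = sum(reductions[j][i] for j in members if j != i)
        after.append(max(0.0, base[i] - spent - received))
    return after


# Hypothetical period: three full contributors and one low contributor.
contributions = [10, 10, 10, 2]
reductions = [[0] * 4 for _ in range(4)]
reductions[0][3] = 4  # subject 0 reduces subject 3's earnings by 4
before = earnings_eq1(contributions)
after = earnings_eq2(contributions, reductions)
```

In this hypothetical period the low contributor earns 20.8 before punishment against 12.8 for each full contributor; after the reduction the low contributor earns 16.8 and the punisher 11.8. The same functions show that one extra dollar contributed raises total group earnings by 4 × 0.4 − 1 = 0.6.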

    2.3. Voting

    In a voting stage, each subject checked off one of three boxes beside each of three ballot items, on a screen set up as

    follows:

    I vote to allow a person's earnings to be reduced if

    (a) that person assigns less than the average amount11 to the group account    [ ] Yes   [ ] No   [ ] No preference

    (b) that person assigns the average amount to the group account               [ ] Yes   [ ] No   [ ] No preference

    (c) that person assigns more than the average amount to the group account     [ ] Yes   [ ] No   [ ] No preference

    In each group of four subjects, of those expressing a preference in ballot item (a), if there was a majority or tie of "No" votes against punishment of low contributors, then punishment of low contributors would be prohibited for the next 8 periods in the 3-Vote design and 6 periods in the 5-Vote design; and if a majority voted "Yes", punishment of low


    10 The purpose of (i) and (ii) was to keep all decisions financially independent of each other while maintaining a guaranteed minimum payment for recruiting reasons. The purpose of (iii) was to help subjects to avoid pointless spending on punishment in view of constraint (ii). Note, however, that it remained possible for subjects to overspend on punishing in the sense that both subject i and, say, subject k might each spend enough to reduce j's earnings for the period to zero, although only one subject's punishment would actually be effective in that case, given (ii). This could happen because subjects did not learn of punishment not carried out by or aimed at them, and the design (as in Fehr and Gächter, 2000a) keeps such information private so as not to encourage free riding on punishment.

    11 As explained in the instructions, "average amount" meant the average over the four members of the group in the contribution stage of the period in question. It could vary among groups and within a given group from one period to the next. Note that a vote to allow punishment of those contributing less than the group average of 4 players is the same event as a vote to allow punishment of those contributing less than the average of the 3 others.


    contributors would be allowed; and correspondingly for ballot items (b) and (c).12 After the vote, each group's members received a message reporting the voting outcome, which was one of $2^3 = 8$ possible punishment rules (i.e., combinations of the three ballot item choices).13
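The voting rule just described can be sketched in Python as follows. Only "Yes" and "No" votes count toward a majority, and a tie leaves the ballot item prohibited. The names `item_passes` and `group_rule`, the string codes for votes, and the sample ballots are illustrative assumptions rather than the experiment's actual implementation.

```python
from itertools import product

BALLOT_ITEMS = ("low", "average", "high")  # ballot items (a), (b), (c)

# Every combination of allowed/prohibited across the three items: 2**3 = 8.
ALL_RULES = list(product((False, True), repeat=len(BALLOT_ITEMS)))


def item_passes(votes):
    """True if 'yes' votes strictly outnumber 'no' votes; ties fail.

    Each vote is 'yes', 'no', or 'none' (no preference, not counted).
    """
    return votes.count("yes") > votes.count("no")


def group_rule(ballots):
    """Map four ballots (one vote per item each) to the group's rule."""
    return {
        item: item_passes([ballot[k] for ballot in ballots])
        for k, item in enumerate(BALLOT_ITEMS)
    }


# Hypothetical ballots for one group of four subjects.
ballots = [
    ("yes", "no", "no"),
    ("yes", "no", "no"),
    ("no", "none", "no"),
    ("none", "no", "yes"),
]
rule = group_rule(ballots)
```

With these hypothetical ballots the group allows punishment of below-average contributors only, which corresponds to the rule the groups chose most often.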

    When a group voted to restrict punishment, a fixed zero appeared in the punishment box14 for all individuals to which

    the restriction applied during the punishment stages that followed each contribution stage. For example, members of a

    group that had voted to prohibit all punishment saw the standard punishment stage screen with fixed 0s in all the

    punishment boxes, indicating that no punishment was allowed in this case.

    We conducted five sessions of each design using a total of 160 subjects (see Table 1).15 All of the sessions

    were conducted by computer in a computer lab at Brown University. At the end of each session, cumulative earnings

    for the 30 periods were totaled and converted to real money at the rate of 25 experimental dollars to one real dollar, and $5 was added as a participation fee. Sessions typically lasted a little less than 2 hours including instructions, and subjects' overall earnings averaged approximately $25. Instructions for both designs are similar and available in our Working

    Paper.16

    3. Results

    3.1. The voting pattern

    In the 3-Vote design there were 720 individual votes (80 subjects each voting 3 times on 3 ballot items), and in the

    5-Vote design 1200 individual votes. Table 2 shows the number of individual votes on each ballot item. A substantial number of individuals voted to allow punishment of higher-than-average contributors, but many more voted to allow punishment of less-than-average contributors.

    In the 3-Vote design there were 60 group votes (see Table 1), and in the 5-Vote design there were 100 group votes. In the 160 group votes altogether, only 4 of the 8 possible combinations of rules were ever chosen by majority rule. These were to allow: (i) no punishment, 56 group votes; (ii) punishment of lower-than-average contributors and no other punishment, 98 votes; (iii) punishment of low-or-equal-to-average contributors and no other punishment, 4 votes; and (iv) punishment of equal-to-average contributors and no other punishment, 2 votes. Conspicuously absent from this list is any rule allowing punishment of higher-than-average contributors.

    Result 1. No group ever voted to allow punishment of higher-than-average contributors, so perverse punishment was ruled out

    from the first opportunity to vote.

    In ruling out perverse punishment, every group also ruled out unrestricted punishment from the beginning. The two rules

    punishment of lower-than-average contributors and no other punishment and punishment of low-or-equal-to-average


    Table 1
    Numbers of groups, subjects, and votes.

    Session design   Number of   Number of groups   Number of subjects   Total number   Total number of group
                     sessions    in each session    in each group        of subjects    votes on rules
    3-Vote           5           4                  4                    80             60
    5-Vote           5           4                  4                    80             100

    12 We expected few cases where someone was exactly an average contributor, but for symmetry we treated the average contributor on a separate ballot item.

    13 Only "Yes" and "No" votes were counted in determining majorities; for example, if 2 voted "Yes" and 2 voted "No", the proposal did not pass, but if 2 voted "Yes", 1 "No" and 1 "No preference", the proposal passed. Subjects were informed of whether a ballot item passed or not, but not by how many votes or who voted which way.

    14 See the boxes labeled b, c and d on the lower left portion of the diagram in the appendix showing the screen design.

    15 Subjects were Brown undergraduates, recruited by (a) distribution of flyers in the mailboxes of all undergraduates, (b) distribution of flyers in a large introductory economics course, (c) distribution of table slips at a student dining hall, and (d) advertising under the heading of employment in an on-line campus magazine, the Brown Daily Jolt. Analysis of information provided in the post-experiment debriefing shows that the subjects majored in a large range of concentrations, with the economics concentration being that of only 15%, about 5% more than the proportion in the overall student body. A little less than half the subjects had taken no economics courses at the college level. A total of 67% of the subjects were female, somewhat higher than the 53% share in the general student body. Brown's undergraduate population numbers about 5500, so students participating in a given session tended not to know one another.

    16 See Ertan et al. (2005). In the instructions and experiment we used neutral language and did not use words like "free riding", "punishment", and "perverse punishment". See also Cinyabuguma et al. (2006), where we point out that punishment is most clearly perverse when aimed at a group's highest contributor. Here as in that experiment we distinguish between punishment of above average, average, and below average contributors, rather than between punishment of highest and of other contributors, because this seems more symmetrical and less likely to convey a biased framing of the problem to subjects.


    contributors and no other punishment are similar and we grouped them together under the heading of allowing

    punishment of low-but-not-high contributors. Fig. 2 shows how the group voting evolved, over the sequence of votes for

    the 3- and 5-Vote designs. Result 2 summarizes the voting pattern over time.

    Result 2. In both designs, a plurality of groups voted in their first vote to prohibit all punishment, with a substantial minority of

    groups voting to allow punishment of low-but-not-high contributors. Over the sequence of votes, this ordering reversed, so that in

    the final vote, nearly all groups voted to allow punishment of low-but-not-high contributors, with only a few remaining groups voting to prohibit all punishment.


    Table 2
    Numbers of individual votes to allow punishment of high, average, and low contributors, both designs.

                                                         Yes   No    No preference
    Allow punishment of less than average contributors   410   211   19
    Allow punishment of average contributors             46    577   17
    Allow punishment of above average contributors       111   493   36
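The vote counts reported in the text and in Tables 1 and 2 can be cross-checked with a few lines of Python. This is only an arithmetic sketch; the variable names and the short rule labels are our own, while every number is taken from the text.

```python
SUBJECTS_PER_DESIGN = 80          # Table 1
GROUPS_PER_DESIGN = 20            # 5 sessions x 4 groups (Table 1)
BALLOT_ITEMS = 3
VOTES = {"3-Vote": 3, "5-Vote": 5}

# Individual votes: subjects x voting rounds x ballot items.
individual_votes = {
    design: SUBJECTS_PER_DESIGN * rounds * BALLOT_ITEMS
    for design, rounds in VOTES.items()
}
# Group votes: groups x voting rounds.
group_votes = {design: GROUPS_PER_DESIGN * rounds for design, rounds in VOTES.items()}

# Table 2 rows as (Yes, No, No preference), both designs pooled.
TABLE_2 = {
    "less than average": (410, 211, 19),
    "average": (46, 577, 17),
    "above average": (111, 493, 36),
}
ballots_per_item = sum(SUBJECTS_PER_DESIGN * rounds for rounds in VOTES.values())
yes_share = {item: row[0] / ballots_per_item for item, row in TABLE_2.items()}

# Group-level outcomes listed in Section 3.1 (labels abbreviated here).
RULE_COUNTS = {
    "no punishment": 56,
    "punish low only": 98,
    "punish low or average": 4,
    "punish average only": 2,
}
```

Each row of Table 2 sums to 640 ballots per item, and the four observed group rules account for all 160 group votes. On these figures roughly 64% of individual ballots favored allowing punishment of below-average contributors, against roughly 17% for above-average contributors.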

    [Figure 2: two bar-chart panels, 3-VOTE DESIGN (1st to 3rd vote) and 5-VOTE DESIGN (1st to 5th vote). Vertical axis: number of groups voting for the rule (0 to 20). Series: prohibit all punishment; punish low-but-not-high; punish equal-to-average.]

    Fig. 2. Evolution of the voting rules: (A) 3-Vote design and (B) the 5-Vote design.

    [Figure 3: two line-chart panels, 3-VOTE DESIGN and 5-VOTE DESIGN. Vertical axis: average contributions (0 to 10). Horizontal axis: periods (1 to 30). Series: unrestricted (exogenous); no punishment (exogenous); no punishment; low-but-not-high.]

    Fig. 3. Average contributions for the two designs, by period and punishment rule: (A) the 3-Vote design and (B) the 5-Vote design.


    3.2. Contributions and efficiency

    Fig. 3 shows period-by-period contributions of groups for the two composite rules most frequently chosen. In both the

    3- and 5-Vote designs, groups that allowed punishment of low-but-not-high contributors achieved substantially higher

    levels of contributions than did groups that prohibited punishment altogether. We tested the hypothesis that contributions

    are higher for groups choosing the punish low-but-not-high rule than for those choosing the rule of no punishment, against

    the null hypothesis of no difference, in two ways. First, to avoid the possible statistical dependence from one period to

    another, and from group to group in a session because of the information provided from the second vote onward, we set

    aside observations from the second vote onward, and then averaged contributions in the periods between the first and second vote (with a probable loss of power). Comparing contributions under no punishment with those under punish low-but-not-high at the group level, we found, in a one-tailed Mann–Whitney test, significance at the 0.1% level for the 3-Vote design (11 group observations without and 9 with punishment) and at the 5% level (p = 0.034) for the 5-Vote design (13 observations without and 7 with punishment). Second, we tested differences in behaviors from the second vote onward

    in Wilcoxon matched-pair tests at the session level, with fewer observations but similar results.17 In both the 3- and 5-Vote

    designs, contributions in groups that permitted punishment of low-but-not-high contributors tended to increase over time

    until the end-game fall off. In contrast, 3-Vote design groups that prohibited all punishment had falling levels of

    contributions over time, replicating earlier results on basic VCMs without punishment, and in the 5-Vote design

    contributions had a slightly increasing trend in the middle periods.18

    Fig. 4 shows period-by-period efficiency of groups that voted to prohibit all punishment and groups that voted to

    prohibit perverse punishment while allowing punishment of low contributors. In Fig. 4A, average period efficiency was

    [Figure 4: two line-chart panels, 3-VOTE DESIGN and 5-VOTE DESIGN. Vertical axis: efficiency (0 to 1.0). Horizontal axis: periods (1 to 30). Series: unrestricted (exogenous); no punishment (exogenous); no punishment; low-but-not-high.]

    Fig. 4. Efficiency for the two designs, by period and punishment rule: (A) the 3-Vote design and (B) the 5-Vote design.

    17 For these tests, we have at most one paired observation from each session, namely the average contribution per subject in all groups in the session

    that chose one rule, and the corresponding average in all groups that chose the other. This yields up to five paired averaged observations in groups

    allowing no punishment and in groups allowing punishment of low contributors in each design and set of periods, although there are fewer observations

    for sets of periods when only one rule is observed in one or more sessions. For example, if in a certain session and set of periods two groups allowed no

    punishment and two groups allowed punishment of low contributors, we averaged the contributions over the relevant periods in the first two groups and

    likewise for the second two groups, giving us one pair of observations for that session; if all four groups follow the same rule, the session offered no

    observation for this test. We performed Wilcoxon matched pair (ranked sum) tests on these data with the following results beginning with 3-Vote design:

    for periods 7–14, only 3 sessions have observations of both rules, and although in all cases contributions are higher in the groups allowing punishment, the p-value of the one-tailed test is 0.055; for periods 15–22, with 4 valid sets of observations, the one-tailed test p-value is 0.034; for periods 23–30, only two sessions still have groups not using punishment, so although the ordering remains consistent, the one-tailed test p-value is 0.09. Turning to the 5-Vote design, we have for periods 1–6, 3 pairs of observations with one-tailed test p-value of 0.055; for periods 7–12, 4 pairs with one-tailed test p-value 0.034; for periods 13–18, 4 pairs, one with contrary ordering, hence one-tailed test p-value 0.072; periods 19–24, 4 pairs including one tie, and one-tailed test p-value of 0.055; periods 25–30, only 2 pairs, both with the usual order, but two-tailed test p-value 0.09.

    18 In Fig. 3A and especially 3B contributions under the endogenously chosen rule of no punishment are more sustained and decline more slowly than is typical in a VCM without punishment. But endogenous choice includes its process, including repeated voting and the ability to change rules, possibly leading to commitment effects (see Sutter et al.), restart effects (see the dashed vertical lines in Fig. 3), and selection effects as groups change rules in response to free-riding behavior (i.e., groups with the lowest levels of free riding are less likely to adopt a rule allowing punishment).
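The session-level p-values quoted in footnote 17 appear to be consistent with the normal approximation to the Wilcoxon signed-rank statistic, without continuity correction. That reading is an inference from the numbers; the text does not state how the p-values were computed. The Python sketch below illustrates the calculation. The function name and the difference values are hypothetical placeholders chosen only to reproduce the stated orderings, and the single contrary pair is assumed to have the smallest absolute difference.

```python
import math


def signed_rank_one_tailed_p(differences):
    """One-tailed p-value for a Wilcoxon matched-pair signed-rank test.

    Uses the normal approximation with no continuity or tie correction.
    Zero differences (ties within a pair) are dropped before ranking.
    The alternative hypothesis is that differences tend to be positive.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("no nonzero differences to rank")
    abs_sorted = sorted(abs(d) for d in nonzero)

    def average_rank(value):
        # Average rank among equal absolute differences.
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        return first + (count - 1) / 2

    w_plus = sum(average_rank(abs(d)) for d in nonzero if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# Hypothetical paired differences (punishment rule minus no punishment).
p_two_pairs = signed_rank_one_tailed_p([1.0, 2.0])
p_three_pairs = signed_rank_one_tailed_p([1.0, 2.0, 3.0])
p_four_pairs = signed_rank_one_tailed_p([1.0, 2.0, 3.0, 4.0])
p_one_contrary = signed_rank_one_tailed_p([-1.0, 2.0, 3.0, 4.0])
p_one_tie = signed_rank_one_tailed_p([0.0, 1.0, 2.0, 3.0])
```

Under these assumptions the function returns values close to the 0.09, 0.055, 0.034, and 0.072 quoted for 2, 3, and 4 consistently ordered pairs and for 4 pairs with one contrary ordering. Four pairs with one tie reduce to the three-pair case. With so few pairs, an exact signed-rank test would give larger p-values, so the approximation matters for how these results are read.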


    always higher under the rules allowing punishment of low contributors, and similarly in Fig. 4B, except in 6 periods.19 As with contributions, we performed both Mann–Whitney and Wilcoxon tests of the hypothesis that earnings are higher under restricted punishment than under no punishment at different levels of aggregation, with the resulting significance levels varying from 0.1% to 10% in the 3-Vote design and from the 10% level to insignificant in the 5-Vote design due to the similarity of earnings under the two rules in some groups of periods.20

    Table 3 compares contributions and efficiency under the two most frequently chosen rules and the exogenously imposed conditions of unrestricted punishment (periods 4–6 of the 3-Vote design) and no punishment (periods 1–3).21 The results of the five tests of Table 3 are summarized in Result 3:

    Result 3. For each of the Wilcoxon matched pair tests on contributions, contributions are higher under the rule of punish low-but-not-high than under the rule of unrestricted punishment, and contributions are higher under the rule of unrestricted punishment than under the rule of no punishment, and this ordering is transitive. Correspondingly, efficiency is higher under punish low-but-not-high than under no punishment, and efficiency is higher under no punishment than under unrestricted punishment, and this ordering is transitive.


    Table 3
    Effects of the punishment rule on contribution and efficiency.

    Test   Ranks of contributions by the punishment rule   Test   Ranks of efficiency by the punishment rule
    1      Punish low > unrestricted                       4      Punish low > no punishment
    2      unrestricted > no punishment                    5      Punish low > no punishment
    3      unrestricted > no punishment                    2      No punishment > unrestricted
    4      Punish low > no punishment                      3      No punishment ≈ unrestricted
    5      Punish low > no punishment                      1      Punish low > unrestricted

    Notes: One-tailed Wilcoxon matched pair tests. Tests 1–4 are for groups in the 3-Vote design. Test 1 compares the average contributions of periods 7–9 in groups that chose punish low in their first vote matched with the average contributions of the same group in periods 4–6 of unrestricted punishment (the number of distinct groups matched and compared is n = 9); and correspondingly for efficiency. Test 2 compares contributions (efficiency) for groups in periods 1–3 with contributions for the same groups in periods 4–6, n = 20. Test 3 compares periods 4–6 with 7–9, for the groups that chose no punishment in periods 7–9, n = 10. Test 4 compares members of the first groups in each session that switched from a voted rule of no punishment to a voted rule of punish low, comparing the 8-period averages before and after the switch; if two or more groups in a session switched at the same time, the behaviors of all of their members are averaged; n = 5. Test 5 is the same as Test 4, except it is for the 5-Vote design and 6-period averages are compared before and after the switch, n = 5. A less stringent version of Test 1 considers the first three periods of play in any group that adopted punish low, even if after the 2nd or 3rd vote. This test has n = 17 and a p-value < 0.1% for contributions and < 1% for earnings. We also considered less stringent versions of Tests 4 and 5 that compare each group that switched from a voted rule of no punishment to one of punish low, regardless of whether this was the first time such a switch had occurred among groups in their session. For Test 4, there are 9 paired observations and the p-value of the test statistic is < 1% for both contributions and earnings. For Test 5, there are 17 paired observations and the p-value is < 0.1% for contributions and < 5% for earnings.

    Punish low indicates punish low-but-not-high. Significance at the 1% level. Significance at the 5% level. Significance at the 10% level, and ≈ insignificant, in one-tailed tests.

    19 The difference in earnings between groups with no punishment and those with the punish-low-but-not-high rule (Fig. 4) is smaller than the difference in contributions (Fig. 3) because (a) an extra E$1 (one experimental dollar) of contribution raises efficiency by only E$0.60, and (b) punish-low-but-not-high groups achieve higher contributions but incur some punishment costs (E$1.25 per E$1 of punishment assigned). Experimenters with the voluntary contribution mechanism occasionally see in the lab a group that achieves high contributions without punishment or other aids, and the two groups that resisted voting for punishment in the 5-Vote design were of this type, their members perhaps priding themselves on being able to earn as much as those in other groups even without having recourse to the punishment threat.

    20 As with contributions, we begin with Mann–Whitney tests using group level observations from the periods between the 1st and 2nd votes, only. For the 3-Vote design, the one-tailed test p-value in this case is 0.001; for the 5-Vote design, the test finds no difference based on punishment rule, consistent with what Fig. 4B shows in periods 1–6. Next, we performed Wilcoxon matched pair tests for each set of periods with a maximum of one pair of observations per session. For the 3-Vote design, there are 3 valid pairs for periods 7–14, all showing higher earnings with punishment, with one-tailed test p-value of 0.055; for periods 15–22, 4 pairs, p-value 0.034; and for periods 23–30, 2 pairs, ordered as expected, p-value of 0.090. For the 5-Vote design, periods 1–6 have 3 paired observations but the difference, as with the corresponding Mann–Whitney test, is not significant; for periods 7–12, 4 pairs, again no difference; for periods 13–18, 4 pairs, with those with punishment earning more in 3 of 4 cases, thus p-value 0.072; periods 19–24, 4 pairs, again 3 favoring those allowing punishment but this time one tie, thus p-value 0.055; periods 25–30 only 2 valid pairs are left, with one session displaying one order, the other the other order, hence no significant difference. Although violating the requirement of full independence of observations, it may nevertheless help to put these results into perspective and convey a sense of the statistical power lost due to the dissemination of information if we report also the results for tests using all group level observations for all periods: for the 3-Vote design, the p-value of a one-tailed test would be less than 0.001; for the 5-Vote design, the p-value of the corresponding test is 0.01.

    21 For example, in comparing contributions under the rule of punish low-but-not-high with contributions under the (exogenous) rule of unrestricted punishment in Test 1, we considered the 17 groups of the 3-Vote design that eventually chose the rule allowing punishment of low-but-not-high contributors (see Fig. 2A). For each of these groups we calculated the average group contribution over the first 3 periods that the group was governed by this rule. We matched this average with the same group's average contribution over the 3 periods of unrestricted punishment (periods 4–6 of the 3-Vote design). In the 17 matched pairs, 14 groups had higher contributions under the rule of punish low-but-not-high, 2 groups had higher contributions under unrestricted punishment, and 1 group was tied. The difference is significant (p = 0.001) in a two-tailed Wilcoxon matched pair test.


    Because of the difference in the orderings for contributions and efficiency, the sequence of tests in Table 3 for efficiency is rearranged to show the transitivity. The difference in the orderings of contributions and efficiency is likely due to the cost of punishment.

    3.3. Mitigating the free rider problem

    In the literature on public goods games, it is common practice to use the term free rider loosely to denote any

    individual who contributes less than the socially optimal amount. It is worth noting, however, that in the absence of

    punishment anyone who contributes less than others earns more than these others and thus obtains a free ride on others' contributions; but when punishment is possible a low contributor may fail to earn more, and therefore fail to free ride in

    actuality. To compare how successfully different sets of rules address free riding, we adopt in this section a definition that

    considers the full outcome, not simply the contribution decision.

    Specifically, the symmetric design of this and other VCM experiments suggests a simple definition of free riding: a

    subject A experiences free riding when someone else in his group, B, contributes less to the public good but earns more

    than A does.22 For a specified punishment rule, sequence of periods, and collection of groups, we define the frequency of

    free riding as the number of cases of free riding divided by the number of observations, and an observation as a pairing in a

    group, where one subject in a group has a higher contribution than the other subject of the pair. By the design of a basic

    VCM without punishment and its payoff Eq. (1), every time someone contributes more than someone else, there is a case of free riding because the higher contributor always has lower earnings than the lower contributor. Thus, in this definition of free riding, the frequency of free riding for the basic VCM is 100% (as shown in the first bar of Fig. 5). But the frequency of

    free riding may decrease when sufficient punishment is directed at low contributors.
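The definition above can be made concrete with a short Python sketch. It counts, over a collection of group-periods, the pairs with unequal contributions (the observations) and the pairs in which the lower contributor earns more (the cases of free riding). The function name and the input layout are illustrative assumptions; the paper does not describe its own computation at this level of detail.

```python
from itertools import combinations


def free_riding_frequency(group_periods):
    """Return (cases, observations, frequency) of free riding.

    group_periods: iterable of (contributions, earnings) pairs, one per
    group and period, with both lists indexed by subject.
    An observation is a pair of subjects with unequal contributions;
    it is a case of free riding when the lower contributor earns more.
    """
    cases = 0
    observations = 0
    for contributions, earnings in group_periods:
        for a, b in combinations(range(len(contributions)), 2):
            if contributions[a] == contributions[b]:
                continue  # equal contributors are not an observation
            observations += 1
            high, low = (a, b) if contributions[a] > contributions[b] else (b, a)
            if earnings[low] > earnings[high]:
                cases += 1
    frequency = cases / observations if observations else float("nan")
    return cases, observations, frequency
```

Applied to earnings generated by Eq. (1) alone, every unequal pair is a case of free riding, so the frequency is 1. The 72% reported below for unrestricted punishment corresponds to 148 cases out of 205 observations.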

    For the rule of unrestricted punishment, over all 20 groups in periods 4–6 of the 3-Vote design, there were 205

    observations of pairs of unequal contributions by subjects in a group, and 148 cases of free riding, for a frequency of 72%

    (see the middle bar). In comparison, the frequency of free riding in the first 3 periods after a group voted for the rule of

    punishing low-but-not-high contributors was 35% of the 103 observed unequal pairs. This is a striking reduction,

    considering that the rule of punish low-but-not-high does not prevent a higher-than-average contributor from free riding

    on a still higher contributor. The difference in free riding between unrestricted punishment and punish low-but-not-high

    contributors is significant (p < 0.0001) in a Fisher exact test.23

    Result 4. In comparing VCMs with rules governing punishment, we found the highest frequency of free riding in groups operating with no punishment, less free riding in groups with unrestricted punishment, and least free riding in groups allowing punishment of low-but-not-high contributors.

    A regression analysis of incentives to free ride finds the same ordering as in Result 4. In the regressions below, we follow Fehr and Gächter in defining subject i's absolute negative and positive deviations from the average of others' contributions as

    $$\text{Absolute Negative Deviation}_i = \begin{cases} |C_i - \bar{C}_{-i}| & \text{if } C_i < \bar{C}_{-i} \\ 0 & \text{otherwise} \end{cases}$$

    and

    $$\text{Positive Deviation}_i = \begin{cases} |C_i - \bar{C}_{-i}| & \text{if } C_i > \bar{C}_{-i} \\ 0 & \text{otherwise} \end{cases}$$

    where $\bar{C}_{-i} = \sum_{j \neq i} C_j / 3$ is the average of others' contributions.
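The two regressors can be computed as in the Python sketch below. It assumes that the positive deviation applies when a subject contributes more than the average of the others; the function name and return layout are our own illustrative choices.

```python
def deviation_regressors(contributions, i):
    """Return (average of others, absolute negative deviation, positive deviation)
    for subject i, given the contributions of all members of the group."""
    others = [c for j, c in enumerate(contributions) if j != i]
    average_of_others = sum(others) / len(others)
    gap = contributions[i] - average_of_others
    negative_deviation = abs(gap) if gap < 0 else 0.0
    positive_deviation = abs(gap) if gap > 0 else 0.0
    return average_of_others, negative_deviation, positive_deviation


# Hypothetical group: subject 0 contributes the most, subject 3 the least.
group = [10, 6, 6, 2]
high_contributor = deviation_regressors(group, 0)
low_contributor = deviation_regressors(group, 3)
```

At most one of the two deviations is nonzero for any subject, so the regression can estimate separately how punishment responds to falling short of, or exceeding, what the others contributed.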

    [Figure 5: bar chart. Vertical axis: frequency of cases of free riding (0 to 1.0). Bars: no punishment; unrestricted punishment; punish low-but-not-high. The total number of observations for each rule is shown in parentheses (229, 205, and 103, respectively).]

    Fig. 5. Frequency of cases of free riding, by punishment rule.

    22 Under this definition, if everyone in a group contributed the same low amount, there would be no free riding (it is only defined for unequal contributors).

    23 We also did a Wilcoxon matched pair test, which is also significant; see the Working Paper for details.


    Using Fehr and Gächter's specification (see their Table 5, p. 991), we first consider behavior in the 3 periods of the exogenously imposed rule of unrestricted punishment (periods 4–6 of the 3-Vote design, see column (1) of Table 4), and compare this with the first 3 periods of the endogenously chosen rule allowing punishment of low-but-not-high in both the 3- and 5-Vote designs (columns (2) and (3)).24 Then we consider behavior for the punish low-but-not-high rule over all the periods which it governs in the 3- and 5-Vote designs (columns (4) and (5)).

    In each regression of Table 4 the dependent variable is each subject i's punishment received in each period (3 periods for regressions (1), (2), and (3), and up to 24 and 30 periods in regressions (4) and (5) respectively). The independent variables are the Average Contribution of Others, i's Absolute Negative Deviation, i's Positive Deviation, and period and group dummies (not shown).25,26

    The results in Table 4 are consistent with those of Fehr and Gächter in that Absolute Negative Deviation always obtains a significant positive coefficient. The coefficient on the Positive Deviation term in column (1), however, suggests that when it is allowed, perverse punishment exacerbates the incentive problem for high contributors.27 Table 5 re-organizes Table 4's

    Table 4
    Determinants of punishment received.

    Dependent variable: experimental dollars of punishment

                                     First three periods of the rule                          All periods of the rule
    Independent variables            (1) Unrestricted   (2) Punish low-   (3) Punish low-   (4) Punish low-   (5) Punish low-
                                     punishment,        but-not-high,     but-not-high,     but-not-high,     but-not-high,
                                     3-Vote design      3-Vote design     5-Vote design     3-Vote design     5-Vote design
    Constant                         0.74               4.086             19.754            0.587             11.483
                                     (1.067)            (2.353)           (4.587)           (2.222)           (4.367)
                                     p = 0.490          p = 0.088         p < 0.001         p = 0.792         p = 0.010
    Average contribution by others   0.388              0.230             1.090             0.228             0.654
                                     (0.175)            (0.244)           (0.405)           (0.206)           (0.269)
                                     p = 0.028          p = 0.350         p = 0.009         p = 0.269         p = 0.016
    Positive deviation               0.377              n.a.              n.a.              n.a.              n.a.
                                     (0.152)
                                     p = 0.014
    Absolute negative deviation      0.888              1.217             1.039             1.054             0.967
                                     (0.221)            (0.148)           (0.122)           (0.138)           (0.095)
                                     p < 0.001          p < 0.001         p < 0.001         p < 0.001         p < 0.001
    R²                               0.54               0.91              0.86              0.75              0.78
    Observations                     240                82                92                241               176

    Notes: Punishment received as a function of deviation from group average in unrestricted and restricted punishment conditions. OLS regressions with period and group fixed effects, not shown. Unrestricted punishment, in Column 1, is observed in periods 4–6, where each observation is for one subject and one period. Columns 2–5 include one observation per subject under the rule allowing punishment of low-but-not-high contributors. In Columns 2 and 3, only the first three periods in which a group adopted the rule for the first time are included, while Columns 4 and 5 include all periods of restricted punishment. Numbers in parentheses are White heteroskedasticity-consistent standard errors. Significance at the 1% level. Significance at the 5% level. Significance at the 10% level.

    24 We include observations for only the first 3 periods under a rule in columns (2) and (3) to achieve comparability with the regression for periods 4–6
    (column (1)), in view of the possibility that learning or other factors might change behaviors with more repetitions.

    25 In both the unrestricted (Column 1) and restricted (Columns 2–5) punishment regressions, only the observations of individuals who could
    potentially be punished are included. The difference is that under unrestricted punishment, anyone can be punished.

    26 The regressions were also estimated by the Tobit method, treating 0 punishment observations as potentially left-censored. Resulting coefficients
    are similar and similarly significant except in the case corresponding to Column (1), where they are not significant at conventional levels.

    27 In fact there was considerable perverse punishment in periods 4–6 of the 3-Vote design. Of the 129 events of punishment, 28% were punishments
    aimed at higher-than-average contributors for the period and group in question, 19% at the highest contributor for the period and group in question and
    11% at individuals who contributed their full endowment. These percentages are calculated by counting each event (rather than dollar amount) of
    someone punishing someone else. They may be atypically high due to the short duration of the unrestricted punishment portion of our experiment. Yet
    similarly large amounts of perverse punishment are found in some other studies; see for example Anderson and Putterman (2006), Gächter and
    Herrmann (2005), and for a regression result similar to column (1), in which the absolute positive deviation term also has a positive significant coefficient, Ones and Putterman (2007), Table 2.

    A. Ertan et al. / European Economic Review 53 (2009) 495–511


    findings in a manner that makes this clearer. In Column (1) of Table 4 the coefficient for Absolute Negative Deviation is
    $0.89, the estimated punishment for a $1 reduction in contribution for a less-than-average contributor, under the rule of
    unrestricted punishment, in the first 3 periods of the 3-Vote design, and shown as a negative gain of $0.89 in Column (1)
    of Table 5. In Column (2) of Table 4 the coefficient for Absolute Negative Deviation is $1.22, the estimated punishment for a
    $1 reduction in contribution for a less-than-average contributor, under the rule of punish low-but-not-high contributors, in
    the first 3 periods of the 3-Vote design, and shown as a negative gain of $1.22 in Column (2) of Table 5, etc.

    The +$0.60 throughout Table 5 is the $1 gain in the private account from reducing one's contribution by $1, minus the
    $0.40 loss in the individual's earnings from the group account. In Column (1) of Table 4 the coefficient for Positive Deviation
    is $0.38, the estimated punishment for each $1 of additional contribution for a higher-than-average contributor, under the
    rule of unrestricted punishment, in periods 4–6 of the 3-Vote design. The +$0.38 in Column (1) of Table 5 is the positive
    gain from contributing $1 less and avoiding $0.38 in perverse punishment, for a higher-than-average contributor. The cases
    labeled n.a. in Table 5 are for the rule of punish low-but-not-high in Columns (2)–(5), in which case punishment of
    higher-than-average contributors is not allowed.

    Table 5 shows that for less-than-average contributors the net gain from contributing $1 less is negative for each of the
    cases in Columns (1)–(5). The −$0.29 in Column (1) suggests that unrestricted punishment can reverse a subject's incentive
    to free ride, for a subject contributing less than average, replicating Fehr and Gächter's earlier finding for the case of
    less-than-average contributors. But the net gain for less-than-average contributors is even more negative in Columns
    (2)–(5), suggesting that the incentive against free riding is strengthened for less-than-average contributors under the rule
    of punish low-but-not-high.

    Table 5 suggests that the incentive to contribute $1 less for higher-than-average contributors is not reversed under
    either unrestricted punishment or the rule of punish low-but-not-high. In Column (1) under unrestricted punishment, a subject
    with a higher-than-average contribution makes an estimated net gain of $0.98 from contributing $1 less (a gain of $0.38
    from reduced perverse punishment added to the $0.60 gain from shifting away from the group account). In Columns
    (2)–(5), under the rule of punish low-but-not-high, a higher-than-average contributor bears no punishment, but still gains
    the $0.60 from a $1 shift from the public account. While neither rule reverses the incentive for a higher-than-average
    contributor to contribute less, the incentive toward free riding is weaker under the rule of punish low-but-not-high than under
    unrestricted punishment.28
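The arithmetic behind these net gains can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, and the inputs are the rounded Table 4 coefficients quoted in the text, read as dollars of punishment per $1 of deviation from the group average.

```python
# Illustrative sketch only; names are hypothetical, not taken from the paper.
# Inputs are the rounded Table 4 coefficients quoted in the text.

MPCR = 0.40  # return to each subject per $1 placed in the group account

# Gain from moving $1 from the group account to the private account.
ACCOUNT_SHIFT_GAIN = 1.00 - MPCR

# Estimated punishment per $1 of absolute negative deviation, Columns (1)-(5).
NEG_DEV_COEF = {1: 0.89, 2: 1.22, 3: 1.04, 4: 1.05, 5: 0.97}

# Estimated punishment per $1 of positive deviation; only Column (1)
# (unrestricted punishment) allows high contributors to be punished.
POS_DEV_COEF = {1: 0.38}


def net_gain_low(column: int) -> float:
    """Net gain from contributing $1 less, less-than-average contributor.

    Contributing $1 less raises the absolute negative deviation, so the
    expected extra punishment is subtracted from the account-shift gain.
    """
    return round(ACCOUNT_SHIFT_GAIN - NEG_DEV_COEF[column], 2)


def net_gain_high(column: int) -> float:
    """Net gain from contributing $1 less, higher-than-average contributor.

    Contributing $1 less lowers the positive deviation, so any perverse
    punishment avoided is added to the account-shift gain.
    """
    return round(ACCOUNT_SHIFT_GAIN + POS_DEV_COEF.get(column, 0.0), 2)


if __name__ == "__main__":
    for col in range(1, 6):
        print(col, net_gain_low(col), net_gain_high(col))
```

Under these assumptions the low-contributor net gain is negative in every column, while the high-contributor net gain stays positive and is largest where perverse punishment is allowed.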

    3.4. Do subjects vote according to their type?

    We conjectured that even though some subjects use opportunities to perversely punish (when punishment is

    unrestricted) and would likely vote to allow perverse punishment in our experiment, punishment of high contributors

    might nonetheless be ruled out since few groups would have a majority of members of this type. Results at group level are

    consistent with this conjecture. Is there also evidence at the level of individuals, however, that subjects tended to vote

    according to type? Logit regressions provide some affirmative evidence.

    We estimated regressions in which the dependent variable is 1 if a subject voted to permit punishment specified by a

    particular rule and 0 otherwise. Explanatory variables included the subjects' contributions relative to their group averages


    Table 5
    Incentives to contribute $1 less.

                           (1)             (2)             (3)             (4)             (5)
                           Unrestricted    Punish low-     Punish low-     Punish low-     Punish low-
                           punishment      but-not-high    but-not-high    but-not-high    but-not-high

    Less-than-average contributors, subject to punishment
    Abs. neg. deviation    −$0.89          −1.22           −1.04           −1.05           −0.97
    $1 account shift       +0.60           +0.60           +0.60           +0.60           +0.60
    Net gain               −$0.29          −0.62           −0.44           −0.45           −0.37

    Higher-than-average contributors, subject to punishment only in Column (1)
    Positive deviation     +0.38           n.a.            n.a.            n.a.            n.a.
    $1 account shift       +0.60           +0.60           +0.60           +0.60           +0.60
    Net gain               +0.98           +0.60           +0.60           +0.60           +0.60

    Note: Net gain is the change in earnings from contributing $1 less.

    28 When subjects make their contribution decision, they do not know what the other subjects' contributions will be, and are uncertain of what will be
    the average and its boundary line of punishment risk. This uncertainty creates an incentive toward higher contributions to be on the safe side of the unknown boundary.


    during the periods preceding each vote, measures of how much punishment they had given and received, and period (i.e.
    vote) and group dummies. The coefficients on the subjects' relative contribution were positive in the regressions on voting
    to allow punishment of low contributors, significant at the 5% level or better for both the 3- and the 5-Vote designs, and
    negative in the regressions on voting to allow punishment of high contributors, significant at the 10% level in the regression
    for the 3-Vote but not in that for the 5-Vote design.

    This evidence suggests that the higher a subject's contribution was, on average, relative to the group's average contribution
    in the 8 (6) previous periods, the more likely the subject was to vote to allow punishment of less-than-average contributors
    and the less likely to vote to allow punishment of greater-than-average contributors. Details are in the Working Paper
    (Ertan et al., 2005).

    4. Discussion and interpretation

    We discuss our results and interpretation under the following headings: (a) A rough calculation on the plausibility that
    no group would ever allow punishment of high contributors in the 160 group votes of the combined 3- and 5-Vote designs;
    (b) Institutional choice and its evolution with and without information on other groups' performance; (c) Distaste for
    punishment and the role of opportunities to reconsider; (d) Variability of experimental results; (e) Implementation;
    (f) What the experiment appears to tell us about models of heterogeneous preference types; (g) Heterogeneous preferences
    in other voting models.

    (a) On the plausibility of unanimously prohibiting perverse punishment in 160 group votes. Even if only a quarter of subjects
    are prone to perversely punishing, it might seem implausibly rare that not a single group vote produced a majority for
    allowing it. As an anonymous referee commented: "[t]he fact that no group ever allowed punishment of high contributors
    will make readers suspicious, since results of such clarity are quite rare." How improbable is the unanimity result? Simple
    calculations suggest a wide range in the assessment of probability.

    Consider the following composite hypothesis: (i) about 25% of punishment is targeted on higher-than-average

    contributors when punishment is unrestricted (see Section 1.1 and footnote 27); (ii) an individual who has a preference

    toward punishing high contributors is just as likely to punish as an individual who has a preference against such

    punishment (i.e., the proportion of subjects of given preference is the same as the proportion of corresponding punishment

    observations); (iii) perverse punishers are likely to vote their preference type to allow punishing high contributors, and

    similarly normal punishers are likely to vote their preference type to prohibit punishing high contributors (evidence for

    this from the logit analysis in Section 3.4); and (iv) the preference types are stable and randomly distributed.

    With these rough assumptions the binomial probability that a group of four subjects chooses to allow perverse
    punishment by a majority of 3 or 4 votes for the third ballot item is 0.0508 (we are setting aside complications from
    abstentions), the probability of prohibiting perverse punishment is 0.9492, and the expected number of group votes
    prohibiting perverse punishment is 152 out of the combined 160 group votes in the 3- and 5-Vote designs. This calculation
    roughly suggests that the vast majority of group votes will be to prohibit perverse punishment. But the binomial
    probability of unanimity, the event that 160 out of 160 votes prohibit perverse punishment, is small, 0.0002.

    However, this calculation depends on the assumption of statistical independence in type from period to period even for

    the same individual, and this is an unrealistically strong assumption. Consider another simple but unrealistic assumption in

    the other direction: that preference types and beliefs are so stable that they remain fixed from period to period. Then it is as

    though there were only 40 independent group-level observations in the 10 sessions of the experiment and the same votes

    and other decisions are repeated many times. Then the expected number of votes prohibiting perverse punishment is 38

    out of the combined 40 group votes in the 3- and 5-Vote designs, and the binomial probability of unanimity, that 40 out of 40
    votes prohibit perverse punishment, is much larger, 0.12.

    A glance at Fig. 2 shows that this second assumption on statistical dependence is unrealistically strong in the
    other direction, i.e., views change over time. Presumably the probability of unanimity is somewhere between 0.0002
    and 0.12, likely pretty far from the two extreme calculations. The calculations serve to remind us of the sensitivity
    of such results to assumptions on statistical independence, when there are aggregations over many periods, and of the other
    uncertainties in (i)–(iv).
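The two bounding calculations in (a) can be reproduced with a short Python sketch. It assumes, as in hypotheses (i)–(iv), that each of four group members independently favors allowing perverse punishment with probability 0.25 and that three votes make a majority; the function names are hypothetical and abstentions are ignored, as in the text.

```python
from math import comb


def prob_majority_allows(p_perverse: float = 0.25,
                         group_size: int = 4,
                         majority: int = 3) -> float:
    """Binomial probability that at least `majority` of `group_size`
    members vote to allow punishment of high contributors."""
    return sum(
        comb(group_size, k) * p_perverse**k * (1 - p_perverse)**(group_size - k)
        for k in range(majority, group_size + 1)
    )


def prob_unanimous_prohibition(n_independent_votes: int,
                               p_prohibit: float) -> float:
    """Probability that every one of n independent group votes prohibits
    perverse punishment."""
    return p_prohibit ** n_independent_votes


p_allow = prob_majority_allows()   # about 0.0508
p_prohibit = 1 - p_allow           # about 0.9492

if __name__ == "__main__":
    # Full independence across all 160 group votes (first calculation).
    print(round(160 * p_prohibit), prob_unanimous_prohibition(160, p_prohibit))
    # Fixed types: only 40 independent group-level observations (second one).
    print(round(40 * p_prohibit), prob_unanimous_prohibition(40, p_prohibit))
```

The only thing that changes between the two cases is the number of votes treated as independent, which is what moves the probability of unanimity from roughly 0.0002 to roughly 0.12.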

    (b) On institutional choice, learning and evolution: Our experiment is one of several recent ones in which institutions are
    chosen by subjects through voting. Despite its stylized character, we think it suggests the considerable potential that the
    experimental method has for contributing to our understanding of how institutions emerge and evolve. We note again our
    choice of promoting a more accelerated and informed evolution of institutions by sharing information about outcomes
    among groups in given sessions, despite some cost to statistical independence. We would argue that when real-world
    groups decide on rules and practices, they often have access to information about the experience of similar groups, so the
    information spill-over in the experiment has a true-to-life quality. We want to emphasize, however, that 40 first votes were
    taken by the 160 subjects in our core treatments, and 4 more by the 16 subjects in our pilot experiment, and that each of
    these votes occurred with no information about others' choices or outcomes. Apart from the evolution towards more use of
    punish low-but-not-high with additional votes, our findings (unanimous rejection of allowing punishment of high
    contributors in the initial vote, higher contributions and earnings with than without punishment of low contributors, lower
    earnings with unrestricted punishment, and lower frequency of successful free riding under the punish low rule)
    are all supported by tests using only decisions taken prior to information dissemination, as well as by tests using the full data set.


    (c) Distaste for punishment and opportunities to reconsider. In one treatment of a set of 4-person VCM interactions similar
    to those in this paper, Bochet et al. (2006) let subjects communicate in a chat room before the 1st, 4th, and 7th periods of
    ten rounds of play. A noticeable finding was that out of 12 groups in the chat room treatment with punishment
    opportunities, not a single group discussed an explicit strategy of punishing low contributors, and in some groups,
    members' messages expressed the view that the punishment option was a trap set by the experimenters to reduce
    earnings. A distaste for punishment may help to account for the rejection of all forms of punishment in many of the initial
    votes in our experiment, for the rejection of punishment by most groups in Sutter et al. (2005) and Botelho et al. (2005),
    and for the initial preference shown for being in the group without punishment by most subjects in Gürerk et al. (2005, 2006).

    While eschewing the punishment idea in their deliberations, however, many of Bochet et al.'s subjects engaged in costly
    punishment when group members defected from their verbal agreements to contribute. And subjects in the present
    experiment seem to warm to the idea of allowing punishment of low contributors as they experience the sense of
    resentment of or anger at free riders and as they learn that groups permitting punishment tend to have higher earnings.

    The institutional choices made in our paper and in those of Sutter et al. (2005), Botelho et al. (2005), and Gürerk et al.
    (2005, 2006) might seem at first glance to be at odds, since our subjects and Gürerk et al.'s subjects seem to show a greater
    overall preference for punishment than do those of Sutter et al. and Botelho et al. However, all share a common reluctance
    to adopt punishment rules at the outset, and much of the difference in overall outcomes may be attributed to the fact that
    our subjects and Gürerk et al.'s subjects have many opportunities to change rules or groups, while Sutter et al.'s and Botelho
    et al.'s subjects have only one opportunity to vote on rules. Also, our subjects might have voted more like Botelho et al.'s had
    they been required to choose between no punishment and unrestricted punishment only, since the results of periods 4–6 of
    our 3-Vote design are consistent with Botelho et al.'s point that subjects may be worse, not better, off with (unrestricted)
    punishment.

    (d) On variability: At the same time, even a brief review of the literature of punishment in social dilemmas shows a large

    variability in experimental results. Experimentalists are well aware that small changes in experimental design and

    wording of instructions can affect experimental results, not just for experiments on punishment but quite generally. Still,

    the literature on punishment in social dilemmas seems to yield an especially large variability in results. Our suspicion is

    that this variability is partly due to punishment behavior itself being scattershot and variable. Thus there may not be a

    simple general answer to the question of whether punishment in social dilemmas is a good or bad thing. The effects of

    punishment may vary so much with the specific conditions that there is no general answer.29

    (e) On implementation: In the experiment, once a rule of punishment is chosen by vote, it is easily implemented by
    the experiment's computer software. In the real world, there is no such easy implementation. Nonetheless, in the
    practical world most organizations are hierarchical or a blend of hierarchy and symmetric volunteer elements, and
    organizations often find ways of managing, albeit imperfectly, who gets punished. For example, in hierarchical
    organizations if managers were more aware of the possibly high frequencies of perverse punishers and high costs in
    efficiency, they might focus more on mitigation. Once aware, managers could work to limit decentralized punishment
    and attempt to instill norms of cooperation in much the same manner that managers attempt to control bullying behavior
    and harassment.

    (f) On heterogeneous preferences: There is a continuing discussion about keeping the standard model which limits the

    type of preferences to self-regarding (individual profit maximizing) preferences. In favor of this approach is that it is

    parsimonious and often leads to specific predictions, which in turn are often consistent with experimental results.

    However, in this experiment, we don't see how we can interpret the results without positing some form of other-regarding
    preference types (e.g. conditional cooperators, perverse punishers). Other experiments on social dilemmas also suggest the
    need for modeling heterogeneous preference types, including both self-regarding and other-regarding or reciprocating
    types. Our experiment adds to the interpretation of heterogeneity, in a particularly striking way.

    An appeal of modeling only homogeneous self-regarding preferences is that introducing heterogeneous preferences is
    "too mushy," allowing almost any prediction and rationalizing almost any observed result. But our experiment has a strong
    and consistent pattern to it, suggesting that the existence of heterogeneous preferences need not always lead to
    indeterminate results.

    (g) On heterogeneous preferences in other voting models: Our analysis suggests that the presence of multiple preference

    types may be important to predicting voting outcomes, and this may be true for other instances of public choice as well.

    Pork barrel politics provides an example. Ordeshook's (1986, pp. 210–215) model of pork barrel politics is one of a social

    dilemma where what is good for an individual legislator is bad for society as a whole. For example, Senator Stevens benefits


    29 The fact that Gürerk et al.'s subjects earn more with than without unrestricted punishment while the comparison goes the opposite way for our
    subjects and Botelho et al.'s illustrates this variability. In personal communication, Simon Gächter reported that he and his collaborators found large
    differences in the frequency of perverse punishment and, correspondingly, in the benefit or lack of benefit of introducing a punishment option across
    subject pools in different countries and settings (a finding documented shortly before our paper went to press in the remarkable study by Herrmann et al., 2008).


    by bringing pork to his district (the "bridge to nowhere"), while other Senators lose because their districts end up paying
    for the bridge, even when the net benefits of the bridge are negative. Why then don't the other Senators outvote Stevens?
    Ordeshook's answer is that in a pork bill, there can easily be an equilibrium where there are just enough ear-marked pork
    projects to form a winning coalition, even when each of the projects has negative net benefits.

    Ordeshook's analysis depends heavily on the assumption that each legislator is narrowly self-interested (the self-interest
    may be in the form of an increased probability of re-election). In fact, the assumption of a single preference type of
    self-interest is still common in voting models in the political science literature.

    Our experiment and others on VCMs, the dictator game, and the centipede game (McKelvey and Palfrey, 1992) suggest
    that the assumption of homogeneous preference types can be misleading. If one allows for the possibility of heterogeneous
    preference types in Ordeshook's model, the equilibrium can shift and the predicted outcomes are not always as dire as
    Ordeshook's original model suggests. For example, some senators may care about doing the right thing, or some voters
    may choose not to reward a senator who joins a pork coalition, so the situation may be more fluid than it appears in
    Ordeshook's model.

    But if the situation is this fluid, can anything happen? To deal with this possibility we focused on observed

    behavior under the specific experimental conditions, and then interpreted the specific results in terms of heterogeneous

    preferences. We believe that this approach can work in experimental studies of other voting models, such as Ordeshook's,

    even when there are signs of heterogeneity and odd behavior, as there were in our study of voting and perverse

    punishment.

    As another example, Meltzer and Richard's (1981) model of the level of redistributive taxation uses a median voter
    solution assuming strictly self-regarding preferences. More accurate explanations of the level of redistribution and its
    variation over time and place would consider the strength of preferences for greater equality, on the parts of some citizens,
    and resentment of the "undeserving poor," on the parts of others (see, for instance, Benabou and Tirole, 2005). Such an
    addition of two almost opposite social preference types alongside self-interested types resembles the situation studied in
    this paper, where self-interested subjects co-exist with both cooperation-preferring and cooperation-resisting types, with
    the associated demographic leading to predictable voting outcomes.30

    Appendix

    Fig. A1 is the screen design for an individual to enter her contribution to the group account (box a), to learn of others'
    contributions (boxes b, c, and d), to enter her punishment decisions (boxes b′, c′, and d′), and to observe the computer's
    calculation of net earnings for a period.


    [Figure A1: screen layout with one column each for You, B, C, and D.
    Put in group account: a, b, c, d. Total in group account: e = a + b + c + d.
    Reduce others' earnings: one entry per group member. Sum of your reductions of others' earnings: h.
    Earnings from group account: f = 0.4e.
    Earnings from private account: g = 10.0 − a.
    Cost of your reductions of others' earnings: i = 0.25h.
    Total of others' reductions of your earnings: j.
    Net earnings this round: k = g + f − i − j.]

    Fig. A1. Screen design for entering contribution and punishment decisions, receiving information, and calculating net earnings.
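The per-period earnings calculation shown on the screen can be sketched as follows. This is a minimal illustration under stated assumptions, not the experiment's software: the function and parameter names are hypothetical, and h is taken to be the sum of the reductions a subject imposes on the other three group members.

```python
# Hypothetical sketch of the Fig. A1 calculation; names are illustrative only.

ENDOWMENT = 10.0    # private account before contributing: g = 10.0 - a
GROUP_RETURN = 0.4  # each member's return per dollar in the group account
PUNISH_COST = 0.25  # cost to the punisher per dollar of reduction imposed


def net_earnings(own_contribution: float,
                 others_contributions: list[float],
                 reductions_given: list[float],
                 reductions_received: float) -> float:
    """Return k = g + f - i - j for one subject in one period."""
    e = own_contribution + sum(others_contributions)  # total in group account
    f = GROUP_RETURN * e                              # earnings from group account
    g = ENDOWMENT - own_contribution                  # earnings from private account
    h = sum(reductions_given)                         # reductions imposed on others
    i = PUNISH_COST * h                               # cost of imposing them
    j = reductions_received                           # others' reductions of own earnings
    return g + f - i - j


if __name__ == "__main__":
    # Full cooperation by all four members, no punishment.
    print(net_earnings(10, [10, 10, 10], [], 0))
    # A free rider facing three full contributors, no punishment.
    print(net_earnings(0, [10, 10, 10], [], 0))
```

Comparing the two calls shows the social dilemma in miniature: absent punishment, the free rider earns more than a full contributor in a fully cooperative group.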

    30 See Camerer and Fehr (2004) for other applications of other-regarding preferences to the study of public choice.
