
Sex and performance under competition: Is there a stereotype threat shadow?¹

Diogo Geraldes²,³ Arno Riedl² Martin Strobel²

September 2011

Preliminary version. Please do not quote or circulate without permission of the authors.

Abstract

In this paper we experimentally study performance of men and women under

competition, with implicitly and explicitly induced stereotype threats to both

sexes. We use a mathematical task that is perceived as male-dominant and

creates an implicit stereotype threat against women. We also study conditions

in which we explicitly reinforce or contradict the implicit stereotype threat by

providing appropriate information. We find that despite stereotype threats

against women, both men and women react positively and equally strongly to

competitive incentives. When the stereotype threat is explicitly contradicted,

competitive incentives do not have an effect on the performance of either men

or women. Our findings contrast with previous results suggesting that men are

more responsive to competition than women. We observe that men and women

react similarly to competition in terms of performance across three different

stereotype threat conditions. Interestingly, we also find that explicit stereotype-

based expectations that contradict the stereotype men and women hold harm

the competitive performance of both sexes.

Keywords: competition, incentives, sexes, stereotype threat

JEL classification: C91; J16

1 We would like to thank all participants of the Experimental design seminar series at Maastricht University,

the Nordic Conference on BEE 2010 in Helsinki, IMEBE 2011 in Barcelona, the International ESA Conference 2011,

TIBER 2011 at Tilburg University and EEA-ESEM 2011 in Oslo for their invaluable comments.

We are grateful that Thomas Dohmen provided us with his z-tree program for the work task.

2 Maastricht University, Department of Economics – Section AE1

3 Corresponding author. Address: Maastricht University, Department of Economics – Section AE1, P.O.

Box 616, Maastricht 6200 MD, The Netherlands. E-mail: [email protected]


1. Introduction

A politically, socially and economically important stylized fact about sex¹ differences is the

gap in wages and positions at the workplace. In 2006 women earned, on average, 25% less than

men in the 27 European Union countries. In academia, qualified research positions such as

PhD candidates, post-doctoral or assistant professors, associate professors and full professors

are dominated by men. For instance, only 19% of all full professors in the 27 European Union

countries are women.² Further evidence for the disadvantaged position of women at the

workplace is presented, for example, by Bertrand and Hallock (2001).³

One common explanation for these differences between the sexes at the workplace is

discrimination (e.g., Black and Strahan, 2001). Another one is women’s higher sensitivity to

work-family conflicts and women’s weaker negotiation behavior (e.g., Babcock and Laschever,

2003). An alternative explanation has been suggested by recent studies in experimental

economics: men are more inclined to compete than women. These studies show that even under

tightly controlled and relatively abstract situations where men and women compete with each

other, a difference in attitude towards competition between the sexes exists. This literature

addresses two main questions:

I. Differences in preference for competitive environments, i.e., do men and women self-

select a competitive environment differently?

II. Differences in performance under competition, i.e., do men and women react differently

to competitive pressure in terms of performance?

Concerning self-selection of competitive environments a standard finding is that women

shy away from competition when given the option to compete whereas men do not (e.g., Datta

Gupta et al., 2005; Niederle and Vesterlund, 2007; Booth and Nolen, 2009; Gneezy et al.,

2009; Dohmen and Falk, 2010; Cason et al., 2010). Regarding behavior under competitive

pressure the sparse experimental economics literature shows that men generally increase their

work performance under competition whereas this is not (or much less) the case for women.

1 We use the term sex instead of gender because it is more scientifically correct even if it is less politically so. Sex

is a biological attribute, defined by chromosomes and anatomic characteristics. It is a binary, either/or trait.

Gender, by contrast, is a social construct, the sum of all the attributes typically associated with one sex. It is not

fixed and binary but a wide range between masculinity and femininity (see Eliot, 2009).

2 Source: European Commission (2009), “She Figures 2009: Statistics and Indicators on Gender Equality in

Science”.

3 For a general overview of sex differences on labour markets see Blau, Ferber and Winkler (2010).


Gneezy et al. (2003) and Gneezy and Rustichini (2004) find that men and women who show a

similar performance of a task under a non-competitive incentive scheme differ in performance

for the same task when they have to compete with each other in mixed sex groups.

The objective of this paper is to investigate experimentally the extent to which stereotypes

are related to men’s and women’s behavior under competitive pressure when they have to

compete with each other. The only study we are aware of that experimentally establishes a

possible connection between stereotypes and economic competition between sexes is from

Günther et al. (2010). The authors argue that the experimental economics literature on how

sex influences attitudes towards competition is possibly flawed because these studies

contain tasks for which stereotype assumptions about male superiority are relevant for the

performance. They hypothesize that in mixed sex groups women compete less in tasks that are

perceived as typically male because women are stereotypically expected to perform worse than

men. Such an effect should not be observed in sex-neutral tasks and perceived female tasks.

Besides replicating the main finding in the literature that women react less to competitive

incentives for a male task, Günther et al. (2010) indeed find that women react as strongly as

men and more strongly than men in response to competitive pressure for a sex neutral task and

a female task, respectively.

This under-explored explanation for men’s and women’s attitudes towards competition is

motivated by the psychology literature on stereotype threat. Stereotype threat is essentially a

situational phenomenon in which a member of a group feels pressured by the possibility of

confirming a negative stereotype about his/her group (e.g., Steele and Aronson, 1995). This

literature reports that stereotype threat undermines task performance of various groups across

multiple domains. This effect is shown, for example, amongst African-Americans (e.g., Steele

and Aronson, 1995) and Latinos (e.g., Aronson et al., 1998) when compared to Caucasians on

tests labelled as indicators of intellectual ability, amongst women when compared to men

during tests evaluating mathematical ability (e.g., Spencer et al., 1999), and amongst Caucasian

men in math-tests when informed about Asian-Americans’ superior ability in mathematics

(Aronson et al., 1999). Fundamentally, this literature predicts that the performance gap

between members of a group prone to stereotype threat and members of a group not prone to

stereotype threat should be different depending on whether a threat exists or not. One should

highlight, though, that the focus of the stereotype threat literature is on performance evaluation

within non-competitive environments.⁴

4 For a review of the stereotype threat literature see Kit, Tuokko and Mateer (2008).


A stereotype threat experience could be triggered in the presence of either an implicit or

explicit stereotype. Implicit activation of stereotype threat refers to cases in which simply being

placed in a situation within a domain where the negative stereotype is well known, although

not explicitly highlighted, is sufficient to trigger the threat. Explicit activation of stereotype

threat refers to cases in which the threat is activated by confronting an individual directly with

the negative stereotype (e.g., Smith and White, 2002). The main argument in Günther et al.

(2010) is that the way a task is described suffices to activate a stereotype threat. Therefore, in

these authors’ work the implicit activation of a stereotype threat is the key element that seems

to explain differences in performance under competition between men and women.

In real life examples of competition between the sexes such as, for example, women

competing for academic positions in math-intensive areas or aiming for top-paid corporate

positions, women are explicitly made aware that these positions are dominated by men and that

people expect men to be more successful than women in those domains. Although evaluating

the effect of implicit stereotype-based expectations gives an important first insight, we consider

it more pertinent to also study competition between men and women in contexts

where we explicitly induce stereotype-based expectations.

In this paper we use a controlled laboratory experiment to examine men’s and women’s

performance of a mathematical task when they have to compete with each other not only in the

presence of an implicit stereotype threat, but also in the presence of an explicit stereotype

threat. Moreover, in the explicit case we evaluate not only the effect of a negative stereotype

about women but also the effect of a negative stereotype about men. These explicit stereotype-

based expectations are induced by providing appropriate information. We hypothesize that

men’s and women’s competitive performance is harmed only if the stereotype-based

expectations they face contradict the stereotype men and women hold. Distracting thoughts

have been shown to interfere with working memory and attention (e.g., Brewin and Smart,

2005), which are essential to the performance of a mathematical task. Hence, if men’s and

women’s prior belief about the invoked stereotype is contradicted, one should expect distracting

thoughts to emerge and interfere with working memory and attention. Accordingly, we

hypothesize that men’s and women’s competitive performance is not harmed within an implicit

stereotype context because the stereotype-based expectations men and women could perceive

in this case are necessarily triggered by their prior belief. In the explicit context, where we

provide information to explicitly induce stereotype-based expectations, we hypothesize that

men’s and women’s performance under competition is harmed only if the stereotype-based

expectations embedded in the information we provide contradict the stereotype they hold.


To test our hypotheses, we examine competition between the sexes using three conditions.

We induce an implicit stereotype against women, an explicit stereotype against women and an

explicit stereotype against men in the first, second and third conditions, respectively. The

results validate our hypothesis: in the implicit case, women react positively and as strongly as

men to the competitive incentives. In the explicit stereotype against women case, in which we

explicitly induce stereotype-based expectations that support the stereotype that men and

women hold, both men and women react positively and equally strongly to the competitive

incentives. Finally, in the explicit stereotype against men case, in which we explicitly induce

stereotype-based expectations that contradict the stereotype that men and women hold, neither

men nor women react significantly to the competitive incentives.

Our study shows a clear connection between explicit stereotypes and men’s and women’s

performance under competition but not in accordance with an explanation based on stereotype

threat. The main insight of this paper is that men’s and women’s response to competitive

pressure in terms of performance is similar across three different competitive contexts and it is

negatively affected only in the case where the explicit stereotype-based expectations contradict

the stereotype men and women hold.

The remainder of the paper is organized as follows. Section 2 describes the design of the

experiment and section 3 presents the results. Conclusions are discussed in section 4.

2. Experimental Design

A. Methods

In the first stage, subjects perform the work task under a non-competitive incentive scheme.

In the second stage, the same subjects perform the work task under a competitive incentive

scheme. Our main measure is the difference in performance of men and women between the

two stages, given they are competing against the opposite sex in the second stage. To the best of our

knowledge, the only two studies that experimentally endogenize in the laboratory men’s and

women’s performance under an exogenously given competitive incentive scheme use a

between-subjects design in which the performance in a non-competitive environment serves as

the baseline (Gneezy et al., 2003; Günther et al., 2010). We consider, however, that a between-subjects

design could be problematic for analysing performance if the noise in unobserved

subject characteristics is large (chiefly, a subject’s ability to perform the task). Hence, we use


a within-subjects design instead which allows us to interpret the data without concern for

ability differences across subjects.⁵
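The argument for the within-subjects design can be made explicit with a simple additive model. This is only an illustrative sketch: the model, the symbols $a_i$, $\beta$, $\varepsilon_{i,t}$, and the assumption that ability and noise terms are mutually independent are introduced here for exposition and are not taken from the paper.

```latex
% Illustrative model (an assumption for exposition, not the authors' specification):
% y_{i,t}: problems solved by subject i in stage t (t = 1 non-competitive, t = 2 competitive)
% a_i: unobserved ability, \beta: effect of competition, \varepsilon_{i,t}: noise
\[
  y_{i,t} = a_i + \beta \,\mathbf{1}[t = 2] + \varepsilon_{i,t},
  \qquad \operatorname{Var}(a_i) = \sigma_a^2, \quad
  \operatorname{Var}(\varepsilon_{i,t}) = \sigma_\varepsilon^2 .
\]
% Within-subjects comparison: the same subject i in both stages, so ability cancels.
\[
  y_{i,2} - y_{i,1} = \beta + (\varepsilon_{i,2} - \varepsilon_{i,1}),
  \qquad \operatorname{Var}(y_{i,2} - y_{i,1}) = 2\sigma_\varepsilon^2 .
\]
% Between-subjects comparison: different subjects i and j, so ability does not cancel.
\[
  y_{j,2} - y_{i,1} = \beta + (a_j - a_i) + (\varepsilon_{j,2} - \varepsilon_{i,1}),
  \qquad \operatorname{Var}(y_{j,2} - y_{i,1}) = 2\sigma_a^2 + 2\sigma_\varepsilon^2 .
\]
```

Under these assumptions, the larger the ability variance $\sigma_a^2$ relative to the noise variance $\sigma_\varepsilon^2$, the more precision is gained by comparing each subject with himself or herself.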

B. The work task

The task we use requires mathematical ability. It consists of multiplying one- and two-digit

numbers and has been used successfully in Dohmen and Falk (2010). In the experiment all

problems are presented to subjects on computer screens. Subjects have to type their answer into

a box and confirm it by clicking an “OK”-button with their mouse. Having confirmed their

answer, subjects are informed whether or not the answer is correct. If it is correct, a new

problem appears instantaneously on the screen. If the answer is wrong, subjects have to tackle

the same problem again until the correct solution is entered. Subjects are forced to solve a

problem before a new question appears in order to prevent subjects from guessing and

searching for “easy” problems. The difficulty level of multiplying one- and two-digit numbers

varies quite a bit which implies that different problems require different usages of working

memory. Like Dohmen and Falk (2010), we implement five different levels of difficulty.⁶ All

subjects go through the exact same sequence of problems and they are provided with as many

questions as they can solve within the allocated time. Subjects are informed that no aid is

allowed for answering the problems (calculator, paper and pencil, etc.), which is controlled

during the experiment.
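The rule that a problem stays on screen until it is answered correctly can be sketched in a few lines of Python. This is a minimal illustration, not the z-tree program used in the experiment: the function name `count_solved`, the finite problem list and the answer stream are hypothetical, and the time limit is not modelled.

```python
from typing import Iterable, List, Tuple

Problem = Tuple[int, int]  # the two factors of one multiplication problem


def count_solved(problems: List[Problem], answers: Iterable[int]) -> int:
    """Count problems solved when each problem repeats until answered correctly.

    `answers` is the stream of typed answers in the order they were entered.
    A wrong answer keeps the same problem on screen; a correct one advances.
    """
    solved = 0
    index = 0
    for answer in answers:
        if index >= len(problems):
            break  # no problems left in this (hypothetical) finite list
        a, b = problems[index]
        if answer == a * b:
            solved += 1
            index += 1  # the next problem appears only after a correct answer
    return solved


# Example problems taken from footnote 6 (difficulty levels 1 to 5).
example_problems = [(11, 9), (3, 32), (6, 43), (4, 68), (7, 89)]
# Hypothetical answer stream: the third problem needs two attempts.
example_answers = [99, 96, 248, 258, 272]
print(count_solved(example_problems, example_answers))
```

Because a wrong answer does not advance the sequence, guessing or skipping ahead to "easy" problems cannot raise the count.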

An important reason for having chosen this task is that the implicit stereotype present in

the task description is unambiguous: the stereotype that “men are better at maths” (see, e.g.,

Spencer et al., 1999). Moreover, for math tasks there is reliable information that we can use in

order to make our desired explicit stereotype manipulations without the need of deceiving

subjects.

C. Detailed design and conditions of the experiment

The experiment consists of a practice round, two performance stages, a confidence level

elicitation, a risk attitude elicitation and a competitive attitude elicitation. Figure 1 shows the

sequential order of each step:

5 This method was already used by Gneezy and Rustichini (2004) in a field experiment to study male and female

children’s reaction to competitive pressure in terms of performance.

6 Examples are: Level 1: 11 x 9; Level 2: 3 x 32; Level 3: 6 x 43; Level 4: 4 x 68; Level 5: 7 x 89.


Figure 1: Chart of the experiment

At the beginning of the experiment subjects are informed that the experiment consists of

three performance rounds and that they will receive specific instructions for each round before

the start of each round. They are also informed that they can earn money during the

experiment and that their earnings in one round are independent of their own and others’

behavior in other rounds. There is no performance feedback during or at the end of each

performance round, either in absolute terms or relative to others.

Practice Round: After the general instructions, the experiment starts with an unpaid practice

round in which subjects are asked to calculate as many multiplication problems as possible

within 2 minutes. This step serves to familiarize subjects with the work task. Subjects are

informed that it is in their best interest to gain practice since later in the experiment they can

earn money while performing the same task.

First Stage Performance: This round elicits subjects’ baseline performance under non-

competitive monetary incentives. It is this performance level to which we compare the

performance in the competitive second stage. Before subjects start with the task they are

informed that they have been randomly paired with another participant in the room, without

making any reference to the sexes at this stage. Subjects are asked to perform the work task for

5 minutes under a random pay incentive scheme. That is, in each pair one subject is chosen at

random to be paid out. The instructions they read explain how this incentive scheme works: “In

this round you have been randomly paired with another participant. At the end of the

experiment one of you two will be chosen randomly with equal probability. The chosen one

will earn € 0.40 for each multiplication solved correctly. The other earns nothing.”
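As a rough illustration of the random pay scheme quoted above, the following Python sketch draws the paid subject with equal probability and computes expected earnings. Only the € 0.40 piece rate and the equal-probability draw come from the instructions; the function names are hypothetical.

```python
import random
from typing import Tuple

PIECE_RATE_EUR = 0.40  # per correctly solved multiplication, as stated in the instructions


def random_pay(correct_a: int, correct_b: int, rng: random.Random) -> Tuple[float, float]:
    """Return (earnings_a, earnings_b): one pair member is drawn at random and paid."""
    if rng.random() < 0.5:
        return (PIECE_RATE_EUR * correct_a, 0.0)
    return (0.0, PIECE_RATE_EUR * correct_b)


def expected_random_pay(correct: int) -> float:
    """Expected earnings: paid with probability 1/2, independent of the partner's score."""
    return 0.5 * PIECE_RATE_EUR * correct


print(random_pay(20, 30, random.Random(1)))
print(expected_random_pay(20))
```

The point of the sketch is that a subject's payment is uncertain but does not depend on the partner's performance.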

Confidence Level Elicitation: To measure subjects’ confidence level, we ask them to estimate

their relative performance in the non-competitive first stage. Immediately after finishing this

stage, each subject is informed that he/she and 4 other participants present in the lab have been

randomly chosen with equal probability and is asked to indicate his/her best estimates, in

percent, of the probability that exactly 0, exactly 1, exactly 2, exactly 3 or exactly 4 of these other participants

solved more problems correctly than he/she did in the previous 5-minute stage. This

belief elicitation is incentivized using the quadratic scoring rule (Offerman, 1997) and is made

before subjects learn about the competitive second stage.⁷

Figure 1 (steps in order): Step 1: Practice Round; Step 2: First Stage Performance; Step 3: Confidence Level Elicitation; Step 4: Second Stage Performance; Step 5: Risk Attitude Elicitation; Step 6: Competitive Attitude Elicitation.
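To illustrate how a quadratic scoring rule rewards truthful probability reports in the confidence elicitation, here is a minimal Python sketch. The paper does not report the payoff parameters, so `base` and `scale` are hypothetical, as are the function names and the example beliefs.

```python
from typing import Sequence


def quadratic_score(report_pct: Sequence[float], realized: int,
                    base: float = 1.0, scale: float = 1.0) -> float:
    """Quadratic scoring rule payoff for a reported probability distribution.

    report_pct: reported probabilities in percent for the events
        "exactly 0, 1, 2, 3 or 4 others solved more"; must sum to 100.
    realized: index of the event that actually occurred.
    base, scale: hypothetical payoff parameters (not given in the paper).
    """
    if abs(sum(report_pct) - 100.0) > 1e-9:
        raise ValueError("reported percentages must sum to 100")
    p = [x / 100.0 for x in report_pct]
    return base + scale * (2.0 * p[realized] - sum(q * q for q in p))


def expected_score(report_pct: Sequence[float], true_pct: Sequence[float]) -> float:
    """Expected payoff of a report when events occur with the probabilities in true_pct."""
    return sum((t / 100.0) * quadratic_score(report_pct, k)
               for k, t in enumerate(true_pct))


beliefs = [50.0, 30.0, 20.0, 0.0, 0.0]          # hypothetical true beliefs
overconfident = [100.0, 0.0, 0.0, 0.0, 0.0]     # hypothetical misreport
print(expected_score(beliefs, beliefs) > expected_score(overconfident, beliefs))
```

With these illustrative parameters the payoff lies between 0 and 2, and reporting one's true beliefs maximizes the expected payoff.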

Second Stage Performance: In this stage subjects are randomly assigned to one of three

different competitive conditions or a control condition. In each competitive condition mixed

sex pairs are randomly formed to compete against each other.⁸ Subjects are informed that they

are paired with an opposite sex partner. However, subjects do not know the identity of their

opponent during and after the experiment. Subjects are asked to perform the multiplication task

for 5 minutes under a winner-takes-all tournament incentive scheme. The instructions they

read explain the incentive scheme: “In this round you have been randomly paired with a

participant of the opposite sex. That is, if you are a male participant, you are paired with a

female participant. If you are a female participant, you are paired with a male participant. In

this round you have to compete with this opposite sex participant in order to earn money. The

competition works as follows: the participant in your pair who solves correctly the highest

number of multiplications will earn € 0.40 for each multiplication solved correctly. The other

earns nothing. In case you and the other participant correctly solve the same number of

multiplications then each of you receive half of the achieved earnings. Both of you will see

exactly the same sequence of multiplication problems.”
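The winner-takes-all rule in the quoted instructions can be written as a short payoff function. This Python sketch is illustrative only: the function name is hypothetical, and the tie rule is read here as each subject receiving half of € 0.40 times his or her own (equal) score.

```python
PIECE_RATE_EUR = 0.40  # per correctly solved multiplication, as stated in the instructions


def tournament_pay(correct_own: int, correct_opponent: int) -> float:
    """Earnings of one subject under the winner-takes-all tournament."""
    if correct_own > correct_opponent:
        return PIECE_RATE_EUR * correct_own        # the winner is paid per correct answer
    if correct_own == correct_opponent:
        return 0.5 * PIECE_RATE_EUR * correct_own  # tie: half of the achieved earnings (assumed reading)
    return 0.0                                     # the loser earns nothing


print(tournament_pay(30, 25), tournament_pay(25, 30), tournament_pay(20, 20))
```

In contrast to the random pay scheme of the first stage, payment here depends directly on the opponent's performance.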

The payment procedure we use minimizes the risk difference between the non-competitive

and the competitive environments. The competitive incentive scheme that subjects face in the

second stage differs from a standard piece-rate incentive-scheme in two ways: being paid

depends on the performance of others and it becomes uncertain because only one, the better

performer, is paid. Therefore, different attitudes toward risk between men and women can

influence performance and obscure the pure competition effect. Hence, in order to minimize

effects coming from differences in perceived risk in the two stages, we use an incentive scheme

in the non-competitive stage - the random pay - that makes payment uncertain, although not

dependent on the performance of others.⁹

7 Our goal is to elicit subjects’ beliefs about their relative rank compared to the other subjects present in the lab.

We ask subjects to rank themselves compared to 4 other randomly chosen participants present in the lab instead of

the total number of subjects present in the lab (24 subjects, on average) because the latter would have been too

demanding and, probably, confusing for the subjects.

8 We form pairs instead of larger groups because we consider pairs the simplest way to unambiguously control for

subjects’ beliefs about their opponent’s sex.

9 Still, of course, subjects’ risk perception could be different in the non-competitive and the competitive stages.

This, however, will be due to the way subjects perceive the competition.


The only difference between the three competitive conditions is the way a stereotype is

induced.

Condition 1: Implicit stereotype against women. In this condition we do not give any additional

information but rely on the well documented fact that there is an unambiguous stereotype that

“men are better at maths than women” (e.g., Spencer et al., 1999).

Condition 2: Explicit Stereotype against women. In this condition we reinforce the implicit

stereotype against women by explicitly inducing a negative stereotype about women’s ability

to perform the work task. Just before starting to compete, both male and female participants

unexpectedly face the following information for 40 seconds:

“Before starting the multiplication task in this round, please read carefully the following

information:

In order to assess the magnitude of sex differences in mathematics performance, three leading

researchers, J. Hyde, E. Fennema and J. Lamon, performed an evaluation of 100 studies in this

field. In their paper we can read the following: "refined discussions conclude that the overall

differences in mathematics performance appear in adolescence and favour boys in tasks

involving problem solving."

In "Sex Differences in Mathematics Performance: A Meta-Analysis", Psychological Bulletin.

If you wish, you can inspect this paper after the experiment.”

Condition 3: Explicit Stereotype against men. In this condition we explicitly induce a negative

stereotype about men’s ability to perform the work task. Just before starting to compete, both

male and female participants unexpectedly face the following information for 40 seconds:

“Before starting the multiplication task in this round, please read carefully the following

information:

In order to assess the magnitude of sex differences in mathematics performance, three leading

researchers, J. Hyde, E. Fennema and J. Lamon, performed an evaluation of 100 studies in this

field. In their paper we can read the following: "refined discussions conclude that the overall

differences in mathematics performance appear in adolescence and favour girls in tasks

involving the use of only algorithmic procedures to find a single numerical answer."

In "Sex Differences in Mathematics Performance: A Meta-Analysis", Psychological Bulletin.

If you wish, you can inspect this paper after the experiment.”


As can be seen from the quotes above, the text pieces in Condition 2 and Condition 3 are not

perfectly symmetric. Perfectly symmetric information could have been achieved only by

deceiving subjects, which we wanted to avoid. Therefore we use truthful information that

subjects could inspect at the end of the experiment and construct the text such that it minimizes

asymmetry without the need of deceiving subjects.

We did not expect any significant learning or fatigue effects for the chosen work task

during the experiment (Dohmen and Falk, 2010). In order to test this we conduct a control

condition.

Condition 4: Twice random pay. In this condition subjects again perform the multiplication

task for 5 minutes under the same random pay incentive scheme as in the first stage. They are

again randomly paired and not informed about the sex of their partner.

Risk Attitude Elicitation: We elicit subjects’ risk attitudes using two measures. We elicit

subjects’ responses on a 0-10 scale to the Dohmen et al. (2009) general risk question in which

the value 0 means ‘not at all willing to take risks’ and the value 10 means ‘very willing to take

risks’.¹⁰ We also elicit subjects’ lottery choices based on the method developed by Holt and

Laury (2002).

Competitive Attitude Elicitation: As an indicator of subjects’ competitive attitude we use the

Machiavelli personality test, also known as Mach IV test (see appendix A), in which high

scores reliably predict competitive behavior (Christie and Geis, 1970). The Mach-IV test is a

twenty-statement personality survey with a score range of 20-140.

D. Experimental procedure

The experiment was computerized using z-tree software (Fischbacher, 2007) and

conducted in the Behavioral and Experimental Economics Laboratory (BEElab) at Maastricht

University’s School of Business and Economics. All instructions were presented on-screen and

all interactions were treated confidentially. Eight sessions were run, two sessions with each of

the four conditions. In total 188 subjects participated. The experiment involved 20, 22, 24, and

28 mixed sex pairs who participated in the implicit stereotype against women condition,

explicit stereotype against women condition, explicit stereotype against men condition and

10 In a paid field experiment Dohmen et al. (2009) show that responses to the question “How do you see yourself:

are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please tick a

box on the scale, where the value 0 means: ‘not at all willing to take risks’ and the value 10 means: ‘very willing

to take risks’” reliably predict lottery choices.


twice random pay condition, respectively.¹¹ The participants were predominantly (86%)

students of Business and Economics at Maastricht University. A session lasted on average 70

minutes. Average earnings were € 16.30.

3. Results

In this section we present the experimental results. In section A we examine how

stereotypes are related to men’s and women’s competitive performance by comparing non-

competitive and competitive performances. In section B we investigate alternative variables

that could explain our results. In section C we use the data to test the stereotype threat

hypothesis. Finally, in section D we evaluate alternative explanations for the results by

examining subjects’ effort provision and accuracy during task performance.

A. Non-competitive performance versus competitive performance

The pooled data from all four conditions show that in the non-competitive first stage men

perform significantly better than women (men’s average number of problems solved correctly:

28.7, standard deviation 10.51; women’s average number of problems solved correctly: 23.1,

std. dev. 10.20; p < 0.001; 2-sided t-test, n = 188).¹² That men perform better in the first stage

can also be seen from Figure 2, which shows that the distribution of men’s performance

stochastically dominates the corresponding distribution for women. This figure also shows that

there is a large inter-individual heterogeneity in performance. To account for this heterogeneity

in baseline performance we analyse the within-subject change of performance from the non-

competitive to the competitive stage. For each condition, we perform two types of analysis.

First, we compare men’s and women’s performance from the non-competitive stage to their

performance in the competitive stage. Second, to evaluate differences in this response to

competition between men and women, we perform a difference in differences analysis, i.e., we

compare men’s change in performance to women’s change in performance between stages.

This tests if and how men and women, respectively, respond to competitive incentives given

11 We invited 15 men and 15 women to each session. When unequal numbers of men and women

arrived at the lab, we randomly asked the excess participants to leave the laboratory and paid them a show-up fee.

We ensured that subjects of the right sex were selected to leave by randomly distributing cards with the

numbers 1-15 to women and 16-30 to men on their arrival in the waiting room. Hence, for example, when 13 women

and 14 men arrived at the lab we would ask the subject with card number 29 to leave.

12 Comparing men’s and women’s performances in the first stage separately for each of the four conditions gives:

condition 1 (men’s average number of problems solved correctly: 23.9; women’s average number of problems

solved correctly: 21.8; p = 0.556, 2-sided t-test, n = 40); condition 2 (men: 30.8; women: 23.7; p = 0.019, 2-sided

t-test, n = 44); condition 3 (men: 27.2; women: 20.6; p = 0.013, 2-sided t-test, n = 48); condition 4 (men: 30.4;

women: 25.7; p = 0.107, 2-sided t-test, n = 56).


they are competing against the opposite sex. Throughout this section, we report t-test statistics

to investigate differences in means and the Wilcoxon signed rank (WSR) test or Mann-

Whitney (MW) test to investigate differences in distributions. All tests are two-sided, unless

otherwise stated. Figure 3 displays graphically the results for all conditions:

Figure 3: Summary of results for all conditions

[Figure: average number of problems solved correctly (axis range 19 to 36) by men and women in the no-competition and competition stages, in four panels: Condition 1, Implicit Stereotype Against Women (N=40); Condition 2, Explicit Stereotype Against Women (N=44); Condition 3, Explicit Stereotype Against Men (N=48); Condition 4, Twice Random Pay (N=56), with two no-competition stages. Note: the figure marks each change between stages as statistically significant or not statistically significant.]

To see whether learning or fatigue effects have to be taken into account we first report the

results of the control condition twice random pay (condition 4). In this condition the average

number of problems solved correctly by men is 30.4 (std. dev. 10.35) in the first stage and 31.7

(std. dev. 10.13) in the second stage. The difference is small and insignificant (p = 0.342, t-test;

p = 0.386, WSR test). Women solve on average 25.7 (std. dev. 11.58) problems correctly in the

first stage and 24.8 (std. dev. 15.33) in the second stage. This small difference is also not

significant (p = 0.500, t-test; p = 0.515, WSR test). The substantial overlap between the

cumulative distributions for the first and second stage performance shown in Figure 4

corroborates the finding that for both sexes there is neither a fatigue nor an experience effect.

Hence, we are confident that men’s and women’s performance is not affected by repetition in

the other conditions.

In the implicit stereotype against women condition the average number of problems solved

correctly by men is 23.9 (std. dev. 11.54) in the non-competitive first stage and 28 (std. dev.

13.23) in the competitive second stage. This difference is statistically significant (p = 0.003, t-

test; p = 0.006, WSR test). A similar result holds for women. They solve on average 21.8 (std.


dev. 10.79) problems correctly in the non-competitive stage and 26.2 (std. dev. 11.25)

problems in the competitive stage. This difference is also statistically significant (p = 0.008, t-test;

p = 0.018, WSR test). Panel (a) of Figure 5 shows that the increase in men’s and

women’s performance under competition seems to be observed regardless of their baseline

non-competitive performance. Moreover, comparing the change in men’s performance to the

change in women’s performance between the two stages reveals that men and women respond

in a similar way to the introduction of competitive incentives (p = 0.879, t-test; p = 0.968, MW

test). We summarize our first result.

Result 1: When there is an implicit stereotype against women, men and women significantly

increase their performance under competition. In addition, both sexes respond similarly to the

competitive incentives.
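
The two-step analysis behind Result 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the scores are made up, the function names are hypothetical, and the Welch form of the two-sample statistic is an assumption, since the text does not say which t-test variant is used. It computes the paired t statistic for the within-subject change and the t statistic for the difference in changes between the sexes; p-values and the WSR and MW tests are left out.

```python
import math
import statistics


def paired_t(stage1, stage2):
    """Mean within-subject change (stage2 - stage1) and its paired t statistic."""
    changes = [b - a for a, b in zip(stage1, stage2)]
    se = statistics.stdev(changes) / math.sqrt(len(changes))
    return statistics.mean(changes), statistics.mean(changes) / se


def diff_in_diff_t(changes_a, changes_b):
    """Difference in mean changes between two groups and its Welch t statistic."""
    diff = statistics.mean(changes_a) - statistics.mean(changes_b)
    se = math.sqrt(
        statistics.variance(changes_a) / len(changes_a)
        + statistics.variance(changes_b) / len(changes_b)
    )
    return diff, diff / se


# Made-up scores (problems solved correctly), NOT data from the experiment.
men_s1, men_s2 = [20, 25, 30, 22, 28], [24, 27, 35, 25, 33]
women_s1, women_s2 = [18, 24, 21, 27, 19], [22, 27, 25, 30, 24]

men_change, men_t = paired_t(men_s1, men_s2)
women_change, women_t = paired_t(women_s1, women_s2)

men_changes = [b - a for a, b in zip(men_s1, men_s2)]
women_changes = [b - a for a, b in zip(women_s1, women_s2)]
did, did_t = diff_in_diff_t(men_changes, women_changes)

print(f"men: change={men_change:.2f}, t={men_t:.2f}")
print(f"women: change={women_change:.2f}, t={women_t:.2f}")
print(f"difference in differences={did:.2f}, t={did_t:.2f}")
```

In this toy sample both sexes improve by the same average amount, so the within-subject statistics are positive while the difference-in-differences statistic is zero, which is the qualitative pattern Result 1 describes.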

In the explicit stereotype against women condition men solve on average 30.8 (std. dev.

10.10) problems correctly when there is no competition and 35.4 (std. dev. 12.97) if there is

competition. This difference is statistically significant (p = 0.002, t-test; p = 0.004, WSR test).

Women’s average performance in this condition is 23.7 (std. dev. 10.79) and 26.7 (std. dev.

11.12) problems solved correctly in the non-competitive and the competitive stage,

respectively. This difference is also statistically significant (p = 0.034, t-test; p = 0.044, WSR

test). Figure 5, Panel (b), shows that the increase in men’s and women’s performance under

competition seems again to be observed regardless of their baseline non-competitive

performance. Interestingly, as the graphical analysis also indicates, in this condition men’s

increase in performance is mainly driven by “middle” and “top” performers while women’s

increase in performance is mainly driven by “bottom” and “top” performers. As in the implicit

condition, comparing the change in men’s performance to the change in women’s performance

between the two stages, we find that they are statistically indistinguishable (p = 0.446, t-test;

p = 0.509, MW test). We can therefore state our next result.

Result 2: When there is an explicit stereotype against women, men and women significantly

increase their performance under competition. Moreover, the response to competitive

incentives of men and women is similar.

In the explicit stereotype against men condition the picture looks very different. In this

condition, the average number of problems solved correctly by men is 27.2 (std. dev. 9.45) in

the non-competitive stage and only 25.6 (std. dev. 10.33) in the competitive stage. Hence, men

performed worse under competition than under no competition, although the difference is


statistically not significant (p = 0.239, t-test; p = 0.269, WSR test). Women perform slightly

better under competition than under no competition, with 20.6 (std. dev. 7.70) and 22.8 (std.

dev. 10.55) correctly answered questions, respectively. This difference is also statistically not

significant (p = 0.148, t-test; p = 0.129, WSR test). Moreover, comparing the change in men’s

and women’s performance between the two stages, we find that the difference in response of

men and women to competitive incentives is marginally statistically significant (p = 0.061, t-test;

p = 0.081, MW test). Figure 5, Panel (c) illustrates this result. Interestingly, and in contrast

to the other two conditions, women’s change in performance is small across “low”, “middle”

and “high” performers and men’s competitive performance shows not only a small decrease for

the “bottom” and “middle” performers but also a clear decrease for the “top” performers. We

summarize these observations in the following result.

Result 3: When there is an explicit stereotype against men neither men nor women respond

significantly to the introduction of competitive incentives relative to non-competitive

incentives. However, when comparing the responses to competitive incentives between men

and women the former weakly decrease performance whereas the latter weakly increase

performance.

In order to further examine the treatment effects on men’s and women’s competitive

performance, we also perform a regression analysis. We apply the following linear regression

model that treats men and women equally because this is the model that better fits our data

according to the Chow test for structural stability between two groups:13, 14

Performance change = b0 + b1*Non-competitive performance + b2*Implicit + b3*ExplicitW + b4*ExplicitM + u

where Performance change is the difference between the competitive second stage

performance and the non-competitive first stage performance; Implicit represents a dummy that

takes the value 1 for the individuals in the implicit stereotype against women condition, and the

value 0 otherwise; ExplicitW represents a dummy that takes the value 1 for the individuals in

13 We use the Chow test for structural stability to check whether there is any difference between men and women

both in terms of intercept and slope. According to this test we cannot reject that men and women behave equally

(p-value = 0.227). The model rejected by the Chow test that treats men and women differently is: Performance

change = b0 + b1*Non-competitive performance + b2*Implicit + b3*ExplicitW + b4*ExplicitM + b5*female +

b6*(female*Non-competitive performance) + b7*(female*Implicit) + b8*(female*ExplicitW) + b9*(female*ExplicitM)

+ u.

14 We also verify whether we should include the product of each condition dummy with Non-competitive

performance as interaction terms in our specification. In an unreported regression we find these interaction terms

are individually and jointly insignificant (p-value = 0.327 for the joint hypothesis).


the explicit stereotype against women condition, and the value 0 otherwise; ExplicitM

represents a dummy that takes the value 1 for individuals in the explicit stereotype against men

condition, and the value 0 otherwise. The twice random pay condition is the base group for the

condition dummies. We run the regression using the demeaned Non-competitive performance, i.e.,

(Non-competitive performance – sample mean of Non-competitive performance), in order to

make the intercept meaningful to interpret. That is, the intercept refers to individuals in the

twice random pay condition whose number of correctly solved problems in the first stage

equals the sample mean.15 Table 1 reports the regression result. The intercept indicates an

insignificant average change in performance of 0.257 problems solved correctly from the first

to the second stage of individuals in the twice random pay condition. In other words,

performance between stages is not affected by repetition. In comparison to individuals in the

twice random pay condition, individuals in the implicit stereotype against women condition

significantly solve 3.876 more problems correctly in the competitive stage than in the non-

competitive stage, controlling for non-competitive performance. For individuals in the explicit

stereotype against women condition this score is 3.58 and statistically significant. However, for

individuals in the explicit stereotype against men condition this score is -0.136 and

insignificant. Importantly, we cannot reject that non-competitive performance has no effect on

the change in performance.16
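
The effect of demeaning the baseline regressor can be illustrated with a small least-squares sketch. Everything here is an assumption made for illustration: the data are made up, the helper names `ols` and `design` are hypothetical, and the solver is a plain normal-equations routine rather than whatever software produced Table 1. The sketch fits the specification once with raw and once with demeaned first-stage performance and shows that only the intercept moves.

```python
import statistics


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data, NOT from the experiment: (first-stage score, condition, change).
rows = [
    (20, "control", 1), (30, "control", 0),
    (22, "implicit", 4), (28, "implicit", 5),
    (25, "explicit_w", 3), (35, "explicit_w", 5),
    (18, "explicit_m", -1), (26, "explicit_m", 0),
]
mean_base = statistics.fmean([r[0] for r in rows])
y = [r[2] for r in rows]


def design(center):
    """Rows of [1, baseline - center, Implicit, ExplicitW, ExplicitM]."""
    conditions = ("implicit", "explicit_w", "explicit_m")
    return [
        [1.0, base - center] + [float(cond == c) for c in conditions]
        for base, cond, _ in rows
    ]


raw = ols(design(0.0), y)            # baseline entered as is
demeaned = ols(design(mean_base), y)  # baseline minus its sample mean

print("intercept, raw baseline:     ", round(raw[0], 3))
print("intercept, demeaned baseline:", round(demeaned[0], 3))
print("slope and dummy coefficients:", [round(b, 3) for b in demeaned[1:]])
```

With the demeaned regressor the intercept equals the raw intercept plus the slope times the sample mean, so it reads directly as the predicted change for a base-group subject with average first-stage performance; the slope and the condition coefficients are unchanged.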

Regarding the direction of the performance change, these econometric results undoubtedly

reinforce the connection between stereotypes and men’s and women’s competitive

performance identified in the statistical and graphical analysis: men’s and women’s response to

competitive pressure is similar across three different contexts. Both significantly increase

performance under competition with implicit and explicit stereotype against women. Both do

not significantly change performance under competition with explicit stereotype against men.

However, the statistical and graphical analysis is not crystal-clear about whether there is a

difference between men and women in terms of the magnitude of their competitive response in

the explicit stereotype against men condition. The Chow test result (see footnote 13) is

evidence against any difference between men and women. However, this test result holds for

the whole sample. Hence, perhaps the lack of difference in response magnitude between men

and women in the twice random pay, the implicit and explicit stereotype against women

conditions is camouflaging a difference in response magnitude between men and women in the

15 Evidently, this re-scaling of Non-competitive performance does not alter anything else in the regression output.

16 Although we have evidence to not include interaction terms of the condition dummies with Non-competitive

performance, in an unreported regression including those terms we find the same results.


explicit stereotype against men condition. Thus, in order to look more closely at the magnitude

of men’s and women’s competitive response we run an additional regression in which we

differentiate men and women. The model we use to compare men’s to women’s response

magnitude is:

Performance change = b0 + b1*Non-competitive performance + b2*Women_Implicit + b3*Men_Implicit +

b4*Women_ExplicitW + b5*Men_ExplicitW + b6*Women_ExplicitM + b7*Men_ExplicitM +

b8*Men_Twice_Random Pay + u

where women in the twice random pay condition is the base group for the dummies. Table 2

reports the regression result and Table 3 reports the three F-tests related to the regression in

which we compare men and women in terms of the magnitude of their competitive response.

Table 3 shows that the change in performance between stages of men is not significantly

different from women’s in both the implicit and explicit stereotype against women conditions.

For the explicit stereotype against men condition we find again marginal evidence that men’s

competitive response is more negatively affected than women’s. Relative to women’s change

in performance between stages, men’s change from the first to the second stage is 3.696 problems

lower, holding non-competitive performance fixed.17
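
The dummy coding of the sex-specific model can be sketched as follows. The group labels, function names and coefficient values are hypothetical and chosen only for illustration; in particular the coefficients are not the Table 2 estimates. The sketch builds the seven sex-by-condition dummies with women in the twice random pay condition as the base group, and computes the within-condition contrast between men and women, which is presumably the quantity the F-tests in Table 3 examine.

```python
# Order follows b2..b8 in the model above; women in twice random pay are the base group.
GROUPS = [
    ("women", "implicit"), ("men", "implicit"),
    ("women", "explicit_w"), ("men", "explicit_w"),
    ("women", "explicit_m"), ("men", "explicit_m"),
    ("men", "twice_random_pay"),
]


def group_dummies(sex, condition):
    """Seven 0/1 dummies; all zeros identify the base group."""
    return [int((sex, condition) == g) for g in GROUPS]


def men_minus_women(coefs, condition):
    """Men's coefficient minus women's coefficient within one condition.

    `coefs` maps (sex, condition) to an estimated coefficient; the base
    group has no dummy, so its coefficient is treated as zero.
    """
    return coefs.get(("men", condition), 0.0) - coefs.get(("women", condition), 0.0)


# Hypothetical coefficients for illustration only, NOT the Table 2 estimates.
example_coefs = {("women", "explicit_m"): 2.0, ("men", "explicit_m"): -1.5}

print(group_dummies("men", "explicit_w"))
print(group_dummies("women", "twice_random_pay"))
print(men_minus_women(example_coefs, "explicit_m"))
```

Because every coefficient is measured against the same base group, the difference between the men's and the women's coefficient in a condition compares their responses directly, holding non-competitive performance fixed.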

B. Confidence level, risk attitude and competitive attitude

In our analysis so far we have been stressing the influence of stereotypes over men’s and

women’s competitive performance. In this section we introduce controls that could influence

our results.

In the second stage, the incentive scheme a subject faces depends not only on

a subject’s own performance but also on another subject’s performance. Therefore, a subject’s

belief about his/her relative rank may affect a subject’s competitive performance. Table 4

presents the results on men’s and women’s self-assessed rank estimates for their non-

competitive performance. Men’s average estimate of 19.62% that they are the best performer in

the randomly formed group of 5 participants is significantly higher than the corresponding

women’s average estimate of 7.5%. Moreover, men’s average estimate of 7.65% that they are

the worst performer in the randomly formed group of 5 participants is significantly lower than

the corresponding women’s average estimate of 21.74%. The difference in the remaining

estimates is insignificant. These results already indicate that men are more optimistic about

17 Although the regression reported in Table 2 is not the most appropriate to examine the other comparisons we

draw in Table 1 (see footnote 13), all the results we find in Table 1 also hold qualitatively in the Table 2 regression.


their relative performance than women. To further investigate this we compute a subject’s

confidence index which aggregates the elicited relative self-assessment beliefs.18 According to

this confidence index, men’s and women’s average rank estimate is 2.75 (std. dev. 0.81) and

3.33 (std. dev. 0.91), respectively. This difference is highly significant (p < 0.001, t-test, n =

188). Thus, we conclude that men are more confident than women before they enter the

competitive second stage.19, 20
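
The confidence index of footnote 18 is an expected rank, which a short sketch makes concrete. The function name and the example beliefs are hypothetical, and the assumption that beliefs are entered as percentages summing to 100 is ours; the formula itself, the sum of i times p_i over the five ranks, follows the footnote.

```python
def confidence_index(percent_beliefs):
    """Expected rank implied by a subject's beliefs.

    percent_beliefs[i - 1] is the subject's percentage estimate that exactly
    (i - 1) of the other group members solved more problems correctly.
    """
    if len(percent_beliefs) != 5:
        raise ValueError("expected one belief for each of the 5 ranks")
    if abs(sum(percent_beliefs) - 100.0) > 1e-9:
        raise ValueError("beliefs must sum to 100 percent")
    return sum(i * p / 100.0 for i, p in enumerate(percent_beliefs, start=1))


# Hypothetical subjects, not taken from the data.
optimist = confidence_index([40, 30, 20, 10, 0])
pessimist = confidence_index([0, 10, 20, 30, 40])
print(optimist, pessimist)
```

A lower value means the subject places more weight on the top ranks, so the optimistic subject above obtains the smaller index.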

Another candidate variable to cause performance differences between the non-competitive

and competitive stage is a person’s risk attitude. Even though we attempt to set the risk level as

similar as possible in the two stages in order to minimize the impact of risk attitude on

performance, we still elicit subjects’ risk attitude to verify its role. The responses to the

Dohmen et al. (2009) general risk question in our experiment are in line with the mounting

evidence that women are on average more risk averse than men (e.g., Croson and Gneezy,

2009). The average response to the general risk question of men, 5.95, is significantly higher

than women’s average response, 5.15 (p = 0.010, 2-sided t-test, n = 188).21

Finally, we are also interested in evaluating whether competitive attitude affects the

change in performance between stages. Assuming that a positive change in performance from

the non-competitive to the competitive stage requires an increase in effort, the more

competitive a person is the more effort, in relative terms, a person may exert in the competitive

stage. As a consequence, if effort and performance are indeed positively correlated we should

expect a person’s increase in performance from the non-competitive to the competitive stage to

be relatively higher the more competitive a person is. The average score of men in the Mach-IV test,

79.52, is significantly higher than women’s average score of 74.06 (p = 0.003, 2-sided t-test,

18 Subject confidence index: $\sum_{i=1}^{5} p_i \times i = p_1 \times 1 + p_2 \times 2 + p_3 \times 3 + p_4 \times 4 + p_5 \times 5$, where i is the outcome

that exactly (i-1) other participants solved more problems correctly and $p_i$ is a subject’s percentage estimate that outcome i is the actual one. Hence, the lower this index is, the more confident a subject is.

19 We also examine if men are more confident only because they have a significantly better non-competitive

performance. In unreported regressions, we find that conditional on the non-competitive performance men are still

significantly more confident than women.

20 Interestingly, both men and women are neither overconfident nor underconfident relative to their actual rank (see Table 5).

21 As stated in the experimental design section, we also measure risk attitudes using the method developed by Holt

and Laury (2002). However, we only report the Dohmen et al. measure because many subjects did not have a unique

switching point under the lottery measure and, consequently, it is not clear how these observations should be

treated. Furthermore, as in Dohmen et al. (2009) and Dohmen and Falk (2010), we find a strong correlation

between subjects’ responses to the risk question and the lottery choices for the subjects that have a unique

switching point under the lottery measure.


n = 188). Hence, according to the Mach-IV test, men have a higher competitive attitude than

women.22

To determine how these elements jointly affect the change in performance between stages,

and to understand their relative significance we use an augmented version of the linear

regression model used in subsection 3.A. In addition to the non-competitive first stage

performance and the condition dummies, the set of explanatory variables in Table 6 consists of

the confidence index, risk attitude, and competitive attitude. Results in Table 6 show that

neither the confidence level nor the risk attitude contribute significantly to the change in

performance from the non-competitive to the competitive stage. However, for the competitive

attitude the results indicate that a one point higher score of willingness to compete predicts

0.082 more problems solved correctly in the competitive stage, ceteris paribus. This effect is

significant but small.23

An important result shown in Table 6 is that the magnitude and significance of the

condition dummies as well as of the intercept are robust to the introduction of the additional

explanatory variables.24 This observation leads to our next result.

Result 4: The treatment effects regarding competitive performance are robust even after

controlling for confidence level, risk attitude and competitive attitude.

C. Is there a stereotype threat shadow?

Considering two groups, one prone and one not prone to stereotype threat, stereotype

threat theory essentially predicts that their performance gap in a context where the threat is

present should be different than the gap in their performances in a context where the threat is

not present. In our study the performance gap between men and women in the no stereotype

threat first stage is unfavourable to women in all conditions. Therefore, according to stereotype

threat theory we should observe, on the one hand, an increase in the performance gap between

men and women in the second stage of the implicit and explicit stereotype against women

22 It is reasonable to interpret the Mach IV test as only measuring a competitive attitude with “elbows”. Therefore,

strictly speaking we cannot claim a difference between men and women in their absolute competitive attitude

based on this test.

23 Although the Mach IV test is measured on a 20-140 point scale, the sample standard deviation of subjects’

scores in this test is only 0.928.

24 Since men and women differ significantly in these three individual characteristics, we run an additional

regression like the one in Table 6 that also includes a dummy for sex. In this unreported regression, the

dummy for sex is insignificant (p-value = 0.719) while all other estimates remain virtually the same both qualitatively and

quantitatively.


conditions and, on the other hand, a decrease in the performance gap between men and women

in the second stage of the explicit stereotype threat against men condition. To analyse these

predictions we compare the performance gap between men and women in the first stage with

their performance gap in the second stage for each condition.25

Figure 7: Performance gap between men and women in the first and in the second stage for each condition

[Figure: average number of problems solved correctly (axis range 19 to 36) by men and women in the first and second stage, in four panels: Condition 1, Implicit Stereotype Against Women (N=40); Condition 2, Explicit Stereotype Against Women (N=44); Condition 3, Explicit Stereotype Against Men (N=48); Condition 4, Twice Random Pay (N=56). The legend distinguishes the no stereotype threat context from the stereotype threat context.]

In the twice random pay control condition, where no stereotype threat is induced in either

stage, the performance gap between men and women is 4.7 in the first stage and 6.9 in the

second stage.26 As expected, this difference in the gaps is statistically insignificant (p = 0.324,

t-test; p = 0.367, WSR test). In the implicit stereotype against women condition, the average

performance gap between men and women is 2.1 in the non-competitive and 1.8 in the

competitive stage. Besides being statistically insignificant (p = 0.557, 1-sided t-test; p = 0.445,

1-sided WSR test), the observed difference contradicts a stereotype threat based explanation

25 This comparison is informative regarding the effect of a stereotype threat because the first stage non-competitive

performance poses no stereotype threat. In this stage the triggers necessary for an implicit activation of

stereotype threat, such as making a subject’s group membership salient and/or presenting the work task as

diagnostic of a subject’s ability and/or the fear of relative feedback or of others’ evaluation, are not present.

26 To compare the gaps in each condition we compute two variables: the difference between men’s and women’s

performance in the second stage per competition pair and the difference between men’s and women’s performance

in the first stage according to the second stage competition pairs. In condition 4, the control condition, we

compute the differences by using the pairs that are randomly formed to determine who gets paid in the random

pay incentive scheme. We also generate 100 random samples of pairings between men and women and, for each

random sample, compute the same two variables and bootstrap 10,000 times the difference in gaps. The results

using this approach are qualitatively the same as the ones following the criteria above.


which would require an increase rather than a decrease in the performance gap. In the explicit

stereotype against women condition, the gap between men’s and women’s average

performance is 7.1 and 8.7 in the non-competitive and competitive stage, respectively. The

change is in the direction predicted, but it is small and statistically insignificant (p = 0.313, 1-

sided t-test; p = 0.302, 1-sided WSR test). From this we conclude that also in this condition

there is no evidence for the adverse effect of a stereotype threat. Finally, in the explicit

stereotype against men condition, the performance gap between men and women is 6.6 in the

stereotype free non-competitive stage and 2.8 in the competitive stage. This direction of change

is consistent with a stereotype threat based explanation, but it is only marginally significant (p

= 0.075, 1-sided t-test; p = 0.072, 1-sided WSR test). Regression estimates from a linear

regression model with difference in the gaps as a dependent variable, and first stage

performance gap and the condition dummies as the explanatory variables corroborate these

results. Table 7 shows the estimation results. The coefficient of the intercept is 2.125 and

statistically not significantly different from zero. This indicates that the performance gap

between men and women in both stages does not differ in the twice random pay condition. The

table also shows that, compared to men and women in the twice random pay condition, the

difference between stages of the performance gap between men and women in the implicit

condition, -2.550, and in the explicit stereotype against women condition, -1.049, is not

statistically significantly different from zero. Finally, compared to men and women in the twice

random pay condition, the difference between the second and the first stage performance gap

between men and women in the explicit stereotype against men condition, -5.713, is marginally

significantly smaller. In other words, we have weak evidence that the performance gap

between men and women in the second stage is lower than their performance gap in the first

stage, where we do not induce any stereotype threat.
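
The gap comparison can be sketched as follows. The pair scores are made up, the function names are hypothetical, and the percentile interval is an assumption: footnote 26 mentions bootstrapping the difference in gaps 10,000 times but does not spell out the procedure. The sketch computes, for each man-woman pair, the second-stage gap minus the first-stage gap and then bootstraps the mean of these differences.

```python
import random
import statistics


def gap_changes(pairs):
    """Second-stage gap minus first-stage gap (men minus women) for each pair.

    Each pair is (man_stage1, man_stage2, woman_stage1, woman_stage2).
    """
    return [(m2 - w2) - (m1 - w1) for m1, m2, w1, w2 in pairs]


def bootstrap_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((1 - level) / 2 * n_resamples)]
    upper = means[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


# Made-up pairs, NOT data from the experiment.
pairs = [
    (30, 28, 22, 25), (25, 26, 21, 24), (28, 24, 20, 21),
    (26, 27, 24, 23), (32, 30, 19, 22), (24, 25, 18, 22),
]
changes = gap_changes(pairs)
mean_change = statistics.fmean(changes)
low, high = bootstrap_ci(changes)
print("mean change in gap:", mean_change)
print("bootstrap interval:", (low, high))
```

A negative mean change indicates that the gap in favour of men narrows from the first to the second stage, which is the direction reported for the explicit stereotype against men condition.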

Overall, these results do not support an explanation based on stereotype threat.27

Moreover, the fact that our male and female participants regard their mathematical ability as

very important for themselves,28 reinforces the inconsistency of our results with stereotype

27 The only negatively stereotyped group that does not react positively to the competitive incentives across the three

competitive conditions is men in the explicit stereotype against men condition. One could argue that men do not

increase their performance under competition in the explicit stereotype against men condition because they

experience stereotype threat. However, besides the evidence supporting the stereotype threat hypothesis in this

condition being only weak, the fact that women also do not increase their performance under competition

indicates that a different effect underlies this observed change in behavior.

28 At the end of the experiment we ask subjects to indicate the degree to which they agree or disagree with the

statement “My math ability is important to me” using a seven-point Likert scale. The average for men and women

is 6.1 and 5.8, respectively.


threat theory because this theory suggests that an important mediating factor for an individual

to experience stereotype threat is domain identification. That is, an individual has to regard the

task’s domain as very important for his/her self-esteem (e.g., Aronson et al., 1999). We

summarize this in our next result.

Result 5: The observed treatment effects regarding men’s and women’s competitive

performance cannot be accommodated by an explanation based on stereotype threat.

D. Distracting thoughts versus strategic reasoning: effort provision and error rate

The results support our hypothesis that the key element mediating men’s and women’s

competitive performance and stereotypes is whether the stereotype-based expectations they

face support or contradict men’s and women’s prior belief about the invoked stereotype.

Grounded in this mediating element, we suggest two alternative explanations that may

accommodate the significant increase of both men’s and women’s performance under

competition when the stereotype-based expectations support the prior belief they likely hold

and also explain why stereotype-based expectations contradicting the prior belief men and

women likely hold impair the competitive performance of both sexes.

One possible explanation is distracting thoughts. The chance to win the competition is

higher the more effort a person exerts. Therefore, when assuming the cost of providing effort is

small enough and that a person is better off by earning money, it is rational for a person to

provide extra effort in the competitive second stage. However, one cannot be sure that extra

effort provision implies optimal performance. Distracting thoughts have been shown to affect

working memory and attention (e.g., Brewin and Smart, 2005). Hence, even if a person

provides extra effort in the competitive stage, a conceivable cause for suboptimal performance

is information contradicting a stereotype a person holds. That is, stereotype-based expectations

contradicting a person’s prior belief may trigger distracting thoughts that interfere with the

performance of a mathematical task which involves working memory and attention.

Accordingly, men and women in the explicit stereotype against men condition do not improve

their performance in the competitive second stage because they are distracted and, as a result,

extra effort is not “efficiently” converted into correct answers. This inefficiency means that a

person cannot solve the problems faster than in the non-competitive stage. This happens

either because the person makes more attempts to solve the problems within the 5 minutes but

accuracy falls relative to the non-competitive first stage performance, or because, when the

person’s number of attempts is the same in both stages, the person needs more time in the second stage to


solve the same number of problems a person solved in the first stage due to lower accuracy.

Men and women in the explicit stereotype against women condition significantly increase performance

under competition as do men and women in the implicit stereotype condition, where no

information is provided, because the stereotype-based expectations men and women read in the

explicit condition just tell them something they already know. In simple terms, men and

women manage to provide extra effort efficiently in these two conditions because there are no

distracting thoughts to interfere with their working memory and attention while they perform

the task.

An alternative explanation for our results is related to strategic considerations and

Bayesian updating of beliefs about one’s relative performance. Assuming the cost of the

baseline non-competitive effort is negligible but the extra effort that men and women have to

provide as a necessary condition to increase their performance under competition is costly, men

and women will only provide extra effort if they believe they could win the competition.

Hence, if both men and women in the implicit and explicit stereotype against women

conditions are aware of the stereotype that “men are better at maths” but their prior belief is

that these differences are small, men and women will provide extra effort in the competitive

stage because they believe they could win the competition. However, if men and women

update their prior belief that sex differences in math ability are small according to the

information they receive in the explicit stereotype against men condition, they will believe the

difference in ability to perform the mathematical task at hand is substantial and favours

women. In this case, both men and women will provide baseline effort in the competitive

second stage because men believe they cannot win the competition and women believe

baseline effort is sufficient to win the competition.

Both explanations make predictions consistent with our results. In the following we

explore whether we can discriminate between the two explanations. To this end we further analyse

men’s and women’s effort provision and accuracy during task performance in the two stages.

We measure accuracy in each stage using the error rate, i.e., the number of wrong answers a

subject provides divided by the total number of attempts to solve the problems within the 5

minutes performance. Concerning effort provision, we use as a measure in each stage the total

number of attempts to solve the problems within the 5 minutes performance. We also measure

the subjects’ average time response per correct problem.29 We consider this latter measure as

29 The average time response per correct problem of a subject is equal to (time in seconds of the last correct

answer) / (number of problems solved correctly). We use “time in seconds of the last correct answer” instead of

an additional measure of effort provision, which we can use in case the accuracy rate does not

change between stages.30 Men and women significantly increase performance in the

competitive second stage compared to the non-competitive first stage in the implicit and

explicit stereotype against women conditions. This can be because they provide more effort in

the second stage keeping the same accuracy of the first stage, and/or they increase their

accuracy in the second stage.31 Therefore, we first examine men’s and women’s error rates.

Table 8 reports that, in both the implicit and explicit stereotype against women conditions, men’s

and women’s average error rate in the non-competitive first stage is not statistically

significantly different from their average error rate in the competitive second stage. On the

other hand, in both these conditions the men’s and women’s average number of attempts to

solve the problems is significantly higher and their average time response per correct problem

is significantly lower in the competitive second stage relative to the first stage. Our next result

summarizes:

Result 6: Men and women significantly increase performance under competition in the implicit

and explicit stereotype against women conditions because they exert more effort. This finding is

consistent with the predictions of both the explanation based on distracting thoughts and the

explanation based on strategic reasoning.
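
The accuracy and effort measures behind this result can be sketched as follows. The record layout (`StageRecord`) and the time values are hypothetical; the three formulas follow the definitions in the text and in footnote 29, and the example numbers of attempts are those of footnote 31. The sketch assumes every subject makes at least one attempt and solves at least one problem.

```python
from dataclasses import dataclass


@dataclass
class StageRecord:
    """One subject's answers in a single 5-minute stage (hypothetical layout)."""
    correct: int              # problems solved correctly
    wrong: int                # wrong answers submitted
    last_correct_sec: float   # time of the last correct answer, in seconds


def attempts(rec: StageRecord) -> int:
    """Effort measure: total number of attempts in the stage."""
    return rec.correct + rec.wrong


def error_rate(rec: StageRecord) -> float:
    """Accuracy measure: wrong answers divided by total attempts."""
    return rec.wrong / attempts(rec)


def time_per_correct(rec: StageRecord) -> float:
    """Average time response per correct problem (footnote 29)."""
    return rec.last_correct_sec / rec.correct


# Footnote 31 example: 4 attempts in the first stage (2 correct, 2 errors).
# The time of the last correct answer (290 s) is an assumed value.
first = StageRecord(correct=2, wrong=2, last_correct_sec=290.0)
more_effort = StageRecord(correct=4, wrong=4, last_correct_sec=290.0)
more_accuracy = StageRecord(correct=4, wrong=0, last_correct_sec=290.0)

for name, rec in [("first stage", first),
                  ("second stage, more effort", more_effort),
                  ("second stage, more accuracy", more_accuracy)]:
    print(name, attempts(rec), error_rate(rec), time_per_correct(rec))
```

In both second-stage scenarios the time per correct problem falls, but only the error rate and the number of attempts distinguish an improvement driven by extra attempts from one driven by higher accuracy.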

These findings are consistent with both alternative explanations. Both predict an increase

in men’s and women’s performance under competition in these two conditions due to higher

effort, regardless of whether the extra effort leads to an increase in accuracy. Considering the

explicit stereotype against men condition, Table 8 shows that in this condition men’s and

women’s average number of attempts to solve the problems is nearly the same in both stages.

That is, according to this measure their effort provision is not significantly different across

stages. Both the explanation based on distracting thoughts and the one based on strategic reasoning make

predictions about effort provision consistent with this finding. Importantly, however, we also

the total performance time (5 minutes = 300 seconds) because many subjects make their last correct attempt

before time is over.

30 If both the average time per correct problem and error rate significantly decreased in the second stage, this

would mean the improvement in performance is due to an increase in accuracy.

31 Recall that subjects can only solve a new problem after they answered the previous problem correctly, i.e., if

they provide a wrong answer, they have to tackle the same problem again. A simple example of the two different

ways a subject could increase performance in the second stage is: Assume a subject made 4 attempts within the 5

minutes first stage performance that correspond to 2 correct answers and 2 errors. Then a possible performance

increase in the second stage could be due to either extra effort keeping the error rate constant (e.g., 8 attempts in

the second stage corresponding to 4 correct answers and 4 errors) or an increase in accuracy as a result of more

effort (e.g., 4 attempts in the second stage corresponding to 4 correct answers).

observe that in the second stage of this condition women’s accuracy is the same whereas men’s

accuracy is significantly lower in comparison to the first stage. Thus in the case of women it

seems that they not only provide the same effort in both stages but also that they show the

same accuracy. This behavior is consistent with an explanation based on strategic reasoning

rather than on distracting thoughts. For men this is different. Although the number of problems

they solve in the second stage does not significantly differ from the first stage, the way they

solve the problems seems different. First, in absolute terms men take insignificantly more time

to solve the problems correctly in the second stage than in the first stage, 13.44 seconds and

12.6 seconds, respectively. Second, and more importantly, men’s error rate is statistically

significantly higher in the second stage. Hence, men in the second stage take more time to

solve virtually the same number of problems as they did in the first stage because they make

more errors. We summarize this in our final result.

Result 7: In the explicit stereotype against men condition, women exert the same effort and

keep the same accuracy in both stages. Men not only exert the same effort in both stages but

also lower their accuracy in the second stage.

The former finding on women’s behavior supports an explanation based on strategic

reasoning. The latter finding on men’s behavior is better accommodated by an explanation

based on the interference of distracting thoughts with working memory and attention than by an

explanation based on a deliberate strategic decision to keep baseline effort and, as a result,

solve the same number of problems in the second stage as in the first stage.32

4. Discussion and conclusion

In this paper, we use a controlled laboratory experiment to test our hypothesis that men’s

and women’s performance under competition when they are competing with each other is

negatively affected by stereotypes only if the stereotype-based expectations they face

contradict the prior belief men and women hold about the invoked stereotype. Our results

support this hypothesis. In the two competitive contexts – the implicit stereotype against

women and the explicit stereotype against women conditions – in which we induce stereotype-

based expectations that support the stereotype that men and women hold, men and women

react positively to the competitive incentives. In the competitive context – the explicit

stereotype against men condition – in which we induce explicit stereotype-based expectations

32 Within our research program on competition between the sexes, we are currently investigating the neural

mechanisms behind our behavioral findings in a parallel Neuroeconomics study.

that contradict the stereotype that men and women hold, men and women do not react

positively to the competitive incentives.

Our results do not support the psychological literature on stereotype threat. According to

this literature, we should expect that the performance gap between individuals of a group prone

to stereotype threat and the individuals of a group not prone to stereotype threat should be

different in a context where the threat exists than in a context where the threat is not present. As

shown in the results section, we cannot reject in each condition we study that the performance

gap between men and women in the first stage, where we do not induce any stereotype threat,

is the same as their performance gap in the second stage, where we induce a stereotype threat.

Yet, the psychology literature has been studying stereotype threat in non-competitive contexts.

Hence, a possible reason why our results are not in accordance with a stereotype threat

based explanation is that we evaluate the impact of stereotypes in a competitive context

instead. Another possible reason why the adverse effect of stereotype threat on performance is

not observed in our study is that, in contrast to a standard stereotype threat study, we use

monetary incentives.33

The results support our hypothesis that the key element governing the relation between

stereotypes and men’s and women’s competitive performance is whether the stereotype-based

expectations support or contradict the stereotype men and women hold. In line with this

mediating element, we advance two alternative explanations to uncover the connection

between stereotypes and the competition of the sexes. Both an explanation based on distracting

thoughts and their interference with working memory and attention, and an explanation based on

strategic considerations and Bayesian updating of beliefs make predictions consistent with our

results. Moreover, by analysing men’s and women’s effort provision and accuracy during task

performance, we find that men’s behavior is better accommodated by an explanation based on

distracting thoughts whereas women’s behavior is better accommodated by an explanation

based on strategic reasoning.

An alternative interpretation of our results relates to the different priming of men and

women in the explicit stereotype conditions. Men and women read the same information in the

explicit stereotype conditions. In the explicit stereotype against women condition, the

information negatively stereotypes women from a women’s perspective whereas it positively

stereotypes men from a men’s perspective. The opposite occurs in the explicit stereotype

against men condition. Therefore, it is conceivable that women and men excel under

33 Using monetary incentives within a non-competitive context, Fryer, Levitt and List (2008) did not find evidence

for the negative impact of stereotype threat either.

competition against each other when women have to disconfirm they are worse whereas men

have to confirm they are better. Women and men choke under competition against each other

when women have to confirm they are better whereas men have to disconfirm they are worse.

This sex difference interpretation is not convincing in our view because it implies that

participants in the explicit conditions fear the evaluation of others. This is clearly not the case

in our experimental setting.

Finally, our findings contradict previous results in the experimental economics literature

suggesting that men are more responsive to competitive incentives than women (e.g., Gneezy

et al., 2003). In contrast, we observe that men and women react similarly to competition in

terms of performance across three different stereotype threat conditions. A possible reason for

this contrast in results is related to our within-subjects design approach. The data supports our

view that performance comparisons without controlling for the ability to perform the work

task, as is the case in a between-subjects design, could lead to flawed conclusions.34 Another

possible reason is the impact of risk attitudes on performance when comparing performances

elicited both under a non-competitive and a competitive incentive scheme. Considering the

well documented differences between men’s and women’s risk attitude (e.g., Croson and

Gneezy, 2009), it is very likely that the influence of this variable is higher in previous studies

compared to ours because a less risky incentive scheme is used to elicit non-competitive

performance in those studies.

Our paper is part of a research program that is aimed at understanding why women are

underrepresented in many high-status jobs and earn lower wages than men. Taking into

account the pervasiveness of a stereotype, our results indicate that men and women have a

similar reaction to competitive pressure in terms of performance when they have to compete

against each other. In other words, different attitudes between the sexes towards competition

do not seem to be an explanation for the observed differences between men and women at the

workplace in the case that they are already competing. Still, our results have a practical

34 Since repetition does not affect performance in our experiment, we can make the following counterfactual

reasoning: if the same subjects that were randomly assigned to the implicit stereotype against women condition

(condition 1) had instead been asked to only perform the competitive second stage, their performance would have

been virtually the same as the one they actually displayed in the experiment’s second stage. Hence, if to evaluate

the implicit stereotype against women condition we had instead used a between-subjects design in which the

reference of men’s average competitive performance was virtually the same as the one we elicited in the second

stage of the implicit stereotype against women condition (28 problems. See Figure 3) and the reference of men’s average

non-competitive performance was the one corresponding to the men’s average performance we elicited in the first

stage of the explicit stereotype against women condition (30.8 problems. See Figure 3), we would have (wrongly)

concluded that men do not increase their performance under competition in the implicit stereotype against women

condition. Since the data of our experiment was obtained using a random procedure to assign the subjects to each

condition, it clearly indicates that performance comparisons based on a between-subjects design could be

problematic.

implication regarding policy design to cope with stereotypes at the workplace. A

recommendation found in the stereotype threat literature to prevent a negative effect of

stereotypes is “stereotype nullification”, i.e., explicitly providing individuals with

information that does not conform to the stereotype (e.g., Smith and White, 2002). In stark

contrast, our results indicate that within a competitive environment no information should be

provided at all. If men and women are already competing against each other, they seem to cope

well in terms of performance with a stereotype they hold. In this case, providing information

contradicting that stereotype seems to harm the performance not only of the negatively but also

of the positively stereotyped group.

References

Aronson, J., Quinn, D. M., and Spencer, S. J. (1998). Stereotype threat and the

academic under-performance of minorities and women. In Swim, J. K., and

Stangor, C. (eds.), Prejudice: The Target’s Perspective, Academic Press, New

York, pp. 83–103.

Aronson, J., Lustina, M. J., Good, C., Keough, K., Steele, C. M., and Brown, J.

(1999). “When White men can’t do math: Necessary and sufficient factors in

stereotype threat”. Journal of Experimental Social Psychology, 35, 29–46.

Babcock, L. and Laschever, S. (2003). Women Don’t Ask: Negotiation and the

Gender Divide. Princeton, NJ: Princeton University Press.

Bertrand, M. and Hallock, K. F. (2001). “The Gender Gap in Top Corporate Jobs”.

Industrial and Labor Relations Review, LV, 3–21.

Black, S. and Strahan, P. E. (2001). “The Division of Spoils: Rent-Sharing and

Discrimination in a Regulated Industry”. American Economic Review, XCI,

814–831.

Blau, F., Ferber, M. and Winkler, A. (2010). The Economics of Women, Men,

and Work. Englewood Cliffs, NJ: Prentice Hall, 6th Edition.

Booth, A. L. and Nolen, P. J. (2009). “Choosing to Compete: How Different are

Girls and Boys?”. CEPR Discussion Paper No. 7214.

Brewin, C. R. and Smart, L. (2005). “Working memory capacity and suppression of

intrusive thoughts”. Journal of Behavior Therapy and Experimental Psychiatry,

36(1), 61-68.

Cason, T. N., Masters, W. A. and Sheremeta, R. M. (2010). “Entry into Winner-

Take-All and Proportional-Prize Contests: An Experimental Study”. Journal of

Public Economics, forthcoming.

Christie, R. and Geis, F. (1970). Scale construction. In: Studies in

Machiavellianism, Academic Press, New York, pp. 10–33.

Croson, R. and Gneezy, U. (2009). “Gender Differences in Preferences”. Journal of

Economic Literature, 47(2), 448–474.

Datta Gupta, N., Poulsen, A. and Villeval, M. (2005). “Male and Female

Competitive Behavior: Experimental Evidence”. IZA Discussion Paper No.

1833.

Dohmen, T., Falk, A., Huffman, D., Sunde, U., Schupp, J. and Wagner, G. (2009).

“Individual Risk Attitudes: Measurement, Determinants and Behavioral

Consequences”. Journal of the European Economic Association, Forthcoming.

Dohmen, T., and Falk, A. (2010). “Performance Pay and Multi-dimensional Sorting:

Productivity, Preferences and Gender”. American Economic Review, 101(2),

556-90.

Eliot, L. (2009). Pink Brain, Blue Brain: How Small Differences Grow Into

Troublesome Gaps — And What We Can Do About It. New York: Houghton

Mifflin Harcourt.

European Commission (2009). “She Figures 2009: Statistics and Indicators on

Gender Equality in Science”. Luxembourg: Publication Office of the European

Union.

Fryer, R., Levitt, S. and List, J. (2008). “Exploring the Impact of Financial Incentives

on Stereotype Threat: Evidence from a Pilot Study”. American Economic

Review: Papers & Proceedings, 98, 2, 370-375.

Gneezy, U. and Rustichini, A. (2004). “Gender and Competition at a Young Age”.

American Economic Review, Papers and Proceedings, 94, 377-381.

Gneezy, U., Leonard, K. L. and List, J. A. (2009). “Gender Differences in

Competition: Evidence from a Matrilineal and a Patriarchal Society”.

Econometrica, 77, 1637-1664.

Fischbacher, U. (2007). “Z-Tree: Zurich Toolbox for Ready-made Economic

experiments”. Experimental Economics, 10(2), 171-178.

Günther, C., Ekinci, N. A., Schwieren, C. and Strobel, M. (2010). “Women Can't

Jump? – An Experiment on Competitive Attitudes and Stereotype Threat”.

Journal of Economic Behavior & Organization, 75 (3), 395-401.

Hyde, J. S., Fennema, E. and Lamon, S. J. (1990). “Gender Differences in

Mathematics Performance: A Meta-Analysis”. Psychological Bulletin, CVII

(1990), 139-155.

Kit, K., Tuokko, H. and Mateer, C. (2008). “A Review of the Stereotype Threat

Literature and Its Application in a Neurological Population”. Neuropsychology

Review, 18(2), 132-148.

Niederle, M., and Vesterlund, L. (2007). “Do Women Shy Away from Competition?

Do Men Compete Too Much?”. Quarterly Journal of Economics, 122, 1067-

1101.

Offerman, T. (1997). Beliefs and Decision Rules in Public Good Games:

Theory and Experiments. Kluwer, Dordrecht/Boston/London.

Smith, J. L., and White, P. H. (2002). “An examination of implicitly activated,

explicitly activated, and nullified stereotypes on mathematical performance:

It’s not just a woman’s issue”. Sex Roles, 47, 179–191.

Spencer, S. J., Steele, C. M., and Quinn, D. M. (1999). “Stereotype threat and

women’s math performance”. Journal of Experimental Social Psychology, 35,

4–28.

Steele, C. M., and Aronson, J. (1995). “Stereotype threat and the intellectual test

performance of African-Americans”. Journal of Personality and Social

Psychology, 69, 797–811.

FIGURE 2. Men’s and women’s non-competitive first stage performance for all 4 conditions

[Plot: cumulative distribution (vertical axis, 0 to 1) of the number of correct answers in the non-competitive first stage (horizontal axis, 0 to 60), with separate curves for women and men.]

Note: The figure plots the cumulative distribution of the number of correct answers for all 4 conditions during the 5 minutes non-competitive first stage, separately for men and women.

FIGURE 4. Men’s and women’s performance in the twice random pay condition

[Plots: two graphs, one for men and one for women. Each shows the cumulative distribution (vertical axis, 0 to 1) of the number of correct answers in 5 minutes (horizontal axis, 0 to 60 for men and 0 to 50 for women), with separate curves for the non-competitive first stage and the non-competitive second stage.]

Note: The figure plots the cumulative distributions of the number of correct answers during 5 minutes in the first stage and in the second stage of the twice random pay condition.

FIGURE 5. Men’s and women’s non-competitive and competitive performance in the stereotype conditions

[Plots: panels (a) Implicit stereotype against women condition, (b) Explicit stereotype against women condition and (c) Explicit stereotype against men condition. Each panel contains, separately for men and for women, (i) the cumulative distribution (vertical axis, 0 to 1) of the number of correct answers in 5 minutes (horizontal axis) under no competition and under competition, and (ii) the average number of correct answers in 5 minutes (vertical axis, 0 to 45) under no competition and under competition for the bottom third, middle third and top third, clustered on the subjects’ non-competitive performance.]

Notes: Each panel of Figure 5 plots 2 graphs for men and 2 graphs for women. The first graph shows the cumulative distributions of the number of correct answers during 5 minutes, separately for the non-competitive and the competitive stage. The second graph shows subjects’ average non-competitive performance and their corresponding average competitive performance by clusters based on subjects’ non-competitive performance. “Bottom third” includes the 33.(3)% worst performers in the non-competitive first stage; “Top third” includes the 33.(3)% best performers in the non-competitive first stage; “Middle third” includes the remaining 33.(3)% performers in the non-competitive first stage. Panel (a) refers to the implicit stereotype against women condition, Panel (b) refers to the explicit stereotype against women condition and Panel (c) to the explicit stereotype against men condition.
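
The clustering into thirds described in the notes to Figure 5 can be sketched as follows. This is a minimal illustration, not the authors’ procedure: the function names, the tie-breaking rule (original order) and the cut points at n/3 and 2n/3 are assumptions, and the scores are hypothetical values rather than data from the experiment.

```python
def split_into_thirds(scores):
    """Split subject indices into bottom, middle and top thirds by score.

    Illustrative assumption: subjects are sorted by non-competitive score
    (ties keep their original order) and cut at n/3 and 2n/3.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    cut1, cut2 = n // 3, (2 * n) // 3
    return order[:cut1], order[cut1:cut2], order[cut2:]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical scores for six subjects (not data from the experiment).
non_competitive = [12, 30, 25, 18, 40, 22]
competitive = [15, 33, 27, 23, 41, 26]

labels = ("Bottom third", "Middle third", "Top third")
for label, group in zip(labels, split_into_thirds(non_competitive)):
    print(label,
          mean([non_competitive[i] for i in group]),
          mean([competitive[i] for i in group]))
```

Grouping on the first-stage score and then averaging both stages within each group gives the pair of values that each cluster displays in the second graph of a panel.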

TABLE 1 – Linear regression on the change in performance between stages (men and women treated equally)

Dependent variable: Performance change

Variable                             Coefficient   Standard error   p-value
Non-competitive performance            -0.024          0.047         0.611
Implicit stereotype against women       3.876*         1.416         0.007
Explicit stereotype against women       3.580*         1.350         0.009
Explicit stereotype against men        -0.136          1.341         0.919
Intercept                               0.257          0.912         0.778

Observations: 188

Notes: Performance change is the difference between the competitive second stage performance and the non-competitive first stage performance; Implicit stereotype against women represents a dummy that takes the value 1 for the individuals in the implicit stereotype against women condition, and the value 0 otherwise; Explicit stereotype against women represents a dummy that takes the value 1 for the individuals in the explicit stereotype against women condition, and the value 0 otherwise; Explicit stereotype against men represents a dummy that takes the value 1 for individuals in the explicit stereotype against men condition, and the value 0 otherwise. The twice random pay condition is the base group for the condition dummies; We run the regression using demeaned Non-competitive performance, i.e., (Non-competitive performance – sample mean of Non-competitive performance), in order to make the intercept interpretation meaningful; * statistically significant at 5% level.
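
Because the regressor is demeaned, the Table 1 estimates can be read as predicted performance changes for a subject with average first-stage performance. The following sketch only restates that arithmetic; the coefficients are copied from Table 1, while the function and dictionary names are hypothetical.

```python
# Coefficients as reported in Table 1.
INTERCEPT = 0.257
SLOPE_NON_COMPETITIVE = -0.024
CONDITION_EFFECT = {
    "twice random pay": 0.0,            # base group
    "implicit against women": 3.876,
    "explicit against women": 3.580,
    "explicit against men": -0.136,
}


def predicted_change(condition: str, demeaned_performance: float = 0.0) -> float:
    """Predicted performance change from the Table 1 estimates.

    `demeaned_performance` is first-stage performance minus the sample mean,
    so 0.0 corresponds to a subject with average first-stage performance.
    """
    return (INTERCEPT
            + SLOPE_NON_COMPETITIVE * demeaned_performance
            + CONDITION_EFFECT[condition])


for name in CONDITION_EFFECT:
    print(f"{name}: {predicted_change(name):.3f}")
```

For the base group the prediction equals the intercept, which is why demeaning makes the intercept directly interpretable.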

TABLE 2 – Linear regression on the change in performance between stages (men and women treated differently)

Dependent variable: Performance change

Variable                             Coefficient   Standard error   p-value
Non-competitive performance            -0.213          0.048         0.663
Women_Implicit ST against women         5.124*         1.965         0.010
Men_Implicit ST against women           4.870*         1.957         0.014
Women_Explicit ST against women         3.901*         1.882         0.040
Men_Explicit ST against women           5.512*         1.920         0.005
Women_Explicit ST against men           2.907          1.901         0.130
Men_Explicit ST against men            -0.789          1.834         0.668
Men_Twice random pay                    2.253          1.816         0.216
Intercept                              -0.855          1.262         0.499

Observations: 188

Notes: Performance change is the difference between the competitive second stage performance and the non-competitive first stage performance; Women_Implicit ST against women represents a dummy that takes the value 1 for the women in the implicit stereotype against women condition, and the value 0 otherwise; Men_Implicit ST against women represents a dummy that takes the value 1 for the men in the implicit stereotype against women condition, and the value 0 otherwise; Women_Explicit ST against women represents a dummy that takes the value 1 for the women in the explicit stereotype against women condition, and the value 0 otherwise; Men_Explicit ST against women represents a dummy that takes the value 1 for the men in the explicit stereotype against women condition, and the value 0 otherwise; Women_Explicit ST against men represents a dummy that takes the value 1 for women in the explicit stereotype against men condition, and the value 0 otherwise; Men_Explicit ST against men represents a dummy that takes the value 1 for men in the explicit stereotype against men condition, and the value 0 otherwise; Men_Twice random pay represents a dummy that takes the value 1 for men in the twice random pay condition, and the value 0 otherwise. The women in the twice random pay condition are the base group for the condition dummies; * statistically significant at 5% level.

TABLE 3 – Magnitude of the difference in competitive response between men and women in each condition

Condition                            Regression estimate   p-value (F-test)
Implicit stereotype against women          -0.257               0.904
Explicit stereotype against women           1.611               0.427
Explicit stereotype against men            -3.696*              0.061

Notes: The regression estimate is based on the Table 2 regression. The null hypothesis in each condition is: i) H0: Men_Implicit ST against women = Women_Implicit ST against women; ii) H0: Men_Explicit ST against women = Women_Explicit ST against women; iii) H0: Men_Explicit ST against men = Women_Explicit ST against men. * statistically significant at 10% level.
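
The estimates in Table 3 correspond to the difference between the men’s and the women’s coefficient in each condition of Table 2. A minimal sketch of that subtraction follows; the coefficients are copied from Table 2, and the dictionary layout and function name are hypothetical. With the rounded coefficients, the difference in the implicit condition comes out at -0.254 rather than the reported -0.257, presumably because of rounding in the reported coefficients.

```python
# Coefficients on the sex-by-condition dummies, copied from Table 2.
TABLE_2 = {
    ("women", "implicit against women"): 5.124,
    ("men", "implicit against women"): 4.870,
    ("women", "explicit against women"): 3.901,
    ("men", "explicit against women"): 5.512,
    ("women", "explicit against men"): 2.907,
    ("men", "explicit against men"): -0.789,
}

CONDITIONS = ("implicit against women",
              "explicit against women",
              "explicit against men")


def sex_difference(condition: str) -> float:
    """Men's coefficient minus women's coefficient in one condition."""
    return TABLE_2[("men", condition)] - TABLE_2[("women", condition)]


for condition in CONDITIONS:
    print(f"{condition}: {sex_difference(condition):+.3f}")
```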

TABLE 4 – Men’s and women’s self-assessed rank estimates for their non-competitive performance

Rank        Men: average estimate (in %)   Women: average estimate (in %)   p-value
1: Best              19.62                          7.50                    <0.001
2                    22.56                         20.78                     0.560
3                    28.18                         24.65                     0.261
4                    21.99                         25.33                     0.371
5: Worst              7.65                         21.74                    <0.001
Total               100                           100

Notes: Given 4 other participants that have been randomly chosen with equal probability, subjects are asked to indicate their best estimates in percentage that exactly 0 (rank 1), exactly 1 (rank 2), exactly 2 (rank 3), exactly 3 (rank 4) or exactly 4 (rank 5) of these other participants solved more problems correctly than they did themselves in the 5 minutes non-competitive first stage. The p-value refers to a 2-sided t-test, n = 188.

TABLE 5 – Frequency of men and women per rank intervals according to their confidence index and to their actual rank

Rank            Men: according to    Men: according to    Women: according to    Women: according to
                confidence index     actual rank          confidence index       actual rank
[1-2[: Best            18                   33                    7                     17
[2-3[                  29                   23                   27                     25
[3-4[                  38                   26                   32                     21
[4-5]: Worst            9                   12                   28                     31
Total                  94                   94                   94                     94

Notes: In Table 5 men and women are assigned to each rank interval, firstly according to their confidence index, and secondly according to their actual rank. A subject’s confidence index is equal to $\sum_{i=1}^{5} i \times p_i$, where i is the outcome that exactly (i-1) other participants solved more problems correctly and $p_i$ is the subject’s elicited percentage estimate that outcome i is the actual one. Actual rank is computed as follows: using a linear extrapolation, we rescale a subject’s non-competitive performance rank in the session he/she attended into a 1-5 scale. Men’s average confidence index and average actual rank are 2.75 and 2.60, respectively (p = 0.305, 2-sided t-test). Women’s average confidence index and average actual rank are 3.33 and 3.28, respectively (p = 0.728, 2-sided t-test).
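
The confidence index defined in the notes to Table 5 is an expected rank. Since the index is linear in the elicited estimates, applying it to the average estimates of Table 4 should give the average confidence index, provided each subject’s estimates sum to 100. The sketch below checks this under that assumption; the function name is hypothetical, and dividing by the total is an assumed way of converting percentages into probabilities.

```python
def confidence_index(percentages):
    """Expected rank: sum over i = 1..5 of i * p_i, with p_i given in percent.

    Dividing by the total converts the percentages into probabilities
    (an assumption about how the elicited estimates are scaled).
    """
    total = sum(percentages)
    weighted = sum(rank * p for rank, p in enumerate(percentages, start=1))
    return weighted / total


# Average estimates for ranks 1 to 5, copied from Table 4.
MEN = [19.62, 22.56, 28.18, 21.99, 7.65]
WOMEN = [7.50, 20.78, 24.65, 25.33, 21.74]

print(round(confidence_index(MEN), 2), round(confidence_index(WOMEN), 2))
```

The two values agree with the average confidence indices of 2.75 for men and 3.33 for women stated in the notes to Table 5.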

TABLE 6 – Linear regression on the change in performance between stages including background variables: confidence level, risk attitude and competitive attitude (men and women treated equally)

Dependent variable: Performance change

Variable                             Coefficient   Standard error   p-value
Non-competitive performance            -0.001          0.054         0.979
Implicit stereotype against women       4.037*         1.407         0.005
Explicit stereotype against women       3.456*         1.355         0.012
Explicit stereotype against men        -0.065          1.353         0.962
Confidence index                        0.570          0.590         0.336
Risk attitude                           0.177          0.232         0.446
Competitive attitude                    0.082*         0.039         0.037
Intercept                               0.235          0.911         0.797

Observations: 188

Notes: Performance change is the difference between the competitive second stage performance and the non-competitive first stage performance; Implicit stereotype against women represents a dummy that takes the value 1 for the individuals in the implicit stereotype against women condition, and the value 0 otherwise; Explicit stereotype against women represents a dummy that takes the value 1 for the individuals in the explicit stereotype against women condition, and the value 0 otherwise; Explicit stereotype against men represents a dummy that takes the value 1 for individuals in the explicit stereotype against men condition, and the value 0 otherwise; Confidence index measures a subject’s relative self-assessment for his/her non-competitive first stage performance on a 1-5 rank scale in which the value 1 is the best and value 5 is the worst; Risk attitude is measured on a 0-10 scale in which the value 0 means ‘not at all willing to take risks’ and the value 10 means ‘very willing to take risks’; Competitive attitude is measured on a 20-140 scale in which higher scores predict more competitive behavior; The twice random pay condition is the base group for the condition dummies. We run the regression using demeaned Non-competitive performance, Confidence index, Risk attitude and Competitive attitude in order to make the intercept interpretation meaningful; * statistically significant at 5% level.

TABLE 7 – Linear regression on the difference between stages of the performance gap

between men and women

Dependent variable Difference in performance gaps between men and women

Coefficient Standard error p-value

First stage performance gap -0.037 0.081 0.684

Implicit stereotype against women -2.550 3.127 0.417

Explicit stereotype against women -1.049 3.045 0.731

Explicit stereotype against men -5.713* 3.006 0.061

Intercept 2.125 2.036 0.300

Observations: 94

Notes: Difference in performance gaps between men and women is equal to the second stage performance gap between men and women minus their first stage performance gap; First stage performance gap is the gap in performance between men and women in the first stage; Implicit stereotype against women represents a dummy that takes the value 1 for individuals in the implicit stereotype against women condition, and the value 0 otherwise; Explicit stereotype against women represents a dummy that takes the value 1 for individuals in the explicit stereotype against women condition, and the value 0 otherwise; Explicit stereotype against men represents a dummy that takes the value 1 for individuals in the explicit stereotype against men condition, and the value 0 otherwise. The twice random pay condition is the base group for the condition dummies. We run the regression using the demeaned First stage performance gap, i.e., (First stage performance gap – sample mean of First stage performance gap), in order to make the intercept interpretation meaningful; * statistically significant at the 10% level.
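The demeaning step mentioned in the notes can be illustrated with a minimal sketch. It uses a single regressor and made-up numbers (the lists `first_stage_gap` and `gap_change` and the helper `ols_simple` are hypothetical and are not taken from the experiment); it only shows that subtracting the sample mean from a regressor leaves the slope unchanged and moves the intercept to the predicted outcome at the sample mean of that regressor.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical illustrative numbers, NOT data from the experiment.
first_stage_gap = [4.0, -2.0, 7.0, 1.0, -5.0]
gap_change = [1.0, 3.0, -2.0, 0.5, 4.0]

# Demean the regressor: subtract its sample mean from every observation.
gap_mean = mean(first_stage_gap)
demeaned_gap = [g - gap_mean for g in first_stage_gap]

raw_intercept, raw_slope = ols_simple(first_stage_gap, gap_change)
dm_intercept, dm_slope = ols_simple(demeaned_gap, gap_change)

print("slope (raw, demeaned):", raw_slope, dm_slope)
print("intercept (raw, demeaned):", raw_intercept, dm_intercept)
```

With condition dummies added to the regression, as in Table 7, the same logic implies that the intercept is the predicted outcome for the base group evaluated at the sample mean of the demeaned regressor.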


TABLE 8 – Men’s and women’s accuracy and effort provision in the non-competitive first stage and in the competitive second stage for each stereotype condition

Columns: average error rate; average number of attempts; average response time per correct problem (in seconds). Each is followed by its p-value and the alternative hypothesis (Ha).

CONDITION               Sex    Stage      Error rate  p-value (Ha)                Attempts  p-value (Ha)                Time (s)  p-value (Ha)
IMPLICIT                Women  1st Stage  0.262       0.159 (Ha: 1st > 2nd)       28.55     < 0.001** (Ha: 1st < 2nd)   17.30     0.050** (Ha: 1st > 2nd)
                               2nd Stage  0.220                                   33.05                                 14.60
                        Men    1st Stage  0.198       0.114 (Ha: 1st > 2nd)       29.4      0.003** (Ha: 1st < 2nd)     16.52     0.009** (Ha: 1st > 2nd)
                               2nd Stage  0.159                                   32.95                                 14.13
EXPLICIT AGAINST WOMEN  Women  1st Stage  0.205       0.200 (Ha: 1st > 2nd)       29.35     0.003** (Ha: 1st < 2nd)     14.44     0.094* (Ha: 1st > 2nd)
                               2nd Stage  0.177                                   32.13                                 13.11
                        Men    1st Stage  0.152       0.778 (Ha: 1st > 2nd)       35.86     < 0.001** (Ha: 1st < 2nd)   11.21     0.068* (Ha: 1st > 2nd)
                               2nd Stage  0.164                                   41.55                                 10.57
EXPLICIT AGAINST MEN    Women  1st Stage  0.258       0.575 (Ha: 1st < 2nd)       26.39     0.202 (Ha: 1st = 2nd)       15.68     0.930 (Ha: 1st = 2nd)
                               2nd Stage  0.250                                   28.00                                 15.49
                        Men    1st Stage  0.149       0.010** (Ha: 1st < 2nd)     31.72     0.946 (Ha: 1st = 2nd)       12.60     0.361 (Ha: 1st = 2nd)
                               2nd Stage  0.215                                   31.80                                 13.44

Notes: Error rate is equal to the number of wrong answers a subject provides divided by the total number of attempts to solve the problems within the 5-minute performance period; Number of attempts is equal to the total number of attempts made by a subject to solve the problems within the 5-minute performance period; Average time spent per correct problem for a subject is equal to (time in seconds of the last correct answer) / (number of problems solved correctly). The p-values refer to a paired t-test between the first and the second stage (WSR test conclusions are qualitatively the same). The alternative hypotheses (Ha) are drawn according to the performance results we find in subsection 3.A; ** statistically significant at the 5% level; * statistically significant at the 10% level.
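The three measures defined in the notes, and the paired comparison between stages, can be sketched as follows. This is only an illustration under stated assumptions: the record format (a list of `(seconds_elapsed, is_correct)` tuples per subject and stage), the function names and all numbers are hypothetical, and the sketch computes only the paired t statistic, not its p-value.

```python
from math import sqrt
from statistics import mean, stdev


def accuracy_and_effort(attempts):
    """Summarise one subject's stage from (seconds_elapsed, is_correct) records.

    The record format is an assumption made for this sketch.
    """
    n_attempts = len(attempts)
    correct_times = [t for t, ok in attempts if ok]
    n_correct = len(correct_times)
    # Error rate: wrong answers divided by total attempts.
    error_rate = (n_attempts - n_correct) / n_attempts if n_attempts else None
    # Time per correct problem: time of the last correct answer
    # divided by the number of problems solved correctly.
    time_per_correct = max(correct_times) / n_correct if n_correct else None
    return {
        "error_rate": error_rate,
        "attempts": n_attempts,
        "time_per_correct": time_per_correct,
    }


def paired_t_statistic(first_stage, second_stage):
    """t statistic of a paired t-test on (second stage - first stage) differences."""
    diffs = [b - a for a, b in zip(first_stage, second_stage)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical attempt log for one subject in one stage.
log = [(12.0, True), (30.0, False), (41.0, True), (60.0, True), (75.0, False)]
summary = accuracy_and_effort(log)

# Hypothetical numbers of attempts for four subjects in each stage.
t_stat = paired_t_statistic([28, 30, 25, 33], [31, 34, 27, 36])

print(summary)
print("paired t statistic:", t_stat)
```

A one-sided or two-sided p-value would then be read from the t distribution with n - 1 degrees of freedom, according to the alternative hypothesis listed in the table.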


Appendix A: Mach IV questionnaire
