Racial Bias in Basketball Foul Calls: A Replication Attempt

Alan Reifman
Texas Tech University
alan.reifman@ttu.edu

INTRODUCTION

Recent large-scale analyses of National Basketball Association (NBA) game statistics have examined how the interaction between the race/ethnicity of referees and that of players may affect foul calling (Price & Wolfers, 2007) and betting markets (Larsen, Price, & Wolfers, 2008). Such studies have the potential to map onto social psychological theories of racism, prejudice, and discrimination (e.g., Dovidio & Gaertner, 2004). The initial study, by Price and Wolfers, indeed demonstrated trends consistent with racial bias in NBA games from the 1991-92 season through 2003-04. With over a quarter-million “player-games” as the units of analysis (i.e., each player was treated as a separate case for each game he played), these authors found evidence “suggesting that a player earns 0.18 fewer fouls per 48 minutes played when facing three referees [the size of an officiating crew] of his own race than when facing three opposite-race referees” (p. 7).

As illustrated in the preceding quote, Price and Wolfers’s (2007) use of box scores as their data source did not allow them to pinpoint which referee in a game called any particular foul. Rather, each player’s number of fouls in a game could be associated only with a three-person officiating crew (according to the referee names listed in the box score) and with the aggregate racial composition of the crew. Whenever a player’s fouls were viewed alongside referee combinations other than all-white or all-black (e.g., an African-American player and a crew consisting of two white referees and one black referee), there would be an implication that white officials called most of the fouls. However, this could not be documented.

An important research concept that can be applied to this line of inquiry is that of triangulation (e.g., Anderson, 1989).
To the extent that different researchers, using different methodological approaches to study the same underlying phenomenon, obtain similar findings, the field’s confidence in the robustness of the findings is strengthened. Accordingly, the present article reports an attempted conceptual replication of Price and Wolfers’s (2007) referee bias study, using a different method. As elaborated below, the present studies had observers watch basketball games and, for each foul called, record the races of the referee making the call, the player called for committing the foul (i.e., the “guilty” party), and the player who was “harmed” by the foul. Though Price and Wolfers discussed the possible role of the “victim” in foul calls, box scores do not permit victims to be identified. Whereas our microanalyses of basketball games allow greater detail regarding fouls than do box scores, a negative trade-off for us is small sample size, due to the labor-intensive nature of tracking individual fouls while viewing games.

Study 1

METHODS

As part of a research methodology class project at a large southwestern U.S. university, undergraduate students recorded detailed foul data for men’s college basketball games in the Big 12 conference during 2008 league play (the participating school belonged to this conference, hence the professor of the course felt these games would hold the greatest intrinsic interest for students). The class contained roughly 25 students, each of whom was required to collect data on two games (three for extra credit); the professor of the class (the present author) also gathered data from several games. Students were free to select which Big 12 games they wanted to observe, which resulted in some duplication of effort. A data-recording sheet was created by the professor and posted on the class webpage for students to print off.
These sheets could be completed either at a game in person, while watching a game live on television (with no video recording capability), or while watching a game the student had recorded. The last option was recommended to students, as they could use online play-by-play sheets (available shortly after the completion of a game) to guide their tracking of fouls in the game video; such play-by-play sheets listed the time of each foul, the guilty party, and (if a shooting foul) the name of the harmed player. Not all students considered themselves knowledgeable about basketball, of course. To address this, the instructor showed online videos of fouls (not from the season used in the main study) to train students; further, most of the completed data sheets could be cross-checked against those of other students (or the professor) who watched the same game, as well as against online box scores and play-by-play sheets. As was the case in Price and Wolfers’s (2007) study, race/ethnicity of referees and players was based on raters’ perceptions (one foul involving a Chinese player was excluded).

A total of 96 regular-season games are played each season in the Big 12. Six schools (Baylor, Oklahoma, Oklahoma State, Texas, Texas A&M, and Texas Tech) make up the southern division, whereas another six (Colorado, Iowa State, Kansas, Kansas State, Missouri, and Nebraska) make up the northern division. Each team plays each of the other teams in its own division twice (once home and once away), and plays each team in the other division once. The participating school is in the southern division. Not surprisingly, the proportion of games coded, out of all possible, was highest for south-vs.-south games (19/30, 63%), presumably as a result of student interest and availability of television broadcasts. For north-vs.-south games, the coverage rate was 11/36 (31%), and for north-vs.-north games, it was 5/30 (17%).
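The schedule arithmetic behind these figures can be checked with a short sketch. It assumes only what the text states (six teams per division, home-and-away play within a division, one meeting across divisions); the variable names are illustrative and not part of the original study materials.

```python
from math import comb

TEAMS_PER_DIVISION = 6  # six southern and six northern schools, per the text

# Within a division, every pair of teams meets twice (home and away).
within_division = comb(TEAMS_PER_DIVISION, 2) * 2
# Across divisions, every southern team meets every northern team once.
cross_division = TEAMS_PER_DIVISION * TEAMS_PER_DIVISION
# Two divisions' internal games plus the cross-division games.
total_games = 2 * within_division + cross_division

# Games coded by the class, as reported in the text.
coded = {"south-vs-south": 19, "north-vs-south": 11, "north-vs-north": 5}
possible = {
    "south-vs-south": within_division,
    "north-vs-south": cross_division,
    "north-vs-north": within_division,
}

# Coverage rate = games coded / games possible, per game type.
coverage = {kind: coded[kind] / possible[kind] for kind in coded}

print(f"total regular-season games: {total_games}")
for kind, rate in coverage.items():
    print(f"{kind}: {coded[kind]}/{possible[kind]} = {rate:.0%}")
```

Under these assumptions the counts come to 30 games within each division and 36 across divisions, for 96 in total, and the coverage rates round to the reported 63%, 31%, and 17%.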
Numerous factors can affect the prevalence of fouls called on members of one racial/ethnic group against members of the same or a different racial/ethnic group. These include each team’s racial/ethnic player composition, the positions played by members of different racial/ethnic groups, and teams’ styles of play (e.g., a finesse team that relies on passing and outside shooting, as opposed to a team that tries to use its muscle inside to rebound and get shots from in close). Whereas Price and Wolfers (2007) controlled for player attributes in their regression equations, we used a different approach to hold extraneous variables constant. We restricted our analyses to games in which there was at least one African-American referee and at least one white referee (in practice, we did not have games with more than one African-American referee). That way, the black and white referees would be watching the same games, with the same players, same player offensive/defensive match-ups, etc. Also, referee protocols (FIBA, 2006) call for officials to rotate their locations on the court, thus avoiding potential confounds between referees’ race/ethnicity and viewing location (e.g., we would not want the area underneath the basket, where a lot of player contact occurs, to be monitored exclusively by a single referee, whether black or white).

Limitation of our sample to games with diverse referee crews reduced the number of eligible contests to 11. These included both games between Texas A&M and Oklahoma State, whose player line-ups were both overwhelmingly African-American; because virtually all fouls in these two games involved African-Americans fouling African-Americans, they were excluded.
The final sample thus included nine games (both games between the University of Texas and Texas Tech; both games between Texas A&M and Texas Tech; Kansas State at Texas Tech; Missouri at Texas Tech; Kansas at Nebraska; Missouri at Baylor; and Missouri at Kansas). Because the last of these games was coded from a condensed TV rebroadcast, some of its fouls were missing.

Beyond haphazard missing data (e.g., fouls called outside of camera view on television or called simultaneously by both a black and a white official), there was one other scenario under which fouls were excluded. In the closing minutes of reasonably close games (i.e., where the margin is in single digits), the trailing team will usually commit fouls to put the leading team on the free-throw line. If free throws are missed, that gives the trailing team (assuming it gets the rebound) a chance to come back and score to reduce the deficit. Such situations are identifiable in play-by-play sheets where, for example, the trailing team scores a basket and then, within a second or two, fouls the leading team. Because the referee-bias notion would seem most applicable to quick, spontaneous judgments (cf. Dovidio & Gaertner, 2004), as opposed to the pro forma nature of late fouls by the trailing team, we excluded the latter.

RESULTS

In contrast to Price and Wolfers (2007), who focused on players, we used fouls (N = 335) as the units of analysis. Table 1 presents a chi-square contingency table organized by race of referee, race of player called for committing a foul, and race of player who was fouled (because the columns combine the race of the player called for the foul with the race of the player who was fouled, the table appears in a two-way [2 X 4] arrangement, instead of a three-way [2 X 2 X 2] set-up).
Because the African-American and white referees (in the aggregate) saw the same nine games, with the same players, same match-ups, etc., the null hypothesis would be that white referees’ proportional allocations of fouls into black against black, black against white, white against black, and white against white would be the same as black referees’ allocations into these four categories (for each referee race category, proportions should be read across). Departures from equal-proportional allocation could suggest racial bias (e.g., white referees calling fouls disproportionately on black players or vice versa).

Table 1. Distribution of All Fouls According to Races of Referee, Player Committing Foul, and Player Who Was Fouled (Number and Proportion within Each Referee Race): 2007-08 Big 12 Men’s Basketball

                  Black Player   Black Player   White Player   White Player
                  Fouls          Fouls          Fouls          Fouls
                  Black Player   White Player   Black Player   White Player
White Referees    98 (.44)       42 (.19)       64 (.29)       19 (.09)
Black Referees    58 (.52)       17 (.15)       31 (.28)        6 (.05)

Proportions may not add to 1.00 horizontally, due to rounding.

A chi-square test was nonsignificant, χ²(3) = 2.58. For three of the player combinations, the white and black referees’ allocations were within .04 of each other. For example, whereas 19 percent of the total fouls called by white officials were of the black-fouling-white type, 15 percent of the total fouls called by black officials were of that type.

The above 2 X 4 table was also collapsed into 2 X 2 tables, in two different ways. Neither an analysis of referee race X race of player called for the foul (collapsing over race of player fouled) nor one of referee race X race of player fouled (collapsing over race of player called for the foul) was significant. The respective χ² values were 0.57 and 1.85.

Study 2

METHODS

Coding followed the same procedures as in Study 1, except this time it was done exclusively by the present author. The WNBA (the women’s professional league) maintains an extensive online archive of full-length game videos (http://www.wnba.com/video/), thus facilitating the research. Seven games were selected, based on racial-ethnic balance in the referees (at least one black and one white official per game) and players, yielding 264 total fouls. These games were: Los Angeles at Chicago (May 22, 2007); Los Angeles at Sacramento (June 2, 2007); Detroit at Houston (June 2, 2007); Minnesota at Indiana (July 24, 2008); Atlanta at New York (September 5, 2008); Atlanta at Los Angeles (September 11, 2008); and Washington at Connecticut (September 13, 2008). Play-by-play sheets from ESPN.com were used to advance the videos to the times at which fouls occurred. These sheets included the player harmed on each foul, as well as the transgressor, thus allowing the video coding to focus exclusively on the race-ethnicity of the referee calling each foul.

RESULTS

Table 2 lays out the results for Study 2 in the same format as was used for Study 1. This time, the similarity of white and black referees’ proportional allocations of fouls to the four categories (black/white player committing foul X black/white player being fouled) was even more pronounced than in Study 1. Accordingly, the chi-square test was nonsignificant, χ²(3) = 0.74.

Table 2. Distribution of All Fouls According to Races of Referee, Player Committing Foul, and Player Who Was Fouled (Number and Proportion within Each Referee Race): 2007 & 2008 WNBA Basketball

                  Black Player   Black Player   White Player   White Player
                  Fouls          Fouls          Fouls          Fouls
                  Black Player   White Player   Black Player   White Player
White Referees    105 (.63)      26 (.16)       25 (.15)       11 (.07)
Black Referees     63 (.65)      16 (.16)       14 (.14)        4 (.04)

Proportions may not add to 1.00 horizontally, due to rounding.
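The chi-square statistics reported for the two studies can be recomputed from the printed cell counts. The sketch below is a plain Pearson chi-square without a continuity correction; that choice is an assumption, since the original analysis software and settings are not stated. The function and variable names are illustrative.

```python
def pearson_chi_square(table):
    """Return (chi2, df) for an r x c table of counts, with no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of referee race and foul type.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: white referees, black referees.
# Columns: black fouls black, black fouls white, white fouls black, white fouls white.
table1 = [[98, 42, 64, 19], [58, 17, 31, 6]]    # Study 1 (Big 12), N = 335
table2 = [[105, 26, 25, 11], [63, 16, 14, 4]]   # Study 2 (WNBA), N = 264


def collapse_by_fouler(table):
    """Collapse over race of player fouled: black fouler vs. white fouler."""
    return [[row[0] + row[1], row[2] + row[3]] for row in table]


def collapse_by_fouled(table):
    """Collapse over race of player called for the foul: black fouled vs. white fouled."""
    return [[row[0] + row[2], row[1] + row[3]] for row in table]


results = {
    "Study 1, 2 x 4": pearson_chi_square(table1),
    "Study 1, referee x fouler": pearson_chi_square(collapse_by_fouler(table1)),
    "Study 1, referee x fouled": pearson_chi_square(collapse_by_fouled(table1)),
    "Study 2, 2 x 4": pearson_chi_square(table2),
}

for name, (chi2, df) in results.items():
    print(f"{name}: chi2({df}) = {chi2:.2f}")
```

Run on the counts in Tables 1 and 2, this should agree with the reported values of 2.58, 0.57, 1.85, and 0.74. For reference, the conventional .05 critical values are 7.81 for 3 degrees of freedom and 3.84 for 1 degree of freedom, so all four statistics fall well short of significance.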
DISCUSSION

[Figure: New York Times article on the original U. Penn-Cornell study]

The present studies cannot reject the null hypothesis that white referees’ foul-allocation proportions (regarding the races of players committing and being harmed by fouls) are indistinguishable from those of African-American referees. Several implications for future research on possible referee bias are apparent:

1. Though statistically significant in the Price and Wolfers (2007) study and potentially capable of affecting game outcomes when aggregated over large numbers of contests, the referee-bias phenomenon is likely to be a relatively small effect. Thus, sample sizes much larger than the present ones would be necessary to detect race-based discrepancies. Other analytic differences between Price and Wolfers’s player-in-game focus and the call-focused structure of the present study could, of course, affect the results in additional ways. As one example, Price and Wolfers weighted each of their observations by an athlete’s minutes played in each particular game, whereas the present methodology had the effect of giving greater weight to players who committed more fouls than to those who committed fewer. Though there is probably some degree of correlation between minutes played and number of fouls committed, the two data-organization schemes may not ultimately yield results that dovetail exactly. Hence, as noted above, the present study should be considered a conceptual replication attempt (focusing on relations among the same larger constructs as examined by Price and Wolfers), rather than an exact one.

2. The direct observation of basketball action, should a researcher have the capability to obtain data on large numbers of games, has the potential to enrich the characterization and analysis of foul calls.
As long as large numbers of games were available to ensure variation on targeted variables, trained observers could rate features such as the roughness of contact leading to foul calls (e.g., from “very slight” to “very severe”), as well as locations on the floor where they occur (e.g., under the basket, where fierce battles for rebounds can occur), in order to explore additional hypotheses. One such hypothesis, implied by the aforementioned exclusion from the present studies of fouls in the closing minutes designed to put the leading team on the free-throw line, is that officiating bias is most likely to occur on close calls. One could try to use objective criteria for characterizing the nature of fouls, such as whether officials applied the designation of “flagrant foul” to an infraction, or whether players or coaches visibly complained to officials about a foul call (suggestive of a close call). Both of these occurrences appear to be rare, however, limiting their applicability to statistical analysis (players and coaches probably suppress their urge to complain, in order to avoid technical fouls). Given the limitations of objective measures of close calls, therefore, researchers would need to develop a rating system that could be used by future investigators.

3. “Non-calls,” where appreciable contact appears to occur between players but no foul is called, can also potentially be indicative of referee bias. To the extent that referees favor players of their own race, such favoritism can be manifested in failure to call a foul on an own-race player when one might have been warranted (this was suggested by Holly Coffee, a student in the class).

4. The present studies have expanded the inquiry beyond the research of Price and Wolfers (2007) by examining college, as well as professional, and women’s, as well as men’s, basketball. Both of the present studies yielded similar null findings.
It is interesting to note that, with its summer-based schedule, the WNBA’s season runs about as long as college men’s (and women’s) seasons (roughly 30-35 games), in contrast to the 82-game regular season of the men’s NBA. One possible consequence for foul-calling is that the NBA gives referees much greater opportunity than the other leagues under discussion to gain familiarity with the players and to develop impressions of their playing styles; whether this would increase or reduce bias is unclear, however.

In conclusion, the present studies failed to replicate the original Price and Wolfers (2007) finding of own-race bias in basketball officiating. By introducing a different method, however, we hope to spur further replication attempts and consideration of additional factors that may affect referee behavior.

REFERENCES

Anderson, C. A. (1989). Temperature and aggression: Ubiquitous effects of heat on occurrence of human violence. Psychological Bulletin, 106, 74-96.

Dovidio, J. F., & Gaertner, S. L. (2004). Aversive racism. In M. P. Zanna (Ed.), Advances in experimental social psychology (pp. 1-52). San Diego, CA: Academic Press.

FIBA (International Basketball Federation) (2006). Official basketball rules 2006, referees’ manual, three-person officiating. Available at: http://www.fibaasia.net/cms/pdf/3%20person%20manual.pdf

Larsen, T., Price, J., & Wolfers, J. (2008). Racial bias in the NBA: Implications in betting markets. Journal of Quantitative Analysis in Sports, 4. Available at: http://www.bepress.com/jqas/vol4/iss2/7

Price, J., & Wolfers, J. (2007). Racial discrimination among NBA referees. Working Paper, National Bureau of Economic Research, Cambridge, MA.

Racial Bias in Basketball Foul Calls: A Replication Attempt Alan Reifman

Texas Tech [email protected]

INTRODUCTIONRecent large-scale analyses of National Basketball Association (NBA) game statistics have examined how the interaction between the race/ethnicity of referees and that of players may affect foul calling (Price & Wolfers, 2007) and betting markets (Larsen, Price, & Wolfers, 2008). Such studies have the potential to map onto social psychological theories of racism, prejudice, and discrimination (e.g., Dovidio & Gaertner, 2004). The initial study, by Price and Wolfers, indeed demonstrated trends consistent with racial bias, in NBA games from the 1991-92 season to 2003-04. With over a quarter-million “player-games” as the units of analysis (i.e., each player was repeatedly used as a separate case for each game he played), these authors found evidence “suggesting that a player earns 0.18 fewer fouls per 48 minutes played when facing three referees [the size of an officiating crew] of his own race than when facing three opposite-race referees,” (p. 7).

As illustrated in the preceding quote, Price and Wolfers’s (2007) use of box scores as their data source did not allow them to pinpoint which referee in a game called any particular foul. Rather, each player’s number of fouls in a game could be associated only with a three-person officiating crew (according to the referee names listed in the box score) and with the aggregate racial composition of the crew. Whenever a player’s fouls were viewed alongside referee combinations other than all-white or all-black – e.g., an African-American player and a crew consisting of two white and one black referees – there would be an implication that white officials called most of the fouls. However, this could not be documented.

An important research concept that can be applied to this line of inquiry is that of triangulation (e.g., Anderson, 1989). To the extent that different researchers, using different methodological approaches to study the same underlying phenomenon, obtain similar findings, then the field’s confidence in the robustness of the findings is strengthened. Accordingly, the present article reports an attempted conceptual replication of Price and Wolfers’ (2007) referee bias study, using a different method. As elaborated below, the present studies had observers watch basketball games and, for each foul called, record the races of the referee making the call, the player called for committing the foul (i.e., the “guilty” party), and the player who was “harmed” by the foul. Though Price and Wolfers discussed the possible role of the “victim” in foul calls, box scores do not permit them to be identified. Whereas our microanalyses of basketball games allow greater detail regarding fouls than do box scores, a negative trade-off for us is small sample size, due to the labor-intensive nature of tracking individual fouls while viewing games.

METHODS RESULTSStudy1

METHODS RESULTSStudy2

As part of a research methodology class project at a large southwestern U.S. university, undergraduate students recorded detailed foul data for men’s college basketball games in the Big 12 conference during 2008 league play (the participating school belonged to this conference, hence the professor of the course felt these games would hold the greatest intrinsic interest for students). The class contained roughly 25 students, each of whom was required to collect data on two games (three for extra credit); the professor of the class (the present author) also gathered data from several games. Students were free to select which Big 12 games they wanted to observe, thus resulting in some duplication of effort. A data-recording sheet was created by the professor and posted on the class webpage for students to print off. These sheets could be completed either at a game in person, while watching a game live on television (with no video recording capability), or while watching a game the student had recorded. The latter option was recommended to students, as they could use online play-by-play sheets (available shortly after the completion of a game) to guide their tracking of fouls in the game video; such play-by-play sheets listed the time of each foul, the guilty party, and (if a shooting foul) the name of the harmed player. Not all students considered themselves knowledgeable about basketball, of course. To combat this, the instructor showed online videos of fouls (not from the season used in the main study) to train students; further, most of the completed data sheets could be cross-checked against those of other students (or the professor) who watched the same game, as well as against online box scores and play-by-play sheets. As was the case in Price and Wolfers’ (2007) study, race/ethnicity of referees and players was based on raters’ perceptions (one foul involving a Chinese player was excluded).

A total of 96 regular-season games are played each season in the Big 12. Six schools (Baylor, Oklahoma, Oklahoma State, Texas, Texas A&M, and Texas Tech) comprise the southern division, whereas another six (Colorado, Iowa State, Kansas, Kansas State, Missouri, and Nebraska) comprise the northern division. Each team plays each of the other teams in its own half twice (once home and once away), and plays each team in the other half once. The participating school is in the southern half. Not surprisingly, the proportion of games coded, out of all possible, was highest for south-vs.-south games (19/30, 63%), presumably as a result of student interest and availability of television broadcasts. For north-vs.-south games, the coverage rate was 11/36 (31%), and for north-vs.-north games, it was 5/30 (17%).

Numerous factors can affect the prevalence of fouls called on members of one racial/ethnic group against members of the same or a different racial/ethnic group. These include each team’s racial/ethnic player composition, the positions played by members of different racial/ethnic groups, and teams’ styles of play (e.g., a finesse team that relies on passing and outside shooting, as opposed to a team that tries to use its muscle inside to rebound and get shots from in close). Whereas Price and Wolfers (2007) controlled for player attributes in their regression equations, we used a different approach to hold extraneous variables constant. What we did was restrict our analyses to games in which there was at least one African-American referee and at least one white referee (in practice, we did not have games with more than one African-American referee). That way, the black and white referees would be watching the same games, with the same players, same player offensive/defensive match-ups, etc. Also, referee protocols (FIBA, 2006) call for officials to rotate their locations on the court, thus avoiding potential confounds between referees’ race/ethnicity and viewing location (e.g., we would not want the area underneath the basket, where a lot of player contact occurs, to be monitored exclusively by a single referee, whether black or white).

Limitation of our sample to games with diverse referee crews reduced the number of eligible contests to 11. These included both games between Texas A&M and Oklahoma State, each of whose player line-ups were overwhelmingly African-American; because virtually all fouls involved African-Americans fouling African-Americans in these two games, they were excluded. The final sample thus included nine games (both games between the University of Texas and Texas Tech; both games between Texas A&M and Texas Tech; Kansas State at Texas Tech; Missouri at Texas Tech; Kansas at Nebraska; Missouri at Baylor; and Missouri at Kansas; because the latter game was coded from a condensed TV rebroadcast, some fouls were missing).

Beyond haphazard missing data (e.g., fouls called outside of camera view on television or being called simultaneously by both a black and white official), there was one other scenario under which fouls were excluded. In the closing minutes of reasonably close games (i.e., where one team is within single-digit points of the other), the trailing team will usually commit fouls to put the leading team on the free-throw line. If free throws are missed, that gives the trailing team (assuming it gets the rebound) a chance to come back and score to reduce the deficit. Such situations are identifiable in play-by-play sheets where, for example, the trailing team scores a basket and then, within a second or two, fouls the leading team. Because the referee-bias notion would seem most applicable to quick, spontaneous judgments (cf., Dovidio and Gaertner, 2004), as opposed to the pro forma nature of late fouls by the trailing team, we excluded the latter.

In contrast to Price and Wolfers (2007), who focused on players, we used fouls (N = 335) as the units of analysis. Table 1 presents a chi-square contingency table organized by race of referee, race of player called for committing a foul, and race of player who was fouled (because the columns combine the race of the player called for the foul with the race of the player who was fouled, the table appears in a two-way [2 X 4] arrangement, instead of a three-way [2 X 2 X 2] set-up). Because the African-American and white referees (in the aggregate) saw the same nine games, with the same players, same match-ups, etc., the null hypothesis would be that white referees’ proportional allocations of fouls into black against black, black against white, white against black, and white against white would be the same as black referees’ allocations into these four categories (for each referee race category, proportions should be read across). Departures from equal-proportional allocation could suggest racial bias (e.g., white referees calling fouls disproportionately on black players or vice-versa).

A chi-square test was nonsignificant, χ2 (3) = 2.58. For three of the player combinations, the white and black referees’ allocations were within .04 of each other. For example, whereas 19 percent of the total fouls called by white officials were of the black-fouling-white type, 15 percent of the total fouls called by black officials were so constituted.

The above 2 X 4 table was also collapsed into 2 X 2 tables, two different ways. Neither an analysis of referee race X race of player called for the foul (collapsing over race of player fouled) nor one of referee race X race of player fouled (collapsing over race of player called for the foul) was significant. The respective χ2 values were 0.57 and 1.85.

Black Player

Fouls

Black Player

Black Player

Fouls

White Player

White Player

Fouls

Black Player

White Player

Fouls

White Player

White Referees 98 (.44) 42 (.19) 64 (.29) 19 (.09)

Black Referees 58 (.52) 17 (.15) 31 (.28) 6 (.05)

Black Player

Fouls

Black Player

Black Player

Fouls

White Player

White Player

Fouls

Black Player

White Player

Fouls

White Player

White Referees 105 (.63) 26 (.16)

25 (.15) 11 (.07)

Black Referees 63 (.65) 16 (.16) 14 (.14) 4 (.04)

Proportions may not add to 1.00 horizontally, due to rounding.

Table 2. Distribution of All Fouls According to Races of Referee, Player Committing Foul, and Player Who Was Fouled (Number and Proportion within Each Referee Race): 2007 & 2008 WNBA Basketball

Proportions may not add to 1.00 horizontally, due to rounding.

Table 1. Distribution of All Fouls According to Races of Referee, Player Committing Foul, and Player Who Was Fouled (Number and Proportion within Each Referee Race): 2007-08 Big 12 Men’s Basketball

Coding followed the same procedures as in Study 1, except this time it was done exclusively by the present author. The WNBA (women’s pro league ) maintains an extensive online archive of full-length game videos (http://www.wnba.com/video/), thus facilitating the research. Seven games were selected, based on racial-ethnic balance in the referees (at least one black and one white official per game) and players, yielding 264 total fouls. These games featured: Los Angeles at Chicago (May 22, 2007); Los Angeles at Sacramento (June 2, 2007); Detroit at Houston (June 2, 2007); Minnesota at Indiana (July 24, 2008); Atlanta at New York (September 5, 2008); Atlanta at Los Angeles (September 11, 2008); and Washington at Connecticut (September 13, 2008). Play-by-play sheets from ESPN.com were used to advance the videos to the times at which fouls occurred. These sheets included the player harmed on each foul, as well as the transgressor, thus allowing the video coding to focus exclusively on the race-ethnicity of the referee calling each foul.

Table 2 lays out the results for Study 2 in the same format as was used for Study 1. This time, the similarity of white and black referees’ proportional allocations of fouls to the four categories (black/white player committing foul X black/white player being fouled) was even more pronounced than in Study 1. Accordingly, the chi-square test was nonsignificant, χ2 (3) = 0.74.

REFERENCES

DISCUSSION

NEW YORK TIMESARTICLE ON

ORIGINALU. PENN-CORNELL

STUDY

The present studies cannot reject the null hypothesis that white referees’ foul-allocation proportions (regarding the races of players committing and being harmed by fouls) would be indistinguishable from those of African-American referees. Several implications for future research on possible referee bias are apparent:

1. Though statistically significant in the Price and Wolfers (2007) study and potentially capable of affecting game outcomes when aggregated over large numbers of contests, the referee-bias phenomenon is likely to be a relatively small effect. Thus, sample sizes much larger than the present ones would be necessary to detect race-based discrepancies. Other analytic differences between Price and Wolfers’s player-in-game focus and the call-focused structure of the present study could, of course, affect the results in additional ways. As one example, Price and Wolfers weighted each of their observations by an athlete’s minutes played in each particular game, whereas the present methodology had the effect of giving greater weight to players who committed more fouls than to those who committed fewer. Though minutes played and number of fouls committed are probably correlated to some degree, the two data-organization schemes may not yield results that dovetail exactly. Hence, as noted above, the present study should be considered a conceptual replication attempt (focusing on relations among the same larger constructs as examined by Price and Wolfers), rather than an exact one.

2. The direct observation of basketball action, should a researcher have the capability to obtain data on large numbers of games, has the potential to enrich the characterization and analysis of foul calls. As long as large numbers of games were available to ensure variation on targeted variables, trained observers could rate features such as the roughness of contact leading to foul calls (e.g., from “very slight” to “very severe”), as well as the locations on the floor where fouls occur (e.g., under the basket, where fierce battles for rebounds can occur), in order to explore additional hypotheses. One such hypothesis, implied by the aforementioned exclusion from the present studies of fouls in the closing minutes designed to put the leading team on the free-throw line, is that officiating bias is most likely to occur on close calls. One could try to use objective criteria for characterizing the nature of fouls, such as whether officials applied the designation of “flagrant foul” to an infraction, or whether players or coaches visibly complained to officials about a foul call (suggestive of a close call). Both of these occurrences appear to be rare, however, limiting their applicability to statistical analysis (players and coaches probably suppress their urge to complain, in order to avoid technical fouls). Given these limitations of objective measures of close calls, researchers would need to develop a subjective rating system that could be used by future investigators.

3. “Non-calls,” where appreciable contact appears to occur between players but no foul is called, can also potentially be indicative of referee bias. To the extent that referees favor players of their own race, such favoritism can be manifested in failure to call a foul on an own-race player when one might have been warranted (this was suggested by Holly Coffee, a student in the class).

4. The present studies have expanded the inquiry beyond the research of Price and Wolfers (2007) by examining college, as well as professional, and women’s, as well as men’s, basketball. Both of the present studies yielded similar null findings. It is interesting to note that, with its summer-based schedule, the WNBA’s season runs about as long as college men’s (and women’s) seasons (roughly 30-35 games), in contrast to the 82-game regular season of the men’s NBA. One possible consequence for foul-calling is that the NBA gives referees much greater opportunity than the other leagues under discussion to gain familiarity with the players and to develop impressions of their playing styles; whether this would increase or reduce bias is unclear, however.
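
The sample-size argument in point 1 can be made concrete with a rough power calculation. The sketch below is an illustration under stated assumptions rather than an analysis from the present studies: it uses Cohen's effect-size index w for chi-square tests (a convention not used in the original text), and a noncentrality value of 10.90, taken from Cohen's (1988) power tables, for 80% power at alpha = .05 with 3 degrees of freedom. The function names are hypothetical.

```python
import math

# ASSUMPTION: noncentrality parameter yielding 80% power at alpha = .05 for a
# chi-square test with 3 degrees of freedom (value from Cohen's 1988 tables).
LAMBDA_DF3_POWER80 = 10.90


def required_n(w, lam=LAMBDA_DF3_POWER80):
    """Approximate number of fouls needed to detect an effect of size w."""
    return math.ceil(lam / w ** 2)


def detectable_w(n, lam=LAMBDA_DF3_POWER80):
    """Smallest effect size w detectable with 80% power from n fouls."""
    return math.sqrt(lam / n)


# Cohen's conventional benchmarks: w = .10 (small), .30 (medium), .50 (large)
for w in (0.10, 0.30, 0.50):
    print(f"w = {w:.2f}: about {required_n(w)} fouls needed")

# Sensitivity of a sample the size of Study 2 (264 fouls)
print(f"n = 264: detectable w is about {detectable_w(264):.2f}")
```

Under these assumptions, a sample of 264 fouls would reliably detect only effects of roughly w = .20 or larger, whereas an effect at Cohen's "small" benchmark (w = .10) would call for roughly 1,090 coded fouls.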

In conclusion, the present studies failed to replicate the original Price and Wolfers (2007) finding of own-race bias in basketball officiating. By introducing a different method, however, we hope to spur further replication attempts and consideration of additional factors that may affect referee behavior.

REFERENCES

Anderson, C. A. (1989). Temperature and aggression: Ubiquitous effects of heat on occurrence of human violence. Psychological Bulletin, 106, 74-96.

Dovidio, J. F., & Gaertner, S. L. (2004). Aversive racism. In M. P. Zanna (Ed.), Advances in experimental social psychology (pp. 1-52). San Diego, CA: Academic Press.

FIBA (International Basketball Federation) (2006). Official basketball rules 2006, referees’ manual, three-person officiating. Available at: http://www.fibaasia.net/cms/pdf/3%20person%20manual.pdf

Larsen, T., Price, J., & Wolfers, J. (2008). Racial bias in the NBA: Implications in betting markets. Journal of Quantitative Analysis in Sports, 4. Available at: http://www.bepress.com/jqas/vol4/iss2/7

Price, J., & Wolfers, J. (2007). Racial discrimination among NBA referees. Working Paper, National Bureau of Economic Research, Cambridge, MA.