

MANAGERIAL AND DECISION ECONOMICS

Manage. Decis. Econ. 32: 193–204 (2011)

Published online 3 February 2011 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/mde.1526

The Speed of Human Capital Formation in the Baseball Industry: The Information Value of Minor-League Performance in Predicting Major-League Performance

Neil Longley* and Glenn Wong

Isenberg School of Management, University of Massachusetts, Amherst, MA, USA

Using a data set of well over 1200 different pitchers covering an almost 20-year time period, this paper reveals that the process of human capital formation for professional baseball pitchers is relatively slow, rendering minor league statistics to be of limited value when projecting major league performance. This indicates that a considerable amount of the performance differences across pitchers at the major league level are revealed only after they reach the majors, and hence is unforeseen given their minor league statistics. These findings illustrate just how difficult it is for all organizations to predict the future success of their apprentice-level employees. Even in an industry such as baseball—where employee output is easily measurable and highly quantifiable, and where the nature of the work at the developmental level is identical to that at the advanced level (i.e. pitching a baseball)—apprentice-level performance only provides modest insights into how that employee will ultimately perform at the advanced level. Thus, firms that erroneously overestimate the importance of apprentice-level performance are at risk of making systematic errors in personnel decisions. Copyright © 2011 John Wiley & Sons, Ltd.

1. INTRODUCTION

In occupations that require high levels of skill and/or ability, employees face a substantial learning curve—it may take many years for them to reach their full potential. In these occupations, the human capital formation process is relatively slow, with employees going through a long ‘developmental’ or ‘apprenticeship’ period. Some of these individuals will ultimately turn out to be high performers at more senior levels in the organization, while others will not. Since an employee’s true productivity will not reveal itself until some point in the future, employers have an incentive to attempt to predict such future outcomes based on current information—they have an interest in predicting which current and prospective employees will eventually become ‘stars’. As such, those firms that are better able to correctly forecast the future productivity of current and prospective employees will have a competitive advantage in the marketplace.

However, the problem for employers is that there are many scenarios that can complicate this forecasting process. For example, employees at an apprenticeship level are performing less complex tasks than what they will perform later in their careers, and hence it may be difficult to discern major performance differences amongst them, making prediction of future performance difficult. Sometimes, true differences across employees are not revealed until later in their development process, when tasks and expectations become higher. Or, it may be the case that some employees are high performers at lower levels, but this does not ultimately translate into high performance at more senior levels—the employees ‘peak-out’ early in their career. Conversely, some employees may be ‘slow starters’—i.e. early in their careers their performance lags behind their peers—but these employees grow and mature in a way that eventually leads them to outpace others. For employers, the risk here involves discarding these employees too quickly, thus giving up on an employee who would have ultimately been successful.

*Correspondence to: Isenberg School of Management, University of Massachusetts, Amherst, MA 01003, USA. E-mail: [email protected]

The issue, then, is that employers have an economic interest in identifying reliable cues, or signals, that will allow them to predict early in an employee’s career whether or not that employee will ultimately turn out to be a high performer. This early identification is important—the longer it takes to reveal who will ultimately be the highest performers, the costlier it is for employers. In some cases, unnecessary development costs are incurred—i.e. resources might be used to further develop employees who have little chance of being successful—and in other cases, the employers may undervalue an employee’s potential, and thus risk losing that employee to competitor employers who are better at assessing such things.

However, the reality is that, in most organizations, such notions are extremely difficult to implement. Practical problems abound. First, even measuring employee performance can be highly subjective—many organizations may value, for example, such attributes as ‘creativity’ or ‘analytical ability’, but accurately and objectively measuring such factors is often extremely difficult, if not impossible. This may be particularly true at the apprenticeship levels, where there may be limited opportunity and context within which one can measure such high-level skills. Second, even if measurable, one must understand, and be able to measure, the exact manner in which these individual attributes ultimately affect the organization’s success, assuming the latter is, in itself, objectively measurable, something which is often not the case. In other words, if an organization values, say, both creativity and analytical ability, it presumably needs to know precisely how each affects organizational success, so it is able to properly trade off one skill against the other. If, for example, an apprentice employee has greater potential with regards to creativity rather than analytical ability, how does one compare this to another employee whose analytical potential is greater than his/her creativity potential?

There is, however, one type of industry—the professional sport industry—where such practical problems are substantially reduced. The professional sport industry is characterized by a simplicity and transparency that allows one to empirically investigate such issues—not only can both individual performance and firm (team) output be objectively measured, but the exact relationship between the two can be calculated. Furthermore, to the outside researcher, the professional sport industry offers an environment where all these data are publicly available and easily accessible.

Within this context, this paper examines pitchers in Major League Baseball (MLB), and analyzes the extent to which a player’s performance in ‘the majors’ (at the MLB level) is predictable from his performance in the developmental (i.e. minor) leagues. MLB’s three primary developmental levels—A, AA, and AAA—are progressive, in the sense that players typically move from A to AA to AAA as their skills advance. Using a comprehensive data set spanning almost 20 years, the paper is able to trace how a player’s performance at each of these developmental levels corresponds to his ultimate performance as a major league baseball player. It finds, as one might expect, that the quality of the ‘signal’ increases as one moves to progressively higher levels of minor league baseball. Thus, a player’s AAA performance is generally a better predictor of major league success than is his AA performance, which in turn is a better predictor of major league success than is his performance at the A level. However, the paper also finds some notable exceptions to this rule, and identifies certain player performance measures that are fully revealed as early as A-level baseball.

Overall, however, while the paper does find that minor league performance has some ability to explain major league success, there is still a relatively large variation across pitchers in their major league performance that is unexplainable by their corresponding minor league statistics. This indicates that a considerable amount of performance differences across pitchers is revealed only after they reach the majors, and hence is unforeseen given their minor league statistics. This slow rate of human capital formation in professional baseball implies that MLB general managers (GMs) must use minor league statistics with caution, and an overreliance on these statistics, in the belief that they accurately foretell future MLB performance, may lead to incorrect personnel decisions.

2. RESEARCH PATTERNS IN SPORT LABOR MARKETS

Kahn (2000) describes the sport industry as a labor market laboratory—its appeal to researchers is due to the fact that both individual and firm output can be objectively measured, and that such data are publicly available. Much of this research, dating back to the seminal work of Scully (1974), has concentrated on player pay, and has examined the effects on salaries of such institutional arrangements as free agency, salary caps, arbitration, etc. Given the monopsonistic nature of sport labor markets, an overriding theme of much of the literature has been to measure the extent to which various institutional arrangements allow players to capture (in the form of salaries) the full economic value of their playing services, as measured by the player’s marginal revenue product (MRP). Thus, it has been the nature of the management/player relationship—and the wealth distribution between the two—that has guided much of the economics literature on sport labor markets.

By contrast, there has been somewhat less research analyzing the specific labor market choices of individual teams within the market, relative to other teams. This type of research often pertains to issues of market efficiency—i.e. are assets (players) correctly priced in the marketplace, or do team managers make systematic errors in the judgment of player talent? Spurr (2000), for example, in a study examining the player draft in baseball, found that there were no statistically significant differences across teams in their ability to find talent, a result consistent with the functioning of efficient labor markets.

However, the publication of Michael Lewis’s popular book Moneyball (2003) generated increased interest, both popular and academic, in the overall issue. Lewis essentially argued that there were inefficiencies in the baseball players’ labor market, in that certain player attributes—such as, for example, on-base percentage—were ‘undervalued’, in the sense that their importance to team performance exceeded the extent to which they were compensated in the market. The implication of such an assertion is that GMs, as a whole, were not correctly using all the possible information available to them, and were thus making systematic errors in evaluating talent. Lewis argued that a GM like Billy Beane of the Oakland A’s discovered some of these inefficiencies, largely through the use of statistical analysis, and exploited them by acquiring undervalued players, thus allowing him to construct a better team for a given level of payroll.

With the publication of Moneyball, economists began to more extensively examine such issues. Hakes and Sauer (2006) found general support for the Moneyball hypothesis, in that on-base percentage was undervalued for a period of time, but that such inefficiencies ultimately corrected themselves. Bradbury (2007) focused on pitchers, and using ‘defensive-independent’ pitching statistics as measures of pitcher performance found the market to have correctly priced such performance attributes.

A theme of Lewis’s argument in Moneyball is that labor market inefficiencies were exploited by Billy Beane through the use of quantitative analysis of players’ statistics, rather than through the more traditional approach employed in baseball, where the subjective, qualitative opinions of the team’s scouting staff were heavily relied upon. Lewis asserts that the subjective methods are prone to various observation errors, ultimately resulting in systematically incorrect assessments of certain players.

However, an implicit underlying assumption of the statistical-based approach is that a player’s past performance, as evidenced by his statistical measures, is an effective tool to predict his future performance. This paper provides a test of this assumption by examining the relationship between a player’s performance in the major leagues, versus his performance in the minor leagues. Clearly, the stronger this connection between minor league and major league performance, the greater the value that one can place on minor league performance, and the more comfortable GMs should be in making personnel decisions based on a player’s minor league performance. There are certainly some individuals associated with the game who feel the nature of this relationship is quite strong—‘sabermetrics’ pioneer, and current Boston Red Sox consultant, Bill James has been quoted as saying that he is ‘adamant that minor league performance can be used to project big league performance’ (Boston Globe, November 11, 2007, p. D13). However, while James is an icon within the baseball analytics community, such views have never been subjected to rigorous academic scrutiny. As Bradbury (2007) notes, the analytics/sabermetrics community and the academic community have generally had little crossover, and thus many of the arguments of the sabermetrics community have not necessarily been subjected to rigorous empirical testing, nor have they been subjected to formal peer review.

Before proceeding to the empirical test, the following section provides a brief background on the nature of minor league baseball.

3. BASEBALL’S MINOR LEAGUES

Major League Baseball relies significantly on its player development system, especially when compared to other major professional team sports in North America, with almost all major league players having spent time in the minor leagues. In 1901, the existing minor leagues formed the National Association of Professional Baseball in an effort to preserve their independence and provide a collective voice to negotiate with the American and National Leagues. This association culminated with the signing of the ‘National Agreement for the Government of Professional Baseball Clubs’ in 1903. In 1962, MLB began using a Standardized Player Development Contract (PDC). While clubs previously had informal relationships with minor league teams, the new PDC contracts mandated that major league clubs pay a portion of the minor league players’ salaries, provide travel and per diem costs, and share television revenue with their minor league affiliates (Zimbalist, 1992). Today, with nearly 200 affiliated minor league teams and more than 4600 players spread throughout the United States and Canada, the use of PDCs has allowed major league teams to maintain control of their prospects while allowing them to progress through the minor leagues, and ultimately contribute at the major league level.

The current minor league system, established in 1963, is composed of four distinct classifications: Class AAA, Class AA, Class A, and Rookie, each having a varying level of talent, ranging from ‘major league ready’ in Class AAA to recently signed high school draft picks in the Rookie level. As of 2009, there were 187 affiliated teams playing in 20 different leagues throughout the United States and Canada.

In 2009, roughly half of the 1521 players drafted were high school seniors. For most of them, their path to the major leagues will begin in the lowest level of the minor leagues, Rookie Ball. There are two leagues at this level, the Gulf Coast League and the Arizona League. Games are played daily in the afternoon, usually at the major league club’s spring training complex, and usually with very few fans in attendance. Following a year or less in Rookie Ball, prospects are normally promoted to ‘Short Season’ A, where they are joined by many of the college players drafted (players drafted out of college tend to develop at a faster rate than players drafted out of high school). The two leagues at this classification are the New York-Penn and Northwest Leagues. To accommodate the recently drafted collegiate players, these leagues do not begin play until after the completion of the NCAA College World Series. Players who show potential rarely spend more than one year at this level before being promoted to regular Class A.

Upon promotion to Class A, players first encounter the rigors of an April–September schedule, playing in one of the two further sub-classifications within Class A: Class A-Advanced and Class A. At this level, players have usually been in the minor leagues for two to three years, with the exception of a team’s high first round draft picks.

For many minor leaguers, promotion to Class AA means that an organization is aware of the player’s progress and views the player as a potential major leaguer. In Class AA, the games traditionally draw moderate-sized crowds, and players begin to compete against a higher level of talent than exists in the lower classifications. Players with major league talent will usually spend an additional two to three years at this level; however, teams routinely will call up a Class AA pitcher for a ‘spot start’ with the major league club due to an injury, or to give their pitching rotation extra rest. If a player continues to excel during his Class AA career, a promotion to Class AAA is usually forthcoming.

Class AAA, or ‘Triple A’ as it is commonly called, is the pinnacle of baseball’s minor leagues, with teams playing in large cities that are not hosts to an MLB team. Whereas travel in the rest of the minor leagues is usually done by bus, Class AAA teams routinely travel via air, and it is the closest experience to that of an actual major league club. Games at this level are highly competitive and have a large fan following; average attendance in 2009 was 6753. A team’s roster in Class AAA traditionally includes up to 15 members of the major league club’s ‘40 Man Roster’ who are not on the ‘25 Man Active Roster’ (i.e. the major league roster), as well as several players who are almost ready for a chance at ‘big league’ success. On average, a position player will spend between four and six years playing in the minor leagues, while pitchers tend to spend three to five years before playing in the major leagues full-time.

While a player’s progression through the minor leagues is traditionally one-way, there are exceptions. If a major league player is placed on the disabled list, he will often play at least one game in Class AAA to get re-acclimated before returning to the majors. This is especially true for pitchers, who may have two or three starts in the minor leagues before being activated to the major league roster. More recently, players who have been suspended for the use of performance enhancing substances, such as Los Angeles Dodgers left fielder Manny Ramirez, have made several appearances in the minor leagues to prepare for their return. In addition, players toward the end of their careers will often sign minor league contracts with major league teams in hope of performing well enough to be added to the active roster. For example, Paul Byrd, a pitcher, was signed by the Red Sox in July 2009 and pitched in four minor league games prior to starting seven games for Boston.

4. METHODS AND BASE MODEL RESULTS

4.1. The Focus Variable: Major League Performance

The primary goal of the paper is to determine the extent to which variations across pitchers in their major league performance are traceable to corresponding variations in their minor league statistics.

A first step in the process is to specify the manner in which the pitcher’s major league performance will be measured. Unfortunately, there is no single, universally agreed-upon, measure of pitcher performance. Traditionally, however, earned run average (ERA) has probably been the single most widely recognized summary statistic of pitching quality, whether it be by the sports media, fans, or even many of those within the baseball industry. This acceptance of ERA has carried over to the academic literature. For example, Zimbalist (1992), Kahn (1993), and Krautman et al. (2003) all use ERA as a measure of pitcher quality.

However, the use of ERA is not without its critics. Those in the sabermetrics community have long been critical of ERA, contending that it is contaminated by the quality of defense on a pitcher’s team, and hence is not a pure measure of pitcher ability. This criticism has led to the creation of various ‘defensive independent pitching statistics’ (DIPS), all of which are intended to control for differences across pitchers in their level of defensive support. Bradbury (2007) has extended this debate into the academic literature, and uses DIPS in his analysis.

While this debate as to the relative efficacy of alternative measures of pitcher performance is no doubt important, it is not our intent in this paper to enter into this particular discussion. Thus, we proceed in the paper using ERA as the measure of pitcher performance, largely because of its historical preeminence, but at the same time we acknowledge its potential weaknesses.

Given our very large sample size (over 1500 pitchers, to be discussed later), we do not believe that this measurement issue will materially affect our results. Unless defensive support varies systematically across pitchers (i.e. it is not random, so that some pitchers are more likely than others to be affected by it over their careers), its impact across a pitcher’s entire career should be negligible. Given that the number of innings-pitched is very large for most pitchers in our sample, and given that most have played for many teams and have had many different teammates, it is reasonable to presume that any defensive factors would ultimately have little or no aggregate impact over time.

4.2. The Production Function: Explaining Major League ERA

With ERA adopted as the sole measure of pitcher ‘output’ (i.e. performance), the next step is to specify the various ‘inputs’ that determine such output. Unlike ERA, which measures an end-result—i.e. the number of ‘earned’ runs scored against a pitcher per nine innings—the input variables relate more to the micro-level aspects of the game, and more precisely analyze the results of specific pitcher-batter confrontations. In essence, there can only be two outcomes of the pitcher-batter confrontation: either the batter gets safely on-base, or he does not. Furthermore, there are three ways to get on-base—by getting a hit (either a single, double, triple, or home-run), by getting walked (i.e. receiving a base-on-balls), or by getting hit by a pitch. Since the latter is relatively rare, a pitcher’s hits and base-on-balls totals are the two primary determinants of his ultimate success. In this context, then, the following production function is employed:

$$\mathrm{ERA}_{\mathrm{MLB}} = f\left(H_{\mathrm{MLB}},\ BB_{\mathrm{MLB}},\ K_{\mathrm{MLB}}\right) \tag{1}$$

where ERAMLB is the pitcher’s ERA at the major league level; HMLB is the pitcher’s hits-allowed per nine innings-pitched at the major league level; BBMLB is the pitcher’s base-on-balls per nine innings-pitched at the major league level; and KMLB is the pitcher’s strikeouts per nine innings-pitched at the major league level. The latter variable pertains to how the pitcher got the batter out—and measures outs made via strikeouts, as opposed to outs made after the batter puts the ball in play. There has long been a belief in baseball that ‘strikeout pitchers’ are preferred, not only because they may be more dominant and intimidating, but because they are less dependent on luck (for example, softly hit balls dropping in for a hit), and because baserunners are less likely to advance on a strikeout compared to outs made with balls put in play.
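As a minimal illustration of how the per-nine-innings measures in Equation (1) are formed, the following Python sketch scales raw career counts by innings pitched. The function name `per_nine` and the sample career line are hypothetical and are not drawn from the paper's data; innings are treated as ordinary decimals for simplicity.

```python
def per_nine(count: float, innings_pitched: float) -> float:
    """Scale a raw count to a per-nine-innings rate."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * count / innings_pitched


# Hypothetical career line (illustrative numbers, not from the paper's data).
innings, earned_runs, hits, walks, strikeouts = 180.0, 80, 170, 60, 150

era = per_nine(earned_runs, innings)   # earned runs per nine innings (ERA)
h9 = per_nine(hits, innings)           # hits allowed per nine innings
bb9 = per_nine(walks, innings)         # base-on-balls per nine innings
k9 = per_nine(strikeouts, innings)     # strikeouts per nine innings

print(era, h9, bb9, k9)
```

Because every measure is a ratio on the same innings-pitched base, pitchers with very different workloads can be compared directly.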

The regression in Equation (1) is run using data on 1577 different pitchers (all data obtained from Stats Inc.) who pitched at least 20 innings in the major leagues between 1986 and 2004. The results are provided in Table 1.
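The paper does not describe the software used to estimate Equation (1). One way such a regression might be run is sketched below in Python, assuming a linear specification with an intercept and ordinary least squares solved through the normal equations. The helper names `ols` and `r_squared` are hypothetical, and the six pitcher records are invented purely for illustration.

```python
def ols(X, y):
    """OLS via the normal equations (X'X) b = X'y.

    X is a list of rows without an intercept column;
    returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for idx in range(k):
            if idx != col:
                f = A[idx][col]
                A[idx] = [a - f * b for a, b in zip(A[idx], A[col])]
    return [A[i][k] for i in range(k)]


def r_squared(X, y, beta):
    """Share of the variation in y explained by the fitted values."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical pitchers: (hits, walks, strikeouts) per nine innings, and ERA.
X = [(8.2, 2.5, 7.9), (9.1, 3.4, 6.0), (9.8, 4.1, 5.5),
     (8.7, 3.0, 6.8), (10.2, 4.6, 6.3), (9.4, 3.8, 7.1)]
era = [3.4, 4.3, 5.1, 3.9, 5.6, 4.6]

beta = ols(X, era)
r2 = r_squared(X, era, beta)
print("intercept, H, BB, K coefficients:", beta)
print("R-squared:", r2)
```

With only six invented observations the fit is not meaningful in itself; the sketch is meant only to show the mechanics behind a table of coefficients and an R² value.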

Table 1 shows that the overall explanatory power of Equation (1) is relatively high, with the three independent variables explaining 73% of the variation in ERAs. All three independent variables are highly significant, and all three have positive coefficients. The positive signs on BBMLB and HMLB are as expected—pitchers that allow more walks and hits will tend to have higher ERAs.
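To make the magnitudes concrete, the short Python sketch below applies the unstandardized slope coefficients reported in Table 1 (0.04 for KMLB, 0.41 for BBMLB, 0.70 for HMLB) to a change in a pitcher's per-nine-innings rates. It assumes the linear form implied by the table and deliberately leaves the constant aside, since only differences are computed; the function name `delta_era` is hypothetical.

```python
# Unstandardized slope coefficients as reported in Table 1.
COEF = {"K": 0.04, "BB": 0.41, "H": 0.70}


def delta_era(d_k: float = 0.0, d_bb: float = 0.0, d_h: float = 0.0) -> float:
    """Predicted change in ERA for given changes in the per-nine rates."""
    return COEF["K"] * d_k + COEF["BB"] * d_bb + COEF["H"] * d_h


# One fewer walk and one fewer hit per nine innings, strikeouts unchanged.
change = delta_era(d_bb=-1.0, d_h=-1.0)
print(round(change, 2))
```

Under these assumptions, allowing one fewer walk and one fewer hit per nine innings lowers predicted ERA by 0.41 + 0.70 = 1.11 runs.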

The positive sign on KMLB indicates that, for any given number of hits and base-on-balls that a pitcher allows, pitchers that record a greater proportion of their outs by strikeouts, as opposed to outs on balls put ‘in play’, will have higher ERAs. At first, this may seem somewhat counterintuitive and contrary to conventional wisdom, where the media, for example, closely tracks ‘strikeout leaders’ as a statistical category, and where ‘strikeout pitchers’ are portrayed as particularly valuable. However, there may be a couple of explanations for the positive sign on KMLB. First, while our model controls for ‘hits’, the variable HMLB measures all types of hits (singles, doubles, triples, and home runs). It may be the case that ‘strikeout pitchers’ are more likely to give up certain types of hits—namely, home runs—than are other pitchers (data limitations prevented us from obtaining comprehensive minor-league home-run information for all pitchers in the sample). Strikeout pitchers tend to throw at a high velocity, and high velocity pitches will be hit further if the batter is able to make contact. In fact, conventional wisdom in baseball does suggest that strikeout pitchers are generally more prone to ‘giving up the long ball’. Thus, KMLB may be capturing two opposing effects. First, strikeout pitchers are valuable in that they record outs without balls being put in play, thus minimizing a pitcher’s reliance on the quality of his defenders, and also preventing the batter from advancing a runner through a sacrifice. However, this effect is fully or partially offset by the fact that strikeout pitchers are more likely to allow more damaging types of hits.

Table 1. Importance of Skill Variables to Major League ERA

Dependent variable: major league ERA

| Independent variable | Unstandardized coefficient (standardized coefficient in brackets) | t-Statistic |
|---|---|---|
| Constant | −0.42 | −21.7 |
| KMLB | 0.04 (0.05) | 3.49 |
| BBMLB | 0.41 (0.41) | 30.03 |
| HMLB | 0.70 (0.76) | 51.52 |

R² = 0.73; N = 1577.

A second possible explanation for the positive sign on KMLB is that the variable measures strikeouts per nine innings, not the more commonly reported media statistic of total strikeouts. The fact that the best pitchers in any given season are also observed to have high levels of total strikeouts in that season may mask the fact that these pitchers also have high levels of non-strikeout ‘outs’. Since total strikeouts are closely tracked in the media, but total non-strikeout ‘outs’ are not, there may be a misperception that strikeouts (as opposed to non-strikeout outs) are more valuable to pitcher success than they actually are. As an example of the latter issue, of the six leading career (total) strikeout leaders in our dataset (Nolan Ryan, Roger Clemens, Randy Johnson, Bert Blyleven, Greg Maddux, and Frank Tanana), only one (Randy Johnson) is not also on the top-6 list of career non-strikeout ‘outs’. In fact, in our dataset as a whole (N = 1577 pitchers), the correlation coefficient between total strikeouts and total non-strikeout ‘outs’ was a very high 0.93. Thus, good pitchers, by definition, get a lot of outs, both strikeout and non-strikeout.
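The correlation described here can be sketched in Python as follows. The sketch assumes that non-strikeout outs are computed as three outs per inning pitched minus strikeouts, which the paper does not state explicitly; the function names and the five career lines are hypothetical.

```python
from math import sqrt


def non_strikeout_outs(innings_pitched: float, strikeouts: float) -> float:
    """Outs recorded other than by strikeout, taking three outs per inning."""
    return 3 * innings_pitched - strikeouts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical careers: (innings pitched, total strikeouts).
careers = [(2500, 2000), (1200, 900), (300, 250), (800, 500), (60, 40)]

total_k = [k for _, k in careers]
other_outs = [non_strikeout_outs(ip, k) for ip, k in careers]
r = pearson(total_k, other_outs)
print(r)
```

Both totals grow with innings pitched, which is why a high correlation between them says little about whether a strikeout is worth more than any other out.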

However, what we measure with KMLB is not total strikeouts, but strikeouts per nine innings (i.e. per 27 outs). Thus, the issue is not whether getting outs is good for a pitcher—we clearly know it is—but whether getting outs via strikeouts is superior to getting outs via other means. To further examine this, we again go back to our dataset and rank all 1577 pitchers by their career innings-pitched. We then divide these players into groups of 200, and measure how each group’s statistics change with respect to our three skill variables—strikeouts per nine innings (KMLB), base-on-balls per nine innings (BBMLB), and hits per nine innings (HMLB). The results are shown in Table 2, and indicate that both BBMLB and HMLB behave as expected—i.e. they both steadily increase as one moves to successively lower-ranked groups of pitchers. However, for KMLB, no such pattern is found—in fact, the top-200 group of pitchers actually averages fewer strikeouts per nine innings than any other group in the dataset, albeit the differences across all groups tend to be very small. Thus, while a further analysis of this issue is beyond the scope of this paper, and is thus left for future research, these findings do indicate the possibility, contrary to conventional wisdom, that strikeouts are no more important than non-strikeout ‘outs’ to a pitcher’s overall success.
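The ranking-and-grouping step can be sketched in Python as below. This is only an illustration under stated assumptions: pitchers are ranked by career innings in descending order, each group statistic is a simple (unweighted) mean across the pitchers in the group, and the record keys `ip`, `k9`, `bb9`, and `h9` are hypothetical names. The paper does not specify how the group averages were computed.

```python
def group_means(pitchers, group_size=200):
    """Rank pitchers by career innings and average skill rates per group.

    Returns a list of (rank_label, {"k9": ..., "bb9": ..., "h9": ...}).
    The final group may be smaller than group_size.
    """
    ranked = sorted(pitchers, key=lambda p: p["ip"], reverse=True)
    result = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start:start + group_size]
        label = f"{start + 1}-{start + len(chunk)}"
        means = {key: sum(p[key] for p in chunk) / len(chunk)
                 for key in ("k9", "bb9", "h9")}
        result.append((label, means))
    return result


# Hypothetical records, grouped in pairs to keep the example small.
sample = [
    {"ip": 100, "k9": 6.0, "bb9": 3.0, "h9": 9.0},
    {"ip": 500, "k9": 5.0, "bb9": 2.0, "h9": 8.0},
    {"ip": 300, "k9": 7.0, "bb9": 4.0, "h9": 10.0},
    {"ip": 50, "k9": 6.0, "bb9": 5.0, "h9": 11.0},
    {"ip": 400, "k9": 7.0, "bb9": 2.0, "h9": 8.0},
]
groups = group_means(sample, group_size=2)
for label, means in groups:
    print(label, means)
```

With 1577 pitchers and the default group size of 200, the same function would produce seven full groups and a final group of 177, matching the row structure of Table 2.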

4.3. Minor League Performance

The second step in the process is to determine the extent to which the input variables at the major league level (i.e. KMLB, BBMLB, and HMLB) are explainable by the corresponding input variables at the minor league level. If, for example, pitchers with high BBMLB values also tended to have given up a high number of base-on-balls in the minors, and vice versa, then a pitcher’s minor league base-on-ball totals might provide an effective early signal as to the pitcher’s expected future ERA in the major leagues.

To begin with, a pitcher’s minor league performance will be considered in aggregated form; that is, his performance at the AAA, AA, and A levels will be consolidated. Aggregation has both advantages and disadvantages. The advantage is that it increases the number of ‘innings-pitched’ for every pitcher: since all input measures are ratios stated on an ‘innings-pitched’ basis, consolidating the data automatically increases the number of innings-pitched and, in essence, gives us more ‘observations’ on the pitcher’s performance. The disadvantage of aggregating is that it masks potential differences in effects across the three levels. As such, the consolidation restriction will be relaxed in the following section, and minor league performance will be disaggregated into the three levels of minor league play.
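One plausible reading of ‘consolidated’ is that raw counts and innings are pooled across the three levels before a single per-nine rate is formed. The text does not spell this out, so the sketch below is an assumption, as are the field names and the toy numbers. It also shows why pooling differs from simply averaging the three level rates.

```python
def pooled_rate_per_nine(lines, stat):
    """Pool raw counts and innings across levels, then form one rate.
    ASSUMPTION: this is how the minor league levels are 'consolidated'."""
    total_count = sum(line[stat] for line in lines)
    total_innings = sum(line["ip"] for line in lines)
    return 9.0 * total_count / total_innings

# Hypothetical minor league record for one pitcher.
minor_lines = [
    {"level": "A", "ip": 120.0, "bb": 60},
    {"level": "AA", "ip": 40.0, "bb": 10},
    {"level": "AAA", "ip": 20.0, "bb": 10},
]

bb_min = pooled_rate_per_nine(minor_lines, "bb")

# For contrast: an unweighted average of the three level rates, which gives
# the 20 AAA innings the same weight as the 120 A-level innings.
naive = sum(9.0 * l["bb"] / l["ip"] for l in minor_lines) / len(minor_lines)

print(bb_min, naive)
```

Pooling weights each level by its innings, which is consistent with the stated advantage that aggregation yields more ‘observations’ per pitcher.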

Using, then, the consolidated minor league statistics, a separate regression was run for each of our three input variables. In each case, a player’s MLB statistic (for example, KMLB) was regressed on his minor league performance for the same variable (KMIN). The results are given in Table 3.

The results show a considerable difference across variables in terms of the ability of minor league performance to predict major league performance. K has the greatest explanatory power, with 37% of the variation in MLB performance across pitchers being explained by the corresponding variation at the minor league level. BB has the second greatest explanatory power, with an R2 of 0.26, while H has the least explanatory power with an R2 of 0.12.

Table 2. Ranking of Pitcher Quality Relative to Skill Variables

Rank: career innings-pitched    KMLB    BBMLB    HMLB
1–200                           5.85    3.06     8.91
201–400                         6.21    3.33     9.00
401–600                         6.30    3.60     9.00
601–800                         6.21    3.87     9.27
801–1000                        6.03    4.05     9.45
1001–1200                       6.12    4.23     9.63
1201–1400                       5.94    4.59     9.90
1401–1577                       5.94    4.95    10.26
Average                         6.12    3.96     9.45

Table 3. Relationship of Minor League to Major League Performance for Various Skill Variables

                    Dependent variable
            KMLB            BBMLB           HMLB
Constant    1.31 (8.10)     1.73 (17.63)    4.75 (15.04)
KMIN        0.65 (30.11)
BBMIN                       0.65 (23.66)
HMIN                                        0.55 (14.91)
R2          0.37            0.26            0.12
N           1577            1577            1577

t-Stats in parentheses.

The results are not necessarily surprising. For example, the K statistic may depend more upon a pitcher’s natural physical ability (some are ‘power’ pitchers, while others are not), and hence one would not expect this to vary significantly between the majors and minors. Conversely, statistics like BB and H may depend more greatly on finesse and experience, and thus one could expect less predictability. There could also, of course, be other reasons for the lower R2 for BB and H. For example, with BB, there may be greater inconsistency in umpires calling balls and strikes in the minor leagues, compared to the majors; along a similar line for H, there may be greater variation in the minors than in the majors in both defensive ability and in the propensity of official scorers to call errors. For each of the three regressions, then, the following is true:

$$K_{MLB} = K^{P}_{MLB} + K^{R}_{MLB},$$

$$BB_{MLB} = BB^{P}_{MLB} + BB^{R}_{MLB},$$

$$H_{MLB} = H^{P}_{MLB} + H^{R}_{MLB}$$

where KMLB is a pitcher’s actual number of strikeouts per nine innings at the major league level, $K^{P}_{MLB}$ is the pitcher’s predicted (i.e. fitted) number of major league strikeouts, based on his minor league strikeouts, and $K^{R}_{MLB}$ is the residual from the regression. The equations for BBMLB and HMLB are interpreted similarly. The residual can be interpreted as that portion of a pitcher’s major league strikeouts (or BBs, or hits) which was not predictable from his minor league performance. In essence, the residual can be viewed as information about a pitcher that is only revealed at the major league level, and that therefore was not observable at the minor league level.
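The split of an actual major league rate into a predicted part and a residual can be illustrated with a small first-step regression. This is a minimal sketch, assuming a simple one-regressor least-squares fit; the helper names and the five toy observations are hypothetical and are not taken from the dataset.

```python
from statistics import mean

def ols_simple(x, y):
    """Least-squares intercept and slope for y = c + b * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical strikeouts per nine innings for five pitchers.
k_min = [5.0, 6.0, 7.0, 8.0, 9.0]   # minor league rate (regressor)
k_mlb = [4.5, 5.5, 5.0, 6.5, 7.0]   # major league rate (dependent variable)

intercept, slope = ols_simple(k_min, k_mlb)

# Predicted (fitted) component and residual component for each pitcher.
predicted = [intercept + slope * x for x in k_min]
residual = [y - p for y, p in zip(k_mlb, predicted)]

for y, p, r in zip(k_mlb, predicted, residual):
    print(f"actual={y:.2f} predicted={p:.2f} residual={r:+.2f}")
```

By construction the residuals sum to zero and are uncorrelated with the minor league rate, which is what allows them to be read as information not contained in the minor league record. Using the Table 3 estimates, a pitcher with 8.0 strikeouts per nine innings in the minors would have a predicted major league rate of 1.31 + 0.65 × 8.0 = 6.51.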

By substituting these predicted and residual values for their actual major league totals, Equation (1) can be rewritten as follows (see Note 1):

$$ERA_{MLB} = f(K^{P}_{MLB}, K^{R}_{MLB}, BB^{P}_{MLB}, BB^{R}_{MLB}, H^{P}_{MLB}, H^{R}_{MLB}) \qquad (2)$$

In order for the regression results to be more easily interpretable, the three ‘predicted’ variables in Equation (2) are transformed so that they are stated in units of ‘minor league performance’ rather than ‘major league performance’. With the transformation, $K^{P}_{MLB}$, for example, is converted to its corresponding minor league figure, by noting that:

$$K^{P}_{MLB} = C + B_{1} K_{MIN}$$

where C is a constant, B1 is the regression coefficient, and KMIN is the actual number of strikeouts per nine innings that the pitcher records in the minor leagues. Thus, KMIN is simply a linear transformation of $K^{P}_{MLB}$, and will vary in the same manner as $K^{P}_{MLB}$. Using KMIN in the regression instead of $K^{P}_{MLB}$ allows one to measure variability across pitchers in terms of minor league strikeouts, instead of the corresponding, but less intuitively appealing, ‘predicted’ (from minor league performance) major league strikeouts.
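Why this substitution is harmless can be seen with a short derivation. It is a sketch that assumes the function f in Equation (2) is linear; the symbols a and b, and the ellipsis standing for the remaining regressors, are introduced here for illustration only.

```latex
\begin{aligned}
ERA_{MLB} &= a + b\,K^{P}_{MLB} + \dots \\
          &= a + b\,(C + B_{1} K_{MIN}) + \dots \\
          &= (a + bC) + (b\,B_{1})\,K_{MIN} + \dots
\end{aligned}
```

Under that assumption the fitted values, the residuals and the R2 of the second-step regression are unchanged. Only the intercept and the scale of the coefficient move, so the reported coefficient on KMIN is read per minor league strikeout rather than per predicted major league strikeout.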

With this substitution made, the regression results are provided in Table 4.

Perhaps the most instructive way to analyze the results is to take each type of input variable, one at a time. Starting first with hits, both the residual value and the minor league (i.e. predicted) value are significant. This means that differences across pitchers in hits-allowed at the minor league level do impact future major league ERA performance. However, with $H^{R}_{MLB}$ also being significant, there are performance differences (in terms of hits-allowed) across pitchers at the major league level that are not explained by minor league performance, and those differences are significant factors in determining a pitcher’s major league ERA. These latter differences across pitchers are revealed only at the major league level, and could be termed MLB-specific information. Examining the coefficients on the two variables gives a sense as to the magnitude of the effect of each on major league ERA. The coefficient on $H^{R}_{MLB}$ is 0.70, compared to only 0.38 for HMIN, indicating that information at the major league level is more ‘valuable’ than similar information at the minor league level: MLB-specific differences across pitchers in hits-allowed are almost twice as important to a pitcher’s MLB ERA as the corresponding differences across pitchers in hits-allowed in the minor leagues.

Table 4. Impact on Major League ERA of Aggregated Minor League Performance

Dependent variable: major league ERA

Independent variable    Unstandardized coefficient                   t-Statistic
                        (standardized coefficient in brackets)
Constant                0.40                                          1.33
$K^{R}_{MLB}$           0.02 (0.02)                                   1.57
$BB^{R}_{MLB}$          0.44 (0.37)                                  27.57
$H^{R}_{MLB}$           0.70 (0.70)                                  49.04
KMIN                    0.06 (0.07)                                   4.30
BBMIN                   0.20 (0.15)                                  11.11
HMIN                    0.38 (0.26)                                  15.29

R2 = 0.74; N = 1577.

A similar pattern of results exists for the base-on-balls variables, with both $BB^{R}_{MLB}$ and BBMIN being positive and significant, but with the coefficient on the former (0.44) being more than twice as large as the coefficient on BBMIN (0.20). Thus, while walks-allowed at the minor league level are not unimportant in explaining major league ERAs, they matter much less than the new information (regarding base-on-balls) that gets revealed only at the major league level.
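The ‘almost twice’ and ‘more than twice’ comparisons follow directly from the unstandardized coefficients reported in Table 4. The short calculation below simply reproduces that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Unstandardized coefficients as reported in Table 4.
table4 = {
    "H": {"mlb_specific": 0.70, "minor": 0.38},
    "BB": {"mlb_specific": 0.44, "minor": 0.20},
}

# Ratio of the MLB-specific (residual) effect to the minor league effect.
ratios = {
    name: coef["mlb_specific"] / coef["minor"]
    for name, coef in table4.items()
}

for name, ratio in ratios.items():
    print(f"{name}: MLB-specific effect is {ratio:.2f} times the minor league effect")
```

The ratio is about 1.84 for hits and 2.20 for base-on-balls.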

In contrast to the hits and base-on-balls variables, the results for the strikeouts variables show a very different effect. While KMIN is significant, $K^{R}_{MLB}$ is not. This indicates that it is only the variations in strikeouts at the minor league level which impact major league ERA, and that no new information (with respect to strikeouts) is added at the major league level. However, the magnitude of the impact of minor league strikeouts is still quite small when compared with the hits and walks variables, with each additional strikeout (per nine innings) at the minor league level ultimately increasing major league ERA by 0.06.

As discussed earlier, this result for strikeouts is not necessarily surprising. The skills related to producing strikeouts may be relatively ‘physical’ in nature, and hence determined earlier in a player’s career, whereas the skills associated with preventing hits and walks may be more subtle and finesse-oriented, and hence may take longer to develop and master.

5. AN EXTENSION: ASSESSING THE INFORMATION CONTENT AT EACH MINOR LEAGUE LEVEL

This section extends the analysis of the previous section by disaggregating a player’s combined minor league performance into three separate levels (AAA, AA, and A) and then measuring the effects of each on MLB performance. The specific empirical analysis is identical to the one in the previous section, except that it replaces the single, aggregated, minor league performance measure for each variable with separate measures for each of the three different minor league levels.

For example, with the AAA analysis, each MLB input variable (K, BB, and H) is regressed on the player’s corresponding AAA stat, providing both a predicted and a residual value. These values are then employed in a second-step regression that uses a player’s MLB ERA as the dependent variable, identical to the analysis of the previous section. A separate analysis is done for both AA and A stats. In each case, the purpose is to determine the extent to which a player’s stats at that level ultimately determine his MLB performance. Since the minor league levels represent a progression in the quality of play, one would expect performance at the AAA level to be a better predictor of MLB performance than performance at the AA level, which in turn should be a better predictor of MLB performance than A-level performance.

For each of the three analyses, players were included if they pitched at least 20 innings at the MLB level, and at least 20 innings at the corresponding minor league level. The results are provided in Table 5.
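The inclusion rule can be written as a simple filter, which also makes clear why each of the three analyses can end up with a different sample size. The sketch below is illustrative only: the pitcher labels, the innings totals and the dictionary layout are hypothetical.

```python
MIN_INNINGS = 20.0  # threshold stated in the text

def eligible(innings_by_level, level, minimum=MIN_INNINGS):
    """True if the pitcher meets the innings threshold both in MLB
    and at the minor league level being analyzed."""
    return (innings_by_level.get("MLB", 0.0) >= minimum
            and innings_by_level.get(level, 0.0) >= minimum)

# Hypothetical career innings by level for three pitchers.
pitchers = {
    "pitcher_1": {"MLB": 150.0, "AAA": 80.0, "AA": 60.0, "A": 110.0},
    "pitcher_2": {"MLB": 35.0, "AAA": 12.0, "AA": 95.0},   # skipped A-ball
    "pitcher_3": {"MLB": 8.0, "AAA": 140.0, "AA": 70.0, "A": 90.0},
}

# One sample per analysis; a pitcher can qualify for some and not others.
samples = {
    level: sorted(name for name, ip in pitchers.items() if eligible(ip, level))
    for level in ("AAA", "AA", "A")
}
print(samples)
```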

Numerous insights can be gained from the results. First, $K^{R}_{MLB}$ is not significant in any of the regressions, while the K variables associated with each of the respective minor league levels are significant in all regressions. With the latter set of variables, both the magnitude and significance are highest at the A-level, meaning that differences in strikeouts across pitchers contain more information value (with respect to influencing major league ERA) at the A-level than they do at the AA or AAA levels. This result is generally consistent with the discussion of the previous section, in that differences across pitchers in strikeout potential will reveal themselves relatively early in a player’s career.

Table 5. Impact on Major League ERA of Disaggregated Minor League Performance

Dependent variable: major league ERA

                  Model 1            Model 2            Model 3
Constant          3.08 (13.96)       3.00 (15.54)       1.74 (7.86)
$K^{R}_{MLB}$     -0.002 (-0.12)     -0.004 (-0.80)     -0.008 (-0.57)
$BB^{R}_{MLB}$    0.41 (25.43)       0.41 (24.93)       0.41 (23.63)
$H^{R}_{MLB}$     0.65 (46.26)       0.66 (46.20)       0.67 (47.24)
R2                0.73               0.73               0.74
N                 1376               1384               1488

Level-specific minor league variables (each set enters only the model for its own level):

KA                0.05 (4.05)
BBA               0.10 (5.89)
HA                0.13 (7.13)
KAA               0.03 (2.82)
BBAA              0.11 (6.91)
HAA               0.15 (9.21)
KAAA              0.04 (2.92)
BBAAA             0.19 (11.30)
HAAA              0.24 (14.16)

t-Stats in parentheses.

For the two other types of input statistics, hits and walks, both the residuals (i.e. MLB-specific information) and the respective minor league stats are significant in all three regressions. In all cases, the MLB-specific variables have greater significance and magnitude than the corresponding minor league stats. For example, in the AAA regression, each additional MLB-specific (i.e. unexplained by AAA stats) walk increases major league ERA by 0.41, compared to only a 0.19 impact for each AAA walk; similarly, for hits, each MLB-specific hit increases major league ERA by 0.65, compared to 0.24 for each AAA hit-allowed.

Furthermore, for both hits and walks, the magnitude and significance of the coefficient increase as one moves to successively higher levels of minor league play. For example, with hits, the expected impact on major league ERA of an additional hit-allowed at the A-level is 0.13, compared to 0.15 at the AA-level, and 0.24 at the AAA-level. Such results are consistent with the notion that performance at the AAA-level should be a better predictor of major league success than performance at the AA or A levels.

While these results are intuitively appealing, the analysis can be refined still further. Up to now, the analysis treats each minor league level as a separate entity, and a player’s stats at that level are assumed to be exogenous. In reality, however, a player’s performance at, say, the AAA-level is partially explainable by his performance at the AA-level, which, in turn, is partly explainable by his performance at the A-level. For example, taking hits, the following is true:

$$H_{MLB} = f(H_{AAA})$$

That is, a player’s major league hits-allowed is a function of his hits-allowed at AAA. However, the following is also true:

$$H_{AAA} = f(H_{AA}) \quad \text{and} \quad H_{AA} = f(H_{A})$$

Thus, separate regressions must be run for each of the three equations, and with each there will be an ‘explained’ component (i.e. explainable from performance at a lower level) and an ‘unexplained’ component (i.e. new information that is revealed only at the level in question).

Thus, $H_{MLB} = f(H^{R}_{MLB}, H_{AAA})$, meaning that hits-allowed in the majors is a function of both information revealed only at the major league level (i.e. the residual value, $H^{R}_{MLB}$), and information already revealed from AAA performance (HAAA). Similarly:

$$H_{AAA} = f(H^{R}_{AAA}, H_{AA}) \quad \text{and} \quad H_{AA} = f(H^{R}_{AA}, H_{A})$$

Substituting gives:

$$H_{MLB} = f(H^{R}_{MLB}, H^{R}_{AAA}, H^{R}_{AA}, H_{A})$$

Thus, hits-allowed at the MLB level is a function of the new information that is revealed at each of the four levels. Running this regression gives a purer measure of the value-added at each level, because it controls for information already revealed at previous levels.
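The chain of first-step regressions can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: each level is regressed on the level immediately below it using a simple one-regressor least-squares fit, and the helper name and the five toy observations are hypothetical.

```python
from statistics import mean

def residuals(x, y):
    """Residuals from a least-squares fit of y on x (with intercept)."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    intercept = ybar - slope * xbar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical hits per nine innings for five pitchers at each level.
h_a = [8.0, 9.0, 10.0, 8.5, 9.5]
h_aa = [8.4, 9.1, 10.3, 9.0, 9.2]
h_aaa = [8.9, 9.0, 10.6, 9.4, 9.9]
h_mlb = [9.1, 9.8, 10.9, 9.2, 10.4]

# New information revealed at each level, net of the level below it.
h_r_aa = residuals(h_a, h_aa)      # AA-specific
h_r_aaa = residuals(h_aa, h_aaa)   # AAA-specific
h_r_mlb = residuals(h_aaa, h_mlb)  # MLB-specific

# Regressor rows for the second-step ERA regression, one row per pitcher:
# (MLB-specific, AAA-specific, AA-specific, A-level rate).
second_step_regressors = list(zip(h_r_mlb, h_r_aaa, h_r_aa, h_a))
print(second_step_regressors[0])
```

Each residual series is, by construction, uncorrelated with the rate at the level directly below it, which is the sense in which it captures only the information added at that level. The same steps would be repeated for the BB and K variables before running the second-step regression of ERA on all twelve regressors.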

Using a similar process for both the BB and K variables, the GLS results are reported in Table 6.

The results reveal patterns not discernible from the previous analysis. Taking first the ‘hit’ variables, all four are significant and positive, with MLB-specific information again having the greatest impact (a coefficient of 0.66), distantly followed by AAA-specific information, with AA-specific and A-specific information being the lowest, and almost equal to each other. Similarly, all four base-on-ball variables are significant, with MLB-specific information again having the largest impact, followed again by AAA-specific information, and then AA and A-specific information.

Table 6. Impact on Major League ERA of Endogenous Minor League Performance

Dependent variable: major league ERA

Independent variable    Unstandardized coefficient                   t-Statistic
                        (standardized coefficient in brackets)
Constant                3.26                                         12.65
$K^{R}_{MLB}$           -0.001 (-0.001)                              -0.06
$K^{R}_{AAA}$           -0.004 (-0.003)                              -0.21
$K^{R}_{AA}$            0.02 (0.02)                                   1.07
KA                      0.05 (0.06)                                   3.27
$BB^{R}_{MLB}$          0.40 (0.32)                                  20.39
$BB^{R}_{AAA}$          0.19 (0.12)                                   8.01
$BB^{R}_{AA}$           0.07 (0.05)                                   3.54
BBA                     0.07 (0.06)                                   4.02
$H^{R}_{MLB}$           0.66 (0.70)                                  42.28
$H^{R}_{AAA}$           0.19 (0.16)                                   9.28
$H^{R}_{AA}$            0.12 (0.12)                                   7.00
HA                      0.13 (0.11)                                   6.05

R2 = 0.74; N = 1217.

However, with strikeouts, a very different pattern emerges, compared to that for hits and base-on-balls. Of the four strikeout variables, only one, that pertaining to A-specific information, is significant. This implies that no new information about strikeouts (at least as it applies to impacting major league ERA) is added at the major league, AAA or AA levels. This further reinforces the earlier discussion regarding strikeout rates being determined early in a player’s career. Thus, the previous analysis in this section, where each minor league level was taken separately, masked some important relationships. In that analysis, both AAA strikeouts and AA strikeouts, taken independently, were found to also impact major league ERAs. However, by disaggregating the effects, one could now conclude that the information content was actually added at the A-level, and that the AAA and AA strikeouts were simply reflecting information that had already been revealed at the A-level.

The results imply that general managers should be able to observe pitchers at the A-level, and be generally assured that differences across pitchers in strikeouts at this level will fully capture all relevant (in terms of major league ERA) information about this variable, and that no new information will be added later in the player’s career. Thus, once ‘power’ pitchers are identified at the A-level, they will not generally transform into finesse pitchers at the major league level, or vice versa.

This finding regarding strikeouts should be generally encouraging for general managers, in that it allows them to make a relatively early identification of a player’s expected future performance in this area, compared to the situation with hits and base-on-balls, where significant information is revealed throughout a player’s entire minor league career, and, even more importantly, after the player actually reaches the majors. However, the ultimate impact of such information is limited by the fact that strikeouts (per nine innings) themselves do not have a large impact on ERA, compared to hits and base-on-balls. The coefficient on KA shows that each additional strikeout per nine innings at the A-level only impacts major league ERA by 0.05, and this effect is, in fact, an adverse one: more strikeouts per nine innings actually increase expected ERA at the major league level.

6. CONCLUSIONS

This paper reveals that the process of human capital formation for professional baseball pitchers is relatively slow, rendering minor league statistics to be of limited value when projecting major league performance. This indicates that a considerable amount of the performance differences across pitchers at the major league level is revealed only after they reach the majors, and hence is unforeseen given their minor league statistics.

These findings illustrate just how difficult it is for all organizations to predict the future success of their apprentice-level employees. Even in an industry such as baseball, where employee output is easily measurable and highly quantifiable, and where the nature of the work at the developmental level is identical to that at the advanced level (i.e. pitching a baseball), apprentice-level performance only provides modest insights into how that employee will ultimately perform at the advanced level. Thus, firms that erroneously overestimate the importance of apprentice-level performance are at risk of making systematic errors in their personnel decisions.

NOTES

1. Using residuals from one regression as an input into a second regression has been employed in a variety of other works. For example, Kalt and Zupan (1984) regressed legislators’ Congressional voting records on a vector of variables representing constituent interests, and then interpreted the residuals as measures of the legislator’s personal ideology (i.e. that part of their voting record that was not explained by constituent interests). They then used this residual as an independent variable in a second-step regression to explain voting patterns on strip-mining issues. In the economics of sport area, Alexander and Kern (2005) use this approach in their study of PGA tour earnings. For example, they regress ‘greens in regulation’ on ‘driving distance’ and ‘driving accuracy’, and interpret the residual as a measure of the golfer’s iron-playing ability (i.e. that part of ‘greens in regulation’ not explained by driving skills). This residual is then used in a second-step regression to explain player money winnings. They note that one property of OLS regressions is that the residuals are distributed $N(0, \sigma^{2})$. See Wooldridge (2002).

REFERENCES

Alexander D, Kern W. 2005. Drive for show and putt for dough? An analysis of the earnings of PGA tour golfers. Journal of Sports Economics 6: 46–60.

Bradbury JC. 2007. The Baseball Economist: The Real Game Exposed. Penguin: New York.

Hakes JK, Sauer RD. 2006. An economic evaluation of the ‘moneyball’ hypothesis. Journal of Economic Perspectives 20: 173–185.

Kahn LM. 1993. Free agency, long-term contracts and compensation in major league baseball: estimates from panel data. The Review of Economics and Statistics 75: 157–164.

Kahn LM. 2000. The sports business as a labor market laboratory. The Journal of Economic Perspectives 14(3): 75–94.

Kalt J, Zupan M. 1984. Capture and ideology in the economic theory of politics. American Economic Review 74: 279–300.

Krautman AC, Gustaffson E, Hadley L. 2003. A note on the structural stability of salary equations: major league baseball pitchers. Journal of Sports Economics 4: 56–63.

Lewis ML. 2003. Moneyball: The Art of Winning an Unfair Game. Norton: New York.

Scully GW. 1974. Pay and performance in major league baseball. American Economic Review 64: 915–930.

Spurr SJ. 2000. The baseball draft: a study of the ability to find talent. Journal of Sports Economics 1: 66–85.

Wooldridge J. 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press: Cambridge, MA.

Zimbalist AS. 1992. Baseball and Billions. Basic Books: New York.
