
DISCRETION IN HIRING∗

MITCHELL HOFFMAN

LISA B. KAHN

DANIELLE LI

Job-testing technologies enable firms to rely less on human judgment when making hiring decisions. Placing more weight on test scores may improve hiring decisions by reducing the influence of human bias or mistakes but may also lead firms to forgo the potentially valuable private information of their managers. We study the introduction of job testing across 15 firms employing low-skilled service sector workers. When faced with similar applicant pools, we find that managers who appear to hire against test recommendations end up with worse average hires. This suggests that managers often overrule test recommendations because they are biased or mistaken, not only because they have superior private information. JEL Codes: M51, J24.

I. INTRODUCTION

Hiring the right workers is one of the most important and difficult problems that a firm faces. Résumés, interviews, and other screening tools are often limited in their ability to reveal whether a worker has the right skills or will be a good fit. Furthermore, the managers that firms employ to gather and interpret this information may have poor judgment or preferences that are imperfectly aligned with firm objectives.1 Firms may thus face both information and agency problems when making hiring decisions.

∗We are grateful to Jason Abaluck, Ajay Agrawal, Ricardo Alonso, Pol Antràs, Ian Ball, David Berger, Arthur Campbell, David Deming, Alex Frankel, Avi Goldfarb, Lawrence Katz, Harry Krashinsky, Peter Landry, Jin Li, Liz Lyons, Steve Malliaris, Mike Powell, Kathryn Shaw, Steve Tadelis, numerous seminar participants, and anonymous referees for helpful comments. We are grateful to the anonymous data provider for providing access to proprietary data. Hoffman acknowledges financial support from the Social Science and Humanities Research Council of Canada. All errors are our own.

1. For example, a manager could have preferences over demographics or family background that do not maximize productivity. In a case study of elite professional services firms, Rivera (2012) provides evidence that an important determinant of hiring is the presence of shared leisure activities.

© The Author(s) 2017. Published by Oxford University Press on behalf of the President and Fellows of Harvard College. All rights reserved. For Permissions, please email: [email protected]
The Quarterly Journal of Economics (2018), 765–800. doi:10.1093/qje/qjx042. Advance Access publication on October 10, 2017.


The increasing adoption of “workforce analytics” and job testing has provided firms with new hiring tools.2 Job testing has the potential to improve information about the quality of candidates and reduce agency problems between firms and human resource (HR) managers. As with interviews, job tests provide an additional signal of a worker’s quality. Yet unlike interviews and other subjective assessments, job testing provides information about worker quality that is directly verifiable by the firm.

What is the impact of job testing on the quality of hires and how should firms use job tests? In the absence of agency problems, firms should allow managers discretion to weigh job tests alongside interviews and other private signals when deciding whom to hire. Yet if managers are biased or if their judgment is otherwise flawed, firms may prefer to limit discretion and place more weight on test results, even if this means ignoring the private information of the manager. Firms may have difficulty evaluating this trade-off because they cannot tell whether a manager hires a candidate with poor test scores because of private evidence to the contrary, or because he or she is biased or simply mistaken.

In this article, we evaluate the introduction of a job test and analyze the consequences of making hiring decisions that deviate from test score recommendations. We use a unique personnel data set consisting of 15 firms that employ workers in the same low-skilled service sector. Prior to the introduction of testing, firms employed HR managers who were involved in hiring new workers. After the introduction of testing, HR managers were also given access to a test score for each applicant: green (high-potential candidate), yellow (moderate-potential candidate), or red (lowest rating). Managers were encouraged to factor the test into their hiring decisions, but were not required to hire strictly according to test recommendations.

We first estimate the impact of introducing the job test on the quality of hired workers. Exploiting the staggered introduction of job testing across sample locations, we show that cohorts of workers hired with job testing have substantially longer tenures than cohorts of workers hired without testing, holding constant a variety of time-varying location and firm variables. In our setting, job tenure is a key measure of quality because turnover is costly and workers already spend a substantial fraction of their tenure in paid training. This finding suggests that this job test contains useful information about the quality of candidates.

2. See, for instance, Forbes: http://www.forbes.com/sites/joshbersin/2013/02/17/bigdata-in-human-resources-talent-analytics-comes-of-age/.

Next, we examine how managers use job test information. We propose a model in which firms rely on potentially biased HR managers who observe a private signal of worker quality in addition to the publicly observable job test. Managers can decide to hire workers with the best test scores or make “exceptions” by hiring against the test recommendation. In the absence of bias, managers make exceptions only when they have additional information, resulting in better hires for the firm. However, biased managers are also more likely to make exceptions, and these exceptions lead to worse hires on average. This model thus provides intuition for why the observed relationship between a manager’s propensity to make exceptions and worker outcomes can be informative about the role of bias in hiring: a positive relationship suggests that managers primarily make exceptions because they are better informed, while a negative relationship suggests the presence of bias or mistaken beliefs.

Our data, which include information on applicants as well as hired workers, allow us to explore this relationship empirically. We define an “exception” as hiring an applicant with a yellow test score when one with a green score had also applied but is not hired (or similarly, when a “red” applicant is hired while a “yellow” or “green” is not). Across a variety of specifications, we find that exceptions are strongly correlated with worse outcomes. Even controlling for the test scores of the applicant pools they hire from, managers who appear to make more exceptions systematically bring in workers who leave their jobs more quickly. This result suggests that managers make exceptions not only because they are better informed but also because they are biased or mistaken.

Finally, we show that our results are unlikely to be driven by the possibility that managers sacrifice job tenure in search of workers who have higher quality on other dimensions. If this were the case, limiting discretion may improve worker durations, but at the expense of other quality measures. To examine this possibility, we examine the relationship between hiring exceptions and a direct measure of individual productivity, daily output per hour, which we observe for a subset of firms in our sample. In this supplemental analysis, we find no evidence that exceptions are related to increased productivity; this makes it unlikely that managers trade off duration for productivity.


Our empirical approach differs from an experiment in which discretion is granted to some managers and not others. Rather, our analysis exploits differences across managers in the extent to which they appear to make exceptions by overruling test recommendations. Our approach uses this nonrandom variation in willingness to exercise discretion to infer whether discretion facilitates better hires. If managers use discretion only when they have better information, then managers who make more exceptions should have better outcomes than managers who do not. If exceptions are instead associated with worse outcomes, then it is likely that managers are also biased or mistaken.

The validity of this approach relies on two key assumptions. First, we must be able to isolate variation in exceptions that is reflective of managerial choices, and not driven by lower yield rates for higher-quality applicants. A weakness of our data is that we do not observe job offers; because of this, managers who hire yellow or red workers only after green applicants have turned down job offers will mistakenly look as though they made more exceptions. Second, it must also be true that the unobserved quality of applicants is similar across low- and high-exception cohorts. For example, we want to rule out cases where managers make exceptions precisely because the pool of green applicants is idiosyncratically weak. We discuss both of these assumptions in more detail throughout the text and estimate specifications that either directly address or limit these concerns.

As data analytics is more frequently applied to HR management decisions, it becomes increasingly important to understand how these new technologies impact the organizational structure of the firm and the efficiency of worker–firm matching. While a large theoretical literature has studied how firms should allocate authority, and a smaller empirical literature has examined discretion and rule making in other settings, empirical evidence on discretion in hiring is scant.3 Our article provides a first step toward an empirical understanding of the potential benefits of discretion in hiring. Our findings provide evidence that screening technologies may improve information symmetry between firms and managers. In this spirit, our article is related to the classic Baker and Hubbard (2004) analysis of the adoption of onboard computers in the trucking industry.

3. For theoretical work, see the canonical Aghion and Tirole (1997), the Bolton and Dewatripont (2013) survey, and Dessein (2002) and Alonso and Matouschek (2008) for particularly relevant instances. For empirical work, see, for example, Paravisini and Schoar (2013) and Wang (2014) for analyses of loan officers, Li (2017) on grant committees, Kuziemko (2013) on parole boards, and Diamond and Persson (2016) on teacher grading.

Our work is most closely related to Autor and Scarborough (2008), the first paper in economics to provide an estimate of the impact of job testing on worker performance.4 The authors evaluate the introduction of a job test in retail trade, with a particular focus on whether testing will have a disparate impact on minority hiring. We also find positive impacts of testing, and, from there, focus on the complementary question of the consequences of overruling the job test. Our results are broadly aligned with findings in psychology and behavioral economics that emphasize the potential of machine-based algorithms to mitigate errors and biases in human judgment across a variety of domains.5

The remainder of this article proceeds as follows. Section II describes the setting and data. Section III evaluates the impact of testing on job duration. Section IV presents a model of hiring with potentially biased managers. Motivated by the model, Section V empirically assesses whether managers use their discretion to improve hires. Section VI concludes. All appendix material can be found in the Online Appendix.

II. SETTING AND DATA

Firms have increasingly incorporated testing into their hiring practices. One explanation for this shift is that the rising power of data analytics has made it easier to look for regularities that predict worker performance. We obtain data from an anonymous job-testing provider that follows such a model. We hereafter call this firm the “data firm.” In this section, we summarize the key features of our setting and data set. More detail about both the job test and our sample can be found in Section A of the Online Appendix.

4. We also contribute to the broader literatures on screening technologies (e.g., Autor 2001; Burks et al. 2015; Brown, Setren, and Topa 2016; Pallais and Sands 2016; Stanton and Thomas 2016; Horton 2017) and employer learning (Farber and Gibbons 1996; Altonji and Pierret 2001; Kahn and Lange 2014).

5. See Kuncel et al. (2013) for a meta-analysis of this literature, Kahneman (2011) for a behavioral economics perspective, and Kleinberg et al. (2018) for empirical evidence that machine-based algorithms outperform judges in deciding which arrestees to detain pretrial.


II.A. Job Test and Testing Adoption

Our data firm offers a test designed to predict performance for a particular job in the low-skilled service sector. We are unable to reveal the exact nature of the job, but it is similar to jobs such as data entry work, standardized test grading, and call center work (and is not a retail store job). The data firm sells its services to clients (hereafter, client firms) that wish to fill these types of positions. We have 15 such client firms in our data set.

Across locations, the workers in our data are engaged in a fairly uniform job and perform essentially a single task. For example, one should think of our data as made up entirely of data entry jobs, entirely of standardized test grader jobs, or entirely of call center jobs. Workers generally do not have other major job tasks to perform. As with data entry, grading, or call center work, workers in our sample engage in individual production: they do not work in teams to create output nor does the pace of their output directly impact others.

The job test provided by our data firm consists of an online questionnaire comprising a large battery of questions, including those on computer/technical skills, personality, cognitive skills, fit for the job, and various job scenarios. The data firm matches applicant responses with subsequent performance in order to identify the various questions that are the most predictive of future workplace success in this setting. Drawing on these correlations, a proprietary algorithm delivers a green–yellow–red job test score. In our sample, 48% of applicants receive a green score, 32% score yellow, and 20% score red. See Section A.1 of the Online Appendix for more detail on the test itself.

Job testing was gradually rolled out across locations (establishments) within a given client firm. We observe the date at which test scores appear in our data, but not all workers are tested immediately. Our preferred measure defines test adoption as the month at which the modal hire in a location had a test score. See Online Appendix A.2 for more discussion and robustness to other definitions.
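As a concrete illustration, the sketch below shows one way this adoption measure could be constructed from a hire-level table. It is a hypothetical implementation, not the authors' code: the column names (location, hire_month, has_test_score) are assumed, and "the modal hire had a test score" is operationalized as at least half of that month's hires being tested, matching the 50% threshold stated in the notes to Table I.

import pandas as pd

def adoption_month(hires: pd.DataFrame) -> pd.Series:
    """For each location, the first month in which at least half of hires were tested."""
    # Share of hires with a test score, by location and hire month.
    share_tested = hires.groupby(["location", "hire_month"])["has_test_score"].mean()
    mostly_tested = share_tested[share_tested >= 0.5].reset_index()
    # Earliest such month per location; the location is treated from then on.
    return mostly_tested.groupby("location")["hire_month"].min()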

The HR managers in our data are referred to as recruiters by our data provider and are unlikely to manage day-to-day production. Prior to the introduction of job testing, our client firms gave their HR managers discretion to make hiring recommendations based on interviews and résumés.6 After adopting this job test, firms made applicant test scores available to managers and encouraged them to factor scores into hiring recommendations, but managers were still permitted to hire their preferred candidate.7

II.B. Applicant and Worker Data

Our data contain information on hired workers, including hire and termination dates, job function, and worker location. This information is collected by client firms and shared with the data firm. Once a partnership with the data firm forms, we observe additional information, including applicant test scores, application date, and an identifier for the HR manager responsible for a given applicant.

Table I provides sample characteristics. We observe nearly 266,000 hires; two-thirds are observed before testing was introduced and one-third after. Our post-testing sample consists of 400,000 applicants and 91,000 hires assigned to 445 managers.8

Our primary worker outcome is job duration. We focus on turnover for three main reasons. Foremost, turnover is a perennial challenge for firms employing low-skilled service sector workers. Hence, tenure is an important measure of worker quality for our sample firms. To illustrate this concern, Figure I shows a histogram of job tenure for completed spells (79% of the spells in our data) among employees in our sample. The median worker (solid line) stays only 99 days, or just over 3 months. One in six workers leave after only a month. Despite these short tenures, hired workers in our sample spend the first several weeks of their employment in paid training.9 Our data firm and its client firms are aware of these concerns: in its marketing materials, our data firm emphasizes the ability of its job test to reduce turnover. Second, in addition to its importance for our sample firms, in many canonical models of job search (e.g., Jovanovic 1979), worker tenures can be thought of as a proxy for match quality. As such, job duration is a commonly used measure of worker quality. For example, it is the primary worker quality measure used by Autor and Scarborough (2008), who also study the impact of job testing in a low-skilled service sector setting (retail). Finally, job duration is available for all workers in our sample.

6. Other managers may take part in hiring decisions as well. For example, in one firm, recruiters typically endorse a candidate to another manager (e.g., a manager in operations one rank above the frontline supervisor) who will make a “final call.”

7. We do not directly observe managerial authority in our data. However, information provided to us by the data firm indicates that managers at client firms were not required to hire strictly by the test, and we see in our data that many workers with low test scores are hired. Also, some client firms had other forms of job testing before partnering with our data firm (see Online Appendix A.3 for details and robustness to restricting the sample to client firms that likely did not have pre-sample testing).

8. See Section A.7 of the Online Appendix regarding sample restrictions.

9. Reported lengths of paid training vary considerably, from around one to two weeks to around a couple of months or more, but paid training is provided by all client firms in our sample.

TABLE I
SUMMARY STATISTICS

Sample coverage                        All        Pre-testing   Post-testing
# Locations                            127        113           97
# Hired workers                        265,648    174,329       91,319
# Applicants                                                    403,006
# HR managers                                                   445
# Pools                                                         3,698
# Applicants/pool                                               260

Worker characteristics:                Pre-testing  Post-testing  Green    Yellow   Red
mean (std. dev.)
Duration of completed spell (days)     252          116           122      110      92
  (N = 209,808)                        (323)        (138)         (143)    (130)    (121)
Duration of censored spell (days)      807          252           265      235      223
  (N = 55,840)                         (510)        (245)         (252)    (232)    (223)
Share censored                         0.19         0.25          0.24     0.26     0.25
                                       (0.39)       (0.43)        (0.43)   (0.44)   (0.43)
Output per hour (N = 62,427)           8.35         8.44          8.39     8.32     9.16
                                       (4.66)       (5.16)        (5.01)   (5.11)   (6.08)

Applicant pool characteristics                      Post-testing  Green    Yellow   Red
Share applicants                                                  0.48     0.32     0.20
Hire probability                                    0.19          0.23     0.18     0.08

Notes. Post-testing is defined at the location–month level as the first month in which 50% of hires had test scores, and all months thereafter. An applicant pool is defined at the manager–location–month level and includes all applicants that had applied within four months of the current month and were not yet hired. Number of applicants reflects the total number of applicants across all pools. Applicant pool characteristics are unweighted averages across pools, and are calculated using individuals with test scores in the post-testing sample.

For a subset of our client firms, we also observe a direct measure of worker productivity: output per hour.10 Again, we are not able to reveal the exact nature of the job. That said, output per hour measures the number of primary tasks that an individual worker is able to complete. For example, this would be number of words entered per hour in data entry, number of tests graded in test grading, or number of calls handled in call centers. Recall that in our setting, individuals perform essentially one major task and engage in individual production. Because of the discretized nature of the work, output per hour is a very common performance metric for the type of job we study and is easily measured. However, our data firm was only able to provide us with this measure for a subset of client firms (roughly a quarter of hired workers). We report these findings separately when we discuss alternative explanations.

10. A similar productivity measure was used in Lazear, Shaw, and Stanton (2015) to evaluate the value of bosses in a comparable setting to ours.

FIGURE I
Distribution of Length of Completed Job Spells

[Histogram of days of tenure (0–1,000 days), density scale; solid line = mean, dashed line = median.] Figure I plots the distribution of completed job spells at the individual level. For legibility, this histogram (though not the computed mean or median) omits 3% of observations with durations over 1,000 days.

The middle panel of Table I provides summary statistics for duration and output per hour. Job durations are censored for the 21% of hired workers who were still employed at the time our data were collected. In our analysis, we take censoring into account by estimating censored normal regressions whenever we use duration as an outcome measure.

Table I shows that both censored and uncensored job durations increase in color score. For example, among those with completed spells, greens stay 12 days (11%) longer than yellows who stay 18 days (20%) longer than reds. These differences are statistically significant and provide initial evidence that test scores are predictive of worker performance. Furthermore, if managers hire red and yellow applicants only when their unobserved quality is high, then tenure differences in the overall applicant population should be even larger. There is no difference across color score in the share of observations that are censored.11

Average output per hour in our data set is 8.4 and is fairly similar across color. Red workers have somewhat higher productivity along this metric, although these differences are not significant; also, controlling for client firm fixed effects removes any difference in output per hour for red workers. Finally, the bottom panel of Table I shows that scores are predictive of hiring: greens are more likely to be hired than yellows, who are in turn substantially more likely to be hired than reds.

III. THE IMPACT OF TESTING

III.A. Empirical Strategy

We first evaluate the impact of introducing testing itself. This analysis helps us understand whether the test has useful information that at least some managers take advantage of. To do so, we exploit the gradual rollout of testing across locations and over time, and examine its impact on worker quality, as measured by tenure:

(1)  log(Duration)_ilt = α_0 + α_1 Testing_lt + δ_l + γ_t + Position_i β + ε_ilt.

Equation (1) compares outcomes for workers hired with and without job testing. We estimate censored normal regressions with individual-specific truncation points to account for the fact that not all workers are observed through the end of their employment spell. We regress log duration (log(Duration)_ilt) for a worker i, hired to a location l, at time t, on an indicator for whether the location l had testing at time t (Testing_lt). Recall, we assign the test adoption date as the first month in which the modal hire at a location had a test score. After that point, the location is always assigned to the testing regime. We choose to define testing at the location, rather than individual, level to avoid the possibility that whether an individual worker is tested may depend on observed personal characteristics (Table A1 of the Online Appendix shows that our results are robust to defining testing at the individual level or at the first date on which any worker is tested).

11. One thing to note in our table is that, somewhat counterintuitively, job durations are longer for workers hired before testing than afterward. The main reason for this is mechanical: on average, pre-testing periods are earlier in the sample (by about 16 months), allowing hired workers more time to accrue more tenure. Hire cohort fixed effects account for this effect in our regression analysis.

All regressions include location (δ_l) and month-by-year of hire (γ_t) fixed effects to control for time-invariant differences across locations within our client firms, and for cohort and macroeconomic effects that may impact job duration or censoring probability. We also always include position-type fixed effects (the vector Position_i and associated coefficients β) that adjust for small differences in job function across individuals.12 In some specifications, we also include additional controls, which we describe alongside the results. In all specifications, standard errors are clustered at the location level to account for correlated observations within a location over time.
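For concreteness, the sketch below shows one way to estimate such a censored-normal regression by maximum likelihood. It is a simplified illustration under assumed variable names (X for the regressors in equation (1), y for log duration, censored marking still-employed spells), not the paper's code, and it omits the fixed effects and location-level clustering described above; the paper's table notes indicate the authors use Stata's cnreg with individual-specific truncation points.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_normal_negloglik(params, X, y, censored):
    """Negative log-likelihood with observation-specific right-censoring.

    y is log duration; for censored spells we only know the true duration
    exceeds the observed value."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    resid = (y - X @ beta) / sigma
    ll_uncensored = norm.logpdf(resid) - np.log(sigma)  # completed spells
    ll_censored = norm.logsf(resid)                     # Pr(duration > observed value)
    return -np.sum(np.where(censored, ll_censored, ll_uncensored))

def fit_censored_normal(X, y, censored):
    start = np.append(np.zeros(X.shape[1]), 0.0)        # betas = 0, log sigma = 0
    res = minimize(censored_normal_negloglik, start,
                   args=(X, y, censored), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])                # coefficients, sigma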

Section A.3 of the Online Appendix discusses sample coverage of locations over time and shows robustness to using a more balanced panel; Section A.4 explores the timing of testing and assesses whether early testing locations look different on observable characteristics.

III.B. Results

Table II reports regression results. Column (1) presents results with controls for location, cohort, and position. In the subsequent columns, we cumulatively add controls. Column (2) adds client firm-by-year fixed effects, to control for the implementation of any new strategies and HR policies that firms may have adopted along with testing.13 The column (2) coefficient of 0.24 means that employees hired with the assistance of job testing stay, on average, 0.24 log points longer. Column (3) adds local unemployment rate controls to account for the fact that outside job options will impact turnover. In practice, we use education-specific state-level unemployment rates measured at an annual frequency, obtained from the American Community Survey.14 Finally, column (4) adds location-specific time trends to account for the possibility that the timing of the introduction of testing is related to trends at the location level, for example, that testing was introduced first to locations that were on an upward (or downward) trajectory.

12. For example, in data entry, fixed effects would distinguish workers who enter textual data from those who transcribe auditory data, and those who enter data regarding images; in test grading, individuals may grade science or math tests; in call centers, individuals may engage in customer service or sales.

13. Our data firm indicated that it was not aware of other client-specific policy changes, though they acknowledge they would not have had full knowledge of whether such changes may have occurred.

TABLE II
IMPACT OF JOB TESTING ON JOB DURATIONS

Dependent variable: log(Duration)
                                 (1)        (2)        (3)        (4)
Post-testing                     0.368***   0.244**    0.248***   0.233***
                                 (0.120)    (0.113)    (0.0754)   (0.0637)
N                                265,648    265,648    265,648    265,648
Year–month FEs                   X          X          X          X
Location FEs                     X          X          X          X
Position-type FEs                X          X          X          X
Client firm × year FEs                      X          X          X
Local unemployment controls                            X          X
Location time trends                                              X

Notes. We regress log durations on an indicator for testing availability (this equals 1 in the first month in which the modal hire at a location was tested, and in all months thereafter for that location) and the controls indicated. We use censored-normal regressions with individual-specific truncation points (using “cnreg” in Stata) to account for the fact that 21% of hired workers had not yet left their job at the end of our data collection. Standard errors are in parentheses and are clustered at the location level. FE = fixed effects. *** p < .01, ** p < .05, * p < .1.

With full controls, we find that testing improves completed job tenures by 0.23 log points, or just over 25%. These results are broadly consistent with previous estimates from Autor and Scarborough (2008).15 These estimates reflect the treatment on the treated effect for the sample of firms that select into receiving the sort of test we study. Given that firms often select into receiving technologies based on their expected returns (e.g., Griliches 1957), it is quite possible that other firms (e.g., those that are less open to new technologies) might experience less of a return.

14. For the 25% of locations that are international, we use aggregated (i.e., non-education-specific), annual, national unemployment rates obtained from the World Bank. For a small set of location identifiers in our data where state cannot be easily assigned (e.g., because workers typically work off-site in different U.S. states), we use national education-specific unemployment rates from the Current Population Survey. We include one set of variables for education-specific unemployment rates (either national or state) and one variable for international unemployment rates. Values are replaced with zeros when missing because of location type, and location fixed effects indicate type. Our results are robust to restricting to the 70% of locations with U.S. state-level data.

15. Although our estimates are larger, the Autor and Scarborough (2008) estimate of 12% is inside the range of our 95% confidence interval with full controls. We also estimated Cox proportional hazard models, and obtained coefficients a bit smaller in magnitude than those from censored normal models, but that were qualitatively similar. Our results are also robust to performing OLS on the length of completed job spells (as in Autor and Scarborough 2008).

In addition to log duration, we examine whether testing affects the probability that hires reach particular tenure milestones: staying at least 3, 6, or 12 months. For these samples, we restrict to workers hired at least 3, 6, or 12 months, respectively, before the data end date. We estimate OLS models because censoring for these variables is based only on start date and not survival time. The top panel of Online Appendix Table C1 reports results using these milestone measures. We find a positive impact of testing for all these variables, with the most pronounced effects at longer durations. For example, using our full set of controls, we find that testing increases the probability of workers surviving at least 6 months by 6 percentage points (13%) and 12 months by 7.5 percentage points (23%).

Figure II plots the accompanying event studies. The treatment effect of testing appears to grow over time, suggesting that HR managers and other participants might take some time to learn how to use the test effectively.16

16. Figure II includes all controls except location time trends so that any pretrends will be apparent in the figure. Estimates are especially large and noisy 10 quarters after testing, reflecting only a few locations that can be observed to that point. Online Appendix Figure A3 replicates Figure II while restricting to a balanced panel of locations that hire in each of the four quarters before and after testing. Impacts there are smaller, but are qualitatively similar.

FIGURE II
Event Study of Duration Outcomes

[Four panels of event-study coefficients plotted against quarters from the introduction of testing: Log(Duration), Survived 3 Months, Survived 6 Months, and Survived 12 Months.] This figure plots the impact of testing on worker durations as a function of event-time (in quarters) relative to testing adoption. The underlying estimating equation is given by Outcome_ilt = α_0 + I_lt^(time since testing) α_1 + controls + ε_ilt, where I^(time since testing) is a vector of event-time dummies in quarters, with the omitted category, −1, indicated by the vertical line. Controls include location, hire year-month, position, and client-by-year fixed effects, as well as local labor market variables. The top left panel is estimated using censored normal regression while the others are estimated using OLS for the sample of workers hired at least 3, 6, or 12 months before the data end date (measured for each of the 15 firms by the latest termination date or latest hire date in our data, whichever is later). Dashed lines indicate the 95% confidence interval. Online Appendix Figure A3 replicates this figure while restricting to a balanced panel of locations that hire in each of the four quarters before and after testing.

Our results in this section indicate that job testing increases job durations relative to the sample firms’ initial hiring practices. In the remainder of the article, we focus on analyzing the consequences of overruling job test recommendations.

IV. MODEL

In this section, we formalize a model in which a firm makes hiring decisions with the help of an HR manager and a job test. As in our sample firms, managers in this model observe job test recommendations, but are not required to hire strictly by the test. The main purpose of this model is to highlight the trade-offs involved in granting discretion to managers: discretion enables HR managers to take advantage of their private information but also gives them scope to make hires based on biases or incorrect beliefs that are not in the interest of the firm. The model also provides intuition for how we might empirically assess the roles of information and bias/mistakes in hiring. All proofs are in Section B of the Online Appendix.


IV.A. Setup

A mass 1 of applicants apply for job openings within a firm. The firm’s payoff of hiring worker i is given by a_i. We assume that a_i is drawn from a distribution that depends on a worker’s type, t_i ∈ {G, Y}; a share of workers p_G are type G, a share 1 − p_G are type Y, and a|t ∼ N(μ_t, σ_a²) with μ_G > μ_Y and σ_a² ∈ (0, ∞). This worker-quality distribution enables us to naturally incorporate the discrete test score into the hiring environment. We do so by assuming that the test publicly reveals t.17

The firm’s objective is to hire a proportion W of workers that maximizes expected quality, E[a|Hire].18 For simplicity, we also assume W < p_G.19

To hire workers, the firm must employ HR managers whose interests are imperfectly aligned with those of the firm. A manager’s payoff for hiring worker i is given by

U_i = (1 − k) a_i + k b_i.

In addition to valuing the firm’s payoff, managers receive an idiosyncratic payoff b_i, which they value with a weight k ∈ [0, 1]. We assume that b is independent of (a, t).

Managers may value the firm’s payoff a because they are directly incentivized to, because they risk termination or desire promotion, or because they are simply altruistic.20 The additional quality b can be thought of in two ways. First, it may capture managerial preferences for certain workers (e.g., for certain demographic groups or those with shared interests). Second, b can represent manager mistakes, such as overconfidence, that lead them to prefer the wrong candidates.21

17. The values of G and Y in the model correspond to test scores green and yellow, respectively, in our data. We assume binary outcomes for simplicity, even though in our data the signal can take three possible values. This is without loss of generality for the mechanics of the model.

18. In theory, firms should hire all workers whose expected value is greater than their cost. However, one explanation for the hire share rule is that a threshold rule is not contractible because a_i is unobservable. Nonetheless, a firm with rational expectations will know the typical share W of applicants that are worth hiring, and W itself is contractible. Assuming a fixed hiring share is also consistent with the previous literature, for example, Autor and Scarborough (2008).

19. This implies that a manager could always fill a hired cohort with type G applicants. In our sample of applicant pools (see Table I), the average share of applicants who are green is 48%, the average share of green or yellow applicants who are green is 59%, and the average hiring rate is 19%. Thus, W < p_G will hold for the typical pool.

20. We do not have systematic data on manager incentives. However, a manager at the data firm told us that HR managers often face some targets and/or incentives. See Online Appendix A.8 for more detail.

The parameter k measures the manager’s bias, that is, the degree to which the manager’s incentives are misaligned with those of the firm or the degree to which the manager is mistaken. An unbiased manager has k = 0, while a manager who makes decisions entirely based on bias or the wrong characteristics corresponds to k = 1.

The manager privately observes information about a_i and b_i. For simplicity, we assume that the manager directly observes b_i, which is distributed in the population as N(0, σ_b²) with σ_b² ∈ R+. Second, we assume the manager observes a noisy signal of a_i:

s_i = a_i + ε_i,

where ε_i ∼ N(0, σ_ε²) is independent of a_i, t_i, and b_i, and attributes of applicants are independent across workers i. The parameter σ_ε² ∈ R+ measures the manager’s information. A manager with perfect information on a_i has σ_ε² = 0; as σ_ε² approaches ∞, a manager tends toward having no private information. This private information of managers can be thought of as their assessments of interviews or the worker’s overall résumé, and so on. Unlike the job test, these subjective signals cannot be verifiably communicated to the firm.

Let M denote the set of managers in a firm. For a given manager m ∈ M, his or her type is defined by the pair (k, 1/σ_ε²), corresponding to the bias and precision of private information, respectively. These have implied subscripts m, which we suppress for ease of notation. We assume firms do not observe manager type, s_i, or b_i.22

21. For example, a manager may genuinely have the same preferences as the firm but draw incorrect inferences from his or her interview. Indeed, work in psychology (e.g., Dana, Dawes, and Peterson 2013) shows that interviewers are often overconfident about their ability to read candidates. Such mistakes fit our assumed form for manager utility because we can always separate the posterior belief over worker ability into a component related to true ability, and an orthogonal component resulting from their error.

22. We assume that managers observe the same distribution of other qualities b, but value them differently, based on the parameter k. In contrast, the distribution of s_i will vary across managers, based on the value of their private information.


Managers form a posterior expectation of worker quality and hire a worker if and only if E(U_i | s_i, b_i, t_i) exceeds some threshold. Managers thus wield “discretion” because they choose how to weigh various signals about an applicant when making hiring decisions. We denote the quality of hires for a given manager under this policy as E[a|Hire] (where an m subscript is implied).
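Although the text leaves the updating step implicit, under the normality assumptions above the posterior entering this threshold rule takes the standard normal-normal form; the following is a sketch implied by those assumptions rather than an expression from the paper:

\[
E[a_i \mid s_i, t_i] = \mu_{t_i} + \frac{\sigma_a^2}{\sigma_a^2 + \sigma_\varepsilon^2}\,(s_i - \mu_{t_i}),
\qquad
E[U_i \mid s_i, b_i, t_i] = (1-k)\,E[a_i \mid s_i, t_i] + k\,b_i.
\]

A more precise private signal (smaller σ_ε²) thus shifts the manager’s ranking away from the test-score group mean μ_t and toward the signal s_i.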

IV.B. Model Discussion

This model illustrates the fundamental trade-off inherent in allowing managers’ discretion over hiring decisions: a manager’s private information may be valuable to the firm, but worker quality is hurt by his or her bias.

In practice, firms cannot directly observe a manager’s bias or information. However, firms generally do observe the hiring choices of managers and the quality of workers that are hired. We say that when a manager chooses to overrule the test by hiring a Y worker over a G one, the manager is making an “exception.” The frequency of such exceptions is increasing in both a manager’s bias and the precision of their private information (Proposition 1 in Online Appendix B). That is, managers are more likely to make exceptions both when their preferences differ from those of the firm and when they have information that is not captured by the test.

To assess whether bias plays a role in driving exceptions, we examine the relationship between a manager’s propensity to make exceptions and the realized quality of his or her hires. In the absence of bias, managers only make exceptions when they are better informed, resulting in better hires. By contrast, bias reduces the quality of hires (Proposition 2 in Online Appendix B). Finding a negative relationship between exceptions and the quality of hires would therefore mean that managers make exceptions not only because they are better informed but also because they are biased or mistaken.
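To illustrate this logic concretely, the small simulation sketch below generates managers who differ either only in the precision of their private signal or only in their bias k, and computes the correlation between their exception rates and the average quality of their hires. All parameter values are assumed for illustration and are not calibrated to the paper's data; with two types, exceptions are counted as hired yellows paired with unhired greens, in the spirit of the order-violation measure in Section V.

import numpy as np

rng = np.random.default_rng(0)
P_G, W = 0.5, 0.2                      # share of green (type G) applicants; hire share
MU_G, MU_Y, SIGMA_A, SIGMA_B = 1.0, 0.0, 1.0, 1.0
N_APPLICANTS, N_POOLS = 200, 50

def simulate_manager(k, sigma_eps):
    """Return (mean exception rate, mean hired quality) across applicant pools."""
    rates, qualities = [], []
    for _ in range(N_POOLS):
        is_g = rng.random(N_APPLICANTS) < P_G
        mu = np.where(is_g, MU_G, MU_Y)
        a = mu + SIGMA_A * rng.standard_normal(N_APPLICANTS)    # true quality
        b = SIGMA_B * rng.standard_normal(N_APPLICANTS)         # idiosyncratic payoff
        s = a + sigma_eps * rng.standard_normal(N_APPLICANTS)   # private signal
        shrink = SIGMA_A**2 / (SIGMA_A**2 + sigma_eps**2)       # normal-normal updating
        post_a = shrink * s + (1 - shrink) * mu
        utility = (1 - k) * post_a + k * b
        n_hire = int(W * N_APPLICANTS)
        hired = np.zeros(N_APPLICANTS, dtype=bool)
        hired[np.argsort(-utility)[:n_hire]] = True
        # Order violations: hired yellows paired with unhired greens, over the maximum.
        n_exc = hired[~is_g].sum() * ((~hired) & is_g).sum()
        n_y, n_g = (~is_g).sum(), is_g.sum()
        max_exc = n_hire * n_g if n_hire <= n_y else n_y * (n_g - (n_hire - n_y))
        rates.append(n_exc / max_exc)
        qualities.append(a[hired].mean())
    return np.mean(rates), np.mean(qualities)

# Managers differing only in information (k = 0) vs. only in bias (fixed signal noise).
info_only = [simulate_manager(0.0, se) for se in np.linspace(0.2, 3.0, 15)]
bias_only = [simulate_manager(kk, 1.0) for kk in np.linspace(0.0, 0.8, 15)]
for label, pts in [("information only", info_only), ("bias only", bias_only)]:
    x, y = zip(*pts)
    print(label, "corr(exception rate, hire quality) =", round(np.corrcoef(x, y)[0, 1], 2))

Under these assumptions the information-only managers display a positive correlation between exception rates and hire quality, and the bias-only managers a negative one, mirroring the interpretation above.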

The presence of bias raises the possibility that firms may be able to improve outcomes by placing some limits on managerial discretion. To illustrate an example, Proposition 3 (in Online Appendix B) summarizes theoretical conditions under which a firm would prefer no discretion to full discretion. In general, greater bias pushes the firm to prefer no discretion, while better information pushes it toward allowing discretion. These extreme cases need not be optimal and firms may also wish to consider intermediate policies such as limiting the number of exceptions that managers can make.23

V. EMPIRICAL ANALYSIS ON DISCRETION

In our data, we observe job applicants, hires, and the outcomes of hired workers. This allows us to assess how effectively managers exercise discretion, by examining whether worker outcomes are better when managers follow test recommendations more closely, or when they choose to hire more workers as exceptions. If we find that managers who make many exceptions tend to do worse than managers who make few exceptions (holding all else constant), then this suggests that managers are biased. Firms in our setting may then want to consider limiting discretion, at least for the managers that make such frequent exceptions.

To examine the consequences of overruling test recommendations, we must first define an “exception rate” that corresponds to a manager’s choices, rather than his or her circumstances. For example, two managers with the same information and bias may nonetheless make different numbers of exceptions if they face different applicant pools or need to hire different numbers of workers. Our metric should also adjust for applicant pool characteristics that make exceptions mechanically more likely, for example, in pools with few green applicants relative to slots.

Second, we must compare outcomes for managers who have different exception rates despite facing similar applicant pools in similar labor market conditions. These potentially unobservable factors may also separately impact worker tenure, making it difficult to learn about the relationship between exceptions and worker outcomes. For example, we need to address the concern that because we do not observe job offers, applicant pools in which more green workers turn down offers may wrongly appear to have a higher exception rate.

We discuss how we address these issues in the next two subsections. We first define an exception rate that takes into account observable differences in applicant pools. Second, we discuss a range of empirical specifications that help deal with unobserved differences across applicant pools (i.e., differences within color or across locations).

23. See Frankel (2017) for follow-up theoretical work providing a discussion of optimal hiring in a related model. In particular, the author shows that there are conditions under which the optimal firm response is to cap the number of exceptions that managers are permitted.

V.A. Defining Exceptions

To construct an empirical analogue to the exception rate, we use data on the test scores of applicants and hires in the post-testing period. First, we define an “applicant pool” as a group of applicants being considered by the same manager for jobs at the same location in the same month.24

We then measure how often managers overrule the recommendation of the test by either (i) hiring a yellow when a green had applied and is not hired, or (ii) hiring a red when a yellow or green had applied and is not hired. We define the exception rate, for a manager m at a location l in a month t, as

(2)  Exception Rate_mlt = [N^h_y × N^nh_g + N^h_r × (N^nh_g + N^nh_y)] / (Maximum # of Exceptions),

where N^h_color and N^nh_color are the numbers of hired and not hired applicants, respectively. These variables are defined at the pool level (m, l, t) though subscripts have been suppressed for notational ease.

The numerator of Exception Rate_mlt counts the number of exceptions (or “order violations”) a manager makes when hiring, that is, the number of times a yellow is hired for each green that goes unhired plus the number of times a red is hired for each yellow or green that goes unhired. This definition assigns a higher exception rate to a manager when he or she hires a yellow applicant from a pool of 100 green applicants and 1 yellow applicant, than from a pool of 1 green applicant and 100 yellow applicants.

However, the total number of order violations in a pool depends both on the manager’s choices and on factors related to the applicant pool, such as size and color composition. For example, if a pool has only green applicants, it is impossible to make any exceptions. Similarly, if the manager needs to hire all available applicants, then there can be no exceptions. These variations were implicitly held constant in our model, but need to be accounted for in the empirics. To control for pool characteristics that may mechanically impact the number of exceptions, we normalize the number of order violations by the maximum number of violations that could occur, given the applicant pool that the recruiter faces and the number of hires required.25 This results in an exception rate that ranges from 0 if the manager never made any exceptions, to 1 if the manager made all possible exceptions. Importantly, although the propositions in Section IV are derived for the probability of an exception, their proofs hold equally for this definition of an exception rate. In Online Appendix A.6, we show that our results are also robust to alternative definitions of exception rates.

24. An applicant is under consideration if he or she applied in the last four months and had not yet been hired. Over 90% of workers are hired within four months of the date they submitted an application.

25. That is, we count the number of order violations that would occur if the manager first hired all available reds, then, if there are still positions to fill, all available yellows. Specifically,

Maximum # of Exceptions =
  N^h (N^A_g + N^A_y)                              if N^h ≤ N^A_r,
  N^h N^A_g + N^A_r (N^A_y − (N^h − N^A_r))        if N^A_r < N^h ≤ N^A_y + N^A_r,
  (N^A_r + N^A_y)(N^A_g − (N^h − N^A_r − N^A_y))   if N^A_y + N^A_r < N^h,

where N^A_color is the number of applicants of a given color and N^h is the total number of hires.
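To fix ideas, the sketch below implements this construction for a single applicant pool. It is a minimal illustration rather than the authors' code: the pool is assumed to be summarized by simple counts of hired and total applicants by color, and the denominator follows the maximum-exceptions rule in footnote 25.

def max_exceptions(n_hires, n_green, n_yellow, n_red):
    """Order violations if the manager hired all reds first, then yellows (footnote 25)."""
    if n_hires <= n_red:
        return n_hires * (n_green + n_yellow)
    if n_hires <= n_red + n_yellow:
        return n_hires * n_green + n_red * (n_yellow - (n_hires - n_red))
    return (n_red + n_yellow) * (n_green - (n_hires - n_red - n_yellow))

def exception_rate(hired, applied):
    """Equation (2): order violations divided by the maximum possible violations.

    hired and applied are dicts keyed by 'green', 'yellow', 'red' giving the number
    of hired applicants and the total number of applicants of each color in the pool."""
    not_hired = {c: applied[c] - hired[c] for c in ('green', 'yellow', 'red')}
    violations = (hired['yellow'] * not_hired['green']
                  + hired['red'] * (not_hired['green'] + not_hired['yellow']))
    denom = max_exceptions(sum(hired.values()),
                           applied['green'], applied['yellow'], applied['red'])
    return violations / denom if denom > 0 else float('nan')

# Example pool: 10 green, 5 yellow, 2 red applicants; the manager hires 3 greens,
# 2 yellows, and 1 red, so 24 of a possible 62 order violations occur (rate = 0.39).
print(exception_rate({'green': 3, 'yellow': 2, 'red': 1},
                     {'green': 10, 'yellow': 5, 'red': 2}))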

As described in Table I, we observe nearly 3,700 applicant pools consisting of, on average, 260 applicants.26 On average, 19% of workers in a given pool are hired and this proportion is increasing in the score of the applicant. Despite this, exceptions are common: the average exception rate across applicant pools is 22%.

There is substantial variation in exception rates across both applicant pools and managers. Figure III shows histograms of the exception rate at the applicant pool level in the top panel. The left graph shows the unweighted distribution, while the right graph shows the distribution weighted by the number of hires. In either case, the median exception rate is about 20% of the maximal number of possible exceptions, and the standard deviation is about 15 percentage points.

We then aggregate exception rates to the manager level byaveraging over all pools a manager hired in, weighting by thenumber of hires in the pool. The middle panels of Figure III showhistograms of manager-level exception rates: these have the same

25. That is, we count the number of order violations that would occur if themanager first hired all available reds, then, if there are still positions to fill, allavailable yellows. Specifically,

Maximum # of Exceptions

=⎧⎨⎩

Nh(NAg +NA

y ) if Nh�NAr ,

NhNAg +NA

r (NAy −(Nh − NA

r )) if NAr < Nh�NA

y +NAr ,

(NAr +NA

y )(NAg − (Nh−NA

r − NAy )) if NA

y +NAr < Nh,

where NAcolor is the number of applicants of a given color and Nh is the total number

of hires.26. This excludes months in which no hires were made.


[Figure III omitted: six histograms of the exception rate; solid line = mean, dashed line = median.]

FIGURE III
Distributions of Exception Rates

These figures plot the distribution of the exception rate, as defined by equation (2) in Section V. The top panel presents results at the applicant pool level (defined to be a manager–location–month). The middle (bottom) panel aggregates these data to the manager (location) level. Figures on the left show the distribution giving applicant pools equal weight, while figures on the right weight by number of hires. Exception rates are defined only for the post-testing sample.

mean and a slightly smaller standard deviation of 10 percentage points. This means that managers very frequently make exceptions, and some managers consistently make more exceptions than others. The bottom panels of Figure III aggregate exception rates to the location level and show that there is also systematic variation in exception rates across locations.

When we examine the relationship between manager-level exception rates and worker outcomes, we require that variation in exception rates reflects differences in manager choices, driven by their information and biases. However, it is possible that other factors, such as unobserved differences in applicant quality, also influence exception rates. In the next section, we describe how our empirical analysis seeks to control for such confounders.


V.B. Empirical Specifications

1. Post-Testing Correlation between Exception Rates and Outcomes. We first examine the relationship between the manager-level exception rate and the realized durations of hires in the post-testing period:

\[
\log(\text{Duration})_{imlt} = a_0 + a_1\,\text{Exception Rate}_m + \delta_l + \gamma_t + \text{Position}_i\,\beta + \varepsilon_{imlt}. \tag{3}
\]

Our variation comes from differences in manager-level exception rates for managers employed at the same location. In our data, 99.1% of workers are hired at locations with more than 1 manager. The average location in our sample has nearly 7 managers, and the average worker is at a location with 11 managers. The coefficient of interest is a1. A negative coefficient, a1 < 0, indicates that the quality of hires is decreasing in the manager's exception rate. Such a finding suggests that managers may be making exceptions because they are biased or mistaken, and not solely because they have useful private information. We cluster standard errors at the location level, again to take into account any correlation in observations within a location over time.27
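As a rough illustration of equation (3), and not the authors' code: the sketch below estimates an OLS approximation with location, hire-month, and position fixed effects and location-clustered standard errors. The paper adjusts for right-censored durations with a censored-normal model, which this sketch omits; the file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per hired worker in the post-testing period, with the hiring
# manager's standardized exception rate attached to each hire.
df = pd.read_csv("hires_post_testing.csv")  # hypothetical file

# OLS approximation of equation (3): log duration on the manager's exception
# rate plus location, hire-month, and position fixed effects.
model = smf.ols(
    "log_duration ~ exception_rate_std + C(location) + C(hire_month) + C(position)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["location"]})

print(model.params["exception_rate_std"])  # a1: negative => worse average hires
```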

We face two key concerns in interpreting a1. First, exception rates may be driven by omitted variables that separately impact worker durations. For example, some locations may be inherently less desirable places that both attract more managers with biases or bad judgment and retain fewer workers. This would drive a negative correlation between exception rates and outcomes that is unrelated to discretion.

Second, as discussed in the introduction, we observe only hires and not offers. This means that we cannot tell the difference between a yellow that is hired even when a green applicant is available and a yellow that is hired after all green applicants have turned down the offer. One concern is that we may observe more "false exceptions" when green workers have better outside options. In such cases, we may also see lower durations simply because the manager was forced to hire second-choice workers.

27. If we instead cluster by manager, the level of variation underlying our key right-hand-side variable, we get very similar standard errors.


In both cases, accounting for controls may alleviate some concerns. For example, location fixed effects control for fixed differences across locations in unobserved applicant quality; location-specific time trends further control for smooth changes in these characteristics. Controlling for local labor market conditions reduces the likelihood that our results are driven by "false exceptions," because such exceptions may be more common when green workers have better outside options. Our full set of controls includes location, time, and position-type fixed effects; client-year fixed effects; local labor market variables; location-specific time trends; and detailed controls for the quality and number of applicants in an application pool (fixed effects for each decile of the number of applicants, hire rate, share of applicants that are green, and share that are yellow).28

In addition to these controls, examining manager-level exception rates has the benefit of smoothing idiosyncratic variation across individual pools that may drive both exception rates and outcomes. For example, in some pools, green applicants may be atypically weak. In this case, managers may optimally hire yellow applicants, but their hires would still have low average durations relative to workers that the location is usually able to attract. Similarly, some pools may have more "false exceptions" because of an idiosyncratically low yield rate for green applicants. Averaging to the manager level (the average manager hires in 18 applicant pools) reduces the extent to which our measure of exceptions is driven by such sources of variation. Given our controls, this same concern would apply only if some managers systematically face idiosyncratically weaker pools or lower yield rates than other managers at the same location facing observably similar pools.

2. Differential Impact of Testing, by Exception Rates. We also consider how the impact of testing differs across exception rates. If managers exercise discretion because they are biased or misinformed, then we may expect the benefits of testing that we document in Section III.B to be lower for high-exception managers. We estimate this using a similar specification to that described in

28. We have also explored controls for the number of hires made in the several preceding months to take into account that applicant pools may be depleted over time. Results are very similar with these controls.


Section III.A:

\[
\log(\text{Duration})_{imlt} = b_0 + b_1\,\text{Testing}_{lt} \times \text{Exception Rate}_l + b_2\,\text{Testing}_{lt} + \delta_l + \gamma_t + \text{Position}_i\,\beta + \varepsilon_{imlt}. \tag{4}
\]

Equation (4) includes the main effect of testing but allows testing to interact with the location-specific exception rate. We consider location-level variation in this case because we do not observe manager identifiers in the preperiod. The coefficient of interest, b1, thus estimates how the impact of testing differs at locations that subsequently make more or fewer exceptions. A value of b1 < 0 indicates that exceptions attenuate gains from testing. Unlike equation (3), which can be estimated only on post-testing data, this specification uses our full data set, making it possible for us to more precisely identify location fixed effects and other controls.
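A similarly hedged sketch of equation (4): the location-level exception rate enters only through its interaction with the post-testing indicator, since its main effect is absorbed by the location fixed effects. Again, this is an OLS simplification with hypothetical file and column names, not the paper's censored-normal estimator.

```python
import pandas as pd
import statsmodels.formula.api as smf

# All hires, pre- and post-testing. `post_testing` equals 1 after the
# location adopts testing; `loc_exception_rate_std` is the location-level,
# post-testing exception rate, standardized and assigned to all of that
# location's hires.
df = pd.read_csv("hires_all_periods.csv")  # hypothetical file

model = smf.ols(
    "log_duration ~ post_testing + post_testing:loc_exception_rate_std"
    " + C(location) + C(hire_month) + C(position)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["location"]})

print(model.params["post_testing"])                          # b2: testing effect at the mean exception rate
print(model.params["post_testing:loc_exception_rate_std"])   # b1: < 0 => exceptions attenuate gains
```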

V.C. Results

Figure IV examines the correlation between exception rates and durations for hired workers after the introduction of testing. We divide managers into 20 equally sized bins based on their hire-weighted exception rate (x-axis) and regress duration measures on an exhaustive set of indicators for each bin, plus base controls. We plot the coefficients on these exception rate bin indicators (y-axis) against the average exception rate in each bin (x-axis). The top left panel summarizes the log duration regression, which adjusts for censoring with a censored-normal regression. The remaining panels plot the milestone measures for the probability that a worker stays at least 3, 6, or 12 months. For all outcomes, we see a negative relationship: job durations are shorter for workers hired by managers with higher exception rates.
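The figure's construction can be sketched roughly as follows (illustrative Python rather than the Stata "cnreg"/"binscatter" commands used in the paper; the censoring adjustment is omitted and the file and column names are hypothetical).

```python
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# Worker-level post-testing data with each hire's manager exception rate.
df = pd.read_csv("hires_post_testing.csv")  # hypothetical file

# Because each row is a hire, 20 quantile bins over rows approximate the
# hire-weighted bins used in the figure.
df["bin"] = pd.qcut(df["exception_rate"], 20, labels=False, duplicates="drop")

# Regress log duration on an exhaustive set of bin indicators (no intercept)
# plus base controls; the paper uses a censored-normal regression instead.
fit = smf.ols(
    "log_duration ~ 0 + C(bin) + C(location) + C(hire_month) + C(position)",
    data=df,
).fit()

bin_means = df.groupby("bin")["exception_rate"].mean()
bin_coefs = fit.params[[p for p in fit.params.index if p.startswith("C(bin)")]]

plt.scatter(bin_means.values, bin_coefs.values)
plt.xlabel("Average exception rate in bin")
plt.ylabel("Adjusted log(Duration)")
plt.show()
```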

Table III presents the accompanying regression analysis. In these regressions, we standardize exception rates to be mean 0 and standard deviation 1 so that the units are easier to interpret. Column (1) contains our base specification and indicates that a one standard deviation increase in the exception rate is associated with a 7% reduction in job durations, significant at the 5% level. Adding controls reduces the size of the standard error and the coefficient slightly. In our full-controls specification, a one standard deviation higher exception rate is associated with 6% shorter durations, still significant at the 5% level. This says


[Figure IV omitted: binned plots of log(Duration) and of the probabilities of surviving 3, 6, and 12 months against the average exception rate, each with a line of best fit.]

FIGURE IV
Manager-Level Exception Rates and Post-Testing Job Durations

We relate post-testing job durations to manager-level exception rates across 20 equally sized bins (weighted by number of hires). In the top left panel, we estimate a censored normal regression of log duration on the 20 bins and control variables (location, hire month, and position fixed effects), where the x-axis represents the average exception rate within each bin. On the y-axis, we plot the coefficient estimates on the 20 bins. This is implemented in Stata using "cnreg" (with no constant/intercept term included). The panel also shows the line of best fit for the 20 points. In the top right, lower left, and lower right panels, we plot means of whether workers stay 3, 6, or 12 months as a function of exception rates (controlling for location, hire month, and position fixed effects) using a binned scatter plot (using "binscatter" in Stata (Stepner 2013)). In these three panels, we restrict attention to workers hired at least 3, 6, or 12 months before the data end date for each of the 15 client firms, and we show the line of best fit.

that even when we analyze managers at the same location hiring the same number of workers out of pools that have the same share of red, yellow, and green applicants, we continue to find that managers who make more exceptions do worse. The middle panel of Online Appendix Table C1 summarizes regressions for the milestone measures, where we also find significant negative relationships.

Next, Table IV examines how the impact of testing varies by the extent to which locations make exceptions. Estimates are based on equation (4). Including the full set of controls (column


TABLE III
EXCEPTION RATES AND POST-TESTING DURATION

Dependent variable: log(Duration)

                                  (1)         (2)         (3)         (4)         (5)
Manager exception rate        −0.0682**   −0.0658**   −0.0661**   −0.0607**   −0.0557**
                               (0.0346)    (0.0321)    (0.0322)    (0.0292)    (0.0283)
N                               91,319      91,319      91,319      91,319      91,319
Year–month FEs                     X           X           X           X           X
Location FEs                       X           X           X           X           X
Position-type FEs                  X           X           X           X           X
Client firm × year FEs                         X           X           X           X
Local unemployment controls                                X           X           X
Location time trends                                                   X           X
Applicant pool controls                                                            X

Notes. This table reports censored normal regressions of manager-level exception rates and tenure outcomes, restricted to the post-testing sample. The exception rate is defined as the number of times a yellow is hired above a green plus the number of times a red is hired above a green or yellow, divided by the maximum exceptions possible in that applicant pool. It is then aggregated to the manager level and standardized to be mean 0 and standard deviation 1. Applicant pool controls include fixed effects for deciles of each of the following variables: number of applicants, hire rate, share of applicants that are green, and share that are yellow. Standard errors are clustered by location. *** p < .01, ** p < .05, * p < .1.

(4)), we find that at the mean exception rate (recall that we standardize exception rates), testing increases durations by 0.23 log points, but this effect is substantially offset (by 0.14 log points) for each standard deviation increase in the exception rate, significant at the 5% level.29 The bottom panel of Online Appendix Table C1 shows that these results are robust to OLS estimates using milestone measures as dependent variables.

Figure V illustrates how the impact of testing varies for locations with different average exception rates, using base controls.30 For all tenure outcomes (log(Duration) and milestones) we find a negative relationship that does not appear to be driven by any particular exception-rate bin.

Across a variety of specifications, we consistently find that worker tenure is lower for managers who made more exceptions to test recommendations. The magnitude of this estimate implies

29. Here, we do not include controls for applicant pool quality because it is unavailable pre-testing.

30. To construct this, we divide locations into 20 hire-weighted bins based on their average location-level exception rate post-testing and augment equation (4) with indicators for the interaction of exception rate bins and the post-testing dummy. We then plot the bin-specific impact-of-testing coefficient on the y-axis and the average exception rate in each bin on the x-axis. For the log(Duration) outcome (top left panel), we adjust for censoring with censored-normal regressions.


TABLE IV
THE IMPACT OF TESTING BY EXCEPTION RATE

Dependent variable: log(Duration)

                                  (1)         (2)         (3)         (4)
Post-testing                    0.366***    0.234**     0.242***    0.234***
                                (0.119)     (0.107)     (0.0696)    (0.0589)
Location exception rate        −0.142**    −0.150**    −0.152**    −0.142**
  × post-testing               (0.0652)    (0.0679)    (0.0655)    (0.0573)
N                               265,648     265,648     265,648     265,648
Year–month FEs                     X           X           X           X
Location FEs                       X           X           X           X
Position-type FEs                  X           X           X           X
Client firm × year FEs                         X           X           X
Local unemployment controls                                X           X
Location time trends                                                   X

Notes. This table reports censored normal regressions of the differential impact of testing adoption, by location-level exception rate. We use the same sample as defined by the notes to Table II. The exception rate is defined as the number of times a yellow is hired above a green plus the number of times a red is hired above a green or yellow, divided by the maximum exceptions possible in that applicant pool. It is then aggregated to the location level and standardized to be mean 0 and standard deviation 1. *** p < .01, ** p < .05, * p < .1.

that a firm made up of managers at the 10th percentile of the exception distribution would have approximately 15% longer worker durations in the post-testing period, relative to a firm made up of 90th percentile managers (higher exception rates are worse). Further, locations at the 10th percentile of the exception distribution experience duration improvements with the adoption of testing that are multiple times larger than improvements at 90th percentile locations.

Viewed in light of our theoretical predictions, these results suggest that managers often make exceptions because they are either biased or misinformed. Even if high-exception managers were well informed about worker quality, the fact that their hiring outcomes are worse suggests that their biases lead them to make choices that do not maximize quality.

V.D. Additional Robustness Checks

In this section we address several alternative explanations for our findings.

1. Quality of "Passed Over" Workers. There are some scenarios under which we may find a negative correlation between worker outcomes and exception rates, even when managerial discretion improves hiring. For example, as discussed earlier, a manager may


[Figure V omitted: binned plots of the estimated impact of testing on log(Duration) and on 3-, 6-, and 12-month survival against the average location-level exception rate.]

FIGURE V
Location-Level Exception Rates and the Impact of Testing on Job Durations

We plot the impact of testing within 20 equally sized bins, based on the location-level exception rate, on the average exception rate in each bin. Estimates include base controls: location, hire month, and position fixed effects. The top left graph is estimated with censored-normal regressions, while the others are estimated using OLS for the sample of workers hired at least 3, 6, or 12 months before the data end date for each of the 15 firms.

tend to make more exceptions because he or she sees idiosyncratically weak green applicants, relative to the typical applicants at the location. As another example, locations with high exception rates may benefit less from the test because their managers always have better private information.

In these and other similar scenarios, it should still be the case that individual exceptions are correct: a yellow hired as an exception should perform better than a green who is not hired. To examine this, we would ideally compare the counterfactual duration of applicants who are not hired with the actual durations of those who were. While this is not generally possible, we can, in some cases, approximate such a comparison by exploiting the timing of hires. Specifically, we compare the tenure of yellow workers hired as exceptions to green workers from the same applicant pool who are not hired that month, but who subsequently begin


TABLE V
TENURE OF EXCEPTIONS VERSUS PASSED OVER APPLICANTS

Dependent variable: log(Duration)

                                             (1)          (2)
Quality of yellow exceptions versus passed over greens
  Passed over greens                       0.0402*      0.0449
                                          (0.0220)     (0.0357)
  N                                         53,166       53,166
Quality of red exceptions versus passed over greens and yellows
  Passed over greens                       0.159***     0.143**
                                          (0.0543)     (0.0634)
  Passed over yellows                      0.143***     0.121**
                                          (0.0546)     (0.0597)
  N                                         25,782       25,782
Base controls                                 X            X
Comparison pool FEs                                        X

Notes. Regressions are restricted to the post-testing sample, adjust for censoring, and standard errors are clustered at the location level. The top (bottom) panel compares yellow (red) exceptions—the omitted category—to passed over greens (and yellows) who were available at the same time but hired in a later month. Observations are restricted to pools with at least one exception and one passed over worker, and are further restricted to locations and pools with at least 10 and 5 observations, respectively. Base controls are location, hire month, and position-type fixed effects. Comparison pool fixed effects are defined by the manager–location–month for the applicant pool in which candidates were considered together. *** p < .01, ** p < .05, * p < .1.

working in a later month. If managers make exceptions when they have better information, then exception yellows should have longer tenures than "passed over" greens.

Table V shows that this is not the case. The first panel compares durations of workers who are exception yellows (the omitted group) to greens whose application was active in the same month, but who were hired only in a later month. Because these workers are hired at different times, all regressions control for hire month fixed effects to account for mechanical differences in duration. In column (2), which includes applicant pool fixed effects, the coefficient on "passed over greens" compares this group to yellow applicants from the same applicant pool who were hired before them.31 The second panel of Table V repeats this exercise,

31. The applicant pool fixed effect is at the location–manager–date level, where the date is the month in which both applications were active, the yellow was hired, and the green was hired only later. These fixed effects thus subsume a number of controls from our full specification from Table III.


comparing exception reds (the omitted group) to passed over yellows and greens.32

Both panels show that workers hired as exceptions have shorter tenures. Passed over greens stay 4% longer than yellows hired before them from the same pool (column (2), top panel), though this estimate is noisy. We estimate a more precise relationship for exception-red workers: passed over greens and yellows stay roughly 14% and 12% longer, respectively. These results suggest it is unlikely that exceptions are driven by better information: high-scoring workers who are initially passed over outperform low-scoring workers chosen first.33
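For intuition, a sketch of how the comparison groups behind the top panel of Table V might be assembled from applicant-level records; the column names are hypothetical and the actual construction imposes the additional restrictions described in footnote 32.

```python
import pandas as pd

# One row per applicant in a pool (manager-location-month), with the
# applicant's test color, the pool month, and the month hired (NaT if never).
apps = pd.read_csv("applicants.csv", parse_dates=["pool_month", "hire_month"])  # hypothetical

hired_in_pool_month = apps["hire_month"] == apps["pool_month"]
hired_later = apps["hire_month"] > apps["pool_month"]

# Pools in which at least one green goes unhired in the pool month.
green_passed = (apps["color"] == "green") & ~hired_in_pool_month
pool_has_passed_green = green_passed.groupby(apps["pool_id"]).transform("any")

# Exception yellows: yellows hired while a green in the same pool goes unhired.
exception_yellows = apps[(apps["color"] == "yellow")
                         & hired_in_pool_month & pool_has_passed_green]

# Passed over greens: greens not hired that month but hired in a later month.
passed_over_greens = apps[(apps["color"] == "green") & hired_later]

# Their realized durations can then be compared with hire-month and
# comparison-pool fixed effects, as in Table V.
```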

2. "False Exceptions." As mentioned, one may be concerned that we do not observe job offers and thus cannot distinguish between cases in which yellow applicants are hired as true exceptions and cases in which they are hired because green applicants turned down offers. By analyzing manager- or location-level exception rates, we aggregate over some of the idiosyncratic variation that may generate false exceptions, and our controls for local labor market conditions may help absorb some of the time-varying drivers of such exceptions.

As an additional test, Online Appendix Table C3 shows that our results hold when restricting to pools with at least as many green applicants as the total number of hires (84% of hires came from such a pool). In such pools, it is less likely that a yellow or red worker was hired because all green applicants received an offer and turned it down.

3. Heterogeneity across Locations. Another possible concern is that the relevance of the test varies across locations and that this drives the negative correlation between exception rates and worker outcomes. For example, in very undesirable locations, green applicants might have better outside options and be more difficult to retain. In these locations, a manager attempting to avoid costly retraining may optimally decide to make exceptions in order to hire workers with lower outside options.

32. We restrict observations in both panels to pools in which there were both an exception and a passed over applicant (92% and 59% of hires in the top and bottom panels, respectively). To identify control variables, we further restrict to locations and pools that have at least 10 and 5 observations, respectively.

33. An alternative explanation is that the applicants with higher test scores were not initially passed up, but were instead initially unavailable, for example because they were engaged in on-the-job search. However, Online Appendix Table C2 shows that delays are not correlated with worker quality.


In Online Appendix A.5 we provide evidence that the apparent usefulness of the test does not systematically vary by location characteristics. There we explore the relationship between color score and job duration as a function of a wide range of location characteristics, such as exception rates and average durations. We robustly find that color score is predictive of worker quality, regardless of the location's characteristics on each of these dimensions.

4. Productivity. Our results show that high-exception managers hire workers with shorter job durations. These exceptions may still benefit the firm if such workers are better on other dimensions. For example, managers may optimally hire workers who are more likely to turn over if their private signals indicate that those workers might be more productive while they are employed.

Our final set of results provides evidence that this is unlikely to be the case. For a subset of client firms, we observe a direct measure of worker productivity: output per hour. Recall that in our setting, individuals perform essentially one major task and engage in individual production. Some examples may include the number of data items entered per hour, the number of standardized tests graded per hour, and the number of phone calls completed per hour. As in these examples, output per hour is an important measure of productivity for the fairly homogeneous task we study.

We define a worker-level output per hour metric as a worker's output per hour averaged over all the days where such a metric is observed for that worker. Across all workers with an available measure, output per hour has an average of 8.4 with a standard deviation of 4.7. There is thus a wide range of performance outcomes.34 From Table I, average output per hour is slightly higher in the post-testing sample period and varies slightly by color score, though the differences are not significant.
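A minimal sketch of this aggregation (hypothetical file and column names): daily output-per-hour observations are averaged within worker, and indicators for deciles of the daily task count (see footnote 34) are averaged alongside.

```python
import pandas as pd

# One row per worker-day with observed output per hour and the number of
# tasks used to compute it.
daily = pd.read_csv("daily_output.csv")  # hypothetical file

# Decile indicators for the daily task count, to be used as controls.
daily["task_decile"] = pd.qcut(daily["n_tasks"], 10, labels=False, duplicates="drop")
decile_dummies = pd.get_dummies(daily["task_decile"], prefix="decile", dtype=float)

# Worker-level averages of output per hour and of the decile indicators.
worker = (
    pd.concat([daily[["worker_id", "output_per_hour"]], decile_dummies], axis=1)
    .groupby("worker_id")
    .mean()
)
```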

This measure is available for 62,427 workers (one-quarter of all hires) in six client firms. The primary reason for missing output is that the metric is not made available to us for many locations, time periods, and end clients (i.e., the ones purchasing services from the client firms).35 In addition, its availability depends on

34. We also control for the number of tasks in a day that are used to measure a worker's output per hour. We aggregate this to the worker level by averaging indicators for count decile across all observations for a worker.

35. We can account for half of the variation in whether an output measure is available for an individual worker with location, time, and position controls.


TABLE VI
TESTING, EXCEPTION RATES, AND OUTPUT PER HOUR

Dependent variable: output per hour

                                   Post-testing sample     Introduction of testing
                                     (1)        (2)            (3)        (4)
Post-testing                                                  0.343      0.156
                                                             (0.366)    (0.327)
Exception rate × post-testing     −0.0659     −0.111          0.137     −0.137
                                  (0.134)     (0.0953)       (0.153)    (0.217)
N                                  28,858      28,858         62,421     62,421
Base controls                         X           X              X          X
Full controls                                     X                         X

Notes. See notes to Tables III and IV. The dependent variable in this case is output per hour and regressions are estimated with OLS. In columns (1) and (2), we examine the post-testing correlation between the manager-level exception rate and output per hour. In columns (3) and (4), we examine the differential impact of testing as a function of location-level exception rates. Base controls include location, hire month, and position fixed effects, as well as controls for the number of tasks in a day that are used to measure a worker's output per hour. Full controls add client-by-year effects, local unemployment rates, and location-specific time trends. For the post-testing sample regressions (columns (1) and (2)), full controls also include applicant pool controls. *** p < .01, ** p < .05, * p < .1.

workers completing their training and being permitted to perform the job task: this is the period in which workers become valuable to client firms.

Relative to our main sample, the set of workers with output data is positively selected on duration. Despite this, output remains positively correlated with job duration. Online Appendix Figure C1 presents a binned scatter of output per hour for 20 evenly sized bins of log(Duration).36 Except for one outlier for workers with very low tenure, there is a strong positive relationship: workers with longer job durations have higher output per hour.

Table VI summarizes our main analyses using output per hour as the dependent variable. Columns (1)–(2) document the post-testing correlation between manager-level exceptions and output per hour. For both our base and full sets of controls, we obtain negative coefficients that are not significant. For example, in column (2), the estimate is −0.11 with a standard error of 0.095.

According to the data firm, certain lines of business within a firm do not make their productivity data available.

36. We control for location fixed effects to account for differences in average output per hour across locations.


Recall that the manager-level exception rates are standardized to be mean 0 and standard deviation 1 in the full post-testing sample. The estimate therefore implies that a manager with a one standard deviation higher exception rate hires workers who perform 0.11 fewer units of output per hour, on average. This is 2.3% of the standard deviation for output per hour (4.7, mentioned above). Based on the standard error, we can rule out positive effects beyond 1.7% and negative effects beyond −6.4% of a standard deviation with 95% confidence.
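As a rough consistency check using the rounded values reported for column (2) of Table VI:
\[
\frac{0.11}{4.7} \approx 2.3\%, \qquad
\frac{-0.111 \pm 1.96 \times 0.0953}{4.7} \approx \frac{[-0.30,\ 0.08]}{4.7} \approx [-6\%,\ +2\%]
\]
of a standard deviation, which lines up with the bounds quoted in the text once the unrounded estimates are used.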

Columns (3)–(4) examine the differential impact of testing by location-level exception rate. The coefficient on testing gives the impact of testing for locations with the mean exception rate (based on the full sample). In the baseline specification, testing improves output per hour by 0.34, or 7% of a standard deviation. This effect is small in magnitude and is not statistically significant. We also find modestly sized and insignificant coefficients for the interaction term. Coefficients are similar in magnitude but opposite in sign across base and full controls. With full controls, the point estimate of −0.137 implies that the impact of testing in a location with a one standard deviation higher exception rate is offset by 0.137 output per hour units, or about 2.9% of a standard deviation. We can rule out positive effects outside of 6.3% and negative effects outside of −12% with 95% confidence.

Although noisy, these findings taken together suggest that output per hour does not appear to be strongly related to exception rates. We do not find evidence of a large negative association between exceptions and output per hour, as we did with job durations. However, in all cases we find no evidence that managerial exceptions improve output per hour by any sizable amount. This is inconsistent with a model in which managers optimally sacrifice job tenure in favor of workers who perform better on other quality dimensions.

VI. CONCLUSION

We evaluate the introduction of a hiring test for a low-skilled service sector job. Exploiting variation in the timing of adoption across locations within firms, we show that testing significantly increases the durations of hired workers. We then document substantial variation in how managers appear to use job test recommendations: some tend to hire applicants with the best test scores while others appear to make many more exceptions. Across


a range of specifications, we show that hiring against test recommendations is associated with worse outcomes.

Firms in our setting may find this result useful as they decide how much discretion to grant their managers, and how much to rely on job tests or other signals of worker quality. For example, in cases where high-exception managers are associated with worse outcomes, firms may wish to decrease the rate of exceptions either by limiting managerial discretion (particularly for high-exception managers) or by finding new managers who are less biased. More broadly, firms may be able to improve outcomes by adopting policies to influence manager behavior, such as increasing feedback about the quality of hires or tying pay more closely to performance. Such policies may encourage managers to find ways to complement the test as they continue to learn.

There are several caveats one must keep in mind when interpreting our results. First, while our results suggest that high-exception managers make decisions with bias, limiting managerial discretion could have unintended consequences (such as demoralizing managers), and could be bad for some managers who do have valuable private information. Second, we emphasize that our findings may not apply to all firms. We focus on workers who perform low-skilled service sector tasks without a teamwork component. A manager's private signals of worker quality may be more valuable in higher-skilled settings with more complex tasks.37 Furthermore, managers may have more opportunities to correct their mistaken beliefs in settings where they regularly interact with applicants on the job. The HR managers we study generally do not supervise applicants after they are hired, which also limits the scope for a manager–employee match component that might make discretion more useful. An additional contribution of our article is that we present a way to assess the consequences of discretion using only data that would readily be available for many firms using workforce analytics.

More broadly, our findings highlight the role that technology can play in reducing the impact of managerial mistakes or biases

37. In fact, Frederiksen, Kahn, and Lange (2017) show that managerial discretion over performance management can be valuable in the context of a high-skilled service profession. Li and Agha (2015) show that the judgment of human reviewers provides valuable information about the quality of scientific proposals that is not available from CVs and other quantitative metrics. Hoffman and Tadelis (2017) show that subordinates provide subjective assessments of managers that predict hard outcomes in another high-skilled setting.


by changing how decision making is structured within the firm. As workforce analytics become an increasingly important part of HR management, more work needs to be done to understand how such technologies interact with organizational structure and the allocation of decision rights within the firm. This article makes an important step towards understanding and quantifying these issues.

UNIVERSITY OF TORONTO AND NATIONAL BUREAU OF ECONOMIC RESEARCH

YALE UNIVERSITY AND NATIONAL BUREAU OF ECONOMIC RESEARCH

MIT AND NATIONAL BUREAU OF ECONOMIC RESEARCH

SUPPLEMENTARY MATERIAL

An Online Appendix for this article can be found at The Quarterly Journal of Economics online. Code replicating tables and figures in this article can be found in Hoffman, Kahn, and Li (2017) in the Harvard Dataverse, doi:10.7910/DVN/DWPXXT.

REFERENCES

Aghion, Philippe, and Jean Tirole, "Formal and Real Authority in Organizations," Journal of Political Economy, 105 (1997), 1–29.

Alonso, Ricardo, and Niko Matouschek, "Optimal Delegation," Review of Economic Studies, 75 (2008), 259–293.

Altonji, Joseph, and Charles Pierret, "Employer Learning and Statistical Discrimination," Quarterly Journal of Economics, 116 (2001), 313–350.

Autor, David, "Why Do Temporary Help Firms Provide Free General Skills Training?," Quarterly Journal of Economics, 116 (2001), 1409–1448.

Autor, David, and David Scarborough, "Does Job Testing Harm Minority Workers? Evidence from Retail Establishments," Quarterly Journal of Economics, 123 (2008), 219–277.

Baker, George, and Thomas Hubbard, "Contractibility and Asset Ownership: On-Board Computers and Governance in U.S. Trucking," Quarterly Journal of Economics, 119 (2004), 1443–1479.

Bolton, Patrick, and Mathias Dewatripont, "Authority in Organizations," in The Handbook of Organizational Economics, Robert Gibbons and John Roberts, eds. (Princeton, NJ: Princeton University Press, 2013).

Brown, Meta, Elizabeth Setren, and Giorgio Topa, "Do Informal Referrals Lead to Better Matches? Evidence from a Firm's Employee Referral System," Journal of Labor Economics, 34 (2016), 161–209.

Burks, Stephen, Bo Cowgill, Mitchell Hoffman, and Michael Housman, "The Value of Hiring through Employee Referrals," Quarterly Journal of Economics, 130 (2015), 805–839.

Dana, Jason, Robyn Dawes, and Nathaniel Peterson, "Belief in the Unstructured Interview: The Persistence of an Illusion," Judgment and Decision Making, 8 (2013), 512–520.

Dessein, Wouter, "Authority and Communication in Organizations," Review of Economic Studies, 69 (2002), 811–838.

Diamond, Rebecca, and Petra Persson, "The Long-Term Consequences of Teacher Discretion in Grading of High-Stakes Tests," Mimeo, Stanford University, 2016.


Farber, Henry, and Robert Gibbons, "Learning and Wage Dynamics," Quarterly Journal of Economics, 111 (1996), 1007–1047.

Frankel, Alexander, "Selecting Applicants," Mimeo, University of Chicago, 2017.

Frederiksen, Anders, Lisa B. Kahn, and Fabian Lange, "Supervisors and Performance Management Systems," NBER Working Paper No. 23351, 2017.

Griliches, Zvi, "Hybrid Corn: An Exploration in the Economics of Technological Change," Econometrica, 25 (1957), 501–522.

Hoffman, Mitchell, Lisa B. Kahn, and Danielle Li, "Replication Data for 'Discretion in Hiring'," Harvard Dataverse (2017), doi:10.7910/DVN/DWPXXT.

Hoffman, Mitchell, and Steven Tadelis, "People Management Skills, Employee Attrition, and Manager Rewards: An Empirical Analysis," Working Paper, University of Toronto, 2017.

Horton, John, "The Effects of Algorithmic Labor Market Recommendations: Evidence from a Field Experiment," Journal of Labor Economics, 35 (2017), 345–385.

Jovanovic, Boyan, "Job Matching and the Theory of Turnover," Journal of Political Economy, 87 (1979), 972–990.

Kahn, Lisa B., and Fabian Lange, "Employer Learning, Productivity and the Earnings Distribution: Evidence from Performance Measures," Review of Economic Studies, 81 (2014), 1575–1613.

Kahneman, Daniel, Thinking Fast and Slow (New York: Farrar, Strauss, and Giroux, 2011).

Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan, "Human Decisions and Machine Predictions," Quarterly Journal of Economics, 133 (2018), 237–293.

Kuncel, Nathan, David Klieger, Brian Connelly, and Deniz Ones, "Mechanical Versus Clinical Data Combination in Selection and Admissions Decisions: A Meta-Analysis," Journal of Applied Psychology, 98 (2013), 1060–1072.

Kuziemko, Ilyana, "How Should Inmates Be Released from Prison? An Assessment of Parole versus Fixed Sentence Regimes," Quarterly Journal of Economics, 128 (2013), 371–424.

Lazear, Edward, Kathryn Shaw, and Christopher Stanton, "The Value of Bosses," Journal of Labor Economics, 33 (2015), 823–861.

Li, Danielle, "Expertise and Bias in Evaluation: Evidence from the NIH," American Economic Journal: Applied Economics, 9 (2017), 60–92.

Li, Danielle, and Leila Agha, "Big Names or Big Ideas: Do Peer Review Panels Select the Best Science Proposals?," Science, 348 (2015), 434–438.

Pallais, Amanda, and Emily Sands, "Why the Referential Treatment? Evidence from Field Experiments on Referrals," Journal of Political Economy, 124 (2016), 1793–1828.

Paravisini, Daniel, and Antoinette Schoar, "The Incentive Effect of IT: Randomized Evidence from Credit Committees," NBER Working Paper No. 19303, 2013.

Rivera, Lauren, "Hiring as Cultural Matching: The Case of Elite Professional Service Firms," American Sociological Review, 77 (2012), 999–1022.

Stanton, Christopher, and Catherine Thomas, "Landing the First Job: The Value of Intermediaries in Online Hiring," Review of Economic Studies, 83 (2016), 810–854.

Stepner, Michael, "BINSCATTER: Stata Module to Generate Binned Scatterplots," Statistical Software Components, Boston College Department of Economics, 2013.

Wang, James, "Why Hire Loan Officers? Examining Delegated Expertise," Mimeo, University of Michigan, 2014.
