paweª doligalski abdoulaye ndiaye nicolas werquinpdoligalski.github.io/files/dnw_taxperfpay.pdf ·...

Redistribution with Performance Pay*

Paweª Doligalski

University of Bristol

Abdoulaye Ndiaye

New York University

Nicolas Werquin

Toulouse School of Economics

May 21, 2020

Abstract

Half of the jobs in the U.S. feature pay-for-performance. We study nonlinear in-

come taxation in a model where such labor contracts arise as a result of moral hazard

frictions within �rms. We derive novel formulas for the incidence of arbitrarily nonlin-

ear reforms of a given tax code on both average earnings and their sensitivity to output

risk. We show theoretically and quantitatively that, following an increase in tax pro-

gressivity, the higher sensitivity of earnings to performance caused by the crowding-out

of private insurance is almost fully o�set by a countervailing performance-pay e�ect

driven by labor supply responses. As a result, earnings risk is hardly a�ected by pol-

icy. We then turn to the normative analysis of a government that levies taxes and

transfers to redistribute income across workers with di�erent levels of uninsurable pro-

ductivity. We �nd that setting taxes without accounting for the endogeneity of private

insurance is close to optimal. Thus, the common concern that standard models of tax-

ation underestimate the cost of redistribution is, in the context of performance-based

compensation, overblown.

*We thank Árpád Ábrahám, Gadi Barlevy, Katherine Carey, Antoine Ferey, Xavier Gabaix,Daniel Garrett, Louis Kaplow, Stefan Pollinger, Morten Ravn, Kjetil Storesletten, Florian Scheuer,Karl Schulz, Stefanie Stantcheva, Aleh Tsyvinski, Hélène Turon, Venky Venkateswaran, PhilippWangner, and numerous seminar and conference participants, for helpful comments and suggestions.Nicolas Werquin acknowledges support from ANR under grant ANR-17-EURE-0010 (Investissementsd'Avenir program).

Introduction

What do fruit harvesters, real estate brokers, bankers and CEOs have in common?

All of them are paid based on their performance. Performance-pay contracts have

become increasingly popular across the income distribution. Empirically, a large share

� roughly half � of all the jobs in the U.S. involves performance-based compensation

(Lemieux, MacLeod, and Parent (2009)) in the form of piece rates, commissions,

bonuses, and stock options. These contracts are qualitatively di�erent from usual

wage contracts. Indeed, the structure of earnings is designed not only to compensate

the employee for completing the job, but also to provide incentives for e�ort in the

�rst place. When wages are highly sensitive to performance � incentives are high-

powered � employees are generously rewarded for better outcomes, but at the same

time they are also more exposed to risk. Crucially, we expect both the level and

the performance-sensitivity of these contracts to be endogenous to the tax policy

implemented by the government. Yet despite the prevalence of these compensation

schemes, they have not been systematically studied in the taxation literature. We �ll

this gap. In a general and tractable framework we derive in closed form the incidence

of tax reforms on the earnings and utility that performance-pay workers receive in

equilibrium. We also derive the impact of taxes on government revenue and social

welfare, as well as the optimal rate of tax progressivity in the presence of such realistic

labor contracts.

A widespread concern is that traditional models of income taxation in the tradi-

tion of Mirrlees (1971) substantially overstate the optimal level of taxes, by assuming

that heterogeneity in wage rates is exogenous and policy-invariant.1 Instead, when

wage risk is endogenous, increasing the progressivity of income taxes should lead to a

crowding-out of private insurance provided by �rms, that is, a one-for-one spread of

the pre-tax earnings distribution. Theoretically, this crowding-out has been shown to

be of critical importance in various contexts � in particular by Attanasio and R�os-

Rull (2000), Golosov and Tsyvinski (2007), and Krueger and Perri (2011) � where

it severely limits the ability of governments to provide social insurance. Empirically,

evidence of such crowding-out has been highlighted in several markets, for instance

unemployment or health insurance � see Cullen and Gruber (2000); Schoeni (2002);

1This is the case both in the static (for instance Saez (2001)) and dynamic (for instance Golosov,Kocherlakota, and Tsyvinski (2003); Farhi and Werning (2013); Golosov, Troshkin, and Tsyvinski(2016)) frameworks.

1

Cutler and Gruber (1996a,b). Yet the empirical literature that studies the impact

of income taxes on the structure of performance-pay contracts often fails to �nd sig-

ni�cant crowding-out e�ects, see for instance Rose and Wolfram (2002); Frydman

and Molloy (2011). Our paper reconciles these �ndings by highlighting a counter-

vailing force that keeps earnings risk practically una�ected by tax policy. This novel

�performance-pay� e�ect is driven by labor supply adjustments. Under a more pro-

gressive tax code, the worker's optimal level of e�ort is lower. The �rm elicits this

labor supply reduction by providing more insurance (crowding-in). We �nd that this

performance-pay e�ect almost fully o�sets the crowding-out.

We set up a model in which income inequality arises from two distinct sources,

namely, innate ability di�erences, and ex-post performance shocks that a�ect the out-

put of equally talented workers. While the former source of wage disparities cannot

be insured by private markets, the latter is very much shaped in the labor market.

In the presence of moral hazard frictions, wage risk has a productive role: employers

choose the amount of risk faced by their employees through performance-based pay

contracts in order to strike a balance between insurance and incentives for e�ort. Our

modeling of labor markets is based on those of Edmans and Gabaix (2011) for our

static setting, and Edmans, Gabaix, Sadzik, and Sannikov (2012) for our dynamic set-

ting. Their frameworks have been very successful at explaining the empirical features

of actual performance-based contracts (see Edmans and Gabaix (2016)). We extend

them to incorporate sophisticated nonlinear policy instruments. The key technical

breakthrough is that we allow for arbitrarily nonlinear tax instruments. Previous

models of moral hazard were tractable only under very restricted forms of the utility

of consumption � for instance, Holmstrom and Milgrom (1987) impose exponential

utility functions. This makes it impossible to consider a wide class of tax schedules

� typically, they would have to be restricted to being a�ne � since nonlinear taxes

e�ectively modify the concavity of the utility that workers receive from their salaries.

Instead, the analysis of Edmans and Gabaix (2011) remains tractable for very general

utility functions. Therefore it allows us to study the incidence of arbitrary tax reforms

(say, increasing taxes on the rich, or altering the shape of the EITC) of any initial tax

schedule (say, the U.S. tax code). Our analysis is thus very general and can be used

for both positive and normative investigation. The government has an e�ective role to

play despite the fact that private insurance markets are constrained e�cient. Indeed,

while �rms optimally provide insurance against ex-post output risk, the government

2

uses tax policy for redistribution between workers with di�erent ex-ante ability.

We start with a positive analysis of the incidence of tax reforms on the workers'

labor contracts and the distribution of utilities. In standard models with exogenous

wage risk, taxes a�ect earnings only by modifying individual labor e�ort decisions. In

our framework, wage risk is endogenous to policy as well. We show that it responds to

tax changes via two channels: a crowding-out e�ect, and a performance-pay e�ect. On

the one hand, the crowding-out e�ect is the optimal response of �rms to a change in

social insurance: they adjust the earnings contract endogenously so that the workers'

incentives for e�ort and participation constraints remain satis�ed after the reform.

Thus, following an improvement in social insurance (higher tax progressivity), �rms

respond by spreading the pre-tax earnings schedule. The performance-pay e�ect, on

the other hand, arises from the optimal labor supply adjustment to the tax reform.

As in standard models of income taxation, workers' optimal e�ort is lower in response

to an increase in marginal tax rates or tax progressivity. But eliciting a lower e�ort

level in the presence of moral hazard frictions is achieved by lowering the sensitivity of

pre-tax earnings to performance, that is, by compressing the wage distribution. This

e�ect counteracts the direct crowding-out of private insurance that the tax reform

induces. Crucially, because our model is tractable, we are able to derive this tax

incidence analysis entirely in closed form for an arbitrary baseline tax system and

arbitrary tax reforms.

We show both theoretically and quantitatively in a calibrated version of our model

that the two earnings risk adjustments almost fully o�set each other in response to

an increase in the progressivity of the tax code. Taken separately these e�ects are

both signi�cant, but summing them implies that taxes barely a�ect the sensitivity

of pay to compensation. Moreover, this result is robust to the value of the labor

supply elasticity. The fundamental reason is that the sensitivity of the contract to

performance is proportional to the marginal disutility of labor. As a result, in order

to elicit a given increase in labor e�ort, the �rm must increase the pass-through of

output risk to earnings proportionally to the inverse of the labor supply elasticity.

Therefore, if labor e�ort is relatively inelastic, the change in performance-sensitivity

necessary to elicit the optimal e�ort change must be large, and vice versa. We evaluate

the robustness of this result to other canonical tax reforms and show that in all cases,

the performance-pay o�sets at least �fty percent, and in some cases even dominates,

the direct crowding-out e�ect.

3

Armed with this tax incidence analysis, we then derive the impact of tax reforms

on government revenue and social welfare, as well as the optimal level of tax progres-

sivity. This analysis extends Chetty and Saez (2010) to our environment with arbi-

trarily nonlinear taxes. In addition to the standard e�ects obtained in the benchmark

model with exogenous wage risk, the crowding-out and performance-pay e�ects cre-

ate �scal externalities: given an initially progressive tax code, a spread (respectively,

contraction) of the pre-tax earnings distribution impacts positively (resp., negatively)

the government budget. Moreover, the crowding-out e�ect has a �rst-order negative

impact on social welfare. This is because, following a tax reform, �rms adjust wages

in a way that renders tax cuts less accurately targeted than in a model with exoge-

nous risk. This modi�es the relevant social welfare weights in the direction of less

redistribution. We then impose a number of functional form assumptions to make

the analysis as transparent as possible and obtain sharper results. In particular, we

assume that the nonlinear tax schedule is restricted to having a constant rate of pro-

gressivity (as in, for instance, Heathcote, Storesletten, and Violante (2017)). Within

this class of tax schedules, we derive the optimal rate of progressivity in closed form

and show that it is smaller than when wage risk is considered exogenous. However,

the welfare losses from setting taxes suboptimally by ignoring the endogeneity of wage

risk are quantitatively limited, equivalent to a mere 0.24% drop in consumption. This

is because only roughly half of the jobs in our calibration are performance-pay, which

reduces the aggregate welfare losses from ignoring the endogeneity of wage risk to a

quarter of what they would be if all jobs were subject to agency frictions. We con-

clude that the common concern that standard models overstate optimal tax policy by

ignoring the endogeneity of private insurance is � in the context of performance-pay

jobs � overblown.

Literature Review. The two papers that are closest to ours are Golosov and

Tsyvinski (2007) and Chetty and Saez (2010). Golosov and Tsyvinski (2007) study

an economy in which �rms insure their workers subject to unobservable productivity

and hidden asset trades. They show that tax reforms generate a large crowding

out e�ect which reduces the gains from public insurance. The government optimally

refrains from providing insurance and instead uses tax policy to correct the externality

generated by hidden trades. In our environment, markets are constrained e�cient.

Instead, we study how the redistributive motive for government intervention interacts

4

with the endogenous private insurance on the labor market. Chetty and Saez (2010)

derive a su�cient statistics formula for the optimal linear tax in the presence of

linear private insurance contracts. We extend their analysis in two ways. First, and

most importantly, rather than following an approach based purely on endogenous

su�cient statistics � in particular, the elasticity of crowd-out with respect to tax

policy � we study a tractable structural microfoundation for the equilibrium labor

contracts. This allows us to characterize analytically the e�ects of government policy

on private insurance contracts via crowding-out and performance-pay responses, and

derive explicit theoretical formulas for tax incidence and optimal taxes. Second,

we allow for arbitrarily nonlinear taxes in the equilibrium with nonlinear incentive

contracts, and we show that several novel e�ects arise from theses nonlinearities.

Our paper is motivated by the large literature that studies performance-pay con-

tracts as an optimal way for �rms to incentivize workers' e�ort in the presence of

moral hazard frictions. On the theoretical side, our baseline framework is the model

of Edmans and Gabaix (2011) for our static setting, and that of Edmans et al. (2012)

for our dynamic setting. These models have been very successful at explaining the

structure of performance-pay contracts of CEOs (Frydman and Jenter (2010); Edmans

and Gabaix (2016); Edmans, Gabaix, and Jenter (2017)). On the empirical side, there

is growing reduced-form and structural evidence that moral hazard in labor markets is

pervasive (Foster and Rosenzweig (1994); Prendergast (1999); Shearer (2004); Lazear

and Oyer (2010); Bandiera, Barankay, and Rasul (2011); Ábrahám, Alvarez-Parra,

and Forstner (2016a)), that employers are important providers of insurance for their

employees (Guiso, Pistaferri, and Schivardi (2005); Lamadon (2016); Friedrich, Laun,

Meghir, and Pistaferri (2019); Lamadon, Mogstad, and Setzler (2019)), and that

the fraction of jobs with explicit pay-for-performance is high and rising (Lemieux,

MacLeod, and Parent (2009); Bloom and Van Reenen (2010); Bell and Van Reenen

(2014); Grigsby, Hurst, and Yildirmaz (2019)). Analogous to Kaplow (1991), our key

contribution to this large literature is to analyze the e�ects of policy in such envi-

ronments where the worker-�rm relationship is modeled as a moral hazard problem.

Crucially, our policy instruments are very general, yet our analysis remains tractable.

Our results can help guide future empirical analysis on the impact of taxes on the

level and structure of performance-pay packages in the spirit of Rose and Wolfram

(2002); Frydman and Molloy (2011); Bird (2018); Dale-Olsen (2012).

Several other papers study optimal taxation with endogenous earnings risk. These

5

papers focus on risk generated by human capital accumulation (Kapicka and Neira

(2013); Findeisen and Sachs (2016); Stantcheva (2017); Makris and Pavan (2017)),

job search (Sleet and Yazici (2017)), or wage randomization in response to exces-

sive tax regressivity (Doligalski (2019)). Blomqvist and Horn (1984); Rochet (1991);

Cremer and Pestieau (1996) studied the joint design of optimal insurance and re-

distribution but in these papers the government is the sole provider of insurance.

Another strand in the taxation literature studies government taxation in the pres-

ence of endogenous consumption insurance, understood either as informal exchanges

in family networks or asset trades. Attanasio and R�os-Rull (2000) and Krueger

and Perri (2011) demonstrate a potentially large crowding-out of private insurance

in response to increased public insurance. Park (2014); Ábrahám et al. (2016b);

Heathcote et al. (2017); Chang and Park (2017); Raj (2019), characterize the optimal

tax systems in such economies. In contrast to these papers, it is pre-tax earnings

risk � rather than consumption risk � that is endogenous to policy in our model.

Finally, several papers in the optimal taxation literature allow wages to be deter-

mined on private labor markets, for instance Hungerbühler, Lehmann, Parmentier,

and Van der Linden (2006); Rothschild and Scheuer (2013, 2016, 2014); Stantcheva

(2014); Piketty, Saez, and Stantcheva (2014); Scheuer and Werning (2017, 2016); Ales,

Kurnaz, and Sleet (2015); Ales and Sleet (2016); Ales, Bellofatto, and Wang (2017);

Sachs, Tsyvinski, and Werquin (2020). These papers do not account for wage-rate

risk and performance-based earnings caused by moral hazard frictions.

Outline of the Paper. Our paper is organized as follows. We set up our baseline

static environment in Section 1. In Section 2, we analyze the incidence of arbitrary

nonlinear tax reforms on the structure of performance-based compensation and on

the distribution of utilities. In Section 3, we derive the excess burden and the social

welfare gains of tax reforms. We then focus on a special case of our model to derive

sharper results, as well as the optimal rate of progressivity, in Section 4. We study

our results quantitatively in Section 5. The proofs, extensions of our baseline model,

and the dynamic analysis are gathered in Appendices A to H.

6

1 Environment

1.1 Labor Market

Individuals. There is a continuum of mass one of agents indexed by their exogenous

innate ability θ ∈ Θ ⊂ R+ distributed according to the c.d.f. F (θ). Their preferences

over consumption c and labor e�ort a ≥ 0 are represented by a separable utility

function u (c)−h (a), where u and h are twice continuously di�erentiable, u is concave,

and h is strictly convex. An agent with earnings2 w pays a tax liability T (w) and

consumes c = w − T (w). The tax schedule T : R+ → R is twice continuously

di�erentiable. We denote by R (w) ≡ w−T (w) the retention function and by r (w) ≡R′ (w) = 1− T ′ (w) the retention (or net-of-tax) rate. We assume that the utility of

earnings w 7→ v (w) ≡ u (R (w)) is concave.3

Labor Contract. A worker with ability θ who provides e�ort a produces output

y = θ (a+ η) , (1)

where the �performance shock� η ∈ R is a random variable with mean 0, distributed

on a (possibly unbounded) interval with interior (η, η). The �rm observes both the

agent's ability θ and her realized output y, but cannot disentangle her e�ort a from

her performance shock η. A performance-based contract speci�es an e�ort level and

an earnings schedule as a function of realized output. Following Edmans and Gabaix

(2011), we impose the following assumption in order to characterize the optimal

contract analytically.

Assumption 1. The agent chooses e�ort after observing the realization of her per-

formance shock η. The �rm recommends the same e�ort level a (θ) for all agents with

ability θ.

We discuss Assumption 1 in Section 1.3 below. We relax its second part and extend

our analysis to arbitrary e�ort schedules a (θ, η) in Appendix C. Note that the �rm

2Throughout the paper we denote a worker's earnings or income by w, while the term wage-ratestands for earnings per unit of e�ort w/a.

3This condition holds as long as the tax schedule T is not too regressive; see Appendix A fordetails. It is a natural restriction: Doligalski (2019) shows that when this condition is violated,�rms have incentives to o�er stochastic earnings even in the absence of moral hazard frictions.Furthermore, the tax schedule which encourages such earnings randomization is Pareto ine�cient.

7

can infer the worker's performance shock η = y/θ − a (θ) upon observing her output

y, assuming that she has exerted the recommended e�ort level a (θ). Therefore,

the earnings contract can be equivalently expressed as a function of the inferred

performance shock η rather than the realized output y. Since recommended e�ort is

incentive-compatible by construction, in equilibrium the �rm infers the worker's true

performance shock, that is, η = η. Thus, throughout the paper we simply denote the

earnings schedule by the map η 7→ w (θ, η).

The �rm chooses the contract {a (θ) , w (θ, ·)} that maximizes its expected pro�t

given the tax schedule T and the worker's reservation utility U (θ), that is,4

Π (θ) = maxa(θ),w(θ,·)

E [y − w (θ, η)] , (2)

subject to the incentive-compatibility constraints:

a (θ) = arg maxa≥0

u (R (w (θ, η)))− h (a) , ∀η, (3)

and the participation constraint:

E [u (R (w (θ, η)))]− h (a (θ)) ≥ U (θ) . (4)

Since the participation constraint (4) binds at the optimum, the expected utility of

workers with ability θ is equal to U (θ). The incentive-compatibility constraints (3)

deserve some explanation. Since e�ort is chosen after the worker observes her per-

formance shock η, it must maximize utility state-by-state rather than in expectation.

Thus, equation (3) must hold for every performance shock realization η.

Labor Market Equilibrium. To close the model, we assume that there is free

entry of �rms in each labor market θ.5 Thus, in equilibrium pro�ts are equal to zero,

Π (θ) = 0. (5)

4Throughout the paper, the operator E denotes the expectation over performance shocks η, orequivalently output y, conditional on ability θ.

5We can easily generalize our analysis to environments where �rms have market power andmake positive pro�ts. In particular, the optimal contract characterization (6, 7) holds for anyreservation value U (θ), not necessarily determined by free-entry. For instance we can assume thatthe worker's reservation value is a convex combination of the reservation value under free entry andsome exogenous outside option.

8

This condition pins down the workers' reservation value U (θ), which is just high

enough so that no additional �rm �nds it pro�table to enter the labor market.

1.2 Equilibrium Labor Contract

The following proposition characterizes the optimal contract between the �rm and a

worker with ability θ. It is an application of the results of Edmans and Gabaix (2011)

to our environment with a nonlinear tax schedule.

Proposition 1. The optimal contract {a (θ) , w (θ, ·)} and equilibrium expected utility

U (θ) of agents with ability θ when a (θ) > 0 are characterized by the following three

equations.6 The earnings schedule w (θ, ·) satis�es

u (R (w (θ, η)))− h (a (θ)) = U (θ) + h′ (a (θ)) η, ∀η. (6)

The optimal e�ort level a (θ) satis�es

E[h′ (a (θ))

v′ (w (θ, η))

]+ E

[h′′ (a (θ))

v′ (w (θ, η))η

]= θ. (7)

The equilibrium reservation utility U (θ) is determined by

E[v−1 (U (θ) + h (a (θ)) + h′ (a (θ)) η)

]= θa (θ) . (8)

Proof. See Appendix A.

Earnings Schedule. In order to motivate high-performing workers to provide as

much e�ort a (θ) as those with lower performance shocks, the �rm needs to reward

them with higher earnings. Equation (6) shows that the agent's ex-post utility

u (R (w (θ, η))) − h (a (θ)) is an a�ne function of the performance shock η that the

�rm infers. The linearity of the contract in the utility space is a consequence of our

assumption of a separable utility function.7 Since we assumed that the utility of earn-

6When the optimal e�ort level is zero, the worker optimally receives no compensation from the�rm. Our analysis goes through without assuming a (θ) > 0 if h′ (0) = 0.

7Note that the earnings schedule (6) is non-trivial even if the utility of consumption u (·) is linear:in this case, a progressive income tax schedule implies that the utility of earnings v (w) = R (w) isstrictly concave, so that the worker is e�ectively averse to pre-tax earnings risk and values insuranceagainst performance shocks.

9

ings w 7→ v (w) ≡ u (R (w)) is concave, this translates into a convex earnings schedule

w (θ, ·). Empirically, performance-pay contracts are indeed often convex, either due

to nonlinear commission rates as in the case of stock and travel brokers (Levitt and

Syverson (2008))8 or to stock options (Edmans and Gabaix (2011, 2016)).

The two key features of the utility schedule (6) are its demogrant and its slope.

Its demogrant in (6) is equal to U (θ): a higher reservation value leads the �rm to

raise the utility of workers uniformly regardless of their performance so as to preserve

incentive-compatibility. Its slope is equal to h′ (a (θ)): inducing an agent with large

unobservable performance shock to provide costly work e�ort requires a larger reward

if the marginal disutility of labor is higher. Crucially, since the marginal disutility

h′ (·) is increasing, the sensitivity of utility to performance shocks is strictly increasingin labor e�ort a (θ). This observation captures the fundamental insight that eliciting

higher e�ort from a worker in the presence of moral hazard requires a higher exposure

to output risk.

E�ort Level. Equation (7) pins down the value of e�ort that maximizes the �rm's

pro�t. The optimal level a (θ) is such that the expected gain in output θa due to

a marginal increase a > 0 in the workers' e�ort is exactly compensated by the pay

raise necessary to elicit this higher e�ort. This cost has two components. First,

to ensure that agents' participation constraint (4) remains satis�ed despite their

higher labor supply, their earnings must increase to compensate their utility loss

−∆h (a (θ)) = −h′ (a (θ)) a. In a frictionless economy, this would be the only e�ect

and (7) would reduce to the familiar optimality condition h′(a(θ))v′(w(θ,η))

= θ, according to

which the marginal rate of substitution (MRS) between e�ort and earnings is equal

to the marginal rate of transformation, or labor productivity θ.

In our setting with moral hazard, agency frictions create a wedge between labor

productivity and the (expected) marginal rate of substitution, even in the absence

of any distortionary taxes.9 Providing incentives to work harder requires increasing

8While it is common for real-estate brokers to be compensated with a �xed commision rate,thus leading to a linear earnings schedule, Levitt and Syverson (2008) show that such contracts aresuboptimal and could be improved by introducing convexity.

9The term h′′(a(θ))v′(w(θ,η)) can be rewritten as 1

ε(θ)a(θ)h′(a(θ))v′(w(θ,η)) where ε (θ) is the Frisch elasticity of

labor supply. Thus, the wedge τMCI between the marginal rate of substitution and the marginal rateof transformation at performance shock realization η, de�ned by (1 + τMCI)

h′(a(θ))v′(w(θ,η)) = θ, is equal

to ηε(θ)a(θ) .

10

the sensitivity h′ (a (θ)) of utility to performance shocks by ∆h′ (a (θ)) = h′′ (a (θ)) a,

and hence the slope of the earnings schedule by h′′(a(θ))av′(w(θ,η))

. This mechanically changes

the labor cost of an agent with performance shock η ∈ R by h′′(a(θ))av′(w(θ,η))

η. This leads

to the second expectation in the left-hand side of (7) which we call the marginal

cost of incentives (MCI). In particular, eliciting a higher e�ort level requires raising

(respectively, lowering) the earnings of high- (resp., low-) performers. Yet, since

the marginal utility v′ (w) = r (w)u′ (R (w)) is decreasing the pro�t generated by a

smaller wage bill for unlucky workers does not fully compensate the �rm for the cost

of raising the wages of lucky workers. As a result, the expected cost of providing

incentives is positive.

Expected Utility. Finally, equation (8) is simply a rewriting of the free-entry

condition (5). It implies that the average income E [w (θ, η)] of agents with ability θ

is equal to their expected output E [y] = θa (θ). Using formula (6), this equilibrium

condition pins down the workers' reservation value U (θ).

1.3 Discussion of Assumptions

To obtain the tractable characterization of the contract described in Proposition 1,

our analysis relied on several key assumptions.

Utility Function. The �rst restriction is the separability of the utility function

between consumption and e�ort. This assumption is not essential and is only made

for clarity of exposition. As in Edmans and Gabaix (2011), it is straightforward to

extend our analysis to a larger class of utility functions, namely, φ (u (c)− h (a)) where

φ exhibits non-increasing absolute risk aversion (NIARA).10 In particular, this would

allow us to nest the functional form assumed by Holmstrom and Milgrom (1987). The

fact that the slope of the contract is equal to h′ (a (θ)), which is crucial for our main

results, is robust to this more general speci�cation. The only di�erence that this more

general speci�cation would make is that the distribution of a rent by the �rm would

no longer lead to a uniform shift in ex-post utilities via the demogrant U (θ). Our

arguments can however be straightforwardly extended to alternative distributions of

rents.

10Most common utility functions, in particular those with constant absolute risk aversion (CARA)or constant relative risk aversion (CRRA), belong to the NIARA class.

11

Timing. The �rst part of Assumption 1 imposes that the worker chooses e�ort

a after observing the performance shock η. This timing assumption was originally

introduced by La�ont and Tirole (1986) and was subsequently used by, for instance,

Edmans and Gabaix (2011); Garrett and Pavan (2015). It allows us to solve the

�rm's problem for a very general class of utility functions. Allowing for arbitrary

utility functions is crucial for our analysis. Indeed, nonlinear taxes e�ectively modify

the concavity of the utility that workers derive from their gross earnings. If we had

to restrict the utility function to a speci�c functional form (for instance, CARA as in

Holmstrom and Milgrom (1987)) we would only be able to study tax schedules that

preserve this functional form (for instance, linear or a�ne). Instead, the tractability

allowed by our timing assumption allows us to characterize the incidence of arbitrarily

nonlinear taxes.

E�ort. In the main body of the paper, we impose that the �rm chooses to elicit the

same level of e�ort regardless of the worker's performance shock � this is the second

part of Assumption 1. This restriction is also imposed by Edmans and Gabaix (2011)

in their main model, and by Edmans, Gabaix, Sadzik, and Sannikov (2012). It is an

exogenous restriction on the set of contracts that is not without loss of generality.

It substantially simpli�es our analysis without restricting the shape of the earnings

schedule, which is crucial for our investigation of private insurance and nonlinear

taxation. Carroll and Meng (2016) provide a microfoundation of this restriction; they

call this property reliability and show that it may be optimal when �rms aim to design

a contract that is robust to uncertainty about the distribution of the performance

shock.11 For completeness, we relax this �constant-e�ort� assumption and generalize

our main result (Theorem 1) to fully optimal contracts in Appendix C � our theoretical

analysis remains technically straightforward and carries qualitatively over to this case.

Performance Shocks. Finally, and importantly, note that we do not impose any

restriction on the distribution of performance shocks η, other than it must take values

in an interval (bounded or unbounded). We view this generality as an important

feature of our analysis. As an example, this allows us to capture the structure of

contracts that specify of a �xed baseline income and an additional bonus paid with

11In particular, in our environment such a contract leads to the same level of expected outputEy = θa (θ) regardless of the distribution of η. However, the �rm's expected pro�t depends on thedistribution of η as earnings w (θ, η) are not linear in η.

12

positive probability by letting the distribution of η have a mass point at the lower

bound η, and a smooth density on (η, η). Importantly, it also allows us to let the

distribution of η, and hence the degree of performance-pay, depend explicitly on the

ability level θ.

2 General Tax Incidence Analysis

This section is devoted to the positive analysis of nonlinear tax incidence. We derive

the impact of tax reforms on earnings, �rst in Section 2.1 in a benchmark setting

with exogenous risk, then in Section 2.2 in our general environment that takes into

account the endogeneity of private insurance. In Section 2.3 we derive the impact

of tax reforms on individual utilities. We �nally introduce the relevant notions of

earnings elasticities in Section 2.4 in order to express our tax incidence formulas in

terms of empirically estimable variables.

Nonlinear Tax Reforms. We start by formally de�ning the concept of nonlinear

reforms of an arbitrary initial tax system. Consider a given (potentially suboptimal)

tax schedule T , say the U.S. tax code, and another function T : R+ → R. Our goal isto evaluate the e�ects of perturbing the initial tax schedule T by δT , where δ > 0 is a

scalar that parametrizes the size of the reform in the direction T . Formally, consider

an outcome variable Ψ, for instance individual earnings, utility, government revenue,

or social welfare, that depends on the tax schedule T . The �rst-order change in the

value of this functional T 7→ Ψ (T ) following a marginal tax reform in the direction

T is given by the Gateaux derivative

Ψ(T, T ) ≡ limδ→0

Ψ(T + δT )−Ψ (T )

δ. (9)

We analyze several concrete examples of tax reforms in Section 4 and Appendix B.

2.1 Exogenous Risk Benchmark

As a preliminary step towards our general analysis, we derive in this section the

incidence of tax reforms T on earnings w (θ, η) and utilities U (θ) that would arise

in an environment with fully exogenous risk. In this benchmark model, as in our

13

framework, there are two sources of heterogeneity: innate ability θ and job-speci�c

productivity shocks η. A worker with characteristics (θ, η) is o�ered a �xed wage rate

x (θ, η) that re�ects her exogenous labor productivity. To make this model comparable

to ours we assume moreover that the worker's labor e�ort a (θ) is independent of η.12

A worker's income w (θ, η) is then the product of her exogenous labor productivity

x (θ, η) and her labor supply a (θ). The key di�erence with our general environment

is that in this benchmark setting, wage rates w(θ,η)a(θ)

= x (θ, η) are policy-invariant.

Incidence of Tax Reforms on Earnings. In such an environment, tax reforms

only a�ect earnings w (θ, η) by the endogenous change in e�ort a (θ) caused by the

reform, multiplied by the constant wage rate x (θ, η). Thus, the incidence of tax

reforms is given by13

wex (θ, η) = x (θ, η) a (θ) = w (θ, η)a (θ)

a (θ). (10)

This is the standard behavioral response to taxes through labor supply choices ana-

lyzed in most of the optimal taxation literature following Mirrlees (1971).14 Impor-

tantly, note that the e�ort change a (θ) in formula (10) depends on the particular

reform that is implemented � formally, it is the Gateaux derivative of the e�ort func-

tional a (θ) in the direction T . Thus, at this stage a(θ)a(θ)

is a policy elasticity in the

sense of Hendren (2015).15 Section 2.4 below is devoted to expressing this labor sup-

ply response in terms of standard elasticities and income e�ect parameters that can

be estimated empirically independently of a particular choice of tax reform.

Formula (10) implies that wex (θ, η) > 0 i� a (θ) > 0. That is, the earnings

schedule is shifted up (resp., down) if e�ort increases (resp., decreases) following the

12This can be justi�ed by assuming that in this model e�ort is chosen before observing therealization of η.

13For notational simplicity, whenever there is no ambiguity we ignore the dependence of theGateaux derivative a (θ) on the initial tax schedule T and the tax reform T .

14Note that we would obtain exactly the same expression in the Mirrlees model without within-group inequality, that is, σ2

η = 0. In this case, all earnings di�erences are due to innate ability(or labor productivity) θ and e�ort a (θ), so that the compensation schedule w (θ, ·) conditional onability is degenerate. Equation (10) then reduces to wex (θ, η) = θa (θ).

15This is the concept of elasticity used in several papers in the taxation literature, for instance,Chetty and Saez (2010).

14

reform. Earnings adjust on average by

E[w (θ, η)

a (θ)

a (θ)

]= θa (θ) . (11)

Now, consider the impact of the reform on earnings risk around this mean adjust-

ment. We measure earnings risk by the variance of log-earnings conditional on ability

θ. Since e�ort a (θ) does not depend on η, equation (10) immediately implies that

earnings risk after the reform is the same as before the reform, that is,

Var [log (w (θ, η) + δwex (θ, η)) | θ] = Var [log (w (θ, η)) | θ] (12)

for δ > 0 small enough. Therefore, in the benchmark model that ignores the endo-

geneity of private insurance, tax reforms a�ect the average level of earnings but do

not modify the amount of risk to which workers are exposed.16

Incidence of Tax Reforms on Welfare. Finally, in the benchmark model with

exogenous risk, the incidence of the tax reform on the average utility of agents with

ability θ is given by

U (θ) = −E[u′ (R (w (θ, η))) T (w (θ, η))

]. (13)

Intuitively, an increase in the tax payment of an agent by T (w (θ, η)) lowers her

ex-post utility by the marginal utility of consumption u′ (R (w (θ, η))). This is a

simple consequence of the envelope theorem: since labor e�ort is chosen optimally

by equation (3), the endogenous change in e�ort a (θ) triggered by the reform has no

�rst-order impact on welfare.17 Taking expectations leads to the change in expected

utility (13).

16We can also de�ne earnings risk at a disaggregated level by the pass-through function ∂ logw(θ,η)∂η ,

that is, the sensitivity of log-earnings to performance shocks. Equation (10) implies that the pass-through is una�ected by the reform for every value of η. If we de�ne instead earnings risk as thesensitivity of earnings (rather than log-earnings) with respect to performance shocks, that is, ∂w(θ,η)

∂η ,

tax reforms would raise earnings risk if and only if ∂w(θ,η)∂η > 0. In the setting analyzed in this section,

formula (10) implies that this is the case whenever the reform raises e�ort, that is, a (θ) > 0.17In particular, in this environment a change in marginal tax rates that keeps the total tax

payment unchanged has no �rst-order impact on individual welfare.

15

2.2 Incidence of Tax Reforms on Earnings

We now proceed to characterizing the incidence of an arbitrary tax reform T on the

compensation schedule w (θ, ·) of workers with ability θ in our general environment.

This is the �rst main result of our paper.

Theorem 1. Suppose that a (θ) > 0. Denote by a (θ) the change in e�ort induced

by the reform, which we study in Section 2.4 below. The �rst-order e�ect of the tax

reform T on earnings w (θ, ·) is given by

w (θ, η) = wex (θ, η) + wco (θ, η) + wpp (θ, η) , (14)

where the crowding-out e�ect wco has mean zero and is given by

wco (θ, η) =T (w (θ, η))

r (w (θ, η))− (v′ (w (θ, η)))−1

E[(v′ (w (θ, ·)))−1] E

[T (w (θ, ·))r (w (θ, ·))

], (15)

and the performance-pay e�ect wpp has mean zero and is given by

wpp (θ, η) =

[h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))− w (θ, η)

a (θ)

]a (θ) . (16)

Proof. See Appendix B.

Equation (14) gives the adjustment of the earnings schedule following an arbitrary

tax reform T as a function of the tax rates, earnings distribution, and labor supply

responses in the initial (pre-reform) economy. In practice, this formula only requires

choosing a functional form for the utility function u and the disutility of e�ort h in

order to evaluate the incidence of any potential reform of the current tax code.

Theorem 1 shows that the earnings adjustment in response to the tax reform,

w (θ, η), is in general di�erent than in the standard model with exogenous risk,

wex (θ, η), analyzed in Section 2.1. Speci�cally, the tax reform modi�es the earn-

ings schedule by the same average amount as in the benchmark model (see equation

(11)), since E[wco (θ, η)] = E[wpp (θ, η)] = 0. However, it also introduces two ad-

justments to earnings risk around this mean shift. The �rst, wco (θ, η), captures the

crowding-out of the private insurance contract by the tax change, keeping e�ort con-

stant. The second, wpp (θ, η), is the performance-pay e�ect due to the endogenous

change in labor e�ort. We analyze them in turn.

16

Crowding-Out of Private Insurance. Equation (15) gives the adjustment to the

compensation schedule that the �rm must implement in order to keep the worker's

incentive and participation constraints both satis�ed following the reform. First,

consider the adjustment T (w(θ,η))r(w(θ,η))

of the earnings schedule (�rst term in (15)). This

term implies that the agent's consumption c (θ, η) = w (θ, η)−T (w (θ, η)) changes by

c (θ, η) = −T (w (θ, η)) + (1− T ′ (w (θ, η))) w (θ, η)

= −T (w (θ, η)) + (1− T ′ (w (θ, η)))T (w (θ, η))

1− T ′ (w (θ, η))= 0.

Thus, absent any other forces � in particular, if e�ort were kept constant � the �rm

would adjust the contract such that, for every performance shock realization η, the

agent's disposable income c (θ, η), and hence her realized utility, remain �xed. In

other words, any attempt by the government to a�ect consumption insurance would

be fully absorbed by the �rm so as to keep the worker's payo�s unchanged.

Second, suppose that the tax reform is such that the tax liabilities of workers

with ability θ are reduced, that is, T (w (θ, η)) < 0 for all η. Per our discussion in the

previous paragraph, this reform generates a rent for the �rm equal to −E[ T (w(θ,·))r(w(θ,·)) ] > 0:

intuitively, the �rm compensates the reduction in tax payments by an equivalent

reduction in wages. Now, by the free-entry condition, this rent must be shared with

workers, whose expected utility rises as a result.18 Recall that, by equation (6), this

increase in utility must be distributed uniformly among all agents � regardless of their

performance η � in order to preserve their incentive compatibility condition for e�ort.

But this implies that the salary of high-performers must increase by a larger amount,

since their marginal utility of earnings v′ (w (θ, η)) is lower. As a result, the share of

the rent assigned to workers with performance shock η is inversely proportional to

their marginal utility, that is, equal to (v′(w(θ,η)))−1

E[(v′(w(θ,·)))−1]. This leads to the second term

in (15).

Performance-Pay E�ect. Now, suppose that in response to the tax reform T ,

the �rm �nds it optimal to elicit a higher e�ort level, so that a (θ) > 0. In order

to do so, we showed in Section 1.2 that it must both compensate workers for their

utility loss, and increase the pass-through of performance shocks to earnings. These

two adjustments are captured by the term h′(a(θ))+h′′(a(θ))ηv′(w(θ,η))

in equation (16). Since

18We analyze this change in reservation value U (θ) in Section 2.3 below.

17

h′(a(θ))+h′′(a(θ))ηv′(w(θ,η))

a (θ) is an increasing function of η, eliciting a higher e�ort level re-

quires increasing the sensitivity of earnings to performance. Now, wpp is de�ned by

subtracting the income change wex (θ, η) that would arise in the benchmark model

with exogenous risk in response to the same change in e�ort (equation (10)). As a

result, the performance-pay e�ect has mean zero and is the pure contribution of moral

hazard to the change in earnings risk via labor supply decisions.

Generalization to an E�ort Schedule. In Appendix C we extend this result to

the environment where the �rm can elicit an arbitrary (non-constant) e�ort schedule

a (θ, η). We show that the crowding-out e�ect is identical to (15). The performance-

pay e�ect is analogous to (16) except that it depends on the change in the entire

e�ort schedule rather than in the single e�ort level.

2.3 Incidence of Tax Reforms on Utilities

We now proceed to analyzing the incidence of tax reforms on the expected utility

U (θ) of workers with ability θ in our general environment.

Proposition 2. The �rst-order e�ect of the tax reform T on expected utility U (θ) is

given by

U (θ) = − 1

E[

1v′(w(θ,η))

]E[ T (w (θ, η))

r (w (θ, η))

]. (17)


To understand Proposition 2, recall that a decrease in the tax payment of an

agent by T (w (θ, η)) < 0 allows the �rm to decrease her earnings by T (w(θ,η))r(w(θ,η))

in

order to keep her consumption (and, hence, incentives) unchanged. Moreover, by

the envelope theorem, any change in pay that operates via labor supply (wex, wpp)

generates only second-order changes in total labor costs. Therefore, the tax reform

creates an expected rent for the �rm equal to −E[ T (w(θ,η))r(w(θ,η))

] > 0, which is then shared

with the workers and leads to an increase in their reservation value by U (θ) > 0.

Now, this increase in expected utility U (θ) > 0 must be distributed across work-

ers with di�erent performance shocks η. As explained in the previous sections, every

worker's utility must increase uniformly. Therefore, realized earnings w (θ, η) must

18

increase in proportion to the inverse marginal utility 1/v′ (w (θ, η)). Hence, this shar-

ing rule costs the �rm E[ U(θ)v′(w(θ,η))

]. As a result, the value of U (θ) which ensures that

pro�ts remain equal to zero (free-entry condition) satis�es:

E

[T (w (θ, η))

r (w (θ, η))

]+ E

[U (θ)

v′ (w (θ, η))

]= 0.

Solving for U (θ) easily leads to equation (14).

Analogous to the standard model where tax changes a�ect individual consump-

tion directly rather than being intermediated by �rms, equation (17) implies that

workers' expected utility increases when their expected tax payments (weighted by

retention rates) are reduced. Conversely, an increase in their expected tax bill lowers

their utility. However, the level of change in utility U (θ) di�ers from that obtained in

the benchmark model with exogenous risk (equation (13)) unless σ2η = 0. As a simple

example, suppose that the initial tax schedule is a�ne, so that the retention rate

r (w (θ, η)) is constant. Consider a tax reform that consists of a uniform lump-sum

transfer for all agents. We show in the Appendix that this reform is represented by

T (w) = −1 for all w. In the benchmark model with exogenous risk, equation (13)

shows that individual welfare would increase on average by the expected marginal

utility, E [u′ (R (w (θ, η)))]. Now, in the general model with agency frictions, applying

Jensen's inequality to equation (17) yields 0 < U (θ) < E [u′ (R (w (θ, η)))]. There-

fore, a lump-sum transfer leads to a strictly smaller rise in utility when tax cuts are

distributed by �rms than when they are directly targeted to workers.

2.4 Elasticities of Average Earnings

The last step of our tax incidence analysis is to characterize the impact of a tax

reform T on the optimal e�ort level a (θ). We tackle this in two (complementary)

ways. First, we use a structural approach and derive analytically the impact of tax

reforms on labor e�ort in terms of primitives. Second, we express these labor supply

responses in terms of su�cient statistics that can be estimated empirically regardless

of the values of the underlying primitives.

Structural Approach. When the structure of the model is simple enough, it is

worthwhile to derive explicitly the elasticity of e�ort with respect to the particular

19

tax reform under consideration, that is, a(θ)a(θ)

. This allows us to compare the incidence

of taxes across di�erent contractual environments. The next result and Lemma 2

below illustrate this approach through the lens of simple examples.

Lemma 1. Suppose that the utility function is linear in consumption, that is u (c) = c,

and that earnings w (θ, η) are located in a bracket with constant marginal tax rate τ

for all performance shocks η. The response of labor e�ort to an arbitrary tax reform

T is then given by

a (θ)

a (θ)= − 1

1− τε (θ)E

[T ′ (w (θ, η))

]− 1

1− τCov

(T ′ (w (θ, η)) ;

η

a (θ)

), (18)

where ε (θ) ≡ h′(a(θ))a(θ)h′′(a(θ))

is the Frisch elasticity of labor supply.


Equation (18) shows that the response of labor e�ort to the tax reform T is

the sum of two terms, which re�ect both elements (MRS and MCI) of the �rst-order

condition (7). First, the marginal rate of substitution (�rst expectation in (7)) implies

that the change in expected marginal tax rates, E[T ′ (w (θ, η))], reduces labor supply

by the Frisch elasticity ε (θ). This is the standard response one would obtain in

models with exogenous risk. Second, recall that in our moral hazard environment

the optimal e�ort level a (θ) is also determined by the marginal cost of incentives

(second expectation in (7)). But the MCI is positively related to the progressivity of

the tax schedule: with quasilinear utility we have MCI ∝ Cov( 1r(w(θ,η))

; η), which is

equal to zero (respectively, positive) when the marginal tax rates are constant (resp.,

increasing with income). Consequently, starting from an a�ne tax code, we expect a

progressive tax reform � for which the marginal tax rate adjustments T ′ (·) increasewith income � to raise the cost of incentive provision, and hence trigger an additional

downward adjustment in e�ort. Formally, this is indeed implied by the negative

covariance term in equation (18). Therefore, taking into account the endogeneity of

private insurance against output risk magni�es the negative impact of raising tax

progressivity on labor e�ort. In Section 4, we generalize this result to the case of a

utility function with income e�ects and a nonlinear baseline tax schedule and show

that the same insight carries over.

20

Su�cient-Statistic Approach. In our most general environment, the analytical

expressions for the policy elasticities a(θ)a(θ)

are technically straightforward to derive, but

they may fail to deliver sharp comparative statics with respect to the variance of per-

formance shocks σ2η or the strength of moral hazard frictions. Instead, it is standard

since Saez (2001) to express these labor supply responses in terms of substitution

and income e�ects that can be estimated in the data, and treat the resulting elas-

ticities as su�cient statistics in our tax incidence analysis (Chetty (2009)). Namely,

our tax formulas depend on the empirical values of these parameters, regardless of

the underlying structure of the model that generates them � that is, in our case,

regardless of whether private insurance against performance shocks is exogenous or

endogenous. Our goal is therefore to express the labor supply response a(θ)a(θ)

to any po-

tential tax reform T in terms of standard elasticity parameters that can be estimated

independently of the particular reform.

To do so, recall that, by the free-entry condition (5), average earnings conditional

on ability θ, E [w (θ, ·)], are equal to θa (θ). As a consequence, the elasticities of

e�ort a (θ) with respect to tax changes are equal to the corresponding elasticities of

average earnings E [w (θ, ·)]. Note that to evaluate average earnings E [w (θ, ·)] theeconometrician does not need to observe the actual value of ability θ � it is enough

to group workers into ordinal ability groups proxied by education, experience, etc.

We can thus de�ne the (compensated) elasticity of average earnings of agents with

ability θ with respect to the retention rate at income level w (θ, η) by

εEw,r (θ, η) ≡ r (w (θ, η))

E [w (θ, ·)]∂E [w (θ, ·)]∂r (w (θ, η))

. (19)

We also de�ne the income e�ect parameter as the semi-elasticity of average earnings

of agents with ability θ with respect to a lump-sum transfer at income level w (θ, η),

that is,

εEw,R (θ, η) ≡ 1

E [w (θ, ·)]∂E [w (θ, ·)]∂R (w (θ, η))

∣∣∣∣U

. (20)

These elasticities can be estimated empirically, and their explicit analytical expres-

sions in terms of primitives are given in Appendix B.

Proposition 3. The �rst-order e�ect of the tax reform T on labor e�ort a (θ) can be

21

expressed as

a (θ)

a (θ)= −E

[εEw,r (θ, η)

T ′ (w (θ, η))

r (w (θ, η))

]+ E

[εEw,R (θ, η)

T (w (θ, η))

w (θ, η)

](21)

where εEw,r (θ, η) and εEw,R (θ, η) are de�ned in (19) and (20), respectively.


The interpretation of Lemma 3 is standard. An increase in the marginal tax rate by

T ′ (w (θ, η)) (resp., an increase in the average tax rate by T (w(θ,η))w(θ,η)

) at the income level

w (θ, η) a�ects the optimal e�ort level a (θ) in proportion to the compensated elasticity

εEw,r (θ, η) (resp., the income e�ect parameter εEw,R (θ, η)). The intuition underlying

these substitution and income e�ects is the same as in the standard model of nonlinear

income taxation. Namely, an increase in marginal tax rates (respectively, in lump-sum

liabilities) lowers (resp., raises) the worker's optimal e�ort level by creating a wedge

between the marginal rate of substitution and the marginal rate of transformation

in the optimality condition (7). The only di�erence is that in our framework, it is

the �rm rather than the worker that chooses how much e�ort should optimally be

provided, and it achieves this by spreading or compressing the earnings schedule.

Nevertheless, standard methods of estimating taxable income elasticities would give

the correct values for the parameters (19) and (20).

3 Aggregate E�ects of Tax Reforms

In this section, we use our tax incidence results of Section 2 to characterize the

aggregate costs and bene�ts of tax reforms. We introduce the government and de�ne

formally the concepts of excess burden and welfare gains of policies in Section 3.1.

We derive the theoretical results in Section 3.2. Readers primarily interested in the

incidence of tax reforms on individual performance-pay contracts can skip to Section

4.

3.1 Government

In our model, the government observes both between- and within-group inequality,

that is, earnings di�erences due to ex-ante ability θ (proxied by education, experience,

22

etc.) and ex-post job-speci�c shocks η. However, taxes and transfers can only be

conditioned on realized earnings w and not on ability θ. The labor income tax schedule

is function T ∈ C2(R+,R).

Government Revenue and Social Welfare. Given the tax schedule T , govern-

ment revenue is given by

R (T ) =

ˆΘ

E [T (w (θ, η))] dF (θ) . (22)

Throughout the paper, we assume that the government faces an exogenous expen-

diture requirement G ≥ 0. Any extra revenue is used for redistribution between

workers with di�erent (uninsurable) levels of ability θ. Social welfare is evaluated by

a weighted-utilitarian functional

W (T ) =

ˆΘ

α (θ)U (θ) dF (θ) , (23)

where the map of Pareto weights θ 7→ α (θ) is positive, decreasing, and satis�es´Θα (θ) dF (θ) = 1.

Mechanical E�ect of Tax Reforms. Consider a tax reform T of the initial tax

schedule T . The mechanical, or statutory, e�ect of this reform is equal to its impact

on government revenue assuming that everyone's earnings remain �xed. It is given

by

ME(T, T ) =

ˆΘ

E[T (w (θ, η))]dF (θ) . (24)

That is, in the absence of endogenous earnings responses, government revenue would

simply change by the sum of (positive or negative) additional tax payments T (w (θ, η))

of all agents.

Excess Burden of Tax Reforms. The excess burden, or deadweight loss, of a

tax reform T is (minus) the change in government revenue caused by the endoge-

nous earnings adjustments. Since the government retains a share T ′ (w (θ, η)) of the

23

workers' earnings gains or losses w (θ, η), the excess burden is given by

EB(T, T ) = −ˆ

Θ

E[T ′ (w (θ, η)) w (θ, η)]dF (θ) , (25)

where w (θ, η) is given by (14). For instance, if a tax reform mechanically raises $1 of

revenue absent earnings adjustments, but causes distortions � say, reductions in labor

supply � which lower government revenue by ¢20, then the marginal excess burden is

equal to a fraction 20% of the mechanical e�ect. The total impact of the tax reform

on government budget (that is, the Gateaux derivative of the tax revenue functional

R (T )) is therefore equal to R(T, T ) = ME(T, T )− EB(T, T ).

Welfare Gains of Tax Reforms. The welfare gains of a tax reform T is the change

in social welfare W (T ) that it causes, expressed in monetary units. The change in

social welfare is equal to W(T, T ) =´α (θ) U (θ) dF (θ), where U (θ) is the change in

expected utility incurred by agents with ability θ and α (θ) measures their weight in

the social objective. To convert this welfare measure into units of revenue, consider

another, benchmark reform T ∗ within the available set of tax instruments, that costs

one dollar of revenue.19 Let the marginal value of public funds λ be the increase in

social welfare brought about by this reform T ∗. We then de�ne the welfare gain of

the tax reform T by

WG(T, T ) =1

λ

ˆΘ

α (θ) U (θ) dF (θ) . (26)

Optimum Tax Schedule. The optimal tax schedule is such that no tax reform of

the initial tax schedule that keeps the government budget constraint satis�ed has a

positive �rst-order impact on social welfare. We show in Appendix D that the optimal

tax schedule (respectively, the optimum within a restricted class of tax instruments)

19If universal lump-sum taxes and transfers are available, as in Mirrlees (1971), we naturallychoose T ∗ to be a uniform lump-sum transfer. We show in Appendix B that a lump-sum transferof $1 per worker is represented by the constant function −1 for all w. Denote by R(T,−1) < 0 theloss in government revenue from this transfer, once all behavioral responses have been taken into

account. Then the reform T ∗ (w) = −1/∣∣∣R(T,−1)∣∣∣ for all w is a uniform lump-sum transfer that

reduces government budget by $1 by construction. If instead the policy is restricted to the CRPtax schedules as in Section 4, the benchmark reform T ∗ consists of a decrease in the parameter τ ,normalized analogously to yield $1 of revenue. See Appendix D for details.

24

is characterized by

EB(T, T ) = ME(T, T ) + WG(T, T ), (27)

for all tax reforms T (resp., all tax reforms in the restricted class). In other words,

the marginal cost and marginal bene�t of any reform must be equal at the optimum.

3.2 Excess Burden and Welfare Gains

Substituting expression (14) for w (θ, η) into equation (25) yields the following char-

acterization. This is the second main result of our paper.

Theorem 2. The excess burden of the tax reform T is given by

EB(T, T ) = −ˆ

Θ

E [T ′ (w (θ, η)) wex (θ, η)] dF (θ) (28)

−∑

i∈{co,pp}

ˆΘ

Cov (T ′ (w (θ, η)) , wi (θ, η)) dF (θ)

The welfare gains of the tax reform T are given by

WG(T, T ) = −1

λ

ˆΘ

E[α (θ; η)u′ (R (w (θ, η))) T (w (θ, η))

]dF (θ) , (29)

where the modi�ed social welfare weights are given by α (θ; η) ≡ (v′(w(θ,η)))−1

E[(v′(w(θ,·)))−1]α (θ).

Proof. See Appendix D.

Formulas (28) and (29) can be used in practice to evaluate whether a concrete tax

reform proposal has a positive or negative e�ect on government revenue and social

welfare, starting from any (not necessarily optimal) tax code.

Excess Burden of Tax Reforms. The �rst integral in (28) is equal to the dead-

weight loss one would obtain in standard models with exogenous risk � recall that

in this case, the tax reform a�ects earnings via the standard labor supply channel

by wex (θ, η) = w (θ, η) a(θ)a(θ)

. This deadweight loss depends on the average earnings

elasticities summarized in a(θ)a(θ)

, as described in Lemma 3. This integral is analogous

to those typically derived in the optimal taxation literature (see for instance Saez

(2001)).

25

Now consider the case where the endogeneity of private insurance is taken into

account. Recall that the full adjustment to earnings w (θ, η) has the same mean as in

the frictionless benchmark. Thus, any �scal externalities due to moral hazard must

come from the change in earnings risk due to the crowding-out and the performance-

pay e�ects wco (θ, η), wpp (θ, η). The �rst implication of Corollary 2 is that if the

marginal tax rates T ′ (w (θ, η)) are initially constant � as in Chetty and Saez (2010)

� both covariances in the second line of equation (28) are equal to zero. Therefore,

the excess burden of the tax reform is the same as in the standard model, despite

the presence of moral hazard frictions and endogenous risk. In other words, the

performance-based nature of contracts and the endogenous crowding-out of private

insurance do not give rise to additional �scal externalities when the tax code T is

initially a�ne, even if the tax reform T itself is highly nonlinear.20

Consider �nally the case where the tax schedule T is initially nonlinear. In this

case, the covariances in (28) are no longer equal to zero and capture the impact on

government budget of the novel sources of earnings risk highlighted in formula (14).

Suppose for concreteness that the tax schedule is initially progressive, that is, the

marginal tax rates T ′ (·) are increasing. In this case, Cov (T ′ (w (θ, η)) , wi (θ, η)) > 0

whenever ∂wi(θ,η)∂η

> 0, that is, whenever the sensitivity of pre-tax earnings to perfor-

mance shocks increases following the reform. Therefore, a spread (resp., contraction)

of the earnings distribution causes a positive (resp., negative) �scal externality, that

is, a �rst-order gain (resp., loss) in government revenue. This is a consequence of

Jensen's inequality: a progressive (concave) tax code generates more tax revenue for

the government if earnings are more volatile, keeping their mean constant. For budget

purposes, the government is therefore tempted to induce an increase in the dispersion

of pre-tax earnings.

Welfare Gains of Tax Reforms. In the benchmark model with exogenous risk,

the increase in social welfare achieved by giving one additional unit of consumption

(say, via a tax break) to agents with ability θ and performance shock η is given

by the marginal social welfare weight α (θ)u′ (R (w (θ, η))), equal to their marginal

20In particular, consider the highest-income earners for whom performance-pay contracts areparticularly prevalent, and suppose that their baseline income absent any bonus � that is, theirearnings given the lowest realization η of the performance shock � is located in the highest taxbracket. Then the performance-sensitivity of their salary is irrelevant for their contribution togovernment revenue.

26

utility of consumption times their weight α (θ) in the social objective. Now, in the

environment with moral hazard frictions, the crowding-out of private insurance by

tax policy highlighted in Theorem 1 has welfare consequences that must be taken

into account.21

Indeed, we saw in Section 2.2 that the tax break raises the expected utility of

the workers (Proposition 2), as in a standard model with exogenous risk. Crucially,

however, recall that this utility gain must be shared uniformly across agents in order

to preserve their e�ort incentives. But since the marginal utility is decreasing, this

implies that workers with a higher output realization y end up getting a higher increase

in consumption. As a result, expression (29) implies that the marginal social welfare

weights that would arise in the benchmark model are now weighted by the share(v′(w(θ,η)))−1

E[(v′(w(θ,·)))−1]of the tax cut that workers actually receive. These weights are regressive

� richer agents end up with higher e�ective welfare weights in the social objective.

Intuitively, tax cuts accrue mostly to the highest-performing agents of a given ability

group. They are thus less e�ciently targeted than in the standard model without �rm

intermediation, in which the government could directly alter workers' consumption.

This regressive distribution of rents in turn reduces the welfare bene�ts of providing

social insurance compared to the exogenous-risk environment.

4 The Loglinear Framework

In this section we introduce a special case of our general model that allows us to

derive sharp consequences of our results of Sections 2 and 3. In particular, under the

following functional form restrictions the equilibrium labor contract is loglinear and

our tax incidence formulas become particularly transparent.

Assumption 2. The utility of consumption is logarithmic, u (c) = log c. The Frisch

elasticity of labor supply ε > 0 is constant, that is h (a) = (1 + 1ε)−1a1+ 1

ε . The

performance shocks are normally distributed, η ∼ N(0, σ2

η

). The tax schedule has a

constant rate of progressivity (CRP),22 that is, there exist τ ∈ R and p < 1 such that

21Because of the envelope theorem, the performance-pay e�ect (equation (16)) induces onlysecond-order welfare gains or losses. The �rst part of crowding-out (�rst term in equation (15))also keeps welfare constant since it ensures that workers' consumption remains �xed.

22The CRP tax code is a good approximation of the U.S. tax system, see for instance Heathcote,Storesletten, and Violante (2017). The rate of progressivity p is equal to (minus) the elasticity ofthe retention rate 1− T ′ (w) with respect to income w. Alternatively, 1− p is equal to the ratio of

27

T (w) = w − 1−τ1−pw

1−p.

We characterize the equilibrium labor contract in Section 4.1. We then focus on a

particular tax reform, namely, an increase in the (constant) rate of progressivity of the

initial tax schedule. We derive the incidence of this reform on earnings and utilities

in Section 4.2, and its excess burden and welfare gains in Section 4.3. We conclude

in Section 4.4 by characterizing the optimal rate of progressivity in this economy. In

Appendix G and H, we generalize our analysis of the optimal rate of progressivity to

the dynamic environment of Edmans, Gabaix, Sadzik, and Sannikov (2012).

4.1 Equilibrium Labor Contract

Under these assumptions, the labor contract characterized in Proposition 1 can be

simpli�ed as follows.

Corollary 1. Suppose that Assumption 2 holds. Denote by ψ ≡ ∂ logw(θ,η)∂η

the pass-

through of performance shocks to log-earnings. The earnings schedule is log-linear and

given by

logw (θ, η) = log (θa) + ψ η − 1

2ψ2σ2

η with ψ =a1/ε

1− p. (30)

E�ort a is independent of θ and satis�es

a =[(1− p)

(1− εψ,aψ2σ2

η

)] ε1+ε , (31)

where εψ,a ≡ ∂ logψ∂ log a

= 1ε. Expected utility is given by

U (θ) = log (R (θa))− h (a)− 1

2(1− p)ψ2σ2

η. (32)

Proof. See Appendix E.

In the standard Mirrlees (1971) model, the worker's wage rate is equal to her

marginal productivity θ, and her earnings are θa. In our more general model, equation

(30) implies that the �rm designs an incentive-based compensation contract that has

mean θa, but is also dispersed around this mean. The amount of risk to which the

marginal retained income 1− T ′ (w) to average retained income 1− T (w) /w.

28

�rm exposes the worker is summarized by the constant (that is, independent of η)

pass-through ψ of performance shocks to log-earnings.

Crucially, this parameter ψ ≡ ∂ logw(θ,η)∂η

= a1/ε

1−p is endogenous to policy: it depends

on the rate of progressivity p of the tax schedule both directly and indirectly through

the optimal e�ort level a. If the elasticities

εψ,a =∂ logψ

∂ log a, and εψ,1−p =

∂ logψ

∂ log (1− p)(33)

were both equal to zero, the model would be equivalent to one with an exogenous

and uninsurable shock η analogous to θ.

In general, however, the elasticity εψ,a is positive and measures the strength of

the moral hazard friction: it determines how much more exposure to performance

shocks is necessary to elicit a higher level of e�ort from the agent. Since εψ,a = 1ε, our

model implies that the sensitivity of earnings risk to the desired e�ort level is inversely

proportional to the Frisch elasticity of labor supply. If in response to a tax reform the

�rm wants to reduce the e�ort provided by the worker, it implements it by reducing

her exposure to risk, that is, by providing more insurance against performance shocks.

In the sequel we refer to εψ,a as the performance-pay elasticity.

Second, the elasticity εψ,1−p = −1 < 0 implies that higher tax progressivity leads

to a steeper pre-tax earnings schedule. This means that ceteris paribus (that is, keep-

ing e�ort constant), public insurance crowds out private insurance against output risk.

Intuitively, this is because an increase in tax progressivity compresses the disposable

income distribution and thus reduces the amount of risk that workers are e�ectively

facing; as a response, the �rm spreads out the pre-tax earnings schedule in order to

preserve incentives for e�ort. In the sequel we refer to εψ,1−p as the crowding-out

elasticity.

Finally, equations (31) and (32) imply that the worker's e�ort and expected utility

are both strictly lower in the environment with moral hazard and endogenous private

insurance, where εψ,a > 0 and εψ,1−p < 0, than in the exogenous-risk model where

εψ,a = εψ,1−p = 0. E�ort is a decreasing function of the rate of tax progressivity p.

29

4.2 Tax Incidence Analysis

Throughout this section we consider a tax reform that marginally raises the rate of

progressivity p. This policy is represented by (see Appendix E for technical details)

T (w) =

(logw − 1

1− p

)1− τ1− p

w1−p, ∀w > 0. (34)

To derive the incidence of this tax reform, we can either directly di�erentiate with

respect to p the equilibrium labor contract given by Corollary 1, or apply Theorem 1

to the corresponding function (34). We �rst derive the impact of this tax reform on

labor e�ort before analyzing its e�ect on earnings and utility.

Lemma 2. Suppose that Assumption 2 holds. The elasticity of e�ort with respect to

progressivity εa,1−p = ∂ log a∂ log(1−p) is given by

εa,1−p =ε

1 + ε·

1 + εψ,a ψ2σ2

η

1 + 1−ε1+ε

εψ,a ψ2σ2η

, (35)

where εψ,a = 1εdenotes the performance-pay elasticity (33). Thus, the labor supply

elasticity εa,1−p is strictly larger in the presence of moral hazard (εψ,a > 0) than in

the benchmark model with exogenous risk (εψ,a = 0).


Equation (35) gives an analytical expression for the labor supply elasticity that

drives the response of labor supply to the tax reform (34), a(θ)a(θ)

= − 11−pεa,1−p. This

elasticity is strictly larger in an economy with moral hazard and endogenous private

insurance than in the benchmark setting with exogenous risk. Speci�cally, in the

polar case where εψ,a = 0, we obtain εa,1−p = ε1+ε

, which is an increasing function of

the Frisch elasticity ε. Instead, when the exposure to risk varies endogenously with

e�ort so that εψ,a > 0, we have εa,1−p > ε1+ε

. The intuition underlying this result

is analogous to the case of the utility without income e�ects analyzed in Lemma 1.

Recall that the marginal cost of eliciting higher e�ort is equal to the expected marginal

rate of substitution (MRS, �rst term in (7)) plus, in the presence of moral hazard,

the expected marginal cost of incentive provision (MCI, second term in (7)). But the

latter increases when the tax code becomes more progressive, since it is proportional

to 1v′(w(θ,η))

= w(θ,η)1−p if the utility is logarithmic and the tax schedule is CRP. Note

30

that the greater the Frisch elasticity, the more the standard model underestimates

the true distortionary cost of raising tax progressivity.

Corollary 2. The impact of an increase in progressivity on earnings is given by

formula (14), where the standard adjustment in a model with exogenous risk is given

by

wex (θ, η)

w (θ, η)= − 1

1− pεa,1−p, (36)

the crowding-out e�ect is given by23

wco (θ, η)

w (θ, η)= − 1

1− pεψ,1−p

(ψη − ψ2σ2

η

), (37)

and the performance-pay e�ect is given by

wpp (θ, η)

w (θ, η)= − 1

1− pεψ,a εa,1−p

(ψη − ψ2σ2

η

), (38)

where εψ,a = 1εand εψ,1−p = −1 denote the pass-through elasticities (33). Overall,

pre-tax earnings are strictly more exposed to output risk after the reform.


Equations (36) to (38) give closed-form expressions for the three sources of earn-

ings adjustments caused by the tax reform: the standard labor supply e�ect, the

crowding-out e�ect, and the performance-pay e�ect. The �rst, wex (θ, η), is straight-

forward: it simply states that if wage rates are exogenous, the percentage earnings

response to the reform, wex(θ,η)w(θ,η)

, is equal to the percentage e�ort response, a(θ)a(θ)

=

− 11−pεa,1−p. Taking into account the endogeneity of private insurance yields the other

two e�ects, wco (θ, η) and wpp (θ, η). As explained in Section 4.1 above, the crowding-

out e�ect strictly raises the amount of risk to which workers are exposed, measured

by the pass-through of performance shocks to log-earnings, via the crowding-out elas-

ticity εψ,1−p < 0. This pre-tax earnings adjustment ensures that their incentives are

preserved. On the other hand, the reform lowers the optimal e�ort that �rms would

23The proof in the Appendix provides the decomposition of wco (θ, η) into its two e�ects high-lighted in equation (15).

31

like workers to exert by εa,1−p > 0. This reduction in e�ort is elicited by improv-

ing the worker's insurance against output shocks. Thus, the performance-pay e�ect

strictly reduces exposure to risk via the performance-pay elasticity εψ,a > 0. This

indirect increase in private insurance counteracts the direct crowding-out response to

the policy change.

Overall, using the structural expression (35) for the labor supply elasticity εa,1−p,

we can easily show that

εψ,1−p + εψ,a εa,1−p < 0. (39)

As a consequence, we obtain that the total earnings adjustment due to moral hazard

frictions, wco (θ, η) + wpp (θ, η), unambiguously leads to an increase in the sensitiv-

ity of pre-tax log-earnings to performance shocks. That is, the crowding-out e�ect

outweighs the performance-pay e�ect and private insurance is reduced on net.

The Crowding-Out and Performance-Pay E�ects Almost O�set Each Other.

We now study the relative magnitude of the crowding-out and the performance-pay

e�ects. Our conclusion is that, while each of them taken separately has a large impact

on the structure of compensation, on net they almost fully o�set each other so that

the earnings schedule is only barely riskier following an increase in tax progressivity.

Recall that the crowding-out elasticity εψ,1−p is equal to 1 in absolute value: this

is equivalent to saying that keeping e�ort constant, the variance of log-consumption

remains constant after the tax reform and the endogenous earnings adjustment. The

performance-pay e�ect, on the other hand, is driven by the optimal change in e�ort

given by the labor supply elasticity εa,1−p. The �rm implements this change in e�ort

by adjusting the sensitivity of the contract via the pass-through elasticity εψ,a. Why

does this performance-pay e�ect εψ,aεa,1−p have the same order of magnitude as the

direct crowding-out adjustment εψ,1−p?

The key insight is that the performance-pay elasticity εψ,a is proportional to the

inverse of the (Frisch) elasticity of labor supply ε. This is an immediate consequence

of our key result that the slope of the contract (6) (or (30) in the loglinear model) is

equal to the marginal disutility of labor h′ (a (θ)). Thus, to raise the worker's e�ort by

a, the �rm must raise the slope of the contract in percentage terms by h′′(a(θ))ah′(a(θ))

= 1ε(θ)

aa,

where ε (θ) is the Frisch elasticity of labor supply.

32

As a result, to the extent that the labor supply elasticity εa,1−p has a similar order

of magnitude as the Frisch elasticity of labor supply ε, the performance-pay e�ect is

approximately equal to 1ε× εa,1−p ≈ 1, that is, about the same as the crowding-out

e�ect. Crucially, this result is robust to the value of the labor supply elasticity. Indeed,

if the labor supply elasticity is small, so that e�ort moves only a little in response to

a tax change, then the pass-through must mechanically increase by a large amount

in order to elicit this change in e�ort, so that the product of the two elasticities is

always approximately equal to 1. Intuitively, if labor supply is very inelastic, e�ort

will barely change in response to tax reforms; but precisely because of this inelastic

behavior, a very large change in performance-sensitivity will then be necessary to

convince workers to adjust their e�ort by this small amount.

The previous discussion is correct if the labor supply elasticity εa,1−p (or, more

generally, aain our general model) is indeed approximately equal to the Frisch elasticity

ε. In practice, this need not be exactly the case. Formula (35) gives the structural

expression for εa,1−p as a function of ε and the variance of performance shocks σ2η. We

showed that the endogeneity of private insurance raises the labor supply elasticity

with respect to an increase in tax progressivity, relative to the benchmark setting

with exogenous risk. Therefore, we know that εa,1−p must be at least as large as its

value in this environment, namely, ε1+ε

.24 We therefore have

εψ,a εa,1−p >1

ε× ε

1 + ε=

1

1 + ε.

For a Frisch elasticity ε ≈ 12, this means that the performance pay e�ect o�sets at

least 11+0.5

= 23of the crowding-out e�ect εψ,1−p = −1. To re�ne this estimate, we use

the structural expression for the labor supply elasticity εa,1−p derived in Lemma 2. A

Taylor approximation in ψ2σ2η yields

− wco (θ, η)

wpp (θ, η)=

εa,1−pε

≈ 1 +(ε− 2ψ2σ2

η

). (40)

Our calibration in Section 5.4 implies that the variance of earnings conditional on

ability that best matches the data is equal to ψ2σ2η ≈ 0.2. We thus get

(ε− 2ψ2σ2

η

)≈

0.1, so that the earnings adjustment due to crowding-out amounts to about 110%

(in absolute value) of the adjustment caused by the performance-pay e�ect. In other

24This value is lower than the Frisch elasticity ε because of the income e�ects on labor supply.

33

words, the labor supply responses o�set about 90% of the crowding-out of private

insurance by tax progressivity.

Discussion. The analysis of our quantitative model in Section 5 con�rms that the

crowding out e�ect (15) and the performance-pay e�ect (16) are both signi�cant but

o�set each other almost entirely, so that the overall e�ect of tax progressivity on earn-

ings risk is small. In Section 5.6 and Appendix B, we analyze both theoretically and

numerically the incidence of several other tax reforms: lump-sum tax and marginal

tax rate increases on high incomes, and a constant percentage increase in retention

rates. For each of these reforms, as in the case of an increase in tax progressivity,

the performance-pay e�ect counteracts, and sometimes even dominates, the direct

crowding-out of the private insurance contract. Intuitively, raising marginal tax rates

leads to a spread of the pre-tax earnings distribution, but the reduction in labor sup-

ply that it causes tends to contract it. Conversely, raising lump-sum tax payments

leads to a crowding-in of private pre-tax insurance, but the income e�ect on labor

supply again runs in the opposite direction. These results are consistent with the em-

pirical literature. In particular, Frydman and Molloy (2011) exploit the relative tax

advantage of di�erent forms of CEO pay from 1946 to 2005 and �nd that the structure

of compensation responds little to changes in tax rates on labor income. By high-

lighting the two counteracting forces at play � crowding-out versus performance-pay

adjustments � our analysis provides an explanation for these �ndings.

4.3 Excess Burden and Welfare Gains

We now derive expressions for the excess burden and the welfare gains of raising the

rate of progressivity of the tax schedule.

Corollary 3. Suppose that Assumption 2 holds. Suppose moreover that ability types

are lognormally distributed, log θ ∼ N (µθ, σ2θ). The excess burden of an increase in

the rate of progressivity p of the CRP tax schedule is given by

EB =

(1

(1− g) (1− p)− 1

)εa,1−pC + (εψ,1−p + εψ,a εa,1−p) pψ

2 σ2η C, (41)

where C is the economy's aggregate private consumption, g is the ratio of government

expenditures G to aggregate output Y = C + G, εa,1−p is the labor e�ort elasticity

34

given by (35), and εψ,a = 1ε, εψ,1−p = −1 are the pass-through elasticities.

Suppose moreover that the planner is utilitarian, that is, α (θ) = 1 for all θ.25 The

welfare gains (including the mechanical e�ect) of an increase in the rate of progres-

sivity are given by

ME + WG = (1− p)(σ2θ + ψ2σ2

η

)C + εψ,1−p ψ

2 σ2η C. (42)


Excess Burden of Raising Progressivity. The �rst term in the right-hand side

of (41), [((1− g) (1− p))−1−1]εa,1−pC, is the standard deadweight loss from distorting

labor e�ort, that is, the behavioral e�ect of taxation that would arise in a model with

exogenous risk. This e�ect, equal to the �rst integral in equation (28), is increasing in

the elasticity of e�ort with respect to progressivity (35) that measures the disincentive

e�ects of raising tax rates, and in the rate of progressivity of the tax code that captures

the share of income losses borne by the government as reduced revenue. Moreover,

government expenditures raise the excess burden of tax progressivity. Intuitively, this

is because a given marginal increase in tax progressivity implies a larger deadweight

loss if the tax burden is already large due to high spending needs.

The second part of equation (41) captures the �scal externalities that arise when

private insurance against output risk is endogenous, that is, the two covariance terms

in equation (28). The term εψ,1−pp(ψση)2C is the value of

´Cov(T ′, wco)dF , and

the term εψ,aεa,1−pp(ψση)2C is the value of

´Cov(T ′, wpp)dF . Since εψ,1−p < 0, the

crowding-out e�ect contributes to reducing the excess burden of the reform, because

it increases the sensitivity of earnings to output risk � by Jensen's inequality, this

generates more tax revenue when the tax schedule is initially progressive (p > 0).

Conversely, since εψ,a, εa,1−p > 0 the performance-pay e�ect contributes to raising

the excess burden (lowering government revenue) via a reduction in e�ort and hence

risk exposure. Now recall that, by Corollary 2, the crowding-out e�ect dominates

the performance-pay e�ect so that the earnings schedule becomes more risky overall.

Therefore, conditional on the value of the labor e�ort elasticity, the deadweight loss

of raising taxes is strictly smaller than in the standard model with exogenous risk.

25In the Appendix we consider the more general case where the social welfare weights are givenby α (θ) = e−α log θ´

e−α log θ′dF (θ′)for all θ, where α ≥ 0.

35

However, we showed in Section 4.2 that the crowding-out and performance-pay e�ects

almost fully o�set each other. We therefore expect the net positive �scal externality

to be small in magnitude. We con�rm this intuition in Section 5.

Welfare Gains of Raising Progressivity. Finally, equation (42) shows that the

welfare gains (including the mechanical e�ect) ME+WG of the tax reform is the sum

of two terms. The �rst, (1− p) (σ2θ + (ψση)

2), captures the insurance gains obtained

by raising the rate of progressivity of the tax schedule, as in a standard optimal

taxation model. Note that tax progressivity insures both the initial ability di�erences

θ and the performance shock η passed-through to earnings, that is, both between-

and within-group heterogeneity. The larger their respective variances σ2θ and ψ2σ2

η

and the lower the initial rate of progressivity p ∈ (−∞, 1), the higher the gains of

marginally raising progressivity.

However, recall that the private insurance contract adjusts endogenously to the

policy reform. Since the pass-through elasticity with respect to progressivity is equal

to εψ,1−p = −1, the welfare e�ect of this crowding-out (last term in equation (42))

satis�es

εψ,1−pψ2σ2

η < εψ,1−p (1− p)ψ2σ2η = − (1− p)ψ2σ2

η.

As a result, the crowding-out more than fully o�sets the additional insurance against

performance shocks provided by public policy, (1− p)ψ2σ2η. Intuitively, in response to

increased public insurance through higher tax progressivity, the �rm adjusts the pre-

tax earnings contract so that total (public plus private) insurance remains unchanged

� there is a one-for-one crowding-out. However, there is an additional force at play.

Recall that in our model, tax changes are intermediated by �rms rather than being

directly distributed to individual workers. We saw that as a consequence, the bene�ts

of a tax cut accrue primarily to the richest workers of a given ability group. This

strictly reduces the welfare gains of raising progressivity relative to the benchmark

model with exogenous risk.

Note �nally that, while the earnings distribution and government revenue remain

practically unchanged following a tax reform as the crowding-out and performance-

pay e�ects almost o�set each other, the welfare e�ects of raising progressivity, on the

other hand, can be large in magnitude. Indeed, labor supply adjustments cause only

36

second-order changes in welfare by the envelope theorem. As a result, the welfare

implications of crowding-out are not counteracted by those of the performance-pay

e�ect. We evaluate quantitatively the welfare cost of ignoring the endogeneity of

private insurance in Section 5.3.

4.4 Optimal Rate of Progressivity

We �nally gather our results on the excess burden and the welfare gain of tax reforms

to characterize the optimal CRP tax schedule.

Proposition 4. Suppose that Assumption 2 holds, that ability types are lognormally

distributed, and that the social welfare objective is utilitarian. The optimal rate of

progressivity satis�es

p∗

(1− p∗)2 =σ2θ + (1 + εψ,1−p)ψ

2σ2η(

1 + g(1−g)p∗

)εa,1−p + (1− p∗) εψ,aεa,1−pψ2σ2

η

, (43)

where g = G/Y is the ratio of government spending to output, εa,1−p is the elasticity of

e�ort with respect to the rate of progressivity given by (35), and εψ,a = 1εand εψ,1−p =

−1 are the pass-through elasticities. In particular, the optimal rate of progressivity is

strictly smaller in the model with endogenous private insurance than in the benchmark

environment with exogenous risk where εψ,1−p = εψ,a = 0.


Formula (43) is obtained by equating the excess burden EB to the welfare gain

(including the mechanical e�ect) ME + WG of raising progressivity, both derived in

Corollary 3. Consider �rst the polar case with exogenous risk, that is, where all

earnings di�erences are attributed to exogenous labor productivity shocks θ and η.

In this case, letting εψ,1−p = εψ,a = 0 in formula (43) leads to

p∗

(1− p∗)2 =

(1 +

g

(1− g) p∗

)−1 σ2θ + ψ2σ2

η

εa,1−p. (44)

Thus, the optimal rate of progressivity in this model is increasing in the variances of

the ability and performance shock distributions, and decreasing in the elasticity of

e�ort εa,1−p = ε1+ε

. In this benchmark setting, the government trades-o� the bene�ts

37

of insuring the entire earnings risk, which is determined by the variance of log-earnings

Var (logw) = σ2θ + ψ2σ2

η, with the excess burden of raising progressivity.

Now consider the general model with endogenous partial insurance against per-

formance shocks, so that εψ,1−p < 0 and εψ,a > 0. Equating the excess burden to the

welfare gains of raising progressivity implies that p∗ is the solution to(1

(1− g) (1− p∗)− 1

)εa,1−p + (εψ,1−p + εψ,aεa,1−p) p

∗ψ2σ2η

= (1− p∗)[σ2θ + ψ2σ2

η

]+ εψ,1−pψ

2σ2η, (45)

from which (43) follows. This formula implies that p∗ is strictly decreasing in σ2η, and

hence that the optimal rate of progressivity is strictly lower than in the previous case.

This is because the positive �scal externality and the welfare loss of crowding-out ex-

actly cancel each other out, as they are respectively equal to εψ,1−pp∗Var (logw | θ)and [(1− p∗)+εψ,1−pp

∗]Var (logw | θ), where Var (logw | θ) = ψ2σ2η is the variance of

log-earnings conditional on ability θ. The only remaining term is therefore the neg-

ative �scal externality due to the performance-pay e�ect, εψ,aεa,1−pp∗Var (logw | θ).This �scal externality is captured by the second term in the denominator of (43).

Overall, we obtain that by ignoring these e�ects, a planner that would ignore the

endogeneity of private insurance would overestimate the optimal rate of tax progres-

sivity.

5 Quantitative Analysis

In Section 5.1 we extend the model of Section 4 to make it suitable for policy analysis.

Speci�cally, we incorporate the coexistence of jobs with and without performance pay

and a Pareto tail for productivity types. We calibrate the model to match several

key moments of U.S. data in Section 5.2. We then analyze the impact of two tax

reforms in Sections 5.3 and 5.4. First, we consider a small reform that increases the

rate of progressivity by one percentage point around the current tax code. Second,

we study a large reform that nearly doubles the current rate of progressivity and

brings the economy to the utilitarian optimum. We �nally compare in Section 5.5 the

optimal rate of progressivity in our calibrated model with two important benchmarks:

the optimum in the model without performance-pay jobs, and the progressivity rate

38

chosen by a government that would wrongly assume that wage risk is exogenous.

5.1 Quantitative Model

We extend the loglinear model of Section 4 by adding the following elements. A

share π of workers have a performance-pay job, denoted with a subscript m, and the

remaining share 1− π of workers have a normal job (subscript n). The output of the

worker with productivity θ and a job type j ∈ {m,n} is θ(aj + η), where aj is the

e�ort level and η ∼ N(0, σ2

η,j

)is the performance shock. Performance-pay jobs are

subject to the agency frictions described in Section 1. At these jobs, the employer

observes the output but not the e�ort of the worker nor the performance shock and,

hence, o�ers a wage which depends on the stochastic output realization according to

the pass-through coe�cient ψ. Normal jobs, in contrast, are free from agency frictions

and guarantee a risk-free wage.

We treat the job type of a worker as exogenous. In the data, the share of perfor-

mance pay jobs increases with earnings (see Lemieux, MacLeod, and Parent (2009);

Grigsby, Hurst, and Yildirmaz (2019)). We allow the share of job types to be cor-

related with productivity by assuming that productivity is drawn from a job-type-

speci�c Pareto-lognormal distribution (Colombi (1990)). That is, conditional on the

job type j ∈ {n,m}, the log productivity is the sum of independently drawn nor-

mal and exponential random variables: log θ = θ1 + θ2 where θ1 ∼ N(µθ,j, σ

2θ,j

)and

θ2 ∼ Exp (λθ,j). We keep government expenditures G �xed when comparing di�erent

policy scenarii. We derive and analyze the theoretical formula for the optimal rate of

progressivity in this generalized environment in Appendix F.

5.2 Calibration

We calibrate to model to match the evidence on elasticities and wage distribution

in the U.S. We choose the value ε = 0.5 for the Frisch elasticity, which implies a

compensated elasticity of labor supply at normal jobs of approximately 0.3. Both

values are consistent with empirical evidence (Keane (2011); Chetty (2012)).

We assume that log-productivity distributions for both job types have a common

normal variance σ2θ,m = σ2

θ,n = σ2θ and tail parameter λθ,m = λθ,n = λθ. As a result,

39

our model implies that the variance of log earnings is given by

Var (logw) =

(σ2θ +

1

λ2θ

)+ πψ2σ2

η + π (1− π)

(µm − µn + log

aman−ψ2σ2

η

2

)2

,

where σ2θ + 1

λ2θis the variance of log-productivity, ψ2σ2

η is the variance of log-earnings

at the performance-pay jobs due to the performance shocks, and the last term cap-

tures the contribution of the di�erence between the mean log-earnings at normal and

performance-pay jobs.

Lemieux, MacLeod, and Parent (2009) study performance-pay jobs using Panel

Study of Income Dynamics (PSID) and �nd that their fraction π was 0.45 in 1998,

the most recent year included in their analysis. They report that performance-pay

jobs have mean hourly wages higher by 30%, and the variance of wages higher by

42%, relative to normal jobs.26 The �rst statistic pins down µm−µn = log (1.3). The

second will be crucial in determining σ2η,m, the variance of the performance shock at

performance-pay jobs.

For the levels of the mean and variance of log-earnings in the entire economy, we

turn to the Survey of Consumer Finances (SCF) which uses data from the Internal

Revenue Service Statistics of Income program to accurately represent the distribution

of high income households. Based on the SCF, Heathcote and Tsujiyama (2019) report

a mean household labor income of $77, 325 and an overall variance of log labor income

of 0.618 in 2007. They also estimate the tail parameter of the log earnings distribution

λθ at 2.2.

Regarding the government policy, Heathcote, Storesletten, and Violante (2017)

estimate the empirical rate of tax progressivity at 0.181 and Heathcote and Tsujiyama

(2019) report a ratio of government purchases to output of 18.8 percent.

Given these estimates, we choose σ2θ = 0.31 and σ2

η,m = 0.4 to match the overall

variance of log-earnings as well as the relative variance between performance-pay and

normal jobs. Matching mean labor income as well as the ratio of mean wage rates

at the two job types implies µθ,m = 3.88 and µθ,n = 3.62. The implied distribution

of wage rates and job types is depicted in Figure 1. The share of performance-pay

jobs is largest in the top quartile of the wage distribution. This is consistent with the

26These values are based on Table 1, Figure IV and Figure V in Lemieux, MacLeod, and Parent(2009). When statistics are available for multiple years, the last year available is used (either 1996or 1998).

40

Figure 1: Joint distribution of wage rates and job types

(a) (b)

1 2 3 4 5 6 7 8

log wage rate

0.0

0.1

0.2

0.3

0.4

0.5

0.6

dens

ity

normal jobsperformance-pay jobs

Q1 Q2 Q3 Q4

quartiles of the wage rate distribution

20

30

40

50

60

shar

e of

per

form

ance

-pay

jobs

(%)

empirical evidence that bonuses and other forms of performance-related pay are more

prevalent at higher income levels (Lemieux, MacLeod, and Parent (2009); Grigsby,

Hurst, and Yildirmaz (2019)).

5.3 Marginal Reform of Tax Progressivity

Table 1 shows the impact of a small increase of the rate of progressivity by one per-

centage point, from 0.181 to 0.191, on the performance-pay jobs, the normal jobs, and

all jobs. Note that the e�ects on performance-pay jobs are generally larger in absolute

value, since performance-pay workers have higher average output and earnings than

those in normal jobs. An increase in progressivity leads to a large redistributive gain

for both types of jobs. For the performance-pay workers, an increase of progressivity

would also lead to a substantial gain from better insurance against the earning risk if

this risk was policy-invariant. However, an increase in progressivity generates a large

crowding-out which, as we saw in Section 4, fully o�sets the gains from insurance

and, in addition, somewhat reduces the gains from redistribution. To understand the

latter e�ect, note that workers who on average gained from the reform will see their

earnings structure adjusted to keep incentives intact: their consumption will increase

disproportionally in high-output contingencies, which reduces their expected utility

gain. Hence, the crowding-out e�ect makes redistribution less potent. We �nd that,

quantitatively, the redistributive gain for the performance-pay jobs is reduced by 6%.

The excess burden of the reform, EB, is substantially larger for performance-

pay jobs, because the elasticity of e�ort εa,1−p is 25% greater for these workers. This

di�erence in elasticities is only slightly mitigated by the combined impact of the �scal

41

externalities caused by the crowding-out and performance-pay e�ects. These e�ects

have a non-negligible impact on the excess burden when considered separately, but

roughly cancel each other out and lead to a very modest positive �scal externality.

To understand this result, recall the excess burden formula obtained in Corollary 3.

Since εψ,1−p = −1, the positive �scal externality due to crowding-out relative to the

standard deadweight loss induced by labor supply responses is proportional to the

inverse labor e�ort elasticity 1/εa,1−p and given by

1

εa,1−p×(

1

(1− g) (1− p)− 1

)−1

pψ2σ2η ≈ 17.85%.

Since εψ,a = 1ε, the additional negative �scal externality due to performance-pay

relative to the standard deadweight loss is proportional to the inverse Frisch elasticity

1/ε and given by

−1

ε×(

1

(1− g) (1− p)− 1

)−1

pψ2σ2η ≈ −14.9%.

Therefore, each of these e�ects signi�cantly alters the standard calculation of the

excess burden of raising tax progressivity. However, the sum of these e�ects is

proportional to the di�erence between the labor supply and the Frisch elasticities,

1/εa,1−p − 1/ε. Since this di�erence is small, the overall �scal externality caused by

moral hazard frictions is equal to a mere 2.94% of the standard deadweight loss of

raising tax progressivity. Therefore, this reform is only slightly less costly for the

government budget than one would estimate by ignoring the endogeneity of private

insurance.

5.4 Large Reform: From Status Quo to Optimum

We extend the theoretical optimal progressivity formula to our quantitative model

with two types of jobs in Appendix F. We �nd that the utilitarian optimum progres-

sivity rate is given by p∗ = 0.356. This rate is almost double the current progressivity

rate in the U.S., and the implied social welfare increase is equivalent to a 3% increase

in consumption. To get a sense of the magnitude of this reform, note that the average

tax rate, including transfers, of a worker with labor income $33, 000 would decrease

from −0.7% to −14.2%. The tax rate at the mean household income $77, 325 would

42

Table 1: Impact of a small increase in the rate of progressivity

Perf.-pay jobs Normal jobs All jobs

Welfare gain ME + WG 354 292 320due to redistribution 376 292 330due to insurance 114 0 51due to crowding-out -136 0 -61

Excess burden EB 137 99 116due to standard e�ect 141 99 118due to crowding-out -25 0 -11due to performance-pay 21 0 9

Total: ME + WG - EB 217 192 203

Note: The three columns show the mean impact of increasing the progressivity rate by 1 percentage point (0.01) onthe performance-pay jobs, the normal jobs, and all jobs, respectively. All the e�ects are expressed in USD per

worker in a given job category.

increase from 13.6% to 15.6%. The tax rate at $500, 000, which roughly corresponds

to the top 1% threshold, would increase from 38.4% to 56.6%. In this section we

analyze the impact of a large reform of the current tax code that implements the op-

timal progressivity and adjusts the other tax parameter to keep government revenue

unchanged.

The impact of this reform is depicted in Figure 2 and analyzed in Table 2. Fol-

lowing a large increase in tax progressivity, the earnings schedule is barely altered:

the pass-through of performance shocks to log-earnings increases only modestly from

0.731 to 0.76. As a result, the variance of log-earnings conditional on productivity

increases by 8%, while the overall dispersion of log-earnings among performance pay

workers increases by 2.3%. Given that the pre-tax earnings schedule hardly moves,

this progressivity-increasing reform substantially �attens the consumption schedule,

leading to a much better consumption insurance. Indeed, both the individual con-

sumption risk and overall consumption dispersion among performance-pay workers

fall by more than 30%. Better insured workers have weaker incentives to exert e�ort

which, for our calibrated labor supply elasticity, falls by a substantial 9.6% due the

large magnitude of the tax reform.

Underlying the weak response of the earnings schedule are two countervailing

forces: the crowding-out of private insurance and the performance-pay e�ect. If �rms

attempted to motivate workers to maintain their original level of e�ort, better private

43

Figure 2: Earnings and consumption schedules of performance-pay workers

(a) Earnings schedule (b) Consumption schedule

100 50 0 50 100 150 200 250

output ($1,000)

0

25

50

75

100

125

150

175

200

earn

ings

($1,

000)

original earnings schedule+ crowding-out effect+ performance-pay effect (final)

100 50 0 50 100 150 200 250

output ($1,000)

0

25

50

75

100

125

150

175

200

cons

umpt

ion

($1,

000)

original consumption schedule+ crowding-out effect+ performance-pay effect (final)

Note: The adjustment of the earnings and consumption schedules for a performance-pay worker with a meanproductivity following an increase of progressivity rate from the current (0.181) to the optimal level (0.356).

Table 2: Earnings and consumption distribution statistics following the large reform

insurance via the income tax would crowd-out private insurance so as to leave the

variance of log-earnings unchanged, since εψ,1−p = −1. For that to happen, the pass-

through would need to increase all the way to 0.93, raising the log-earnings risk of each

performance-pay worker by 62%. However, �rms in equilibrium choose a lower e�ort

level and reduce the power of incentive-pay accordingly. This force � the performance-

pay e�ect � counteracts the e�ect of crowding-out and brings the pass-through back

to the vicinity of its original level. The combination of these two e�ects implies that,

strikingly, the relative fall of log-consumption variance in the aftermath of the tax

reform is nearly identical for the workers at jobs with and without agency frictions.

44

5.5 Performance-Pay Jobs and Optimal Progressivity

We now study the importance of performance-pay considerations for the optimal tax

progressivity by comparing the optimal progressivity rate arising in the calibrated

model to two important benchmarks. The �rst is the optimal rate of progressivity in

the counterfactual economy without performance-pay jobs, obtained by setting the

variance of the performance shock σ2η,m to zero. The second is the rate of progressivity

that would be chosen by the government who would erroneously assume that the

entire wage risk is exogenous. This rate is found by applying the formula for the

optimal rate of progressivity from the model with exogenous risk to our calibrated

model economy, where wage-rate risk is actually endogenous. Following Rothschild

and Scheuer (2016), we call the resulting progressivity rate a self-con�rming policy

equilibrium. The results are depicted in panel (a) of Figure 3.

First, in an economy without performance-pay jobs, the optimal rate of progres-

sivity would increase from 0.356 to 0.41. To understand the discrepancy between the

true optimum and this benchmark, we gradually switch on the various channels re-

lated to performance-pay jobs in the optimum formula (43), starting from an economy

devoid of agency frictions (see panel (b) of Figure 3). First, workers at performance-

pay jobs exert lower e�ort than those at normal jobs. This leads to a lower output and

hence a higher share of government spending in GDP, G/Y . This in turn contributes

to lower progressivity and explains approximately 30% of the overall progressivity

change. Second, workers at performance-pay jobs face higher wage risk. That raises

the gains of providing social insurance via tax progressivity. However, this additional

bene�t of insurance is fully canceled by the crowding-out of private insurance. Third,

the labor supply elasticity of performance-pay workers is higher, increasing the excess

burden of raising progressivity and explaining 40% of the progressivity di�erence.

Finally, the performance-pay e�ect contributes to a reduction in government revenue

via a compression of the earnings distribution, which explains the remaining 30%.

Second, we compare the optimal rate of progressivity with the SCPE. In this

equilibrium concept, the government can correctly estimate the elasticity of e�ort, but

treats wage risk as fully exogenous. Such a policymaker would mistakenly attempt

to insure the endogenous part of the wage risk and choose too high a progressivity

rate, equal to 0.4. However, log-earnings risk at performance-pay jobs increases by a

mere 2.3% relative to its value in the true optimum. Once again, this is due to the

counteracting forces of the crowding-out and the performance pay e�ects. Finally,

45

Figure 3: Optimal progressivity and performance pay jobs

(a) Social welfare functions (b) Optima di�erence decomposition

0.0 0.1 0.2 0.3 0.4 0.5 0.6

progressivity rate p-5

0

5

10

15

wel

fare

cha

nge

rel.

to s

tatu

s qu

o (%

of c

ons.

)

optimum in the calibrated model p = 0.356, welfare change: 2.98%

optimum w/o performance-pay jobs p = 0.41, welfare change: 14.22%

self-confirming policy equilib. p = 0.4, welfare change: 2.74%

status quo p = 0.181

SWF in the calibrated modelSWF without performance-pay jobs

lower effort

higher earnings risk

crowding-out effect

higher effort elasticity

performance-pay effect

60

40

20

0

20

40

60

chan

ge in

pro

gres

sivi

ty ra

te (%

of t

otal

cha

nge)

Note: Panel (b) shows the contributions of various channels through which performance-pay jobs a�ect the optimal

progressivity rate. These contributions are obtained by successively switching on the respective channels in the

optimal progressivity formula. All the channels combined sum up to the di�erence in progressivity rate between the

optimum without performance pay jobs and the optimum in the calibrated model.

ignoring the endogeneity of wage risk does not lead to a large miscalculation of optimal

tax policy: the social welfare cost of choosing the SCPE is equivalent to a 0.24% drop

in consumption. This value implies that increasing the U.S. rate of progressivity

from the status quo to the SCPE reaps 93% of the welfare gains of moving from the

status quo to the full optimum. Recall that this small welfare cost of sub-optimizing

is not a necessary consequence of our theoretical analysis.27 In fact, if there were

only performance-pay jobs in the economy, the di�erence in progressivity between

the SCPE and the full optimum would be 0.09 with a welfare di�erence equivalent to

1% change of consumption. Since in our calibration only roughly half of the jobs are

performance-based, the impact on the progressivity rate is half of that. Furthermore,

as the social welfare function is concave in p, half of the change in progressivity

translates into a quarter of the change in social welfare.

5.6 Incidence of Other Tax Reforms

Recall that our theoretical tax incidence analysis of Section 2 gives us in closed-form

the incidence of arbitrary tax reforms. Speci�cally, we can apply the theoretical

formulas of Theorem 1 to reforms that do not keep the CRP structure of the tax

27Indeed, by the envelope theorem, the performance-pay e�ect is (at least locally) only second-order relative to the crowding-out e�ect from a welfare point of view. Thus, the two e�ects do noto�set each other as they did when we studied the incidence on earnings and government revenue.

46

code.28 We specialize our theoretical analysis to such reforms and study the direction

of the crowding-out and the performance-pay e�ects in Appendix E. Here we propose

two quantitative experiments.

First, in Figure 4 we depict the incidence of canonical tax reforms on the earnings

schedule of a worker with mean ability and a performance-pay job. Speci�cally, we

consider a $100 increase in lump-sum transfer as well as a 1 percentage point increase

in marginal tax rates over a varying range of earnings, namely, for all earnings, for the

top 50% earnings, and for the top 10% earnings. Our robust �nding is that although

the crowding-out e�ect (dashed black curves) contributes to higher earnings risk,

it is mostly o�set by the performance-pay e�ect (dashed-dotted blue curves). In the

case of an additional lump-sum transfer, the performance-pay e�ect o�sets more than

50% of the impact of crowding-out on the variance of log-earnings. For a uniform

increase in marginal tax rates, the o�set is more than 90%. When marginal tax

rates are increased only for highest incomes, the o�set rate even exceeds 100%: the

performance-pay e�ect dominates the crowding-out e�ect and the earnings risk falls on

net. To understand why the o�set rate can be so high, recall that tax reforms which

increase progressivity generate larger e�ort responses of performance-pay workers

than reforms which spread the same tax burden in a more uniform manner � see

Lemmas 1 and 2 and the subsequent discussions. Increasing marginal tax rates over

a smaller range of high potential earnings � a progressive tax reform � generates a

relatively larger e�ort response and, hence, a more substantial performance-pay e�ect

in comparison to the crowding-out e�ect.

Second, in Figure 5, rather than focusing on a single earnings contract as in

the previous paragraph, we study the impact of an increase in the top marginal

tax rate in our calibrated economy with workers that are heterogeneous in ex-ante

ability. Speci�cally, we consider a 1 percentage point increase in the marginal tax

rate faced by the top 1% income earners, that is, above $441,000 in 2007 dollars. This

hypothetical top tax bracket is depicted by the vertical dotted line in the left panel.

The left (resp., right) panel gives the results as a function of mean earnings (resp.,

mean earnings percentile). This reform leads to a direct crowd-out which, ceteris

paribus, increases the earnings risk for all workers in the top 5% of mean earnings,

particularly so in the top 1%. This crowding-out is represented by the dashed black

28The only di�culty consists of computing the e�ort change a (θ) in response to these reforms.We do so by solving the �rst-order condition (7) numerically.

47

Figure 4: Incidence of tax reforms: crowding-out and performance-pay e�ects

(a) Increase of lump-sum transfer (b) Increase of mg. tax rates (uniform)

0 20 40 60 80 100

output shock percentile

0.004

0.002

0.000

0.002

0.004

log-

wag

e ad

just

men

t

wco( , ) wpp( , ) wco( , ) + wpp( , )

0 20 40 60 80 100


0.004

0.002

0.000

0.002

0.004

log-

wag

e ad

just

men

t

wco( , ) wpp( , ) wco( , ) + wpp( , )

(c) Increase of mg. tax rates (top 50%) (d) Increase of mg. tax rates (top 10%)

0 20 40 60 80 100


0.015

0.010

0.005

0.000

0.005

0.010

0.015

log-

wag

e ad

just

men

t

wco( , ) wpp( , ) wco( , ) + wpp( , )

0 20 40 60 80 100


0.015

0.010

0.005

0.000

0.005

0.010

0.015

log-

wag

e ad

just

men

t

wco( , ) wpp( , ) wco( , ) + wpp( , )

Note: Tax incidence computed for a worker with mean productivity. Panel (a) depicts an increase of the lump-sum

transfer by $100 (in 2007 dollars). Panels (b-d) depict an increase of the marginal tax rate by 1 pp for all earnings,

for the top 50% earnings and for the top 10% earnings, respectively. The performance-pay e�ect o�sets the impact

of the crowding-out e�ect on the variance of log earnings by 52% (a), 92% (b), 136% (c) and 314% (d).

48

Figure 5: Increase of the top tax rate: impact on earnings risk

(a) (b)

0 500 1000 1500 2000 2500 3000

mean earnings ($1,000)

0.006

0.004

0.002

0.000

0.002

0.004

0.006

chan

ge in

log-

earn

ings

risk

due to crowding-out effectdue to crowding-out and perf.-pay effects

80 85 90 95 99 100

mean earnings percentile

0.006

0.004

0.002

0.000

0.002

0.004

0.006

chan

ge in

log-

earn

ings

risk

due to crowding-out effectdue to crowding-out and perf.-pay effects

Note: Log-earnings risk is measured by V ar(log(w(θ, η) | θ). The vertical dashed line in the left panel indicates the

hypothetical top tax bracket threshold.

curves in both panels. However, the results change dramatically when we take into

account labor e�ort responses. The performance-pay e�ect more than o�sets the

crowding-out e�ect everywhere apart from the very top earners, leading to a lower

earnings risk for all workers below 99.5 percentile. Only the very top 0.5% experience

any net crowding-out, but even for them the performance-pay e�ect o�sets more than

70% of the additional pre-tax earnings risk.

Conclusion

We have set up and analyzed a tractable environment in which �rms provide work-

ers with endogenous private insurance against stochastic performance shocks in the

face of moral hazard frictions. The government uses the tax-and-transfer system to

redistribute income across workers who di�er in uninsurable innate ability. The key

feature of our model is that earnings risk is endogenous and has a productive role.

The main and surprising conclusion of our analysis is that standard models that ig-

nore the endogeneity of wage-rate risk actually come very close to evaluating the

incidence of taxes on earnings contract, as well as the optimal level of tax progres-

sivity. Underlying this result are two countervailing forces at play � a crowding-out

and a performance-pay e�ect � which prevent taxes from having a large impact on

the structure of performance-based compensation.

It would be interesting to extend our analysis in several directions. First, we only

considered the impact of taxes on compensation for already existing performance-pay

jobs. One could also model the incentives for �rms to create such performance-pay

49

jobs (rather than �normal� jobs) in the �rst place, and study the incidence of tax

reforms on the extensive margin of switching from one type of job to another. Sec-

ond, in our model, private markets are constrained e�cient and perfectly competitive.

In other words, we gave private markets their �best chance� in making government

policy redundant. Introducing frictions such as adverse selection in private markets �

whereby �rms cannot perfectly observe a worker's innate ability � and market power

are natural next steps. Third, our theoretical analysis delivers predictions regarding

the impact of various types of tax reforms on the structure of incentive-based com-

pensation via counteracting crowding-out and performance-pay e�ects. Testing these

predictions empirically should be particularly fruitful.

50

References

Ábrahám, Á., F. Alvarez-Parra, and S. Forstner (2016a): �The e�ects of

moral hazard on wage inequality in a frictional labor market,� Unpublished working

paper.

Ábrahám, Á., S. Koehne, and N. Pavoni (2016b): �Optimal income taxation

when asset taxation is limited,� Journal of Public Economics, 136, 14�29.

Ales, L., A. A. Bellofatto, and J. J. Wang (2017): �Taxing Atlas: Executive

compensation, �rm size, and their impact on optimal top income tax rates,� Review

of Economic Dynamics, 26, 62�90.

Ales, L., M. Kurnaz, and C. Sleet (2015): �Technical change, wage inequality,

and taxes,� The American Economic Review, 105, 3061�3101.

Ales, L. and C. Sleet (2016): �Taxing top CEO incomes,� American Economic

Review, 106, 3331�66.

Attanasio, O. and J.-V. R�os-Rull (2000): �Consumption smoothing in island

economies: Can public insurance reduce welfare?� European Economic Review, 44,

1225�1258.

Bandiera, O., I. Barankay, and I. Rasul (2011): �Field experiments with

�rms,� Journal of Economic Perspectives, 25, 63�82.

Bell, B. and J. Van Reenen (2014): �Bankers and their bonuses,� The Economic

Journal, 124, F1�F21.

Bird, A. (2018): �Taxation and executive compensation: Evidence from stock op-

tions,� Journal of Financial Economics, 127, 285�302.

Blomqvist, Å. and H. Horn (1984): �Public health insurance and optimal income

taxation,� Journal of Public Economics, 24, 353�371.

Bloom, N. and J. Van Reenen (2010): �Why do management practices di�er

across �rms and countries?� Journal of Economic Perspectives, 24, 203�24.

Carroll, G. and D. Meng (2016): �Robust contracting with additive noise,�

Journal of Economic Theory, 166, 586�604.

51

Chang, Y. and Y. Park (2017): �Optimal Taxation with Private Insurance,�

Rochester Center for Economic Research Working Paper.

Chetty, R. (2009): �Su�cient Statistics for Welfare Analysis: A Bridge Between

Structural and Reduced-Form Methods,� Annual Review of Economics, 1, 451�488.

�� (2012): �Bounds on elasticities with optimization frictions: A synthesis of

micro and macro evidence on labor supply,� Econometrica, 80, 969�1018.

Chetty, R. and E. Saez (2010): �Optimal taxation and social insurance with

endogenous private insurance,� American Economic Journal: Economic Policy, 2,

85�114.

Colombi, R. (1990): �A new model of income distribution: The pareto-lognormal

distribution,� in Income and wealth distribution, inequality and poverty, Springer,

18�32.

Cremer, H. and P. Pestieau (1996): �Redistributive taxation and social insur-

ance,� International Tax and Public Finance, 3, 281�295.

Cullen, J. B. and J. Gruber (2000): �Does unemployment insurance crowd out

spousal labor supply?� Journal of Labor Economics, 18, 546�572.

Cutler, D. M. and J. Gruber (1996a): �Does public insurance crowd out private

insurance?� The Quarterly Journal of Economics, 111, 391�430.

�� (1996b): �The e�ect of Medicaid expansions on public insurance, private in-

surance, and redistribution,� The American Economic Review, 86, 378�383.

Dale-Olsen, H. (2012): �Do Tax Reforms A�ect Firm Performance and Executive

Remuneration? Evidence from a Compressed Wage Environment,� Economica, 79,

493�515.

Doligalski, P. (2019): �Optimal Income Taxation and Commitment on the Labor

Market,� Working Paper.

Edmans, A. and X. Gabaix (2011): �Tractability in incentive contracting,� The

Review of Financial Studies, 24, 2865�2894.

52

�� (2016): �Executive compensation: A modern primer,� Journal of Economic

Literature, 54, 1232�87.

Edmans, A., X. Gabaix, and D. Jenter (2017): �Executive compensation: A

survey of theory and evidence,� in The Handbook of the Economics of Corporate

Governance, Elsevier, vol. 1, 383�539.

Edmans, A., X. Gabaix, T. Sadzik, and Y. Sannikov (2012): �Dynamic CEO

compensation,� The Journal of Finance, 67, 1603�1647.

Farhi, E. and I. Werning (2013): �Insurance and taxation over the life cycle,�

The Review of Economic Studies, 80, 596�635.

Findeisen, S. and D. Sachs (2016): �Education and optimal dynamic taxation:

The role of income-contingent student loans,� Journal of Public Economics, 138,

1�21.

Foster, A. D. and M. R. Rosenzweig (1994): �A test for moral hazard in the

labor market: Contractual arrangements, e�ort, and health,� The Review of Eco-

nomics and Statistics, 213�227.

Friedrich, B., L. Laun, C. Meghir, and L. Pistaferri (2019): �Earnings dy-

namics and �rm-level shocks,� Tech. rep., National Bureau of Economic Research.

Frydman, C. and D. Jenter (2010): �CEO compensation,� Annual Review of

Financial Economics, 2, 75�102.

Frydman, C. and R. S. Molloy (2011): �Does tax policy a�ect executive com-

pensation? Evidence from postwar tax reforms,� Journal of Public Economics, 95,

1425�1437.

Garrett, D. F. and A. Pavan (2015): �Dynamic managerial compensation: A

variational approach,� Journal of Economic Theory, 159, 775�818.

Golosov, M., N. Kocherlakota, and A. Tsyvinski (2003): �Optimal Indirect

and Capital Taxation,� Review of Economic Studies, 70, 569�587.

Golosov, M., M. Troshkin, and A. Tsyvinski (2016): �Redistribution and

social insurance,� American Economic Review, 106, 359�86.

53

Golosov, M. and A. Tsyvinski (2007): �Optimal taxation with endogenous in-

surance markets,� The Quarterly Journal of Economics, 122, 487�534.

Grigsby, J., E. Hurst, and A. Yildirmaz (2019): �Aggregate nominal wage

adjustments: New evidence from administrative payroll data,� Tech. rep., National

Bureau of Economic Research.

Guiso, L., L. Pistaferri, and F. Schivardi (2005): �Insurance within the �rm,�

Journal of Political Economy, 113, 1054�1087.

Heathcote, J., K. Storesletten, and G. L. Violante (2017): �Optimal tax

progressivity: An analytical framework,� The Quarterly Journal of Economics, 132,

1693�1754.

Heathcote, J. and H. Tsujiyama (2019): �Optimal Income Taxation: Mirrlees

Meets Ramsey,� Working Paper.

Hendren, N. (2015): �The Policy Elasticity,� in Tax Policy and the Economy, Vol-

ume 30, University of Chicago Press.

Holmstrom, B. and P. Milgrom (1987): �Aggregation and linearity in the pro-

vision of intertemporal incentives,� Econometrica: Journal of the Econometric So-

ciety, 303�328.

Hungerbühler, M., E. Lehmann, A. Parmentier, and B. Van der Linden

(2006): �Optimal redistributive taxation in a search equilibrium model,� The Re-

view of Economic Studies, 73, 743�767.

Kapicka, M. and J. Neira (2013): �Optimal taxation in a life-cycle economy

with endogenous human capital formation,� University of California, Santa Barbara

WP.

Kaplow, L. (1991): �Incentives and government relief for risk,� Journal of Risk and

Uncertainty, 4, 167�175.

Keane, M. P. (2011): �Labor supply and taxes: A survey,� Journal of Economic

Literature, 49, 961�1075.

Krueger, D. and F. Perri (2011): �Public versus private risk sharing,� Journal

of Economic Theory, 146, 920�956.

54

Laffont, J.-J. and J. Tirole (1986): �Using cost observation to regulate �rms,�


Lamadon, T. (2016): �Productivity shocks, long-term contracts and earnings dy-

namics,� manuscript, University of Chicago.

Lamadon, T., M. Mogstad, and B. Setzler (2019): �Imperfect competition,

compensating di�erentials and rent sharing in the US labor market,� Tech. rep.,

National Bureau of Economic Research.

Lazear, E. P. and P. Oyer (2010): �Personnel Economics,� Handbook of Organi-

zational Economics.

Lemieux, T., W. B. MacLeod, and D. Parent (2009): �Performance pay and

wage inequality,� The Quarterly Journal of Economics, 124, 1�49.

Levitt, S. D. and C. Syverson (2008): �Market distortions when agents are

better informed: The value of information in real estate transactions,� The Review

of Economics and Statistics, 90, 599�611.

Luenberger, D. G. (1997): Optimization by vector space methods, John Wiley &

Sons.

Makris, M. and A. Pavan (2017): �Taxation under learning-by-doing,� Tech. rep.,

Working Paper.

Mirrlees, J. A. (1971): �An Exploration in the Theory of Optimum Income Taxa-

tion,� The Review of Economic Studies, 38, 175�208.

Park, Y. (2014): �Optimal taxation in a limited commitment economy,� Review of

Economic Studies, 81, 884�918.

Piketty, T., E. Saez, and S. Stantcheva (2014): �Optimal taxation of top

labor incomes: A tale of three elasticities,� American Economic Journal: Economic

Policy, 6, 230�271.

Prendergast, C. (1999): �The provision of incentives in �rms,� Journal of Eco-

nomic Literature, 37, 7�63.

55

Raj, A. (2019): �Optimal income taxation in the presence of networks of altruism,�

Working Paper.

Rochet, J.-C. (1991): �Incentives, redistribution and social insurance,� The Geneva

Papers on Risk and Insurance Theory, 16, 143�165.

Rogerson, W. P. (1985): �Repeated Moral Hazard,� Econometrica, 53, 69�76.

Rose, N. L. and C. Wolfram (2002): �Regulating executive pay: Using the tax

code to in�uence chief executive o�cer compensation,� Journal of Labor Economics,

20, S138�S175.

Rothschild, C. and F. Scheuer (2013): �Redistributive Taxation in the Roy

Model,� The Quarterly Journal of Economics, 128, 623�668.

�� (2014): �A theory of income taxation under multidimensional skill heterogene-

ity,� NBER Working Paper No. 19822.

�� (2016): �Optimal taxation with rent-seeking,� The Review of Economic Stud-

ies, 83, 1225�1262.

Sachs, D., A. Tsyvinski, and N. Werquin (2020): �Nonlinear tax incidence and

optimal taxation in general equilibrium,� Econometrica, 88, 469�493.

Saez, E. (2001): �Using Elasticities to Derive Optimal Income Tax Rates,� Review

of Economic Studies, 68, 205�229.

Scheuer, F. and I. Werning (2016): �Mirrlees meets Diamond-Mirrlees,� NBER

Working Paper No. 22076.

�� (2017): �The taxation of superstars,� The Quarterly Journal of Economics,

132, 211�270.

Schoeni, R. F. (2002): �Does unemployment insurance displace familial assistance?�

Public Choice, 110, 99�119.

Shearer, B. (2004): �Piece rates, �xed wages and incentives: Evidence from a �eld

experiment,� The Review of Economic Studies, 71, 513�534.

Sleet, C. and H. Yazici (2017): �Taxation, redistribution and frictional labor

supply,� Working Paper.

56

Stantcheva, S. (2014): �Optimal income taxation with adverse selection in the

labour market,� Review of Economic Studies, 81, 1296�1329.

�� (2017): �Optimal taxation and human capital policies over the life cycle,�


57

A Proofs of Section 1

Concavity of the Utility of Earnings. Our analysis requires that the utility of

earnings w 7→ v (w) ≡ u (R (w)) is concave. It is easy to show that this is equivalent

to

π1 (w) π2 (w) > −γ (w) (46)

where γ (w) ≡ −R(w)u′′(R(w))u′(R(w))

is the agent's coe�cient of relative risk aversion, and

π1 (w) ≡ 1−T (w)/w1−T ′(w)

, π2 (w) ≡ wT ′′(w)1−T ′(w)

are two measures of the local rate of progressivity

of the tax schedule. Speci�cally, the parameterπ1 (w) is the ratio of the average and

marginal retention rates, and π2 (w) is (minus) the elasticity of the retention rate with

respect to income. If the tax schedule has a constant rate of progressivity p (CRP),

these variables are respectively equal to 11−p and p. Note that most of our analysis

is concerned with the incidence of tax reforms around a given initial tax schedule T .

In this case, (46) is a restriction on the initial tax code T and can be easily veri�ed

in the data for practical applications. Note moreover that the tax reform itself is not

restricted. When we characterize the optimal tax schedule within the CRP class, we

assume that u (c) = log c which implies that γ (w) = −1. It is easy to verify that in

this case condition (46) is always satis�ed regardless of the value of p.

Proof of Proposition 1. The proof of this proposition follows directly from the

results of Edmans and Gabaix (2011) since the utility of earnings v (·) is concave.

We give here a heuristic proof of the main arguments. Given the earnings contract

{w (θ, η) : y ∈ R}, an agent with ability θ and performance shock η chooses e�ort a (θ)

to maximize utility u (R (w (θ, η)))− h (a (θ)) (see equation (3)). Since y = θ (a+ η),

we have ∂w(θ,η)∂η

= ∂w(θ,η)∂a

so that the �rst-order condition reads

r (w (θ, η))u′ (R (w (θ, η)))∂w (θ, η)

∂η= h′ (a (θ)) . (47)

This equation pins down the slope of the earnings schedule that the �rm must im-

plement in order to induce the e�ort level a (θ). Integrating this incentive constraint

over η given a (θ) leads to

u (R (w (θ, η))) = h′ (a (θ)) η + k, (48)

58

for some constant k ∈ R. Since in equilibrium the participation constraint (4) must

hold with equality, the agent's expected utility must be equal to his reservation value

U (θ). Therefore, the value of k must be chosen by the �rm such that the agent's

participation constraint holds with equality. Imposing the participation constraint

with E [η] = 0 implies

k = U (θ) + h (a (θ)) . (49)

The previous two equations fully characterize the wage contract given the desired

e�ort level a (θ) and the reservation value U (θ). They imply that, for a given pair

(a (θ) , U (θ)), the wage given performance shock η satis�es:

u (R (w (θ, η))) = h′ (a (θ)) η + [U (θ) + h (a (θ))] . (50)

Next, equation (7) is obtained by taking the �rst-order condition with respect to a (θ)

in the �rm's problem (2), taking as given the earnings contract (6) required to satisfy

the workers' incentive and participation constraints (3, 4). Finally, equation (8) is

simply a rewriting of (5).

B Proofs of Section 2

Proof of Theorem 1. Recall that, given the e�ort level a and the reservation value

U (θ), earnings as a function of the noise realization η (or equivalently output y =

θ (a+ η)) satis�es:

u (w (θ, η)− T (w (θ, η))) = U (θ) + h (a (θ)) + h′ (a (θ)) η.

In response to the tax reform δT , the perturbed wage contract satis�es

u[w (θ, η) + δw (θ, η)− T (w (θ, η) + δw (θ, η))− δT (w (θ, η))

]= U (θ) + δU (θ) + h (a (θ) + δa (θ)) + h′ (a (θ) + δa (θ)) η.

59

Di�erentiating with respect to δ and evaluating at δ = 0 leads to[(1− T ′ (w (θ, η))) w (θ, η)− T (w (θ, η))

]u′ (w (θ, η)− T (w (θ, η)))

= U (θ) + [h′ (a (θ)) + h′′ (a (θ)) η] a (θ) .

Solving for w (θ, η) leads to

w (θ, η) =T (w (θ, η))

r (w (θ, η))+

U (θ)

r (w (θ, η))u′ (R (w (θ, η)))

+h′ (a (θ)) + h′′ (a (θ)) η

r (w (θ, η))u′ (R (w (θ, η)))a (θ) .

Adding and subtracting w (θ, η) a(θ)a(θ)

, that is, the earnings adjustment obtained in the

model with exogenous risk, and substituting expression (17) derived below for the

impact of the reform on expected utility U (θ), easily yields (14).

Proof of Proposition 2. Recall that the reservation value satis�es E [w (θ, η)] =

θa (θ). Hence, in response to the tax reform, we get

E [w (θ, η) + δw (θ, η)] = θ (a (θ) + δa (θ)) ,

that is, E [w (θ, η)] = θa (θ). Substituting expression (14) for w (θ, η) in this equation

leads to

θa (θ) = E

[T (w (θ, η))

r (w (θ, η))

]+ E

[1

r (w (θ, η))u′ (R (w (θ, η)))

]U (θ)

+E[

h′ (a (θ)) + h′′ (a (θ)) η

r (w (θ, η))u′ (R (w (θ, η)))

]a (θ) .

Using equation (7) that de�nes optimal e�ort and solving for U (θ) leads to (17).

Proof of Lemma 1. Recall the optimal e�ort condition

E[

h′ (a (θ)) + h′′ (a (θ)) η

(1− T ′ (w (θ, η)))u′ (w (θ, η)− T (w (θ, η)))

]= θ.

60

Following a tax reform T , the perturbed level of e�ort satis�es

E

h′ (a (θ) + δa (θ)) + h′′ (a (θ) + δa (θ)) η{1− T ′ (w (θ, η))− δT ′ (w (θ, η))

}u′{R (w (θ, η))− δT (w (θ, η))

} = θ,

where we denote w (θ, η) ≡ w (θ, η)+δw (θ, η). Taking the derivative of this expression

with respect to δ evaluated at δ = 0 gives

0 = E[

h′′ (a (θ)) + h′′′ (a (θ)) η

r (w (θ, η))u′ (R (w (θ, η)))a (θ)

]− E

[h′ (a (θ)) + h′′ (a (θ)) η

{r (w (θ, η))u′ (R (w (θ, η)))}2 (51)

×{[−T ′′ (w) w − T ′ (w)

]u′ (R (w)) + r (w)

[r (w) w − T (w)

]u′′ (R (w))

}]where the arguments (θ, η) have been removed from the second line for notational

conciseness. Suppose that the utility function is quasilinear in consumption, and the

tax schedule is initially a�ne. Equation (51) can then be rewritten as

0 = E[h′′ (a (θ)) + h′′′ (a (θ)) η

1− τa (θ)

]+ E

[h′ (a (θ)) + h′′ (a (θ)) η

(1− τ)2 T ′ (w (y | θ))]

=h′′ (a (θ))

1− τa (θ) +

h′ (a (θ))

(1− τ)2 E[T ′ (w (y | θ))

]+h′′ (a (θ))

(1− τ)2 E[ηT ′ (w (y | θ))

].

Solving for a (θ) and letting ε (θ) = h′(a(θ))a(θ)h′′(a(θ))

easily leads to (18).

Proof of Proposition 3. Solving for a (θ) in equation (51) implies that the Gateaux

derivative of e�ort is given by

a (θ) = −E

[εa,R (θ, η)

T (w (θ, η))

r (w (θ, η))w (θ, η)

]− E

[εa,r (θ, η)

T ′ (w (θ, η))

r (w (θ, η))

]

+E[(εa,R (θ, η)− p (w (θ, η)) εa,r (θ, η))

w (θ, η)

w (θ, η)

],

where p (w) ≡ wT ′′(w)1−T ′(w)

is the local rate of progressivity of the tax schedule, and where

εa,R, εa,r denote the income e�ect parameter and compensated elasticity along the

61

linearized budget constraint, equal to

εa,r (θ, η) =h′ (a (θ))

a (θ)h′′ (a (θ))

(1 + h′′(a(θ))

h′(a(θ))η)

1r(w(θ,η))u′(R(w(θ,η)))

E[(

1 + h′′′(a(θ))h′′(a(θ))

η)

1r(w(θ,η))u′(R(w(θ,η)))

]and

εa,R (θ, η) =h′ (a (θ))

a (θ)h′′ (a (θ))

(1 + h′′(a(θ))

h′(a(θ))η)w(θ,η)u′′(R(w(θ,η)))

(u′(R(w(θ,η))))2

E[(

1 + h′′′(a(θ))h′′(a(θ))

η)

1r(w(θ,η))u′(R(w(θ,η)))

] .Now substitute equations (14, 17) for w (θ, η) in the previous equation, and solve for

a (θ) to get{1− E

[(εa,R (θ, η)− p (w (θ, η)) εa,r (θ, η))

h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))w (θ, η)

]}a (θ)

= −E

[εa,R (θ, η)

T (w (θ, η))

(1− T ′ (w (θ, η)))w (θ, η)

]− E

[εa,r (θ, η)

T ′ (w (θ, η))

1− T ′ (w (θ, η))

]

+E

[(εa,R (θ, η)− p (w (θ, η)) εa,r (θ, η))

T (w (θ, η))

r (w (θ, η))w (θ, η)

]

−E

[(εa,R (θ, η)− p (w (θ, η)) εa,r (θ, η))

1v′(w(θ,η))w(θ,η)

E[(v′ (w (θ, ·)))−1]

]E

[T (w (θ, η))

r (w (θ, η))

].

Collecting terms leads to

a (θ)

a (θ)= −E

[εEw,R (θ, η)

T (w (θ, η))

r (w (θ, η))w (θ, η)

]− E

[εEw,r (θ, η)

T ′ (w (θ, η))

r (w (θ, η))

]

where the income e�ect parameter and compensated elasticity now account for the

nonlinearity of the budget constraint (due to the fact that w depends on T ) and the

endogeneity of the reservation value (due to the fact that w depends on U) and are

given by

εEw,R (θ, η) =

p (w (θ, η)) εa,r (θ, η) +E[(εa,R(θ,·)−p(w(θ,·))εa,r(θ,·)) 1

v′(w(θ,·))w(θ,·)

]E[

1v′(w(θ,·))

] w (θ, η)

1 + E[(p (w (θ, η′)) εa,r (θ, η′)− εa,R (θ, η′))

(1 + h′′(a(θ))

h′(a(θ))η′)

a(θ)h′(a(θ))v′(w(θ,η′))w(θ,η′)

]

62

and

εEw,r (θ, η) =εa,r (θ, η)

1 + E[(p (w (θ, η′)) εa,r (θ, η′)− εa,R (θ, η′))

(1 + h′′(a(θ))

h′(a(θ))η′)

a(θ)h′(a(θ))v′(w(θ,η′))w(θ,η′)

] .This concludes the proof.

C Allowing for Non-Constant E�ort

In this section we extend our tax incidence analysis to the case where the �rm can o�er an

e�ort schedule a(θ, η), rather than imposing a constant e�ort level a(θ) as in the main body

of the paper. The �rm which employs a worker with productivity θ solves

maxw(θ,·),a(θ,·)

E[θa(θ, η)− w(θ, η)]

subject to incentive-compatibility constraints

a(θ, η) ∈ argmaxa

v(w(θ, η))− h(a(θ, η)) for all η,

and the participation constraint

E [v(w(θ, η))− h(a(θ, η))] ≥ U(θ).

From Edmans and Gabaix (2011) we know that the optimal contract satis�es

v(w(θ, η)) = K + h(a(θ, η)) +

ˆ η

ηh′(a(θ, x))dx,

where K ∈ R. Using the binding participation constraint to solve for K, we obtain

v(w(θ, η)) = U(θ) + h(a(θ, η)) +

ˆ η

ηh′(a(θ, x))dx− E

[ˆ η

ηh′(a(θ, x))dx

]. (52)

Unlike in the model with a single e�ort level, the slope of the ex-post utility potentially

varies with performance. In particular, when the e�ort schedule is di�erentiable, we have

∂v(w(θ, η))

∂η= h′(a(θ, η))

(1 +

∂a(θ, η)

∂η

).

Edmans and Gabaix (2011) show that all incentive-compatible e�ort schedules are such that

63

a(θ, η) + η is increasing with η. This implies that the above slope is non-negative and that

earnings are increasing with performance.

The �rst-order condition with respect to e�ort level a(θ, η′), assuming an interior solu-

tion, reads

θfη(η′) =

∂E [w(θ, η)]

∂a(θ, η′), (53)

where fη denotes the density of the performance shock η. To compute the derivative of the

expected wage on the right-hand side, �rst consider the derivative of the wage w(θ, η) with

respect to the e�ort level a(θ, η′), which we can compute using equation (52):

∂w(θ, η)

∂a(θ, η′)=

−(1− Fη(η′))h

′′(a(θ,η′))v′(w(θ,η)) if η < η′,

h′(a(θ,η′))v′(w(θ,η′)) − (1− Fη(η′))h

′′(a(θ,η′))v′(w(θ,η′)) if η = η′,

h′′(a(θ,η′))v′(w(θ,η)) − (1− Fη(η′))h

′′(a(θ,η′))v′(w(θ,η)) if η > η′.

(54)

Notice that increasing the e�ort level conditional on the output shock η′ requires lowering

earnings for worse performance (η < η′) and increasing earnings for better performance

(η > η′). Taking expectations over η yields

∂E [w(θ, η)]

∂a(θ, η′)=

h′(a(θ, η′))

v′(w(θ, η′))fη(η

′)

+(1− Fη(η′))(E[h′′(a(θ, η′))

v′(w(θ, η))| η ≥ η′

]− E

[h′′(a(θ, η′))

v′(w(θ, η))

]).

Plugging this expression into the �rst-order condition (53) yields the following �rst-order

condition with respect to e�ort:

θfη(η′) =

h′(a(θ, η′))

v′(w(θ, η′))fη(η

′) (55)

+(1− Fη(η′))(E[h′′(a(θ, η′))

v′(w(θ, η))| η ≥ η′

]− E

[h′′(a(θ, η′))

v′(w(θ, η))

]).

The left-hand side is the marginal bene�t from providing higher e�ort, equal to the expected

output gain. The right-hand side consists of two terms: the marginal rate of substitution

(MRS) and the marginal cost of incentives (MCI). The MRS is the expected wage cost of

compensating the agent for higher e�ort in contingency η′. The MCI, on the other hand, is

the expected wage cost of making the adjusted e�ort schedule incentive-compatible. As we

noted above, increasing e�ort level conditional on output η′ requires increasing earnings for

better performance and reducing them for worse performance. Since earnings are increas-

ing with performance and v is concave, such earnings adjustments are costly for the �rm:

64

MCI ≥ 0.

The following theorem extends the results of Theorem 1 to the model where non-

degenerate e�ort schedules are allowed.

Theorem 3. Denote by a (θ, η) the change in e�ort schedule induced by the reform. Suppose

that the original e�ort schedule is such that a(θ, η) > 0 for all η. The �rst-order e�ect of

the tax reform T on earnings w (θ, ·) is given by

w (θ, η) = wex (θ, η) + wco (θ, η) + wpp (θ, η)

where wex (θ, η) = θa(θ, η), the crowding-out e�ect wco has mean zero and is given by

wco (θ, η) =T (w (θ, η))

r (w (θ, η))− (v′ (w (θ, η)))−1

E[(v′ (w (θ, ·)))−1

] E[ T (w (θ, ·))r (w (θ, ·))

]

and the performance-pay e�ect wpp has mean zero and is given by

wpp (θ, η) =h′(a(θ, η))

v′(w(θ, η))a(θ, η) +

ˆ η

η

h′′(a(θ, x))

v′(w(θ, η))a(θ, x)dx

−E

[ˆ η′

η

h′′(a(θ, x))


]− θa(θ, η).

Proof. Consider a reform R = −T . Following the same steps as in the proof of Theorem 1,

we can show that the change in wages is equal to

w(θ, η) = −R(w(θ, η))r(w(θ, η))

+U(θ)

v′(w(θ, η))+

ˆ η

η

∂w(η)

∂a(η′)a(η′)dη′. (56)

Using prior results on ∂w(η)∂a(η′) from equation (54), we obtain

ˆ η

η

∂w(η)

∂a(η′)a(η′)dη′ =

h′(a(θ, η))

v′(w(θ, η))a(θ, η)

+

ˆ η

η

h′′(a(θ, x))

v′(w(θ, η))a(θ, x)dx− E

[ˆ η′

η

h′′(a(θ, x))


].

65

Note that the expected value of the above expression is

ˆ η

η

ˆ η

η

∂w(θ, η)

∂a(θ, η′)a(θ, η′)dη′dFη(η) =

ˆ η

η

ˆ η

η

∂w(θ, η)

∂a(θ, η′)dFη(η)a(θ, η

′)dη′

=

ˆ η

ηfη(η

′)θa(θ, η′)dη′ = E[θa(θ, η′)]

where in the �rst equality we changed the order of integration and in the second equality

we applied the �rst-order condition for e�ort (53). This implies that the performance-pay

e�ect has mean zero. By the free-entry condition we have E[θa(η′)] = E[w(η′)]. Hence,

the �rst two terms of (56) � the crowding-out e�ect � have mean zero. It follows that

U(θ) = E[R(w(θ,η))r(w(θ,η))

]E[v′(w(θ, η))−1

]−1.

It follows from Theorem 3 that the crowding-out e�ect wco (θ, η) is exactly the

same as in the simpler setting studied in the main body of the paper (equation (15)).

The performance-pay e�ect wpp (θ, η) is more complex than in the simpler model

(equation (16)). However, it is a natural extension of the expression obtained under

the constant-e�ort assumption. Namely, the only substantial di�erence is that the

term h′′(a(θ))a(θ)v′(w(θ,η)) η from (16), which measures the change in earnings necessary to elicit

higher e�ort, is now replaced by the more general expression´ ηηh′′(a(θ,η′))a(θ,η′)

v′(w(θ,η)) dη′. The

interpretation of this term is analogous to its counterpart in the simpler model. The

only added di�culty is that we now have to evaluate the change in the entire e�ort

schedule a(θ, ·) in response to the reform, rather than a scalar value a(θ).

D Proofs of Section 3

Proof of Theorem 2. Equation (14) implies that the excess burden of the reform

T is equal to

EB(T, T ) = −ˆ

Θ

E [T ′ (w (θ, η)) wex (θ, η)] dF (θ)

−ˆ

Θ

E

[T ′ (w (θ, η))

(T (w (θ, η))

r (w (θ, η))+

U (θ)

v′ (w (θ, η))

)]dF (θ)

−ˆ

Θ

E[T ′ (w (θ, η))

(h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))− w (θ, η)

a (θ)

)]a (θ) dF (θ) ,

66

where U (θ) is given by (17). Since

E

[T (w (θ, η))

r (w (θ, η))+

U (θ)

v′ (w (θ, η))

]= E

[h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))− w (θ, η)

a (θ)

]= 0,

we can rewrite the excess burden EB(T, T ) as

−ˆ

Θ

E [T ′ (w (θ, η)) wex (θ, η)] dF (θ)

−ˆ

Θ

Cov

(T ′ (w (θ, η)) ,

T (w (θ, η))

r (w (θ, η))+

U (θ)

v′ (w (θ, η))

)dF (θ)

−ˆ

Θ

Cov

(T ′ (w (θ, η)) ,

[h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))− w (θ, η)

a (θ)

]a (θ)

)dF (θ) .

This expression easily leads to equation (28).

Next, the welfare gain of the tax reform is given by

WG(T, T ) = −1

λ

ˆΘ

α (θ)1

E[(v′ (w (θ, η)))−1]E

[T (w (θ, η))

r (w (θ, η))

]dF (θ)

= −ˆ

Θ

E

[1

λα (θ)

1

E[(v′ (w (θ, ·)))−1] T (w (θ, η))

r (w (θ, η))

]dF (θ)

= −ˆ

Θ

E

[1

λα (θ)

1r(w(θ,η))u′(R(w(θ,η)))

E[(v′ (w (θ, ·)))−1]u′ (R (w (θ, η))) T (w (θ, η))

]dF (θ) .

This leads to equation (29).

When taxes are unrestricted, the marginal value of public funds λ can be ob-

tained as follows. Consider a uniform lump-sum transfer, represented by the reform

T ∗∗ (w) = −1 for all w. Denoting by a∗∗ (θ) the e�ect of this reform on e�ort (via a

pure income e�ect), it a�ects government revenue by R(T, T ∗∗) equal to

1 +

ˆΘ

E

T ′ (w (θ, η))

− 1

r (w (θ, η))+

1v′(w(θ,η))

E[

1v′(w(θ,·))

]E [ 1

r (w (θ, ·))

] dF (θ)

+

ˆΘ

E[T ′ (w (θ, η))

(h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))

)]a∗∗ (θ) dF (θ) .

Consider now the reform in direction T ∗∗ (w) = −1, normalized to reduce government

67

revenue by 1 dollar after all behavioral responses have been accounted for. This reform

is represented by T ∗ = −1

|R(T,T ∗∗)| . Its e�ect on government revenue is R(T, T ∗) = −1

by construction, and its e�ect on social welfare is given by

λ ≡ W(T, T ∗)∣∣∣R(T, T ∗∗)∣∣∣ =

1∣∣∣R(T, T ∗∗)∣∣∣ˆ

Θ

α (θ)1

E[

1v′(w(θ,η))

]E [ 1

r (w (θ, η))

]dF (θ) .

Finally, we show that the optimal tax schedule must satisfy equation (27) for

any tax reform T . To do so, consider an arbitrary tax reform T , normalized with-

out loss of generality so that its mechanical e�ect is equal to 1 dollar, that is,´E[T (w (θ, η))

]dF (θ) = 1. Denote its e�ect on government revenue by R(T, T )

and its e�ect on social welfare by W(T, T ). Redistribute any tax revenue gain (or

levy any tax revenue loss) from this reform via the reform T ∗ described in the previous

paragraph, that is, a uniform lump-sum transfer that reduces government budget by

1 dollar. The tax reform

T + R(T, T )T ∗ ≡ T + R(T, T )−1∣∣∣R(T,−1)

∣∣∣is, by construction, budget-neutral. Its e�ect on social welfare is given by

W(T, T ) + R(T, T )W(T, T ∗)∣∣∣R(T,−1)

∣∣∣ = W(T, T ) + λR(T, T ).

Of course, the marginal value of public funds λ is the Lagrange multiplier on the

government budget constraint. Now, the optimal tax schedule is such that every

budget-neutral tax reform has a zero e�ect on social welfare (see, for instance, Luen-

berger (1997)), that is, for all T ,

W(T, T ) + λR(T, T ) = 0.

But recall that 1λW(T, T ) = WG(T, T ) and R(T, T ) = 1− EB(T, T ), by de�nition of

the welfare gains and the excess burden. This immediately implies the characteriza-

tion (27).

68

E Proofs of Section 4

Proof of Corollary 1. Suppose that the tax schedule is CRP, so that R (w) =1−τ1−pw

1−p. Equation (48) then implies that in order to induce agents with ability

θ to choose the same e�ort a regardless of their noise realization η, the earnings

contract must satisfy:

log (w (θ, η)) =a

1ε

1− pη − 1

1− plog

(1− τ1− p

)+

k

1− p, (57)

for some k ∈ R. Thus, log-earnings are linear in the performance shock η = yθ− a

that the �rm infers upon observing realized output y. Imposing that the agent's

participation constraint holds with equality pins down the value of k as a function of

U (θ). Namely, equation (49) implies:

k = U (θ) +1

1 + 1ε

a1+ 1ε

and hence

log (w (θ, η)) =a

1ε

1− pη +

1

1− p1

1 + 1ε

a1+ 1ε − 1

1− plog

(1− τ1− p

)+U (θ)

1− p. (58)

Below we derive the equilibrium value of the reservation utility U (θ) and obtain the

equilibrium wage given (a, η):

log (w (θ, η)) = log (θa) +a

1ε

1− pη − 1

2

(a

1ε

1− p

)2

σ2η. (59)

De�ne the sensitivity of the before-tax and after-tax wages to output in the optimal

contract by the semi-elasticities ψ (θ, η) ≡ 1w(θ,η)

∂w(θ,η)∂η

and ψc (θ, η) ≡ 1R(w(θ,η))

∂R(w(θ,η))∂η

,

respectively. We have ψ (θ, η) = a1/ε

1−p and ψc (θ, η) = a1/ε. Both ψ (θ, η) and ψc (θ, η)

depend on the tax schedule through its e�ect on optimal e�ort, and there is an addi-

tional crowding-out e�ect on the before-tax sensitivity.

69

Next, since v′ (w) = r(w)R(w)

= 1−pw, the �rm's �rst-order condition reads

θ = E[

h′ (a)

v′ (w (θ, η))+

h′′ (a)

v′ (w (θ, η))η

]=

a1ε

1− pE [w (θ, η)] +

1

ε

a1ε−1

1− pE [w (θ, η) η] .

We have

E [w (θ, η)] = E[ea1ε

1−pη

]e

11−p

1

1+1εa1+

1ε− 1

1−p log( 1−τ1−p)+

U(θ)1−p

= e12

a2ε

(1−p)2σ2ηe

11−p

1

1+1εa1+

1ε− 1

1−p log( 1−τ1−p)+

U(θ)1−p .

where we used the fact that that η is normally distributed with mean 0 and variance

σ2η so that E [exη] = e

12x2σ2

η for any x. Moreover, we have E [ηexη] = xσ2e12x2σ2

η for

any x. Indeed, let ϕ the (normal) pdf of η. We have ϕ′ (η) = − ησ2ηϕ (η), so that

E [ηexη] =´ηexηϕ (η) dη = −σ2

η

´exηϕ′ (η) dη = xσ2

η

´exηϕ (η) dη = xσ2

ηe12x2σ2

η , where

the third equality follows from an integration by parts.

E [w (θ, η) η] = E[ηe

a1ε

1−pη

]e

11−p

1

1+1εa1+

1ε− 1

1−p log( 1−τ1−p)+

U(θ)1−p

=a

1ε

1− pσ2ηe

12

a2ε

(1−p)2σ2ηe

11−p

1

1+1εa1+

1ε− 1

1−p log( 1−τ1−p)+

U(θ)1−p .

Plugging these expressions into the �rm's �rst order condition leads to

θa =

[a1+ 1

ε

1− p+

1

ε

a2ε

(1− p)2σ2η

]e

12

a2ε

(1−p)2σ2ηe

11−p

1

1+1εa1+

1ε− 1

1−p log( 1−τ1−p)+

U(θ)1−p

and hence

a1+ 1ε

1− p+

1

ε

a2ε

(1− p)2σ2η = θae

− 11−p

1

1+1εa1+

1ε− 1

2a2ε

(1−p)2σ2η+ 1

1−p log( 1−τ1−p)−

U(θ)1−p

Now use the free-entry condition: equation (5) and the expression derived above for

70

E [w (y | θ)] leads to

e1

1−p1

1+1εa1+

1ε+ 1

2a2ε

(1−p)2σ2η− 1

1−p log( 1−τ1−p)+

U(θ)1−p = θa. (60)

Combining this equation with the �rst-order condition for optimal e�ort therefore

leads to:

a1+ 1ε +

1

ε

a2ε

1− pσ2η = 1− p. (61)

Using the de�nition ψ ≡ a1ε

1−p for the pass-through easily leads to (31). Note that if

ε = 1, we obtain optimal e�ort in closed form:

a =

(1

1− p+

σ2

(1− p)2

)−1/2

. (62)

Finally, taking logs in equation (60) and de�ning ψ ≡ a1ε

1−p easily leads to (32).

Proof of Corollary 2. Consider a tax reform that marginally raises the rate of pro-

gressivity p by a small amount δ → 0. The direction T of this tax reform satis�es(w − 1− τ

1− p− δw1−p−δ

)−(w − 1− τ

1− pw1−p

)= δT (w) + o (δ) .

This leads to the representation (34).

Di�erentiating equation (61) with respect to (1− p) leads to[(1 +

1

ε

)a

1ε +

2σ2η

(1− p) ε2a

2ε−1

]∂a

∂ (1− p)−

σ2η

(1− p)2 εa

2ε = 1,

and hence [(1 +

1

ε

)a

1ε

+1 +2σ2

η

(1− p) ε2a

2ε

]εa,1−p −

σ2η

(1− p) εa

2ε = 1− p.

Using the �rst-order condition again to substitute for 1− p leads to

εa,1−p =a

1ε

+1 +2σ2η

(1−p)εa2ε(

1 + 1ε

)a

1ε

+1 +2σ2η

(1−p)ε2a2ε

.

71

We conclude by expressing this elasticity in terms of the pass-through elasticities. We

have ψ = a1ε

1−p and εψ,a = 1ε. We can thus write

εa,1−p =a

1ε

+1 + 2 (1− p) εψ,aψ2σ2η(

1 + 1ε

)a

1ε

+1 + 2ε

(1− p) εψ,aψ2σ2η

.

But the �rst-order condition for labor e�ort reads

a1+1/ε = (1− p)(1− εψ,aψ2σ2

η

).

Substituting into the previous equation and rearranging terms leads to

εa,1−p =1 + εψ,aψ

2σ2η(

1 + 1ε

)+(

1ε− 1)εψ,aψ2σ2

η

.

This easily yields equation (35).

Expression (36) for wex (θ, η) is immediate since aa

= 11−pεa,1−p by de�nition. To

obtain the performance-pay e�ect (38), we show that for any (not necessarily CRP)

tax reform T ,

wpp (θ, η) = εψ,a(ψη − ψ2σ2

η

)wex (y | θ) . (63)

This equation implies that wpp (θ, η) has mean zero, but is dispersed around the mean

whenever εψ,a > 0, since the map η 7→ ψη− ψ2σ2η is strictly increasing. To prove this

equation, note that

h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))a (θ) =

R (w (θ, η))

r (w (θ, η))

(a1+ 1

ε +1

εa

1ε η

)a

a

=1

1− pw (θ, η)

[a1+ 1

ε + (1− p) εψ,aψη] aa

=1

1− pw (θ, η)

[(1− p)

(1− εψ,aψ2σ2

η

)+ (1− p) εψ,aψη

] aa

=[1 + εψ,a

(ψη − ψ2σ2

η

)]w (θ, η)

a

a,

where the third equality uses the �rst-order condition for labor e�ort.

72

Next, we compute the crowding-out e�ect wco (θ, η). We have

T (w (θ, η))

r (w (θ, η))=

1

(1− τ) (w (θ, η))−p

(log (w (θ, η))− 1

1− p

)1− τ1− p

(w (θ, η))1−p

=

(ln (θa) + ψη − 1

2ψ2σ2

η −1

1− p

)1

1− pw (θ, η) ,

and

U (θ)

v′ (w (θ, η))= −

1v′(w(θ,η))

E[

1v′(w(θ,·))

]E[ T (w (θ, ·))r (w (θ, ·))

]

= −1

1−pw (θ, η)

E[

11−pw (θ, ·)

]E [(log (w (θ, ·))− 1

1− p

)1

1− pw (θ, ·)

]

= − 1

1− pw (θ, η)

[ln (θa) +

1

2ψ2σ2

η −1

1− p

].

Summing these expressions yields equation (37).

Summing all the e�ects, we can easily verify that the incidence of the reform is

given by

∂ logw (θ, η)

∂ (1− p)= εa,1−p + (εψ,aεa,1−p + εψ,1−p)

(ψη − ψ2σ2

η

),

which is the expression as we would obtain by directly di�erentiating logw (θ, η) =

log (θa) + ψη − 12ψ2σ2

η.

Note that the earnings adjustment wi (θ, η) contributes to raising the sensitivity

of log-earnings to performance shocks (pass-through function) i�

∂

∂ηlog (w (θ, η) + δwi (θ, η))− ∂

∂ηlog (w (θ, η)) > 0.

For δ close enough to zero this inequality is equivalent to

∂w (θ, η)

∂η> wi (θ, η)

∂ log (w (θ, η))

∂η.

73

The expressions derived above imply

∂wpp (θ, η)

∂η=

wpp (θ, η)

wex (θ, η)

∂wex (θ, η)

∂η+ wpp (θ, η)

ψ

ψη − ψ2σ2η

= wpp (θ, η)∂ log (w (θ, η))

∂η+ ψεψ,aw (θ, η)

a

a.

Thus, since a < 0, we obtain ∂wpp(θ,η)

∂η< wpp (y | θ) ∂ log(w(θ,η))

∂ηand the reform lowers

the sensitivity of log-earnings to output. Analogously, the crowding-out e�ect lowers

raises the sensitivity of log-earnings since εψ,1−p < 0.

Finally, we derive equation (40). We have

wco (θ, η)

wpp (θ, η)=

− 11−pεψ,1−p

(ψη − ψ2σ2

η

)w (θ, η)

− 11−pεψ,aεa,1−p

(ψη − ψ2σ2

η

)w (θ, η)

=εψ,1−p

εψ,aεa,1−p= − ε

εa,1−p

=o(ψ2σ2

η)− (1 + ε)

(1 +

1− ε1 + ε

1

εψ2σ2

η −1

εψ2σ2

η

)= 2ψ2σ2

η − ε− 1.

where the second to last equality uses equation (35).

Proof of Corollary 3. Suppose that Assumption 2 holds, and that ability types

are lognormally distributed, that is, log θ ∼ N (µθ, σ2θ). We substitute formula (34)

for the tax reform T in equations (28) and (29) to compute each term of the excess

burden and the welfare gains of marginally raising progressivity. The algebra is

straightforward but tedious. It is available upon request and we only summarize our

results here. The mechanical e�ect of the progressive tax reform is equal to

ˆΘ

E[T (w (θ, η))

]dF (θ) =

[µθ + ln a− 1

1− p+ (1− p)σ2

θ +

(1

2− p)ψ2σ2

]C

where C denotes aggregate consumption and is given by

C ≡ˆ

Θ

E [R (w (θ, η))] dF (θ) =1− τ1− p

a1−pe(1−p)µθ+ 12

(1−p)2σ2θe−

12p(1−p)ψ2σ2

η .

74

Let Y denotes aggregate output, given by

Y ≡ˆ

Θ

(θa) dF (θ) = aeµθ+σ2θ2 .

Note that in our baseline economy with government expenditures G, we have Y =

C +G. The excess burden of the tax reform in the model with exogenous risk (that

is, the �rst integral in (28)) is given by

ˆΘ

E[T ′ (w (θ, η))w (θ, η)

a (θ)

a (θ)

]dF (θ) = − 1

1− p(Y − (1− p)C) εa,1−p.

The �scal externality due to the �rst component of crowding-out is given b

ˆΘ

Cov

(T ′ (w (θ, η)) ,

T (w (θ, η))

r (w (θ, η))

)dF (θ)

=(epψ

2σ2η − 1

)[µθ + ln a− 1

1− p+ (1− p)σ2

θ

]C +

(epψ

2σ2η − 1 + 2p

) 1

2ψ2σ2

ηC.

The �scal externality due to the second element of crowding-out is given by

ˆΘ

Cov

(T ′ (w (θ, η)) ,

U (θ)

v′ (w (θ, η))

)dF (θ)

= −(epψ

2σ2η − 1

)[µθ + ln a− 1

1− p+ (1− p)σ2

θ

]C −

(epψ

2σ2η − 1

) 1

2ψ2σ2

ηC.

Thus the total �scal externality from the crowding-out e�ect is equal to pψ2σ2ηC. The

�scal externality due to the performance-pay e�ect is given by

ˆΘ

Cov

(T ′ (w (θ, η)) ,

h′ (a (θ)) + h′′ (a (θ)) η

v′ (w (θ, η))a (θ)− wex (θ, η)

)dF (θ)

= −p1

εψ2σ2

ηεa,1−pC.

Suppose that the social welfare weights are given by α (θ) = e−α log θ´e−α log θ′dF (θ′)

for all θ,

75

for some α ≥ 0. Then the e�ect of the tax reform on social welfare is given by

−ˆ

Θ

E

[(v′ (w (θ, η)))−1

E[(v′ (w (θ, ·)))−1]α (θ)u′ (R (w (θ, η))) T (w (θ, η))

]dF (θ)

= −µθ − ln a+1

1− p− 1

2ψ2σ2

η + ασ2θ .

Third, we compute the marginal value of public funds λ in the loglinear model,

when the tax code is restricted to the CRP class. To do so, �rst consider a reform of

the parameter τ , represented by formula (64). By de�nition, the parameter lambda

is the e�ect on social welfare caused by a tax reform in this direction, normalized to

raise government revenue by 1 dollar. The mechanical e�ect of the (non-normalized)

reform (64) is equal to

ˆΘ

E[T (w (θ, η))

]dF (θ) =

C

1− τ.

Since the elasticity of labor e�ort is εa,1−τ = 0, the standard excess burden and the

�scal externality caused by the performance-pay e�ect are both equal to zero,

ˆΘ

E [T ′ (w (θ, η)) wex (θ, η)] dF (θ)

=

ˆΘ

Cov (T ′ (w (θ, η)) , wpp (θ, η)) dF (θ) = 0.

The �scal externalities caused by the two elements of crowding-out are given by

ˆΘ

Cov

(T ′ (w (θ, η)) ,

T (w (θ, η))

r (w (θ, η))

)dF (θ)

= −ˆ

Θ

Cov

(T ′ (w (θ, η)) ,

U (θ)

v′ (w (θ, η))

)dF (θ) =

(epψ

2σ2η − 1

) C

1− τ.

The welfare e�ect of the tax reform is given by

−ˆ

Θ

E

[(v′ (w (θ, η)))−1

E[(v′ (w (θ, ·)))−1]α (θ)u′ (R (w (θ, η))) T (w (θ, η))

]dF (θ) = − 1

1− τ.

Now, normalize the tax reform of the tax rate τ so that it delivers $1 of revenue.

76

Since the sum of all the �scal externalities is zero, the increase in government revenue

of the tax reform T = 11−pw

1−p is simply of mechanical e�ect C1−τ . Thus, we consider

the normalized tax reform

T ∗ =1− τC× T =

1

C× 1− τ

1− p(w (θ, η))1−p .

The welfare impact of distributing an additional dollar of tax revenue via a reduction

of the parameter τ is therefore equal to the welfare e�ect of the reform −T ∗, whichis equal to

λ =1− τC× 1

1− τ=

1

C.

This gives the marginal value of public funds λ in this setting and concludes the

proof.

Proof of Proposition 4. We give two proofs of this result. First, we apply formula

(27) using the explicit expressions of each term derived in the proof of Proposition 3

above. We must have, letting α (θ) = 1 for all θ,

0 =

[µθ + ln a− 1

1− p+ (1− p)σ2

θ +

(1

2− p)ψ2σ2

η

]C

−[µθ + ln a− 1

1− p+

1

2ψ2σ2

η

]C

− 1

1− p(Y − (1− p)C) εa,1−p + pψ2σ2

ηC −[p

1

εψ2σ2

η

]εa,1−pC

=

[(1− p)σ2

θ +

(− p

1− pY

C− p1

εψ2σ2

η

)εa,1−p

]C.

Since government expenditures are equal to G, we have

Y

C=

Y

Y −G=

1

1− g

77

where g ≡ G/Y . We thus obtain

σ2θ

εa,1−p=

1

(1− p)2

(1

1− g− (1− p)

)εa,1−p +

p

1− p1

εψ2σ2

η

=p

(1− p)2

[1 +

g

(1− g) p+ (1− p) 1

εψ2σ2

η

],

which easily yields the result.

The second proof consists of directly calculating the optimal rate of progressivity

in the loglinear model by equating to zero the derivative of social welfare in this

environment. To do so, recall that the earnings schedule of agents with ability θ can

be written as

log (w (θ, η)) = log (θa) + ψη − 1

2(ψση)

2

and their expected utility as

U (θ) = log

(1− τ1− p

)+ (1− p) log (θa)− 1

2(1− p) (ψση)

2 − h (a) .

Utilitarian social welfare is therefore equal to

ˆΘ

U (θ) dF (θ) = (1− p)µθ + (1− p) log a− (1− p)ψ2σ2

η

2− h (a) + log

(1− τ1− p

).

The �rst-order condition for e�ort, taking tax rates as given, reads

0 =∂U (θ)

∂a= (1− p) 1

a− (1− p)ψσ2

η

∂ψ

∂a− h′ (a) .

Now recall that expected pre-tax and post-tax earnings are respectively given by

E [w (θ, η)] = θa and E[(w (θ, η))1−p] = (θa)1−p e−pa

2ε σ2η

2(1−p) , so that government revenue

is equal to

ˆΘ

E [R (w (θ, η))] f (θ) dθ = aeµθ+σ2θ2 − 1− τ

1− pe−

pa2ε σ2η

2(1−p) a1−pe(1−p)µθ+(1−p)2 σ2θ2 .

78

Budget balance thus requires

1− τ1− p

=aeµθ+

σ2θ2 −G

e−pa

2ε σ2η

2(1−p) a1−pe(1−p)µθ+(1−p)2σ2θ2

=(1− g) aeµθ+

σ2θ2

e−pa

2ε σ2η

2(1−p) a1−pe(1−p)µθ+(1−p)2σ2θ2

.

As a result, maximizing with respect to 1− p leads to:

0 = µθ + log a+ (1− p) 1

a

∂a

∂ (1− p)− h′ (a)

∂a

∂ (1− p)−ψ2σ2

η

2

− (1− p)ψσ2η

[∂ψ

∂ (1− p)+∂ψ

∂a

∂a

∂ (1− p)

]+∂ log

(1−τ1−p

)∂ (1− p)

,

with

∂ log(

1−τ1−p

)∂ (1− p)

=g

1− g∂ log a

∂ (1− p)− µθ − (1− p)σ2

θ − log a+ p1

a

∂a

∂ (1− p)

−(

1

2− p)ψ2σ2

η + p (1− p)ψσ2η

[∂ψ

∂ (1− p)+∂ψ

∂a

∂a

∂ (1− p)

].

We therefore obtain

0 =

[(1− p) 1

a− h′ (a)− (1− p)ψσ2

η

∂ψ

∂a

]∂a

∂ (1− p)+ p

1

a

∂a

∂ (1− p)+

g

1− g∂ log a

∂ (1− p)

− (1− p)σ2θ − (1− p)ψ2σ2

η − (1− p)2 ψσ2η

∂ψ

∂ (1− p)+ p (1− p)ψσ2

η

∂ψ

∂a

∂a

∂ (1− p).

Using the �rst-order condition for e�ort leads to

0 =1

1− p

[p+

g

1− g

]εa,1−p + pψ2σ2

ηεψ,aεa,1−p

− (1− p)[σ2θ + ψ2σ2

η

]− (1− p)ψ2σ2

ηεψ,1−p.

Rearranging this equation leads to the result.

Further Examples of Tax Reforms. We now study the incidence of additional

examples of tax reforms.

Lump-Sum Tax Increase on High Incomes. We focus on the top earners, whose

incomes are located in the highest bracket characterized by a constant tax rate τtop

79

and an income threshold wtop. Thus, the baseline tax schedule T is locally a�ne,

that is, T (w) = T (wtop) + τtop (w − wtop) for all w > wtop. Moreover, we assume

that w (y | θ) > wtop for all y. A uniform lump-sum increase of the income tax

liabilities of these agents is represented by the tax reform T (w) = 1 for all w > wtop.

(Equivalently, T can be any positive constant.) Indeed, the perturbed tax schedule is

then given by T + δT = T + δ. Thus, the tax function is shifted up by the constant

δ < 0. The value of the Gateaux derivative Ψ(T, T ) then gives the �rst-order e�ect

of this lump-sum transfer on the functional Ψ as its size δ becomes small.

Applying formulas (15) and (16) leads to the following results. The earnings

adjustment caused by crowding-out is equal to

wco (θ, η) =1

1− τ

(1− (v′ (w (θ, η)))−1

E[(v′ (w (θ, ·)))−1]

).

Intuitively, a uniform lump-sum tax increase can be fully absorbed by the �rm via a

counteracting lump-sum increase in earnings, without any change in private insurance.

Thus, the �rst element of crowding-out is constant. On the other hand, the second

element of crowding-out is decreasing in the performance shock. This is because

the tax increase reduces expected utility U (θ). To preserve incentive compatibility,

this is achieved by reducing pre-tax earnings by larger amounts for higher-income

workers, since their marginal utility is smaller. As a result, ∂∂ηwco (θ, η) < 0, so that

the crowding-out e�ect reduces the sensitivity of pre-tax earnings to performance.

On the other hand, the sum of the standard labor supply e�ect wex (θ, η) and the

performance-pay e�ect wpp (θ, η) is increasing, so that the labor supply responses

raise the performance sensitivity of the contract. Indeed, a lump-sum tax increase

creates a pure income e�ect and hence raises optimal e�ort, that is, a (θ) > 0. Im-

plementing this higher e�ort level requires an increase in the sensitivity of earnings

to performance shocks. Overall, a uniform tax increase can lead to either a spread or

a contraction in the pre-tax earnings schedule, depending on the size of the income

e�ect for high-income earners.

Marginal Tax Rate Increase on High Incomes. Focusing again on the highest

income tax bracket, an increase in the top marginal tax rate is represented by the tax

reform T (w) = w − wtop for all w > wtop. Indeed, the perturbed tax payments are

then given by T (w) + δT (w) = T (w) + δ (w − wtop), and the marginal tax rates are

80

perturbed by the constant δT ′ (w) = δ for w ≥ wtop. We obtain the following results.

Assuming that the utility function is logarithmic, we �nd that the crowding-out e�ect

is equal to

wco (θ, η) =1

(1− τ) Π(w (θ, η)− E [w (θ, η)]) ,

where Π = E [w (θ, η)] /wtop. Thus, ∂∂ηwco (θ, η) > 0, so that the crowding-out e�ect

raises the sensitivity of pre-tax earnings to performance shocks. Note that if the base-

line tax code is linear and the tax rates increase uniformly for the whole population,

the crowding-out e�ect is equal to zero. On the other hand, the sum of the standard

labor supply e�ect wex (θ, η) and the performance-pay e�ect wpp (θ, η) is increasing if

the reform reduces optimal e�ort, a (θ) < 0, which raises the performance sensitivity

of the contract. The larger the average uncompensated elasticity of labor supply, the

stronger the performance-pay e�ect relative to the crowding-out e�ect, the more an

increase in top tax rates reduces the dispersion of earnings at the top. Overall, the

e�ect of this reform on the earnings distribution is ambiguous.

Proportional Decrease in Retention Rates. Suppose that Assumption 2 holds.

Consider a tax reform that raises the parameter τ of the CRP tax schedule by a small

amount δ. The direction T of this tax reform is such that(w − 1− τ − δ

1− pw1−p

)−(w − 1− τ

1− pw1−p

)= δT (w) + o (δ) .

This easily implies that this tax reform T is de�ned by

T (w) =1

1− pw1−p, ∀w > 0. (64)

Note that this reform changes the retention rates r (w), in percentage terms, by

a negative constant: r(w)r(w)

= − T ′(w)1−T ′(w)

= − 1(1−τ)(1−p) . Therefore, it amounts to a

proportional reduction in retention rates.

Applying formula (14) to the tax reform (64) yields the following results. The two

81

components of crowding-out are equal to

T (w (θ, η))

r (w (θ, η))=

(v′ (w (θ, η)))−1

E[(v′ (w (θ, ·)))−1] E

[T (w (θ, ·))r (w (θ, ·))

]=

w (θ, η)

(1− τ) (1− p).

That is, in response to a tax reform that reduces all retention rates by the same

percent amount (and hence raises marginal tax rates), the �rm �rst increases all

workers' salaries in proportion to their initial earnings in order to counteract their

net income losses and keep their incentives unchanged. But the reform also reduces

rents and hence leads �rms to reduce salaries also in proportion to the workers' initial

earnings. Indeed, since the utility function is logarithmic, this ensures that all agents'

utilities decrease by the same amount. Therefore, we obtain that the total crowding

out of private insurance by the reform, wco (θ, η), is equal to zero.

Now, the standard labor supply e�ect wex (θ, η) and the performance-pay e�ect

wpp (θ, η) are both also equal to zero because, by equation (31), the optimal e�ort

level depends only on the rate of progressivity and not on the tax parameter τ .

Intuitively, e�ort remains constant because the utility function is logarithmic, so that

the substitution and income e�ects cancel out. As a result, the earnings schedule is

completely una�ected by the reform.

F Proofs of Section 5

In this section we derive the optimal progressivity formula in the quantitative model.

The e�ort and the expected utility (conditional on ability θ) of a normal worker are

an = (1− p)ε

1+ε

and

Un(θ) = log

(1− τ1− p

)+ (1− p) log(θan)− h(an).

The e�ort and the expected utility (conditional on ability θ) of a performance-pay

worker are

am =

[(1− p)

(1− 1

εψ2σ2

η,m

)] ε1+ε

82

and

Um(θ) = log

(1− τ1− p

)+ (1− p) log(θam)− h(am)−

ψ2σ2η,m

2,

where the endogenous pass-through ψ is equal a1/εm

1−p . The Utilitarian social welfare

function is

W = π

ˆUm(θ)dFm(θ) + (1− π)

ˆUn(θ)dFn(θ) (65)

= log

(1− τ1− p

)+ (1− p)

(πµθ,m + (1− π)µθ,n +

1

λθ

)+π

((1− p) log(am)− h(am)− (1− p)

ψ2σ2η,m

2

)+(1− π) ((1− p) log(an)− h(an))

where the distribution of productivities at performance-pay jobs Fm(θ) is Pareto-

lognormal with parameters (µθ,m, σ2θ , λθ), and the distribution of productivities at

normal jobs Fn(θ) is Pareto-lognormal with parameters (µθ,n, σ2θ , λθ). We derive the

optimality condition by di�erentiating W with respect to 1−p, applying the envelopetheorem and equating the derivative to zero:

0 =∂W

∂ (1− p)=

∂ log(

1−τ1−p

)∂1− p

+ πµθ,m + (1− π)µθ,n +1

λθ

+π log(am) + (1− π) log(an)− π(

1

2+ εψ,1−p

)ψ2σ2

η,m.

For each rate of progressivity, the other tax parameter τ is chosen to balance the

government budget subject to �xed government spending G. Therefore, the resource

constraint reads Y = C +G, where the aggregate output Y is given by

Y =λθ

λθ − 1

(πeµθ,m+

σ2θ2 am + (1− π)eµθ,n+

σ2θ2 an

)

83

and the aggregate consumption C is given by

C =1− τ1− p

λθλθ − (1− p)

×(πe(1−p)µθ,m+(1−p)2 σ

2θ2 e−p(1−p)

(ψση,m)2

2 a1−pm + (1− π)e(1−p)µθ,n+(1−p)2 σ

2θ2 a1−p

n

).

Using the resource constraint, we have

1− τ1− p

=1− τ1− p

Y

C

(1− G

Y

).

Plug in the expression for Y and C and di�erentiate with respect to 1− p to obtain

∂ log(

1−τ1−p

)∂ (1− p)

= − 1

λθ − (1− p)+

1

1− pεY,1−p

g

1− g+

1

1− pεY,1−p

−cm(µθ,m + (1− p)σ2

θ + εam,1−p + log(am))

−cm(

1− p− 1

2− pεψ,1−p − pεam,1−pεψ,am

)ψ2σ2

η,m

−cn(µθ,n + (1− p)σ2

θ + εan,1−p + log(an))

where εY,1−p ≡ ymεam,1−p + ynεan,1−p is the elasticity of the aggregate income with

respect to 1 − p, g = GY

is the share of government spending in output and cj is

the share of jobs of type j ∈ {m,n} in the aggregate consumption. Plugging this

expression into the optimality condition and rearranging yields the �nal optimality

condition(p+

g

1− g

)εY,1−p1− p

+ (ym − cm) (εam,1−p − εan,1−p) + πpεam,1−pεψ,amψ2σ2

η,m

=1

λθ

1− pλθ − (1− p)

+ (1− p)σ2θ + π(1− p)(1 + εψ,1−p)ψ

2σ2η,m

−(π − cm) (µθ,m + log (am)− µθ,n − log (an))

−(π − cm)

(1

2− p (1 + εψ,1−p + εam,1−pεψ,am)

)ψ2σ2

η,m

where yj is the share of jobs of type j ∈ {m,n} in the aggregate output. The above

formula collapses to the formulas with only performance pay jobs when π = 1 and the

standard progressivity formula when π = 0 (in both cases the blue terms disappear).

84

The �rst term is the standard deadweight loss from rising the progressivity rate ad-

justed by the government spending, evaluated using the elasticity of aggregate income

εY,1−p ≡ ymεam,1−p+ynεan,1−p. The second term is a correction to the deadweight loss

due to both di�erences in elasticities between job types and discrepancies between

income and consumption shares of job types. The intuition is as follows. Note that

performance-pay workers, who are more elastic, have lower consumption share than

income share when p > 0 due to higher wage-rate risk. Suppose we increase progres-

sivity. Performance-pay workers will reduce their e�ort, which decreases both their

income (negative e�ect on available resources) and their consumption (positive e�ect

on available resources). Since their income share is higher than their consumption

share, aggregating both e�ects across all performance-pay workers leads to a negative

e�ect on available resources. The opposite is true for normal workers: their con-

sumption share is greater than income share, which means that on aggregate, there

are more available resources due to their responses. However, since performance-

pay workers are more elastic than normal workers, the former e�ect dominates and

increasing progressivity leads to an additional deadweight loss.

The next four terms are standard: the performance-pay e�ect, the redistribution

gains due to the the Pareto tail and due to the normal variance of productivities, and

the gain from insuring endogenous wage-rate risk at the performance-pay jobs net of

the crowd-out.

The last two terms are novel. They are present because the consumption share

of performance pay workers cm is potentially di�erent from their population share π.

These terms correspond to various ways in which resources are redistributed between

job types. The �rst of the two terms stands for the gain from insuring the �job type�

risk, that is, the risk of having a performance-pay job vs. a normal job. Recall that

this risk is exogenous. This term contributes to higher progressivity whenever there

is a di�erence in mean consumption at the two job types.

The last term is related to the endogenous wage-rate risk. When taxes are pro-

gressive, higher wage-rate risk of performance-pay workers lowers their consumption

relative to normal workers. Consequently, changes in progressivity as well as endoge-

nous adjustments of the wage-rate risk will result in the transfers of resources across

job types via the government budget constraint. Suppose that the term in the big

brackets is positive, which is likely if p is not very high.29 Then, when progressiv-

29Plugging in the values of elasticities, the term in the brackets becomes 12 − p

εam,1−pε . Since

85

ity increases, performance pay workers end up contributing more to the government

budget. This implies an additional redistribution from the performance-pay workers

to the normal workers. This e�ect contributes to higher (lower) progressivity if the

consumption share of performance-pay workers is higher (lower) than their population

share.

G Dynamic Model

In this section we extend our results to a dynamic model of the labor market. In

our setting, individuals live for several periods and sign a long-term labor contract

with a �rm. We use the moral hazard model of Edmans, Gabaix, Sadzik, and San-

nikov (2012) who extended Edmans and Gabaix (2011) to the multi-period environ-

ment. Workers' earnings can depend in an arbitrary way on their history of output

realizations. The government levies a labor income tax in each period and has a

redistributive social welfare objective.

G.1 Environment

Individuals are indexed by their exogenous and constant labor productivity θ ∈ Θ.

They live for S ≥ 2 periods, have time-separable preferences over consumption ct

and e�ort at with discount factor β ∈ (0, 1). Throughout this section, we denote the

history of a random variable x up to time t ≤ S by xt ≡ {xs}1≤s≤t and let x0 = ∅.Flow output at time t is given by:

yt = θ (at + ηt) , (66)

where {ηt}1≤t≤S are independent and identically distributed random variables. Through-

out the analysis we assume that the utility of consumption is logarithmic with isoe-

lastic disutility of labor, productivity θ is lognormally distributed with mean µθ and

variance σ2θ , and the performance shocks ηt are normally distributed with mean 0

and variance σ2η. As in Section 1, we assume that the agent chooses period-t e�ort at

after observing the realization of ηt. Therefore, the agent's strategy can be a function

at (ηt) of her history (including the current-period) of performance shocks.

εam,1−p >ε

1+ε , this term is positive if p < 1+ε2 .

86

Firms discount future pro�ts at rate r. For simplicity we assume that β (1 + r) =

1. In each period t they observe the agent's productivity θ and her output history yt

up to that date, but not her e�ort levels at or performance shocks ηt. A labor contract

speci�es an e�ort level at (θ) in each period t, and an earnings function wt (θ, ηt) that

depends on the inferred history of performance shocks (given the recommended e�ort

levels) up to and including time t.

Finally, in each period, the government levies an income tax. We suppose that

the tax schedule has a constant and history-independent rate of progressivity p, so

that for all t ∈ {1, . . . , S}, the retention function in period t is given by

Rt (w) =1− τt1− p

w1−p.

The parameter τt ensures that the government balances its budget in each period.

Finally, we rule out private savings so that an agent with earnings wt in period t

consumes ct = Rt (wt).

G.2 Equilibrium Labor Contract

We start by setting up the contracting problem between the �rm and a worker with

productivity θ. The operator Et denotes the expectation over all future performance

shock realizations {ηs}t+1≤s≤S conditional on ex-ante productivity θ and output his-

tory ηt.

Firm's problem. The �rm's maximizes its expected pro�t

Π (θ) = max{at(θ),wt(θ,ηt)}1≤t≤S

E0

[S∑t=1

(1

1 + r

)t−1 (yt − wt

(θ, ηt

))], (67)

subject to the incentive constraint which requires that for any alternative e�ort strat-

egy {at (ηt)}1≤t≤S,

E1

[S∑t=1

βt−1(u(Rt

(wt(θ, ηt

)))− h

(at(ηt)))]

(68)

≤ E1

[S∑t=1

βt−1(u(Rt

(wt(θ, ηt

)))− h (at (θ))

)],

87

and the participation constraint:

E0

[S∑t=1

βt−1(u(R(wt(yt | θ

)))− h (at (θ))

)]≥ U (θ) , (69)

where yt is given by (66) and U (θ) is the reservation value of workers with productivity

θ.

Equilibrium. We assume that there is free entry of �rms, so that in equilibrium

pro�ts are equal to zero:

Π (θ) = 0. (70)

This equation pins down the workers' reservation value U (θ).

Optimal Contract. We characterize the equilibrium contract in two steps: we �rst

study its intertemporal, then its intratemporal, properties.

Lemma 3. The earnings process wt (θ, ηt) is a martingale. That is, expected period-t

earnings are equal to realized period-(t− 1) earnings,

Et−1

[wt(θ, ηt−1, ηt

)| ηt]

= wt−1

(θ, ηt−1

). (71)

Proof. See Appendix H.

This lemma characterizes the intertemporal properties of the optimal contract

between the �rm and the worker. It is well known that the solution to dynamic

contracting models under separable utility satis�es the Inverse Euler Equation (see,

e.g., Rogerson (1985); Golosov, Kocherlakota, and Tsyvinski (2003)). Intuitively, the

�rm incurs a convex cost of providing e�ort incentives � giving xt utils in period t

requires paying a before-tax salary R−1t (u−1 (xt)), where the cost function R

−1t ◦u−1 ≡

Ct is convex. As a result, the optimal contract smooths out the cost of providing

incentives over time, which requires Et [C ′t (xt+1)] = C ′t (xt). Under the assumptions

that the utility function u is logarithmic and the retention function Rt is CRP, this

equation can be rewritten as (71).

88

Proposition 5. Assume that e�ort is positive in each period, or that h′ (0) = 0.

De�ne the present value of e�ort by A ≡∑S

s=1

(1

1+r

)s−1as, and the sequences of

sensitivity and pass-through parameters {δt, ψt}1≤t≤S by

δt =1∑S−ts=0 β

s, and ψt = δt

h′ (at)

1− p.

The earnings schedule satis�es

log(wt(θ, ηt

))= log

(wt−1

(θ, ηt−1

))+ ψtηt −

1

2ψ2t σ

2η, (72)

where initial earnings are given by w0 ≡ δ1θA. The optimal period-t e�ort level at is

independent of θ and satis�es

at =

[(1− p)

(atδ1A− 1

δtεψt,atψ

2t σ

2η

)] ε1+ε

, (73)

where εψt,at = 1εis the elasticity of the pass-through parameter ψt with respect to e�ort

at. Expected utility is given by

U (θ) =S∑t=1

βt−1

[log (R (δ1θA))− h (at)−

1

2δtψ2t σ

2η

]. (74)


This proposition generalizes Corollary 1 to the dynamic setting and allows us

to characterize the intratemporal properties of the optimal compensation contract.

Equation (72) implies that earnings in each period t are a log-linear function of the

performance shock ηt in that period. The pass-through of performance shocks ηt to

log-earnings, ψt = ∂ logwt (θ, ·) /∂ηt, is increasing in the rate of progressivity p and

the optimal e�ort level at at time t. Note �nally that the pass-throughs have the same

form as in the static model � we therefore expect the insight that the performance-pay

e�ect counteracts and o�sets a large share of the crowding-out e�ect to carry over to

the dynamic environment.

Formally, up to the optimal value of e�ort, the pass-through ψS in the terminal

period S is the same as in the optimal static contract (see equation (30)) since δS = 1.

In earlier periods, on the other hand, the exposure to risk for a given e�ort level is

89

strictly smaller than in the static environment as the sensitivity parameters δt satisfy

δt < 1 for all t ≤ S − 1. To understand the intuition for this result, note that

equation (72) implies that an increase in the output realization yt � either due to

e�ort or to random shocks � boosts log-earnings in the current and future periods

equally. Indeed, since the agent is risk-averse it is e�cient to spread the rewards over

her entire horizon. In other words, a given increase in lifetime utility necessary to

elicit higher e�ort requires a higher increase in �ow utility if there are fewer remaining

periods over which to smooth these bene�ts. As a result, the sequence {δt}1≤t≤S is

strictly increasing and the degree of performance-pay gets stronger over time.

G.3 Optimal Tax Progressivity

We �nally characterize the optimal history-independent rate of progressivity p in the

dynamic environment. The government chooses p to maximize a utilitarian social

objective´

ΘU (θ) dF (θ) subject to period-by-period budget balance constraint that´

ΘRt (wt (θ, ηt)) dF (θ) ≥ 0.

Proposition 6. The optimal rate of progressivity is given by

p∗

(1− p∗)2 =σ2θ

εA,1−p + (1− p)∑S

s=1 βs−1 δ1

δsεψs,asεas,1−pψ

2sσ

2η

, (75)

where εA,1−p is the elasticity of the present discounted value of e�ort A with respect

to progressivity, and εψs,as = 1ε.


To compare the optimum rate of progressivity (75) to its static counterpart (43),

�rst consider the benchmark environment with exogenous wage risk. That is, the

planner observes ex-ante earnings heterogeneity due to productivity shocks θ, and

ex-post heterogeneity due to performance shocks ηt passed through to earnings. In

particular, it observes that the degree of performance-pay rises with age, as described

in Proposition 5. However, it mistakenly believes that wage rates, and hence ψt, are

exogenous. That is, it assumes that εψs,as = εas,1−p = 0 for all s ≥ 1. In this case,

the dynamic optimal tax formula (75) is identical to the static formula (43), except

that the relevant labor supply elasticity is now the elasticity of the present-value of

e�ort, εA,1−p.

90

Now consider the general model with ex-post earnings dispersion and endogenous

wage risk, captured by the non-zero elasticities εψs,as and εas,1−p. As in the static

model, the �scal externality and welfare e�ect induced by the crowding-out e�ect

cancel each other out. The adjustment to the optimal rate of progressivity is given

by the second term in the denominator of (75). Recall that this term accounts for

the negative �scal externality due to the performance-pay e�ect: a higher rate of

progressivity reduces e�ort in period s, hence reduces the dispersion of earnings which

in turn (by Jensen's inequality) negatively a�ects government revenue. This term

resembles the present value of the corresponding terms in the static model, with one

di�erence. Namely, the relevant discount factor is not βs−1 but βs−1 δ1δs. Since δs

is increasing over time, this implies that the �scal externalities caused by the future

performance-pay e�ects are discounted at a higher rate than the standard deadweight

losses from distorting e�ort.

H Proofs of Section G

Proof of Lemma 3. Starting from an incentive compatible allocation, consider the

following variations in retained wage/utility:

ut−1 = u(R(wt−1

(θ, ηt−1

)))− β∆

ut = u(R(wt(θ, ηt−1, ηt

)))+ ∆

and us = u (R (ws (θ, ηs))) for all s /∈ {t− 1, t}. These perturbations preserve utilityand incentive compatibility since for all at−1

ut−1 − h (at−1) + βE[ut|ηt−1

]= u

(R(wt−1

(θ, ηt−1

)))− h (at−1) + βE

[u(R(wt(θ, ηt−1, ηt

)))|ηt−1

].

The optimal allocation must be una�ected by such deviations, so that

0 = arg min∆

E

[S∑s=1

(1 + r)−t (ys −W (us))

]

91

Where W = (u ◦R)−1. The associated �rst-order condition evaluated at ∆ = 0 reads

W ′ (u (R (wt−1

(θ, ηt−1

))))=

1

β (1 + r)E[W ′ (u ((R (wt (θ, ηt−1, ηt

))))|yt−1

]that is,

E[

1

v′ (wt (θ, ηt−1, ηt))| yt−1

]= β (1 + r)

1

v′ (wt−1 (θ, ηt−1)).

The inverse Euler equation (see Golosov, Kocherlakota, and Tsyvinski (2003)) holds in

our setting. With log utility and a CRP tax schedule, this equation can be rewritten

as

(1− p)E[wt(θ, ηt−1, ηt

)| yt−1

]= (1− p) β (1 + r)wt−1

(θ, ηt−1

),

which leads to equation (71) as β (1 + r) = 1.

Proof of Proposition 5. We provide a heuristic proof of this proposition, and the

formal argument follows the same steps as in Edmans, Gabaix, Sadzik, and Sannikov

(2012). Assume that a unique level of e�ort is implemented at each time t, that these

e�ort levels are independent of previous output noise, and that local incentive con-

straints are su�cient conditions. Consider �rst the incentive compatibility constraint

which ensures that the worker does not wish to choose a di�erent level of e�ort than

the one recommended by the �rm. Consider a local deviation in e�ort at after history

(ηt−1, ηt). The e�ect of such a deviation on the worker's lifetime utility U should be

zero,

Et−1

[∂U

∂yt

∂yt∂at

+∂U

∂at

]= 0.

Since ∂yt∂at

= θ, we obtain

Et−1

[∂U

∂yt

]= −1

θ

∂U

∂at(76)

Applying incentive compatibility for e�ort in the �nal period we obtain:

r(wS(θ, ηS

))u′(R(wS(θ, ηS

))) ∂w (θ, ηS−1, ηS)

∂ηS= h′ (aS (θ)) .

92

Fixing ηS−1 and integrating this incentive constraint over ηS (meaning over realiza-

tions of ηS given a (θ)) leads to

u(R(wS(θ, ηS

)))= h′ (aS (θ)) ηS + zS−1

(ηS−1

)for some function of past output zS−1

(ηS−1

). This implies in particular that

∂u(R(wS(θ, ηS

)))∂ηS−1

=∂zS−1

(ηS−1

)∂ηS−1

.

Analogously, the incentive constraint for e�ort in the second to last period reads

r(wS−1

(θ, ηS−1

))u′(R(wS−1

(θ, ηS−1

))) ∂w (θ, ηS−1)

∂ηS−1

+βr(wS(θ, ηS

))u′(R(wS(θ, ηS

))) ∂wS (θ, ηS)∂ηS−1

= h′ (aS−1 (θ)) .

Integrating the previous expression over ηS−1 and using the previous equation implies

u(R(wS−1

(θ, ηS−1

)))+ βzS−1

(ηS−1

)= h′ (aS−1 (θ)) ηS−1 + zS−2

(ηS−2

).

We now want to show that zS−1(ηS−1

)is a linear function of ηS−1. Since the utility

function is logarithmic and the tax schedule is CRP, we obtain

(1− p) log(wS(θ, ηS

))= h′ (aS (θ)) ηS + zS−1

(ηS−1

)− log

(1− τS1− p

)and

(1− p) log(wS−1

(θ, ηS−1

))= h′ (aS−1 (θ)) ηS−1 − βzS−1

(ηS−1

)+ zS−2

(ηS−2

)− log

(1− τS−1

1− p

).

Now recall that the inverse Euler equation reads

ES−1

[wS(θ, ηS

)]= wS−1

(θ, ηS−1

).

93

Using the previous expressions, this equality can be rewritten as

ES−1

[e

11−ph

′(aS(θ))ηS]e

11−p z

S−1(ηS−1)

=

(1− τS

1− τS−1

) 11−p

e1

1−ph′(aS−1(θ))ηS−1e−β

11−p z

S−1(ηS−1)+ 11−p z

S−2(ηS−2).

This in turn implies

(1 + β) zS−1(ηS−1

)= h′ (aS−1 (θ)) ηS−1 + zS−2

(ηS−2

)− 1

2

(h′ (aS (θ)))2

1− pσ2η +

1

1− plog

(1− τS

1− τS−1

).

Therefore, zS−1(ηS−1

), and in turn u

(R(wS−1

(θ, ηS−1

))), is linear in ηS−1. More-

over, the last-period utility is linear in both ηS and ηS−1. By induction, we can show

that the utility in each period is a linear function of the performance shock in every

past period. Now suppose for simplicity of exposition that S = 2, β = 1, r = 0,

θ = 1, so that δ1 = 12and δ2 = 1. From the arguments above we guess a log-linear

speci�cation for earnings:

logw1 = ψ1η1 + k1

logw2 = ψ21η1 + ψ2η2 + k1 + k2.

The martingale property (71) requires w1 = E1 [w2], so that for all η1, eψ1η1+k1 =

eψ21η1+k1E[eψ2η2+k2 | η1

]. This requires ψ1 = ψ21 and e−k2 = E

[eψ2η2 | η1

]. Now, the

total utility of the agent is given by

U = (1− p) [2ψ1η1 + ψ2y2 + 2k1 + k2]

−h (a1)− h (a2) + log

(1− τ1

1− p

)+ log

(1− τ2

1− p

).

The incentive constraint for e�ort 76 implies

ψ1 =h′ (a1)

2 (1− p), and ψ2 =

h′ (a2)

1− p

94

and therefore

k2 = −h′ (a2)

1− p−σ2η

2

(h′ (a2)

1− p

)2

.

Replacing in the expression for log earnings leads to

logw1 = k′1 +h′ (a1)

2 (1− p)η1 −

σ2η

2

(h′ (a1)

2 (1− p)

)2

and

logw2 = k′1 +h′ (a1)

2 (1− p)η1 −

σ2η

2

(h′ (a1)

2 (1− p)

)2

+

(h′ (a2)

1− p

)η2 −

σ2η

2

(h′ (a2)

1− p

)2

,

where k′1 ≡ k1+ψ1a1−σ2η

2ψ2

1. This constant is pinned down by the zero pro�t condition

E [w1 + w2] = a1 + a2, that is, 2ek′1 = a1 + a2. This implies

k′1 = log

(a1 + a2

2

),

which concludes the proof of equation (72). Equations (73) and (74) are derived in

the next proof.

Proof of Proposition 6. Recall that the earnings schedule is given by

logw1 = log (δ1θA) + ψ1η1 −ψ2

1σ2η

2,

logwt = logwt−1 + ψtηt −ψ2t σ

2η

2.

The expected utility of workers with productivity θ is therefore equal to

U (θ) = (1− p)

[1

δ1

log (δ1θA)−S∑s=1

βs−1 1

δs

ψ2sσ

2η

2

]

−S∑s=1

βs−1h (as) +S∑s=1

βs−1 log

(1− τs1− p

),

95

from which (74) easily follows. Thus, utilitarian social welfare is

ˆΘ

U (θ) dF (θ) = (1− p)

[1

δ1

log (δ1A) +1

δ1

µθ −S∑s=1

βs−1 1

δs

ψ2sσ

2η

2

]

−S∑s=1

βs−1h (as) +S∑s=1

βs−1 log

(1− τs1− p

).

The �rst-order condition for optimal e�ort reads

0 =∂U (θ)

∂at= (1− p)

[1

δ1

1

A

∂A

∂at− βt−1 1

δtψtσ

2η

∂ψt∂at

]− βt−1h′ (at)

= (1− p)

[1

δ1

(1

1+r

)t−1at

A− βt−1 1

δtψ2t σ

2ηεψt,at

]1

at− βt−1h′ (at) ,

which easily implies equation (73). Now, the expected present value of pre-tax and

post-tax earnings in period t are given by E [wt] = δ1θA and

E[w1−pt

]= (δ1θA)1−p E

[e∑ts=1(1−p)ψsηs

]e−

∑ts=1(1−p)

ψ2sσ

2η

2 = (δ1θA)1−p e−p(1−p)∑ts=1

ψ2sσ

2η

2

respectively, so that expected government revenue in period t is equal to

ˆΘ

E [T (wt)] dF (θ)

= δ1Aeµθ+

σ2θ2 − 1− τt

1− p(δ1A)1−p e−p(1−p)

∑ts=1

ψ2sσ

2η

2 e(1−p)µθ+(1−p)2 σ2θ2 .

Imposing period-by-period budget balance therefore requires

1− τt1− p

=(δ1A)p eµθ+

σ2θ2

e−p(1−p)(∑ts=1 ψ

2s)

σ2η2 e(1−p)µθ+(1−p)2

σ2θ2

.

Substituting this expression into the social welfare function´

ΘU (θ) dF (θ) implies

96

that social welfare is equal to

1

δ1

[log (δ1A) + µθ +

(1− (1− p)2) σ2

θ

2

]−

S∑s=1

βs−1h (as)

+p (1− p)S∑s=1

βs−1

(s∑i=1

ψ2i σ

2η

2

)− (1− p)

S∑s=1

βs−1 1

δs

ψ2sσ

2η

2

=1

δ1

[log (δ1A) + µθ +

(1− (1− p)2) σ2

θ

2

]−

S∑s=1

βs−1h (as)

− (1− p)2S∑s=1

βs−1 1

δs

ψ2sσ

2η

2.

We can now maximize this expression with respect to 1− p to get

S∑s=1

[1

δ1

1

A

(1

1 + r

)s−1

− βs−1h′ (as)

]∂as

∂ (1− p)

− (1− p)S∑s=1

βs−1 1


2sσ

2η

= (1− p)

[1

δ1

σ2θ +

S∑s=1

βs−1 1

δsψ2sσ

2η

]+ (1− p)

S∑s=1

βs−1 1

δsεψs,1−pψ

2sσ

2η.

Using the �rst-order condition for e�ort derived above to simplify the left hand side

of this expression implies

p

1− p1

δ1A

S∑s=1

(1

1 + r

)s−1

asεas,1−p + pS∑s=1

βs−1 1


2sσ

2η

= (1− p)

[1

δ1

σ2θ +

S∑s=1

βs−1 1

δs(1 + εψs,1−p)ψ

2sσ

2η

].

But the elasticity of the present discounted value of e�ort is equal to

εA,1−p ≡(1− p)A

∂∑S

s=1

(1

1+r

)s−1as

∂ (1− p)=

S∑s=1

(1

1 + r

)s−1asAεas,1−p.

Moreover, we have 1 + εψs,1−p = 0. Substituting these two expressions into the

97

previous equation and rearranging terms leads to

p

(1− p)2

[1

δ1

εA,1−p + (1− p)S∑s=1

βs−1 1


2sσ

2η

]=

1

δ1

σ2θ .

This concludes the proof of equation (75).

98

paweª doligalski abdoulaye ndiaye nicolas werquinpdoligalski.github.io/files/dnw_taxperfpay.pdf ·...

Documents