Redistribution with Performance Pay*
Paweł Doligalski
University of Bristol
Abdoulaye Ndiaye
New York University
Nicolas Werquin
Toulouse School of Economics
May 21, 2020
Abstract
Half of the jobs in the U.S. feature pay-for-performance. We study nonlinear
income taxation in a model where such labor contracts arise as a result of moral hazard
frictions within firms. We derive novel formulas for the incidence of arbitrarily
nonlinear reforms of a given tax code on both average earnings and their sensitivity to output
risk. We show theoretically and quantitatively that, following an increase in tax
progressivity, the higher sensitivity of earnings to performance caused by the crowding-out
of private insurance is almost fully offset by a countervailing performance-pay effect
driven by labor supply responses. As a result, earnings risk is hardly affected by
policy. We then turn to the normative analysis of a government that levies taxes and
transfers to redistribute income across workers with different levels of uninsurable
productivity. We find that setting taxes without accounting for the endogeneity of private
insurance is close to optimal. Thus, the common concern that standard models of
taxation underestimate the cost of redistribution is, in the context of performance-based
compensation, overblown.
*We thank Árpád Ábrahám, Gadi Barlevy, Katherine Carey, Antoine Ferey, Xavier Gabaix, Daniel Garrett, Louis Kaplow, Stefan Pollinger, Morten Ravn, Kjetil Storesletten, Florian Scheuer, Karl Schulz, Stefanie Stantcheva, Aleh Tsyvinski, Hélène Turon, Venky Venkateswaran, Philipp Wangner, and numerous seminar and conference participants, for helpful comments and suggestions. Nicolas Werquin acknowledges support from ANR under grant ANR-17-EURE-0010 (Investissements d'Avenir program).
Introduction
What do fruit harvesters, real estate brokers, bankers and CEOs have in common?
All of them are paid based on their performance. Performance-pay contracts have
become increasingly popular across the income distribution. Empirically, a large share
(roughly half) of all the jobs in the U.S. involves performance-based compensation
(Lemieux, MacLeod, and Parent (2009)) in the form of piece rates, commissions,
bonuses, and stock options. These contracts are qualitatively different from usual
wage contracts. Indeed, the structure of earnings is designed not only to compensate
the employee for completing the job, but also to provide incentives for effort in the
first place. When wages are highly sensitive to performance (that is, when incentives
are high-powered), employees are generously rewarded for better outcomes, but at the same
time they are also more exposed to risk. Crucially, we expect both the level and
the performance-sensitivity of these contracts to be endogenous to the tax policy
implemented by the government. Yet despite the prevalence of these compensation
schemes, they have not been systematically studied in the taxation literature. We fill
this gap. In a general and tractable framework we derive in closed form the incidence
of tax reforms on the earnings and utility that performance-pay workers receive in
equilibrium. We also derive the impact of taxes on government revenue and social
welfare, as well as the optimal rate of tax progressivity in the presence of such realistic
labor contracts.
A widespread concern is that standard models of income taxation in the
tradition of Mirrlees (1971) substantially overstate the optimal level of taxes, by assuming
that heterogeneity in wage rates is exogenous and policy-invariant.1 Instead, when
wage risk is endogenous, increasing the progressivity of income taxes should lead to a
crowding-out of private insurance provided by firms, that is, a one-for-one spread of
the pre-tax earnings distribution. Theoretically, this crowding-out has been shown to
be of critical importance in various contexts (in particular by Attanasio and
Ríos-Rull (2000), Golosov and Tsyvinski (2007), and Krueger and Perri (2011)), where
it severely limits the ability of governments to provide social insurance. Empirically,
evidence of such crowding-out has been highlighted in several markets, for instance
unemployment or health insurance; see Cullen and Gruber (2000); Schoeni (2002);
Cutler and Gruber (1996a,b). Yet the empirical literature that studies the impact
of income taxes on the structure of performance-pay contracts often fails to find
significant crowding-out effects; see for instance Rose and Wolfram (2002); Frydman
and Molloy (2011). Our paper reconciles these findings by highlighting a
countervailing force that keeps earnings risk practically unaffected by tax policy. This novel
"performance-pay" effect is driven by labor supply adjustments. Under a more
progressive tax code, the worker's optimal level of effort is lower. The firm elicits this
labor supply reduction by providing more insurance (crowding-in). We find that this
performance-pay effect almost fully offsets the crowding-out.
1 This is the case both in the static (for instance Saez (2001)) and dynamic (for instance Golosov, Kocherlakota, and Tsyvinski (2003); Farhi and Werning (2013); Golosov, Troshkin, and Tsyvinski (2016)) frameworks.
We set up a model in which income inequality arises from two distinct sources,
namely, innate ability differences, and ex-post performance shocks that affect the
output of equally talented workers. While the former source of wage disparities cannot
be insured by private markets, the latter is very much shaped in the labor market.
In the presence of moral hazard frictions, wage risk has a productive role: employers
choose the amount of risk faced by their employees through performance-based pay
contracts in order to strike a balance between insurance and incentives for effort. Our
modeling of labor markets is based on those of Edmans and Gabaix (2011) for our
static setting, and Edmans, Gabaix, Sadzik, and Sannikov (2012) for our dynamic
setting. Their frameworks have been very successful at explaining the empirical features
of actual performance-based contracts (see Edmans and Gabaix (2016)). We extend
them to incorporate sophisticated nonlinear policy instruments. The key technical
breakthrough is that we allow for arbitrarily nonlinear tax instruments. Previous
models of moral hazard were tractable only under very restricted forms of the utility
of consumption; for instance, Holmstrom and Milgrom (1987) impose exponential
utility functions. This makes it impossible to consider a wide class of tax schedules
(typically, they would have to be restricted to being affine), since nonlinear taxes
effectively modify the concavity of the utility that workers receive from their salaries.
Instead, the analysis of Edmans and Gabaix (2011) remains tractable for very general
utility functions. Therefore it allows us to study the incidence of arbitrary tax reforms
(say, increasing taxes on the rich, or altering the shape of the EITC) of any initial tax
schedule (say, the U.S. tax code). Our analysis is thus very general and can be used
for both positive and normative investigation. The government has an effective role to
play despite the fact that private insurance markets are constrained efficient. Indeed,
while firms optimally provide insurance against ex-post output risk, the government
uses tax policy for redistribution between workers with different ex-ante ability.
We start with a positive analysis of the incidence of tax reforms on the workers'
labor contracts and the distribution of utilities. In standard models with exogenous
wage risk, taxes affect earnings only by modifying individual labor effort decisions. In
our framework, wage risk is endogenous to policy as well. We show that it responds to
tax changes via two channels: a crowding-out effect, and a performance-pay effect. On
the one hand, the crowding-out effect is the optimal response of firms to a change in
social insurance: they adjust the earnings contract endogenously so that the workers'
incentives for effort and participation constraints remain satisfied after the reform.
Thus, following an improvement in social insurance (higher tax progressivity), firms
respond by spreading the pre-tax earnings schedule. The performance-pay effect, on
the other hand, arises from the optimal labor supply adjustment to the tax reform.
As in standard models of income taxation, workers' optimal effort is lower in response
to an increase in marginal tax rates or tax progressivity. But eliciting a lower effort
level in the presence of moral hazard frictions is achieved by lowering the sensitivity of
pre-tax earnings to performance, that is, by compressing the wage distribution. This
effect counteracts the direct crowding-out of private insurance that the tax reform
induces. Crucially, because our model is tractable, we are able to derive this tax
incidence analysis entirely in closed form for an arbitrary baseline tax system and
arbitrary tax reforms.
We show both theoretically and quantitatively in a calibrated version of our model
that the two earnings risk adjustments almost fully offset each other in response to
an increase in the progressivity of the tax code. Taken separately these effects are
both significant, but summing them implies that taxes barely affect the sensitivity
of pay to performance. Moreover, this result is robust to the value of the labor
supply elasticity. The fundamental reason is that the sensitivity of the contract to
performance is proportional to the marginal disutility of labor. As a result, in order
to elicit a given increase in labor effort, the firm must increase the pass-through of
output risk to earnings proportionally to the inverse of the labor supply elasticity.
Therefore, if labor effort is relatively inelastic, the change in performance-sensitivity
necessary to elicit the optimal effort change must be large, and vice versa. We evaluate
the robustness of this result to other canonical tax reforms and show that in all cases,
the performance-pay effect offsets at least fifty percent of, and in some cases even
dominates, the direct crowding-out effect.
Armed with this tax incidence analysis, we then derive the impact of tax reforms
on government revenue and social welfare, as well as the optimal level of tax
progressivity. This analysis extends Chetty and Saez (2010) to our environment with
arbitrarily nonlinear taxes. In addition to the standard effects obtained in the benchmark
model with exogenous wage risk, the crowding-out and performance-pay effects
create fiscal externalities: given an initially progressive tax code, a spread (respectively,
contraction) of the pre-tax earnings distribution impacts positively (resp., negatively)
the government budget. Moreover, the crowding-out effect has a first-order negative
impact on social welfare. This is because, following a tax reform, firms adjust wages
in a way that renders tax cuts less accurately targeted than in a model with
exogenous risk. This modifies the relevant social welfare weights in the direction of less
redistribution. We then impose a number of functional form assumptions to make
the analysis as transparent as possible and obtain sharper results. In particular, we
assume that the nonlinear tax schedule is restricted to having a constant rate of
progressivity (as in, for instance, Heathcote, Storesletten, and Violante (2017)). Within
this class of tax schedules, we derive the optimal rate of progressivity in closed form
and show that it is smaller than when wage risk is considered exogenous. However,
the welfare losses from setting taxes suboptimally by ignoring the endogeneity of wage
risk are quantitatively limited, equivalent to a mere 0.24% drop in consumption. This
is because only roughly half of the jobs in our calibration are performance-pay, which
reduces the aggregate welfare losses from ignoring the endogeneity of wage risk to a
quarter of what they would be if all jobs were subject to agency frictions. We
conclude that the common concern that standard models overstate optimal tax policy by
ignoring the endogeneity of private insurance is, in the context of performance-pay
jobs, overblown.
Literature Review. The two papers that are closest to ours are Golosov and
Tsyvinski (2007) and Chetty and Saez (2010). Golosov and Tsyvinski (2007) study
an economy in which firms insure their workers subject to unobservable productivity
and hidden asset trades. They show that tax reforms generate a large crowding-out
effect which reduces the gains from public insurance. The government optimally
refrains from providing insurance and instead uses tax policy to correct the externality
generated by hidden trades. In our environment, by contrast, markets are constrained
efficient, and we study how the redistributive motive for government intervention interacts
with the endogenous private insurance on the labor market. Chetty and Saez (2010)
derive a sufficient statistics formula for the optimal linear tax in the presence of
linear private insurance contracts. We extend their analysis in two ways. First, and
most importantly, rather than following an approach based purely on endogenous
sufficient statistics (in particular, the elasticity of crowd-out with respect to tax
policy), we study a tractable structural microfoundation for the equilibrium labor
contracts. This allows us to characterize analytically the effects of government policy
on private insurance contracts via crowding-out and performance-pay responses, and
derive explicit theoretical formulas for tax incidence and optimal taxes. Second,
we allow for arbitrarily nonlinear taxes in the equilibrium with nonlinear incentive
contracts, and we show that several novel effects arise from these nonlinearities.
Our paper is motivated by the large literature that studies performance-pay
contracts as an optimal way for firms to incentivize workers' effort in the presence of
moral hazard frictions. On the theoretical side, our baseline framework is the model
of Edmans and Gabaix (2011) for our static setting, and that of Edmans et al. (2012)
for our dynamic setting. These models have been very successful at explaining the
structure of performance-pay contracts of CEOs (Frydman and Jenter (2010); Edmans
and Gabaix (2016); Edmans, Gabaix, and Jenter (2017)). On the empirical side, there
is growing reduced-form and structural evidence that moral hazard in labor markets is
pervasive (Foster and Rosenzweig (1994); Prendergast (1999); Shearer (2004); Lazear
and Oyer (2010); Bandiera, Barankay, and Rasul (2011); Ábrahám, Alvarez-Parra,
and Forstner (2016a)), that employers are important providers of insurance for their
employees (Guiso, Pistaferri, and Schivardi (2005); Lamadon (2016); Friedrich, Laun,
Meghir, and Pistaferri (2019); Lamadon, Mogstad, and Setzler (2019)), and that
the fraction of jobs with explicit pay-for-performance is high and rising (Lemieux,
MacLeod, and Parent (2009); Bloom and Van Reenen (2010); Bell and Van Reenen
(2014); Grigsby, Hurst, and Yildirmaz (2019)). Analogous to Kaplow (1991), our key
contribution to this large literature is to analyze the effects of policy in such
environments where the worker-firm relationship is modeled as a moral hazard problem.
Crucially, our policy instruments are very general, yet our analysis remains tractable.
Our results can help guide future empirical analysis on the impact of taxes on the
level and structure of performance-pay packages in the spirit of Rose and Wolfram
(2002); Frydman and Molloy (2011); Bird (2018); Dale-Olsen (2012).
Several other papers study optimal taxation with endogenous earnings risk. These
papers focus on risk generated by human capital accumulation (Kapicka and Neira
(2013); Findeisen and Sachs (2016); Stantcheva (2017); Makris and Pavan (2017)),
job search (Sleet and Yazici (2017)), or wage randomization in response to
excessive tax regressivity (Doligalski (2019)). Blomqvist and Horn (1984), Rochet (1991),
and Cremer and Pestieau (1996) studied the joint design of optimal insurance and
redistribution, but in these papers the government is the sole provider of insurance.
Another strand in the taxation literature studies government taxation in the
presence of endogenous consumption insurance, understood either as informal exchanges
in family networks or asset trades. Attanasio and Ríos-Rull (2000) and Krueger
and Perri (2011) demonstrate a potentially large crowding-out of private insurance
in response to increased public insurance. Park (2014); Ábrahám et al. (2016b);
Heathcote et al. (2017); Chang and Park (2017); Raj (2019) characterize the optimal
tax systems in such economies. In contrast to these papers, it is pre-tax earnings
risk, rather than consumption risk, that is endogenous to policy in our model.
Finally, several papers in the optimal taxation literature allow wages to be
determined on private labor markets, for instance Hungerbühler, Lehmann, Parmentier,
and Van der Linden (2006); Rothschild and Scheuer (2013, 2016, 2014); Stantcheva
(2014); Piketty, Saez, and Stantcheva (2014); Scheuer and Werning (2017, 2016); Ales,
Kurnaz, and Sleet (2015); Ales and Sleet (2016); Ales, Bellofatto, and Wang (2017);
Sachs, Tsyvinski, and Werquin (2020). These papers do not account for wage-rate
risk and performance-based earnings caused by moral hazard frictions.
Outline of the Paper. Our paper is organized as follows. We set up our baseline
static environment in Section 1. In Section 2, we analyze the incidence of arbitrary
nonlinear tax reforms on the structure of performance-based compensation and on
the distribution of utilities. In Section 3, we derive the excess burden and the social
welfare gains of tax reforms. We then focus on a special case of our model to derive
sharper results, as well as the optimal rate of progressivity, in Section 4. We study
our results quantitatively in Section 5. The proofs, extensions of our baseline model,
and the dynamic analysis are gathered in Appendices A to H.
1 Environment
1.1 Labor Market
Individuals. There is a continuum of mass one of agents indexed by their exogenous
innate ability θ ∈ Θ ⊂ R+ distributed according to the c.d.f. F(θ). Their preferences
over consumption c and labor effort a ≥ 0 are represented by a separable utility
function u(c) − h(a), where u and h are twice continuously differentiable, u is concave,
and h is strictly convex. An agent with earnings2 w pays a tax liability T(w) and
consumes c = w − T(w). The tax schedule T : R+ → R is twice continuously
differentiable. We denote by R(w) ≡ w − T(w) the retention function and by
r(w) ≡ R′(w) = 1 − T′(w) the retention (or net-of-tax) rate. We assume that the utility of
earnings w ↦ v(w) ≡ u(R(w)) is concave.3
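To make the retention function and the concavity requirement on v concrete, here is a minimal Python sketch. It assumes, purely for illustration, a constant-rate-of-progressivity (CRP) schedule R(w) = (1 − τ)/(1 − p) · w^(1−p); the functional form, the parameter values `tau` and `p`, and all helper names are hypothetical choices for this example, since the environment above allows any twice continuously differentiable T.

```python
import math

def retention(w, tau=0.2, p=0.15):
    """Assumed CRP retention function R(w) = (1 - tau) / (1 - p) * w**(1 - p)."""
    return (1.0 - tau) / (1.0 - p) * w ** (1.0 - p)

def tax(w, tau=0.2, p=0.15):
    """Tax liability T(w) = w - R(w)."""
    return w - retention(w, tau, p)

def retention_rate(w, tau=0.2, p=0.15):
    """Net-of-tax rate r(w) = R'(w) = 1 - T'(w) = (1 - tau) * w**(-p)."""
    return (1.0 - tau) * w ** (-p)

def is_concave_on_grid(v, grid):
    """True if every second difference of v on an equally spaced grid is <= 0."""
    vals = [v(w) for w in grid]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= 1e-12
               for i in range(1, len(vals) - 1))

grid = [0.5 + 0.05 * i for i in range(200)]

def v_log(w):
    """Log utility, progressive schedule (p > 0): v(w) = log R(w)."""
    return math.log(retention(w, p=0.15))

def v_regressive(w):
    """Linear utility, regressive schedule (p < 0): v(w) = R(w)."""
    return retention(w, p=-0.3)

print(is_concave_on_grid(v_log, grid))         # True: the assumption on v holds
print(is_concave_on_grid(v_regressive, grid))  # False: v is convex, assumption fails
```

The second case illustrates why the concavity of v is a joint restriction on preferences and on the tax schedule: with linear utility of consumption, a sufficiently regressive schedule makes the utility of earnings convex.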
Labor Contract. A worker with ability θ who provides effort a produces output
$$y = \theta\,(a + \eta), \tag{1}$$
where the "performance shock" η ∈ R is a random variable with mean 0, distributed
on a (possibly unbounded) interval with interior $(\underline{\eta}, \overline{\eta})$. The firm observes both the
agent's ability θ and her realized output y, but cannot disentangle her effort a from
her performance shock η. A performance-based contract specifies an effort level and
an earnings schedule as a function of realized output. Following Edmans and Gabaix
(2011), we impose the following assumption in order to characterize the optimal
contract analytically.
Assumption 1. The agent chooses effort after observing the realization of her
performance shock η. The firm recommends the same effort level a(θ) for all agents with
ability θ.
We discuss Assumption 1 in Section 1.3 below. We relax its second part and extend
our analysis to arbitrary effort schedules a(θ, η) in Appendix C. Note that the firm
can infer the worker's performance shock $\hat{\eta} = y/\theta - a(\theta)$ upon observing her output
y, assuming that she has exerted the recommended effort level a(θ). Therefore,
the earnings contract can be equivalently expressed as a function of the inferred
performance shock $\hat{\eta}$ rather than the realized output y. Since recommended effort is
incentive-compatible by construction, in equilibrium the firm infers the worker's true
performance shock, that is, $\hat{\eta} = \eta$. Thus, throughout the paper we simply denote the
earnings schedule by the map η ↦ w(θ, η).
2 Throughout the paper we denote a worker's earnings or income by w, while the term wage rate stands for earnings per unit of effort w/a.
3 This condition holds as long as the tax schedule T is not too regressive; see Appendix A for details. It is a natural restriction: Doligalski (2019) shows that when this condition is violated, firms have incentives to offer stochastic earnings even in the absence of moral hazard frictions. Furthermore, the tax schedule which encourages such earnings randomization is Pareto inefficient.
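The inference step can be illustrated with a short Python sketch. The numerical values and function names are hypothetical; the only ingredients taken from the text are the production function (1) and the rule that the firm backs out the shock as output per unit of ability minus recommended effort.

```python
def output(theta, effort, shock):
    """Output y = theta * (effort + shock), as in equation (1)."""
    return theta * (effort + shock)

def inferred_shock(y, theta, recommended_effort):
    """Shock the firm infers from y, assuming the recommended effort was exerted."""
    return y / theta - recommended_effort

theta, a_rec, eta = 2.0, 1.0, 0.25   # hypothetical ability, recommendation, shock

# On the equilibrium path the worker follows the recommendation,
# so the firm recovers the true shock.
eta_hat_on_path = inferred_shock(output(theta, a_rec, eta), theta, a_rec)

# Off path, shirking by 0.1 is indistinguishable from a shock that is 0.1 lower.
deviation = -0.1
eta_hat_deviation = inferred_shock(
    output(theta, a_rec + deviation, eta), theta, a_rec
)

print(eta_hat_on_path)     # equals eta up to floating-point error
print(eta_hat_deviation)   # equals eta + deviation up to floating-point error
```

Because effort and the shock enter output additively, a unit of effort and a unit of luck are perfect substitutes from the firm's point of view, which is why the contract can only reward the sum of the two.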
The firm chooses the contract {a(θ), w(θ, ·)} that maximizes its expected profit
given the tax schedule T and the worker's reservation utility U(θ), that is,4
$$\Pi(\theta) = \max_{a(\theta),\, w(\theta,\cdot)} \; \mathbb{E}\left[\,y - w(\theta,\eta)\,\right], \tag{2}$$
subject to the incentive-compatibility constraints:
$$a(\theta) = \arg\max_{a \ge 0} \; u\big(R(w(\theta,\hat{\eta}))\big) - h(a), \quad \forall \eta, \tag{3}$$
where $\hat{\eta} = a + \eta - a(\theta)$ is the performance shock that the firm infers when the worker
exerts effort a, and the participation constraint:
$$\mathbb{E}\left[u\big(R(w(\theta,\eta))\big)\right] - h(a(\theta)) \ge U(\theta). \tag{4}$$
Since the participation constraint (4) binds at the optimum, the expected utility of
workers with ability θ is equal to U(θ). The incentive-compatibility constraints (3)
deserve some explanation. Since effort is chosen after the worker observes her
performance shock η, it must maximize utility state-by-state rather than in expectation.
Thus, equation (3) must hold for every performance shock realization η.
Labor Market Equilibrium. To close the model, we assume that there is free
entry of firms in each labor market θ.5 Thus, in equilibrium profits are equal to zero,
$$\Pi(\theta) = 0. \tag{5}$$
This condition pins down the workers' reservation value U(θ), which is just high
enough so that no additional firm finds it profitable to enter the labor market.
4 Throughout the paper, the operator E denotes the expectation over performance shocks η, or equivalently output y, conditional on ability θ.
5 We can easily generalize our analysis to environments where firms have market power and make positive profits. In particular, the optimal contract characterization (6)-(7) holds for any reservation value U(θ), not necessarily determined by free entry. For instance we can assume that the worker's reservation value is a convex combination of the reservation value under free entry and some exogenous outside option.
1.2 Equilibrium Labor Contract
The following proposition characterizes the optimal contract between the firm and a
worker with ability θ. It is an application of the results of Edmans and Gabaix (2011)
to our environment with a nonlinear tax schedule.
Proposition 1. The optimal contract {a(θ), w(θ, ·)} and equilibrium expected utility
U(θ) of agents with ability θ when a(θ) > 0 are characterized by the following three
equations.6 The earnings schedule w(θ, ·) satisfies
$$u\big(R(w(\theta,\eta))\big) - h(a(\theta)) = U(\theta) + h'(a(\theta))\,\eta, \quad \forall \eta. \tag{6}$$
The optimal effort level a(θ) satisfies
$$\mathbb{E}\left[\frac{h'(a(\theta))}{v'(w(\theta,\eta))}\right] + \mathbb{E}\left[\frac{h''(a(\theta))}{v'(w(\theta,\eta))}\,\eta\right] = \theta. \tag{7}$$
The equilibrium reservation utility U(θ) is determined by
$$\mathbb{E}\left[v^{-1}\big(U(\theta) + h(a(\theta)) + h'(a(\theta))\,\eta\big)\right] = \theta\,a(\theta). \tag{8}$$
Proof. See Appendix A.
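As a heuristic complement to the formal proof, the following sketch indicates where the affine form of equation (6) comes from. It assumes, for this illustration only, that the earnings schedule is differentiable in the performance shock and that the worker's first-order condition characterizes her effort choice; the constant C(θ) is notation introduced here and does not appear in the paper.

```latex
\begin{align*}
&\text{Worker's problem after observing } \eta: &
  &\max_{a \ge 0}\; v\big(w(\theta,\, a + \eta - a(\theta))\big) - h(a), \\
&\text{first-order condition at } a = a(\theta): &
  &v'\big(w(\theta,\eta)\big)\,\frac{\partial w(\theta,\eta)}{\partial \eta}
   = h'(a(\theta)), \\
&\text{integrating over } \eta: &
  &v\big(w(\theta,\eta)\big) = h'(a(\theta))\,\eta + C(\theta), \\
&\text{binding constraint (4) with } \mathbb{E}[\eta] = 0: &
  &C(\theta) = U(\theta) + h(a(\theta)).
\end{align*}
```

Substituting C(θ) into the third line and subtracting h(a(θ)) from both sides gives equation (6).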
Earnings Schedule. In order to motivate high-performing workers to provide as
much effort a(θ) as those with lower performance shocks, the firm needs to reward
them with higher earnings. Equation (6) shows that the agent's ex-post utility
u(R(w(θ, η))) − h(a(θ)) is an affine function of the performance shock η that the
firm infers. The linearity of the contract in the utility space is a consequence of our
assumption of a separable utility function.7 Since we assumed that the utility of
earnings w ↦ v(w) ≡ u(R(w)) is concave, this translates into a convex earnings schedule
w(θ, ·). Empirically, performance-pay contracts are indeed often convex, either due
to nonlinear commission rates as in the case of stock and travel brokers (Levitt and
Syverson (2008))8 or to stock options (Edmans and Gabaix (2011, 2016)).
6 When the optimal effort level is zero, the worker optimally receives no compensation from the firm. Our analysis goes through without assuming a(θ) > 0 if h′(0) = 0.
7 Note that the earnings schedule (6) is non-trivial even if the utility of consumption u(·) is linear: in this case, a progressive income tax schedule implies that the utility of earnings v(w) = R(w) is strictly concave, so that the worker is effectively averse to pre-tax earnings risk and values insurance against performance shocks.
The two key features of the utility schedule (6) are its demogrant and its slope.
Its demogrant in (6) is equal to U(θ): a higher reservation value leads the firm to
raise the utility of workers uniformly regardless of their performance so as to preserve
incentive-compatibility. Its slope is equal to h′(a(θ)): inducing an agent with large
unobservable performance shock to provide costly work effort requires a larger reward
if the marginal disutility of labor is higher. Crucially, since the marginal disutility
h′(·) is increasing, the sensitivity of utility to performance shocks is strictly increasing
in labor effort a(θ). This observation captures the fundamental insight that eliciting
higher effort from a worker in the presence of moral hazard requires a higher exposure
to output risk.
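The shape of the contract can be checked numerically. The sketch below inverts equation (6) for earnings under assumed functional forms: log utility of consumption, a CRP retention function, and isoelastic disutility of effort. These forms, the parameter values, and the contract values `effort` and `U` are hypothetical and chosen only to illustrate the affine-in-utility, convex-in-earnings structure.

```python
import math

TAU, P, EPS = 0.2, 0.15, 0.5   # hypothetical tax level, progressivity, Frisch elasticity

def h(a):
    """Assumed isoelastic disutility h(a) = a**(1 + 1/EPS) / (1 + 1/EPS)."""
    return a ** (1.0 + 1.0 / EPS) / (1.0 + 1.0 / EPS)

def h_prime(a):
    """Marginal disutility h'(a) = a**(1/EPS)."""
    return a ** (1.0 / EPS)

def v(w):
    """Utility of earnings v(w) = log R(w) with R(w) = (1-TAU)/(1-P) * w**(1-P)."""
    return math.log((1.0 - TAU) / (1.0 - P)) + (1.0 - P) * math.log(w)

def v_inverse(x):
    """Inverse of v."""
    return math.exp((x - math.log((1.0 - TAU) / (1.0 - P))) / (1.0 - P))

def earnings(eta, effort, U):
    """Earnings schedule implied by equation (6)."""
    return v_inverse(U + h(effort) + h_prime(effort) * eta)

effort, U = 0.9, -0.5                         # hypothetical contract values
shocks = [-0.4 + 0.1 * i for i in range(9)]   # grid of performance shocks
w = [earnings(eta, effort, U) for eta in shocks]
ex_post_utility = [v(wi) - h(effort) for wi in w]

# Slope of ex-post utility in eta: constant and equal to h'(effort).
slopes = [(ex_post_utility[i + 1] - ex_post_utility[i]) / (shocks[i + 1] - shocks[i])
          for i in range(len(shocks) - 1)]
# Second differences of earnings: positive values indicate a convex schedule.
second_diffs = [w[i - 1] - 2.0 * w[i] + w[i + 1] for i in range(1, len(w) - 1)]

print(slopes[0], h_prime(effort))
print(min(second_diffs) > 0)
```

Raising `effort` in this sketch steepens the utility schedule through h′, which is the sense in which higher effort requires more exposure to output risk.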
Effort Level. Equation (7) pins down the value of effort that maximizes the firm's
profit. The optimal level a(θ) is such that the expected gain in output $\theta\hat{a}$ due to
a marginal increase $\hat{a} > 0$ in the workers' effort is exactly compensated by the pay
raise necessary to elicit this higher effort. This cost has two components. First,
to ensure that agents' participation constraint (4) remains satisfied despite their
higher labor supply, their earnings must increase to compensate their utility loss
$-\Delta h(a(\theta)) = -h'(a(\theta))\,\hat{a}$. In a frictionless economy, this would be the only effect
and (7) would reduce to the familiar optimality condition
$h'(a(\theta)) / v'(w(\theta,\eta)) = \theta$, according to
which the marginal rate of substitution (MRS) between effort and earnings is equal
to the marginal rate of transformation, or labor productivity θ.
In our setting with moral hazard, agency frictions create a wedge between labor
productivity and the (expected) marginal rate of substitution, even in the absence
of any distortionary taxes.9 Providing incentives to work harder requires increasing
the sensitivity h′(a(θ)) of utility to performance shocks by $\Delta h'(a(\theta)) = h''(a(\theta))\,\hat{a}$,
and hence the slope of the earnings schedule by $h''(a(\theta))\,\hat{a} / v'(w(\theta,\eta))$. This mechanically changes
the labor cost of an agent with performance shock η ∈ R by $\big(h''(a(\theta))\,\hat{a} / v'(w(\theta,\eta))\big)\,\eta$. This leads
to the second expectation in the left-hand side of (7) which we call the marginal
cost of incentives (MCI). In particular, eliciting a higher effort level requires raising
(respectively, lowering) the earnings of high- (resp., low-) performers. Yet, since
the marginal utility v′(w) = r(w) u′(R(w)) is decreasing, the profit generated by a
smaller wage bill for unlucky workers does not fully compensate the firm for the cost
of raising the wages of lucky workers. As a result, the expected cost of providing
incentives is positive.
8 While it is common for real-estate brokers to be compensated with a fixed commission rate, thus leading to a linear earnings schedule, Levitt and Syverson (2008) show that such contracts are suboptimal and could be improved by introducing convexity.
9 The term $h''(a(\theta)) / v'(w(\theta,\eta))$ can be rewritten as $\frac{1}{\varepsilon(\theta)\,a(\theta)}\,\frac{h'(a(\theta))}{v'(w(\theta,\eta))}$, where ε(θ) is the Frisch elasticity of labor supply. Thus, the wedge $\tau_{MCI}$ between the marginal rate of substitution and the marginal rate of transformation at performance shock realization η, defined by $(1 + \tau_{MCI})\,\frac{h'(a(\theta))}{v'(w(\theta,\eta))} = \theta$, is equal to $\frac{\eta}{\varepsilon(\theta)\,a(\theta)}$.
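The sign of the marginal cost of incentives can be verified in a small numerical example. The sketch below assumes log utility with a CRP retention function, so that 1/v′(w) = w/(1 − p) and equation (6) gives w = scale · exp(h′(a) η/(1 − p)), where the hypothetical constant `scale` absorbs U(θ), h(a) and the tax level. The three-point shock distribution, the isoelastic disutility and all parameter values are likewise assumptions of this example.

```python
import math

P, EPS = 0.15, 0.5                 # hypothetical progressivity and Frisch elasticity
SHOCKS = [-0.3, 0.0, 0.3]          # assumed three-point distribution of eta
PROBS = [1.0 / 3, 1.0 / 3, 1.0 / 3]  # equal weights, so the mean of eta is zero

def h_prime(a):
    """Marginal disutility for the assumed h(a) = a**(1 + 1/EPS) / (1 + 1/EPS)."""
    return a ** (1.0 / EPS)

def h_second(a):
    """Second derivative of the assumed disutility."""
    return (1.0 / EPS) * a ** (1.0 / EPS - 1.0)

def earnings(eta, effort, scale):
    """Earnings from equation (6) under log utility and a CRP schedule."""
    return scale * math.exp(h_prime(effort) * eta / (1.0 - P))

def inverse_marginal_utility(w):
    """1 / v'(w) = w / (1 - P) for v(w) = const + (1 - P) * log(w)."""
    return w / (1.0 - P)

def expected(f):
    """Expectation of f(eta) over the assumed shock distribution."""
    return sum(q * f(eta) for q, eta in zip(PROBS, SHOCKS))

effort, scale = 0.9, 1.0           # hypothetical contract values

# First expectation in (7): expected marginal rate of substitution.
mrs_term = expected(
    lambda eta: h_prime(effort) * inverse_marginal_utility(earnings(eta, effort, scale))
)
# Second expectation in (7): marginal cost of incentives (MCI).
mci_term = expected(
    lambda eta: h_second(effort)
    * inverse_marginal_utility(earnings(eta, effort, scale)) * eta
)

print(mrs_term, mci_term)
```

The MCI is positive here because 1/v′(w) is increasing in earnings, and earnings are increasing in η, so high shocks receive more weight than low shocks even though η has mean zero. With a flat earnings schedule the same expectation would be exactly zero.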
Expected Utility. Finally, equation (8) is simply a rewriting of the free-entry
condition (5). It implies that the average income E [w (θ, η)] of agents with ability θ
is equal to their expected output E [y] = θa (θ). Using formula (6), this equilibrium
condition pins down the workers' reservation value U (θ).
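Equations (7) and (8) can be solved jointly in a simple parametric case. The sketch below again assumes log utility, a CRP retention function R(w) = (1 − τ)/(1 − p) · w^(1−p), isoelastic disutility, and a three-point shock distribution; none of these choices comes from this section. Under these assumptions, substituting (6) and (8) into (7) and dividing by θ leaves a single equation in effort, a · [h′(a) + h″(a) · E(η e^{sη}) / E(e^{sη})] = 1 − p with s = h′(a)/(1 − p), which the code solves by bisection before recovering U(θ) from (8).

```python
import math

EPS = 0.5                          # hypothetical Frisch elasticity
SHOCKS = [-0.3, 0.0, 0.3]          # assumed mean-zero shock distribution
PROBS = [1.0 / 3, 1.0 / 3, 1.0 / 3]

def h(a):
    return a ** (1.0 + 1.0 / EPS) / (1.0 + 1.0 / EPS)

def h_prime(a):
    return a ** (1.0 / EPS)

def h_second(a):
    return (1.0 / EPS) * a ** (1.0 / EPS - 1.0)

def effort_condition(a, p):
    """Equation (7) divided by theta under the assumed forms, minus (1 - p)."""
    s = h_prime(a) / (1.0 - p)
    m0 = sum(q * math.exp(s * eta) for q, eta in zip(PROBS, SHOCKS))
    m1 = sum(q * eta * math.exp(s * eta) for q, eta in zip(PROBS, SHOCKS))
    return a * (h_prime(a) + h_second(a) * m1 / m0) - (1.0 - p)

def solve_effort(p, lo=1e-6, hi=5.0, tol=1e-12):
    """Bisection; effort_condition is increasing in a for these functional forms."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if effort_condition(mid, p) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def reservation_utility(a, theta, tau, p):
    """U(theta) from the free-entry condition (8): expected earnings = theta * a."""
    s = h_prime(a) / (1.0 - p)
    m0 = sum(q * math.exp(s * eta) for q, eta in zip(PROBS, SHOCKS))
    scale = theta * a / m0
    return (1.0 - p) * math.log(scale) + math.log((1.0 - tau) / (1.0 - p)) - h(a)

def expected_earnings(a, U, tau, p):
    """E[w] with w obtained by inverting equation (6)."""
    c0 = math.log((1.0 - tau) / (1.0 - p))
    return sum(q * math.exp((U + h(a) + h_prime(a) * eta - c0) / (1.0 - p))
               for q, eta in zip(PROBS, SHOCKS))

theta, tau, p = 2.0, 0.2, 0.15     # hypothetical ability and tax parameters
a_star = solve_effort(p)
U_star = reservation_utility(a_star, theta, tau, p)
print(a_star, U_star, expected_earnings(a_star, U_star, tau, p), theta * a_star)
```

In this parametrization, effort does not depend on θ or τ, a consequence of log utility, and it falls when the assumed progressivity parameter p rises. It also lies below the frictionless level (1 − p)^(EPS/(1 + EPS)), reflecting the positive marginal cost of incentives.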
1.3 Discussion of Assumptions
To obtain the tractable characterization of the contract described in Proposition 1,
our analysis relied on several key assumptions.
Utility Function. The first restriction is the separability of the utility function
between consumption and effort. This assumption is not essential and is only made
for clarity of exposition. As in Edmans and Gabaix (2011), it is straightforward to
extend our analysis to a larger class of utility functions, namely, φ(u(c) − h(a)) where
φ exhibits non-increasing absolute risk aversion (NIARA).10 In particular, this would
allow us to nest the functional form assumed by Holmstrom and Milgrom (1987). The
fact that the slope of the contract is equal to h′(a(θ)), which is crucial for our main
results, is robust to this more general specification. The only difference that this more
general specification would make is that the distribution of a rent by the firm would
no longer lead to a uniform shift in ex-post utilities via the demogrant U(θ). Our
arguments can however be straightforwardly extended to alternative distributions of
rents.
10 Most common utility functions, in particular those with constant absolute risk aversion (CARA) or constant relative risk aversion (CRRA), belong to the NIARA class.
Timing. The first part of Assumption 1 imposes that the worker chooses effort
a after observing the performance shock η. This timing assumption was originally
introduced by Laffont and Tirole (1986) and was subsequently used by, for instance,
Edmans and Gabaix (2011) and Garrett and Pavan (2015). It allows us to solve the
firm's problem for a very general class of utility functions. Allowing for arbitrary
utility functions is crucial for our analysis. Indeed, nonlinear taxes effectively modify
the concavity of the utility that workers derive from their gross earnings. If we had
to restrict the utility function to a specific functional form (for instance, CARA as in
Holmstrom and Milgrom (1987)) we would only be able to study tax schedules that
preserve this functional form (for instance, linear or affine). Instead, the tractability
afforded by our timing assumption allows us to characterize the incidence of arbitrarily
nonlinear taxes.
Effort. In the main body of the paper, we impose that the firm chooses to elicit the
same level of effort regardless of the worker's performance shock; this is the second
part of Assumption 1. This restriction is also imposed by Edmans and Gabaix (2011)
in their main model, and by Edmans, Gabaix, Sadzik, and Sannikov (2012). It is an
exogenous restriction on the set of contracts that is not without loss of generality.
It substantially simplifies our analysis without restricting the shape of the earnings
schedule, which is crucial for our investigation of private insurance and nonlinear
taxation. Carroll and Meng (2016) provide a microfoundation of this restriction; they
call this property reliability and show that it may be optimal when firms aim to design
a contract that is robust to uncertainty about the distribution of the performance
shock.11 For completeness, we relax this "constant-effort" assumption and generalize
our main result (Theorem 1) to fully optimal contracts in Appendix C; our theoretical
analysis remains technically straightforward and carries over qualitatively to this case.
Performance Shocks. Finally, and importantly, note that we do not impose any
restriction on the distribution of performance shocks η, other than it must take values
in an interval (bounded or unbounded). We view this generality as an important
feature of our analysis. As an example, this allows us to capture the structure of
contracts that consist of a fixed baseline income and an additional bonus paid with
positive probability by letting the distribution of η have a mass point at the lower
bound $\underline{\eta}$, and a smooth density on $(\underline{\eta}, \overline{\eta})$. Importantly, it also allows us to let the
distribution of η, and hence the degree of performance-pay, depend explicitly on the
ability level θ.
11 In particular, in our environment such a contract leads to the same level of expected output E[y] = θ a(θ) regardless of the distribution of η. However, the firm's expected profit depends on the distribution of η as earnings w(θ, η) are not linear in η.
2 General Tax Incidence Analysis
This section is devoted to the positive analysis of nonlinear tax incidence. We derive the impact of tax reforms on earnings, first in Section 2.1 in a benchmark setting with exogenous risk, then in Section 2.2 in our general environment that takes into account the endogeneity of private insurance. In Section 2.3 we derive the impact of tax reforms on individual utilities. We finally introduce the relevant notions of earnings elasticities in Section 2.4 in order to express our tax incidence formulas in terms of empirically estimable variables.
Nonlinear Tax Reforms. We start by formally defining the concept of nonlinear reforms of an arbitrary initial tax system. Consider a given (potentially suboptimal) tax schedule $T$, say the U.S. tax code, and another function $\hat{T} : \mathbb{R}_+ \to \mathbb{R}$. Our goal is to evaluate the effects of perturbing the initial tax schedule $T$ by $\delta \hat{T}$, where $\delta > 0$ is a scalar that parametrizes the size of the reform in the direction $\hat{T}$. Formally, consider an outcome variable $\Psi$, for instance individual earnings, utility, government revenue, or social welfare, that depends on the tax schedule $T$. The first-order change in the value of this functional $T \mapsto \Psi(T)$ following a marginal tax reform in the direction $\hat{T}$ is given by the Gateaux derivative
$$\hat{\Psi}(T, \hat{T}) \equiv \lim_{\delta \to 0} \frac{\Psi(T + \delta \hat{T}) - \Psi(T)}{\delta}. \tag{9}$$
We analyze several concrete examples of tax reforms in Section 4 and Appendix B.
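To make the Gateaux derivative concrete, the following minimal sketch approximates it by a finite difference and compares it with a closed form. Everything in it is an illustrative assumption rather than part of the model: the four earnings levels, the affine baseline schedule, the reform direction, and the choice of outcome functional (average log consumption with earnings held fixed).

```python
import math

# Hypothetical fixed earnings of four workers (illustrative numbers only).
earnings = [20.0, 40.0, 80.0, 160.0]

def baseline_tax(w):
    """Assumed initial schedule T: 30% flat tax with a 5-unit lump-sum rebate."""
    return 0.3 * w - 5.0

def reform(w):
    """Assumed reform direction T_hat: extra liability that grows with income."""
    return 0.01 * w ** 1.5

def avg_log_consumption(tax):
    """Outcome functional Psi(T): mean log consumption, earnings held fixed."""
    return sum(math.log(w - tax(w)) for w in earnings) / len(earnings)

def gateaux_fd(psi, tax, direction, delta=1e-6):
    """Finite-difference version of [Psi(T + delta * T_hat) - Psi(T)] / delta."""
    perturbed = lambda w: tax(w) + delta * direction(w)
    return (psi(perturbed) - psi(tax)) / delta

# Closed form for this particular Psi: -mean(T_hat(w) / (w - T(w))).
analytic = -sum(reform(w) / (w - baseline_tax(w)) for w in earnings) / len(earnings)
numeric = gateaux_fd(avg_log_consumption, baseline_tax, reform)
print(numeric, analytic)
```

The two numbers agree up to the finite-difference error, which shrinks with `delta`. Because earnings are held fixed here, the sketch captures only the direct effect of the reform, not the behavioral responses studied in the rest of the section.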
2.1 Exogenous Risk Benchmark
As a preliminary step towards our general analysis, we derive in this section the incidence of tax reforms $\hat{T}$ on earnings $w(\theta, \eta)$ and utilities $U(\theta)$ that would arise in an environment with fully exogenous risk. In this benchmark model, as in our framework, there are two sources of heterogeneity: innate ability $\theta$ and job-specific productivity shocks $\eta$. A worker with characteristics $(\theta, \eta)$ is offered a fixed wage rate $x(\theta, \eta)$ that reflects her exogenous labor productivity. To make this model comparable to ours we assume moreover that the worker's labor effort $a(\theta)$ is independent of $\eta$.12 A worker's income $w(\theta, \eta)$ is then the product of her exogenous labor productivity $x(\theta, \eta)$ and her labor supply $a(\theta)$. The key difference with our general environment is that in this benchmark setting, wage rates $\frac{w(\theta, \eta)}{a(\theta)} = x(\theta, \eta)$ are policy-invariant.
Incidence of Tax Reforms on Earnings. In such an environment, tax reforms only affect earnings $w(\theta, \eta)$ by the endogenous change in effort $\hat{a}(\theta)$ caused by the reform, multiplied by the constant wage rate $x(\theta, \eta)$. Thus, the incidence of tax reforms is given by13
$$\hat{w}_{ex}(\theta, \eta) = x(\theta, \eta)\, \hat{a}(\theta) = w(\theta, \eta)\, \frac{\hat{a}(\theta)}{a(\theta)}. \tag{10}$$
This is the standard behavioral response to taxes through labor supply choices analyzed in most of the optimal taxation literature following Mirrlees (1971).14 Importantly, note that the effort change $\hat{a}(\theta)$ in formula (10) depends on the particular reform that is implemented: formally, it is the Gateaux derivative of the effort functional $a(\theta)$ in the direction $\hat{T}$. Thus, at this stage $\frac{\hat{a}(\theta)}{a(\theta)}$ is a policy elasticity in the sense of Hendren (2015).15 Section 2.4 below is devoted to expressing this labor supply response in terms of standard elasticities and income effect parameters that can be estimated empirically independently of a particular choice of tax reform.
Formula (10) implies that $\hat{w}_{ex}(\theta, \eta) > 0$ iff $\hat{a}(\theta) > 0$. That is, the earnings schedule is shifted up (resp., down) if effort increases (resp., decreases) following the reform. Earnings adjust on average by
$$\mathbb{E}\left[ w(\theta, \eta)\, \frac{\hat{a}(\theta)}{a(\theta)} \right] = \theta\, \hat{a}(\theta). \tag{11}$$

12 This can be justified by assuming that in this model effort is chosen before observing the realization of $\eta$.
13 For notational simplicity, whenever there is no ambiguity we ignore the dependence of the Gateaux derivative $\hat{a}(\theta)$ on the initial tax schedule $T$ and the tax reform $\hat{T}$.
14 Note that we would obtain exactly the same expression in the Mirrlees model without within-group inequality, that is, $\sigma_\eta^2 = 0$. In this case, all earnings differences are due to innate ability (or labor productivity) $\theta$ and effort $a(\theta)$, so that the compensation schedule $w(\theta, \cdot)$ conditional on ability is degenerate. Equation (10) then reduces to $\hat{w}_{ex}(\theta, \eta) = \theta\, \hat{a}(\theta)$.
15 This is the concept of elasticity used in several papers in the taxation literature, for instance, Chetty and Saez (2010).
Now, consider the impact of the reform on earnings risk around this mean adjustment. We measure earnings risk by the variance of log-earnings conditional on ability $\theta$. Since effort $a(\theta)$ does not depend on $\eta$, equation (10) immediately implies that earnings risk after the reform is the same as before the reform, that is,
$$\mathrm{Var}\left[ \log\left( w(\theta, \eta) + \delta\, \hat{w}_{ex}(\theta, \eta) \right) \mid \theta \right] = \mathrm{Var}\left[ \log w(\theta, \eta) \mid \theta \right] \tag{12}$$
for $\delta > 0$ small enough. Therefore, in the benchmark model that ignores the endogeneity of private insurance, tax reforms affect the average level of earnings but do not modify the amount of risk to which workers are exposed.16
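The invariance in equation (12) follows because the exogenous-risk adjustment in (10) rescales every earnings realization by the same factor, which only shifts log-earnings by a constant. A minimal numerical sketch, assuming a hypothetical three-point earnings distribution and an arbitrary value for the effort response:

```python
import math

def variance(values, probs):
    """Variance of a discrete random variable."""
    mean = sum(q * v for q, v in zip(probs, values))
    return sum(q * (v - mean) ** 2 for q, v in zip(probs, values))

# Hypothetical earnings of one ability group across three shock realizations.
w = [30.0, 50.0, 90.0]
probs = [0.25, 0.5, 0.25]
effort_change_rate = -0.04   # assumed value of a_hat / a
delta = 0.1                  # size of the reform

# Equation (10): every earnings level moves in proportion to itself.
w_hat_ex = [effort_change_rate * wi for wi in w]
w_after = [wi + delta * dwi for wi, dwi in zip(w, w_hat_ex)]

var_before = variance([math.log(x) for x in w], probs)
var_after = variance([math.log(x) for x in w_after], probs)
print(var_before, var_after)
```

The two variances coincide up to floating-point rounding, whatever values are chosen for `effort_change_rate` and `delta` (as long as earnings stay positive).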
Incidence of Tax Reforms on Welfare. Finally, in the benchmark model with exogenous risk, the incidence of the tax reform on the average utility of agents with ability $\theta$ is given by
$$\hat{U}(\theta) = -\mathbb{E}\left[ u'\left( R(w(\theta, \eta)) \right) \hat{T}(w(\theta, \eta)) \right]. \tag{13}$$
Intuitively, an increase in the tax payment of an agent by $\hat{T}(w(\theta, \eta))$ lowers her ex-post utility by the marginal utility of consumption $u'(R(w(\theta, \eta)))$. This is a simple consequence of the envelope theorem: since labor effort is chosen optimally by equation (3), the endogenous change in effort $\hat{a}(\theta)$ triggered by the reform has no first-order impact on welfare.17 Taking expectations leads to the change in expected utility (13).
16 We can also define earnings risk at a disaggregated level by the pass-through function $\frac{\partial \log w(\theta, \eta)}{\partial \eta}$, that is, the sensitivity of log-earnings to performance shocks. Equation (10) implies that the pass-through is unaffected by the reform for every value of $\eta$. If we define instead earnings risk as the sensitivity of earnings (rather than log-earnings) with respect to performance shocks, that is, $\frac{\partial w(\theta, \eta)}{\partial \eta}$, tax reforms would raise earnings risk if and only if $\frac{\partial \hat{w}(\theta, \eta)}{\partial \eta} > 0$. In the setting analyzed in this section, formula (10) implies that this is the case whenever the reform raises effort, that is, $\hat{a}(\theta) > 0$.
17 In particular, in this environment a change in marginal tax rates that keeps the total tax payment unchanged has no first-order impact on individual welfare.
2.2 Incidence of Tax Reforms on Earnings
We now proceed to characterizing the incidence of an arbitrary tax reform $\hat{T}$ on the compensation schedule $w(\theta, \cdot)$ of workers with ability $\theta$ in our general environment. This is the first main result of our paper.
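Theorem 1. Suppose that $a(\theta) > 0$. Denote by $\hat{a}(\theta)$ the change in effort induced by the reform, which we study in Section 2.4 below. The first-order effect of the tax reform $\hat{T}$ on earnings $w(\theta, \cdot)$ is given by
$$\hat{w}(\theta, \eta) = \hat{w}_{ex}(\theta, \eta) + \hat{w}_{co}(\theta, \eta) + \hat{w}_{pp}(\theta, \eta), \tag{14}$$
where the crowding-out effect $\hat{w}_{co}$ has mean zero and is given by
$$\hat{w}_{co}(\theta, \eta) = \frac{\hat{T}(w(\theta, \eta))}{r(w(\theta, \eta))} - \frac{\left( v'(w(\theta, \eta)) \right)^{-1}}{\mathbb{E}\left[ \left( v'(w(\theta, \cdot)) \right)^{-1} \right]}\, \mathbb{E}\left[ \frac{\hat{T}(w(\theta, \cdot))}{r(w(\theta, \cdot))} \right], \tag{15}$$
and the performance-pay effect $\hat{w}_{pp}$ has mean zero and is given by
$$\hat{w}_{pp}(\theta, \eta) = \left[ \frac{h'(a(\theta)) + h''(a(\theta))\, \eta}{v'(w(\theta, \eta))} - \frac{w(\theta, \eta)}{a(\theta)} \right] \hat{a}(\theta). \tag{16}$$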
Proof. See Appendix B.
Equation (14) gives the adjustment of the earnings schedule following an arbitrary tax reform $\hat{T}$ as a function of the tax rates, earnings distribution, and labor supply responses in the initial (pre-reform) economy. In practice, this formula only requires choosing a functional form for the utility function $u$ and the disutility of effort $h$ in order to evaluate the incidence of any potential reform of the current tax code.
Theorem 1 shows that the earnings adjustment in response to the tax reform, $\hat{w}(\theta, \eta)$, is in general different from that in the standard model with exogenous risk, $\hat{w}_{ex}(\theta, \eta)$, analyzed in Section 2.1. Specifically, the tax reform modifies the earnings schedule by the same average amount as in the benchmark model (see equation (11)), since $\mathbb{E}[\hat{w}_{co}(\theta, \eta)] = \mathbb{E}[\hat{w}_{pp}(\theta, \eta)] = 0$. However, it also introduces two adjustments to earnings risk around this mean shift. The first, $\hat{w}_{co}(\theta, \eta)$, captures the crowding-out of the private insurance contract by the tax change, keeping effort constant. The second, $\hat{w}_{pp}(\theta, \eta)$, is the performance-pay effect due to the endogenous change in labor effort. We analyze them in turn.
Crowding-Out of Private Insurance. Equation (15) gives the adjustment to the compensation schedule that the firm must implement in order to keep the worker's incentive and participation constraints both satisfied following the reform. First, consider the adjustment $\frac{\hat{T}(w(\theta, \eta))}{r(w(\theta, \eta))}$ of the earnings schedule (first term in (15)). This term implies that the agent's consumption $c(\theta, \eta) = w(\theta, \eta) - T(w(\theta, \eta))$ changes by
$$\begin{aligned} \hat{c}(\theta, \eta) &= -\hat{T}(w(\theta, \eta)) + \left( 1 - T'(w(\theta, \eta)) \right) \hat{w}(\theta, \eta) \\ &= -\hat{T}(w(\theta, \eta)) + \left( 1 - T'(w(\theta, \eta)) \right) \frac{\hat{T}(w(\theta, \eta))}{1 - T'(w(\theta, \eta))} = 0. \end{aligned}$$
Thus, absent any other forces (in particular, if effort were kept constant) the firm would adjust the contract such that, for every performance shock realization $\eta$, the agent's disposable income $c(\theta, \eta)$, and hence her realized utility, remain fixed. In other words, any attempt by the government to affect consumption insurance would be fully absorbed by the firm so as to keep the worker's payoffs unchanged.
Second, suppose that the tax reform is such that the tax liabilities of workers with ability $\theta$ are reduced, that is, $\hat{T}(w(\theta, \eta)) < 0$ for all $\eta$. Per our discussion in the previous paragraph, this reform generates a rent for the firm equal to $-\mathbb{E}\left[ \frac{\hat{T}(w(\theta, \cdot))}{r(w(\theta, \cdot))} \right] > 0$: intuitively, the firm compensates the reduction in tax payments by an equivalent reduction in wages. Now, by the free-entry condition, this rent must be shared with workers, whose expected utility rises as a result.18 Recall that, by equation (6), this increase in utility must be distributed uniformly among all agents, regardless of their performance $\eta$, in order to preserve their incentive compatibility condition for effort. But this implies that the salary of high-performers must increase by a larger amount, since their marginal utility of earnings $v'(w(\theta, \eta))$ is lower. As a result, the share of the rent assigned to workers with performance shock $\eta$ is inversely proportional to their marginal utility, that is, equal to $\frac{(v'(w(\theta, \eta)))^{-1}}{\mathbb{E}[(v'(w(\theta, \cdot)))^{-1}]}$. This leads to the second term in (15).
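The two terms of the crowding-out formula (15) can be evaluated numerically once functional forms are chosen. The sketch below is only an illustration under stated assumptions: it borrows the logarithmic utility and CRP tax schedule of Section 4 (so that $1/v'(w) = w/(1-p)$ and $r(w) = (1-\tau) w^{-p}$, taking $v(w) = u(R(w))$), uses a hypothetical three-point earnings distribution, and takes the reform to be a uniform lump-sum transfer, $\hat{T}(w) = -1$.

```python
tau, p = 0.3, 0.2                 # assumed CRP parameters
w = [30.0, 50.0, 90.0]            # hypothetical earnings at three shock realizations
probs = [0.25, 0.5, 0.25]

def expect(values):
    """Expectation over the three shock realizations."""
    return sum(q * v for q, v in zip(probs, values))

retention = [(1 - tau) * wi ** (-p) for wi in w]   # r(w) = 1 - T'(w)
inv_v_prime = [wi / (1 - p) for wi in w]           # 1 / v'(w) when v = log R
T_hat = [-1.0 for _ in w]                          # reform: lump-sum transfer

# First term of (15): wage cut that leaves consumption unchanged.
first_term = [t / r for t, r in zip(T_hat, retention)]
# Minus the firm's rent, E[T_hat / r]; it is negative for a tax cut.
rent = expect(first_term)
# Second term of (15): the rent is shared in proportion to 1 / v'(w).
shares = [x / expect(inv_v_prime) for x in inv_v_prime]
w_hat_co = [ft - s * rent for ft, s in zip(first_term, shares)]

print(w_hat_co, expect(w_hat_co))
```

In this example the crowding-out adjustment averages to zero, is negative at the lowest earnings level and positive at the highest, so the tax cut spreads out the pre-tax earnings schedule.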
Performance-Pay Effect. Now, suppose that in response to the tax reform $\hat{T}$, the firm finds it optimal to elicit a higher effort level, so that $\hat{a}(\theta) > 0$. In order to do so, we showed in Section 1.2 that it must both compensate workers for their utility loss, and increase the pass-through of performance shocks to earnings. These two adjustments are captured by the term $\frac{h'(a(\theta)) + h''(a(\theta))\, \eta}{v'(w(\theta, \eta))}$ in equation (16). Since $\frac{h'(a(\theta)) + h''(a(\theta))\, \eta}{v'(w(\theta, \eta))}\, \hat{a}(\theta)$ is an increasing function of $\eta$, eliciting a higher effort level requires increasing the sensitivity of earnings to performance. Now, $\hat{w}_{pp}$ is defined by subtracting the income change $\hat{w}_{ex}(\theta, \eta)$ that would arise in the benchmark model with exogenous risk in response to the same change in effort (equation (10)). As a result, the performance-pay effect has mean zero and is the pure contribution of moral hazard to the change in earnings risk via labor supply decisions.

18 We analyze this change in reservation value $\hat{U}(\theta)$ in Section 2.3 below.
Generalization to an Effort Schedule. In Appendix C we extend this result to the environment where the firm can elicit an arbitrary (non-constant) effort schedule $a(\theta, \eta)$. We show that the crowding-out effect is identical to (15). The performance-pay effect is analogous to (16) except that it depends on the change in the entire effort schedule rather than in the single effort level.
2.3 Incidence of Tax Reforms on Utilities
We now proceed to analyzing the incidence of tax reforms on the expected utility
U (θ) of workers with ability θ in our general environment.
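Proposition 2. The first-order effect of the tax reform $\hat{T}$ on expected utility $U(\theta)$ is given by
$$\hat{U}(\theta) = -\frac{1}{\mathbb{E}\left[ \frac{1}{v'(w(\theta, \eta))} \right]}\, \mathbb{E}\left[ \frac{\hat{T}(w(\theta, \eta))}{r(w(\theta, \eta))} \right]. \tag{17}$$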
Proof. See Appendix B.
To understand Proposition 2, recall that a decrease in the tax payment of an agent by $\hat{T}(w(\theta, \eta)) < 0$ allows the firm to decrease her earnings by $\frac{\hat{T}(w(\theta, \eta))}{r(w(\theta, \eta))}$ in order to keep her consumption (and, hence, incentives) unchanged. Moreover, by the envelope theorem, any change in pay that operates via labor supply ($\hat{w}_{ex}$, $\hat{w}_{pp}$) generates only second-order changes in total labor costs. Therefore, the tax reform creates an expected rent for the firm equal to $-\mathbb{E}\left[ \frac{\hat{T}(w(\theta, \eta))}{r(w(\theta, \eta))} \right] > 0$, which is then shared with the workers and leads to an increase in their reservation value by $\hat{U}(\theta) > 0$.
Now, this increase in expected utility $\hat{U}(\theta) > 0$ must be distributed across workers with different performance shocks $\eta$. As explained in the previous sections, every worker's utility must increase uniformly. Therefore, realized earnings $w(\theta, \eta)$ must increase in proportion to the inverse marginal utility $1 / v'(w(\theta, \eta))$. Hence, this sharing rule costs the firm $\mathbb{E}\left[ \frac{\hat{U}(\theta)}{v'(w(\theta, \eta))} \right]$. As a result, the value of $\hat{U}(\theta)$ which ensures that profits remain equal to zero (free-entry condition) satisfies:
$$\mathbb{E}\left[ \frac{\hat{T}(w(\theta, \eta))}{r(w(\theta, \eta))} \right] + \mathbb{E}\left[ \frac{\hat{U}(\theta)}{v'(w(\theta, \eta))} \right] = 0.$$
Solving for $\hat{U}(\theta)$ leads directly to equation (17).
Analogous to the standard model where tax changes affect individual consumption directly rather than being intermediated by firms, equation (17) implies that workers' expected utility increases when their expected tax payments (weighted by retention rates) are reduced. Conversely, an increase in their expected tax bill lowers their utility. However, the size of the change in utility $\hat{U}(\theta)$ differs from that obtained in the benchmark model with exogenous risk (equation (13)) unless $\sigma_\eta^2 = 0$. As a simple example, suppose that the initial tax schedule is affine, so that the retention rate $r(w(\theta, \eta))$ is constant. Consider a tax reform that consists of a uniform lump-sum transfer for all agents. We show in the Appendix that this reform is represented by $\hat{T}(w) = -1$ for all $w$. In the benchmark model with exogenous risk, equation (13) shows that individual welfare would increase on average by the expected marginal utility, $\mathbb{E}[u'(R(w(\theta, \eta)))]$. Now, in the general model with agency frictions, applying Jensen's inequality to equation (17) yields $0 < \hat{U}(\theta) < \mathbb{E}[u'(R(w(\theta, \eta)))]$. Therefore, a lump-sum transfer leads to a strictly smaller rise in utility when tax cuts are distributed by firms than when they are directly targeted to workers.
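The Jensen's inequality argument can be checked on a small example. The sketch below compares equations (13) and (17) for a lump-sum transfer $\hat{T}(w) = -1$ under an affine tax. The numbers are hypothetical, utility is assumed logarithmic, and the marginal utility of earnings is taken to be $v'(w) = u'(R(w))\, r$, which is an assumption about how $v$ relates to $u$ and $R$.

```python
w = [30.0, 50.0, 90.0]      # hypothetical earnings at three shock realizations
probs = [0.25, 0.5, 0.25]
r, b = 0.7, 5.0             # assumed affine tax: retained income R(w) = r * w + b

def expect(values):
    """Expectation over the three shock realizations."""
    return sum(q * v for q, v in zip(probs, values))

consumption = [r * wi + b for wi in w]
marg_util = [1.0 / c for c in consumption]    # u'(R(w)) with u = log
v_prime = [r * mu for mu in marg_util]        # assumed v'(w) = u'(R(w)) * r
T_hat = [-1.0 for _ in w]                     # uniform lump-sum transfer

# Equation (13): transfer handed directly to workers (exogenous risk).
U_hat_benchmark = -expect([mu * t for mu, t in zip(marg_util, T_hat)])
# Equation (17): transfer intermediated by the firm (agency frictions).
U_hat_agency = -expect([t / r for t in T_hat]) / expect([1.0 / vp for vp in v_prime])

print(U_hat_agency, U_hat_benchmark)
```

With these inputs the agency-friction gain equals the reciprocal of expected consumption, while the benchmark gain equals the expected reciprocal of consumption, so the former is strictly smaller whenever consumption is risky.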
2.4 Elasticities of Average Earnings
The last step of our tax incidence analysis is to characterize the impact of a tax reform $\hat{T}$ on the optimal effort level $a(\theta)$. We tackle this in two (complementary) ways. First, we use a structural approach and derive analytically the impact of tax reforms on labor effort in terms of primitives. Second, we express these labor supply responses in terms of sufficient statistics that can be estimated empirically regardless of the values of the underlying primitives.
Structural Approach. When the structure of the model is simple enough, it is worthwhile to derive explicitly the elasticity of effort with respect to the particular tax reform under consideration, that is, $\frac{\hat{a}(\theta)}{a(\theta)}$. This allows us to compare the incidence of taxes across different contractual environments. The next result and Lemma 2 below illustrate this approach through the lens of simple examples.
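Lemma 1. Suppose that the utility function is linear in consumption, that is, $u(c) = c$, and that earnings $w(\theta, \eta)$ are located in a bracket with constant marginal tax rate $\tau$ for all performance shocks $\eta$. The response of labor effort to an arbitrary tax reform $\hat{T}$ is then given by
$$\frac{\hat{a}(\theta)}{a(\theta)} = -\frac{1}{1 - \tau}\, \varepsilon(\theta)\, \mathbb{E}\left[ \hat{T}'(w(\theta, \eta)) \right] - \frac{1}{1 - \tau}\, \mathrm{Cov}\left( \hat{T}'(w(\theta, \eta)) ;\ \frac{\eta}{a(\theta)} \right), \tag{18}$$
where $\varepsilon(\theta) \equiv \frac{h'(a(\theta))}{a(\theta)\, h''(a(\theta))}$ is the Frisch elasticity of labor supply.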
Proof. See Appendix B.
Equation (18) shows that the response of labor effort to the tax reform $\hat{T}$ is the sum of two terms, which reflect both elements (MRS and MCI) of the first-order condition (7). First, the marginal rate of substitution (first expectation in (7)) implies that the change in expected marginal tax rates, $\mathbb{E}[\hat{T}'(w(\theta, \eta))]$, reduces labor supply by the Frisch elasticity $\varepsilon(\theta)$. This is the standard response one would obtain in models with exogenous risk. Second, recall that in our moral hazard environment the optimal effort level $a(\theta)$ is also determined by the marginal cost of incentives (second expectation in (7)). But the MCI is positively related to the progressivity of the tax schedule: with quasilinear utility we have $\mathrm{MCI} \propto \mathrm{Cov}\left( \frac{1}{r(w(\theta, \eta))} ;\ \eta \right)$, which is equal to zero (respectively, positive) when the marginal tax rates are constant (resp., increasing with income). Consequently, starting from an affine tax code, we expect a progressive tax reform, for which the marginal tax rate adjustments $\hat{T}'(\cdot)$ increase with income, to raise the cost of incentive provision, and hence trigger an additional downward adjustment in effort. Formally, this is indeed implied by the negative covariance term in equation (18). Therefore, taking into account the endogeneity of private insurance against output risk magnifies the negative impact of raising tax progressivity on labor effort. In Section 4, we generalize this result to the case of a utility function with income effects and a nonlinear baseline tax schedule and show that the same insight carries over.
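As a rough illustration of equation (18), the sketch below evaluates both terms for two reforms with the same average change in marginal tax rates: a uniform increase and a progressive one. All inputs are hypothetical. In particular, the values of $\hat{T}'$ are listed directly at each shock realization, on the assumption that earnings increase with $\eta$, instead of being derived from an equilibrium contract.

```python
def expect(values, probs):
    """Expectation of a discrete random variable."""
    return sum(q * v for q, v in zip(probs, values))

def covariance(xs, ys, probs):
    """Covariance of two discrete random variables on the same states."""
    mx, my = expect(xs, probs), expect(ys, probs)
    return expect([(x - mx) * (y - my) for x, y in zip(xs, ys)], probs)

def effort_response(tau, frisch, effort, dT_prime, eta, probs):
    """Equation (18): proportional effort change a_hat / a."""
    mean_term = frisch * expect(dT_prime, probs)
    cov_term = covariance(dT_prime, [e / effort for e in eta], probs)
    return -(mean_term + cov_term) / (1.0 - tau)

eta = [-1.0, 0.0, 1.0]          # hypothetical performance shocks
probs = [0.25, 0.5, 0.25]
tau, frisch, effort = 0.3, 0.5, 1.0

uniform = effort_response(tau, frisch, effort, [0.01, 0.01, 0.01], eta, probs)
progressive = effort_response(tau, frisch, effort, [0.0, 0.01, 0.02], eta, probs)
print(uniform, progressive)
```

Both reforms raise the expected marginal tax rate by one percentage point, but the progressive one lowers effort by more because of the covariance term, in line with the discussion above.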
Sufficient-Statistic Approach. In our most general environment, the analytical expressions for the policy elasticities $\frac{\hat{a}(\theta)}{a(\theta)}$ are technically straightforward to derive, but they may fail to deliver sharp comparative statics with respect to the variance of performance shocks $\sigma_\eta^2$ or the strength of moral hazard frictions. Instead, it is standard since Saez (2001) to express these labor supply responses in terms of substitution and income effects that can be estimated in the data, and treat the resulting elasticities as sufficient statistics in our tax incidence analysis (Chetty (2009)). Namely, our tax formulas depend on the empirical values of these parameters, regardless of the underlying structure of the model that generates them; that is, in our case, regardless of whether private insurance against performance shocks is exogenous or endogenous. Our goal is therefore to express the labor supply response $\frac{\hat{a}(\theta)}{a(\theta)}$ to any potential tax reform $\hat{T}$ in terms of standard elasticity parameters that can be estimated independently of the particular reform.
To do so, recall that, by the free-entry condition (5), average earnings conditional on ability $\theta$, $\mathbb{E}[w(\theta, \cdot)]$, are equal to $\theta a(\theta)$. As a consequence, the elasticities of effort $a(\theta)$ with respect to tax changes are equal to the corresponding elasticities of average earnings $\mathbb{E}[w(\theta, \cdot)]$. Note that to evaluate average earnings $\mathbb{E}[w(\theta, \cdot)]$ the econometrician does not need to observe the actual value of ability $\theta$; it is enough to group workers into ordinal ability groups proxied by education, experience, etc. We can thus define the (compensated) elasticity of average earnings of agents with ability $\theta$ with respect to the retention rate at income level $w(\theta, \eta)$ by
$$\varepsilon_{Ew,r}(\theta, \eta) \equiv \frac{r(w(\theta, \eta))}{\mathbb{E}[w(\theta, \cdot)]}\, \frac{\partial\, \mathbb{E}[w(\theta, \cdot)]}{\partial r(w(\theta, \eta))}. \tag{19}$$
We also define the income effect parameter as the semi-elasticity of average earnings of agents with ability $\theta$ with respect to a lump-sum transfer at income level $w(\theta, \eta)$, that is,
$$\varepsilon_{Ew,R}(\theta, \eta) \equiv \frac{1}{\mathbb{E}[w(\theta, \cdot)]}\, \left. \frac{\partial\, \mathbb{E}[w(\theta, \cdot)]}{\partial R(w(\theta, \eta))} \right|_{U}. \tag{20}$$
These elasticities can be estimated empirically, and their explicit analytical expressions in terms of primitives are given in Appendix B.
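Proposition 3. The first-order effect of the tax reform $\hat{T}$ on labor effort $a(\theta)$ can be expressed as
$$\frac{\hat{a}(\theta)}{a(\theta)} = -\mathbb{E}\left[ \varepsilon_{Ew,r}(\theta, \eta)\, \frac{\hat{T}'(w(\theta, \eta))}{r(w(\theta, \eta))} \right] + \mathbb{E}\left[ \varepsilon_{Ew,R}(\theta, \eta)\, \frac{\hat{T}(w(\theta, \eta))}{w(\theta, \eta)} \right] \tag{21}$$
where $\varepsilon_{Ew,r}(\theta, \eta)$ and $\varepsilon_{Ew,R}(\theta, \eta)$ are defined in (19) and (20), respectively.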
Proof. See Appendix B.
The interpretation of Proposition 3 is standard. An increase in the marginal tax rate by $\hat{T}'(w(\theta, \eta))$ (resp., an increase in the average tax rate by $\frac{\hat{T}(w(\theta, \eta))}{w(\theta, \eta)}$) at the income level $w(\theta, \eta)$ affects the optimal effort level $a(\theta)$ in proportion to the compensated elasticity $\varepsilon_{Ew,r}(\theta, \eta)$ (resp., the income effect parameter $\varepsilon_{Ew,R}(\theta, \eta)$). The intuition underlying these substitution and income effects is the same as in the standard model of nonlinear income taxation. Namely, an increase in marginal tax rates (respectively, in lump-sum liabilities) lowers (resp., raises) the worker's optimal effort level by creating a wedge between the marginal rate of substitution and the marginal rate of transformation in the optimality condition (7). The only difference is that in our framework, it is the firm rather than the worker that chooses how much effort should optimally be provided, and it achieves this by spreading or compressing the earnings schedule. Nevertheless, standard methods of estimating taxable income elasticities would give the correct values for the parameters (19) and (20).
3 Aggregate Effects of Tax Reforms
In this section, we use our tax incidence results of Section 2 to characterize the aggregate costs and benefits of tax reforms. We introduce the government and define formally the concepts of excess burden and welfare gains of policies in Section 3.1. We derive the theoretical results in Section 3.2. Readers primarily interested in the incidence of tax reforms on individual performance-pay contracts can skip to Section 4.
3.1 Government
In our model, the government observes both between- and within-group inequality, that is, earnings differences due to ex-ante ability $\theta$ (proxied by education, experience, etc.) and ex-post job-specific shocks $\eta$. However, taxes and transfers can only be conditioned on realized earnings $w$ and not on ability $\theta$. The labor income tax schedule is a function $T \in C^2(\mathbb{R}_+, \mathbb{R})$.
Government Revenue and Social Welfare. Given the tax schedule $T$, government revenue is given by
$$\mathcal{R}(T) = \int_\Theta \mathbb{E}\left[ T(w(\theta, \eta)) \right] dF(\theta). \tag{22}$$
Throughout the paper, we assume that the government faces an exogenous expenditure requirement $G \geq 0$. Any extra revenue is used for redistribution between workers with different (uninsurable) levels of ability $\theta$. Social welfare is evaluated by a weighted-utilitarian functional
$$\mathcal{W}(T) = \int_\Theta \alpha(\theta)\, U(\theta)\, dF(\theta), \tag{23}$$
where the map of Pareto weights $\theta \mapsto \alpha(\theta)$ is positive, decreasing, and satisfies $\int_\Theta \alpha(\theta)\, dF(\theta) = 1$.
Mechanical Effect of Tax Reforms. Consider a tax reform $\hat{T}$ of the initial tax schedule $T$. The mechanical, or statutory, effect of this reform is equal to its impact on government revenue assuming that everyone's earnings remain fixed. It is given by
$$ME(T, \hat{T}) = \int_\Theta \mathbb{E}\left[ \hat{T}(w(\theta, \eta)) \right] dF(\theta). \tag{24}$$
That is, in the absence of endogenous earnings responses, government revenue would simply change by the sum of (positive or negative) additional tax payments $\hat{T}(w(\theta, \eta))$ of all agents.
Excess Burden of Tax Reforms. The excess burden, or deadweight loss, of a tax reform $\hat{T}$ is (minus) the change in government revenue caused by the endogenous earnings adjustments. Since the government retains a share $T'(w(\theta, \eta))$ of the workers' earnings gains or losses $\hat{w}(\theta, \eta)$, the excess burden is given by
$$EB(T, \hat{T}) = -\int_\Theta \mathbb{E}\left[ T'(w(\theta, \eta))\, \hat{w}(\theta, \eta) \right] dF(\theta), \tag{25}$$
where $\hat{w}(\theta, \eta)$ is given by (14). For instance, if a tax reform mechanically raises one dollar of revenue absent earnings adjustments, but causes distortions (say, reductions in labor supply) which lower government revenue by 20 cents, then the marginal excess burden is equal to a fraction 20% of the mechanical effect. The total impact of the tax reform on government budget (that is, the Gateaux derivative of the tax revenue functional $\mathcal{R}(T)$) is therefore equal to $\hat{\mathcal{R}}(T, \hat{T}) = ME(T, \hat{T}) - EB(T, \hat{T})$.
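The one-dollar example can be reproduced with equations (24) and (25) for a single ability group. The inputs below are hypothetical and chosen only so that the numbers match the example: a flat 25% marginal tax rate, a reform that raises every worker's liability by one dollar, and an assumed earnings response of minus 80 cents at every shock realization.

```python
probs = [0.25, 0.5, 0.25]           # three shock realizations, one ability group

def expect(values):
    """Expectation over the shock realizations."""
    return sum(q * v for q, v in zip(probs, values))

T_hat = [1.0, 1.0, 1.0]             # reform: one extra dollar of tax for everyone
T_prime = [0.25, 0.25, 0.25]        # initial marginal tax rates (assumed)
w_hat = [-0.8, -0.8, -0.8]          # assumed first-order earnings response

mechanical = expect(T_hat)                                          # equation (24)
excess_burden = -expect([m * d for m, d in zip(T_prime, w_hat)])    # equation (25)
revenue_change = mechanical - excess_burden

print(mechanical, excess_burden, revenue_change)
```

The reform mechanically raises one dollar, the earnings response costs 20 cents of revenue, and the net revenue gain is 80 cents. With several ability groups, each term would additionally be integrated over the distribution of $\theta$.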
Welfare Gains of Tax Reforms. The welfare gain of a tax reform $\hat{T}$ is the change in social welfare $\mathcal{W}(T)$ that it causes, expressed in monetary units. The change in social welfare is equal to $\hat{\mathcal{W}}(T, \hat{T}) = \int \alpha(\theta)\, \hat{U}(\theta)\, dF(\theta)$, where $\hat{U}(\theta)$ is the change in expected utility incurred by agents with ability $\theta$ and $\alpha(\theta)$ measures their weight in the social objective. To convert this welfare measure into units of revenue, consider another, benchmark reform $\hat{T}^*$ within the available set of tax instruments, that costs one dollar of revenue.19 Let the marginal value of public funds $\lambda$ be the increase in social welfare brought about by this reform $\hat{T}^*$. We then define the welfare gain of the tax reform $\hat{T}$ by
$$WG(T, \hat{T}) = \frac{1}{\lambda} \int_\Theta \alpha(\theta)\, \hat{U}(\theta)\, dF(\theta). \tag{26}$$
Optimum Tax Schedule. The optimal tax schedule is such that no tax reform of the initial tax schedule that keeps the government budget constraint satisfied has a positive first-order impact on social welfare. We show in Appendix D that the optimal tax schedule (respectively, the optimum within a restricted class of tax instruments) is characterized by
$$EB(T, \hat{T}) = ME(T, \hat{T}) + WG(T, \hat{T}), \tag{27}$$
for all tax reforms $\hat{T}$ (resp., all tax reforms in the restricted class). In other words, the marginal cost and marginal benefit of any reform must be equal at the optimum.

19 If universal lump-sum taxes and transfers are available, as in Mirrlees (1971), we naturally choose $\hat{T}^*$ to be a uniform lump-sum transfer. We show in Appendix B that a lump-sum transfer of one dollar per worker is represented by the constant function $-1$ for all $w$. Denote by $\hat{\mathcal{R}}(T, -1) < 0$ the loss in government revenue from this transfer, once all behavioral responses have been taken into account. Then the reform $\hat{T}^*(w) = -1 / |\hat{\mathcal{R}}(T, -1)|$ for all $w$ is a uniform lump-sum transfer that reduces government budget by one dollar by construction. If instead the policy is restricted to the CRP tax schedules as in Section 4, the benchmark reform $\hat{T}^*$ consists of a decrease in the parameter $\tau$, normalized analogously to yield one dollar of revenue. See Appendix D for details.
3.2 Excess Burden and Welfare Gains
Substituting expression (14) for $\hat{w}(\theta, \eta)$ into equation (25) yields the following characterization. This is the second main result of our paper.
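Theorem 2. The excess burden of the tax reform $\hat{T}$ is given by
$$\begin{aligned} EB(T, \hat{T}) = &-\int_\Theta \mathbb{E}\left[ T'(w(\theta, \eta))\, \hat{w}_{ex}(\theta, \eta) \right] dF(\theta) \\ &- \sum_{i \in \{co, pp\}} \int_\Theta \mathrm{Cov}\left( T'(w(\theta, \eta)),\ \hat{w}_i(\theta, \eta) \right) dF(\theta). \end{aligned} \tag{28}$$
The welfare gains of the tax reform $\hat{T}$ are given by
$$WG(T, \hat{T}) = -\frac{1}{\lambda} \int_\Theta \mathbb{E}\left[ \alpha(\theta; \eta)\, u'\left( R(w(\theta, \eta)) \right) \hat{T}(w(\theta, \eta)) \right] dF(\theta), \tag{29}$$
where the modified social welfare weights are given by $\alpha(\theta; \eta) \equiv \frac{(v'(w(\theta, \eta)))^{-1}}{\mathbb{E}[(v'(w(\theta, \cdot)))^{-1}]}\, \alpha(\theta)$.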
Proof. See Appendix D.
Formulas (28) and (29) can be used in practice to evaluate whether a concrete tax reform proposal has a positive or negative effect on government revenue and social welfare, starting from any (not necessarily optimal) tax code.
Excess Burden of Tax Reforms. The first integral in (28) is equal to the deadweight loss one would obtain in standard models with exogenous risk; recall that in this case, the tax reform affects earnings via the standard labor supply channel by $\hat{w}_{ex}(\theta, \eta) = w(\theta, \eta)\, \frac{\hat{a}(\theta)}{a(\theta)}$. This deadweight loss depends on the average earnings elasticities summarized in $\frac{\hat{a}(\theta)}{a(\theta)}$, as described in Proposition 3. This integral is analogous to those typically derived in the optimal taxation literature (see for instance Saez (2001)).
Now consider the case where the endogeneity of private insurance is taken into account. Recall that the full adjustment to earnings $\hat{w}(\theta, \eta)$ has the same mean as in the frictionless benchmark. Thus, any fiscal externalities due to moral hazard must come from the change in earnings risk due to the crowding-out and the performance-pay effects $\hat{w}_{co}(\theta, \eta)$, $\hat{w}_{pp}(\theta, \eta)$. The first implication of Corollary 2 is that if the marginal tax rates $T'(w(\theta, \eta))$ are initially constant, as in Chetty and Saez (2010), both covariances in the second line of equation (28) are equal to zero. Therefore, the excess burden of the tax reform is the same as in the standard model, despite the presence of moral hazard frictions and endogenous risk. In other words, the performance-based nature of contracts and the endogenous crowding-out of private insurance do not give rise to additional fiscal externalities when the tax code $T$ is initially affine, even if the tax reform $\hat{T}$ itself is highly nonlinear.20
Consider finally the case where the tax schedule $T$ is initially nonlinear. In this case, the covariances in (28) are no longer equal to zero and capture the impact on government budget of the novel sources of earnings risk highlighted in formula (14). Suppose for concreteness that the tax schedule is initially progressive, that is, the marginal tax rates $T'(\cdot)$ are increasing. In this case, $\mathrm{Cov}\left( T'(w(\theta, \eta)),\ \hat{w}_i(\theta, \eta) \right) > 0$ whenever $\frac{\partial \hat{w}_i(\theta, \eta)}{\partial \eta} > 0$, that is, whenever the sensitivity of pre-tax earnings to performance shocks increases following the reform. Therefore, a spread (resp., contraction) of the earnings distribution causes a positive (resp., negative) fiscal externality, that is, a first-order gain (resp., loss) in government revenue. This is a consequence of Jensen's inequality: a progressive tax schedule, whose marginal rates increase with income, is convex in earnings and therefore generates more tax revenue for the government if earnings are more volatile, keeping their mean constant. For budget purposes, the government is therefore tempted to induce an increase in the dispersion of pre-tax earnings.
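The Jensen's inequality argument can be illustrated with the CRP schedule $T(w) = w - \frac{1-\tau}{1-p} w^{1-p}$ used in Section 4, whose marginal rates increase with income when $0 < p < 1$. The sketch below compares expected tax revenue under two hypothetical two-point earnings distributions with the same mean; the parameter values are assumptions chosen for illustration.

```python
tau, p = 0.3, 0.2   # assumed CRP parameters, with p > 0 (progressive schedule)

def crp_tax(w):
    """CRP tax schedule T(w) = w - (1 - tau) / (1 - p) * w ** (1 - p)."""
    return w - (1 - tau) / (1 - p) * w ** (1 - p)

def expected_revenue(earnings):
    """Average tax payment over equally likely earnings realizations."""
    return sum(crp_tax(w) for w in earnings) / len(earnings)

narrow = [0.9, 1.1]   # low earnings risk, mean 1
wide = [0.4, 1.6]     # mean-preserving spread of the same distribution

rev_narrow = expected_revenue(narrow)
rev_wide = expected_revenue(wide)
print(rev_narrow, rev_wide)
```

Expected revenue is higher under the more dispersed distribution even though mean earnings are identical, which is the positive fiscal externality of an earnings spread described above. With $p = 0$ the schedule is affine and the two revenues coincide.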
Welfare Gains of Tax Reforms. In the benchmark model with exogenous risk, the increase in social welfare achieved by giving one additional unit of consumption (say, via a tax break) to agents with ability $\theta$ and performance shock $\eta$ is given by the marginal social welfare weight $\alpha(\theta)\, u'(R(w(\theta, \eta)))$, equal to their marginal utility of consumption times their weight $\alpha(\theta)$ in the social objective. Now, in the environment with moral hazard frictions, the crowding-out of private insurance by tax policy highlighted in Theorem 1 has welfare consequences that must be taken into account.21

20 In particular, consider the highest-income earners for whom performance-pay contracts are particularly prevalent, and suppose that their baseline income absent any bonus (that is, their earnings given the lowest realization $\underline{\eta}$ of the performance shock) is located in the highest tax bracket. Then the performance-sensitivity of their salary is irrelevant for their contribution to government revenue.
Indeed, we saw in Section 2.2 that the tax break raises the expected utility of the workers (Proposition 2), as in a standard model with exogenous risk. Crucially, however, recall that this utility gain must be shared uniformly across agents in order to preserve their effort incentives. But since the marginal utility is decreasing, this implies that workers with a higher output realization $y$ end up getting a larger increase in consumption. As a result, expression (29) implies that the marginal social welfare weights that would arise in the benchmark model are now weighted by the share $\frac{(v'(w(\theta, \eta)))^{-1}}{\mathbb{E}[(v'(w(\theta, \cdot)))^{-1}]}$ of the tax cut that workers actually receive. These weights are regressive: richer agents end up with higher effective welfare weights in the social objective. Intuitively, tax cuts accrue mostly to the highest-performing agents of a given ability group. They are thus less efficiently targeted than in the standard model without firm intermediation, in which the government could directly alter workers' consumption. This regressive distribution of rents in turn reduces the welfare benefits of providing social insurance compared to the exogenous-risk environment.
4 The Loglinear Framework
In this section we introduce a special case of our general model that allows us to
derive sharp consequences of our results of Sections 2 and 3. In particular, under the
following functional form restrictions the equilibrium labor contract is loglinear and
our tax incidence formulas become particularly transparent.
Assumption 2. The utility of consumption is logarithmic, $u(c) = \log c$. The Frisch elasticity of labor supply $\varepsilon > 0$ is constant, that is, $h(a) = \left(1 + \frac{1}{\varepsilon}\right)^{-1} a^{1+\frac{1}{\varepsilon}}$. The performance shocks are normally distributed, $\eta \sim N(0, \sigma_\eta^2)$. The tax schedule has a constant rate of progressivity (CRP),22 that is, there exist $\tau \in \mathbb{R}$ and $p < 1$ such that
\[ T(w) = w - \frac{1-\tau}{1-p}\, w^{1-p}. \]
21 Because of the envelope theorem, the performance-pay effect (equation (16)) induces only second-order welfare gains or losses. The first part of crowding-out (first term in equation (15)) also keeps welfare constant since it ensures that workers' consumption remains fixed.
22 The CRP tax code is a good approximation of the U.S. tax system, see for instance Heathcote, Storesletten, and Violante (2017). The rate of progressivity p is equal to (minus) the elasticity of the retention rate $1 - T'(w)$ with respect to income w. Alternatively, $1 - p$ is equal to the ratio of
We characterize the equilibrium labor contract in Section 4.1. We then focus on a
particular tax reform, namely, an increase in the (constant) rate of progressivity of the
initial tax schedule. We derive the incidence of this reform on earnings and utilities
in Section 4.2, and its excess burden and welfare gains in Section 4.3. We conclude
in Section 4.4 by characterizing the optimal rate of progressivity in this economy. In
Appendices G and H, we generalize our analysis of the optimal rate of progressivity to
the dynamic environment of Edmans, Gabaix, Sadzik, and Sannikov (2012).
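The CRP schedule of Assumption 2 can be checked numerically. The sketch below is illustrative only: the function names are ours, and the value of τ is a hypothetical choice (only p = 0.181 is taken from the estimate cited in Section 5.2). It verifies that the ratio of the marginal retained share to the average retained share equals 1 − p at every income level.

```python
def crp_tax(w, tau, p):
    """Tax liability T(w) = w - (1 - tau) / (1 - p) * w**(1 - p), for w > 0."""
    return w - (1.0 - tau) / (1.0 - p) * w ** (1.0 - p)


def crp_marginal_rate(w, tau, p):
    """Marginal tax rate T'(w) = 1 - (1 - tau) * w**(-p)."""
    return 1.0 - (1.0 - tau) * w ** (-p)


def retention_ratio(w, tau, p):
    """Marginal retained share 1 - T'(w) over average retained share 1 - T(w)/w."""
    marginal = 1.0 - crp_marginal_rate(w, tau, p)
    average = 1.0 - crp_tax(w, tau, p) / w
    return marginal / average


# p is the empirical estimate cited in Section 5.2; tau = 0.1 is a hypothetical value.
tau, p = 0.1, 0.181
ratios = [retention_ratio(w, tau, p) for w in (0.5, 1.0, 5.0, 50.0)]
print(ratios)  # each entry should coincide with 1 - p, independently of w and tau
```

The ratio does not depend on τ, which only shifts the level of the schedule; this is what makes p a pure measure of progressivity.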
4.1 Equilibrium Labor Contract
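Under these assumptions, the labor contract characterized in Proposition 1 can be simplified as follows.
Corollary 1. Suppose that Assumption 2 holds. Denote by $\psi \equiv \frac{\partial \log w(\theta,\eta)}{\partial \eta}$ the pass-through of performance shocks to log-earnings. The earnings schedule is log-linear and given by
\[ \log w(\theta,\eta) = \log(\theta a) + \psi\eta - \frac{1}{2}\psi^2\sigma_\eta^2 \quad \text{with} \quad \psi = \frac{a^{1/\varepsilon}}{1-p}. \tag{30} \]
Effort a is independent of θ and satisfies
\[ a = \left[(1-p)\left(1 - \varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2\right)\right]^{\frac{\varepsilon}{1+\varepsilon}}, \tag{31} \]
where $\varepsilon_{\psi,a} \equiv \frac{\partial \log \psi}{\partial \log a} = \frac{1}{\varepsilon}$. Expected utility is given by
\[ U(\theta) = \log\left(R(\theta a)\right) - h(a) - \frac{1}{2}(1-p)\,\psi^2\sigma_\eta^2. \tag{32} \]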
Proof. See Appendix E.
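As a quick consistency check on (30), the term $-\frac{1}{2}\psi^2\sigma_\eta^2$ is exactly what keeps mean earnings at $\theta a$. The short derivation below uses only the normality of η from Assumption 2 and the moment-generating function of a normal variable; it is a sketch for the reader, not part of the proof in Appendix E.

```latex
\begin{align*}
\mathbb{E}\left[w(\theta,\eta)\right]
  &= \theta a \; \mathbb{E}\left[e^{\psi\eta}\right] e^{-\frac{1}{2}\psi^2\sigma_\eta^2}
   = \theta a \; e^{\frac{1}{2}\psi^2\sigma_\eta^2}\, e^{-\frac{1}{2}\psi^2\sigma_\eta^2}
   = \theta a, \\
\operatorname{Var}\left(\log w(\theta,\eta) \mid \theta\right)
  &= \psi^2\sigma_\eta^2 .
\end{align*}
```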
In the standard Mirrlees (1971) model, the worker's wage rate is equal to her marginal productivity θ, and her earnings are θa. In our more general model, equation (30) implies that the firm designs an incentive-based compensation contract that has mean θa, but is also dispersed around this mean. The amount of risk to which the firm exposes the worker is summarized by the constant (that is, independent of η) pass-through ψ of performance shocks to log-earnings.
22 (continued) marginal retained income $1 - T'(w)$ to average retained income $1 - T(w)/w$.
Crucially, this parameter $\psi \equiv \frac{\partial \log w(\theta,\eta)}{\partial \eta} = \frac{a^{1/\varepsilon}}{1-p}$ is endogenous to policy: it depends on the rate of progressivity p of the tax schedule both directly and indirectly through the optimal effort level a. If the elasticities
\[ \varepsilon_{\psi,a} = \frac{\partial \log \psi}{\partial \log a} \quad \text{and} \quad \varepsilon_{\psi,1-p} = \frac{\partial \log \psi}{\partial \log (1-p)} \tag{33} \]
were both equal to zero, the model would be equivalent to one with an exogenous and uninsurable shock η analogous to θ.
In general, however, the elasticity $\varepsilon_{\psi,a}$ is positive and measures the strength of the moral hazard friction: it determines how much more exposure to performance shocks is necessary to elicit a higher level of effort from the agent. Since $\varepsilon_{\psi,a} = \frac{1}{\varepsilon}$, our model implies that the sensitivity of earnings risk to the desired effort level is inversely proportional to the Frisch elasticity of labor supply. If in response to a tax reform the firm wants to reduce the effort provided by the worker, it implements it by reducing her exposure to risk, that is, by providing more insurance against performance shocks. In the sequel we refer to $\varepsilon_{\psi,a}$ as the performance-pay elasticity.
Second, the elasticity $\varepsilon_{\psi,1-p} = -1 < 0$ implies that higher tax progressivity leads to a steeper pre-tax earnings schedule. This means that ceteris paribus (that is, keeping effort constant), public insurance crowds out private insurance against output risk. Intuitively, this is because an increase in tax progressivity compresses the disposable income distribution and thus reduces the amount of risk that workers are effectively facing; as a response, the firm spreads out the pre-tax earnings schedule in order to preserve incentives for effort. In the sequel we refer to $\varepsilon_{\psi,1-p}$ as the crowding-out elasticity.
Finally, equations (31) and (32) imply that the worker's effort and expected utility are both strictly lower in the environment with moral hazard and endogenous private insurance, where $\varepsilon_{\psi,a} > 0$ and $\varepsilon_{\psi,1-p} < 0$, than in the exogenous-risk model where $\varepsilon_{\psi,a} = \varepsilon_{\psi,1-p} = 0$. Effort is a decreasing function of the rate of tax progressivity p.
4.2 Tax Incidence Analysis
Throughout this section we consider a tax reform that marginally raises the rate of progressivity p. This policy is represented by (see Appendix E for technical details)
\[ \hat{T}(w) = \left(\log w - \frac{1}{1-p}\right) \frac{1-\tau}{1-p}\, w^{1-p}, \quad \forall w > 0. \tag{34} \]
To derive the incidence of this tax reform, we can either directly differentiate with respect to p the equilibrium labor contract given by Corollary 1, or apply Theorem 1 to the corresponding function (34). We first derive the impact of this tax reform on labor effort before analyzing its effect on earnings and utility.
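The reform direction in (34) is the derivative of the CRP schedule of Assumption 2 with respect to p. A short derivation, under the assumption that τ is held fixed when p changes:

```latex
\begin{align*}
\frac{\partial}{\partial p}\left[ w - \frac{1-\tau}{1-p}\, w^{1-p} \right]
  &= -(1-\tau)\, \frac{\partial}{\partial p}\left[ \frac{w^{1-p}}{1-p} \right] \\
  &= -(1-\tau) \left[ \frac{w^{1-p}}{(1-p)^2} - \frac{w^{1-p} \log w}{1-p} \right] \\
  &= \left( \log w - \frac{1}{1-p} \right) \frac{1-\tau}{1-p}\, w^{1-p} .
\end{align*}
```

The reform therefore raises tax liabilities for incomes with $\log w > \frac{1}{1-p}$ and lowers them below that threshold.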
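Lemma 2. Suppose that Assumption 2 holds. The elasticity of effort with respect to progressivity $\varepsilon_{a,1-p} = \frac{\partial \log a}{\partial \log (1-p)}$ is given by
\[ \varepsilon_{a,1-p} = \frac{\varepsilon}{1+\varepsilon} \cdot \frac{1 + \varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2}{1 + \frac{1-\varepsilon}{1+\varepsilon}\,\varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2}, \tag{35} \]
where $\varepsilon_{\psi,a} = \frac{1}{\varepsilon}$ denotes the performance-pay elasticity (33). Thus, the labor supply elasticity $\varepsilon_{a,1-p}$ is strictly larger in the presence of moral hazard ($\varepsilon_{\psi,a} > 0$) than in the benchmark model with exogenous risk ($\varepsilon_{\psi,a} = 0$).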
Proof. See Appendix E.
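Equations (30), (31) and (35) can be cross-checked numerically. The sketch below is a minimal illustration, not the computation used in the paper: it solves (31) for effort by bisection and compares a finite-difference elasticity of effort with the closed form (35). The function names and the value of the shock variance are our own assumptions.

```python
import math


def solve_effort(p, eps, sigma2, iterations=200):
    """Solve equation (31) for effort a by bisection.

    Substituting psi = a**(1/eps) / (1 - p) into (31) and rearranging gives
        a**((1 + eps)/eps) + a**(2/eps) * sigma2 / (eps * (1 - p)) = 1 - p,
    whose left-hand side is increasing in a, so the root is unique.
    """
    x = 1.0 - p

    def excess(a):
        return a ** ((1 + eps) / eps) + a ** (2 / eps) * sigma2 / (eps * x) - x

    lo, hi = 0.0, 1.0
    while excess(hi) < 0:  # widen the bracket if needed (for example when p < 0)
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pass_through(a, p, eps):
    """Pass-through psi = a**(1/eps) / (1 - p) from equation (30)."""
    return a ** (1 / eps) / (1.0 - p)


def elasticity_closed_form(p, eps, sigma2):
    """Effort elasticity with respect to 1 - p, as in equation (35)."""
    a = solve_effort(p, eps, sigma2)
    x = pass_through(a, p, eps) ** 2 * sigma2 / eps  # eps_{psi,a} * psi^2 * sigma_eta^2
    return eps / (1 + eps) * (1 + x) / (1 + (1 - eps) / (1 + eps) * x)


def elasticity_numeric(p, eps, sigma2, h=1e-5):
    """Central difference of log a with respect to log(1 - p)."""
    x = 1.0 - p
    a_up = solve_effort(1.0 - x * math.exp(h), eps, sigma2)
    a_dn = solve_effort(1.0 - x * math.exp(-h), eps, sigma2)
    return (math.log(a_up) - math.log(a_dn)) / (2 * h)


# eps and p follow the values used in Section 5.2; sigma2 = 0.35 is a hypothetical choice.
eps, p, sigma2 = 0.5, 0.181, 0.35
a = solve_effort(p, eps, sigma2)
closed = elasticity_closed_form(p, eps, sigma2)
numeric = elasticity_numeric(p, eps, sigma2)
benchmark = eps / (1 + eps)  # elasticity in the exogenous-risk model
print(a, pass_through(a, p, eps), closed, numeric, benchmark)
```

Under these assumed parameters the two elasticity computations should agree up to numerical error, and both should exceed the exogenous-risk value ε/(1+ε), in line with Lemma 2.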
Equation (35) gives an analytical expression for the labor supply elasticity that drives the response of labor supply to the tax reform (34), $\frac{\hat{a}(\theta)}{a(\theta)} = -\frac{1}{1-p}\,\varepsilon_{a,1-p}$. This elasticity is strictly larger in an economy with moral hazard and endogenous private insurance than in the benchmark setting with exogenous risk. Specifically, in the polar case where $\varepsilon_{\psi,a} = 0$, we obtain $\varepsilon_{a,1-p} = \frac{\varepsilon}{1+\varepsilon}$, which is an increasing function of the Frisch elasticity ε. Instead, when the exposure to risk varies endogenously with effort so that $\varepsilon_{\psi,a} > 0$, we have $\varepsilon_{a,1-p} > \frac{\varepsilon}{1+\varepsilon}$. The intuition underlying this result is analogous to the case of the utility without income effects analyzed in Lemma 1. Recall that the marginal cost of eliciting higher effort is equal to the expected marginal rate of substitution (MRS, first term in (7)) plus, in the presence of moral hazard, the expected marginal cost of incentive provision (MCI, second term in (7)). But the latter increases when the tax code becomes more progressive, since it is proportional to $\frac{1}{v'(w(\theta,\eta))} = \frac{w(\theta,\eta)}{1-p}$ if the utility is logarithmic and the tax schedule is CRP. Note that the greater the Frisch elasticity, the more the standard model underestimates the true distortionary cost of raising tax progressivity.
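Corollary 2. The impact of an increase in progressivity on earnings is given by formula (14), where the standard adjustment in a model with exogenous risk is given by
\[ \frac{\hat{w}_{ex}(\theta,\eta)}{w(\theta,\eta)} = -\frac{1}{1-p}\,\varepsilon_{a,1-p}, \tag{36} \]
the crowding-out effect is given by23
\[ \frac{\hat{w}_{co}(\theta,\eta)}{w(\theta,\eta)} = -\frac{1}{1-p}\,\varepsilon_{\psi,1-p}\left(\psi\eta - \psi^2\sigma_\eta^2\right), \tag{37} \]
and the performance-pay effect is given by
\[ \frac{\hat{w}_{pp}(\theta,\eta)}{w(\theta,\eta)} = -\frac{1}{1-p}\,\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\left(\psi\eta - \psi^2\sigma_\eta^2\right), \tag{38} \]
where $\varepsilon_{\psi,a} = \frac{1}{\varepsilon}$ and $\varepsilon_{\psi,1-p} = -1$ denote the pass-through elasticities (33). Overall, pre-tax earnings are strictly more exposed to output risk after the reform.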
Proof. See Appendix E.
Equations (36) to (38) give closed-form expressions for the three sources of earnings adjustments caused by the tax reform: the standard labor supply effect, the crowding-out effect, and the performance-pay effect. The first, $\hat{w}_{ex}(\theta,\eta)$, is straightforward: it simply states that if wage rates are exogenous, the percentage earnings response to the reform, $\frac{\hat{w}_{ex}(\theta,\eta)}{w(\theta,\eta)}$, is equal to the percentage effort response, $\frac{\hat{a}(\theta)}{a(\theta)} = -\frac{1}{1-p}\,\varepsilon_{a,1-p}$. Taking into account the endogeneity of private insurance yields the other two effects, $\hat{w}_{co}(\theta,\eta)$ and $\hat{w}_{pp}(\theta,\eta)$. As explained in Section 4.1 above, the crowding-out effect strictly raises the amount of risk to which workers are exposed, measured by the pass-through of performance shocks to log-earnings, via the crowding-out elasticity $\varepsilon_{\psi,1-p} < 0$. This pre-tax earnings adjustment ensures that their incentives are preserved. On the other hand, the reform lowers the optimal effort that firms would like workers to exert by $\varepsilon_{a,1-p} > 0$. This reduction in effort is elicited by improving the worker's insurance against output shocks. Thus, the performance-pay effect strictly reduces exposure to risk via the performance-pay elasticity $\varepsilon_{\psi,a} > 0$. This indirect increase in private insurance counteracts the direct crowding-out response to the policy change.
23 The proof in the Appendix provides the decomposition of $\hat{w}_{co}(\theta,\eta)$ into its two effects highlighted in equation (15).
Overall, using the structural expression (35) for the labor supply elasticity $\varepsilon_{a,1-p}$, we can easily show that
\[ \varepsilon_{\psi,1-p} + \varepsilon_{\psi,a}\,\varepsilon_{a,1-p} < 0. \tag{39} \]
As a consequence, we obtain that the total earnings adjustment due to moral hazard frictions, $\hat{w}_{co}(\theta,\eta) + \hat{w}_{pp}(\theta,\eta)$, unambiguously leads to an increase in the sensitivity of pre-tax log-earnings to performance shocks. That is, the crowding-out effect outweighs the performance-pay effect and private insurance is reduced on net.
The Crowding-Out and Performance-Pay Effects Almost Offset Each Other. We now study the relative magnitude of the crowding-out and the performance-pay effects. Our conclusion is that, while each of them taken separately has a large impact on the structure of compensation, on net they almost fully offset each other so that the earnings schedule is only barely riskier following an increase in tax progressivity.
Recall that the crowding-out elasticity $\varepsilon_{\psi,1-p}$ is equal to 1 in absolute value: this is equivalent to saying that, keeping effort constant, the variance of log-consumption remains constant after the tax reform and the endogenous earnings adjustment. The performance-pay effect, on the other hand, is driven by the optimal change in effort given by the labor supply elasticity $\varepsilon_{a,1-p}$. The firm implements this change in effort by adjusting the sensitivity of the contract via the pass-through elasticity $\varepsilon_{\psi,a}$. Why does this performance-pay effect $\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}$ have the same order of magnitude as the direct crowding-out adjustment $\varepsilon_{\psi,1-p}$?
The key insight is that the performance-pay elasticity $\varepsilon_{\psi,a}$ is proportional to the inverse of the (Frisch) elasticity of labor supply ε. This is an immediate consequence of our key result that the slope of the contract (6) (or (30) in the loglinear model) is equal to the marginal disutility of labor $h'(a(\theta))$. Thus, to raise the worker's effort by $\hat{a}$, the firm must raise the slope of the contract in percentage terms by $\frac{h''(a(\theta))\,\hat{a}}{h'(a(\theta))} = \frac{1}{\varepsilon(\theta)}\,\frac{\hat{a}}{a}$, where $\varepsilon(\theta)$ is the Frisch elasticity of labor supply.
As a result, to the extent that the labor supply elasticity $\varepsilon_{a,1-p}$ has a similar order of magnitude as the Frisch elasticity of labor supply ε, the performance-pay effect is approximately equal to $\frac{1}{\varepsilon} \times \varepsilon_{a,1-p} \approx 1$, that is, about the same as the crowding-out effect. Crucially, this result is robust to the value of the labor supply elasticity. Indeed, if the labor supply elasticity is small, so that effort moves only a little in response to a tax change, then the pass-through must mechanically increase by a large amount in order to elicit this change in effort, so that the product of the two elasticities is always approximately equal to 1. Intuitively, if labor supply is very inelastic, effort will barely change in response to tax reforms; but precisely because of this inelastic behavior, a very large change in performance-sensitivity will then be necessary to convince workers to adjust their effort by this small amount.
The previous discussion is correct if the labor supply elasticity $\varepsilon_{a,1-p}$ (or, more generally, $\hat{a}/a$ in our general model) is indeed approximately equal to the Frisch elasticity ε. In practice, this need not be exactly the case. Formula (35) gives the structural expression for $\varepsilon_{a,1-p}$ as a function of ε and the variance of performance shocks $\sigma_\eta^2$. We showed that the endogeneity of private insurance raises the labor supply elasticity with respect to an increase in tax progressivity, relative to the benchmark setting with exogenous risk. Therefore, we know that $\varepsilon_{a,1-p}$ must be at least as large as its value in this environment, namely, $\frac{\varepsilon}{1+\varepsilon}$.24 We therefore have
\[ \varepsilon_{\psi,a}\,\varepsilon_{a,1-p} > \frac{1}{\varepsilon} \times \frac{\varepsilon}{1+\varepsilon} = \frac{1}{1+\varepsilon}. \]
For a Frisch elasticity $\varepsilon \approx \frac{1}{2}$, this means that the performance-pay effect offsets at least $\frac{1}{1+0.5} = \frac{2}{3}$ of the crowding-out effect $\varepsilon_{\psi,1-p} = -1$. To refine this estimate, we use the structural expression for the labor supply elasticity $\varepsilon_{a,1-p}$ derived in Lemma 2. A Taylor approximation in $\psi^2\sigma_\eta^2$ yields
\[ -\frac{\hat{w}_{co}(\theta,\eta)}{\hat{w}_{pp}(\theta,\eta)} = \frac{\varepsilon}{\varepsilon_{a,1-p}} \approx 1 + \left(\varepsilon - 2\psi^2\sigma_\eta^2\right). \tag{40} \]
Our calibration in Section 5.2 implies that the variance of earnings conditional on ability that best matches the data is equal to $\psi^2\sigma_\eta^2 \approx 0.2$. We thus get $\left(\varepsilon - 2\psi^2\sigma_\eta^2\right) \approx 0.1$, so that the earnings adjustment due to crowding-out amounts to about 110% (in absolute value) of the adjustment caused by the performance-pay effect. In other words, the labor supply responses offset about 90% of the crowding-out of private insurance by tax progressivity.
24 This value is lower than the Frisch elasticity ε because of the income effects on labor supply.
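The first-order approximation behind (40) can be reconstructed from the closed form (35). The steps below are a sketch; the shorthand $x \equiv \varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2 = \psi^2\sigma_\eta^2/\varepsilon$ is introduced here for convenience and is not notation from the paper.

```latex
\begin{align*}
\frac{\varepsilon}{\varepsilon_{a,1-p}}
  &= (1+\varepsilon)\, \frac{1 + \frac{1-\varepsilon}{1+\varepsilon}\, x}{1 + x}
   \approx (1+\varepsilon) \left[ 1 - \left( 1 - \frac{1-\varepsilon}{1+\varepsilon} \right) x \right] \\
  &= (1+\varepsilon) \left[ 1 - \frac{2\varepsilon}{1+\varepsilon}\, x \right]
   = 1 + \varepsilon - 2\varepsilon x
   = 1 + \left( \varepsilon - 2\psi^2\sigma_\eta^2 \right).
\end{align*}
```

With $\varepsilon = 0.5$ and $\psi^2\sigma_\eta^2 = 0.2$ the right-hand side equals $1 + (0.5 - 0.4) = 1.1$, which is the 110% figure quoted in the text. The approximation is first-order in x, so it is only as accurate as x is small.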
Discussion. The analysis of our quantitative model in Section 5 confirms that the crowding-out effect (15) and the performance-pay effect (16) are both significant but offset each other almost entirely, so that the overall effect of tax progressivity on earnings risk is small. In Section 5.6 and Appendix B, we analyze both theoretically and numerically the incidence of several other tax reforms: lump-sum tax and marginal tax rate increases on high incomes, and a constant percentage increase in retention rates. For each of these reforms, as in the case of an increase in tax progressivity, the performance-pay effect counteracts, and sometimes even dominates, the direct crowding-out of the private insurance contract. Intuitively, raising marginal tax rates leads to a spread of the pre-tax earnings distribution, but the reduction in labor supply that it causes tends to contract it. Conversely, raising lump-sum tax payments leads to a crowding-in of private pre-tax insurance, but the income effect on labor supply again runs in the opposite direction. These results are consistent with the empirical literature. In particular, Frydman and Molloy (2011) exploit the relative tax advantage of different forms of CEO pay from 1946 to 2005 and find that the structure of compensation responds little to changes in tax rates on labor income. By highlighting the two counteracting forces at play (crowding-out versus performance-pay adjustments), our analysis provides an explanation for these findings.
4.3 Excess Burden and Welfare Gains
We now derive expressions for the excess burden and the welfare gains of raising the
rate of progressivity of the tax schedule.
Corollary 3. Suppose that Assumption 2 holds. Suppose moreover that ability types are lognormally distributed, $\log\theta \sim N(\mu_\theta, \sigma_\theta^2)$. The excess burden of an increase in the rate of progressivity p of the CRP tax schedule is given by
\[ EB = \left(\frac{1}{(1-g)(1-p)} - 1\right)\varepsilon_{a,1-p}\,C + \left(\varepsilon_{\psi,1-p} + \varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\right) p\,\psi^2\sigma_\eta^2\,C, \tag{41} \]
where C is the economy's aggregate private consumption, g is the ratio of government expenditures G to aggregate output Y = C + G, $\varepsilon_{a,1-p}$ is the labor effort elasticity given by (35), and $\varepsilon_{\psi,a} = \frac{1}{\varepsilon}$, $\varepsilon_{\psi,1-p} = -1$ are the pass-through elasticities.
Suppose moreover that the planner is utilitarian, that is, $\alpha(\theta) = 1$ for all θ.25 The welfare gains (including the mechanical effect) of an increase in the rate of progressivity are given by
\[ ME + WG = (1-p)\left(\sigma_\theta^2 + \psi^2\sigma_\eta^2\right) C + \varepsilon_{\psi,1-p}\,\psi^2\sigma_\eta^2\,C. \tag{42} \]
Proof. See Appendix E.
Excess Burden of Raising Progressivity. The first term in the right-hand side of (41), $\left[\left((1-g)(1-p)\right)^{-1} - 1\right]\varepsilon_{a,1-p}\,C$, is the standard deadweight loss from distorting labor effort, that is, the behavioral effect of taxation that would arise in a model with exogenous risk. This effect, equal to the first integral in equation (28), is increasing in the elasticity of effort with respect to progressivity (35) that measures the disincentive effects of raising tax rates, and in the rate of progressivity of the tax code that captures the share of income losses borne by the government as reduced revenue. Moreover, government expenditures raise the excess burden of tax progressivity. Intuitively, this is because a given marginal increase in tax progressivity implies a larger deadweight loss if the tax burden is already large due to high spending needs.
The second part of equation (41) captures the fiscal externalities that arise when private insurance against output risk is endogenous, that is, the two covariance terms in equation (28). The term $\varepsilon_{\psi,1-p}\,p\,(\psi\sigma_\eta)^2\,C$ is the value of $\int \mathrm{Cov}(T', \hat{w}_{co})\,dF$, and the term $\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\,p\,(\psi\sigma_\eta)^2\,C$ is the value of $\int \mathrm{Cov}(T', \hat{w}_{pp})\,dF$. Since $\varepsilon_{\psi,1-p} < 0$, the crowding-out effect contributes to reducing the excess burden of the reform, because it increases the sensitivity of earnings to output risk: by Jensen's inequality, this generates more tax revenue when the tax schedule is initially progressive (p > 0). Conversely, since $\varepsilon_{\psi,a}, \varepsilon_{a,1-p} > 0$, the performance-pay effect contributes to raising the excess burden (lowering government revenue) via a reduction in effort and hence risk exposure. Now recall that, by Corollary 2, the crowding-out effect dominates the performance-pay effect so that the earnings schedule becomes more risky overall. Therefore, conditional on the value of the labor effort elasticity, the deadweight loss of raising taxes is strictly smaller than in the standard model with exogenous risk. However, we showed in Section 4.2 that the crowding-out and performance-pay effects almost fully offset each other. We therefore expect the net positive fiscal externality to be small in magnitude. We confirm this intuition in Section 5.
25 In the Appendix we consider the more general case where the social welfare weights are given by $\alpha(\theta) = \frac{e^{-\alpha\log\theta}}{\int e^{-\alpha\log\theta'}\,dF(\theta')}$ for all θ, where $\alpha \geq 0$.
Welfare Gains of Raising Progressivity. Finally, equation (42) shows that the welfare gains (including the mechanical effect) ME + WG of the tax reform are the sum of two terms. The first, $(1-p)\left(\sigma_\theta^2 + (\psi\sigma_\eta)^2\right)$, captures the insurance gains obtained by raising the rate of progressivity of the tax schedule, as in a standard optimal taxation model. Note that tax progressivity insures both the initial ability differences θ and the performance shock η passed through to earnings, that is, both between- and within-group heterogeneity. The larger their respective variances $\sigma_\theta^2$ and $\psi^2\sigma_\eta^2$ and the lower the initial rate of progressivity $p \in (-\infty, 1)$, the higher the gains of marginally raising progressivity.
However, recall that the private insurance contract adjusts endogenously to the policy reform. Since the pass-through elasticity with respect to progressivity is equal to $\varepsilon_{\psi,1-p} = -1$, the welfare effect of this crowding-out (last term in equation (42)) satisfies
\[ \varepsilon_{\psi,1-p}\,\psi^2\sigma_\eta^2 < \varepsilon_{\psi,1-p}\,(1-p)\,\psi^2\sigma_\eta^2 = -(1-p)\,\psi^2\sigma_\eta^2. \]
As a result, the crowding-out more than fully offsets the additional insurance against performance shocks provided by public policy, $(1-p)\,\psi^2\sigma_\eta^2$. Intuitively, in response to increased public insurance through higher tax progressivity, the firm adjusts the pre-tax earnings contract so that total (public plus private) insurance remains unchanged: there is a one-for-one crowding-out. However, there is an additional force at play. Recall that in our model, tax changes are intermediated by firms rather than being directly distributed to individual workers. We saw that as a consequence, the benefits of a tax cut accrue primarily to the richest workers of a given ability group. This strictly reduces the welfare gains of raising progressivity relative to the benchmark model with exogenous risk.
Note finally that, while the earnings distribution and government revenue remain practically unchanged following a tax reform as the crowding-out and performance-pay effects almost offset each other, the welfare effects of raising progressivity, on the other hand, can be large in magnitude. Indeed, labor supply adjustments cause only second-order changes in welfare by the envelope theorem. As a result, the welfare implications of crowding-out are not counteracted by those of the performance-pay effect. We evaluate quantitatively the welfare cost of ignoring the endogeneity of private insurance in Section 5.3.
4.4 Optimal Rate of Progressivity
We finally gather our results on the excess burden and the welfare gain of tax reforms to characterize the optimal CRP tax schedule.
Proposition 4. Suppose that Assumption 2 holds, that ability types are lognormally distributed, and that the social welfare objective is utilitarian. The optimal rate of progressivity satisfies
\[ \frac{p^*}{(1-p^*)^2} = \frac{\sigma_\theta^2 + \left(1 + \varepsilon_{\psi,1-p}\right)\psi^2\sigma_\eta^2}{\left(1 + \frac{g}{(1-g)\,p^*}\right)\varepsilon_{a,1-p} + (1-p^*)\,\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\,\psi^2\sigma_\eta^2}, \tag{43} \]
where g = G/Y is the ratio of government spending to output, $\varepsilon_{a,1-p}$ is the elasticity of effort with respect to the rate of progressivity given by (35), and $\varepsilon_{\psi,a} = \frac{1}{\varepsilon}$ and $\varepsilon_{\psi,1-p} = -1$ are the pass-through elasticities. In particular, the optimal rate of progressivity is strictly smaller in the model with endogenous private insurance than in the benchmark environment with exogenous risk where $\varepsilon_{\psi,1-p} = \varepsilon_{\psi,a} = 0$.
Proof. See Appendix E.
Formula (43) is obtained by equating the excess burden EB to the welfare gain (including the mechanical effect) ME + WG of raising progressivity, both derived in Corollary 3. Consider first the polar case with exogenous risk, that is, where all earnings differences are attributed to exogenous labor productivity shocks θ and η. In this case, letting $\varepsilon_{\psi,1-p} = \varepsilon_{\psi,a} = 0$ in formula (43) leads to
\[ \frac{p^*}{(1-p^*)^2} = \left(1 + \frac{g}{(1-g)\,p^*}\right)^{-1} \frac{\sigma_\theta^2 + \psi^2\sigma_\eta^2}{\varepsilon_{a,1-p}}. \tag{44} \]
Thus, the optimal rate of progressivity in this model is increasing in the variances of the ability and performance shock distributions, and decreasing in the elasticity of effort $\varepsilon_{a,1-p} = \frac{\varepsilon}{1+\varepsilon}$. In this benchmark setting, the government trades off the benefits of insuring the entire earnings risk, which is determined by the variance of log-earnings $\mathrm{Var}(\log w) = \sigma_\theta^2 + \psi^2\sigma_\eta^2$, with the excess burden of raising progressivity.
Now consider the general model with endogenous partial insurance against performance shocks, so that $\varepsilon_{\psi,1-p} < 0$ and $\varepsilon_{\psi,a} > 0$. Equating the excess burden to the welfare gains of raising progressivity implies that $p^*$ is the solution to
\[ \left(\frac{1}{(1-g)(1-p^*)} - 1\right)\varepsilon_{a,1-p} + \left(\varepsilon_{\psi,1-p} + \varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\right) p^*\,\psi^2\sigma_\eta^2 = (1-p^*)\left[\sigma_\theta^2 + \psi^2\sigma_\eta^2\right] + \varepsilon_{\psi,1-p}\,\psi^2\sigma_\eta^2, \tag{45} \]
from which (43) follows. This formula implies that $p^*$ is strictly decreasing in $\sigma_\eta^2$, and hence that the optimal rate of progressivity is strictly lower than in the previous case. This is because the positive fiscal externality and the welfare loss of crowding-out exactly cancel each other out, as they are respectively equal to $\varepsilon_{\psi,1-p}\,p^*\,\mathrm{Var}(\log w \mid \theta)$ and $\left[(1-p^*) + \varepsilon_{\psi,1-p}\right]\mathrm{Var}(\log w \mid \theta)$, where $\mathrm{Var}(\log w \mid \theta) = \psi^2\sigma_\eta^2$ is the variance of log-earnings conditional on ability θ; with $\varepsilon_{\psi,1-p} = -1$, both equal $-p^*\,\mathrm{Var}(\log w \mid \theta)$. The only remaining term is therefore the negative fiscal externality due to the performance-pay effect, $\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\,p^*\,\mathrm{Var}(\log w \mid \theta)$. This fiscal externality is captured by the second term in the denominator of (43). Overall, a planner who ignores the endogeneity of private insurance would overestimate the optimal rate of tax progressivity.
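The algebra connecting the first-order condition (45) to the formula (43) can be sketched as follows, using $\varepsilon_{\psi,1-p} = -1$ so that the terms $-p^*\psi^2\sigma_\eta^2$ on both sides cancel. This is a reader's reconstruction; the formal proof is in Appendix E.

```latex
\begin{align*}
\left( \frac{1}{(1-g)(1-p^*)} - 1 \right) \varepsilon_{a,1-p}
  + \varepsilon_{\psi,a}\, \varepsilon_{a,1-p}\, p^*\, \psi^2\sigma_\eta^2
  &= (1-p^*)\, \sigma_\theta^2, \\
\text{where} \quad
\frac{1}{(1-g)(1-p^*)} - 1
  &= \frac{p^*}{1-p^*} \left( 1 + \frac{g}{(1-g)\, p^*} \right), \\
\text{so that} \quad
\frac{p^*}{(1-p^*)^2}
  \left[ \left( 1 + \frac{g}{(1-g)\, p^*} \right) \varepsilon_{a,1-p}
  + (1-p^*)\, \varepsilon_{\psi,a}\, \varepsilon_{a,1-p}\, \psi^2\sigma_\eta^2 \right]
  &= \sigma_\theta^2 .
\end{align*}
```

The last line is (43), since its numerator $\sigma_\theta^2 + (1 + \varepsilon_{\psi,1-p})\,\psi^2\sigma_\eta^2$ reduces to $\sigma_\theta^2$ when $\varepsilon_{\psi,1-p} = -1$.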
5 Quantitative Analysis
In Section 5.1 we extend the model of Section 4 to make it suitable for policy analysis. Specifically, we incorporate the coexistence of jobs with and without performance pay and a Pareto tail for productivity types. We calibrate the model to match several key moments of U.S. data in Section 5.2. We then analyze the impact of two tax reforms in Sections 5.3 and 5.4. First, we consider a small reform that increases the rate of progressivity by one percentage point around the current tax code. Second, we study a large reform that nearly doubles the current rate of progressivity and brings the economy to the utilitarian optimum. We finally compare in Section 5.5 the optimal rate of progressivity in our calibrated model with two important benchmarks: the optimum in the model without performance-pay jobs, and the progressivity rate chosen by a government that would wrongly assume that wage risk is exogenous.
5.1 Quantitative Model
We extend the loglinear model of Section 4 by adding the following elements. A share π of workers have a performance-pay job, denoted with a subscript m, and the remaining share 1 − π of workers have a normal job (subscript n). The output of the worker with productivity θ and a job type $j \in \{m, n\}$ is $\theta(a_j + \eta)$, where $a_j$ is the effort level and $\eta \sim N(0, \sigma_{\eta,j}^2)$ is the performance shock. Performance-pay jobs are subject to the agency frictions described in Section 1. At these jobs, the employer observes the output but neither the effort of the worker nor the performance shock and, hence, offers a wage which depends on the stochastic output realization according to the pass-through coefficient ψ. Normal jobs, in contrast, are free from agency frictions and guarantee a risk-free wage.
We treat the job type of a worker as exogenous. In the data, the share of performance-pay jobs increases with earnings (see Lemieux, MacLeod, and Parent (2009); Grigsby, Hurst, and Yildirmaz (2019)). We allow the share of job types to be correlated with productivity by assuming that productivity is drawn from a job-type-specific Pareto-lognormal distribution (Colombi (1990)). That is, conditional on the job type $j \in \{n, m\}$, the log productivity is the sum of independently drawn normal and exponential random variables: $\log\theta = \theta_1 + \theta_2$ where $\theta_1 \sim N(\mu_{\theta,j}, \sigma_{\theta,j}^2)$ and $\theta_2 \sim \mathrm{Exp}(\lambda_{\theta,j})$. We keep government expenditures G fixed when comparing different policy scenarios. We derive and analyze the theoretical formula for the optimal rate of progressivity in this generalized environment in Appendix F.
5.2 Calibration
We calibrate the model to match the evidence on elasticities and the wage distribution
in the U.S. We choose the value ε = 0.5 for the Frisch elasticity, which implies a
compensated elasticity of labor supply at normal jobs of approximately 0.3. Both
values are consistent with empirical evidence (Keane (2011); Chetty (2012)).
We assume that log-productivity distributions for both job types have a common normal variance $\sigma_{\theta,m}^2 = \sigma_{\theta,n}^2 = \sigma_\theta^2$ and tail parameter $\lambda_{\theta,m} = \lambda_{\theta,n} = \lambda_\theta$. As a result, our model implies that the variance of log earnings is given by
\[ \mathrm{Var}(\log w) = \left(\sigma_\theta^2 + \frac{1}{\lambda_\theta^2}\right) + \pi\,\psi^2\sigma_\eta^2 + \pi(1-\pi)\left(\mu_m - \mu_n + \log\frac{a_m}{a_n} - \frac{\psi^2\sigma_\eta^2}{2}\right)^2, \]
where $\sigma_\theta^2 + \frac{1}{\lambda_\theta^2}$ is the variance of log-productivity, $\psi^2\sigma_\eta^2$ is the variance of log-earnings at the performance-pay jobs due to the performance shocks, and the last term captures the contribution of the difference between the mean log-earnings at normal and performance-pay jobs.
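The variance decomposition above can be checked by simulation. The sketch below is only illustrative: it draws job types and Pareto-lognormal productivities as described in Section 5.1, builds log-earnings using the loglinear contract (30) for performance-pay jobs and a riskless wage for normal jobs, and compares the sample variance with the formula. The effort levels a_m and a_n are hypothetical placeholders, and the function names are ours.

```python
import math
import random


def var_log_earnings_formula(sigma2_theta, lam, pi, psi2sigma2, mu_gap, log_effort_gap):
    """Variance of log earnings from the decomposition in Section 5.2."""
    between = mu_gap + log_effort_gap - psi2sigma2 / 2.0
    return (sigma2_theta + 1.0 / lam ** 2) + pi * psi2sigma2 + pi * (1.0 - pi) * between ** 2


def simulate_var_log_earnings(n, sigma2_theta, lam, pi, psi2sigma2,
                              mu_m, mu_n, a_m, a_n, seed=0):
    """Sample variance of log earnings across n simulated workers."""
    rng = random.Random(seed)
    sd_theta = math.sqrt(sigma2_theta)
    sd_shock = math.sqrt(psi2sigma2)  # standard deviation of psi * eta
    draws = []
    for _ in range(n):
        perf_pay = rng.random() < pi
        mu, a = (mu_m, a_m) if perf_pay else (mu_n, a_n)
        # Pareto-lognormal productivity: normal plus exponential in logs.
        log_theta = rng.gauss(mu, sd_theta) + rng.expovariate(lam)
        log_w = log_theta + math.log(a)
        if perf_pay:
            # Loglinear contract: psi * eta - psi^2 sigma^2 / 2.
            log_w += rng.gauss(0.0, sd_shock) - psi2sigma2 / 2.0
        draws.append(log_w)
    mean = sum(draws) / n
    return sum((x - mean) ** 2 for x in draws) / n


# Values quoted in the text: sigma2_theta, lam, pi, mu_m, mu_n, and psi^2 sigma^2 of about 0.2.
# a_m = 0.8 and a_n = 0.9 are hypothetical effort levels, chosen only for illustration.
params = dict(sigma2_theta=0.31, lam=2.2, pi=0.45, psi2sigma2=0.2)
formula = var_log_earnings_formula(mu_gap=3.88 - 3.62,
                                   log_effort_gap=math.log(0.8 / 0.9), **params)
simulated = simulate_var_log_earnings(200_000, mu_m=3.88, mu_n=3.62,
                                      a_m=0.8, a_n=0.9, **params)
print(formula, simulated)
```

The decomposition is the usual law of total variance for a two-component mixture: the first two terms are the average within-job-type variance and the last term is the between-type variance of mean log-earnings.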
Lemieux, MacLeod, and Parent (2009) study performance-pay jobs using the Panel Study of Income Dynamics (PSID) and find that their fraction π was 0.45 in 1998, the most recent year included in their analysis. They report that performance-pay jobs have mean hourly wages higher by 30%, and the variance of wages higher by 42%, relative to normal jobs.26 The first statistic pins down $\mu_m - \mu_n = \log(1.3)$. The second will be crucial in determining $\sigma_{\eta,m}^2$, the variance of the performance shock at performance-pay jobs.
For the levels of the mean and variance of log-earnings in the entire economy, we turn to the Survey of Consumer Finances (SCF), which uses data from the Internal Revenue Service Statistics of Income program to accurately represent the distribution of high-income households. Based on the SCF, Heathcote and Tsujiyama (2019) report a mean household labor income of $77,325 and an overall variance of log labor income of 0.618 in 2007. They also estimate the tail parameter of the log earnings distribution $\lambda_\theta$ at 2.2.
Regarding the government policy, Heathcote, Storesletten, and Violante (2017)
estimate the empirical rate of tax progressivity at 0.181 and Heathcote and Tsujiyama
(2019) report a ratio of government purchases to output of 18.8 percent.
Given these estimates, we choose $\sigma_\theta^2 = 0.31$ and $\sigma_{\eta,m}^2 = 0.4$ to match the overall variance of log-earnings as well as the relative variance between performance-pay and normal jobs. Matching mean labor income as well as the ratio of mean wage rates at the two job types implies $\mu_{\theta,m} = 3.88$ and $\mu_{\theta,n} = 3.62$. The implied distribution of wage rates and job types is depicted in Figure 1. The share of performance-pay jobs is largest in the top quartile of the wage distribution. This is consistent with the empirical evidence that bonuses and other forms of performance-related pay are more prevalent at higher income levels (Lemieux, MacLeod, and Parent (2009); Grigsby, Hurst, and Yildirmaz (2019)).
26 These values are based on Table 1, Figure IV and Figure V in Lemieux, MacLeod, and Parent (2009). When statistics are available for multiple years, the last year available is used (either 1996 or 1998).
Figure 1: Joint distribution of wage rates and job types. Panel (a): density of the log wage rate for normal jobs and performance-pay jobs. Panel (b): share of performance-pay jobs (%) by quartile (Q1 to Q4) of the wage rate distribution.
5.3 Marginal Reform of Tax Progressivity
Table 1 shows the impact of a small increase of the rate of progressivity by one percentage point, from 0.181 to 0.191, on the performance-pay jobs, the normal jobs, and all jobs. Note that the effects on performance-pay jobs are generally larger in absolute value, since performance-pay workers have higher average output and earnings than those in normal jobs. An increase in progressivity leads to a large redistributive gain for both types of jobs. For the performance-pay workers, an increase of progressivity would also lead to a substantial gain from better insurance against the earnings risk if this risk were policy-invariant. However, an increase in progressivity generates a large crowding-out which, as we saw in Section 4, fully offsets the gains from insurance and, in addition, somewhat reduces the gains from redistribution. To understand the latter effect, note that workers who on average gained from the reform will see their earnings structure adjusted to keep incentives intact: their consumption will increase disproportionately in high-output contingencies, which reduces their expected utility gain. Hence, the crowding-out effect makes redistribution less potent. We find that, quantitatively, the redistributive gain for the performance-pay jobs is reduced by 6%.
The excess burden of the reform, EB, is substantially larger for performance-pay jobs, because the elasticity of effort \( \varepsilon_{a,1-p} \) is 25% greater for these workers. This difference in elasticities is only slightly mitigated by the combined impact of the fiscal externalities caused by the crowding-out and performance-pay effects. These effects have a non-negligible impact on the excess burden when considered separately, but roughly cancel each other out and lead to a very modest positive fiscal externality. To understand this result, recall the excess burden formula obtained in Corollary 3. Since \( \varepsilon_{\psi,1-p} = -1 \), the positive fiscal externality due to crowding-out relative to the standard deadweight loss induced by labor supply responses is proportional to the inverse labor effort elasticity \( 1/\varepsilon_{a,1-p} \) and given by
\[ \frac{1}{\varepsilon_{a,1-p}} \times \left( \frac{1}{(1-g)(1-p)} - 1 \right)^{-1} p\,\psi^{2}\sigma_{\eta}^{2} \approx 17.85\%. \]
Since \( \varepsilon_{\psi,a} = 1/\varepsilon \), the additional negative fiscal externality due to performance-pay relative to the standard deadweight loss is proportional to the inverse Frisch elasticity \( 1/\varepsilon \) and given by
\[ -\frac{1}{\varepsilon} \times \left( \frac{1}{(1-g)(1-p)} - 1 \right)^{-1} p\,\psi^{2}\sigma_{\eta}^{2} \approx -14.9\%. \]
Therefore, each of these effects significantly alters the standard calculation of the excess burden of raising tax progressivity. However, the sum of these effects is proportional to the difference between the inverse labor supply and inverse Frisch elasticities, \( 1/\varepsilon_{a,1-p} - 1/\varepsilon \). Since this difference is small, the overall fiscal externality caused by moral hazard frictions is equal to a mere 2.94% of the standard deadweight loss of raising tax progressivity. Therefore, this reform is only slightly less costly for the government budget than one would estimate by ignoring the endogeneity of private insurance.
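The arithmetic behind this near-cancellation can be checked directly. The sketch below only recombines numbers quoted in the text and in Table 1; the variable names are ours, and the small gap between the computed 2.95% and the reported 2.94% presumably reflects rounding of the two components.

```python
# Fiscal externalities relative to the standard deadweight loss (from the text)
crowding_out = 0.1785       # positive externality due to crowding-out
performance_pay = -0.149    # negative externality due to performance pay
net = crowding_out + performance_pay

# Both terms share the same scaling factor, so their ratio recovers the
# ratio of the Frisch elasticity to the effort elasticity.
frisch_over_effort = crowding_out / -performance_pay

# Cross-check with Table 1 (USD per performance-pay worker): the excess
# burden changes by -25 (crowding-out) and +21 (performance pay) relative
# to a standard effect of 141.
standard, co_usd, pp_usd = 141, -25, 21
table_net_share = -(co_usd + pp_usd) / standard

print(f"net externality: {net:.2%}")
print(f"Frisch / effort elasticity: {frisch_over_effort:.3f}")
print(f"Table 1 net share: {table_net_share:.2%}")
```

Both routes give a net fiscal externality of roughly 3% of the standard deadweight loss, which is the sense in which the two effects cancel.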
Table 1: Impact of a small increase in the rate of progressivity

                              Perf.-pay jobs   Normal jobs   All jobs
Welfare gain (ME + WG)                   354           292        320
  due to redistribution                  376           292        330
  due to insurance                       114             0         51
  due to crowding-out                   -136             0        -61
Excess burden (EB)                       137            99        116
  due to standard effect                 141            99        118
  due to crowding-out                    -25             0        -11
  due to performance-pay                  21             0          9
Total: ME + WG - EB                      217           192        203

Note: The three columns show the mean impact of increasing the progressivity rate by 1 percentage point (0.01) on the performance-pay jobs, the normal jobs, and all jobs, respectively. All the effects are expressed in USD per worker in a given job category.

5.4 Large Reform: From Status Quo to Optimum
We extend the theoretical optimal progressivity formula to our quantitative model with two types of jobs in Appendix F. We find that the utilitarian optimum progressivity rate is given by \( p^{*} = 0.356 \). This rate is almost double the current progressivity rate in the U.S., and the implied social welfare increase is equivalent to a 3% increase in consumption. To get a sense of the magnitude of this reform, note that the average tax rate, including transfers, of a worker with labor income $33,000 would decrease from −0.7% to −14.2%. The tax rate at the mean household income $77,325 would increase from 13.6% to 15.6%. The tax rate at $500,000, which roughly corresponds to the top 1% threshold, would increase from 38.4% to 56.6%. In this section we analyze the impact of a large reform of the current tax code that implements the optimal progressivity and adjusts the other tax parameter to keep government revenue unchanged.
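The quoted average tax rates are mutually consistent with a CRP schedule. As a rough check, the sketch below assumes the retention function takes the form R(w) = lam * w**(1 - p), which is our reading of the CRP class, and backs out the scale parameter lam from the average tax rate at mean income. The function and variable names are ours and purely illustrative.

```python
def crp_average_tax_rate(income, p, anchor_income, anchor_rate):
    """Average tax rate under an assumed CRP schedule R(w) = lam * w**(1 - p).

    lam is backed out so that the average tax rate at anchor_income
    equals anchor_rate (both numbers are taken from the text).
    """
    lam = (1.0 - anchor_rate) * anchor_income ** p
    return 1.0 - lam * income ** (-p)

MEAN_INCOME = 77_325  # mean household income quoted in the text

def rates(p, mean_rate):
    """Average tax rates at the three income levels discussed in the text."""
    return {w: crp_average_tax_rate(w, p, MEAN_INCOME, mean_rate)
            for w in (33_000, MEAN_INCOME, 500_000)}

status_quo = rates(0.181, 0.136)   # current progressivity
optimum = rates(0.356, 0.156)      # optimal progressivity

for w in status_quo:
    print(f"${w:>7,}: {status_quo[w]:7.1%} -> {optimum[w]:7.1%}")
```

Under this assumed functional form, the implied rates at $33,000 and $500,000 land within about 0.1 percentage points of the figures reported above.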
The impact of this reform is depicted in Figure 2 and analyzed in Table 2. Following a large increase in tax progressivity, the earnings schedule is barely altered: the pass-through of performance shocks to log-earnings increases only modestly from 0.731 to 0.76. As a result, the variance of log-earnings conditional on productivity increases by 8%, while the overall dispersion of log-earnings among performance-pay workers increases by 2.3%. Given that the pre-tax earnings schedule hardly moves, this progressivity-increasing reform substantially flattens the consumption schedule, leading to much better consumption insurance. Indeed, both the individual consumption risk and overall consumption dispersion among performance-pay workers fall by more than 30%. Better insured workers have weaker incentives to exert effort which, for our calibrated labor supply elasticity, falls by a substantial 9.6% due to the large magnitude of the tax reform.
Figure 2: Earnings and consumption schedules of performance-pay workers. Panel (a): earnings schedule, plotting earnings ($1,000) against output ($1,000). Panel (b): consumption schedule, plotting consumption ($1,000) against output ($1,000). Each panel shows the original schedule, the schedule after adding the crowding-out effect, and the final schedule after adding the performance-pay effect.
Note: The adjustment of the earnings and consumption schedules for a performance-pay worker with mean productivity following an increase of the progressivity rate from the current (0.181) to the optimal level (0.356).

Table 2: Earnings and consumption distribution statistics following the large reform

Underlying the weak response of the earnings schedule are two countervailing forces: the crowding-out of private insurance and the performance-pay effect. If firms attempted to motivate workers to maintain their original level of effort, better insurance via the income tax would crowd out private insurance so as to leave the variance of log-consumption unchanged, since \( \varepsilon_{\psi,1-p} = -1 \). For that to happen, the pass-through would need to increase all the way to 0.93, raising the log-earnings risk of each performance-pay worker by 62%. However, firms in equilibrium choose a lower effort level and reduce the power of incentive pay accordingly. This force, the performance-pay effect, counteracts the effect of crowding-out and brings the pass-through back to the vicinity of its original level. The combination of these two effects implies that, strikingly, the relative fall of log-consumption variance in the aftermath of the tax reform is nearly identical for the workers at jobs with and without agency frictions.
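The pass-through figures in this discussion can be reproduced with a few lines of arithmetic. The sketch below assumes that log-earnings are linear in the performance shock, so that the conditional variance scales with the square of the pass-through, and that the unit elasticity of the pass-through with respect to 1 − p holds globally rather than only locally. The names are ours.

```python
# Numbers quoted in the text
psi_old, psi_new = 0.731, 0.76   # pass-through before / after the reform
p_old, p_new = 0.181, 0.356      # progressivity before / after the reform

def variance_change(psi_from, psi_to):
    """Relative change in Var(log w | theta), assuming log-earnings are
    linear in the shock so that the variance scales with psi**2."""
    return (psi_to / psi_from) ** 2 - 1.0

# Crowding-out alone: an elasticity of psi with respect to (1 - p) equal
# to -1 means that psi * (1 - p) stays constant.
psi_crowding_out = psi_old * (1.0 - p_old) / (1.0 - p_new)

print(round(psi_crowding_out, 2))
print(f"{variance_change(psi_old, psi_crowding_out):.0%}")
print(f"{variance_change(psi_old, psi_new):.0%}")
```

The first two printed values correspond to the counterfactual with crowding-out only, and the third to the equilibrium response once the performance-pay effect is included.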
5.5 Performance-Pay Jobs and Optimal Progressivity
We now study the importance of performance-pay considerations for the optimal tax progressivity by comparing the optimal progressivity rate arising in the calibrated model to two important benchmarks. The first is the optimal rate of progressivity in the counterfactual economy without performance-pay jobs, obtained by setting the variance of the performance shock \( \sigma^{2}_{\eta,m} \) to zero. The second is the rate of progressivity that would be chosen by a government that erroneously assumes that the entire wage risk is exogenous. This rate is found by applying the formula for the optimal rate of progressivity from the model with exogenous risk to our calibrated model economy, where wage-rate risk is actually endogenous. Following Rothschild and Scheuer (2016), we call the resulting progressivity rate a self-confirming policy equilibrium (SCPE). The results are depicted in panel (a) of Figure 3.
First, in an economy without performance-pay jobs, the optimal rate of progressivity would increase from 0.356 to 0.41. To understand the discrepancy between the true optimum and this benchmark, we gradually switch on the various channels related to performance-pay jobs in the optimum formula (43), starting from an economy devoid of agency frictions (see panel (b) of Figure 3). First, workers at performance-pay jobs exert lower effort than those at normal jobs. This leads to lower output and hence a higher share of government spending in GDP, G/Y. This in turn contributes to lower progressivity and explains approximately 30% of the overall progressivity change. Second, workers at performance-pay jobs face higher wage risk. That raises the gains of providing social insurance via tax progressivity. However, this additional benefit of insurance is fully canceled by the crowding-out of private insurance. Third, the labor supply elasticity of performance-pay workers is higher, increasing the excess burden of raising progressivity and explaining 40% of the progressivity difference. Finally, the performance-pay effect contributes to a reduction in government revenue via a compression of the earnings distribution, which explains the remaining 30%.
Figure 3: Optimal progressivity and performance pay jobs. Panel (a): social welfare functions, plotting the welfare change relative to the status quo (% of consumption) against the progressivity rate p, for the calibrated model and for the economy without performance-pay jobs. Marked points: status quo, p = 0.181; optimum in the calibrated model, p = 0.356, welfare change 2.98%; self-confirming policy equilibrium, p = 0.4, welfare change 2.74%; optimum without performance-pay jobs, p = 0.41, welfare change 14.22%. Panel (b): optima difference decomposition, plotting the change in the progressivity rate (% of total change) attributed to lower effort, higher earnings risk, the crowding-out effect, higher effort elasticity, and the performance-pay effect.
Note: Panel (b) shows the contributions of various channels through which performance-pay jobs affect the optimal progressivity rate. These contributions are obtained by successively switching on the respective channels in the optimal progressivity formula. All the channels combined sum up to the difference in progressivity rate between the optimum without performance pay jobs and the optimum in the calibrated model.

Second, we compare the optimal rate of progressivity with the SCPE. In this equilibrium concept, the government can correctly estimate the elasticity of effort, but treats wage risk as fully exogenous. Such a policymaker would mistakenly attempt to insure the endogenous part of the wage risk and choose too high a progressivity rate, equal to 0.4. However, log-earnings risk at performance-pay jobs increases by a mere 2.3% relative to its value in the true optimum. Once again, this is due to the counteracting forces of the crowding-out and the performance-pay effects. Finally, ignoring the endogeneity of wage risk does not lead to a large miscalculation of optimal tax policy: the social welfare cost of choosing the SCPE is equivalent to a 0.24% drop in consumption. This value implies that increasing the U.S. rate of progressivity from the status quo to the SCPE reaps 93% of the welfare gains of moving from the status quo to the full optimum. Recall that this small welfare cost of sub-optimizing is not a necessary consequence of our theoretical analysis (see footnote 27). In fact, if there were only performance-pay jobs in the economy, the difference in progressivity between the SCPE and the full optimum would be 0.09, with a welfare difference equivalent to a 1% change in consumption. Since in our calibration only roughly half of the jobs are performance-based, the impact on the progressivity rate is half of that. Furthermore, as the social welfare function is concave in p (and approximately quadratic near its maximum), half of the change in progressivity translates into a quarter of the change in social welfare.
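The "half the gap, a quarter of the loss" argument is a second-order approximation around the optimum, and it can be checked against the numbers reported here. The sketch below assumes the welfare loss is exactly quadratic in the progressivity gap, which is our simplification; the function name and its arguments are illustrative.

```python
def welfare_loss(dp, dp_ref, loss_ref):
    """Second-order approximation of the welfare loss from a progressivity
    gap dp, calibrated so that a gap of dp_ref costs loss_ref.

    Near an interior optimum the first-order term vanishes, so the loss
    is roughly proportional to dp**2.
    """
    return loss_ref * (dp / dp_ref) ** 2

# Counterfactual with only performance-pay jobs (numbers from the text)
dp_all_perf_pay = 0.09     # SCPE minus optimum progressivity
loss_all_perf_pay = 0.01   # 1% of consumption

# Calibrated economy: SCPE p = 0.4 versus optimum p = 0.356
dp_calibrated = 0.4 - 0.356

loss = welfare_loss(dp_calibrated, dp_all_perf_pay, loss_all_perf_pay)
print(f"{loss:.2%}")
```

Under this approximation the implied loss is close to the 0.24% reported for the calibrated economy.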
Footnote 27: Indeed, by the envelope theorem, the performance-pay effect is (at least locally) only second-order relative to the crowding-out effect from a welfare point of view. Thus, the two effects do not offset each other as they did when we studied the incidence on earnings and government revenue.

5.6 Incidence of Other Tax Reforms
Recall that our theoretical tax incidence analysis of Section 2 gives us in closed form the incidence of arbitrary tax reforms. Specifically, we can apply the theoretical formulas of Theorem 1 to reforms that do not keep the CRP structure of the tax code (see footnote 28). We specialize our theoretical analysis to such reforms and study the direction of the crowding-out and the performance-pay effects in Appendix E. Here we propose two quantitative experiments.
First, in Figure 4 we depict the incidence of canonical tax reforms on the earnings schedule of a worker with mean ability and a performance-pay job. Specifically, we consider a $100 increase in the lump-sum transfer as well as a 1 percentage point increase in marginal tax rates over a varying range of earnings, namely, for all earnings, for the top 50% of earnings, and for the top 10% of earnings. Our robust finding is that although the crowding-out effect (dashed black curves) contributes to higher earnings risk, it is mostly offset by the performance-pay effect (dashed-dotted blue curves). In the case of an additional lump-sum transfer, the performance-pay effect offsets more than 50% of the impact of crowding-out on the variance of log-earnings. For a uniform increase in marginal tax rates, the offset is more than 90%. When marginal tax rates are increased only for the highest incomes, the offset rate even exceeds 100%: the performance-pay effect dominates the crowding-out effect and the earnings risk falls on net. To understand why the offset rate can be so high, recall that tax reforms which increase progressivity generate larger effort responses of performance-pay workers than reforms which spread the same tax burden in a more uniform manner (see Lemmas 1 and 2 and the subsequent discussions). Increasing marginal tax rates over a smaller range of high potential earnings, which is a progressive tax reform, generates a relatively larger effort response and, hence, a more substantial performance-pay effect in comparison to the crowding-out effect.
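To make the sign of the net effect explicit, the following sketch converts the offset rates reported in the note to Figure 4 into a net change in earnings risk per unit of crowding-out impact. The dictionary labels and the helper name are ours; only the four offset rates come from the paper.

```python
# Offset rates reported in the note to Figure 4: the share of the
# crowding-out impact on Var(log earnings) undone by the performance-pay effect.
offset_rates = {
    "lump-sum transfer (+$100)": 0.52,
    "marginal rates +1pp, all earnings": 0.92,
    "marginal rates +1pp, top 50%": 1.36,
    "marginal rates +1pp, top 10%": 3.14,
}

def net_share(offset):
    """Net change in earnings risk per unit of crowding-out impact."""
    return 1.0 - offset

for reform, offset in offset_rates.items():
    direction = "rises" if net_share(offset) > 0 else "falls"
    print(f"{reform}: net {net_share(offset):+.2f}, earnings risk {direction}")
```

An offset rate above one flips the sign of the net effect, which is why earnings risk falls for the two reforms concentrated on high earnings.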
Second, in Figure 5, rather than focusing on a single earnings contract as in the previous paragraph, we study the impact of an increase in the top marginal tax rate in our calibrated economy with workers that are heterogeneous in ex-ante ability. Specifically, we consider a 1 percentage point increase in the marginal tax rate faced by the top 1% income earners, that is, above $441,000 in 2007 dollars. This hypothetical top tax bracket is depicted by the vertical dotted line in the left panel. The left (resp., right) panel gives the results as a function of mean earnings (resp., mean earnings percentile). This reform leads to a direct crowd-out which, ceteris paribus, increases the earnings risk for all workers in the top 5% of mean earnings, particularly so in the top 1%. This crowding-out is represented by the dashed black curves in both panels. However, the results change dramatically when we take into account labor effort responses. The performance-pay effect more than offsets the crowding-out effect everywhere apart from the very top earners, leading to a lower earnings risk for all workers below the 99.5th percentile. Only the very top 0.5% experience any net crowding-out, but even for them the performance-pay effect offsets more than 70% of the additional pre-tax earnings risk.

Figure 4: Incidence of tax reforms: crowding-out and performance-pay effects. Panels: (a) increase of lump-sum transfer; (b) increase of marginal tax rates (uniform); (c) increase of marginal tax rates (top 50%); (d) increase of marginal tax rates (top 10%). Each panel plots the log-wage adjustment against the output shock percentile for the crowding-out adjustment (wco), the performance-pay adjustment (wpp), and their sum (wco + wpp).
Note: Tax incidence computed for a worker with mean productivity. Panel (a) depicts an increase of the lump-sum transfer by $100 (in 2007 dollars). Panels (b-d) depict an increase of the marginal tax rate by 1 pp for all earnings, for the top 50% earnings and for the top 10% earnings, respectively. The performance-pay effect offsets the impact of the crowding-out effect on the variance of log earnings by 52% (a), 92% (b), 136% (c) and 314% (d).

Figure 5: Increase of the top tax rate: impact on earnings risk. Panel (a): change in log-earnings risk against mean earnings ($1,000). Panel (b): change in log-earnings risk against mean earnings percentile. Each panel shows the change due to the crowding-out effect alone and the change due to the crowding-out and performance-pay effects combined.
Note: Log-earnings risk is measured by \( \mathrm{Var}(\log w(\theta,\eta) \mid \theta) \). The vertical dashed line in the left panel indicates the hypothetical top tax bracket threshold.

Footnote 28: The only difficulty consists of computing the effort change \( \hat{a}(\theta) \) in response to these reforms. We do so by solving the first-order condition (7) numerically.
Conclusion
We have set up and analyzed a tractable environment in which firms provide workers with endogenous private insurance against stochastic performance shocks in the face of moral hazard frictions. The government uses the tax-and-transfer system to redistribute income across workers who differ in uninsurable innate ability. The key feature of our model is that earnings risk is endogenous and has a productive role. The main and surprising conclusion of our analysis is that standard models that ignore the endogeneity of wage-rate risk actually come very close to correctly evaluating the incidence of taxes on earnings contracts, as well as the optimal level of tax progressivity. Underlying this result are two countervailing forces at play, a crowding-out effect and a performance-pay effect, which prevent taxes from having a large impact on the structure of performance-based compensation.
It would be interesting to extend our analysis in several directions. First, we only considered the impact of taxes on compensation for already existing performance-pay jobs. One could also model the incentives for firms to create such performance-pay jobs (rather than "normal" jobs) in the first place, and study the incidence of tax reforms on the extensive margin of switching from one type of job to another. Second, in our model, private markets are constrained efficient and perfectly competitive. In other words, we gave private markets their "best chance" of making government policy redundant. Introducing frictions such as adverse selection in private markets, whereby firms cannot perfectly observe a worker's innate ability, and market power is a natural next step. Third, our theoretical analysis delivers predictions regarding the impact of various types of tax reforms on the structure of incentive-based compensation via counteracting crowding-out and performance-pay effects. Testing these predictions empirically should be particularly fruitful.
References
Ábrahám, Á., F. Alvarez-Parra, and S. Forstner (2016a): "The effects of moral hazard on wage inequality in a frictional labor market," Unpublished working paper.
Ábrahám, Á., S. Koehne, and N. Pavoni (2016b): "Optimal income taxation when asset taxation is limited," Journal of Public Economics, 136, 14–29.
Ales, L., A. A. Bellofatto, and J. J. Wang (2017): "Taxing Atlas: Executive compensation, firm size, and their impact on optimal top income tax rates," Review of Economic Dynamics, 26, 62–90.
Ales, L., M. Kurnaz, and C. Sleet (2015): "Technical change, wage inequality, and taxes," The American Economic Review, 105, 3061–3101.
Ales, L. and C. Sleet (2016): "Taxing top CEO incomes," American Economic Review, 106, 3331–66.
Attanasio, O. and J.-V. Ríos-Rull (2000): "Consumption smoothing in island economies: Can public insurance reduce welfare?" European Economic Review, 44, 1225–1258.
Bandiera, O., I. Barankay, and I. Rasul (2011): "Field experiments with firms," Journal of Economic Perspectives, 25, 63–82.
Bell, B. and J. Van Reenen (2014): "Bankers and their bonuses," The Economic Journal, 124, F1–F21.
Bird, A. (2018): "Taxation and executive compensation: Evidence from stock options," Journal of Financial Economics, 127, 285–302.
Blomqvist, Å. and H. Horn (1984): "Public health insurance and optimal income taxation," Journal of Public Economics, 24, 353–371.
Bloom, N. and J. Van Reenen (2010): "Why do management practices differ across firms and countries?" Journal of Economic Perspectives, 24, 203–24.
Carroll, G. and D. Meng (2016): "Robust contracting with additive noise," Journal of Economic Theory, 166, 586–604.
Chang, Y. and Y. Park (2017): "Optimal Taxation with Private Insurance," Rochester Center for Economic Research Working Paper.
Chetty, R. (2009): "Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods," Annual Review of Economics, 1, 451–488.
Chetty, R. (2012): "Bounds on elasticities with optimization frictions: A synthesis of micro and macro evidence on labor supply," Econometrica, 80, 969–1018.
Chetty, R. and E. Saez (2010): "Optimal taxation and social insurance with endogenous private insurance," American Economic Journal: Economic Policy, 2, 85–114.
Colombi, R. (1990): "A new model of income distribution: The pareto-lognormal distribution," in Income and wealth distribution, inequality and poverty, Springer, 18–32.
Cremer, H. and P. Pestieau (1996): "Redistributive taxation and social insurance," International Tax and Public Finance, 3, 281–295.
Cullen, J. B. and J. Gruber (2000): "Does unemployment insurance crowd out spousal labor supply?" Journal of Labor Economics, 18, 546–572.
Cutler, D. M. and J. Gruber (1996a): "Does public insurance crowd out private insurance?" The Quarterly Journal of Economics, 111, 391–430.
Cutler, D. M. and J. Gruber (1996b): "The effect of Medicaid expansions on public insurance, private insurance, and redistribution," The American Economic Review, 86, 378–383.
Dale-Olsen, H. (2012): "Do Tax Reforms Affect Firm Performance and Executive Remuneration? Evidence from a Compressed Wage Environment," Economica, 79, 493–515.
Doligalski, P. (2019): "Optimal Income Taxation and Commitment on the Labor Market," Working Paper.
Edmans, A. and X. Gabaix (2011): "Tractability in incentive contracting," The Review of Financial Studies, 24, 2865–2894.
Edmans, A. and X. Gabaix (2016): "Executive compensation: A modern primer," Journal of Economic Literature, 54, 1232–87.
Edmans, A., X. Gabaix, and D. Jenter (2017): "Executive compensation: A survey of theory and evidence," in The Handbook of the Economics of Corporate Governance, Elsevier, vol. 1, 383–539.
Edmans, A., X. Gabaix, T. Sadzik, and Y. Sannikov (2012): "Dynamic CEO compensation," The Journal of Finance, 67, 1603–1647.
Farhi, E. and I. Werning (2013): "Insurance and taxation over the life cycle," The Review of Economic Studies, 80, 596–635.
Findeisen, S. and D. Sachs (2016): "Education and optimal dynamic taxation: The role of income-contingent student loans," Journal of Public Economics, 138, 1–21.
Foster, A. D. and M. R. Rosenzweig (1994): "A test for moral hazard in the labor market: Contractual arrangements, effort, and health," The Review of Economics and Statistics, 213–227.
Friedrich, B., L. Laun, C. Meghir, and L. Pistaferri (2019): "Earnings dynamics and firm-level shocks," Tech. rep., National Bureau of Economic Research.
Frydman, C. and D. Jenter (2010): "CEO compensation," Annual Review of Financial Economics, 2, 75–102.
Frydman, C. and R. S. Molloy (2011): "Does tax policy affect executive compensation? Evidence from postwar tax reforms," Journal of Public Economics, 95, 1425–1437.
Garrett, D. F. and A. Pavan (2015): "Dynamic managerial compensation: A variational approach," Journal of Economic Theory, 159, 775–818.
Golosov, M., N. Kocherlakota, and A. Tsyvinski (2003): "Optimal Indirect and Capital Taxation," Review of Economic Studies, 70, 569–587.
Golosov, M., M. Troshkin, and A. Tsyvinski (2016): "Redistribution and social insurance," American Economic Review, 106, 359–86.
Golosov, M. and A. Tsyvinski (2007): "Optimal taxation with endogenous insurance markets," The Quarterly Journal of Economics, 122, 487–534.
Grigsby, J., E. Hurst, and A. Yildirmaz (2019): "Aggregate nominal wage adjustments: New evidence from administrative payroll data," Tech. rep., National Bureau of Economic Research.
Guiso, L., L. Pistaferri, and F. Schivardi (2005): "Insurance within the firm," Journal of Political Economy, 113, 1054–1087.
Heathcote, J., K. Storesletten, and G. L. Violante (2017): "Optimal tax progressivity: An analytical framework," The Quarterly Journal of Economics, 132, 1693–1754.
Heathcote, J. and H. Tsujiyama (2019): "Optimal Income Taxation: Mirrlees Meets Ramsey," Working Paper.
Hendren, N. (2015): "The Policy Elasticity," in Tax Policy and the Economy, Volume 30, University of Chicago Press.
Holmstrom, B. and P. Milgrom (1987): "Aggregation and linearity in the provision of intertemporal incentives," Econometrica: Journal of the Econometric Society, 303–328.
Hungerbühler, M., E. Lehmann, A. Parmentier, and B. Van der Linden (2006): "Optimal redistributive taxation in a search equilibrium model," The Review of Economic Studies, 73, 743–767.
Kapicka, M. and J. Neira (2013): "Optimal taxation in a life-cycle economy with endogenous human capital formation," University of California, Santa Barbara WP.
Kaplow, L. (1991): "Incentives and government relief for risk," Journal of Risk and Uncertainty, 4, 167–175.
Keane, M. P. (2011): "Labor supply and taxes: A survey," Journal of Economic Literature, 49, 961–1075.
Krueger, D. and F. Perri (2011): "Public versus private risk sharing," Journal of Economic Theory, 146, 920–956.
Laffont, J.-J. and J. Tirole (1986): "Using cost observation to regulate firms," Journal of Political Economy, 94, 614–641.
Lamadon, T. (2016): "Productivity shocks, long-term contracts and earnings dynamics," manuscript, University of Chicago.
Lamadon, T., M. Mogstad, and B. Setzler (2019): "Imperfect competition, compensating differentials and rent sharing in the US labor market," Tech. rep., National Bureau of Economic Research.
Lazear, E. P. and P. Oyer (2010): "Personnel Economics," Handbook of Organizational Economics.
Lemieux, T., W. B. MacLeod, and D. Parent (2009): "Performance pay and wage inequality," The Quarterly Journal of Economics, 124, 1–49.
Levitt, S. D. and C. Syverson (2008): "Market distortions when agents are better informed: The value of information in real estate transactions," The Review of Economics and Statistics, 90, 599–611.
Luenberger, D. G. (1997): Optimization by vector space methods, John Wiley & Sons.
Makris, M. and A. Pavan (2017): "Taxation under learning-by-doing," Tech. rep., Working Paper.
Mirrlees, J. A. (1971): "An Exploration in the Theory of Optimum Income Taxation," The Review of Economic Studies, 38, 175–208.
Park, Y. (2014): "Optimal taxation in a limited commitment economy," Review of Economic Studies, 81, 884–918.
Piketty, T., E. Saez, and S. Stantcheva (2014): "Optimal taxation of top labor incomes: A tale of three elasticities," American Economic Journal: Economic Policy, 6, 230–271.
Prendergast, C. (1999): "The provision of incentives in firms," Journal of Economic Literature, 37, 7–63.
Raj, A. (2019): "Optimal income taxation in the presence of networks of altruism," Working Paper.
Rochet, J.-C. (1991): "Incentives, redistribution and social insurance," The Geneva Papers on Risk and Insurance Theory, 16, 143–165.
Rogerson, W. P. (1985): "Repeated Moral Hazard," Econometrica, 53, 69–76.
Rose, N. L. and C. Wolfram (2002): "Regulating executive pay: Using the tax code to influence chief executive officer compensation," Journal of Labor Economics, 20, S138–S175.
Rothschild, C. and F. Scheuer (2013): "Redistributive Taxation in the Roy Model," The Quarterly Journal of Economics, 128, 623–668.
Rothschild, C. and F. Scheuer (2014): "A theory of income taxation under multidimensional skill heterogeneity," NBER Working Paper No. 19822.
Rothschild, C. and F. Scheuer (2016): "Optimal taxation with rent-seeking," The Review of Economic Studies, 83, 1225–1262.
Sachs, D., A. Tsyvinski, and N. Werquin (2020): "Nonlinear tax incidence and optimal taxation in general equilibrium," Econometrica, 88, 469–493.
Saez, E. (2001): "Using Elasticities to Derive Optimal Income Tax Rates," Review of Economic Studies, 68, 205–229.
Scheuer, F. and I. Werning (2016): "Mirrlees meets Diamond-Mirrlees," NBER Working Paper No. 22076.
Scheuer, F. and I. Werning (2017): "The taxation of superstars," The Quarterly Journal of Economics, 132, 211–270.
Schoeni, R. F. (2002): "Does unemployment insurance displace familial assistance?" Public Choice, 110, 99–119.
Shearer, B. (2004): "Piece rates, fixed wages and incentives: Evidence from a field experiment," The Review of Economic Studies, 71, 513–534.
Sleet, C. and H. Yazici (2017): "Taxation, redistribution and frictional labor supply," Working Paper.
Stantcheva, S. (2014): "Optimal income taxation with adverse selection in the labour market," Review of Economic Studies, 81, 1296–1329.
Stantcheva, S. (2017): "Optimal taxation and human capital policies over the life cycle," Journal of Political Economy, 125, 1931–1990.
A Proofs of Section 1
Concavity of the Utility of Earnings. Our analysis requires that the utility of earnings \( w \mapsto v(w) \equiv u(R(w)) \) is concave. It is easy to show that this is equivalent to
\[ \pi_{1}(w)\,\pi_{2}(w) > -\gamma(w), \tag{46} \]
where \( \gamma(w) \equiv -\frac{R(w)\,u''(R(w))}{u'(R(w))} \) is the agent's coefficient of relative risk aversion, and \( \pi_{1}(w) \equiv \frac{1 - T(w)/w}{1 - T'(w)} \), \( \pi_{2}(w) \equiv \frac{w\,T''(w)}{1 - T'(w)} \) are two measures of the local rate of progressivity of the tax schedule. Specifically, the parameter \( \pi_{1}(w) \) is the ratio of the average and marginal retention rates, and \( \pi_{2}(w) \) is (minus) the elasticity of the retention rate with respect to income. If the tax schedule has a constant rate of progressivity p (CRP), these variables are respectively equal to \( \frac{1}{1-p} \) and p. Note that most of our analysis is concerned with the incidence of tax reforms around a given initial tax schedule T. In this case, (46) is a restriction on the initial tax code T and can be easily verified in the data for practical applications. Note moreover that the tax reform itself is not restricted. When we characterize the optimal tax schedule within the CRP class, we assume that \( u(c) = \log c \), which implies that \( \gamma(w) = 1 \). It is easy to verify that in this case condition (46) is always satisfied regardless of the value of p, since \( \frac{p}{1-p} > -1 \) for any \( p < 1 \).
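As a numerical illustration of these claims, the sketch below evaluates the two progressivity measures by finite differences for an assumed CRP schedule of the form T(w) = w − lam * w**(1 − p) and checks condition (46) under log utility. The functional form, the scale parameter lam, and the helper names are our assumptions for illustration only.

```python
def crp_tax(w, p, lam):
    """Assumed CRP tax schedule: T(w) = w - lam * w**(1 - p)."""
    return w - lam * w ** (1.0 - p)

def progressivity_measures(tax, w):
    """pi_1 and pi_2 from central finite differences of the tax schedule."""
    h = 1e-2 * w  # step proportional to income
    t_prime = (tax(w + h) - tax(w - h)) / (2.0 * h)
    t_second = (tax(w + h) - 2.0 * tax(w) + tax(w - h)) / h ** 2
    pi_1 = (1.0 - tax(w) / w) / (1.0 - t_prime)
    pi_2 = w * t_second / (1.0 - t_prime)
    return pi_1, pi_2

p, lam = 0.181, 2.0   # lam is an arbitrary illustrative scale
gamma = 1.0           # relative risk aversion under u(c) = log c

results = {}
for w in (10.0, 50.0, 200.0):
    pi_1, pi_2 = progressivity_measures(lambda x: crp_tax(x, p, lam), w)
    results[w] = (pi_1, pi_2)
    # Condition (46): pi_1 * pi_2 > -gamma
    print(w, round(pi_1, 3), round(pi_2, 3), pi_1 * pi_2 > -gamma)
```

At every income level the two measures should be close to 1/(1 − p) and p respectively, independently of lam, so that condition (46) holds with a wide margin.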
Proof of Proposition 1. The proof of this proposition follows directly from the results of Edmans and Gabaix (2011) since the utility of earnings \( v(\cdot) \) is concave. We give here a heuristic proof of the main arguments. Given the earnings contract \( \{ w(\theta,\eta) : y \in \mathbb{R} \} \), an agent with ability \( \theta \) and performance shock \( \eta \) chooses effort \( a(\theta) \) to maximize utility \( u(R(w(\theta,\eta))) - h(a(\theta)) \) (see equation (3)). Since \( y = \theta(a + \eta) \), we have \( \frac{\partial w(\theta,\eta)}{\partial \eta} = \frac{\partial w(\theta,\eta)}{\partial a} \) so that the first-order condition reads
\[ r(w(\theta,\eta))\, u'(R(w(\theta,\eta)))\, \frac{\partial w(\theta,\eta)}{\partial \eta} = h'(a(\theta)). \tag{47} \]
This equation pins down the slope of the earnings schedule that the firm must implement in order to induce the effort level \( a(\theta) \). Integrating this incentive constraint over \( \eta \) given \( a(\theta) \) leads to
\[ u(R(w(\theta,\eta))) = h'(a(\theta))\,\eta + k, \tag{48} \]
for some constant \( k \in \mathbb{R} \). Since in equilibrium the participation constraint (4) must hold with equality, the agent's expected utility must be equal to his reservation value \( U(\theta) \). Therefore, the value of k must be chosen by the firm such that the agent's participation constraint holds with equality. Imposing the participation constraint with \( \mathbb{E}[\eta] = 0 \) implies
\[ k = U(\theta) + h(a(\theta)). \tag{49} \]
The previous two equations fully characterize the wage contract given the desired effort level \( a(\theta) \) and the reservation value \( U(\theta) \). They imply that, for a given pair \( (a(\theta), U(\theta)) \), the wage given performance shock \( \eta \) satisfies:
\[ u(R(w(\theta,\eta))) = h'(a(\theta))\,\eta + \left[ U(\theta) + h(a(\theta)) \right]. \tag{50} \]
Next, equation (7) is obtained by taking the first-order condition with respect to \( a(\theta) \) in the firm's problem (2), taking as given the earnings contract (6) required to satisfy the workers' incentive and participation constraints (3, 4). Finally, equation (8) is simply a rewriting of (5).
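To see what the contract equation (50) implies in a concrete case, the sketch below inverts it under log utility and an assumed CRP retention function R(w) = lam * w**(1 − p). The isoelastic disutility of effort, the three-point shock distribution, and all parameter values are hypothetical choices made only for illustration.

```python
import math

def log_earnings(eta, a, U, p, lam, h, h_prime):
    """Log-earnings implied by the contract equation (50) when u = log and
    the retention function is the assumed CRP form R(w) = lam * w**(1 - p)."""
    utility = h_prime(a) * eta + U + h(a)          # right-hand side of (50)
    return (utility - math.log(lam)) / (1.0 - p)   # invert log R(w)

# Hypothetical primitives, chosen only for illustration
eps = 0.5                                          # Frisch elasticity
h = lambda a: a ** (1 + 1 / eps) / (1 + 1 / eps)   # disutility of effort
h_prime = lambda a: a ** (1 / eps)
a, U, p, lam = 1.2, 0.3, 0.181, 1.0

shocks = [-0.2, 0.0, 0.2]                          # symmetric, mean-zero shocks
log_w = [log_earnings(e, a, U, p, lam, h, h_prime) for e in shocks]

# Pass-through of the shock to log-earnings
slope = (log_w[2] - log_w[0]) / (shocks[2] - shocks[0])

# Expected utility net of the effort cost should equal the reservation value U
utilities = [math.log(lam) + (1 - p) * lw for lw in log_w]
expected_net_utility = sum(utilities) / len(utilities) - h(a)

print(slope, h_prime(a) / (1 - p))
print(expected_net_utility, U)
```

In this special case log-earnings are linear in the shock with slope h'(a)/(1 − p), so for a fixed effort level the pass-through moves one-for-one with 1/(1 − p), in line with the unit elasticity used in Section 5.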
B Proofs of Section 2
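Proof of Theorem 1. Recall that, given the effort level a and the reservation value \( U(\theta) \), earnings as a function of the noise realization \( \eta \) (or equivalently output \( y = \theta(a + \eta) \)) satisfy:
\[ u\big( w(\theta,\eta) - T(w(\theta,\eta)) \big) = U(\theta) + h(a(\theta)) + h'(a(\theta))\,\eta. \]
In response to the tax reform \( \delta \hat{T} \), the perturbed wage contract satisfies
\[ u\big[ w(\theta,\eta) + \delta \hat{w}(\theta,\eta) - T\big( w(\theta,\eta) + \delta \hat{w}(\theta,\eta) \big) - \delta \hat{T}(w(\theta,\eta)) \big] = U(\theta) + \delta \hat{U}(\theta) + h\big( a(\theta) + \delta \hat{a}(\theta) \big) + h'\big( a(\theta) + \delta \hat{a}(\theta) \big)\,\eta. \]
Differentiating with respect to \( \delta \) and evaluating at \( \delta = 0 \) leads to
\[ \big[ (1 - T'(w(\theta,\eta)))\,\hat{w}(\theta,\eta) - \hat{T}(w(\theta,\eta)) \big]\, u'\big( w(\theta,\eta) - T(w(\theta,\eta)) \big) = \hat{U}(\theta) + \big[ h'(a(\theta)) + h''(a(\theta))\,\eta \big]\,\hat{a}(\theta). \]
Solving for \( \hat{w}(\theta,\eta) \) leads to
\[ \hat{w}(\theta,\eta) = \frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} + \frac{\hat{U}(\theta)}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))} + \frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))}\,\hat{a}(\theta). \]
Adding and subtracting \( w(\theta,\eta)\,\frac{\hat{a}(\theta)}{a(\theta)} \), that is, the earnings adjustment obtained in the model with exogenous risk, and substituting expression (17) derived below for the impact of the reform on expected utility \( \hat{U}(\theta) \), easily yields (14).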
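Proof of Proposition 2. Recall that the reservation value satisfies \( \mathbb{E}[w(\theta,\eta)] = \theta a(\theta) \). Hence, in response to the tax reform, we get
\[ \mathbb{E}\big[ w(\theta,\eta) + \delta \hat{w}(\theta,\eta) \big] = \theta \big( a(\theta) + \delta \hat{a}(\theta) \big), \]
that is, \( \mathbb{E}[\hat{w}(\theta,\eta)] = \theta \hat{a}(\theta) \). Substituting expression (14) for \( \hat{w}(\theta,\eta) \) in this equation leads to
\[ \theta \hat{a}(\theta) = \mathbb{E}\left[ \frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} \right] + \mathbb{E}\left[ \frac{1}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))} \right] \hat{U}(\theta) + \mathbb{E}\left[ \frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))} \right] \hat{a}(\theta). \]
Using equation (7) that defines optimal effort and solving for \( \hat{U}(\theta) \) leads to (17).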
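Proof of Lemma 1. Recall the optimal effort condition
$$\mathbb{E}\left[\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{\big(1 - T'(w(\theta,\eta))\big)\,u'\big(w(\theta,\eta) - T(w(\theta,\eta))\big)}\right] = \theta.$$
Following a tax reform $\hat{T}$, the perturbed level of effort satisfies
$$\mathbb{E}\left[\frac{h'\big(a(\theta) + \delta\hat{a}(\theta)\big) + h''\big(a(\theta) + \delta\hat{a}(\theta)\big)\,\eta}{\big\{1 - T'(\tilde{w}(\theta,\eta)) - \delta\hat{T}'(\tilde{w}(\theta,\eta))\big\}\,u'\big\{R(\tilde{w}(\theta,\eta)) - \delta\hat{T}(\tilde{w}(\theta,\eta))\big\}}\right] = \theta,$$
where we denote $\tilde{w}(\theta,\eta) \equiv w(\theta,\eta) + \delta\hat{w}(\theta,\eta)$. Taking the derivative of this expression with respect to $\delta$ evaluated at $\delta = 0$ gives
$$0 = \mathbb{E}\left[\frac{h''(a(\theta)) + h'''(a(\theta))\,\eta}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))}\,\hat{a}(\theta)\right] - \mathbb{E}\left[\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{\big\{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))\big\}^2} \times \Big\{\big[-T''(w)\,\hat{w} - \hat{T}'(w)\big]\,u'(R(w)) + r(w)\big[r(w)\,\hat{w} - \hat{T}(w)\big]\,u''(R(w))\Big\}\right] \tag{51}$$
where the arguments $(\theta,\eta)$ have been removed from the second line for notational conciseness. Suppose that the utility function is quasilinear in consumption, and the tax schedule is initially affine. Equation (51) can then be rewritten as
$$0 = \mathbb{E}\left[\frac{h''(a(\theta)) + h'''(a(\theta))\,\eta}{1-\tau}\,\hat{a}(\theta)\right] + \mathbb{E}\left[\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{(1-\tau)^2}\,\hat{T}'(w(y \mid \theta))\right]$$
$$= \frac{h''(a(\theta))}{1-\tau}\,\hat{a}(\theta) + \frac{h'(a(\theta))}{(1-\tau)^2}\,\mathbb{E}\big[\hat{T}'(w(y \mid \theta))\big] + \frac{h''(a(\theta))}{(1-\tau)^2}\,\mathbb{E}\big[\eta\,\hat{T}'(w(y \mid \theta))\big].$$
Solving for $\hat{a}(\theta)$ and letting $\varepsilon(\theta) = \frac{h'(a(\theta))}{a(\theta)\,h''(a(\theta))}$ easily leads to (18).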
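Proof of Proposition 3. Solving for $\hat{a}(\theta)$ in equation (51) implies that the Gateaux derivative of effort is given by
$$\frac{\hat{a}(\theta)}{a(\theta)} = -\mathbb{E}\left[\varepsilon_{a,R}(\theta,\eta)\,\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))\,w(\theta,\eta)}\right] - \mathbb{E}\left[\varepsilon_{a,r}(\theta,\eta)\,\frac{\hat{T}'(w(\theta,\eta))}{r(w(\theta,\eta))}\right] + \mathbb{E}\left[\big(\varepsilon_{a,R}(\theta,\eta) - p(w(\theta,\eta))\,\varepsilon_{a,r}(\theta,\eta)\big)\,\frac{\hat{w}(\theta,\eta)}{w(\theta,\eta)}\right],$$
where $p(w) \equiv \frac{wT''(w)}{1 - T'(w)}$ is the local rate of progressivity of the tax schedule, and where $\varepsilon_{a,R}$, $\varepsilon_{a,r}$ denote the income effect parameter and compensated elasticity along the linearized budget constraint, equal to
$$\varepsilon_{a,r}(\theta,\eta) = \frac{h'(a(\theta))}{a(\theta)\,h''(a(\theta))}\;\frac{\Big(1 + \frac{h''(a(\theta))}{h'(a(\theta))}\,\eta\Big)\,\frac{1}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))}}{\mathbb{E}\Big[\Big(1 + \frac{h'''(a(\theta))}{h''(a(\theta))}\,\eta\Big)\,\frac{1}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))}\Big]}$$
and
$$\varepsilon_{a,R}(\theta,\eta) = \frac{h'(a(\theta))}{a(\theta)\,h''(a(\theta))}\;\frac{\Big(1 + \frac{h''(a(\theta))}{h'(a(\theta))}\,\eta\Big)\,\frac{w(\theta,\eta)\,u''(R(w(\theta,\eta)))}{\big(u'(R(w(\theta,\eta)))\big)^2}}{\mathbb{E}\Big[\Big(1 + \frac{h'''(a(\theta))}{h''(a(\theta))}\,\eta\Big)\,\frac{1}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))}\Big]}.$$
Now substitute equations (14, 17) for $\hat{w}(\theta,\eta)$ in the previous equation, and solve for $\hat{a}(\theta)$ to get
$$\left\{1 - \mathbb{E}\left[\big(\varepsilon_{a,R}(\theta,\eta) - p(w(\theta,\eta))\,\varepsilon_{a,r}(\theta,\eta)\big)\,\frac{a(\theta)\,\big[h'(a(\theta)) + h''(a(\theta))\,\eta\big]}{v'(w(\theta,\eta))\,w(\theta,\eta)}\right]\right\}\frac{\hat{a}(\theta)}{a(\theta)}$$
$$= -\mathbb{E}\left[\varepsilon_{a,R}(\theta,\eta)\,\frac{\hat{T}(w(\theta,\eta))}{\big(1 - T'(w(\theta,\eta))\big)\,w(\theta,\eta)}\right] - \mathbb{E}\left[\varepsilon_{a,r}(\theta,\eta)\,\frac{\hat{T}'(w(\theta,\eta))}{1 - T'(w(\theta,\eta))}\right] + \mathbb{E}\left[\big(\varepsilon_{a,R}(\theta,\eta) - p(w(\theta,\eta))\,\varepsilon_{a,r}(\theta,\eta)\big)\,\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))\,w(\theta,\eta)}\right]$$
$$\quad - \mathbb{E}\left[\big(\varepsilon_{a,R}(\theta,\eta) - p(w(\theta,\eta))\,\varepsilon_{a,r}(\theta,\eta)\big)\,\frac{\frac{1}{v'(w(\theta,\eta))\,w(\theta,\eta)}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\right]\mathbb{E}\left[\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))}\right].$$
Collecting terms leads to
$$\frac{\hat{a}(\theta)}{a(\theta)} = -\mathbb{E}\left[\varepsilon_{\mathbb{E}w,R}(\theta,\eta)\,\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))\,w(\theta,\eta)}\right] - \mathbb{E}\left[\varepsilon_{\mathbb{E}w,r}(\theta,\eta)\,\frac{\hat{T}'(w(\theta,\eta))}{r(w(\theta,\eta))}\right]$$
where the income effect parameter and compensated elasticity now account for the nonlinearity of the budget constraint (due to the fact that $\hat{w}$ depends on $\hat{T}$) and the endogeneity of the reservation value (due to the fact that $\hat{w}$ depends on $\hat{U}$) and are given by
$$\varepsilon_{\mathbb{E}w,R}(\theta,\eta) = \frac{p(w(\theta,\eta))\,\varepsilon_{a,r}(\theta,\eta) + \dfrac{\mathbb{E}\Big[\big(\varepsilon_{a,R}(\theta,\cdot) - p(w(\theta,\cdot))\,\varepsilon_{a,r}(\theta,\cdot)\big)\,\frac{1}{v'(w(\theta,\cdot))\,w(\theta,\cdot)}\Big]}{\mathbb{E}\Big[\frac{1}{v'(w(\theta,\cdot))}\Big]}\,w(\theta,\eta)}{1 + \mathbb{E}\Big[\big(p(w(\theta,\eta'))\,\varepsilon_{a,r}(\theta,\eta') - \varepsilon_{a,R}(\theta,\eta')\big)\Big(1 + \frac{h''(a(\theta))}{h'(a(\theta))}\,\eta'\Big)\frac{a(\theta)\,h'(a(\theta))}{v'(w(\theta,\eta'))\,w(\theta,\eta')}\Big]}$$
and
$$\varepsilon_{\mathbb{E}w,r}(\theta,\eta) = \frac{\varepsilon_{a,r}(\theta,\eta)}{1 + \mathbb{E}\Big[\big(p(w(\theta,\eta'))\,\varepsilon_{a,r}(\theta,\eta') - \varepsilon_{a,R}(\theta,\eta')\big)\Big(1 + \frac{h''(a(\theta))}{h'(a(\theta))}\,\eta'\Big)\frac{a(\theta)\,h'(a(\theta))}{v'(w(\theta,\eta'))\,w(\theta,\eta')}\Big]}.$$
This concludes the proof.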
C Allowing for Non-Constant Effort
In this section we extend our tax incidence analysis to the case where the firm can offer an effort schedule $a(\theta,\eta)$, rather than imposing a constant effort level $a(\theta)$ as in the main body of the paper. The firm which employs a worker with productivity $\theta$ solves
$$\max_{w(\theta,\cdot),\,a(\theta,\cdot)}\ \mathbb{E}\big[\theta a(\theta,\eta) - w(\theta,\eta)\big]$$
subject to incentive-compatibility constraints
$$a(\theta,\eta) \in \arg\max_{a}\ v(w(\theta,\eta)) - h(a(\theta,\eta)) \quad \text{for all } \eta,$$
and the participation constraint
$$\mathbb{E}\big[v(w(\theta,\eta)) - h(a(\theta,\eta))\big] \geq U(\theta).$$
From Edmans and Gabaix (2011) we know that the optimal contract satisfies
$$v(w(\theta,\eta)) = K + h(a(\theta,\eta)) + \int_{\underline{\eta}}^{\eta} h'(a(\theta,x))\,dx,$$
where $K \in \mathbb{R}$. Using the binding participation constraint to solve for $K$, we obtain
$$v(w(\theta,\eta)) = U(\theta) + h(a(\theta,\eta)) + \int_{\underline{\eta}}^{\eta} h'(a(\theta,x))\,dx - \mathbb{E}\left[\int_{\underline{\eta}}^{\eta'} h'(a(\theta,x))\,dx\right]. \tag{52}$$
Unlike in the model with a single effort level, the slope of the ex-post utility potentially varies with performance. In particular, when the effort schedule is differentiable, we have
$$\frac{\partial v(w(\theta,\eta))}{\partial \eta} = h'(a(\theta,\eta))\left(1 + \frac{\partial a(\theta,\eta)}{\partial \eta}\right).$$
Edmans and Gabaix (2011) show that all incentive-compatible effort schedules are such that $a(\theta,\eta) + \eta$ is increasing with $\eta$. This implies that the above slope is non-negative and that earnings are increasing with performance.
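To make the contract in equation (52) concrete, here is a small numerical sketch. Everything specific in it is an assumption made for illustration: an affine effort schedule, a quadratic effort cost (ε = 1), and a performance shock that is uniform on a bounded support. It builds the ex-post utility from (52), and can be used to check that the participation constraint binds and that the slope equals h′(a)(1 + ∂a/∂η).

```python
EPS = 1.0                      # assumed effort elasticity, so h(a) = a^2 / 2
ETA_LO, ETA_HI = -1.0, 1.0     # assumed bounded support; eta uniform on it (hypothetical)

def h(a):  return a ** (1 + 1 / EPS) / (1 + 1 / EPS)
def dh(a): return a ** (1 / EPS)

def effort(eta, a0=1.5, b=0.25):
    """Hypothetical affine effort schedule; a(eta) + eta is increasing since 1 + b > 0."""
    return a0 + b * eta

def integral_dh(eta, n=400):
    """Trapezoid rule for the integral of h'(a(x)) from ETA_LO to eta."""
    step = (eta - ETA_LO) / n
    ys = [dh(effort(ETA_LO + i * step)) for i in range(n + 1)]
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def expected(fn, n=400):
    """Midpoint-rule expectation under the uniform distribution on [ETA_LO, ETA_HI]."""
    step = (ETA_HI - ETA_LO) / n
    return sum(fn(ETA_LO + (i + 0.5) * step) for i in range(n)) / n

MEAN_I = expected(integral_dh)  # the expectation term in equation (52)

def ex_post_utility(eta, U=0.0):
    """Equation (52): v(w) = U + h(a(eta)) + I(eta) - E[I]."""
    return U + h(effort(eta)) + integral_dh(eta) - MEAN_I
```

Averaging `ex_post_utility(eta) - h(effort(eta))` over the shock recovers the reservation value, and a central difference of `ex_post_utility` in `eta` reproduces the slope formula stated above.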
The first-order condition with respect to effort level $a(\theta,\eta')$, assuming an interior solution, reads
$$\theta f_\eta(\eta') = \frac{\partial\,\mathbb{E}[w(\theta,\eta)]}{\partial a(\theta,\eta')}, \tag{53}$$
where $f_\eta$ denotes the density of the performance shock $\eta$. To compute the derivative of the expected wage on the right-hand side, first consider the derivative of the wage $w(\theta,\eta)$ with respect to the effort level $a(\theta,\eta')$, which we can compute using equation (52):
$$\frac{\partial w(\theta,\eta)}{\partial a(\theta,\eta')} = \begin{cases} -\big(1 - F_\eta(\eta')\big)\dfrac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))} & \text{if } \eta < \eta', \\[2ex] \dfrac{h'(a(\theta,\eta'))}{v'(w(\theta,\eta'))} - \big(1 - F_\eta(\eta')\big)\dfrac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta'))} & \text{if } \eta = \eta', \\[2ex] \dfrac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))} - \big(1 - F_\eta(\eta')\big)\dfrac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))} & \text{if } \eta > \eta'. \end{cases} \tag{54}$$
Notice that increasing the effort level conditional on the output shock $\eta'$ requires lowering earnings for worse performance ($\eta < \eta'$) and increasing earnings for better performance ($\eta > \eta'$). Taking expectations over $\eta$ yields
$$\frac{\partial\,\mathbb{E}[w(\theta,\eta)]}{\partial a(\theta,\eta')} = \frac{h'(a(\theta,\eta'))}{v'(w(\theta,\eta'))}\,f_\eta(\eta') + \big(1 - F_\eta(\eta')\big)\left(\mathbb{E}\left[\frac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))}\,\Big|\,\eta \geq \eta'\right] - \mathbb{E}\left[\frac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))}\right]\right).$$
Plugging this expression into the first-order condition (53) yields the following first-order condition with respect to effort:
$$\theta f_\eta(\eta') = \frac{h'(a(\theta,\eta'))}{v'(w(\theta,\eta'))}\,f_\eta(\eta') + \big(1 - F_\eta(\eta')\big)\left(\mathbb{E}\left[\frac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))}\,\Big|\,\eta \geq \eta'\right] - \mathbb{E}\left[\frac{h''(a(\theta,\eta'))}{v'(w(\theta,\eta))}\right]\right). \tag{55}$$
The left-hand side is the marginal benefit from providing higher effort, equal to the expected output gain. The right-hand side consists of two terms: the marginal rate of substitution (MRS) and the marginal cost of incentives (MCI). The MRS is the expected wage cost of compensating the agent for higher effort in contingency $\eta'$. The MCI, on the other hand, is the expected wage cost of making the adjusted effort schedule incentive-compatible. As we noted above, increasing the effort level conditional on output $\eta'$ requires increasing earnings for better performance and reducing them for worse performance. Since earnings are increasing with performance and $v$ is concave, such earnings adjustments are costly for the firm: MCI ≥ 0.
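The following theorem extends the results of Theorem 1 to the model where non-degenerate effort schedules are allowed.
Theorem 3. Denote by $\hat{a}(\theta,\eta)$ the change in effort schedule induced by the reform. Suppose that the original effort schedule is such that $a(\theta,\eta) > 0$ for all $\eta$. The first-order effect of the tax reform $\hat{T}$ on earnings $w(\theta,\cdot)$ is given by
$$\hat{w}(\theta,\eta) = \hat{w}_{ex}(\theta,\eta) + \hat{w}_{co}(\theta,\eta) + \hat{w}_{pp}(\theta,\eta)$$
where $\hat{w}_{ex}(\theta,\eta) = \theta\hat{a}(\theta,\eta)$, the crowding-out effect $\hat{w}_{co}$ has mean zero and is given by
$$\hat{w}_{co}(\theta,\eta) = \frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} - \frac{\big(v'(w(\theta,\eta))\big)^{-1}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\,\mathbb{E}\left[\frac{\hat{T}(w(\theta,\cdot))}{r(w(\theta,\cdot))}\right]$$
and the performance-pay effect $\hat{w}_{pp}$ has mean zero and is given by
$$\hat{w}_{pp}(\theta,\eta) = \frac{h'(a(\theta,\eta))}{v'(w(\theta,\eta))}\,\hat{a}(\theta,\eta) + \int_{\underline{\eta}}^{\eta}\frac{h''(a(\theta,x))}{v'(w(\theta,\eta))}\,\hat{a}(\theta,x)\,dx - \mathbb{E}\left[\int_{\underline{\eta}}^{\eta'}\frac{h''(a(\theta,x))}{v'(w(\theta,\eta))}\,\hat{a}(\theta,x)\,dx\right] - \theta\hat{a}(\theta,\eta).$$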
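Proof. Consider a reform $\hat{R} = -\hat{T}$. Following the same steps as in the proof of Theorem 1, we can show that the change in wages is equal to
$$\hat{w}(\theta,\eta) = -\frac{\hat{R}(w(\theta,\eta))}{r(w(\theta,\eta))} + \frac{\hat{U}(\theta)}{v'(w(\theta,\eta))} + \int_{\underline{\eta}}^{\bar{\eta}}\frac{\partial w(\eta)}{\partial a(\eta')}\,\hat{a}(\eta')\,d\eta'. \tag{56}$$
Using prior results on $\frac{\partial w(\eta)}{\partial a(\eta')}$ from equation (54), we obtain
$$\int_{\underline{\eta}}^{\bar{\eta}}\frac{\partial w(\eta)}{\partial a(\eta')}\,\hat{a}(\eta')\,d\eta' = \frac{h'(a(\theta,\eta))}{v'(w(\theta,\eta))}\,\hat{a}(\theta,\eta) + \int_{\underline{\eta}}^{\eta}\frac{h''(a(\theta,x))}{v'(w(\theta,\eta))}\,\hat{a}(\theta,x)\,dx - \mathbb{E}\left[\int_{\underline{\eta}}^{\eta'}\frac{h''(a(\theta,x))}{v'(w(\theta,\eta))}\,\hat{a}(\theta,x)\,dx\right].$$
Note that the expected value of the above expression is
$$\int_{\underline{\eta}}^{\bar{\eta}}\int_{\underline{\eta}}^{\bar{\eta}}\frac{\partial w(\theta,\eta)}{\partial a(\theta,\eta')}\,\hat{a}(\theta,\eta')\,d\eta'\,dF_\eta(\eta) = \int_{\underline{\eta}}^{\bar{\eta}}\int_{\underline{\eta}}^{\bar{\eta}}\frac{\partial w(\theta,\eta)}{\partial a(\theta,\eta')}\,dF_\eta(\eta)\,\hat{a}(\theta,\eta')\,d\eta' = \int_{\underline{\eta}}^{\bar{\eta}} f_\eta(\eta')\,\theta\hat{a}(\theta,\eta')\,d\eta' = \mathbb{E}\big[\theta\hat{a}(\theta,\eta')\big]$$
where in the first equality we changed the order of integration and in the second equality we applied the first-order condition for effort (53). This implies that the performance-pay effect has mean zero. By the free-entry condition we have $\mathbb{E}[\theta\hat{a}(\eta')] = \mathbb{E}[\hat{w}(\eta')]$. Hence, the first two terms of (56), which form the crowding-out effect, have mean zero. It follows that
$$\hat{U}(\theta) = \mathbb{E}\left[\frac{\hat{R}(w(\theta,\eta))}{r(w(\theta,\eta))}\right]\mathbb{E}\big[v'(w(\theta,\eta))^{-1}\big]^{-1}.$$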
It follows from Theorem 3 that the crowding-out effect $\hat{w}_{co}(\theta,\eta)$ is exactly the same as in the simpler setting studied in the main body of the paper (equation (15)). The performance-pay effect $\hat{w}_{pp}(\theta,\eta)$ is more complex than in the simpler model (equation (16)). However, it is a natural extension of the expression obtained under the constant-effort assumption. Namely, the only substantial difference is that the term $\frac{h''(a(\theta))\,\hat{a}(\theta)}{v'(w(\theta,\eta))}\,\eta$ from (16), which measures the change in earnings necessary to elicit higher effort, is now replaced by the more general expression $\int_{\underline{\eta}}^{\eta}\frac{h''(a(\theta,\eta'))\,\hat{a}(\theta,\eta')}{v'(w(\theta,\eta))}\,d\eta'$. The interpretation of this term is analogous to its counterpart in the simpler model. The only added difficulty is that we now have to evaluate the change in the entire effort schedule $\hat{a}(\theta,\cdot)$ in response to the reform, rather than a scalar value $\hat{a}(\theta)$.
D Proofs of Section 3
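Proof of Theorem 2. Equation (14) implies that the excess burden of the reform $\hat{T}$ is equal to
$$EB(T,\hat{T}) = -\int_\Theta \mathbb{E}\big[T'(w(\theta,\eta))\,\hat{w}_{ex}(\theta,\eta)\big]\,dF(\theta) - \int_\Theta \mathbb{E}\left[T'(w(\theta,\eta))\left(\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} + \frac{\hat{U}(\theta)}{v'(w(\theta,\eta))}\right)\right]dF(\theta)$$
$$\quad - \int_\Theta \mathbb{E}\left[T'(w(\theta,\eta))\left(\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{v'(w(\theta,\eta))} - \frac{w(\theta,\eta)}{a(\theta)}\right)\right]\hat{a}(\theta)\,dF(\theta),$$
where $\hat{U}(\theta)$ is given by (17). Since
$$\mathbb{E}\left[\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} + \frac{\hat{U}(\theta)}{v'(w(\theta,\eta))}\right] = \mathbb{E}\left[\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{v'(w(\theta,\eta))} - \frac{w(\theta,\eta)}{a(\theta)}\right] = 0,$$
we can rewrite the excess burden $EB(T,\hat{T})$ as
$$-\int_\Theta \mathbb{E}\big[T'(w(\theta,\eta))\,\hat{w}_{ex}(\theta,\eta)\big]\,dF(\theta) - \int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} + \frac{\hat{U}(\theta)}{v'(w(\theta,\eta))}\right)dF(\theta)$$
$$\quad - \int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \left[\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{v'(w(\theta,\eta))} - \frac{w(\theta,\eta)}{a(\theta)}\right]\hat{a}(\theta)\right)dF(\theta).$$
This expression easily leads to equation (28).
Next, the welfare gain of the tax reform is given by
$$WG(T,\hat{T}) = -\frac{1}{\lambda}\int_\Theta \alpha(\theta)\,\frac{1}{\mathbb{E}\big[(v'(w(\theta,\eta)))^{-1}\big]}\,\mathbb{E}\left[\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))}\right]dF(\theta)$$
$$= -\int_\Theta \mathbb{E}\left[\frac{1}{\lambda}\,\alpha(\theta)\,\frac{1}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\,\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))}\right]dF(\theta)$$
$$= -\int_\Theta \mathbb{E}\left[\frac{1}{\lambda}\,\alpha(\theta)\,\frac{\frac{1}{r(w(\theta,\eta))\,u'(R(w(\theta,\eta)))}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\,u'(R(w(\theta,\eta)))\,\hat{T}(w(\theta,\eta))\right]dF(\theta).$$
This leads to equation (29).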
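When taxes are unrestricted, the marginal value of public funds $\lambda$ can be obtained as follows. Consider a uniform lump-sum transfer, represented by the reform $\hat{T}^{**}(w) = -1$ for all $w$. Denoting by $\hat{a}^{**}(\theta)$ the effect of this reform on effort (via a pure income effect), it affects government revenue by $\hat{\mathcal{R}}(T,\hat{T}^{**})$ equal to
$$1 + \int_\Theta \mathbb{E}\left[T'(w(\theta,\eta))\left(-\frac{1}{r(w(\theta,\eta))} + \frac{\frac{1}{v'(w(\theta,\eta))}}{\mathbb{E}\Big[\frac{1}{v'(w(\theta,\cdot))}\Big]}\,\mathbb{E}\left[\frac{1}{r(w(\theta,\cdot))}\right]\right)\right]dF(\theta) + \int_\Theta \mathbb{E}\left[T'(w(\theta,\eta))\left(\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{v'(w(\theta,\eta))}\right)\right]\hat{a}^{**}(\theta)\,dF(\theta).$$
Consider now the reform in direction $\hat{T}^{**}(w) = -1$, normalized to reduce government revenue by 1 dollar after all behavioral responses have been accounted for. This reform is represented by $\hat{T}^{*} = \frac{-1}{|\hat{\mathcal{R}}(T,\hat{T}^{**})|}$. Its effect on government revenue is $\hat{\mathcal{R}}(T,\hat{T}^{*}) = -1$ by construction, and its effect on social welfare is given by
$$\lambda \equiv \frac{\hat{\mathcal{W}}(T,\hat{T}^{**})}{\big|\hat{\mathcal{R}}(T,\hat{T}^{**})\big|} = \frac{1}{\big|\hat{\mathcal{R}}(T,\hat{T}^{**})\big|}\int_\Theta \alpha(\theta)\,\frac{1}{\mathbb{E}\Big[\frac{1}{v'(w(\theta,\eta))}\Big]}\,\mathbb{E}\left[\frac{1}{r(w(\theta,\eta))}\right]dF(\theta).$$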
Finally, we show that the optimal tax schedule must satisfy equation (27) for any tax reform $\hat{T}$. To do so, consider an arbitrary tax reform $\hat{T}$, normalized without loss of generality so that its mechanical effect is equal to 1 dollar, that is, $\int \mathbb{E}\big[\hat{T}(w(\theta,\eta))\big]\,dF(\theta) = 1$. Denote its effect on government revenue by $\hat{\mathcal{R}}(T,\hat{T})$ and its effect on social welfare by $\hat{\mathcal{W}}(T,\hat{T})$. Redistribute any tax revenue gain (or levy any tax revenue loss) from this reform via the reform $\hat{T}^{*}$ described in the previous paragraph, that is, a uniform lump-sum transfer that reduces the government budget by 1 dollar. The tax reform
$$\hat{T} + \hat{\mathcal{R}}(T,\hat{T})\,\hat{T}^{*} \equiv \hat{T} + \hat{\mathcal{R}}(T,\hat{T})\,\frac{-1}{\big|\hat{\mathcal{R}}(T,-1)\big|}$$
is, by construction, budget-neutral. Its effect on social welfare is given by
$$\hat{\mathcal{W}}(T,\hat{T}) + \hat{\mathcal{R}}(T,\hat{T})\,\frac{\hat{\mathcal{W}}(T,-1)}{\big|\hat{\mathcal{R}}(T,-1)\big|} = \hat{\mathcal{W}}(T,\hat{T}) + \lambda\,\hat{\mathcal{R}}(T,\hat{T}).$$
Of course, the marginal value of public funds $\lambda$ is the Lagrange multiplier on the government budget constraint. Now, the optimal tax schedule is such that every budget-neutral tax reform has a zero effect on social welfare (see, for instance, Luenberger (1997)), that is, for all $\hat{T}$,
$$\hat{\mathcal{W}}(T,\hat{T}) + \lambda\,\hat{\mathcal{R}}(T,\hat{T}) = 0.$$
But recall that $\frac{1}{\lambda}\hat{\mathcal{W}}(T,\hat{T}) = WG(T,\hat{T})$ and $\hat{\mathcal{R}}(T,\hat{T}) = 1 - EB(T,\hat{T})$, by definition of the welfare gains and the excess burden. This immediately implies the characterization (27).
E Proofs of Section 4
Proof of Corollary 1. Suppose that the tax schedule is CRP, so that $R(w) = \frac{1-\tau}{1-p}\,w^{1-p}$. Equation (48) then implies that in order to induce agents with ability $\theta$ to choose the same effort $a$ regardless of their noise realization $\eta$, the earnings contract must satisfy:
$$\log(w(\theta,\eta)) = \frac{a^{1/\varepsilon}}{1-p}\,\eta - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{k}{1-p}, \tag{57}$$
for some $k \in \mathbb{R}$. Thus, log-earnings are linear in the performance shock $\eta = \frac{y}{\theta} - a$ that the firm infers upon observing realized output $y$. Imposing that the agent's participation constraint holds with equality pins down the value of $k$ as a function of $U(\theta)$. Namely, equation (49) implies:
$$k = U(\theta) + \frac{1}{1 + \frac{1}{\varepsilon}}\,a^{1+\frac{1}{\varepsilon}}$$
and hence
$$\log(w(\theta,\eta)) = \frac{a^{1/\varepsilon}}{1-p}\,\eta + \frac{1}{1-p}\,\frac{1}{1 + \frac{1}{\varepsilon}}\,a^{1+\frac{1}{\varepsilon}} - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}. \tag{58}$$
Below we derive the equilibrium value of the reservation utility $U(\theta)$ and obtain the equilibrium wage given $(a,\eta)$:
$$\log(w(\theta,\eta)) = \log(\theta a) + \frac{a^{1/\varepsilon}}{1-p}\,\eta - \frac{1}{2}\left(\frac{a^{1/\varepsilon}}{1-p}\right)^2\sigma_\eta^2. \tag{59}$$
Define the sensitivity of the before-tax and after-tax wages to output in the optimal contract by the semi-elasticities $\psi(\theta,\eta) \equiv \frac{1}{w(\theta,\eta)}\frac{\partial w(\theta,\eta)}{\partial\eta}$ and $\psi_c(\theta,\eta) \equiv \frac{1}{R(w(\theta,\eta))}\frac{\partial R(w(\theta,\eta))}{\partial\eta}$, respectively. We have $\psi(\theta,\eta) = \frac{a^{1/\varepsilon}}{1-p}$ and $\psi_c(\theta,\eta) = a^{1/\varepsilon}$. Both $\psi(\theta,\eta)$ and $\psi_c(\theta,\eta)$ depend on the tax schedule through its effect on optimal effort, and there is an additional crowding-out effect on the before-tax sensitivity.
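Next, since $v'(w) = \frac{r(w)}{R(w)} = \frac{1-p}{w}$, the firm's first-order condition reads
$$\theta = \mathbb{E}\left[\frac{h'(a)}{v'(w(\theta,\eta))} + \frac{h''(a)}{v'(w(\theta,\eta))}\,\eta\right] = \frac{a^{1/\varepsilon}}{1-p}\,\mathbb{E}[w(\theta,\eta)] + \frac{1}{\varepsilon}\,\frac{a^{\frac{1}{\varepsilon}-1}}{1-p}\,\mathbb{E}[w(\theta,\eta)\,\eta].$$
We have
$$\mathbb{E}[w(\theta,\eta)] = \mathbb{E}\left[e^{\frac{a^{1/\varepsilon}}{1-p}\eta}\right]\exp\left(\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}\right)$$
$$= \exp\left(\frac{1}{2}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2\right)\exp\left(\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}\right),$$
where we used the fact that $\eta$ is normally distributed with mean 0 and variance $\sigma_\eta^2$, so that $\mathbb{E}[e^{x\eta}] = e^{\frac{1}{2}x^2\sigma_\eta^2}$ for any $x$. Moreover, we have $\mathbb{E}[\eta e^{x\eta}] = x\sigma_\eta^2\,e^{\frac{1}{2}x^2\sigma_\eta^2}$ for any $x$. Indeed, let $\varphi$ be the (normal) pdf of $\eta$. We have $\varphi'(\eta) = -\frac{\eta}{\sigma_\eta^2}\,\varphi(\eta)$, so that $\mathbb{E}[\eta e^{x\eta}] = \int \eta e^{x\eta}\varphi(\eta)\,d\eta = -\sigma_\eta^2\int e^{x\eta}\varphi'(\eta)\,d\eta = x\sigma_\eta^2\int e^{x\eta}\varphi(\eta)\,d\eta = x\sigma_\eta^2\,e^{\frac{1}{2}x^2\sigma_\eta^2}$, where the third equality follows from an integration by parts.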
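The two Gaussian identities used here, $\mathbb{E}[e^{x\eta}] = e^{x^2\sigma_\eta^2/2}$ and $\mathbb{E}[\eta e^{x\eta}] = x\sigma_\eta^2 e^{x^2\sigma_\eta^2/2}$, can be confirmed with a simple quadrature. The sketch below is illustrative only; the values of `x` and `sigma` are arbitrary and the helper names are hypothetical.

```python
import math

def normal_pdf(eta, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (eta / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gaussian_expectation(fn, sigma, half_width=12.0, n=20000):
    """Midpoint-rule approximation of E[fn(eta)] for eta ~ N(0, sigma^2)."""
    lo = -half_width * sigma
    step = 2 * half_width * sigma / n
    total = 0.0
    for i in range(n):
        eta = lo + (i + 0.5) * step
        total += fn(eta) * normal_pdf(eta, sigma)
    return total * step

x, sigma = 0.7, 0.4   # illustrative values
mgf_numeric = gaussian_expectation(lambda e: math.exp(x * e), sigma)
mgf_closed = math.exp(0.5 * x ** 2 * sigma ** 2)
tilt_numeric = gaussian_expectation(lambda e: e * math.exp(x * e), sigma)
tilt_closed = x * sigma ** 2 * mgf_closed
```

The numeric and closed-form values should coincide up to quadrature error.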
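It follows that
$$\mathbb{E}[w(\theta,\eta)\,\eta] = \mathbb{E}\left[\eta\,e^{\frac{a^{1/\varepsilon}}{1-p}\eta}\right]\exp\left(\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}\right)$$
$$= \frac{a^{1/\varepsilon}}{1-p}\,\sigma_\eta^2\,\exp\left(\frac{1}{2}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2\right)\exp\left(\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}\right).$$
Plugging these expressions into the firm's first-order condition leads to
$$\theta a = \left[\frac{a^{1+\frac{1}{\varepsilon}}}{1-p} + \frac{1}{\varepsilon}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2\right]\exp\left(\frac{1}{2}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2\right)\exp\left(\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}\right)$$
and hence
$$\frac{a^{1+\frac{1}{\varepsilon}}}{1-p} + \frac{1}{\varepsilon}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2 = \theta a\,\exp\left(-\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} - \frac{1}{2}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2 + \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) - \frac{U(\theta)}{1-p}\right).$$
Now use the free-entry condition: equation (5) and the expression derived above for $\mathbb{E}[w(y \mid \theta)]$ lead to
$$\exp\left(\frac{1}{1-p}\,\frac{a^{1+\frac{1}{\varepsilon}}}{1+\frac{1}{\varepsilon}} + \frac{1}{2}\,\frac{a^{2/\varepsilon}}{(1-p)^2}\,\sigma_\eta^2 - \frac{1}{1-p}\log\left(\frac{1-\tau}{1-p}\right) + \frac{U(\theta)}{1-p}\right) = \theta a. \tag{60}$$
Combining this equation with the first-order condition for optimal effort therefore leads to:
$$a^{1+\frac{1}{\varepsilon}} + \frac{1}{\varepsilon}\,\frac{a^{2/\varepsilon}}{1-p}\,\sigma_\eta^2 = 1-p. \tag{61}$$
Using the definition $\psi \equiv \frac{a^{1/\varepsilon}}{1-p}$ for the pass-through easily leads to (31). Note that if $\varepsilon = 1$, we obtain optimal effort in closed form:
$$a = \left(\frac{1}{1-p} + \frac{\sigma_\eta^2}{(1-p)^2}\right)^{-1/2}. \tag{62}$$
Finally, taking logs in equation (60) and defining $\psi \equiv \frac{a^{1/\varepsilon}}{1-p}$ easily leads to (32).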
Proof of Corollary 2. Consider a tax reform that marginally raises the rate of progressivity $p$ by a small amount $\delta \to 0$. The direction $\hat{T}$ of this tax reform satisfies
$$\left(w - \frac{1-\tau}{1-p-\delta}\,w^{1-p-\delta}\right) - \left(w - \frac{1-\tau}{1-p}\,w^{1-p}\right) = \delta\,\hat{T}(w) + o(\delta).$$
This leads to the representation (34).
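The direction of the progressivity reform can be checked by finite differences. The sketch below compares a central difference of the CRP tax schedule in p with the closed form $\left(\log w - \frac{1}{1-p}\right)\frac{1-\tau}{1-p}w^{1-p}$, which is the expression for the reform direction that appears later in this proof when the crowding-out effect is computed. Function names and parameter values are hypothetical.

```python
import math

def crp_tax(w, tau, p):
    """CRP tax liability: T(w) = w - (1 - tau) / (1 - p) * w^(1 - p)."""
    return w - (1 - tau) / (1 - p) * w ** (1 - p)

def progressivity_direction(w, tau, p):
    """Closed form for the derivative of T(w) with respect to p."""
    return (math.log(w) - 1 / (1 - p)) * (1 - tau) / (1 - p) * w ** (1 - p)

def finite_difference_direction(w, tau, p, delta=1e-6):
    """Central difference of the tax schedule in the progressivity parameter."""
    return (crp_tax(w, tau, p + delta) - crp_tax(w, tau, p - delta)) / (2 * delta)
```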
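Differentiating equation (61) with respect to $(1-p)$ leads to
$$\left[\left(1 + \frac{1}{\varepsilon}\right)a^{\frac{1}{\varepsilon}} + \frac{2\sigma_\eta^2}{(1-p)\,\varepsilon^2}\,a^{\frac{2}{\varepsilon}-1}\right]\frac{\partial a}{\partial(1-p)} - \frac{\sigma_\eta^2}{(1-p)^2\,\varepsilon}\,a^{\frac{2}{\varepsilon}} = 1,$$
and hence
$$\left[\left(1 + \frac{1}{\varepsilon}\right)a^{\frac{1}{\varepsilon}+1} + \frac{2\sigma_\eta^2}{(1-p)\,\varepsilon^2}\,a^{\frac{2}{\varepsilon}}\right]\varepsilon_{a,1-p} - \frac{\sigma_\eta^2}{(1-p)\,\varepsilon}\,a^{\frac{2}{\varepsilon}} = 1-p.$$
Using the first-order condition again to substitute for $1-p$ leads to
$$\varepsilon_{a,1-p} = \frac{a^{\frac{1}{\varepsilon}+1} + \frac{2\sigma_\eta^2}{(1-p)\,\varepsilon}\,a^{\frac{2}{\varepsilon}}}{\left(1 + \frac{1}{\varepsilon}\right)a^{\frac{1}{\varepsilon}+1} + \frac{2\sigma_\eta^2}{(1-p)\,\varepsilon^2}\,a^{\frac{2}{\varepsilon}}}.$$
We conclude by expressing this elasticity in terms of the pass-through elasticities. We have $\psi = \frac{a^{1/\varepsilon}}{1-p}$ and $\varepsilon_{\psi,a} = \frac{1}{\varepsilon}$. We can thus write
$$\varepsilon_{a,1-p} = \frac{a^{\frac{1}{\varepsilon}+1} + 2(1-p)\,\varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2}{\left(1 + \frac{1}{\varepsilon}\right)a^{\frac{1}{\varepsilon}+1} + \frac{2}{\varepsilon}(1-p)\,\varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2}.$$
But the first-order condition for labor effort reads
$$a^{1+1/\varepsilon} = (1-p)\left(1 - \varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2\right).$$
Substituting into the previous equation and rearranging terms leads to
$$\varepsilon_{a,1-p} = \frac{1 + \varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2}{\left(1 + \frac{1}{\varepsilon}\right) + \left(\frac{1}{\varepsilon} - 1\right)\varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2}.$$
This easily yields equation (35).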
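The elasticity just derived can be cross-checked against a numerical derivative of the effort level defined implicitly by (61). The sketch below is an illustration with hypothetical helper names; it writes q for 1 − p, assumes 0 < q ≤ 1 (including the perturbed values), and uses arbitrary parameter values.

```python
import math

def solve_effort(q, eps, sigma2):
    """Bisection for equation (61), written in terms of q = 1 - p with 0 < q <= 1."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        resid = mid ** (1 + 1 / eps) + (1 / eps) * mid ** (2 / eps) * sigma2 / q - q
        lo, hi = (mid, hi) if resid < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def elasticity_formula(q, eps, sigma2):
    """Closed-form elasticity of effort with respect to 1 - p from the proof."""
    a = solve_effort(q, eps, sigma2)
    psi = a ** (1 / eps) / q
    x = (1 / eps) * psi ** 2 * sigma2        # eps_{psi,a} * psi^2 * sigma_eta^2
    return (1 + x) / ((1 + 1 / eps) + (1 / eps - 1) * x)

def elasticity_numeric(q, eps, sigma2, step=1e-5):
    """Central difference of log a with respect to log q."""
    up = math.log(solve_effort(q * math.exp(step), eps, sigma2))
    down = math.log(solve_effort(q * math.exp(-step), eps, sigma2))
    return (up - down) / (2 * step)
```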
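Expression (36) for $\hat{w}_{ex}(\theta,\eta)$ is immediate since $\frac{\hat{a}}{a} = -\frac{1}{1-p}\,\varepsilon_{a,1-p}$ by definition. To obtain the performance-pay effect (38), we show that for any (not necessarily CRP) tax reform $\hat{T}$,
$$\hat{w}_{pp}(\theta,\eta) = \varepsilon_{\psi,a}\left(\psi\eta - \psi^2\sigma_\eta^2\right)\hat{w}_{ex}(y \mid \theta). \tag{63}$$
This equation implies that $\hat{w}_{pp}(\theta,\eta)$ has mean zero, but is dispersed around the mean whenever $\varepsilon_{\psi,a} > 0$, since the map $\eta \mapsto \psi\eta - \psi^2\sigma_\eta^2$ is strictly increasing. To prove this equation, note that
$$\frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{v'(w(\theta,\eta))}\,\hat{a}(\theta) = \frac{R(w(\theta,\eta))}{r(w(\theta,\eta))}\left(a^{1+\frac{1}{\varepsilon}} + \frac{1}{\varepsilon}\,a^{\frac{1}{\varepsilon}}\,\eta\right)\frac{\hat{a}}{a}$$
$$= \frac{1}{1-p}\,w(\theta,\eta)\left[a^{1+\frac{1}{\varepsilon}} + (1-p)\,\varepsilon_{\psi,a}\,\psi\eta\right]\frac{\hat{a}}{a}$$
$$= \frac{1}{1-p}\,w(\theta,\eta)\left[(1-p)\left(1 - \varepsilon_{\psi,a}\,\psi^2\sigma_\eta^2\right) + (1-p)\,\varepsilon_{\psi,a}\,\psi\eta\right]\frac{\hat{a}}{a}$$
$$= \left[1 + \varepsilon_{\psi,a}\left(\psi\eta - \psi^2\sigma_\eta^2\right)\right]w(\theta,\eta)\,\frac{\hat{a}}{a},$$
where the third equality uses the first-order condition for labor effort.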
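Next, we compute the crowding-out effect $\hat{w}_{co}(\theta,\eta)$. We have
$$\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} = \frac{1}{(1-\tau)\,(w(\theta,\eta))^{-p}}\left(\log(w(\theta,\eta)) - \frac{1}{1-p}\right)\frac{1-\tau}{1-p}\,(w(\theta,\eta))^{1-p}$$
$$= \left(\ln(\theta a) + \psi\eta - \frac{1}{2}\psi^2\sigma_\eta^2 - \frac{1}{1-p}\right)\frac{1}{1-p}\,w(\theta,\eta),$$
and
$$\frac{\hat{U}(\theta)}{v'(w(\theta,\eta))} = -\frac{\frac{1}{v'(w(\theta,\eta))}}{\mathbb{E}\Big[\frac{1}{v'(w(\theta,\cdot))}\Big]}\,\mathbb{E}\left[\frac{\hat{T}(w(\theta,\cdot))}{r(w(\theta,\cdot))}\right]$$
$$= -\frac{\frac{1}{1-p}\,w(\theta,\eta)}{\mathbb{E}\Big[\frac{1}{1-p}\,w(\theta,\cdot)\Big]}\,\mathbb{E}\left[\left(\log(w(\theta,\cdot)) - \frac{1}{1-p}\right)\frac{1}{1-p}\,w(\theta,\cdot)\right]$$
$$= -\frac{1}{1-p}\,w(\theta,\eta)\left[\ln(\theta a) + \frac{1}{2}\psi^2\sigma_\eta^2 - \frac{1}{1-p}\right].$$
Summing these expressions yields equation (37).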
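Summing all the effects, we can easily verify that the incidence of the reform is given by
$$\frac{\partial \log w(\theta,\eta)}{\partial \log(1-p)} = \varepsilon_{a,1-p} + \left(\varepsilon_{\psi,a}\,\varepsilon_{a,1-p} + \varepsilon_{\psi,1-p}\right)\left(\psi\eta - \psi^2\sigma_\eta^2\right),$$
which is the expression we would obtain by directly differentiating $\log w(\theta,\eta) = \log(\theta a) + \psi\eta - \frac{1}{2}\psi^2\sigma_\eta^2$.
Note that the earnings adjustment $\hat{w}_i(\theta,\eta)$ contributes to raising the sensitivity of log-earnings to performance shocks (pass-through function) iff
$$\frac{\partial}{\partial\eta}\log\big(w(\theta,\eta) + \delta\hat{w}_i(\theta,\eta)\big) - \frac{\partial}{\partial\eta}\log(w(\theta,\eta)) > 0.$$
For $\delta$ close enough to zero this inequality is equivalent to
$$\frac{\partial\hat{w}_i(\theta,\eta)}{\partial\eta} > \hat{w}_i(\theta,\eta)\,\frac{\partial\log(w(\theta,\eta))}{\partial\eta}.$$
The expressions derived above imply
$$\frac{\partial\hat{w}_{pp}(\theta,\eta)}{\partial\eta} = \frac{\hat{w}_{pp}(\theta,\eta)}{\hat{w}_{ex}(\theta,\eta)}\,\frac{\partial\hat{w}_{ex}(\theta,\eta)}{\partial\eta} + \hat{w}_{pp}(\theta,\eta)\,\frac{\psi}{\psi\eta - \psi^2\sigma_\eta^2} = \hat{w}_{pp}(\theta,\eta)\,\frac{\partial\log(w(\theta,\eta))}{\partial\eta} + \psi\,\varepsilon_{\psi,a}\,w(\theta,\eta)\,\frac{\hat{a}}{a}.$$
Thus, since $\hat{a} < 0$, we obtain $\frac{\partial\hat{w}_{pp}(\theta,\eta)}{\partial\eta} < \hat{w}_{pp}(y \mid \theta)\,\frac{\partial\log(w(\theta,\eta))}{\partial\eta}$ and the reform lowers the sensitivity of log-earnings to output. Analogously, the crowding-out effect raises the sensitivity of log-earnings since $\varepsilon_{\psi,1-p} < 0$.
Finally, we derive equation (40). We have
$$\frac{\hat{w}_{co}(\theta,\eta)}{\hat{w}_{pp}(\theta,\eta)} = \frac{-\frac{1}{1-p}\,\varepsilon_{\psi,1-p}\left(\psi\eta - \psi^2\sigma_\eta^2\right)w(\theta,\eta)}{-\frac{1}{1-p}\,\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}\left(\psi\eta - \psi^2\sigma_\eta^2\right)w(\theta,\eta)} = \frac{\varepsilon_{\psi,1-p}}{\varepsilon_{\psi,a}\,\varepsilon_{a,1-p}} = -\frac{\varepsilon}{\varepsilon_{a,1-p}}$$
$$\overset{o(\psi^2\sigma_\eta^2)}{=} -(1+\varepsilon)\left(1 + \frac{1-\varepsilon}{1+\varepsilon}\,\frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2 - \frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2\right) = 2\psi^2\sigma_\eta^2 - \varepsilon - 1,$$
where the second to last equality uses equation (35).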
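Proof of Corollary 3. Suppose that Assumption 2 holds, and that ability types are lognormally distributed, that is, $\log\theta \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)$. We substitute formula (34) for the tax reform $\hat{T}$ in equations (28) and (29) to compute each term of the excess burden and the welfare gains of marginally raising progressivity. The algebra is straightforward but tedious. It is available upon request and we only summarize our results here. The mechanical effect of the progressive tax reform is equal to
$$\int_\Theta \mathbb{E}\big[\hat{T}(w(\theta,\eta))\big]\,dF(\theta) = \left[\mu_\theta + \ln a - \frac{1}{1-p} + (1-p)\,\sigma_\theta^2 + \left(\frac{1}{2} - p\right)\psi^2\sigma_\eta^2\right]C$$
where $C$ denotes aggregate consumption and is given by
$$C \equiv \int_\Theta \mathbb{E}\big[R(w(\theta,\eta))\big]\,dF(\theta) = \frac{1-\tau}{1-p}\,a^{1-p}\,e^{(1-p)\mu_\theta + \frac{1}{2}(1-p)^2\sigma_\theta^2}\,e^{-\frac{1}{2}p(1-p)\psi^2\sigma_\eta^2}.$$
Let $Y$ denote aggregate output, given by
$$Y \equiv \int_\Theta (\theta a)\,dF(\theta) = a\,e^{\mu_\theta + \frac{\sigma_\theta^2}{2}}.$$
Note that in our baseline economy with government expenditures $G$, we have $Y = C + G$. The excess burden of the tax reform in the model with exogenous risk (that is, the first integral in (28)) is given by
$$\int_\Theta \mathbb{E}\left[T'(w(\theta,\eta))\,w(\theta,\eta)\,\frac{\hat{a}(\theta)}{a(\theta)}\right]dF(\theta) = -\frac{1}{1-p}\big(Y - (1-p)\,C\big)\,\varepsilon_{a,1-p}.$$
The fiscal externality due to the first component of crowding-out is given by
$$\int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))}\right)dF(\theta) = \left(e^{p\psi^2\sigma_\eta^2} - 1\right)\left[\mu_\theta + \ln a - \frac{1}{1-p} + (1-p)\,\sigma_\theta^2\right]C + \left(e^{p\psi^2\sigma_\eta^2} - 1 + 2p\right)\frac{1}{2}\psi^2\sigma_\eta^2\,C.$$
The fiscal externality due to the second element of crowding-out is given by
$$\int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \frac{\hat{U}(\theta)}{v'(w(\theta,\eta))}\right)dF(\theta) = -\left(e^{p\psi^2\sigma_\eta^2} - 1\right)\left[\mu_\theta + \ln a - \frac{1}{1-p} + (1-p)\,\sigma_\theta^2\right]C - \left(e^{p\psi^2\sigma_\eta^2} - 1\right)\frac{1}{2}\psi^2\sigma_\eta^2\,C.$$
Thus the total fiscal externality from the crowding-out effect is equal to $p\,\psi^2\sigma_\eta^2\,C$. The fiscal externality due to the performance-pay effect is given by
$$\int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \frac{h'(a(\theta)) + h''(a(\theta))\,\eta}{v'(w(\theta,\eta))}\,\hat{a}(\theta) - \hat{w}_{ex}(\theta,\eta)\right)dF(\theta) = -p\,\frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2\,\varepsilon_{a,1-p}\,C.$$
Suppose that the social welfare weights are given by $\alpha(\theta) = \frac{e^{-\alpha\log\theta}}{\int e^{-\alpha\log\theta'}\,dF(\theta')}$ for all $\theta$, for some $\alpha \geq 0$. Then the effect of the tax reform on social welfare is given by
$$-\int_\Theta \mathbb{E}\left[\frac{(v'(w(\theta,\eta)))^{-1}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\,\alpha(\theta)\,u'(R(w(\theta,\eta)))\,\hat{T}(w(\theta,\eta))\right]dF(\theta) = -\mu_\theta - \ln a + \frac{1}{1-p} - \frac{1}{2}\psi^2\sigma_\eta^2 + \alpha\,\sigma_\theta^2.$$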
Next, we compute the marginal value of public funds $\lambda$ in the loglinear model, when the tax code is restricted to the CRP class. To do so, first consider a reform of the parameter $\tau$, represented by formula (64). By definition, the parameter $\lambda$ is the effect on social welfare caused by a tax reform in this direction, normalized to raise government revenue by 1 dollar. The mechanical effect of the (non-normalized) reform (64) is equal to
$$\int_\Theta \mathbb{E}\big[\hat{T}(w(\theta,\eta))\big]\,dF(\theta) = \frac{C}{1-\tau}.$$
Since the elasticity of labor effort is $\varepsilon_{a,1-\tau} = 0$, the standard excess burden and the fiscal externality caused by the performance-pay effect are both equal to zero,
$$\int_\Theta \mathbb{E}\big[T'(w(\theta,\eta))\,\hat{w}_{ex}(\theta,\eta)\big]\,dF(\theta) = \int_\Theta \mathrm{Cov}\big(T'(w(\theta,\eta)),\ \hat{w}_{pp}(\theta,\eta)\big)\,dF(\theta) = 0.$$
The fiscal externalities caused by the two elements of crowding-out are given by
$$\int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))}\right)dF(\theta) = -\int_\Theta \mathrm{Cov}\left(T'(w(\theta,\eta)),\ \frac{\hat{U}(\theta)}{v'(w(\theta,\eta))}\right)dF(\theta) = \left(e^{p\psi^2\sigma_\eta^2} - 1\right)\frac{C}{1-\tau}.$$
The welfare effect of the tax reform is given by
$$-\int_\Theta \mathbb{E}\left[\frac{(v'(w(\theta,\eta)))^{-1}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\,\alpha(\theta)\,u'(R(w(\theta,\eta)))\,\hat{T}(w(\theta,\eta))\right]dF(\theta) = -\frac{1}{1-\tau}.$$
Now, normalize the tax reform of the tax rate $\tau$ so that it delivers $1 of revenue. Since the sum of all the fiscal externalities is zero, the increase in government revenue of the tax reform $\hat{T} = \frac{1}{1-p}\,w^{1-p}$ is simply its mechanical effect $\frac{C}{1-\tau}$. Thus, we consider the normalized tax reform
$$\hat{T}^{*} = \frac{1-\tau}{C} \times \hat{T} = \frac{1}{C} \times \frac{1-\tau}{1-p}\,(w(\theta,\eta))^{1-p}.$$
The welfare impact of distributing an additional dollar of tax revenue via a reduction of the parameter $\tau$ is therefore equal to the welfare effect of the reform $-\hat{T}^{*}$, which is equal to
$$\lambda = \frac{1-\tau}{C} \times \frac{1}{1-\tau} = \frac{1}{C}.$$
This gives the marginal value of public funds $\lambda$ in this setting and concludes the proof.
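The aggregates appearing in this proof are easy to compute. The sketch below codes the closed forms for aggregate output and consumption in the loglinear model, picks τ so that the budget balances for a given spending share g, and evaluates λ = 1/C. It is a hypothetical illustration: effort `a` and pass-through `psi` are taken as given inputs rather than solved from (61), and the parameter values are arbitrary.

```python
import math

def aggregate_output(a, mu_theta, sigma2_theta):
    """Y = a * exp(mu_theta + sigma_theta^2 / 2)."""
    return a * math.exp(mu_theta + 0.5 * sigma2_theta)

def aggregate_consumption(tau, p, a, psi, mu_theta, sigma2_theta, sigma2_eta):
    """Closed form for C stated in the proof of Corollary 3."""
    q = 1 - p
    return ((1 - tau) / q * a ** q
            * math.exp(q * mu_theta + 0.5 * q ** 2 * sigma2_theta)
            * math.exp(-0.5 * p * q * psi ** 2 * sigma2_eta))

def balanced_budget_tau(g, p, a, psi, mu_theta, sigma2_theta, sigma2_eta):
    """Choose tau so that C = (1 - g) * Y, using that C is linear in (1 - tau)."""
    y = aggregate_output(a, mu_theta, sigma2_theta)
    c_at_zero_tau = aggregate_consumption(0.0, p, a, psi, mu_theta, sigma2_theta, sigma2_eta)
    return 1 - (1 - g) * y / c_at_zero_tau

def marginal_value_of_public_funds(consumption):
    """lambda = 1 / C in the CRP loglinear setting."""
    return 1 / consumption
```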
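Proof of Proposition 4. We give two proofs of this result. First, we apply formula (27) using the explicit expressions of each term derived in the proof of Corollary 3 above. We must have, letting $\alpha(\theta) = 1$ for all $\theta$,
$$0 = \left[\mu_\theta + \ln a - \frac{1}{1-p} + (1-p)\,\sigma_\theta^2 + \left(\frac{1}{2} - p\right)\psi^2\sigma_\eta^2\right]C - \left[\mu_\theta + \ln a - \frac{1}{1-p} + \frac{1}{2}\psi^2\sigma_\eta^2\right]C$$
$$\quad - \frac{1}{1-p}\big(Y - (1-p)\,C\big)\,\varepsilon_{a,1-p} + p\,\psi^2\sigma_\eta^2\,C - \left[p\,\frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2\right]\varepsilon_{a,1-p}\,C$$
$$= \left[(1-p)\,\sigma_\theta^2 + \left(1 - \frac{1}{1-p}\,\frac{Y}{C} - p\,\frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2\right)\varepsilon_{a,1-p}\right]C.$$
Since government expenditures are equal to $G$, we have
$$\frac{Y}{C} = \frac{Y}{Y - G} = \frac{1}{1-g}$$
where $g \equiv G/Y$. We thus obtain
$$\frac{\sigma_\theta^2}{\varepsilon_{a,1-p}} = \frac{1}{(1-p)^2}\left(\frac{1}{1-g} - (1-p)\right) + \frac{p}{1-p}\,\frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2 = \frac{p}{(1-p)^2}\left[1 + \frac{g}{(1-g)\,p} + (1-p)\,\frac{1}{\varepsilon}\,\psi^2\sigma_\eta^2\right],$$
which easily yields the result.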
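The optimality condition obtained at the end of this first proof can be solved numerically for p. The sketch below is an assumption-laden illustration rather than the paper's quantitative procedure: it treats effort, the pass-through ψ and the elasticity of effort with respect to 1 − p as functions of p through equation (61) and the elasticity formula from the proof of Corollary 2, and it finds a root of the condition by bisection. Helper names and parameter values are hypothetical.

```python
def solve_effort(q, eps, sigma2_eta):
    """Bisection for equation (61) in terms of q = 1 - p, assuming 0 < q <= 1."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        resid = mid ** (1 + 1 / eps) + (1 / eps) * mid ** (2 / eps) * sigma2_eta / q - q
        lo, hi = (mid, hi) if resid < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def progressivity_residual(p, eps, sigma2_eta, sigma2_theta, g):
    """Right-hand side of the optimality condition times the elasticity, minus sigma_theta^2."""
    q = 1 - p
    a = solve_effort(q, eps, sigma2_eta)
    psi = a ** (1 / eps) / q
    x = psi ** 2 * sigma2_eta / eps
    elasticity = (1 + x) / ((1 + 1 / eps) + (1 / eps - 1) * x)
    rhs = p / q ** 2 * (1 + g / ((1 - g) * p) + q * x)
    return rhs * elasticity - sigma2_theta

def optimal_progressivity(eps, sigma2_eta, sigma2_theta, g, lo=1e-9, hi=0.99):
    """Bisection on p; assumes the residual is negative at lo and positive at hi."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if progressivity_residual(mid, eps, sigma2_eta, sigma2_theta, g) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the illustrative call `optimal_progressivity(0.5, 0.1, 0.3, 0.2)` the bracket satisfies the sign assumption, so the routine returns an interior rate of progressivity.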
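The second proof consists of directly calculating the optimal rate of progressivity in the loglinear model by equating to zero the derivative of social welfare in this environment. To do so, recall that the earnings schedule of agents with ability $\theta$ can be written as
$$\log(w(\theta,\eta)) = \log(\theta a) + \psi\eta - \frac{1}{2}(\psi\sigma_\eta)^2$$
and their expected utility as
$$U(\theta) = \log\left(\frac{1-\tau}{1-p}\right) + (1-p)\log(\theta a) - \frac{1}{2}(1-p)(\psi\sigma_\eta)^2 - h(a).$$
Utilitarian social welfare is therefore equal to
$$\int_\Theta U(\theta)\,dF(\theta) = (1-p)\,\mu_\theta + (1-p)\log a - (1-p)\,\frac{\psi^2\sigma_\eta^2}{2} - h(a) + \log\left(\frac{1-\tau}{1-p}\right).$$
The first-order condition for effort, taking tax rates as given, reads
$$0 = \frac{\partial U(\theta)}{\partial a} = (1-p)\,\frac{1}{a} - (1-p)\,\psi\sigma_\eta^2\,\frac{\partial\psi}{\partial a} - h'(a).$$
Now recall that expected pre-tax and post-tax earnings are respectively given by $\mathbb{E}[w(\theta,\eta)] = \theta a$ and $\mathbb{E}\big[(w(\theta,\eta))^{1-p}\big] = (\theta a)^{1-p}\,e^{-\frac{p\,a^{2/\varepsilon}\sigma_\eta^2}{2(1-p)}}$, so that government revenue is equal to
$$\int_\Theta \mathbb{E}\big[T(w(\theta,\eta))\big]\,f(\theta)\,d\theta = a\,e^{\mu_\theta + \frac{\sigma_\theta^2}{2}} - \frac{1-\tau}{1-p}\,e^{-\frac{p\,a^{2/\varepsilon}\sigma_\eta^2}{2(1-p)}}\,a^{1-p}\,e^{(1-p)\mu_\theta + (1-p)^2\frac{\sigma_\theta^2}{2}}.$$
Budget balance thus requires
$$\frac{1-\tau}{1-p} = \frac{a\,e^{\mu_\theta + \frac{\sigma_\theta^2}{2}} - G}{e^{-\frac{p\,a^{2/\varepsilon}\sigma_\eta^2}{2(1-p)}}\,a^{1-p}\,e^{(1-p)\mu_\theta + (1-p)^2\frac{\sigma_\theta^2}{2}}} = \frac{(1-g)\,a\,e^{\mu_\theta + \frac{\sigma_\theta^2}{2}}}{e^{-\frac{p\,a^{2/\varepsilon}\sigma_\eta^2}{2(1-p)}}\,a^{1-p}\,e^{(1-p)\mu_\theta + (1-p)^2\frac{\sigma_\theta^2}{2}}}.$$
As a result, maximizing with respect to $1-p$ leads to:
$$0 = \mu_\theta + \log a + (1-p)\,\frac{1}{a}\,\frac{\partial a}{\partial(1-p)} - h'(a)\,\frac{\partial a}{\partial(1-p)} - \frac{\psi^2\sigma_\eta^2}{2} - (1-p)\,\psi\sigma_\eta^2\left[\frac{\partial\psi}{\partial(1-p)} + \frac{\partial\psi}{\partial a}\,\frac{\partial a}{\partial(1-p)}\right] + \frac{\partial\log\left(\frac{1-\tau}{1-p}\right)}{\partial(1-p)},$$
with
$$\frac{\partial\log\left(\frac{1-\tau}{1-p}\right)}{\partial(1-p)} = \frac{g}{1-g}\,\frac{\partial\log a}{\partial(1-p)} - \mu_\theta - (1-p)\,\sigma_\theta^2 - \log a + p\,\frac{1}{a}\,\frac{\partial a}{\partial(1-p)} - \left(\frac{1}{2} - p\right)\psi^2\sigma_\eta^2 + p\,(1-p)\,\psi\sigma_\eta^2\left[\frac{\partial\psi}{\partial(1-p)} + \frac{\partial\psi}{\partial a}\,\frac{\partial a}{\partial(1-p)}\right].$$
We therefore obtain
$$0 = \left[(1-p)\,\frac{1}{a} - h'(a) - (1-p)\,\psi\sigma_\eta^2\,\frac{\partial\psi}{\partial a}\right]\frac{\partial a}{\partial(1-p)} + p\,\frac{1}{a}\,\frac{\partial a}{\partial(1-p)} + \frac{g}{1-g}\,\frac{\partial\log a}{\partial(1-p)}$$
$$\quad - (1-p)\,\sigma_\theta^2 - (1-p)\,\psi^2\sigma_\eta^2 - (1-p)^2\,\psi\sigma_\eta^2\,\frac{\partial\psi}{\partial(1-p)} + p\,(1-p)\,\psi\sigma_\eta^2\,\frac{\partial\psi}{\partial a}\,\frac{\partial a}{\partial(1-p)}.$$
Using the first-order condition for effort leads to
$$0 = \frac{1}{1-p}\left[p + \frac{g}{1-g}\right]\varepsilon_{a,1-p} + p\,\psi^2\sigma_\eta^2\,\varepsilon_{\psi,a}\,\varepsilon_{a,1-p} - (1-p)\left[\sigma_\theta^2 + \psi^2\sigma_\eta^2\right] - (1-p)\,\psi^2\sigma_\eta^2\,\varepsilon_{\psi,1-p}.$$
Rearranging this equation leads to the result.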
Further Examples of Tax Reforms. We now study the incidence of additional examples of tax reforms.
Lump-Sum Tax Increase on High Incomes. We focus on the top earners, whose incomes are located in the highest bracket characterized by a constant tax rate $\tau_{top}$ and an income threshold $w_{top}$. Thus, the baseline tax schedule $T$ is locally affine, that is, $T(w) = T(w_{top}) + \tau_{top}(w - w_{top})$ for all $w > w_{top}$. Moreover, we assume that $w(y \mid \theta) > w_{top}$ for all $y$. A uniform lump-sum increase of the income tax liabilities of these agents is represented by the tax reform $\hat{T}(w) = 1$ for all $w > w_{top}$. (Equivalently, $\hat{T}$ can be any positive constant.) Indeed, the perturbed tax schedule is then given by $T + \delta\hat{T} = T + \delta$. Thus, the tax function is shifted up by the constant $\delta > 0$. The value of the Gateaux derivative $\hat{\Psi}(T,\hat{T})$ then gives the first-order effect of this lump-sum tax increase on the functional $\Psi$ as its size $\delta$ becomes small.
Applying formulas (15) and (16) leads to the following results. The earnings adjustment caused by crowding-out is equal to
$$\hat{w}_{co}(\theta,\eta) = \frac{1}{1-\tau}\left(1 - \frac{(v'(w(\theta,\eta)))^{-1}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\right).$$
Intuitively, a uniform lump-sum tax increase can be fully absorbed by the firm via a counteracting lump-sum increase in earnings, without any change in private insurance. Thus, the first element of crowding-out is constant. On the other hand, the second element of crowding-out is decreasing in the performance shock. This is because the tax increase reduces expected utility $U(\theta)$. To preserve incentive compatibility, this is achieved by reducing pre-tax earnings by larger amounts for higher-income workers, since their marginal utility is smaller. As a result, $\frac{\partial}{\partial\eta}\hat{w}_{co}(\theta,\eta) < 0$, so that the crowding-out effect reduces the sensitivity of pre-tax earnings to performance. On the other hand, the sum of the standard labor supply effect $\hat{w}_{ex}(\theta,\eta)$ and the performance-pay effect $\hat{w}_{pp}(\theta,\eta)$ is increasing, so that the labor supply responses raise the performance sensitivity of the contract. Indeed, a lump-sum tax increase creates a pure income effect and hence raises optimal effort, that is, $\hat{a}(\theta) > 0$. Implementing this higher effort level requires an increase in the sensitivity of earnings to performance shocks. Overall, a uniform tax increase can lead to either a spread or a contraction in the pre-tax earnings schedule, depending on the size of the income effect for high-income earners.
Marginal Tax Rate Increase on High Incomes. Focusing again on the highest income tax bracket, an increase in the top marginal tax rate is represented by the tax reform $\hat{T}(w) = w - w_{top}$ for all $w > w_{top}$. Indeed, the perturbed tax payments are then given by $T(w) + \delta\hat{T}(w) = T(w) + \delta(w - w_{top})$, and the marginal tax rates are perturbed by the constant $\delta\hat{T}'(w) = \delta$ for $w \geq w_{top}$. We obtain the following results. Assuming that the utility function is logarithmic, we find that the crowding-out effect is equal to
$$\hat{w}_{co}(\theta,\eta) = \frac{1}{(1-\tau)\,\Pi}\,\big(w(\theta,\eta) - \mathbb{E}[w(\theta,\eta)]\big),$$
where $\Pi = \mathbb{E}[w(\theta,\eta)]/w_{top}$. Thus, $\frac{\partial}{\partial\eta}\hat{w}_{co}(\theta,\eta) > 0$, so that the crowding-out effect raises the sensitivity of pre-tax earnings to performance shocks. Note that if the baseline tax code is linear and the tax rates increase uniformly for the whole population, the crowding-out effect is equal to zero. On the other hand, the sum of the standard labor supply effect $\hat{w}_{ex}(\theta,\eta)$ and the performance-pay effect $\hat{w}_{pp}(\theta,\eta)$ is increasing if the reform reduces optimal effort, $\hat{a}(\theta) < 0$, which raises the performance sensitivity of the contract. The larger the average uncompensated elasticity of labor supply, the stronger the performance-pay effect relative to the crowding-out effect, and the more an increase in top tax rates reduces the dispersion of earnings at the top. Overall, the effect of this reform on the earnings distribution is ambiguous.
Proportional Decrease in Retention Rates. Suppose that Assumption 2 holds. Consider a tax reform that raises the parameter $\tau$ of the CRP tax schedule by a small amount $\delta$. The direction $\hat{T}$ of this tax reform is such that
$$\left(w - \frac{1-\tau-\delta}{1-p}\,w^{1-p}\right) - \left(w - \frac{1-\tau}{1-p}\,w^{1-p}\right) = \delta\,\hat{T}(w) + o(\delta).$$
This easily implies that this tax reform $\hat{T}$ is defined by
$$\hat{T}(w) = \frac{1}{1-p}\,w^{1-p}, \quad \forall w > 0. \tag{64}$$
Note that this reform changes the retention rates $r(w)$, in percentage terms, by a negative constant: $\frac{\hat{r}(w)}{r(w)} = -\frac{\hat{T}'(w)}{1 - T'(w)} = -\frac{1}{1-\tau}$. Therefore, it amounts to a proportional reduction in retention rates.
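A quick numerical illustration of this reform, under the CRP form assumed here: the sketch below recovers the direction (64) from a finite difference in τ and computes the relative change in retention rates at several income levels, which should be the same negative number at every income. Function names and parameter values are hypothetical.

```python
def crp_tax(w, tau, p):
    """CRP tax liability: T(w) = w - (1 - tau) / (1 - p) * w^(1 - p)."""
    return w - (1 - tau) / (1 - p) * w ** (1 - p)

def crp_retention_rate(w, tau, p):
    """r(w) = 1 - T'(w) = (1 - tau) * w^(-p)."""
    return (1 - tau) * w ** (-p)

def tau_direction(w, p):
    """Equation (64): the reform direction w^(1 - p) / (1 - p)."""
    return w ** (1 - p) / (1 - p)

def relative_retention_change(w, tau, p, delta=1e-6):
    """Finite-difference version of r_hat(w) / r(w) for a small increase in tau."""
    base = crp_retention_rate(w, tau, p)
    return (crp_retention_rate(w, tau + delta, p) - base) / (delta * base)

def direction_by_difference(w, tau, p, delta=1e-6):
    """Finite-difference change in tax liability per unit increase in tau."""
    return (crp_tax(w, tau + delta, p) - crp_tax(w, tau, p)) / delta
```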
Applying formula (14) to the tax reform (64) yields the following results. The two components of crowding-out are equal to
$$\frac{\hat{T}(w(\theta,\eta))}{r(w(\theta,\eta))} = \frac{(v'(w(\theta,\eta)))^{-1}}{\mathbb{E}\big[(v'(w(\theta,\cdot)))^{-1}\big]}\,\mathbb{E}\left[\frac{\hat{T}(w(\theta,\cdot))}{r(w(\theta,\cdot))}\right] = \frac{w(\theta,\eta)}{(1-\tau)(1-p)}.$$
That is, in response to a tax reform that reduces all retention rates by the same percent amount (and hence raises marginal tax rates), the firm first increases all workers' salaries in proportion to their initial earnings in order to counteract their net income losses and keep their incentives unchanged. But the reform also reduces rents and hence leads firms to reduce salaries also in proportion to the workers' initial earnings. Indeed, since the utility function is logarithmic, this ensures that all agents' utilities decrease by the same amount. Therefore, we obtain that the total crowding out of private insurance by the reform, $\hat{w}_{co}(\theta,\eta)$, is equal to zero.
Now, the standard labor supply effect $\hat{w}_{ex}(\theta,\eta)$ and the performance-pay effect $\hat{w}_{pp}(\theta,\eta)$ are both also equal to zero because, by equation (31), the optimal effort level depends only on the rate of progressivity and not on the tax parameter $\tau$. Intuitively, effort remains constant because the utility function is logarithmic, so that the substitution and income effects cancel out. As a result, the earnings schedule is completely unaffected by the reform.
F Proofs of Section 5
In this section we derive the optimal progressivity formula in the quantitative model. The effort and the expected utility (conditional on ability $\theta$) of a normal worker are
\[
a_n = (1-p)^{\frac{\varepsilon}{1+\varepsilon}}
\]
and
\[
U_n(\theta) = \log\left(\frac{1-\tau}{1-p}\right) + (1-p)\log(\theta a_n) - h(a_n).
\]
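The effort and the expected utility (conditional on ability $\theta$) of a performance-pay worker are
\[
a_m = \left[(1-p)\left(1 - \frac{1}{\varepsilon}\,\psi^2\sigma_{\eta,m}^2\right)\right]^{\frac{\varepsilon}{1+\varepsilon}}
\]
and
\[
U_m(\theta) = \log\left(\frac{1-\tau}{1-p}\right) + (1-p)\log(\theta a_m) - h(a_m) - (1-p)\,\frac{\psi^2\sigma_{\eta,m}^2}{2},
\]
where the endogenous pass-through $\psi$ is equal to $\frac{a_m^{1/\varepsilon}}{1-p}$. The Utilitarian social welfare function is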
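\[
\begin{aligned}
W &= \pi\int U_m(\theta)\,dF_m(\theta) + (1-\pi)\int U_n(\theta)\,dF_n(\theta) \qquad (65)\\
&= \log\left(\frac{1-\tau}{1-p}\right) + (1-p)\left(\pi\mu_{\theta,m} + (1-\pi)\mu_{\theta,n} + \frac{1}{\lambda_\theta}\right)\\
&\quad + \pi\left((1-p)\log(a_m) - h(a_m) - (1-p)\,\frac{\psi^2\sigma_{\eta,m}^2}{2}\right)
 + (1-\pi)\left((1-p)\log(a_n) - h(a_n)\right)
\end{aligned}
\]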
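where the distribution of productivities at performance-pay jobs $F_m(\theta)$ is Pareto-lognormal with parameters $(\mu_{\theta,m}, \sigma_\theta^2, \lambda_\theta)$, and the distribution of productivities at normal jobs $F_n(\theta)$ is Pareto-lognormal with parameters $(\mu_{\theta,n}, \sigma_\theta^2, \lambda_\theta)$. We derive the optimality condition by differentiating $W$ with respect to $1-p$, applying the envelope theorem and equating the derivative to zero:
\[
\begin{aligned}
0 = \frac{\partial W}{\partial(1-p)} &= \frac{\partial\log\left(\frac{1-\tau}{1-p}\right)}{\partial(1-p)} + \pi\mu_{\theta,m} + (1-\pi)\mu_{\theta,n} + \frac{1}{\lambda_\theta}\\
&\quad + \pi\log(a_m) + (1-\pi)\log(a_n) - \pi\left(\frac{1}{2} + \varepsilon_{\psi,1-p}\right)\psi^2\sigma_{\eta,m}^2.
\end{aligned}
\]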
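Because the pass-through $\psi$ itself depends on effort, the expression for $a_m$ above is a fixed-point condition rather than a closed form. The following sketch solves it by bisection. It assumes the isoelastic specification $h'(a) = a^{1/\varepsilon}$ implied by the pass-through formula; the parameter values and function names are illustrative assumptions, not the paper's calibration.

```python
def normal_effort(p, eps):
    """Effort of a normal worker: a_n = (1 - p)**(eps / (1 + eps))."""
    return (1.0 - p) ** (eps / (1.0 + eps))

def pass_through(a, p, eps):
    """Endogenous pass-through psi = a**(1 / eps) / (1 - p)."""
    return a ** (1.0 / eps) / (1.0 - p)

def effort_rhs(a, p, eps, sigma2):
    """Right-hand side of the effort condition for a performance-pay worker."""
    psi = pass_through(a, p, eps)
    inside = (1.0 - p) * (1.0 - psi ** 2 * sigma2 / eps)
    # Clip at zero so the power is well defined when risk is very large.
    return max(inside, 0.0) ** (eps / (1.0 + eps))

def performance_pay_effort(p, eps, sigma2, tol=1e-12):
    """Solve a = effort_rhs(a) by bisection on [0, a_n].

    a - effort_rhs(a) is increasing in a, negative near 0 and
    non-negative at a_n, so the root is unique on this interval.
    """
    lo, hi = 0.0, normal_effort(p, eps)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - effort_rhs(mid, p, eps, sigma2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameter values, chosen only for illustration.
p, eps, sigma2 = 0.18, 0.5, 0.04
a_n = normal_effort(p, eps)
a_m = performance_pay_effort(p, eps, sigma2)
psi = pass_through(a_m, p, eps)
```

In this example the solution satisfies $a_m < a_n$: exposure to wage-rate risk lowers the effort that is optimally implemented at performance-pay jobs.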
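For each rate of progressivity, the other tax parameter $\tau$ is chosen to balance the government budget subject to fixed government spending $G$. Therefore, the resource constraint reads $Y = C + G$, where the aggregate output $Y$ is given by
\[
Y = \frac{\lambda_\theta}{\lambda_\theta - 1}\left(\pi e^{\mu_{\theta,m} + \frac{\sigma_\theta^2}{2}}\,a_m + (1-\pi)\,e^{\mu_{\theta,n} + \frac{\sigma_\theta^2}{2}}\,a_n\right)
\]
and the aggregate consumption $C$ is given by
\[
C = \frac{1-\tau}{1-p}\,\frac{\lambda_\theta}{\lambda_\theta - (1-p)}
\times\left(\pi e^{(1-p)\mu_{\theta,m} + (1-p)^2\frac{\sigma_\theta^2}{2}}\,e^{-p(1-p)\frac{(\psi\sigma_{\eta,m})^2}{2}}\,a_m^{1-p}
+ (1-\pi)\,e^{(1-p)\mu_{\theta,n} + (1-p)^2\frac{\sigma_\theta^2}{2}}\,a_n^{1-p}\right).
\]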
Using the resource constraint, we have
\[
\frac{1-\tau}{1-p} = \frac{1-\tau}{1-p}\,\frac{Y}{C}\left(1 - \frac{G}{Y}\right).
\]
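As a rough illustration of how the resource constraint pins down the level of the tax schedule, the sketch below evaluates the expressions for $Y$ and $C$ and backs out the value of $\frac{1-\tau}{1-p}$ that delivers $C = Y - G$. All numerical values and function names are assumptions made for this example.

```python
import math

def aggregate_output(pi, lam, mu_m, mu_n, s2_theta, a_m, a_n):
    """Aggregate output Y with Pareto-lognormal productivities (requires lam > 1)."""
    def mean_theta(mu):
        return lam / (lam - 1.0) * math.exp(mu + s2_theta / 2.0)
    return pi * mean_theta(mu_m) * a_m + (1.0 - pi) * mean_theta(mu_n) * a_n

def consumption_per_unit_scale(pi, lam, mu_m, mu_n, s2_theta, a_m, a_n,
                               p, psi, s2_eta):
    """Aggregate consumption C divided by (1 - tau) / (1 - p)."""
    q = 1.0 - p
    tail = lam / (lam - q)
    base_m = math.exp(q * mu_m + q * q * s2_theta / 2.0) * a_m ** q
    base_n = math.exp(q * mu_n + q * q * s2_theta / 2.0) * a_n ** q
    # Wage-rate risk lowers mean consumption at performance-pay jobs when p > 0.
    risk_m = math.exp(-p * q * (psi ** 2) * s2_eta / 2.0)
    return tail * (pi * base_m * risk_m + (1.0 - pi) * base_n)

def budget_balancing_scale(Y, G, c_tilde):
    """Value of (1 - tau) / (1 - p) such that C = Y - G."""
    return (Y - G) / c_tilde

# Hypothetical inputs, for illustration only.
args = dict(pi=0.5, lam=3.0, mu_m=0.1, mu_n=0.0, s2_theta=0.2, a_m=0.90, a_n=0.93)
Y = aggregate_output(**args)
G = 0.2 * Y
c_tilde = consumption_per_unit_scale(p=0.18, psi=1.0, s2_eta=0.04, **args)
scale = budget_balancing_scale(Y, G, c_tilde)
C = scale * c_tilde
```

By construction $C + G = Y$, which is the identity used in the text to eliminate $\tau$ before differentiating with respect to $1-p$.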
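Plug in the expression for $Y$ and $C$ and differentiate with respect to $1-p$ to obtain
\[
\begin{aligned}
\frac{\partial\log\left(\frac{1-\tau}{1-p}\right)}{\partial(1-p)}
&= -\frac{1}{\lambda_\theta - (1-p)} + \frac{1}{1-p}\,\varepsilon_{Y,1-p}\,\frac{g}{1-g} + \frac{1}{1-p}\,\varepsilon_{Y,1-p}\\
&\quad - c_m\left(\mu_{\theta,m} + (1-p)\sigma_\theta^2 + \varepsilon_{a_m,1-p} + \log(a_m)\right)\\
&\quad - c_m\left(1 - p - \frac{1}{2} - p\,\varepsilon_{\psi,1-p} - p\,\varepsilon_{a_m,1-p}\,\varepsilon_{\psi,a_m}\right)\psi^2\sigma_{\eta,m}^2\\
&\quad - c_n\left(\mu_{\theta,n} + (1-p)\sigma_\theta^2 + \varepsilon_{a_n,1-p} + \log(a_n)\right)
\end{aligned}
\]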
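where $\varepsilon_{Y,1-p} \equiv y_m\varepsilon_{a_m,1-p} + y_n\varepsilon_{a_n,1-p}$ is the elasticity of the aggregate income with respect to $1-p$, $g = \frac{G}{Y}$ is the share of government spending in output and $c_j$ is the share of jobs of type $j \in \{m,n\}$ in the aggregate consumption. Plugging this expression into the optimality condition and rearranging yields the final optimality condition
\[
\begin{aligned}
&\left(p + \frac{g}{1-g}\right)\frac{\varepsilon_{Y,1-p}}{1-p} + (y_m - c_m)\left(\varepsilon_{a_m,1-p} - \varepsilon_{a_n,1-p}\right) + \pi p\,\varepsilon_{a_m,1-p}\,\varepsilon_{\psi,a_m}\,\psi^2\sigma_{\eta,m}^2\\
&\quad = \frac{1}{\lambda_\theta}\,\frac{1-p}{\lambda_\theta - (1-p)} + (1-p)\sigma_\theta^2 + \pi(1-p)\left(1 + \varepsilon_{\psi,1-p}\right)\psi^2\sigma_{\eta,m}^2\\
&\qquad - (\pi - c_m)\left(\mu_{\theta,m} + \log(a_m) - \mu_{\theta,n} - \log(a_n)\right)\\
&\qquad - (\pi - c_m)\left(\frac{1}{2} - p\left(1 + \varepsilon_{\psi,1-p} + \varepsilon_{a_m,1-p}\,\varepsilon_{\psi,a_m}\right)\right)\psi^2\sigma_{\eta,m}^2
\end{aligned}
\]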
where yj is the share of jobs of type j ∈ {m,n} in the aggregate output. The above
formula collapses to the formulas with only performance pay jobs when π = 1 and the
standard progressivity formula when π = 0 (in both cases the blue terms disappear).
The first term is the standard deadweight loss from raising the progressivity rate, adjusted by the government spending and evaluated using the elasticity of aggregate income $\varepsilon_{Y,1-p} \equiv y_m\varepsilon_{a_m,1-p} + y_n\varepsilon_{a_n,1-p}$. The second term is a correction to the deadweight loss due to both differences in elasticities between job types and discrepancies between income and consumption shares of job types. The intuition is as follows. Note that performance-pay workers, who are more elastic, have a lower consumption share than income share when $p > 0$ due to higher wage-rate risk. Suppose we increase progressivity. Performance-pay workers will reduce their effort, which decreases both their income (negative effect on available resources) and their consumption (positive effect on available resources). Since their income share is higher than their consumption share, aggregating both effects across all performance-pay workers leads to a negative effect on available resources. The opposite is true for normal workers: their consumption share is greater than their income share, which means that on aggregate, there are more available resources due to their responses. However, since performance-pay workers are more elastic than normal workers, the former effect dominates and increasing progressivity leads to an additional deadweight loss.

The next four terms are standard: the performance-pay effect, the redistribution gains due to the Pareto tail and due to the normal variance of productivities, and the gain from insuring endogenous wage-rate risk at the performance-pay jobs net of the crowd-out.
The last two terms are novel. They are present because the consumption share of performance-pay workers $c_m$ is potentially different from their population share $\pi$. These terms correspond to various ways in which resources are redistributed between job types. The first of the two terms stands for the gain from insuring the "job type" risk, that is, the risk of having a performance-pay job vs. a normal job. Recall that this risk is exogenous. This term contributes to higher progressivity whenever there is a difference in mean consumption at the two job types.

The last term is related to the endogenous wage-rate risk. When taxes are progressive, higher wage-rate risk of performance-pay workers lowers their consumption relative to normal workers. Consequently, changes in progressivity as well as endogenous adjustments of the wage-rate risk will result in the transfers of resources across job types via the government budget constraint. Suppose that the term in the big brackets is positive, which is likely if $p$ is not very high.[29] Then, when progressivity increases, performance-pay workers end up contributing more to the government budget. This implies an additional redistribution from the performance-pay workers to the normal workers. This effect contributes to higher (lower) progressivity if the consumption share of performance-pay workers is higher (lower) than their population share.

[29] Plugging in the values of elasticities, the term in the brackets becomes $\frac{1}{2} - p\,\frac{\varepsilon_{a_m,1-p}}{\varepsilon}$. Since
G Dynamic Model
In this section we extend our results to a dynamic model of the labor market. In our setting, individuals live for several periods and sign a long-term labor contract with a firm. We use the moral hazard model of Edmans, Gabaix, Sadzik, and Sannikov (2012) who extended Edmans and Gabaix (2011) to the multi-period environment. Workers' earnings can depend in an arbitrary way on their history of output realizations. The government levies a labor income tax in each period and has a redistributive social welfare objective.
G.1 Environment
Individuals are indexed by their exogenous and constant labor productivity $\theta \in \Theta$. They live for $S \ge 2$ periods and have time-separable preferences over consumption $c_t$ and effort $a_t$ with discount factor $\beta \in (0,1)$. Throughout this section, we denote the history of a random variable $x$ up to time $t \le S$ by $x^t \equiv \{x_s\}_{1\le s\le t}$ and let $x^0 = \emptyset$. Flow output at time $t$ is given by:
\[
y_t = \theta\,(a_t + \eta_t), \tag{66}
\]
where $\{\eta_t\}_{1\le t\le S}$ are independent and identically distributed random variables. Throughout the analysis we assume that the utility of consumption is logarithmic with isoelastic disutility of labor, productivity $\theta$ is lognormally distributed with mean $\mu_\theta$ and variance $\sigma_\theta^2$, and the performance shocks $\eta_t$ are normally distributed with mean $0$ and variance $\sigma_\eta^2$. As in Section 1, we assume that the agent chooses period-$t$ effort $a_t$ after observing the realization of $\eta_t$. Therefore, the agent's strategy can be a function $a_t(\eta^t)$ of her history (including the current period) of performance shocks.
(Footnote 29, continued.) $\varepsilon_{a_m,1-p} > \frac{\varepsilon}{1+\varepsilon}$, this term is positive if $p < \frac{1+\varepsilon}{2}$.
Firms discount future profits at rate $r$. For simplicity we assume that $\beta(1+r) = 1$. In each period $t$ they observe the agent's productivity $\theta$ and her output history $y^t$ up to that date, but not her effort levels $a_t$ or performance shocks $\eta_t$. A labor contract specifies an effort level $a_t(\theta)$ in each period $t$, and an earnings function $w_t(\theta,\eta^t)$ that depends on the inferred history of performance shocks (given the recommended effort levels) up to and including time $t$.

Finally, in each period, the government levies an income tax. We suppose that the tax schedule has a constant and history-independent rate of progressivity $p$, so that for all $t \in \{1,\dots,S\}$, the retention function in period $t$ is given by
\[
R_t(w) = \frac{1-\tau_t}{1-p}\,w^{1-p}.
\]
The parameter $\tau_t$ ensures that the government balances its budget in each period. We also rule out private savings, so that an agent with earnings $w_t$ in period $t$ consumes $c_t = R_t(w_t)$.
G.2 Equilibrium Labor Contract
We start by setting up the contracting problem between the firm and a worker with productivity $\theta$. The operator $\mathbb{E}_t$ denotes the expectation over all future performance shock realizations $\{\eta_s\}_{t+1\le s\le S}$ conditional on ex-ante productivity $\theta$ and output history $\eta^t$.
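Firm's problem. The firm maximizes its expected profit
\[
\Pi(\theta) = \max_{\{a_t(\theta),\,w_t(\theta,\eta^t)\}_{1\le t\le S}} \mathbb{E}_0\left[\sum_{t=1}^{S}\left(\frac{1}{1+r}\right)^{t-1}\left(y_t - w_t(\theta,\eta^t)\right)\right], \tag{67}
\]
subject to the incentive constraint, which requires that for any alternative effort strategy $\{a_t(\eta^t)\}_{1\le t\le S}$,
\[
\mathbb{E}_1\left[\sum_{t=1}^{S}\beta^{t-1}\left(u\left(R_t\left(w_t(\theta,\eta^t)\right)\right) - h\left(a_t(\eta^t)\right)\right)\right]
\le \mathbb{E}_1\left[\sum_{t=1}^{S}\beta^{t-1}\left(u\left(R_t\left(w_t(\theta,\eta^t)\right)\right) - h\left(a_t(\theta)\right)\right)\right], \tag{68}
\]
and the participation constraint:
\[
\mathbb{E}_0\left[\sum_{t=1}^{S}\beta^{t-1}\left(u\left(R\left(w_t(y^t \mid \theta)\right)\right) - h\left(a_t(\theta)\right)\right)\right] \ge U(\theta), \tag{69}
\]
where $y_t$ is given by (66) and $U(\theta)$ is the reservation value of workers with productivity $\theta$.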
Equilibrium. We assume that there is free entry of firms, so that in equilibrium profits are equal to zero:
\[
\Pi(\theta) = 0. \tag{70}
\]
This equation pins down the workers' reservation value $U(\theta)$.
Optimal Contract. We characterize the equilibrium contract in two steps: we first study its intertemporal, then its intratemporal, properties.

Lemma 3. The earnings process $w_t(\theta,\eta^t)$ is a martingale. That is, expected period-$t$ earnings are equal to realized period-$(t-1)$ earnings,
\[
\mathbb{E}_{t-1}\left[w_t(\theta,\eta^{t-1},\eta_t) \mid \eta^{t-1}\right] = w_{t-1}(\theta,\eta^{t-1}). \tag{71}
\]
Proof. See Appendix H.
This lemma characterizes the intertemporal properties of the optimal contract between the firm and the worker. It is well known that the solution to dynamic contracting models under separable utility satisfies the Inverse Euler Equation (see, e.g., Rogerson (1985); Golosov, Kocherlakota, and Tsyvinski (2003)). Intuitively, the firm incurs a convex cost of providing effort incentives: giving $x_t$ utils in period $t$ requires paying a before-tax salary $R_t^{-1}(u^{-1}(x_t))$, where the cost function $R_t^{-1}\circ u^{-1} \equiv C_t$ is convex. As a result, the optimal contract smooths out the cost of providing incentives over time, which requires $\mathbb{E}_t\left[C_t'(x_{t+1})\right] = C_t'(x_t)$. Under the assumptions that the utility function $u$ is logarithmic and the retention function $R_t$ is CRP, this equation can be rewritten as (71).
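The step from the Inverse Euler Equation to the martingale property rests on the fact that, with log utility and a CRP schedule, the marginal cost of delivering utility is proportional to earnings: $C'(x) = w/(1-p)$ with $w = C(x)$. The sketch below checks this numerically with a central finite difference. The parameter values and function names are illustrative assumptions.

```python
import math

def cost_of_utils(x, tau, p):
    """C(x) = R^{-1}(u^{-1}(x)) for u = log and R(w) = (1 - tau)/(1 - p) * w**(1 - p)."""
    consumption = math.exp(x)                      # u^{-1}(x)
    return (consumption * (1.0 - p) / (1.0 - tau)) ** (1.0 / (1.0 - p))

def marginal_cost(x, tau, p, step=1e-6):
    """Central finite-difference approximation of C'(x)."""
    up = cost_of_utils(x + step, tau, p)
    down = cost_of_utils(x - step, tau, p)
    return (up - down) / (2.0 * step)

# Hypothetical tax parameters and utility levels.
tau, p = 0.25, 0.2
utils = [-1.0, 0.0, 0.7]
wages = [cost_of_utils(x, tau, p) for x in utils]
slopes = [marginal_cost(x, tau, p) for x in utils]

# Each ratio C'(x) / w should be close to 1 / (1 - p).
ratios = [s / w for s, w in zip(slopes, wages)]
```

Since $C'$ is a constant multiple of $w$, equalizing expected marginal costs across periods is the same as equalizing expected earnings, which is the content of (71).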
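Proposition 5. Assume that effort is positive in each period, or that $h'(0) = 0$. Define the present value of effort by $A \equiv \sum_{s=1}^{S}\left(\frac{1}{1+r}\right)^{s-1}a_s$, and the sequences of sensitivity and pass-through parameters $\{\delta_t,\psi_t\}_{1\le t\le S}$ by
\[
\delta_t = \frac{1}{\sum_{s=0}^{S-t}\beta^s}, \quad\text{and}\quad \psi_t = \delta_t\,\frac{h'(a_t)}{1-p}.
\]
The earnings schedule satisfies
\[
\log\left(w_t(\theta,\eta^t)\right) = \log\left(w_{t-1}(\theta,\eta^{t-1})\right) + \psi_t\eta_t - \frac{1}{2}\psi_t^2\sigma_\eta^2, \tag{72}
\]
where initial earnings are given by $w_0 \equiv \delta_1\theta A$. The optimal period-$t$ effort level $a_t$ is independent of $\theta$ and satisfies
\[
a_t = \left[(1-p)\left(\frac{a_t}{\delta_1 A} - \frac{1}{\delta_t}\,\varepsilon_{\psi_t,a_t}\,\psi_t^2\sigma_\eta^2\right)\right]^{\frac{\varepsilon}{1+\varepsilon}}, \tag{73}
\]
where $\varepsilon_{\psi_t,a_t} = \frac{1}{\varepsilon}$ is the elasticity of the pass-through parameter $\psi_t$ with respect to effort $a_t$. Expected utility is given by
\[
U(\theta) = \sum_{t=1}^{S}\beta^{t-1}\left[\log\left(R_t(\delta_1\theta A)\right) - h(a_t) - \frac{1-p}{2\delta_t}\,\psi_t^2\sigma_\eta^2\right]. \tag{74}
\]
Proof. See Appendix H.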
This proposition generalizes Corollary 1 to the dynamic setting and allows us to characterize the intratemporal properties of the optimal compensation contract. Equation (72) implies that earnings in each period $t$ are a log-linear function of the performance shock $\eta_t$ in that period. The pass-through of performance shocks $\eta_t$ to log-earnings, $\psi_t = \partial\log w_t(\theta,\cdot)/\partial\eta_t$, is increasing in the rate of progressivity $p$ and the optimal effort level $a_t$ at time $t$. Note finally that the pass-throughs have the same form as in the static model; we therefore expect the insight that the performance-pay effect counteracts and offsets a large share of the crowding-out effect to carry over to the dynamic environment.

Formally, up to the optimal value of effort, the pass-through $\psi_S$ in the terminal period $S$ is the same as in the optimal static contract (see equation (30)) since $\delta_S = 1$. In earlier periods, on the other hand, the exposure to risk for a given effort level is strictly smaller than in the static environment, as the sensitivity parameters $\delta_t$ satisfy $\delta_t < 1$ for all $t \le S-1$. To understand the intuition for this result, note that equation (72) implies that an increase in the output realization $y_t$ (either due to effort or to random shocks) boosts log-earnings in the current and future periods equally. Indeed, since the agent is risk-averse it is efficient to spread the rewards over her entire horizon. In other words, a given increase in lifetime utility necessary to elicit higher effort requires a higher increase in flow utility if there are fewer remaining periods over which to smooth these benefits. As a result, the sequence $\{\delta_t\}_{1\le t\le S}$ is strictly increasing and the degree of performance-pay gets stronger over time.
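The behaviour of the sensitivity parameters can be illustrated with a few lines of code. The sketch below computes $\delta_t$ and $\psi_t$ from the definitions in Proposition 5, assuming the isoelastic form $h'(a) = a^{1/\varepsilon}$ and a constant effort path; the horizon, discount factor and other numbers are hypothetical.

```python
def sensitivities(S, beta):
    """delta_t = 1 / sum_{s=0}^{S-t} beta**s for t = 1, ..., S."""
    return [1.0 / sum(beta ** s for s in range(S - t + 1)) for t in range(1, S + 1)]

def pass_throughs(efforts, beta, p, eps):
    """psi_t = delta_t * h'(a_t) / (1 - p), assuming h'(a) = a**(1 / eps)."""
    deltas = sensitivities(len(efforts), beta)
    return [d * a ** (1.0 / eps) / (1.0 - p) for d, a in zip(deltas, efforts)]

# Hypothetical five-period example with a constant effort path.
deltas = sensitivities(S=5, beta=0.95)
psis = pass_throughs([0.9] * 5, beta=0.95, p=0.18, eps=0.5)

# Two-period case with beta = 1, as in the heuristic proof in Appendix H.
two_period = sensitivities(S=2, beta=1.0)
```

With $S = 2$ and $\beta = 1$ the function returns $\delta_1 = 1/2$ and $\delta_2 = 1$, and in the five-period example both `deltas` and `psis` rise monotonically toward the terminal period.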
G.3 Optimal Tax Progressivity
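We finally characterize the optimal history-independent rate of progressivity $p$ in the dynamic environment. The government chooses $p$ to maximize a utilitarian social objective $\int_\Theta U(\theta)\,dF(\theta)$ subject to the period-by-period budget balance constraint $\int_\Theta \mathbb{E}\left[T_t\left(w_t(\theta,\eta^t)\right)\right]dF(\theta) \ge 0$, where $T_t(w) \equiv w - R_t(w)$ is the tax paid on earnings $w$ in period $t$.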
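Proposition 6. The optimal rate of progressivity is given by
\[
\frac{p^*}{(1-p^*)^2} = \frac{\sigma_\theta^2}{\varepsilon_{A,1-p} + (1-p)\sum_{s=1}^{S}\beta^{s-1}\,\frac{\delta_1}{\delta_s}\,\varepsilon_{\psi_s,a_s}\,\varepsilon_{a_s,1-p}\,\psi_s^2\sigma_\eta^2}, \tag{75}
\]
where $\varepsilon_{A,1-p}$ is the elasticity of the present discounted value of effort $A$ with respect to progressivity, and $\varepsilon_{\psi_s,a_s} = \frac{1}{\varepsilon}$.
Proof. See Appendix H.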
To compare the optimum rate of progressivity (75) to its static counterpart (43), first consider the benchmark environment with exogenous wage risk. That is, the planner observes ex-ante earnings heterogeneity due to productivity shocks $\theta$, and ex-post heterogeneity due to performance shocks $\eta_t$ passed through to earnings. In particular, it observes that the degree of performance-pay rises with age, as described in Proposition 5. However, it mistakenly believes that wage rates, and hence $\psi_t$, are exogenous. That is, it assumes that $\varepsilon_{\psi_s,a_s} = \varepsilon_{a_s,1-p} = 0$ for all $s \ge 1$. In this case, the dynamic optimal tax formula (75) is identical to the static formula (43), except that the relevant labor supply elasticity is now the elasticity of the present value of effort, $\varepsilon_{A,1-p}$.
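For given values of the right-hand side, formula (75) is a quadratic equation in $p^*$ and can be solved in closed form. The sketch below does this under a simplifying assumption: the elasticity $\varepsilon_{A,1-p}$ and the performance-pay term in the denominator are treated as fixed numbers, whereas in (75) they generally depend on $p$. The function name and the inputs are hypothetical.

```python
import math

def optimal_progressivity(sigma2_theta, elasticity, risk_term=0.0):
    """Solve p / (1 - p)**2 = sigma2_theta / (elasticity + risk_term) for p in [0, 1).

    risk_term stands in for the second term in the denominator of (75);
    setting it to zero gives the exogenous-wage-risk benchmark.
    """
    k = sigma2_theta / (elasticity + risk_term)
    if k == 0.0:
        return 0.0
    # k * (1 - p)**2 = p  <=>  k*p**2 - (2k + 1)*p + k = 0; take the root below one.
    return ((2.0 * k + 1.0) - math.sqrt(4.0 * k + 1.0)) / (2.0 * k)

# Hypothetical numbers, for illustration only.
p_benchmark = optimal_progressivity(sigma2_theta=0.5, elasticity=0.33)
p_with_risk = optimal_progressivity(sigma2_theta=0.5, elasticity=0.33, risk_term=0.02)
```

Holding everything else fixed, a positive performance-pay term raises the denominator and therefore lowers the progressivity rate that solves the equation.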
Now consider the general model with ex-post earnings dispersion and endogenous wage risk, captured by the non-zero elasticities $\varepsilon_{\psi_s,a_s}$ and $\varepsilon_{a_s,1-p}$. As in the static model, the fiscal externality and welfare effect induced by the crowding-out effect cancel each other out. The adjustment to the optimal rate of progressivity is given by the second term in the denominator of (75). Recall that this term accounts for the negative fiscal externality due to the performance-pay effect: a higher rate of progressivity reduces effort in period $s$, hence reduces the dispersion of earnings, which in turn (by Jensen's inequality) negatively affects government revenue. This term resembles the present value of the corresponding terms in the static model, with one difference. Namely, the relevant discount factor is not $\beta^{s-1}$ but $\beta^{s-1}\frac{\delta_1}{\delta_s}$. Since $\delta_s$ is increasing over time, this implies that the fiscal externalities caused by the future performance-pay effects are discounted at a higher rate than the standard deadweight losses from distorting effort.
H Proofs of Section G
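Proof of Lemma 3. Starting from an incentive compatible allocation, consider the following variations in retained wage/utility:
\[
\begin{aligned}
u_{t-1} &= u\left(R\left(w_{t-1}(\theta,\eta^{t-1})\right)\right) - \beta\Delta,\\
u_t &= u\left(R\left(w_t(\theta,\eta^{t-1},\eta_t)\right)\right) + \Delta,
\end{aligned}
\]
and $u_s = u\left(R\left(w_s(\theta,\eta^s)\right)\right)$ for all $s \notin \{t-1,t\}$. These perturbations preserve utility and incentive compatibility since for all $a_{t-1}$
\[
u_{t-1} - h(a_{t-1}) + \beta\,\mathbb{E}\left[u_t \mid \eta^{t-1}\right]
= u\left(R\left(w_{t-1}(\theta,\eta^{t-1})\right)\right) - h(a_{t-1}) + \beta\,\mathbb{E}\left[u\left(R\left(w_t(\theta,\eta^{t-1},\eta_t)\right)\right) \mid \eta^{t-1}\right].
\]
The optimal allocation must be unaffected by such deviations, so that
\[
0 = \arg\max_{\Delta}\ \mathbb{E}\left[\sum_{s=1}^{S}(1+r)^{-s}\left(y_s - W(u_s)\right)\right],
\]
where $W = (u\circ R)^{-1}$. The associated first-order condition evaluated at $\Delta = 0$ reads
\[
W'\left(u\left(R\left(w_{t-1}(\theta,\eta^{t-1})\right)\right)\right) = \frac{1}{\beta(1+r)}\,\mathbb{E}\left[W'\left(u\left(R\left(w_t(\theta,\eta^{t-1},\eta_t)\right)\right)\right) \mid y^{t-1}\right],
\]
that is,
\[
\mathbb{E}\left[\frac{1}{v'\left(w_t(\theta,\eta^{t-1},\eta_t)\right)} \,\Big|\, y^{t-1}\right] = \beta(1+r)\,\frac{1}{v'\left(w_{t-1}(\theta,\eta^{t-1})\right)}.
\]
The inverse Euler equation (see Golosov, Kocherlakota, and Tsyvinski (2003)) holds in our setting. With log utility and a CRP tax schedule, this equation can be rewritten as
\[
(1-p)\,\mathbb{E}\left[w_t(\theta,\eta^{t-1},\eta_t) \mid y^{t-1}\right] = (1-p)\,\beta(1+r)\,w_{t-1}(\theta,\eta^{t-1}),
\]
which leads to equation (71) as $\beta(1+r) = 1$.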
Proof of Proposition 5. We provide a heuristic proof of this proposition, and the formal argument follows the same steps as in Edmans, Gabaix, Sadzik, and Sannikov (2012). Assume that a unique level of effort is implemented at each time $t$, that these effort levels are independent of previous output noise, and that local incentive constraints are sufficient conditions. Consider first the incentive compatibility constraint which ensures that the worker does not wish to choose a different level of effort than the one recommended by the firm. Consider a local deviation in effort $a_t$ after history $(\eta^{t-1},\eta_t)$. The effect of such a deviation on the worker's lifetime utility $U$ should be zero,
\[
\mathbb{E}_{t-1}\left[\frac{\partial U}{\partial y_t}\frac{\partial y_t}{\partial a_t} + \frac{\partial U}{\partial a_t}\right] = 0.
\]
Since $\frac{\partial y_t}{\partial a_t} = \theta$, we obtain
\[
\mathbb{E}_{t-1}\left[\frac{\partial U}{\partial y_t}\right] = -\frac{1}{\theta}\,\frac{\partial U}{\partial a_t}. \tag{76}
\]
Applying incentive compatibility for effort in the final period we obtain:
\[
r\left(w_S(\theta,\eta^S)\right)\,u'\left(R\left(w_S(\theta,\eta^S)\right)\right)\,\frac{\partial w_S(\theta,\eta^{S-1},\eta_S)}{\partial\eta_S} = h'\left(a_S(\theta)\right).
\]
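Fixing $\eta^{S-1}$ and integrating this incentive constraint over $\eta_S$ (meaning over realizations of $\eta_S$ given $a(\theta)$) leads to
\[
u\left(R\left(w_S(\theta,\eta^S)\right)\right) = h'\left(a_S(\theta)\right)\eta_S + z_{S-1}(\eta^{S-1})
\]
for some function of past output $z_{S-1}(\eta^{S-1})$. This implies in particular that
\[
\frac{\partial u\left(R\left(w_S(\theta,\eta^S)\right)\right)}{\partial\eta_{S-1}} = \frac{\partial z_{S-1}(\eta^{S-1})}{\partial\eta_{S-1}}.
\]
Analogously, the incentive constraint for effort in the second to last period reads
\[
\begin{aligned}
&r\left(w_{S-1}(\theta,\eta^{S-1})\right)\,u'\left(R\left(w_{S-1}(\theta,\eta^{S-1})\right)\right)\,\frac{\partial w_{S-1}(\theta,\eta^{S-1})}{\partial\eta_{S-1}}\\
&\quad + \beta\,r\left(w_S(\theta,\eta^S)\right)\,u'\left(R\left(w_S(\theta,\eta^S)\right)\right)\,\frac{\partial w_S(\theta,\eta^S)}{\partial\eta_{S-1}} = h'\left(a_{S-1}(\theta)\right).
\end{aligned}
\]
Integrating the previous expression over $\eta_{S-1}$ and using the previous equation implies
\[
u\left(R\left(w_{S-1}(\theta,\eta^{S-1})\right)\right) + \beta z_{S-1}(\eta^{S-1}) = h'\left(a_{S-1}(\theta)\right)\eta_{S-1} + z_{S-2}(\eta^{S-2}).
\]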
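We now want to show that $z_{S-1}(\eta^{S-1})$ is a linear function of $\eta_{S-1}$. Since the utility function is logarithmic and the tax schedule is CRP, we obtain
\[
(1-p)\log\left(w_S(\theta,\eta^S)\right) = h'\left(a_S(\theta)\right)\eta_S + z_{S-1}(\eta^{S-1}) - \log\left(\frac{1-\tau_S}{1-p}\right)
\]
and
\[
(1-p)\log\left(w_{S-1}(\theta,\eta^{S-1})\right) = h'\left(a_{S-1}(\theta)\right)\eta_{S-1} - \beta z_{S-1}(\eta^{S-1}) + z_{S-2}(\eta^{S-2}) - \log\left(\frac{1-\tau_{S-1}}{1-p}\right).
\]
Now recall that the inverse Euler equation reads
\[
\mathbb{E}_{S-1}\left[w_S(\theta,\eta^S)\right] = w_{S-1}(\theta,\eta^{S-1}).
\]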
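Using the previous expressions, this equality can be rewritten as
\[
\mathbb{E}_{S-1}\left[e^{\frac{1}{1-p}h'(a_S(\theta))\eta_S}\right]e^{\frac{1}{1-p}z_{S-1}(\eta^{S-1})}
= \left(\frac{1-\tau_S}{1-\tau_{S-1}}\right)^{\frac{1}{1-p}}
e^{\frac{1}{1-p}h'(a_{S-1}(\theta))\eta_{S-1}}\,
e^{-\beta\frac{1}{1-p}z_{S-1}(\eta^{S-1}) + \frac{1}{1-p}z_{S-2}(\eta^{S-2})}.
\]
This in turn implies
\[
(1+\beta)\,z_{S-1}(\eta^{S-1}) = h'\left(a_{S-1}(\theta)\right)\eta_{S-1} + z_{S-2}(\eta^{S-2}) - \frac{1}{2}\,\frac{\left(h'(a_S(\theta))\right)^2}{1-p}\,\sigma_\eta^2 + \frac{1}{1-p}\log\left(\frac{1-\tau_S}{1-\tau_{S-1}}\right).
\]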
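Therefore, $z_{S-1}(\eta^{S-1})$, and in turn $u\left(R\left(w_{S-1}(\theta,\eta^{S-1})\right)\right)$, is linear in $\eta_{S-1}$. Moreover, the last-period utility is linear in both $\eta_S$ and $\eta_{S-1}$. By induction, we can show that the utility in each period is a linear function of the performance shock in every past period. Now suppose for simplicity of exposition that $S = 2$, $\beta = 1$, $r = 0$, $\theta = 1$, so that $\delta_1 = \frac{1}{2}$ and $\delta_2 = 1$. From the arguments above we guess a log-linear specification for earnings:
\[
\begin{aligned}
\log w_1 &= \psi_1\eta_1 + k_1,\\
\log w_2 &= \psi_{21}\eta_1 + \psi_2\eta_2 + k_1 + k_2.
\end{aligned}
\]
The martingale property (71) requires $w_1 = \mathbb{E}_1[w_2]$, so that for all $\eta_1$, $e^{\psi_1\eta_1 + k_1} = e^{\psi_{21}\eta_1 + k_1}\,\mathbb{E}\left[e^{\psi_2\eta_2 + k_2} \mid \eta_1\right]$. This requires $\psi_1 = \psi_{21}$ and $e^{-k_2} = \mathbb{E}\left[e^{\psi_2\eta_2} \mid \eta_1\right]$. Now, the total utility of the agent is given by
\[
U = (1-p)\left[2\psi_1\eta_1 + \psi_2 y_2 + 2k_1 + k_2\right] - h(a_1) - h(a_2) + \log\left(\frac{1-\tau_1}{1-p}\right) + \log\left(\frac{1-\tau_2}{1-p}\right).
\]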
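The incentive constraint for effort (76) implies
\[
\psi_1 = \frac{h'(a_1)}{2(1-p)}, \quad\text{and}\quad \psi_2 = \frac{h'(a_2)}{1-p},
\]
and therefore
\[
k_2 = -\frac{h'(a_2)}{1-p} - \frac{\sigma_\eta^2}{2}\left(\frac{h'(a_2)}{1-p}\right)^2.
\]
Replacing in the expression for log earnings leads to
\[
\log w_1 = k_1' + \frac{h'(a_1)}{2(1-p)}\,\eta_1 - \frac{\sigma_\eta^2}{2}\left(\frac{h'(a_1)}{2(1-p)}\right)^2
\]
and
\[
\log w_2 = k_1' + \frac{h'(a_1)}{2(1-p)}\,\eta_1 - \frac{\sigma_\eta^2}{2}\left(\frac{h'(a_1)}{2(1-p)}\right)^2 + \left(\frac{h'(a_2)}{1-p}\right)\eta_2 - \frac{\sigma_\eta^2}{2}\left(\frac{h'(a_2)}{1-p}\right)^2,
\]
where $k_1' \equiv k_1 + \psi_1 a_1 - \frac{\sigma_\eta^2}{2}\psi_1^2$. This constant is pinned down by the zero profit condition $\mathbb{E}[w_1 + w_2] = a_1 + a_2$, that is, $2e^{k_1'} = a_1 + a_2$. This implies
\[
k_1' = \log\left(\frac{a_1 + a_2}{2}\right),
\]
which concludes the proof of equation (72). Equations (73) and (74) are derived in the next proof.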
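Proof of Proposition 6. Recall that the earnings schedule is given by
\[
\begin{aligned}
\log w_1 &= \log(\delta_1\theta A) + \psi_1\eta_1 - \frac{\psi_1^2\sigma_\eta^2}{2},\\
\log w_t &= \log w_{t-1} + \psi_t\eta_t - \frac{\psi_t^2\sigma_\eta^2}{2}.
\end{aligned}
\]
The expected utility of workers with productivity $\theta$ is therefore equal to
\[
U(\theta) = (1-p)\left[\frac{1}{\delta_1}\log(\delta_1\theta A) - \sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\frac{\psi_s^2\sigma_\eta^2}{2}\right] - \sum_{s=1}^{S}\beta^{s-1}h(a_s) + \sum_{s=1}^{S}\beta^{s-1}\log\left(\frac{1-\tau_s}{1-p}\right),
\]
from which (74) easily follows. Thus, utilitarian social welfare is
\[
\int_\Theta U(\theta)\,dF(\theta) = (1-p)\left[\frac{1}{\delta_1}\log(\delta_1 A) + \frac{1}{\delta_1}\mu_\theta - \sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\frac{\psi_s^2\sigma_\eta^2}{2}\right] - \sum_{s=1}^{S}\beta^{s-1}h(a_s) + \sum_{s=1}^{S}\beta^{s-1}\log\left(\frac{1-\tau_s}{1-p}\right).
\]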
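The first-order condition for optimal effort reads
\[
\begin{aligned}
0 = \frac{\partial U(\theta)}{\partial a_t}
&= (1-p)\left[\frac{1}{\delta_1}\frac{1}{A}\frac{\partial A}{\partial a_t} - \beta^{t-1}\frac{1}{\delta_t}\,\psi_t\sigma_\eta^2\,\frac{\partial\psi_t}{\partial a_t}\right] - \beta^{t-1}h'(a_t)\\
&= (1-p)\left[\frac{1}{\delta_1}\frac{\left(\frac{1}{1+r}\right)^{t-1}a_t}{A} - \beta^{t-1}\frac{1}{\delta_t}\,\psi_t^2\sigma_\eta^2\,\varepsilon_{\psi_t,a_t}\right]\frac{1}{a_t} - \beta^{t-1}h'(a_t),
\end{aligned}
\]
which easily implies equation (73). Now, the expected present value of pre-tax and post-tax earnings in period $t$ are given by $\mathbb{E}[w_t] = \delta_1\theta A$ and
\[
\mathbb{E}\left[w_t^{1-p}\right] = (\delta_1\theta A)^{1-p}\,\mathbb{E}\left[e^{\sum_{s=1}^{t}(1-p)\psi_s\eta_s}\right]e^{-\sum_{s=1}^{t}(1-p)\frac{\psi_s^2\sigma_\eta^2}{2}} = (\delta_1\theta A)^{1-p}\,e^{-p(1-p)\sum_{s=1}^{t}\frac{\psi_s^2\sigma_\eta^2}{2}}
\]
respectively, so that expected government revenue in period $t$ is equal to
\[
\int_\Theta \mathbb{E}\left[T(w_t)\right]dF(\theta) = \delta_1 A\,e^{\mu_\theta + \frac{\sigma_\theta^2}{2}} - \frac{1-\tau_t}{1-p}\,(\delta_1 A)^{1-p}\,e^{-p(1-p)\sum_{s=1}^{t}\frac{\psi_s^2\sigma_\eta^2}{2}}\,e^{(1-p)\mu_\theta + (1-p)^2\frac{\sigma_\theta^2}{2}}.
\]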
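Imposing period-by-period budget balance therefore requires
\[
\frac{1-\tau_t}{1-p} = \frac{(\delta_1 A)^{p}\,e^{\mu_\theta + \frac{\sigma_\theta^2}{2}}}{e^{-p(1-p)\left(\sum_{s=1}^{t}\psi_s^2\right)\frac{\sigma_\eta^2}{2}}\,e^{(1-p)\mu_\theta + (1-p)^2\frac{\sigma_\theta^2}{2}}}.
\]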
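Substituting this expression into the social welfare function $\int_\Theta U(\theta)\,dF(\theta)$ implies that social welfare is equal to
\[
\begin{aligned}
&\frac{1}{\delta_1}\left[\log(\delta_1 A) + \mu_\theta + \left(1 - (1-p)^2\right)\frac{\sigma_\theta^2}{2}\right] - \sum_{s=1}^{S}\beta^{s-1}h(a_s)\\
&\qquad + p(1-p)\sum_{s=1}^{S}\beta^{s-1}\left(\sum_{i=1}^{s}\frac{\psi_i^2\sigma_\eta^2}{2}\right) - (1-p)\sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\frac{\psi_s^2\sigma_\eta^2}{2}\\
&\quad = \frac{1}{\delta_1}\left[\log(\delta_1 A) + \mu_\theta + \left(1 - (1-p)^2\right)\frac{\sigma_\theta^2}{2}\right] - \sum_{s=1}^{S}\beta^{s-1}h(a_s) - (1-p)^2\sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\frac{\psi_s^2\sigma_\eta^2}{2}.
\end{aligned}
\]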
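The last equality uses the fact that exchanging the order of summation gives $\sum_{s=1}^{S}\beta^{s-1}\sum_{i=1}^{s}x_i = \sum_{i=1}^{S}\beta^{i-1}x_i/\delta_i$, so that the two risk terms combine with coefficient $p(1-p) - (1-p) = -(1-p)^2$. The sketch below checks this numerically for an arbitrary, hypothetical sequence of pass-throughs; the function names and numbers are assumptions made for the example.

```python
def sensitivities(S, beta):
    """delta_t = 1 / sum_{s=0}^{S-t} beta**s for t = 1, ..., S."""
    return [1.0 / sum(beta ** s for s in range(S - t + 1)) for t in range(1, S + 1)]

def risk_terms(psis, sigma2_eta, beta, p):
    """Return (two-term expression, combined expression) for the risk terms."""
    S = len(psis)
    deltas = sensitivities(S, beta)
    x = [psi ** 2 * sigma2_eta / 2.0 for psi in psis]

    # Term coming from the budget-balancing tax parameter:
    # p(1-p) * sum_s beta^(s-1) * sum_{i<=s} x_i
    budget = p * (1.0 - p) * sum(
        beta ** (s - 1) * sum(x[:s]) for s in range(1, S + 1)
    )
    # Direct utility cost of earnings risk:
    # (1-p) * sum_s beta^(s-1) * x_s / delta_s
    weighted = sum(beta ** (s - 1) * x[s - 1] / deltas[s - 1] for s in range(1, S + 1))
    direct = (1.0 - p) * weighted

    combined = -((1.0 - p) ** 2) * weighted
    return budget - direct, combined

# Hypothetical inputs, for illustration only.
lhs, rhs = risk_terms(psis=[0.4, 0.5, 0.6, 0.8], sigma2_eta=0.04, beta=0.95, p=0.18)
```

The two returned values agree up to floating-point error, which confirms the simplification for this example.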
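We can now maximize this expression with respect to $1-p$ to get
\[
\begin{aligned}
&\sum_{s=1}^{S}\left[\frac{1}{\delta_1}\frac{1}{A}\left(\frac{1}{1+r}\right)^{s-1} - \beta^{s-1}h'(a_s)\right]\frac{\partial a_s}{\partial(1-p)} - (1-p)\sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\,\varepsilon_{\psi_s,a_s}\,\varepsilon_{a_s,1-p}\,\psi_s^2\sigma_\eta^2\\
&\quad = (1-p)\left[\frac{1}{\delta_1}\sigma_\theta^2 + \sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\psi_s^2\sigma_\eta^2\right] + (1-p)\sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\,\varepsilon_{\psi_s,1-p}\,\psi_s^2\sigma_\eta^2.
\end{aligned}
\]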
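Using the first-order condition for effort derived above to simplify the left-hand side of this expression implies
\[
\begin{aligned}
&\frac{p}{1-p}\,\frac{1}{\delta_1 A}\sum_{s=1}^{S}\left(\frac{1}{1+r}\right)^{s-1}a_s\,\varepsilon_{a_s,1-p} + p\sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\,\varepsilon_{\psi_s,a_s}\,\varepsilon_{a_s,1-p}\,\psi_s^2\sigma_\eta^2\\
&\quad = (1-p)\left[\frac{1}{\delta_1}\sigma_\theta^2 + \sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\left(1 + \varepsilon_{\psi_s,1-p}\right)\psi_s^2\sigma_\eta^2\right].
\end{aligned}
\]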
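But the elasticity of the present discounted value of effort is equal to
\[
\varepsilon_{A,1-p} \equiv \frac{(1-p)}{A}\,\frac{\partial\sum_{s=1}^{S}\left(\frac{1}{1+r}\right)^{s-1}a_s}{\partial(1-p)} = \sum_{s=1}^{S}\left(\frac{1}{1+r}\right)^{s-1}\frac{a_s}{A}\,\varepsilon_{a_s,1-p}.
\]
Moreover, we have $1 + \varepsilon_{\psi_s,1-p} = 0$. Substituting these two expressions into the previous equation and rearranging terms leads to
\[
\frac{p}{(1-p)^2}\left[\frac{1}{\delta_1}\,\varepsilon_{A,1-p} + (1-p)\sum_{s=1}^{S}\beta^{s-1}\frac{1}{\delta_s}\,\varepsilon_{\psi_s,a_s}\,\varepsilon_{a_s,1-p}\,\psi_s^2\sigma_\eta^2\right] = \frac{1}{\delta_1}\,\sigma_\theta^2.
\]
This concludes the proof of equation (75).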