multidimensional sorting under random search · dimensional sorting in a frictionless assignment...
TRANSCRIPT
Multidimensional Sorting
Under Random Search∗
Ilse Lindenlaub† Fabien Postel-Vinay‡
July 2017
Abstract
We analyze sorting in a standard market environment with search frictions and randomsearch, where both workers and jobs have multi-dimensional characteristics. We offer def-initions of multi-dimensional positive and negative assortative matching in this frictionalenvironment. We say that matching is positive assortative in dimension (j, k) if workerswith higher endowments in skill dimension k are matched to a distribution of jobs withhigher values of job attribute j, in the sense of first-order stochastic dominance, and simi-larly for negative assortative matching. We then provide conditions on the primitives of theeconomy (technology and distributions) under which sorting arises in equilibrium. An essen-tial condition for positive sorting is a single-crossing property on the technology, althoughin general further conditions on type distributions are needed. Guided by our theoreticalframework, we develop an empirical test of the standard assumption that heterogeneity inthe data is one-dimensional. We conduct simulation exercises (i) to show that the test cor-rectly reveals misspecification and (ii) to quantify the errors in assessing sorting, mismatchand policy caused by the restriction to one-dimensional heterogeneity when our test indicatesthat said restriction is not valid.
Keywords. Multidimensional Heterogeneity, Random Search, Sorting, Assortative Matching
∗We would like to thank Hector Chade and Jan Eeckhout for useful comments, as well as Carlos Carrillo-Tudela and Jim Albrecht for their insightful discussions of this paper at the 2016 Cowles Foundation MacroConference and the 2016 Konstanz Search and Matching Conference. We further thank seminar audiences atEssex, Yale, the Chicago Fed, Einaudi, Konstanz, the NBER Summer Institute, MIT, Simon Fraser University,McGill, EUI, and at the SED 2017 for their comments. All remaining errors are ours.
†Yale University, Address: Department of Economics, Yale University, 28 Hillhouse Avenue, New Haven,06520, US. Email: [email protected]
‡UCL and IFS. Address: Department of Economics, University College London, Drayton House, 30 GordonStreet, London WC1H 0AX, UK. Email: [email protected]
1 Introduction
Labor market frictions impede the efficient assignment of heterogeneous workers to heterogeneous
jobs and cause mismatch. Measuring mismatch in frictional markets — its nature, its extent,
and its implications for efficiency, inequality and welfare — is the focus of a growing body of
literature in structural labor economics. With very few exceptions, that literature works under
the assumption that job and worker heterogeneity can both be captured by scalar indexes, i.e.
heterogeneity is one-dimensional.1 While this restriction to one-dimensional heterogeneity is
a natural starting point and convenient for modeling, it is at odds with the fact that typical
data sets describe both workers and jobs in terms of many different productive attributes (e.g.
cognitive skills, manual skills, or psychometric test scores for workers, and numerous task-specific
skill requirements for jobs).2
If the productive characteristics of workers and firms are truly multi-dimensional, what are
the features of the data, relating to sorting and mismatch, that we miss by modeling them as
one-dimensional scalars? This is the question we ask in this paper.
We proceed in two steps. First, we develop a theory of the assignment of workers to jobs
when both workers and jobs differ along several dimensions in a random search environment.
We propose a notion of assortative matching (or sorting) in this setting and analyze how the
economy’s primitives shape equilibrium sorting. Second, we use our theory to develop an empir-
ical test to assess whether the assumption of one-dimensional (1D) heterogeneity in the random
search framework is valid. We show on simulated data that the test accurately reveals misspec-
ification of the one-dimensional model. We then highlight some consequences of approximating
job and worker characteristics by 1D summary indexes when it is not justified, i.e. when the test
indicates that the data is multi-dimensional. We show that using the misspecified 1D model to
measure sorting and mismatch (as opposed to the multi-dimensional one) leads to quantitative
and qualitative errors as well as misguided policy recommendations.
We now describe the two steps of our analysis — theory and application — in more detail.
We begin with developing a theoretical framework of multi-dimensional sorting under random
search. Our environment is a standard random search model, except that workers and firms
(also referred to as jobs) are endowed with vectors of productive attributes, x = (x1, · · · , xX)
1A recent exception is Lise and Postel-Vinay (2015), who focus on the accumulation of skills in various dimen-sions within a model that can otherwise be seen as a special case of the theoretical framework we develop here.
2Beyond search models, a growing applied literature takes explicit account of these multiple dimensions ofproductive heterogeneity. Recent examples include Yamaguchi (2012), Sanders (2012), and Guvenen et al. (2015).
1
for workers and y = (y1, · · · , yY ) for jobs. Employed and unemployed workers receive job offers
drawn at random from an exogenous sampling distribution of job attributes. Employers face no
capacity constraint. Utility is transferable: workers and firms are joint surplus maximizers. The
fact that agents base their match acceptance decisions on a scalar value (i.e. the match surplus
that summarizes all underlying multi-dimensional heterogeneity) makes our multi-dimensional
problem tractable.
In order to interpret the assignment patterns that arise in equilibrium, we define a notion
of positive assortative matching (PAM) and negative assortative matching (NAM) in this en-
vironment. This notion is based on first-order stochastic dominance (FOSD) ordering of the
marginal distributions of job attributes across workers with different skills: if workers with a
higher endowment of some skill xk are matched to jobs with ‘better’ (in the first-order stochastic
dominance sense) attributes yj , then PAM occurs in dimension (xk, yj). For instance, if workers
with more cognitive skills are matched to a distribution of jobs whose cognitive contents stochas-
tically dominates that of less cognitively skilled workers, then we call this sorting pattern ‘PAM
between cognitive skills and cognitive job contents’. It is important to note that sorting is thus
defined dimension by dimension, meaning that PAM can arise in one particular dimension (here
in the ‘cognitive’ dimension) while NAM occurs in another.
Using this definition of sorting, we present three main sets of theoretical results. The first one
is about the sign of sorting. We provide conditions on the economy’s primitives under which pos-
itive or negative sorting arises in equilibrium. For ease of exposition and clarity of interpretation,
we focus in the main body of the paper on a baseline model: it features two-dimensional het-
erogeneity on the job side, bilinear technology, wage setting by sequential auctions, and positive
surplus of any possible match — all assumptions that we can and do relax. In this baseline case,
we find that matching in, say, dimension (x1, y1) is positive assortative if and only if the technol-
ogy satisfies a single crossing condition implying that the complementarity between worker skill
x1 and job attribute y1 dominates the complementarity in the competing dimension (x1, y2).
This condition is distribution-free: it only involves restrictions on the production technology.
We generalize this analysis of the sign of sorting to cases where (1) not all possible matches
generate positive surplus (implying that there also is sorting on the unemployment-to-employment
margin), (2) heterogeneity on the job side is of dimension higher than two, and (3) the technol-
ogy is monotone in at least one job attribute but not necessarily bilinear or it is non-monotone
but separable. We show that in these more general environments the conditions for sorting not
2
only involve single-crossing of the technology but also interactions between the technology and
sampling distribution of jobs. We also show that these results do not hinge on the assumed se-
quential auctions wage setting but hold for several other commonly used wage setting protocols
like Nash bargaining, sequential auctions with worker bargaining power, and wage posting.
Second, our model predicts sorting based on specialization rather than on absolute advantage.
This implies that uniformly more skilled workers do not sort into jobs with uniformly higher
skill requirements. Rather, they sort into jobs with a higher requirement for the skill in which
they are relatively strong, possibly at the cost of a lower requirement for the other skills.
Our third set of results highlights that sorting generally cannot be positive between all skill
and job dimensions. Instead, there are sorting trade-offs. We provide conditions under which
PAM arises in all dimensions except the one that is characterized by the weakest complemen-
tarities in production, where forces push towards NAM.
An important insight from our analysis is that multi-dimensional heterogeneity per se is a
cause of sorting. That is, in the 1D version of this model, where match surplus depends on
scalar worker type x and job type y and is increasing in y, workers all rank jobs in the same
way (there is a single job ladder), regardless of their skill x: their common strategy is to accept
any job with a higher y than their current one, which rules out sorting. By contrast, in the
multi-dimensional world where workers have skill bundles, what matters is not just to match
with a productive job in any component of y. Rather, a worker wants to match with a job that
puts much weight on the skill in which he is strong. Thus, workers with different skill bundles
(and hence different strengths) accept and reject different types of jobs, they climb different job
ladders, which is why sorting arises.
In the second part of our analysis, we show that our theory of multi-dimensional sorting has
important implications for applied work. Based on the theoretical insight that multi-dimensional
job ladders are heterogenous across workers, we design a test of the assumption that job and
worker heterogeneity can be captured by scalar indices (i.e. the single index assumption). If
the data is well-approximated by 1D worker and job types, any two workers with the same
(scalar) type should face the same job ladder. Conversely, if the single index assumption is
not justified, approximating different multi-dimensional worker types by the same 1D worker
type will produce job ladder heterogeneity within 1D worker types, revealing misspecification.
To assess the accuracy of this test, we simulate data from several two-dimensional models that
comply with our theory and fit misspecified 1D models to those data. We show that the test
3
correctly reveals misspecification of the 1D model when the data is multi-dimensional.
We then use our simulations to assess the potential errors in the analysis of sorting and
mismatch caused by imposing the single-index assumption when it is not justified (i.e. when our
test indicates that this restriction is violated). We find that the misspecified model is largely
uninformative on true sorting patterns (it predicts sorting when the true data features none and
vice versa). Perhaps most importantly, a misspecified 1D model suggests a first-best allocation
that is very different from the true first-best under multi-dimensional types. Implementing the
first-best allocation suggested by the misspecified 1D model can cause sizable welfare losses.
While much is known about sorting under one-dimensional heterogeneity without frictions
(Becker, 1973; Legros and Newman, 2007) and one-dimensional heterogeneity with frictions
(Shimer and Smith, 2000; Smith, 2006; Eeckhout and Kircher, 2010), little is known about
sorting on multi -dimensional types. One exception is Lindenlaub (2017) who studies multi-
dimensional sorting in a frictionless assignment game.3 To the best of our knowledge, this
paper is the first to develop a theory of multi -dimensional sorting under random search —
an environment of great importance for applied work since it carries a well-defined notion of
mismatch and allows for policy analysis.
Our work not only shifts the focus to multi-dimensional heterogeneity but also differs in
another important way from that theoretical literature on sorting. All of the aforementioned
papers analyze assignment problems, meaning that agents on either side of the market can be
matched with at most one agent from the other side (matching is one to one). While this
assumption is certainly appropriate in partnership models of the marriage market, it is much
less common in analyses of the labor market, where a firm usually employs many workers. In
this paper we assume, as is almost invariably done in the applied structural search literature,
that firms operate technologies with constant returns to scale and face no capacity constraints.4
The no-capacity-constraint assumption greatly increases the tractability of structural search
models.5 Its cost, though, is that it tends to rule out sorting when job and worker heterogeneity3Our restriction on the technology to obtain sorting is most closely related (but not equivalent) to that
needed in the multi-dimensional, frictionless assignment problem by Lindenlaub (2017), who also has to disciplinecomplementarities across tasks to ensure PAM within tasks. But compared to Lindenlaub, the introduction ofrandom search and no-capacity constraint changes the technical nature of the problem (for instance, we cannotrely on optimal transport theory here) as well as the insights (in our framework with frictions, we can analyzesorting on the UE and EE margin, heterogenous job ladders, mismatch and policy).
4Examples of widely-used models without firm capacity constraint are Burdett and Mortensen (1998) andPostel-Vinay and Robin (2002).
5For instance, it is instrumental in circumventing a complex existence proof involving a fixed point problem inthe distribution of the unmatched agents that is common in one-to-one assignment models under search frictions(see Shimer and Smith, 2000).
4
are one-dimensional. As discussed above, if technology is monotone in firm type, workers share a
common ranking of firms and all seek to climb a single, economy-wide job ladder. The lack of ca-
pacity constraint on the firm side then implies that all workers match with the same distribution
of jobs in equilibrium. This result is independent of the complementarities in production.
Getting the standard structural search framework without firm capacity constraint to pro-
duce a meaningful notion of equilibrium sorting is not straightforward.6 In this paper, we extend
that framework by allowing for multi-dimensional heterogeneity of jobs and workers. This change
drastically alters the model’s predictions. Multi-dimensional skill bundles cause different work-
ers to rank jobs differently and face different job ladders, which is why sorting arises. Our model
and definition of multi-dimensional sorting can guide the measurement of sorting in the data.
It can be used to study efficiency losses due to mismatch or the role of sorting and mismatch in
wage inequality — topics that are of great importance from an applied point of view.
The rest of the paper is organized as follows: Section 2 illustrates our main theoretical
insights via a simple example. Section 3 introduces our model. Section 4 provides a definition
of sorting in multiple dimensions under random search. Section 5 contains our main results
on the sign of sorting, established within our baseline model with bilinear technology and two-
dimensional job attributes. Generalizations are discussed in Section 6 (and presented formally
in Appendix A). Section 7 investigates sorting based on absolute advantage vs specialization
and the interdependence of sorting patterns across different heterogeneity dimensions. Section 8
contains our application, including our test of the single index restriction. Section 9 concludes.
2 An Illustrative Example
We begin by illustrating our main theoretical insights using a simple example, the foundations
of which will be explored later as we introduce our full model. Consider a labor market where
workers are characterized by a two-dimensional skill vector x = (x1, x2) ∈ R2++ (e.g. “cognitive”
and “manual” skills). Workers can either be employed or unemployed. Either way, they face
search frictions in that they sample job offers randomly and sequentially. Jobs are also charac-
terized by a two-dimensional vector of attributes y = (y1, y2) ∈ R2++ describing their contents
6Bagger and Lentz (2016), who build on Lentz (2010), include endogenous search intensity into an otherwisestandard 1D sequential auction model. Under complementarities in production, high worker types have moreto gain from being with high firm types, hence search more intensively, and therefore end up in better firms inequilibrium. Thus, their model features sorting due to endogenous search intensity. Another way to introducesorting is to abandon the assumption that technology is monotone in firm type.
5
in terms of skill dimensions 1 and 2 (e.g. “cognitive” and “manual” skill requirements). A match
between a worker with skills x and a job with attributes y generates a surplus σ(x,y).
We also assume that jobs and workers are joint surplus maximizers. Hence, a meeting
between a type-x unemployed worker and a type-y job will result in a match if and only if
σ(x,y) ≥ 0. And a meeting between a type-x worker, employed in job y, and an alternative
type-y′ job will result in the worker accepting the type-y′ job if and only if σ(x,y′) > σ(x,y).
For illustration, we assume that the match surplus function is bilinear:
σ(x,y) = 2x1y1 + x2y2
(It will become clear in our baseline model below under which conditions it suffices to focus
on this flow surplus σ, which is a primitive.) In this example, complementarities in production
within task dimension 1 and within dimension 2 are positive while between-task complemen-
tarities are zero. This implies that complementarities are larger in dimension (x1, y1) than in
(x1, y2), ensuring that the single crossing condition
∂
∂x1
[∂σ/∂y1∂σ/∂y2
]> 0
holds. Note that, in this example, σ(x,y) ≥ 0 for all pairs (x,y), implying that all matches out
of unemployment will be accepted.
Consider two unemployed workers, one with skills (x1, x2) = (1, 1) and the other one
with (x1, x2) = (2, 1), so the second worker is relatively specialized in skill x1. Assume for
illustration that both workers receive the same sequence of job offers over time, (y1, y2) =
(2, 2), (3, 1), (3.4, 0), implying the following acceptance/rejection decisions:
Offers: y = (2, 2) −→ y = (3, 1) −→ y = (3.4, 0)
Worker x = (1, 1): accept −→ accept −→ reject
Worker x = (2, 1): accept −→ accept −→ accept
The last job offer with high y1 but very low y2 is only accepted by the worker with higher x1
since for him the surplus comparison yields σ ((2, 1), (3.4, 0)) = 13.6 > 13 = σ ((2, 1), (3, 1))
while for the worker with lower x1, σ ((1, 1), (3.4, 0)) = 6.8 < 7 = σ ((1, 1), (3, 1)).
This simple example illustrates our main theoretical insights in an intuitive way. Even
though both workers are inclined to accept jobs with higher y1 at the expense of lower y2 —
6
because complementarities in dimension (x1, y1) are stronger — for workers with higher x1 this
sorting trade-off is more pronounced: they are willing to accept jobs with particularly low
y2 in exchange for a higher y1. This is why in equilibrium, for given x2, higher-x1 workers
will be matched to a distribution of jobs that is ‘better’ (in the FOSD sense) in terms of y1,
compared to workers with lower x1. But, in turn, higher-x1 workers will be matched to jobs
with ‘worse’ y2 (again in the FOSD-sense) relative to workers with lower x1. According to our
definition of PAM, there is PAM in the dimension (x1, y1) (which is the dimension of relatively
strong complementarities in production), while there is NAM in dimension (x1, y2) (which is the
dimension of relatively weak complementarities). These sorting patterns arise because workers
with different skill bundles have different ‘specializations’ and thus rank jobs in different ways.
As a consequence, the two workers in this example climb different job ladders over time.
It is important to note that if there were no differences in the relative complementarities
across dimensions, then both workers in our example would share the same ranking of jobs. To
illustrate this, change the technology to σ(x,y) = 2x1y1 + 2x1y2 + x2y1 + x2y2, which implies
that ∂[∂σ/∂y1∂σ/∂y2
]/∂x1 = 0. One can easily check that both workers would reject the last job offer
y = (3.4, 0) in this case. Independent of their different degrees of specialization, they would
both climb the same job ladder, and no sorting would arise. The labor market would be similar
to one with 1D heterogeneity, where workers have a single skill x and jobs a single attribute y.
In such a 1D world, if the technology is monotone in y, workers share the same ranking of jobs
and climb the same job ladder over time (as is the case, e.g., in Postel-Vinay and Robin, 2002).
3 The Model
3.1 The Environment.
Time t ∈ R+ is continuous. The economy is populated by infinitely lived workers and firms.
There is a fixed unit mass of workers, each characterized by a time-invariant skill bundle x =
(x1, · · · , xX) ∈ X =×Xk=1[xk, xk], where X denotes the number of different skill dimensions.7
We normalize the lower support of worker skills to xk = 0 and allow xk ∈ R+ ∪ +∞. Skills
are distributed with cdf L and strictly positive density ℓ.8 Firms can be thought of as single7For early influential work on the importance of workers’ characteristics being bundled in the labor market
context, see Heckman and Scheinkman (1987).8We adopt the following notational conventions throughout the paper: we denote all CDFs using uppercase
letters (e.g. L), densities by the associated lowercase letter (e.g. ℓ), and survivor functions using bars over CDFs(e.g. L = 1−L). Also, we will state a function is increasing/decreasing or positive/negative if this is the case inthe weak sense. Strict properties will be mentioned explicitly.
7
jobs (possibly vacant), or as collections of independent, perfectly substitutable jobs. Jobs are
characterized by a vector of time-invariant productive attributes, or “skill requirements” y =
(y1, · · · , yY ) ∈ Y =×Yj=1[yj , yj ], where Y denotes the number of different job attributes and
where yj∈ R+ and yj ∈ R+ ∪ +∞.9
Workers search for jobs at random, both on and off the job. They can be matched to a job or
be unemployed. If matched, they lose their job at Poisson rate δ, and sample alternate job offers
drawn from an exogenous sampling distribution distribution Γ at rate λ1. We assume Γ to have
a strictly positive density γ, which is twice continuously differentiable. Unemployed workers
sample job offers from the same sampling distribution at rate λ0. The output flow in a match
between a worker with skills x and a job with attributes y is p(x,y), where p : RX×RY −→ R.10
We denote the income flow of an unemployed worker with skill x by b(x).
We stress that there is no capacity constraint on the firm side (firms are happy to hire any
worker with whom they generate positive surplus) and matched jobs do not search for other
workers. As such, this set-up is really a (partial equilibrium) model of the labor market rather
than one of symmetric, one-to-one matching of the marriage market.
3.2 Rent Sharing and Value Functions
Workers and firms are risk-neutral and have equal time discounting rates ρ > 0. Under those
assumptions, the total present discounted value of a type-(x,y) match is independent of the
way in which it is shared, and only depends on match attributes (x,y). We denote this value
by P (x,y). We further denote the value of unemployment by U(x), and the worker’s value of
being employed under his current wage contract by W , where W ≥ U(x) (otherwise the worker
would quit into unemployment), and W ≤ P (x,y) (otherwise the firm would fire the worker).
Assuming that the employer’s value of a job vacancy is zero, the total surplus generated by a
type-(x,y) match is P (x,y)− U(x).
We assume in the main text that wage contracts are set as in the sequential auction model
without worker bargaining power of Postel-Vinay and Robin (2002). This is mainly to simplify
exposition. We show in Appendix B that most of our results extend to other common wage set-
ting rules, as Nash bargaining (Mortensen and Pissarides, 1994; Moscarini, 2001), wage/contract9The restriction to x and y having positive elements is not necessary but simplifies the economic interpreta-
tions. Also, we can relax that X is a product of intervals but assume it for symmetry with job attributes.10We assume that the production function is defined over the entire space RX ×RY , not just the set X ×Y of
observed (x,y). This is to streamline some proofs and can be relaxed. Details are available upon request.
8
posting (Burdett and Mortensen, 1998; Moscarini and Postel-Vinay, 2013), or sequential auctions
with worker bargaining power (Cahuc, Postel-Vinay and Robin, 2006).
In the sequential auction model, firms offer take-it-or-leave-it wage contracts to workers.
Wage contracts are long-term contracts specifying a fixed wage that can be renegotiated by
mutual agreement only. In particular, when an employed worker receives an outside offer, the
current and outside employers Bertrand-compete for the worker. Consider a type-x worker
employed at a type-y firm and receiving an outside offer from a firm of type y′. Bertrand
competition between the type-y and type-y′ employers results in the worker matching with the
employer where the total match value is higher, while extracting the full surplus from the lower-
surplus match. This implies that he stays in his initial job if P (x,y) ≥ P (x,y′), moves to the
type-y′ job otherwise, and ends up with a new wage contract worth W ′ = min P (x,y), P (x,y′)
(provided that W ′ exceeds the value of the worker’s initial contract, W , as otherwise the worker
would not have initiated the contract renegotiation in the first place).
It follows that the total value of a type-(x,y) match, P (x,y), solves the equation:
ρP (x,y) = p(x,y) + δ [U(x)− P (x,y)] .
The annuity value of the match, ρP (x,y), equals the output flow p(x,y) plus the expected
capital loss δ[U(x)− P (x,y)] of the firm-worker pair from job destruction.11
Given that U(x) is independent of firm type, the optimal mobility choices of workers hinge on
the comparison of match surplus P (x,y)−U(x) across jobs. It solves (ρ+δ) [P (x,y)− U(x)] =
p(x,y)− ρU(x). In what follows, we will mostly reason in terms of the match flow surplus:
σ(x,y) = p(x,y)− ρU(x).
A worker x employed in a job y accepts an offer from a job y′ if and only if P (x,y′)− U(x) >
P (x,y) − U(x). This is equivalent to σ(x,y′) > σ(x,y), so that the optimal strategy to ac-
cept/reject a job is entirely based on the comparison of flow surpluses. Likewise, an unemployed
worker x accepts a job of type y if σ(x,y) ≥ 0. In turn, the optimal strategy of firm y is to
accept any worker x if σ(x,y) ≥ 0. It follows that the dynamic optimization problem of the11Note that, under the sequential auction model, the realization the “other” risk faced by the firm-worker pair,
namely the receipt of an outside job offer by the worker, generates zero capital gain for the match: either theworker rejects the offer and stays, in which case the continuation value of the match is still P (x,y), or the workeraccepts the offer and leaves, in which case he receives P (x,y) while his initial employer is left with a vacant jobworth 0, so that the initial firm-worker pair’s continuation value is again P (x,y).
9
agents is solved simply by flow surplus comparisons (as hinted at in the example of Section 2).
Finally note that, in the sequential auction case, the value of unemployment, U(x), is given
by ρU(x) = b(x), implying σ(x,y) = p(x,y)− b(x), i.e. σ is pinned down by technology. Thus,
optimal mobility decisions are entirely determined by technology.
3.3 Steady-State Distribution of Skills and Skill Requirements
In our analysis of sorting below, the key object will be the steady-state equilibrium measure of
type-(x,y) matches, denoted by h(x,y), which indicates who matches with whom. It is deter-
mined by the following flow-balance equation, which embeds the optimal mobility decisions,12
δ + λ1EΓ
[1σ(x,y′) > σ(x,y)
]h(x,y) = λ0γ(y)1 σ(x,y) ≥ 0u(x)
+ λ1γ(y)
∫1σ(x,y) > σ(x,y′)
h(x,y′)dy′, (1)
where u(x) is the measure of type-x unemployed workers in the economy. Note that we use primes
(′) to denote random variables with respect to which expectations are taken. The left-hand side
(l.h.s.) of (1) is the outflow from the stock of type-(x,y) matches, comprising matches that are
destroyed at rate δ and matches that are dissolved because the worker receives a dominant outside
offer. The flow probability of this latter event is λ1EΓ [1 σ(x,y′) > σ(x,y)], the product of
the arrival rate of offers λ1 and the probability of drawing a job type y′ that yields a higher flow
surplus with the worker than the current type-y job. The right-hand side (r.h.s.) of (1) is the
inflow into the stock of type-(x,y) matches and composed of two groups: unemployed type-x
workers who draw a type-y job with flow probability λ0γ(y) and accept it (which they do if
the flow surplus is positive); and type-x workers employed in any type-y′ job who draw an offer
from a type-y job with flow probability λ1γ(y) and accept it (which they do if the flow surplus
with that job exceeds the one with their initial type-y′ job). The measure of type-x unemployed
workers solves the following flow-balance equation with similar interpretation:
λ0EΓ
[1σ(x,y′) > 0)
]u(x) = δ
∫h(x,y′)dy′. (2)
Finally note that, consistently with (1) and (2), the total measure of workers with skill bundle
x in the economy solves ℓ(x) = u(x) +∫h(x,y′)dy′.
12Throughout the paper, we use the notation EΓ to distinguish expectations taken w.r.t. the sampling distri-bution Γ from expectations w.r.t. the equilibrium distribution of matches, which we simply denote by E.
10
Note that the job acceptance rule of an employed worker in a type-(x,y) match hinges on the
comparison of two scalar random variables, σ(x,y′) and σ(x,y), despite the underlying multi-
dimensional heterogeneity of workers and firms. It is therefore convenient to introduce the con-
ditional sampling distribution Fσ|x of flow surplus σ, given x (with density fσ|x). With this no-
tation, the job acceptance probability for an employed worker x is EΓ [1 σ(x,y′) > σ(x,y)] =
F σ|x (σ(x,y)) and that for an unemployed worker is EΓ [1 σ(x,y′) > 0)] = F σ|x(0).
Substituting these elements into (1), we show in Appendix C that the matching density
h(x,y) has the following closed-form:
h(x,y)
ℓ(x)γ(y)=
δλ01 σ(x,y) ≥ 0δ + λ0F σ|x(0)
×δ + λ1F σ|x(0)[
δ + λ1F σ|x (σ(x,y))]2 .
This equation also implies that the equilibrium conditional density of job types y given worker
types x among employed workers is given by:
h (y|x) = δ1 σ(x,y) ≥ 0F σ|x(0)
×[δ + λ1F σ|x(0)
]γ(y)[
δ + λ1F σ|x (σ(x,y))]2 =
gσ|x (σ(x,y))
fσ|x (σ(x,y))× γ(y), (3)
where for any s ∈ R, the steady-state cross-section cdf of flow surplus among employed workers
of type x is
Gσ|x(s) := 1−δ + λ1F σ|x(0)
F σ|x(0)×
F σ|x(s)
δ + λ1F σ|x(s)
and gσ|x is its density.
4 Equilibrium Sorting
4.1 Measuring Sorting
We first specify a measure of sorting in this multi-dimensional environment under frictions.
A criterion that has been proposed for multi-dimensional positive assortative matching in
a frictionless context is that the Jacobian matrix of the equilibrium matching function be a
P -matrix, meaning that all its principal minors are positive (Lindenlaub, 2017). This criterion
captures the way in which a worker’s job type y improves or deteriorates as one varies the
worker’s skill bundle x when matching is pure, i.e. when any two workers with the same skill
bundle are matched to the exact same type of job. By contrast, in our frictional environment
with random search the equilibrium assignment is generally not pure — there is mismatch. A
11
natural extension of this measure of sorting to our environment is to consider changes in the
quantiles of the conditional matching distribution of job types y as one varies worker type x.13
Formally, let Hj(y|x) denote the marginal cdf of yj (the jth component of job attribute vector
y) conditional on the matched workers having skill bundle x. Using (3), we can express this as
Hj(y|x) =∫
1y′j ≤ y
h(y′|x
)dy′ =
δ[δ + λ1F σ|x(0)
]F σ|x(0)
∫ 1 σ(x,y′) ≥ 0 × 1y′j ≤ y
[δ + λ1F σ|x (σ(x,y′))
]2 γ(y′)dy′.
(4)
We are interested in signing, for each job attribute j, the elements of the gradient ∇Hj(y|x) =
(∂Hj(y|x)/∂x1, · · · , ∂Hj(y|x)/∂xX)⊤, i.e. the Jacobian of H(y|x). A situation of particular
interest is when a component of this Jacobian, ∂Hj(y|x)/∂xk, has a constant sign over the
support of yj . If that sign is negative [positive], then Hj(·|x) is increasing [decreasing] in xk in
the sense of FOSD: positive [negative] assortative matching then occurs in dimension (yj , xk),
as a worker with higher type-k skill is matched to jobs with greater (in the FOSD sense) type-j
skill requirement compared to a worker with lower type-k skill.14 We will thus use the following
formal definition of sorting:
Definition 1 (Positive and Negative Assortative Matching). Matching is positive [negative]
assortative in dimension (yj , xk) if and only if ∂Hj(y|x)/∂xk is negative [positive] for all y ∈
[yj, yj ] and all x ∈ X .
We will use the acronyms PAM and NAM for positive and negative assortative matching.
Alternatively, we will also refer to PAM (or NAM) as positive (or negative) sorting.15 To avoid
duplication of some results, we focus on positive sorting throughout most of the paper.
4.2 A Decomposition Result
We begin our analysis by showing how equilibrium sorting can be decomposed into sorting on
the unemployment-to-employment (UE) margin and on the employment-to-employment (EE)13We choose to analyze the equilibrium matching distribution of y given x and not that of x given y for
the following reason. While workers sample job types from an exogenous sampling distribution γ, jobs ‘sample’workers from an endogenous distribution (the distribution of workers across employment statuses and job types),which in itself is a complex equilibrium object. The acceptance decisions of firms would impact the distributionof x across employment statuses and job types, and the distribution of x, in turn, impacts the acceptancedecisions of the firms. Analyzing the matching distributions of x given y would therefore require us to deal witha complicated fixed point problem, which proved intractable.
14FOSD has been used to characterize sorting under frictions and 1D-heterogeneity (e.g. Chade, 2006).15Note that Definition 1 is “global” in the sense that it imposes a sign restriction on ∂Hj(y|x)/∂xk for all skill
bundles x ∈ X . Alternatively, we could have opted for a “local” definition by only imposing the weaker conditionthat ∂Hj(y|x)/∂xk be positive or negative at a given skill bundle x. In what follow we use the global version ofthis definition as sorting is commonly envisaged as a global property in the literature.
12
margin. As we show in Appendix C, a typical element of the gradient of Hj(y|x), which we use
to characterize sorting patterns (Definition 1), has two parts, indicating that a marginal increase
in the worker’s skill xk affects his equilibrium distribution of job types yj in two ways.16
∂Hj(y|x)∂xk
= CUE︸︷︷︸(1): UE margin
+ CEE︸︷︷︸(2): EE margin
(DC)
where components CUE and CEE are given by
CUE := gσ|x(0)× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]×[PrΓ
y′j ≤ y|σ(x,y′) = 0
−Hj(y|x)
]+PrΓ
y′j ≤ y|σ(x,y′) = 0
×EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j ≤ y
]−EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]
and
CEE :=
∫ +∞
0
2λ1fσ|x(s)gσ|x(s)
δ + λ1F σ|x(s)× PrΓ
y′j ≤ y|σ(x,y′) = s
×
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j ≤ y
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds.
First, a marginal increase in skill xk affects the boundary of the set of profitable matches for that
worker, i.e. the set of job types y such that σ(x,y) ≥ 0. An increase in skill may render some
matches between unemployed workers and jobs profitable that were unprofitable before. This
is reflected in the first term of expression (DC) — our label for decomposition — which only
works through selection on the UE margin: term CUE contains (multiplicatively) the density of
marginally profitable matches for type-x workers, gσ|x(0). If the worker’s type x is such that
σ(x,y) > 0 for all job types y (i.e. if the worker accepts any job type when unemployed), then
there are no such marginal matches, gσ|x(0) = 0, and sorting on the UE margin is shut down.
Second, a marginal increase in xk affects the job acceptance probability for employed workers
as well. More specifically, an increase in xk changes the comparison between any two potential
matches involving the worker: for any two job types (y,y′), the difference σ(x,y′) − σ(x,y)
generally varies with xk. This, in turn, changes how employed workers reallocate between jobs
through on-the-job search. This effect, given by term CEE of expression (DC), operates through16Two important technical notes: PrΓA is used to denote the probability of A occurring following a random
draw of a job type y from the sampling distribution γ. Second, it may be that the joint event(σ(x,y′) = s, y′
j ≤ y)
on which some of the expectations in (DC) are conditioned have zero probability in γ. As explained in the detailedderivation, we set expectations conditional on zero-probability events to zero by convention.
13
sorting on the EE margin.
If we shut down sorting on the UE margin (for example by assuming σ(x,y) > 0 for all
(x,y)-pairs), then ∂Hj(y|x)/∂xk = CEE. In turn, if we shut down sorting on the EE margin (for
example by setting λ1 = 0, or by specifying a flow surplus function such that the job acceptance
probability of employed job seekers is independent of skills x), then ∂Hj(y|x)/∂xk = CUE. In
what follows, we will say that PAM [NAM] occurs on the EE margin whenever CEE is negative
[positive], and that PAM [NAM] occurs on the UE margin whenever CUE is negative [positive].
In other words, we analyze sorting on the EE margin “as if” sorting on the UE margin were shut
down, and vice-versa. This is obviously an expositional convention: in general, both margins of
sorting are present, and the contributions to overall sorting of both terms CUE and CEE need to
be signed in order to determine the sign of ∂Hj(y|x)/∂xk, as indicated by (DC).
To offer some economic intuition for (DC), consider the EE margin first. Term CEE is
negative if EΓ
[∂σ(x,y′)/∂xk | σ(x,y′) = s, y′j ≤ y
]− EΓ [∂σ(x,y
′)/∂xk | σ(x,y′) = s] ≤ 0 for
all s. In the one-dimensional case (Y = 1), one can show that this will be true if ∂2σ/∂xk∂y ≥ 0,
i.e. if xk and y are complements in the production technology, echoing the familiar finding from
the literature on one-dimensional, frictionless sorting that supermodularity implies PAM. While
things are more complex in the multi-dimensional world, we show below that complementarities
remain a key determinant of sorting patterns in that case. Note that a similar expression,
EΓ
[∂σ(x,y′)/∂xk | σ(x,y′) = 0, y′j ≤ y
]− EΓ [∂σ(x,y
′)/∂xk | σ(x,y′) = 0], is present in term
CUE , hinting at the importance of productive complementarities for UE sorting as well.
Beyond this basic intuition about the driving forces of sorting on the UE and EE margin,
those two margins involve complex interactions between the technology σ and the sampling
distribution of job types γ. This implies that terms CUE and CEE in (DC) cannot easily be
signed without further assumptions on the primitives. In order to make progress towards a
characterization of the sign of sorting, we focus in the next section on a class of technologies
for which we can derive clean and (with one exception) distribution-free conditions for positive
sorting. We investigate generalizations in the following section and in the appendix.
5 The Sign of Sorting: Bilinear Technology in Two Dimensions
5.1 Assumptions
The following two assumptions simplify decomposition (DC) considerably.
14
Assumption 1. (a) The production function p(x,y) is bilinear in (x,y):
p(x,y) = (x+ a)⊤Qy =
X∑k=1
Y∑j=1
qkj(xk + ak)yj
where Q = (qkj) 1≤k≤X1≤j≤Y
is a X × Y matrix and a = (a1, · · · , aX)⊤ ∈ RX+ is a fixed vector;
(b) the nonemployment income function b(x) is linear in x:
b(x) = (x+ a)⊤Qb =
X∑k=1
Y∑j=1
qkj(xk + ak)bj
where b = (b1, · · · , bY )⊤ ∈ RY is a fixed vector;
(c) there exists j ∈ 1, · · · , Y such that qj(x) := ∂p(x,y)/∂yj =∑X
k=1 qkj(xk + ak) > 0 for
all x ∈ X ; to fix the notation, we will assume w.l.o.g. that qY (x) > 0.
Assumptions 1 (a)-(b) restrict the production technology in such a way that the flow surplus
function σ(x,y) is bilinear in (x,y). Indeed they imply that:
σ(x,y) = p(x,y)− b(x) = (x+ a)⊤Q(y − b) =
X∑k=1
Y∑j=1
qkj(xk + ak)(yj − bj)
The technology matrix Q captures the complementarity structure between all job and worker
characteristics and will be crucial to our analysis of sorting. We interpret the vector b as the
production technology of the unemployed. In turn, we interpret vector a, which is a technological
parameter, as the baseline productivity of workers, noting that a⊤Qy is the output of a type-y
job filled with the least skilled worker, x = 01×X . We assume a to be positive (Assumption 1(a)).
This ensures that the worker’s total input into production, x + a, is positive in all dimensions
(recall that x ∈ RX+ ). While not strictly necessary for our analysis, this restriction ensures that
our sorting results do not change with the sign of x+ a. Finally, Assumption 1(c) ensures that,
for any level of worker skills, there is at least one job attribute, here denoted by yY , that impacts
output positively.Note that we do not impose monotonicity of the production function in all job
attributes. Nor do we restrict the monotonicity of the production or flow surplus function in
worker skills x. Next, we consider:
Assumption 2. Each job has Y = 2 attributes, i.e. y ∈ Y ⊂ R2+.
The sorting results established in the next subsections will rely on Assumptions 1 and 2 (our
15
baseline model). In the following section, we provide generalizations of our results on the sign
of sorting to other surplus functions and to higher dimensions of job heterogeneity. We now
investigate the sign of sorting along both the EE and the UE margin, based on (DC).
5.2 The EE Margin
We begin with the following result on the sign of EE sorting.
Theorem 1 (EE Sorting, Y = 2, Bilinear Technology). Under Assumptions 1 and 2, PAM
occurs in dimension (y1, xk) along the EE margin if and only if, for all y ∈ Y:
∂
∂xk
(∂p(x,y)/∂y1∂p(x,y)/∂y2
)> 0 or, equivalently:
∂
∂xk
(q1(x)
q2(x)
)> 0. (SC-2D)
Condition (SC-2D) is a single-crossing condition of the production function (also known as
Spence-Mirrlees condition, in its differential form). Note that, in the two-dimensional, bilinear
case at hand, (SC-2D) simply amounts to detQ > 0, which does not depend on any skill bundle
x and either holds or fails to hold for all x ∈ X . Single-crossing properties have been shown to
guarantee positive sorting in several one-dimensional matching problems.17
The analysis of our multi -dimensional matching model with search frictions and transferable
utility further highlights the importance of single-crossing as a driving force toward positive
sorting. Condition (SC-2D) states that the marginal rate of substitution between (y1, y2) is
increasing in worker skill xk. This implies that skill xk is more complementary to job attribute
y1 than to y2, which is why positive sorting occurs between between xk and y1 (but negative
sorting between between xk and y2 — in the dimension of relatively weak complementarity).18
To illustrate the single crossing condition in our setting and its implication graphically, we
consider two workers with skill bundles x′ = (x′1, x′2) and x′′ = (x′′1, x
′′2) such that x′′1 > x′1 and
x′′2 = x′2 (the second worker has more of x1 but both have equal amounts of x2). In Figure 1, we
plot for each worker the locus of jobs with which that worker produces the same output as with
the job that has attributes A. Single-crossing condition (SC-2D) implies that these isoquants
cross only once (at point A). Moreover, because the marginal rate of substitution is increasing
in x1, the curve of the more skilled worker is steeper. Consider point A as a benchmark with17In an important paper, Legros and Newman (2007) show that a single crossing property is sufficient to
guarantee PAM in frictionless one-dimensional problems with non-transferable utility (NTU). Chade, Eeckhoutand Smith (2017) then demonstrate that several one-dimensional matching problems with transferable utilityboth in environments with and without frictions can be recast as NTU, frictionless matching problems. Afterfinding the associated NTU problem, the Legros-Newman-condition can be applied and guarantees PAM.
18This statement about NAM in (y2, xk) holds true if ∂p(x,y)/∂y1 > 0 in addition to ∂p(x,y)/∂y2 > 0.
16
y1
y2
AB
p(x′,y) ≡ p(x′,yA)
p(x′′,y) ≡ p(x′′,yA)
Low x′1 High x′′1
Figure 1: Single Crossing Property
no sorting (both workers are matched to the same job). Condition (SC-2D) says the following:
if the lower-skilled worker weakly prefers job B over job A where B has lower y2 but higher y1,
then the worker with higher x1 strictly prefers job B, as is the case in the graph.
5.3 The UE Margin
The next result establishes conditions for positive sorting along the UE margin.
Theorem 2 (UE Sorting, Y = 2, Bilinear Technology). Under Assumptions 1 and 2 and under
single-crossing condition (SC-2D) (from Theorem 1), if for all x ∈ X :
1. q1(x) > 0 (i.e. p(x,y) is not only increasing in y2 but also in y1)
2. along all level curves of p(x, ·) (i.e. at all y such that p(x,y) = C for some fixed C ≥ 0):
q2(x)∂2 ln γ
∂y2∂y1− q1(x)
∂2 ln γ
∂y22≥ 0 (UE-2D)
3. at the lower support of γ, denoted by y = (y1, y
2): y
2≥ b2 and y
1< b1
then PAM occurs in dimension (y1, xk) along the UE margin. Moreover, (SC-2D) is also neces-
sary for PAM to occur generically under any sampling distribution γ.
Theorem 2 highlights the importance of single-crossing also for sorting on the UE margin. It
becomes a necessary condition for sorting if all possible sampling distributions are considered.
However, contrary to the EE margin, single crossing alone is not sufficient for PAM on the
UE margin: additional restrictions on the sampling distribution are needed, given by (UE-2D).
17
Recall that sorting on the UE margin occurs when a marginal increase in skill xk affects the
boundary of the set of profitable matches, i.e. the locus of y’s such that σ(x,y) = 0. Figure 2
shows how this boundary shifts with xk under the assumptions of Theorem 2, and helps visualize
the role of both single crossing and distributional restrictions in that theorem.
Yy2
y1
(b1, b2)
y1
y2
σ(x,y)=0
(highx ′′k )
σ(x,y) =0 (low
x ′k )
AB
C
D
Figure 2: Sorting along the UE margin
Figure 2 represents the (y1, y2) plane, where the origin is placed at b = (b1, b2). The shaded
area materializes Y, the support of γ: the (lower) boundaries of Y are the horizontal line at
y2 = y2
and the vertical line at y1 = y1, which are placed in compliance with Condition 3 in
Theorem 2. The oblique lines are zero level curves of σ(x, ·), which under the assumed linear
technology are given by y2 = b2 − q1(x)q2(x)
(y1 − b1). By Condition 1 in Theorem 2, such lines are
downward sloping and go through point y = b. The boundary of feasible matches for a given
skill bundle x is at the intersection between the zero level curve of σ(x, ·) and Y (the emphasized
line segment). Note that this boundary lies entirely in the region of Y where y1 < b1: because
it is assumed that y2≥ b2, it has to be the case that y1 < b1 for surplus to equal zero.
Those zero level curves are drawn for two workers x′ and x′′ with x′′k > x′k (and with the same
amount of all other skills). The higher-x′′k (blue) curve is steeper than the lower-x′k (black) one,
meaning that for a given y2, the more skilled worker needs a higher y1 to generate non-negative
surplus. The reason is as follows: by the single crossing property (SC-2D), complementarities
in production are stronger between xk and y1 than between xk and y2. Thus, the jobs under
consideration (with y1 < b1) are prone to generate surplus losses especially for those workers
with high xk. Therefore, for a given y2, workers with higher xk need jobs with higher y1 to
generate non-negative surplus, which is clearly a force towards PAM. In the figure, this means
18
that all job types between the black and the blue line can be profitably matched with the low
skilled (x′k) worker, but produce negative surplus with the high-skilled (x′′k) worker. This is why
all those jobs with relatively low attribute y1 drop out of his equilibrium matching set.
However, complementarities in production alone are not enough to ensure PAM on the UE
margin. To see this, consider points A, B, C and D on the figure. After increasing skill k from
x′k to x′′k, a worker no longer breaks even with a job at A. Moreover, jobs around B (with higher
y1 but lower y2) are also made unprofitable while jobs around C with lower y1 but higher y2
compared to A remain profitable. Therefore, if the sampling distribution γ has most of its mass
concentrated around points A, B and C (i.e. there is a negative correlation between y1 and y2)
then workers with higher xk will be matched to jobs with lower y1 (since jobs around B with
higher y1 have too little of y2, leading to negative surplus) — a force towards NAM. To prevent
this, we assume a sufficient degree of positive association between y1 and y2 in γ to ensure that
more mass is concentrated around points A and D. Notice that the distributional barrier to
PAM arising from a negative association of (y1, y2) becomes more severe the larger is the positive
impact of y2 on the surplus (i.e. the larger is q2(x), which makes the zero-surplus lines flatter).
How likely is condition (UE-2D) in Theorem 2, guaranteeing a sufficient positive association
of job attributes, to hold? Sufficient conditions on the sampling distribution are that the density
γ be log-supermodular and its marginals log-concave. This class of distributions is quite broad.
For instance, any bivariate distribution of independent random variables that has log-concave
marginals (e.g. the uniform distribution with independent random variables) satisfies (UE-2D).
Other examples of log-supermodular densities with log-concave marginals are the (truncated)
bivariate Gaussian with positive covariance and the multivariate Gamma distribution.19
5.4 Taking Stock
Both theorems on the sign of sorting show that sorting under multi-dimensional job heterogeneity
is fundamentally different from a comparable model with one-dimensional heterogeneity. In such
a model, there is no sorting on the EE margin (Postel-Vinay and Robin, 2002): the strategy
of firms is to accept any worker that yields positive surplus while the strategy of workers is to
accept all jobs that yield a higher (flow) surplus than the current one. The assumption that
the flow surplus is increasing in y (the one-dimensional version of Assumption 1) implies that19The multivariate Gamma distribution is defined by a linear combination of independent random variables that
have standard gamma distribution. Log-supermodularity of the multivariate Gamma distribution is implied byKarlin and Rinott (1980), Prop. 3.8, and its log-concavity by Shapiro, Dentcheva, Ruszczynski (2009), Th. 4.26.
19
all workers rank firms in the same way and try to climb up a single economy-wide job ladder.
Hence, all workers tend to move into higher-y jobs over time, which rules out sorting. Moreover,
regarding the UE margin, there is positive sorting in our multi-dimensional setting under the
specified conditions. But there would again be no sorting in the 1D model since any match in
which the job productivity is too low (y < b) would not form, independent of the worker’s skill.
Why does sorting arise only in the multi-dimensional model? By contrast to the one-
dimensional case, what matters here is not only to match skilled workers to productive jobs
in any dimension. Instead it is important to match a given worker to a job that requires much
of the skill in which the worker is particularly strong. Thus, workers with different skill bundles
rank firms differently, have different job ladders, and therefore accept and reject different types
of jobs. This heterogeneity of job ladders induced by multiple skill dimensions is one of our
crucial insights and the reason why sorting arises in this multi-dimensional setting.
6 The Sign of Sorting: Generalizations
In this section, we discuss how our results on the sign of sorting generalize to other environ-
ments. We are able to relax both Assumptions 1 (bilinear technology) and 2 (two-dimensional
firm types). First, we extend our results to the case of a nonlinear technology with an unrestricted
number of job attributes Y ≥ 2, provided that the technology be monotone in at least one yj .
Second, we investigate a ‘separable’ technology (possibly non-linear and non-monotone), again
for general Y ≥ 2 dimensions of job attributes.20 In all these cases, we provide sufficient condi-
tions on primitives — technology and distribution of job attributes — that guarantee positive
sorting. Finally, all of our results on the sign of sorting also apply to other commonly used wage
bargaining protocols, beyond the sequential auction setting we have described in detail. These
generalizations are technically more involved than the results from the presented baseline model.
To keep the text focussed on the simplest case that conveys the full intuition, we state these
theorems along with a detailed discussion in Appendix A and only give a brief overview here.
Theorem 5 in Appendix A establishes sufficient conditions for sorting on the EE margin,
generalizing Theorem 1 to the case of non-linear monotone technologies for Y ≥ 2 (where
‘monotone’ holds for at least one dimension yj). Similar to our baseline model, a (generalized)
20The ‘separable’ class includes the popular specification of Tinbergen (1956) p(x,y) = c0 −∑X
i=1 ci(xi − yi)2,
where X = Y and where the ci’s are strictly positive numbers. In this example, each job has an “ideal” skillbundle given by y, and output is a decreasing function of the distance between the worker’s skill bundle x andthat ideal skill bundle.
20
single-crossing condition is at the center of this result. It implies stronger complementarities
between productivity-skill pairs (yj , xk) for all j < Y than between (yY , xk), pushing towards
positive sorting in all pairs (yj , xk) except (yY , xk). Contrary to our baseline result, however, this
theorem reveals that in more general environments the conditions for sorting are not distribution-
free. We need to place restrictions on the correlation of job attributes in the sampling distribution
γ. This is to prevent distributional obstructions to positive sorting between skill xk and all the
yj ’s that could stem from negative correlation among the different job attributes yj .
Theorem 5 is our most general result on EE sorting and is useful because it nests several
special cases of interest. Not only does it nest Theorem 1 above, but also the considerably simpler
conditions for sorting under nonlinear, monotone surplus functions with Y = 2 (Corollary 3) as
well as those under bilinear surplus functions with Y > 2 (Corollary 4).
These generalizations on EE sorting highlight that under richer forms of job heterogeneity
(Y > 2), the conditions for sorting do not only involve single-crossing of the technology but also
distribution γ. We can only circumvent this complication in the case of separable technologies
where there is no interaction between the different job attributes in production: In Theorem 7,
we dispense with the monotonicity assumption from Theorem 5 but impose that the skill under
consideration xk only complement a single yj and be irrelevant to other job dimensions (that is,
we impose that ∂2p(x,y)/∂xk∂yj = 0 and that ∂2p(x,y)/∂xk∂yj′ = 0 for j′ = j). We show that
there is PAM on the EE margin in the pair (yj , xk) that satisfies single-crossing in production.
Lastly, we focussed on surplus splitting via sequential auctions without worker bargaining
power for expositional clarity only. We show in Appendix B that our results on the sign of sorting
hold for other wage setting rules, as Nash bargaining (Mortensen and Pissarides, 1994; Moscarini,
2001), wage/contract posting (Burdett and Mortensen, 1998; Moscarini and Postel-Vinay, 2013),
or sequential auctions with worker bargaining power (Cahuc, Postel-Vinay, and Robin, 2006).
To summarize, we are able to provide sufficient conditions to characterize the sign of sorting
for (i) several broad classes of technology, (ii) all commonly used wage bargaining protocols and
(iii) an arbitrary number of job and worker attributes. However, departing from our baseline
case with bilinear technology and two-dimensional job heterogeneity complicates our analysis
considerably and requires restriction on the sampling distribution. Yet, across all of the environ-
ments that we consider, a single-crossing condition on the production technology that guarantees
sufficiently strong complementarities is the linchpin of positive sorting under random search.
21
7 Interrelation of Sorting Patterns Across Dimensions
We first approached the question of sorting “dimension by dimension”, focusing on the equilib-
rium relation between some skill xk and some job attribute yj (Section 5). Section 6 generalizes
those results to higher dimensions of job heterogeneity. More precisely, we give conditions under
which sorting is positive in dimensions (yj , xk) for a given skill xk and all j ∈ 1, · · · , Y − 1,
Y ≥ 2, keeping the other skills k′ = k fixed. This corresponds to signing all elements of any
column k in the Jacobian matrix of the conditional matching distribution,
∂H(y|x)∂x⊤ =
∂H1(y|x)
∂x1· · · ∂H1(y|x)
∂xX
.... . .
...∂HY (y|x)
∂x1· · · ∂HY (y|x)
∂xX
(5)
except for the last element of a column, ∂HY (y|x)/∂xk.
In this section we further explore the interdependence of sorting patterns across dimensions of
worker skills and job attributes, thereby completing the analysis of sorting. We first investigate
the effect of a simultaneous expansion of all skills, which corresponds to signing the elements
of any row j of (5), i.e. the gradient of Hj(y|x). This enables us to compare the matching
distributions of any two workers in the economy and illustrates how the degree of worker spe-
cialization matters for sorting. Second, we address sorting in the “remaining dimension” (yY , xk)
(that was not addressed in Section 6) by investigating the sign of ∂HY (y|x)/∂xk. The results
in this section are established under our baseline Assumption 1, i.e. for a bilinear technology.21
7.1 Sorting Patterns Across Dimensions of Skills
Our first result is based on the simultaneous expansion of all skills and shows that there is no
sorting on “absolute advantage”: if two workers x and x′ are such that one is twice as productive
as the other in all jobs, x′+a = 2(x+a), then both workers are matched to the same distribution
of jobs in equilibrium, irrespective of the complementarities in production.
Theorem 3. Under Assumption 1, ∀j ∈ 1, · · · , Y :
(x+ a)⊤∇Hj (y|x) = 0
i.e. the function (x+ a) 7→ Hj (y|x) is homogeneous of degree 0 in (x+ a) for all j.21We show in the Appendix that those results are straightforward to generalize to the case of a homogeneous
technology, i.e. a technology such that σ (−a+ t(x+ a),y) = tα · σ(x,y) for any positive scaling factor t.
22
Theorem 3 addresses the case of a simultaneous expansion of all skills such that the sum
x + a is scaled up, i.e. it considers an expansion in the direction of x + a. Since a is only a
fixed productivity parameter that does not differ across workers (see Assumption 1), there is no
obvious reason why workers’ skills should expand along this particular direction. We therefore
focus on a generic skill expansion in x instead. To illustrate its effect on the worker’s assignment,
we focus on the case X = Y = 2, so that x = (x1, x2), y = (y1, y2), and a = (a1, a2), and further
impose assumptions on technology and distributions that guarantee PAM in dimension (y1, x1),
i.e. ∂H1(y|x)/∂x1 ≤ 0 (and NAM in dimension (y1, x2)).22 In that context, we consider a
marginal expansion of a worker’s skills from (x1, x2) to (x1 +∆x1, x2 +∆x2). Using Theorem
3, the resulting change in H1(y|x) is:
∆H1(y|x) ≃ (x1 + a1)∂H1(y|x)
∂x1
[∆x1
x1 + a1− ∆x2
x2 + a2
](6)
Equation (6) has three implications. First, it confirms our earlier finding that ∆H1(y|x) = 0 if
∆x1x1+a1
= ∆x2x2+a2
(i.e. no change in sorting if the worker improves his skills in the direction of x+a).
Second, (6) shows more generally that a marginal (but not necessarily proportional) im-
provement in both skills will cause the worker to match with jobs with stochastically higher y1
attributes if and only if ∆x1x1+a1
> ∆x2x2+a2
. This condition says that his total productive input,
consisting of his individual skills x plus the baseline productivity a, is increased proportionately
more in dimension 1 than in dimension 2. As a result, the simultaneous improvement in both
skills will cause the worker to sort into jobs with stochastically higher y1 (but with lower y2).
A third implication of (6) is that under a proportional increase in both skills (∆x1x1
= ∆x2x2
),
∆H1(y|x) < 0 if and only if x1/x2 > a1/a2. In other words, scaling up all skills leads to a
stochastically better distribution of job matches in the dimension where the worker is specialized
relative to the baseline productivity vector a. By contrast, scaling up all skills leads to a
deterioration of the distribution of job matches in the second dimension, i.e. ∆H2(y|x) > 0.
Scaling up all of a worker’s skills simultaneously thus has a non-uniform effect on the worker’s
distribution of jobs across dimensions, which depends on his specialization. Our interpretation
is that this multi-dimensional model does not feature any hierarchical sorting based on absolute22Assumptions that ensure this are: Q is a positive matrix with detQ > 0, γ is a bivariate normal distribution
truncated over Y with positive covariance, and y2− b2 ≥ 0 > y
1− b1. Condition (SC-2D) from Theorem 1 writes
as (x2 + a2) detQ > 0, which holds here by assumption. Hence, the example has PAM on the EE margin. Next,because x + a is a positive vector and Q a positive matrix, p(x,y) is increasing in both y1 and y2, satisfyingCondition 1 of Theorem 2. The truncated normal with positive covariance satisfies Condition (UE-2D), as shownin Section 5, and Condition 3 from Theorem 2 is satisfied by assumption. PAM thus also occurs on the UE margin.
23
advantage but instead features sorting based on specialization.23
The content of these results is quite different when skills are one-dimensional. Consider
Theorem 3 for the one-dimensional case X = 1 (i.e. x and a are scalars x and a, Q is a
1 × Y row vector so that y = Q(y − b) is a scalar, and the flow surplus function σ(x,y) =
(x+ a)y). In this case, Theorem 3 echoes a known result: there cannot be sorting, in the sense
that ∂Hj(y|x)/∂x = 0 between x and any yj , j ∈ 1, · · · , Y . In other words, x and yj are
independent in the population of job-worker matches (Postel-Vinay and Robin, 2002).
We end this section with a result that follows directly from Theorem 3, addressing the
interrelation of sorting patterns between a given job attribute and different skills. That is, we
focus on signing any row j of (5):
Corollary 1. Under Assumption 1, if PAM occurs between yj and xk for all skill dimensions
but one, i.e. for k ∈ 1, · · · , X\K, then NAM occurs between yj and xK .
The underlying cause of Corollary 1 becomes most transparent in the special case X = Y = 2
with the UE margin shut down. Suppose for example that the single-crossing condition (SC-2D),
∂[q1(x)q2(x)
]/∂x1 > 0, holds, which is equivalent to detQ > 0, implying PAM in dimension (y1, x1).
However, detQ > 0 also implies ∂[q1(x)q2(x)
]/∂x2 < 0, leading to NAM in dimension (x2, y1).
By the same argument, detQ > 0 implies PAM in dimension (x2, y2) but NAM in dimension
(x1, y2). This illustrates that PAM cannot arise simultaneously across those two dimensions
because the (necessary and sufficient) single-crossing conditions cannot hold simultaneously for
x1 and x2. Put differently, detQ > 0 says that complementarities in (x1, y1) and (x2, y2)
“dominate” complementarities in (x2, y1) and (x1, y2), which is why PAM occurs within task
dimensions 1 and 2 but NAM occurs between those dimensions. Again in this case of Y = 2,
this occurs for purely technological reasons, independent of the sampling distribution.24
This section contains two main insights, both of which are empirically testable. First, our
model predicts that multi-dimensional sorting under random search occurs based on special-
ization instead of absolute advantage: uniformly better workers do not match with uniformly
better jobs but instead with jobs that are suited to their mix of skills. Similar to Section 5,
the underlying reason is that workers with different skill bundles rank jobs differently and climb23It is important to note that this result is not due to our assumption of no capacity constraint on the firm
side but appears in models of multi-dimensional sorting more generally. The absence of sorting based on absoluteadvantage can also be obtained in a multi-dimensional model with capacity constraints (e.g. Lindenlaub, 2017).
24Things are much more involved in higher dimensions Y ≥ 3, where the intuition underlying Corollary 1 isless clear. Yet, the result still holds: under Assumption 1, sorting cannot be simultaneously positive between agiven yj and all xk, k ∈ 1, · · · , X. More details are available upon request.
24
different job ladders. Here, we allow all of a worker’s skills to vary simultaneously, which illus-
trates more clearly how his entire skill bundle matters for sorting. Second, Corollary 1 shows
that sorting cannot be simultaneously positive between a given job dimension and all skill di-
mensions. Instead, there are sorting trade-offs and agents need to decide which skill dimension
to ‘sacrifice’. PAM arises between a job attribute and all skill dimensions except the one that is
characterized by the weakest complementarities, where NAM materializes.
7.2 Sorting Patterns Across Dimensions of Job Attributes
We now return to the interrelation of sorting patterns across job attributes for a given skill
xk, i.e. in any column of (5). The following theorem implies that the way the conditional
distributions of job attributes Hj(y|x) (and thus the conditional expectations) co-vary with skill
xk follows a pattern, which is driven by the technology Q:
Theorem 4. Under Assumption 1, if σ(x,y) > 0 for all y ∈ Y and x ∈ X (no UE sorting), then:
(x+ a)⊤Q∂E (y|x)∂x⊤ = 01×X .
For illustration, we consider again the two-dimensional case (X = Y = 2). Figure 3 shows the
locus of the pair (E(y1|x),E(y2|x)) when one of the components of x (say x1) varies. Theorem
4 implies that the normal vector to this locus is Q⊤(x + a) at all points, so the slope of the
E(y|x)-locus is entirely determined by the technology. Moreover, to fix ideas, Figure 3 was
drawn under the assumption that output is increasing in all components of y (all components of
Q⊤(x+a) are positive), so that the slope of the E(y|x)-locus is negative. We consider again our
two workers with skill bundles x′ = (x′1, x′2) and x′′ = (x′′1, x
′′2) such that x′′1 > x′1 and x′′2 = x′2.
If there is PAM within (y1, x1), then Theorem 1 implies ∂E(y1|x)/∂x1 > 0, and the higher-x′′1
worker will be matched on average to a job with higher y1 than the lower-x′1 worker, as shown
on Figure 3. However, the average job of the more skilled worker will have lower y2 than the
average job of the lower skilled worker, i.e. ∂E(y2|x)/∂x1 < 0.
More generally, we can use Theorem 4 to obtain some insights about sorting in dimension
(yY , xk) — the remaining dimension in any column of (5) in which the sign of sorting is not
determined by Corollary 4 (Section 6 and Appendix A).
Corollary 2. Under Assumption 1, assuming qj(x) ≥ 0 for all j ∈ 1, · · · , Y and σ(x,y) > 0
for all y ∈ Y and x ∈ X , then if PAM occurs in dimensions (yj , xk), j ∈ 1, · · · , Y − 1, there
cannot be PAM in dimension (yY , xk).
25
E(y2|x)
E(y1|x)
Q⊤(x′ + a)
E(y|x = x′)Low x′1
Q⊤(x′′ + a)
E(y|x = x′′)
High x′′1
Figure 3: Sorting Across Job Dimensions
Corollary 2 implies that if PAM occurs in dimensions (yj , xk), j ∈ 1, · · · , Y − 1, and if
there is sorting in dimension (yY , xk) — i.e. if ∂HY (y|x)/∂xk has a constant sign across the
entire support of yY — then sorting in this last dimension can only be negative.25 Note that
the requirement that production be increasing in all job attributes yj is essential for Corollary
2: if instead qj(x) < 0 for some j, then any sorting pattern (PAM, NAM, or neither) can arise
in the residual dimension (yY , xk) (example available on request).
Corollary 2 echoes Corollary 1 but focusses on the job attribute dimensions (instead of the
skill dimensions): sorting cannot be simultaneously positive between a given skill and all job
attributes. Once again there are sorting trade-offs. Agents need to decide which dimension of
job attributes to ‘sacrifice’, and base that decision on the relative strength of complementarities
in the technology. We provide conditions under which PAM arises between a skill and all job
attributes except for the pair that is characterized by the weakest complementarities, where
forces push towards NAM.
8 Numerical Application
In this final section, we explore the usefulness of our theoretical framework for applied work.
We have two main objectives. First, based on our theory, we develop an empirical test of the
assumption that job and worker heterogeneity can be captured by scalar indexes, i.e. that job25A stronger result than Corollary 2 holds in the two-dimensional case Y = 2. In this case, it follows directly
from Theorem 1 that, under Assumption 1 and with σ(x,y) > 0 for all y ∈ Y, if PAM arises in dimension(y1, xk), then there must be NAM in dimension (y2, xk), and vice versa. This is the case depicted on Figure 3.
26
and worker heterogeneity are both one-dimensional. Second, we highlight some of the potential
errors in the analysis of sorting and mismatch caused by imposing the 1D assumption in cases
when it is not justified (i.e. when our test indicates that this restriction is violated).
For the purpose of this exercise, we proceed in a controlled environment so that we know
the underlying data generating process (DGP).26 We simulate an economy based on our multi-
dimensional model, which we consider to be the true DGP. We then fit a deliberately misspecified
1D model to these data, based upon which we implement our test and analyze potential errors
regarding sorting and mismatch. We begin with a discussion of identification and estimation of
the 1D model before advancing to our two main exercises.
8.1 Estimation of a Misspecified 1D Model
8.1.1 The Model
The 1D model that we consider in this section is identical to our multi-dimensional model except
for two differences: we restrict heterogeneity to be 1D, X = Y = 1, and we do not impose any
functional form restriction on the flow surplus function σ. Importantly, we do not assume
monotonicity of σ in x and y, as this would limit the ability of the 1D model to produce sorting.
We only make the following identifying assumptions:
Assumption 3. The flow surplus function σ has the following properties:
1. x → maxy∈Y σ(x, y) is well-defined and strictly increasing in x.
2. y → maxx∈X σ(x, y) is well-defined and strictly increasing in y.
Under those assumptions, the 1D model can in principle generate sorting since we do not
assume monotonicity of σ in y. As we show below (and as has been established in the literature
for more sophisticated versions of the same model), this model is non-parametrically identified up
to strictly increasing transforms of x and y. We therefore normalize worker and firm attributes
by specifying the model in terms of their ranks:
Assumption 4.
1. x is uniformly distributed over X = [0, 1] in the population of workers.
2. y is uniformly distributed over Y = [0, 1] in the population of firms.26The natural first step before going to real data is to show on simulated data that our new test works.
27
Below we will estimate the (misspecified) 1D model following the literature on identifying
sorting on unobserved (to economists) scalar attributes of workers and firms.
8.1.2 Identification and Estimation
The simulated sample on which we estimate the misspecified 1D model is a panel of N workers,
indexed i ∈ 1, · · · , N, sorting themselves into M firms, j ∈ 1, · · · ,M. Time is discretized
for the purposes of simulation, and workers are followed over T periods, t ∈ 1, · · · , T. A
typical observation is described as a vector Jit, σi,Jit,t, where Jit ∈ 1, · · · ,M is the identity
of the worker’s employer at date t (we further normalize Jit = 0 if worker i is unemployed at
date t so that Jit also indicates the worker’s employment status), and σi,Jit,t is the flow surplus
achieved in the match between worker i and firm Jit (missing when Jit = 0).27
The steps below establish identification of the 1D model based on this sample. The identifi-
cation proof is constructive, and therefore also provides a practical estimation protocol.
Worker types. The maximum attainable surplus for a type-x worker is maxy∈Y σ(x, y) in this
model which, by Assumption 3, is a strictly increasing function of x. Any worker i’s type xi can
thus be estimated as:
xi = QW (max σi,Jit,t : t = 1, · · · , T)
where QW is the quantile function in the population of workers.
Firm types and surplus. The flow surplus σi,Jit,t generated by a match between worker i and
firm Jit equals σ (xi, yJit). Flow surplus is thus observed for any viable match involving any firm
j in the sample, i.e. for matches between firm j and any type-x worker such that σ(x, yj) ≥ 0:
σ(x, yj) = mean σi,Jit,t : xi = x, Jit = j, t = 1, · · · , T .27The assumption that the flow surplus is observed in the data is obviously a shortcut. What is typically
observed in practice are wages, not match output or surplus. However, in the context of the family of searchmodels considered in this paper, it has been shown elsewhere in the literature that the flow output from a matchbetween a worker i and a firm j is identified from the maximum wage earned in firm j by workers with the sametype as i, while unemployment income is identified from the wages earned by workers hired from unemployment— see for example Lamadon et al. (2015) for details. Because these particular identification issues are peripheralto the question addressed in this section (namely the distinction between one- vs multidimensional heterogeneity),we bypass them by assuming that flow surplus is directly observed.
28
Knowledge of σ(x, yj) then allows estimation of any firm j’s type, yj (by Assumption 3):
yj = QF
(max
σ(x, yj) : x ∈ [0, 1]
)
where QF is the quantile function in the population of firms.
At this stage we have estimates (xi, yj) of the types of every worker and firm in the sample,
as well as estimates σ(xi, yj) of the surplus of every viable match. Together those allow for
non-parametric estimation of σ over the set of viable type-(x, y) matches. They also allow for
the construction of the equilibrium distribution of employer types y conditional on employed
workers of any given type x, which in turn permits the analysis of equilibrium sorting.
Sampling distribution and offer arrival rates. A type-x worker exiting unemployment
draws his employer type from the conditional density γ(y)/γ y′ : σ(x, y′) ≥ 0. Moreover, the
job finding rate of any unemployed worker of type x is λ0γ y′ : σ(x, y′) ≥ 0. Combining those
two properties, one obtains the following estimator of λ0γ(yj) for any employer j in the sample:
λ0γ(yj) = Pr Jit = j | xi = x, ei,t−1 = 0, eit = 1 × JF (xi)
where JF (xi) is the empirical unemployment exit rate of type-xi workers. Note that the above
estimator is valid for any worker type in the sample. This, together with the constraint that γ
must integrate to 1, affords estimates of the sampling distribution γ and the unemployed offer
arrival rate λ0.
Finally, the probability of an employer change occurring for a type-x in a type-y firm is
λ1γ y′ : σ(x, y′) > σ(x, y) for all x ∈ [0, 1]. The term γ y′ : σ(x, y′) > σ(x, y) is known
from previous steps, allowing us to construct the following estimator of λ1:
λ1 = mean
Pr Jit = Ji,t−1 | xi = x, Ji,t−1 = j, ei,t−1 = eit = 1
γy′ : σ(xi, y′) > σ(xi, yj)
.
8.1.3 Parameterization and Basic Estimation Results
All of the examples shown below are based on simulations of 100,000 workers with two dimensions
of skills and (monthly) parameter values of λ0 = 0.3, λ1 = 0.1, and δ = 0.025.28 The population
distribution of skills, L, is Gaussian with mean (0, 0) and correlation corrℓ(x1, x2) = −0.5,28All samples are simulated for 480 months (40 years), after a 100-year burn-in period to reach steady-state.
29
Specification: 0 (1D model) 1 2 3
Q
(1 00 0
) (1 11 1
) (4.5 21.5 4
) (4.5 −2−1.5 4
)
a (1, 1)⊤
b (0, 0)⊤
Table 1: Examples of Technologies
truncated over [0, 1]2. The sampling distribution of job attributes, Γ, is also Gaussian with
mean (1, 1) and correlation corrΓ(y1, y2) = 0.33, truncated over [1, 2]2.
We consider four different examples of the bilinear technology (parameters Q, a and b),
shown in Table 1. All parameterizations have a = (1, 1)⊤ and b = (0, 0)⊤, implying that
unemployed workers accept all job offers and there is no sorting on the UE margin.
In Example 0, the DGP is truly one-dimensional: the technology described by the Q matrix
in this example implies that p(x,y) = x1 · (y1 + 1), meaning that only dimension 1 of worker
and firm heterogeneity matters for production. We will use this example to check that our
estimation procedure returns the true parameter values when the model is correctly specified
and to illustrate the validity of our proposed tests of the single-index assumption.
Examples 1, 2, and 3 are all two-dimensional models. In those examples the three different
Q-matrices are designed to feature different patterns of complementarities that, according to
Theorem 1, produce certain sorting patterns. Specifically, Example 1 is the case of no sort-
ing, within or between dimensions. Both Example 2 and 3 have positive sorting within tasks,
and negative between tasks but differ in the complementarity structure. While the technology
in Example 2 features worker-job complementarities both within and between dimensions, in
Example 3, there are complementarities within but substitutabilities between dimensions.
We next fit a model with 1D heterogeneity to the simulated data generated from our four
examples, where we follow the identification protocol explained above. In the cases of Examples
1, 2 and 3, the 1D fitted model is misspecified, as the true DGP is 2D. Figures 4 and 5 show
contour plots of the estimated scalar attributes x and y as functions of the underlying true
(vector) attributes x and y, respectively.
Panel (a) on each figure relates to Example 0, the 1D model. The “iso-x”-curves (the contour
lines on the graph) on Figure 4(a) are vertical lines in the (x1, x2)-plane: as expected, the
estimated x coincides with the true x1 in this correctly specified 1D model. Figure 5(a) shows
30
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(a) Example 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(b) Example 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(c) Example 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(d) Example 3
Figure 4: True and Estimated Worker Types
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(a) Example 0
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(b) Example 1
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(c) Example 2
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(d) Example 3
Figure 5: True and Estimated Firm Types
that the same applies to firm attributes, albeit with slightly more estimation noise.
Panels (b) relate to Example 1, the case with no sorting in the ‘real’ 2D world. The contour
lines on Figure 4(b) roughly coincide with lines of slope −1 in the (x1, x2)-plane, as was intuitively
31
(a) Example 0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
6
7
8
9
10
11
12
13
14
15
(b) Example 1
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
20
25
30
35
40
45
(c) Example 2
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
8
10
12
14
16
18
(d) Example 3
Figure 6: Estimated Surplus Function
expected given the underlying symmetric technology of Example 1. Figure 5(b) suggests the
same pattern for firm attributes, but again with slightly more noise. In turn, panels (c) and (d)
on Figures 4 and 5 plot these results for Examples 2 and 3 from Table 1 that feature varying
complementarities across dimensions. It is no longer true in those examples that the “iso-type”
curves have slope −1 in the (x1, x2) and (y1, y2)-planes. For instance, in Example 2, the “iso-x”
curves are steeper than in Example 1. This means that in the 1D estimation more weight is
put on skill x1 than skill x2, possibly owing to the relatively stronger complementarities within
dimension 1 compared to dimension 2 (see the true technology Q in Example 2). A similar
pattern arises in Example 3, here more pronounced for the estimates of firm types, Figure 5(d).
Figure 6 analyzes the estimated 1D surplus. Figure 6(a), which again pertains to Example
0, plots the estimated surplus σ as a function of the true surplus σ, as a check of the precision of
our estimation procedure. The figure shows that, despite some estimation noise, the estimated σ
is reasonably close to the true σ in this correctly specified model. Finally, Figures 6(b), (c) and
(d) plot the estimated flow surplus function σ for Examples 1, 2 and 3 as functions of estimated
scalar types (x, y). They show that the estimated surplus function increases in x and y (up to
some estimation noise) in all three examples, which implies that no sorting should occur in any
of these examples, were the 1D model correctly specified.
32
We now move on to our two main exercises: first, we design and apply a test of the single-
index assumption to our simulated data to check the validity of the 1D-approximation. Second,
we ask about the consequences of imposing the 1D-assumption in cases that fail our test.
8.2 A Test of the Single Index Restriction
With few exceptions, the random search literature on sorting and mismatch has operated under
the assumption that firm and worker heterogeneity can both be captured by scalar indexes —
what we refer to as the single index restriction. We now use our theory to develop a test of that
restriction. We then show on our simulated data that it correctly reveals misspecification.
8.2.1 Principles
The test is based on a simple observation that holds in the 1D (as well as in the multi-D) version
of our model: Two workers with equal skill types employed in jobs with the same attributes
make identical job mobility decisions, i.e. have equal job transition probabilities and equal job
acceptance sets. Therefore, if the 1D model is correctly specified, then any two workers with the
same estimated 1D skill index x that are currently employed in two jobs with the same estimated
1D job attribute y should have the same job switching probabilities and job acceptance sets.
They should climb the same job ladder. However, if the true data is not 1D, this prediction is
generally not true, revealing misspecification.
We further illustrate the principle of our test on Figure 7, which is based the parameterization
in Example 1. Figure 7(a) shows two workers with true skill bundles (x1, 0) and (0, x2), but
whose estimated 1D skill indexes x are the same (as the two true skill bundles are on the same
“iso-x” line). Figure 7(b) further depicts a situation where those two workers are employed in
jobs with different true attributes y, but equal estimated 1D attributes y on the same ‘iso-y” line.
Thus, from the perspective of the 1D model, those two workers look exactly the same, even
though in reality they have different skill bundles and work in different jobs. If the 1D model
were correctly specified, those two workers should have equal job-switching probabilities and job
acceptance sets. Yet in reality, they differ on both counts: as the figure shows, worker 1, whose
true skill bundle is (x1, 0), will accept all jobs with a higher y1 than his current one, while worker
2, whose true skill bundle is (0, x2), will only take jobs with higher y2. This shows that the two
workers will have different job acceptance sets, and therefore generically different job-switching
33
probabilities.29 The reason is that the underlying 2D-type workers climb different job ladders.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(a) Two workers with equal x
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
(b) Acceptance sets
Figure 7: Illustration of the Single Index Test
Because acceptance sets are difficult to work with in practice, we implement our test based
on job-switching probabilities. In the 2D model, those equal λ1F σ|x (σ(x,y)) for a type-x worker
employed in a type-y job. In the 1D world, the job-switching probability is λ1F σ|x (σ(x, y)).
Either way, the probability of a worker switching jobs only depends on the worker’s type and
the surplus of the worker’s current match. Based on those observations, we implement our
single-index test in two ways.
8.2.2 Implementation
Observed Multi-D Attributes. If true 2D worker and job attributes, x and y, are ob-
servable, we implement our test by regressing a job-switching indicator on σ(x, y), (a flexible
polynomial in) x, and x1, x2, y1, y2 (or a subset of those latter four variables), in a cross-section
sample of workers. If the 1D model were correctly specified, the coefficients on all variables other
than σ(x, y) and x should equal zero, while the coefficient on σ(x, y) should be negative and
statistically significant. Conversely, finding that any of the true two-dimensional attributes x1,
x2, y1 or y2 is a significant determinant of the job-switching probability conditional on σ(x, y)
and x, would be inconsistent with the 1D restriction.30
Table 2 shows results from our test, run on each of our four simulated examples. In each case,29It also shows that both workers will accept some offers with lower estimated y than they currently have,
and reject some higher y’s. This is also at odds with this simple version of the 1D model. We do not see thisas a strong rejection of the single index restriction, though, as even the 1D model could easily be extended toaccommodate reallocation down the job ladder by allowing for exogenous reallocation shocks. We briefly discussthis issue below.
30An alternative specification of the same test would consist of regressing the job-switching indicator on x, yand (x1, x2, y1, y2) (or a subset of those latter four variables). The results lead to the same conclusions.
34
we first regress a job-switching indicator on σ(x, y) and x. Second, we regress the job-switching
indicator on σ(x, y), x and multi-D attributes x and y.31 The test results are as expected. The
one-dimensional Example 0 passes the test, while all other three examples — whose DGP is two-
dimensional — fail it (at least one, and usually all of the “additional regressors” are estimated
to be significant determinants of job switching).
Unobserved Multi-D Attributes. Implementing the test described in the previous para-
graph requires that one observe at least some of the multi-D worker and job attributes to include
them in the set of “additional regressors”. This may not always be the case. We therefore present
an alternative test in Table 3, which only relies on estimated 1D attributes, x and y.
This alternative test consists of regressing a job-switching indicator on σ(x, y), (a flexible
polynomial in) x, and moments (in this case, the mean and standard deviation) of the individual-
level distributions of past estimated 1D job types. If the 1D model is correctly specified, then
the history of estimated job types should not add any relevant information on the occurrence of
an employer change conditional on σ(x, y) and x. Conversely, if the DGP is really 2D, then the
history of estimated job types y from the misspecified 1D model will convey information about
the worker’s true skill type x and true current job type y, over and above what the current job’s
estimated y conveys. The worker’s job history would thus impact his job-switching probability.
Table 3 shows that, as expected, the one-dimensional Example 0 passes the test. But all
other three examples fail it, as at least one, and usually both of the “additional regressors” are
estimated to be significant determinants of job-switching. This suggests that this alternative
test also correctly detects model misspecification.
As a consistency check, we also test the validity of the 2D assumption in our simulated data.
We apply the second version of our test (i.e. regressing a job-switching indicator on the true flow
surplus σ(x,y), the true two-dimensional skill type x and the mean and standard deviations
of past true job types y). If the 2D assumption is valid, past job types should not provide
any information on job switching conditional on current flow surplus σ(x,y) and worker type x.
Table 4 contains the results, showing that the 2D assumption is not violated in our 2D examples.31In Example 0 (the truly 1D model), x1 and y1 are excluded from the list of additional regressors because
they are the true 1D worker and job attributes, and are therefore equal (up to estimation error) to x and y.
35
Specification: 0 1
σ −0.058[−0.061,−0.056]
−0.058[−0.061,−0.056]
−0.019[−0.020,−0.018]
+0.002[−0.003,−0.006]
x +0.104[0.099,0.109]
+0.104[0.099,0.109]
+0.084[0.079,0.089]
+0.021[0.007,0.034]
x1 — — — −0.019[−0.036,−0.002]
x2 — −0.001[−0.005,0.002]
— −0.022[−0.039,−0.004]
y1 — — — −0.064[−0.079,−0.049]
y2 — +0.001[−0.002,0.004]
— −0.066[−0.081,−0.051]
Specification: 2 3
σ −0.007[−0.007,−0.007]
+0.001[−0.001,−0.002]
−0.016[−0.016,−0.015]
−0.006[−0.010,−0.003]
x +0.090[0.085,0.096]
+0.017[−0.000,0.034]
+0.089[0.083,0.094]
+0.030[0.014,0.046]
x1 — −0.019[−0.033,−0.006]
— +0.010[−0.005,0.024]
x2 — −0.018[−0.030,−0.007]
— +0.003[−0.011,0.016]
y1 — −0.069[−0.082,−0.055]
— −0.047[−0.062,−0.032]
y2 — −0.069[−0.082,−0.055]
— −0.024[−0.034,−0.014]
Note: Dependent variable is a job-switching indicator. All regressions include a constant.95% confidence intervals are reported in small type under coefficients.
Table 2: Test of the 1D Restriction
Specification: 0 1
σ −0.058[−0.061,−0.056]
−0.058[−0.061,−0.056]
−0.019[−0.020,−0.018]
−0.019[−0.020,−0.019]
x +0.104[0.099,0.109]
+0.104[0.099,0.109]
+0.084[0.079,0.089]
+0.084[0.079,0.089]
Mean of past y — +0.005[−0.006,0.016]
— +0.017[0.006,0.027]
SD of past y — +0.015[−0.003,0.033]
— +0.033[0.015,0.052]
Specification: 2 3
σ −0.007[−0.007,−0.007]
−0.007[−0.007,−0.007]
−0.016[−0.016,−0.015]
−0.017[−0.017,−0.015]
x +0.090[0.085,0.096]
+0.091[0.086,0.096]
+0.089[0.083,0.094]
+0.089[0.084,0.094]
Mean of past y — +0.021[0.010,0.017]
— +0.017[0.006,0.027]
SD of past y — +0.036[0.017,0.055]
— +0.029[0.010,0.049]
Note: Dependent variable is a job-switching indicator. All regressions include a constant.95% confidence intervals are reported in small type under coefficients.
Table 3: Test of the 1D Restriction
36
Specification: 1 2 3
σ −0.020[−0.021,−0.019]
−0.020[−0.021,−0.019]
−0.007[−0.007,−0.007]
−0.007[−0.007,−0.007]
−0.016[−0.017,−0.015]
−0.016[−0.017,−0.016]
x1 +0.069[0.064,0.073]
+0.070[0.065,0.074]
+0.078[0.073,0.083]
+0.079[0.074,0.084]
+0.075[0.070,0.079]
+0.076[0.071,0.081]
x2 +0.066[0.062,0.071]
+0.067[0.063,0.072]
+0.064[0.059,0.068]
+0.064[0.060,0.069]
+0.065[0.061,0.067]
+0.065[0.061,0.070]
Mean of past y1 — +0.012[0.001,0.023]
— +0.008[−0.004,0.019]
+0.008[−0.003,0.019]
Mean of past y2 — +0.009[−0.002,0.020]
— +0.012[0.000,0.023]
+0.008[−0.002,0.019]
SD of past y1 — +0.004[−0.022,0.030]
— −0.001[−0.028,0.027]
−0.006[−0.031,0.012]
SD of past y2 — +0.009[−0.017,0.035]
— +0.012[−0.016,0.040]
+0.018[−0.009,0.044]
Note: Dependent variable is a job-switching indicator. All regressions include a constant.95% confidence intervals are reported in small type under coefficients.
Table 4: Test of the 2D Restriction
8.2.3 Additional Remarks
It is important to note that our tests rely on within-worker predictions of our model and, as
such, do not require the 1D production function to be monotone — i.e., they do not require
identical 1D job ladders across worker types.
Also, as mentioned above (footnote 29), the analysis in this section is relevant to a common
practice in the literature, which is to include exogenous job-to-job reallocation shocks (sometimes
referred to as ‘Godfather shocks’) to match the observed share of job-to-job transitions down
the job ladder. According to our theory, what appear to be moves down the 1D job ladder may
in fact be moves up the 2D job ladder. We leave it for future research to study the share of
downward moves explained by misspecification of worker and job heterogeneity.
Finally, yet another alternative version of our test consists of proxying the job acceptance
set by the expected accepted job type after a job switch (instead of the job-switching indicator
used above). The results are qualitatively similar as those above and available on request.
8.3 Consequences of the Single Index Restriction
We now turn to the consequences for sorting and mismatch of imposing the single index assump-
tion when the test indicates that it is violated.
8.3.1 Sorting
We first examine the sorting patterns in the true data. They are in line with the model’s
predictions and reported in the middle column of Table 5, which shows regressions of each yj ,
37
j = 1, 2, on both x1 and x2 in the cross-section of employed workers, for each of our examples.32
Example 0 (1D model) does not feature any sorting due to the monotone technology and the two-
dimensional regressions correctly reflect that, as worker and job attributes are not correlated.
Example 1 is a 2D model where the single-crossing condition is not satisfied and thus there is no
sorting, as shown in the second panel of the table. The technology in Examples 2 and 3 satisfies
the conditions for PAM, and this is picked up in the regressions in the third and fourth panel,
which show a positive within but negative between-correlation of worker and job attributes.
We then examine the predictions of the 1D model in terms of sorting between the estimated
(scalar) job and worker attributes, denoted by y and x. To this end, we simply regress y on x
in a cross section of job-worker matches, as we did in each dimension of heterogeneity for the
correctly specified 2D models. The results are in the third column of Table 5 and suggest that
relying on the misspecified 1D model for inference on sorting patterns is misleading (except for
Example 0 which is 1D and thus not misspecified; it correctly predicts ‘no sorting’). In Example
1, in which there is truly no sorting in any pair (xk, yj) in the data, the 1D model predicts
positive sorting: the coefficient in the regression of y on x is positive and statistically significant.
Similar results hold for Example 2: the 1D model predicts a positive correlation between y and
x, whereas the true sorting patterns are positive within skill dimensions and negative between.
In Example 3, however, where the true 2D model features in fact the strongest positive sorting
within heterogeneity dimensions (as measured by the regression coefficients in the fourth panel,
second column of Table 5), the 1D model fails to predict any clear sorting pattern: the regression
coefficient of y on x is very small and not statistically significant.
These results suggest that the 1D sorting estimates are largely uninformative about the true
underlying sorting patterns. Worse, relying on a 1D model to assess sorting when the true data
is 2D can be misleading: remember that, because the 1D surplus function was estimated to
be increasing in y (Figure 6), no sorting should occur in equilibrium between y and x if agents
actually behaved as per the 1D model. The fact that y and x are found to be positively correlated
in Examples 1 and 2 should therefore alert the researcher to some form of model misspecification.
In Example 3, however, things are even worse: The 1D estimates look perfectly consistent with
the estimated surplus function (y and x are uncorrelated), even though they suggest very different
sorting patterns than the true model. This highlights the importance of testing the validity of32Note that this is not a test of FOSD of the matching distributions in worker skills but FOSD implies the
monotonicity of the conditional means that we report here.
38
Specification: True 2D model Misspecified 1D model
0E(y1|x) = 1.751 +0.004
[−0.002,0.010]x1 −0.000
[−0.006,0.006]x2
E(y2|x) = 1.512 +0.003[−0.003,0.010]
x1 −0.003[−0.010,0.003]
x2E(y|x) = 0.736 −0.001
[−0.007,0.005]x
1E(y1|x) = 1.688 −0.005
[−0.011,0.002]x1 +0.000
[−0.006,0.006]x2
E(y2|x) = 1.686 +0.000[−0.006,0.006]
x1 −0.001[−0.007,0.005]
x2E(y|x) = 0.670 +0.010
[0.004,0.016]x
2E(y1|x) = 1.687 +0.024
[0.018,0.030]x1 −0.027
[−0.033,−0.020]x2
E(y2|x) = 1.686 −0.028[−0.035,−0.022]
x1 +0.026[0.020,0.032]
x2E(y|x) = 0.671 +0.009
[0.003,0.015]x
3E(y1|x) = 1.709 +0.085
[0.079,0.092]x1 −0.094
[−0.100,−0.088]x2
E(y2|x) = 1.650 −0.130[−0.137,−0.124]
x1 +0.127[0.120,0.133]
x2E(y|x) = 0.667 +0.003
[−0.003,0.009]x
Note: 95% confidence intervals are reported in small type under coefficients.
Table 5: Sorting Patterns
the single index assumption separately, as we have done above.
8.3.2 Mismatch
Labor market sorting matters for at least two reasons. First, allocative efficiency requires certain
sorting patterns governed by the technology. Second, positive sorting is a driver of inequality.
In what follows, we will focus on efficiency. Our objective is to asses the efficiency loss from
mismatch and to gauge the magnitude of the welfare loss/gain from implementing the ‘optimal’
allocation suggested by the misspecified 1D model in a world where heterogeneity is really 2D.
In this exercise, we take the populations of workers and active jobs as given and ask to what
extent the equilibrium can be improved by eliminating search frictions (i.e. letting λ0 → ∞
and λ1 → ∞) and implementing the worker-job allocation from the frictionless benchmark. By
focusing on the reallocation of workers into existing jobs, we take a short-term perspective. This
is a common approach in the search literature, which circumvents the issue that, absent capacity
constraints, allowing for free entry and exit of jobs in the frictionless world would cause the job
type distribution to degenerate, with all workers being employed in the most productive firm.33
We compute the allocations (and surpluses) under the frictionless benchmark in the 1D
and 2D models.34 The results are reported in Table 6. For the 2D model in the top part33See, e.g. Bagger and Lentz (2016) or Hagedorn, Law and Manovskii (2016) for a similar approach.34The allocation and sorting of this frictionless benchmark with fixed multivariate distributions of workers
39
Specification: 1 2 3
True 2D model
E[σ(x,y)] 12.298 36.902 15.446E[σ∗(x,y)] 13.458 40.531 17.234E[σ∗
1D(x,y)] 12.415 37.277 15.550Surplus Loss from Mismatch -0.086 -0.090 -0.104Surplus Gain from 1D Optimal Allocation rel. to 2D Allocation 0.010 0.010 0.007Surplus Loss from 1D Optimal Allocation rel. to 2D Optimum -0.077 -0.080 -0.098
Misspecified 1D model
E[σ(x, y)] 12.209 36.830 15.358E[σ∗(x, y)] 13.482 40.564 17.016Surplus Loss from Mismatch -0.094 -0.092 -0.097
Table 6: Expected Surplus and Mismatch
of the table, the rows display, in this order: average actual surplus under search frictions,
E[σ(x,y)], the average optimal surplus from reallocating workers to jobs in a surplus-maximizing
way when frictions vanish, E[σ∗(x,y)], and the average surplus when implementing the 1D
optimal allocation, E[σ∗1D(x,y)] (here we implement the 1D allocation but then, to compute the
average surplus, take into account the agents’ true 2D types). Based on these statistics, we then
compute the percentage surplus loss from mismatch E[σ(x,y)]−E[σ∗(x,y)]E[σ∗(x,y)] , the percentage surplus
gain/loss from implementing the optimal 1D allocation instead of the actual 2D allocationE[σ∗
1D(x,y)]−E[σ(x,y)]E[σ(x,y)] , and the percentage surplus loss from implementing the optimal 1D allocation
instead of the optimal 2D allocation E[σ∗1D(x,y)]−E[σ∗(x,y)]
E[σ∗(x,y)] (rows 4 to 6). Similar notation is used
for the 1D model in the lower panel of Table 6.
Three findings stand out. First, both models indicate that the surplus loss from mismatch
is sizable in all examples (around 10%). Interpreting the surplus loss as a measure of mismatch,
the 1D model overestimates mismatch in two out of three specifications relative to the 2D model.
Second, implementing the 1D frictionless allocation generates essentially no welfare gain (around
1% or less) compared to the equilibrium 2D frictional allocation. Third, and most striking, the
results indicate that welfare losses from implementing the “wrong” optimal allocation (the one
suggested by the 1D model), relative to implementing the correct 2D one can be substantial
(between 7.7% and 9.8% of aggregate surplus). Policy recommendations based on an analysis
that imposes the single index assumption when it is not valid can thus be severely misguided.
and jobs is analyzed in Lindenlaub (2017): Examples 2 and 3 feature PAM while Example 1 yields a matchingfunction that in not one-to-one. We compute these frictionless allocations numerically using a linear program.
40
9 Conclusion
We analyze sorting in a standard random-search setting, but where both workers and jobs have
multi-dimensional characteristics. We first define notions of multi-dimensional positive and
negative assortative matching (PAM and NAM) in this frictional environment, based on FOSD-
monotonicity of the equilibrium matching distribution. For instance, if more skilled workers
are matched to jobs with stochastically better attributes, we call this PAM. We then provide
conditions on the primitives of this economy under which positive sorting arises in equilibrium,
where we distinguish sorting on the UE and the EE margins. In all the environments we study,
the key restriction on the primitives for PAM to occur between a given worker skill and a given
job attribute is a single-crossing condition on the technology. It guarantees that the skill and
job attribute at hand are sufficiently complementary in production relative to other skills and
job attributes. But contrary to well-known results on one-dimensional sorting, our conditions
for multi-dimensional sorting are generally not distribution-free.
Two main insights emerge from our analysis. First, workers with different skill specializations
look for and accept different kinds of jobs, meaning that they move up and down different job
ladders. This is why sorting arises. Remarkably, multi-dimensional heterogeneity is per se a
source of sorting: in comparable one-dimensional models, all workers share the same ranking of
firms (i.e. face the same job ladder), which prevents sorting. Second, there are sorting trade-
offs, in the sense that positive sorting, when it occurs, will do so along the dimensions of skills
and job attributes that are strong complements in production relative to other dimensions. But
negative sorting tends to arise in the dimension of weakest complementarity.
Our theory has important implications for applied work. We use it as the basis for an empir-
ical test of the commonly used single index assumption which states that both worker and job
heterogeneity are one-dimensional. We then show in a series of simulation exercises that — when
the test fails — approximating workers’ and jobs’ true multi-dimensional characteristics by one-
dimensional indices may lead to quantitatively and even qualitatively mistaken conclusions about
the sign and extent of sorting as well as mismatch. Consequently, policies that are based on a
(misspecified) one-dimensional model and aim to reduce allocative inefficiencies can be misguided
and lead to welfare losses. We provide a theoretical framework and, along with it, a definition
of sorting that can (i) guide the measurement of sorting in the data and (ii) be used as a basis
for policies that aim to improve worker-job allocations when heterogeneity is multi-dimensional.
41
References
[1] Bagger, J. and R. Lentz (2016) “An Empirical Model of Wage Dispersion with Sorting”.
Working Paper.
[2] Becker, G. S. (1973) “A Theory of Marriage: Part I”, Journal of Political Economy, 81(4),
813-46.
[3] Burdett, K. and D. T. Mortensen (1998) “Wage Differentials, Employer Size and Unem-
ployment”, International Economic Review, 39, 257-73.
[4] Cahuc, P., F. Postel-Vinay and J.-M. Robin (2006) “Wage Bargaining with On-the-job
Search: Theory and Evidence”, Econometrica, 74(2), 323-64.
[5] Chade, H. (2006) “Matching with Noise and the Acceptance Curse”, Journal of Economic
Theory, 129, 81-113.
[6] Chade, H., J. Eeckhout and L. Smith (2017) “Sorting Through Search and Matching Models
in Economics”, Journal of Economic Literature, forthcoming.
[7] Guvenen, F., B. Kuruscu, S. Tanaka, and D. Wiczer (2015) “Multidimensional Skill Mis-
match”, manuscript, University of Minnesota.
[8] Eeckhout, J. and P. Kircher (2010) “Sorting and Decentralized Price Competition”, Econo-
metrica, 78, 539-74.
[9] Hagedorn, M., T. Law and I. Manovksii (2014) “Identifying Equilibrium Models of Labor
Market Sorting”, NBER Working Paper No. 18661.
[10] Heckman, J. and Scheinkman, J. (1987) “The Importance of Bundling in a Gorman-
Lancaster Model of Earnings”, Review of Economic Studies, 54(2), 243-255.
[11] Karlin, S. and Y. Rinott (1980) “Classes of Orderings of Measures and Related Correla-
tion Inequalities. I - Multivariate Totally Positive Distributions”, Journal of Multivariate
Analysis, 10, 467-98.
[12] Lamadon, T., J. Lise, C. Meghir and J.-M. Robin (2015) “Matching, Sorting, and Wages”,
manuscript, University of Chicago.
42
[13] Legros, P. and A. F. Newman (2007) “Beauty is a Beast, Frog is a Prince: Assortative
Matching with Nontransferabilities”, Econometrica, 75(4), 1073-102.
[14] Lindenlaub, I. (2017) “Sorting Multidimensional Types: Theory and Application”, Review
of Economic Studies, 84(2): 718-789.
[15] Lise, J., and F. Postel-Vinay (2015) “Multidimensional Skills, Sorting, and Human Capital
Accumulation”, manuscript, University College London.
[16] Mortensen, D. T. and C. A. Pissarides (1994) “Job Creation and Job Destruction in the
Theory of Unemployment”, Review of Economic Studies, 61(3), 397-415.
[17] Moscarini, G. (2001) “Excess Worker Reallocation”, Review of Economic Studies, 69(3),
593-612.
[18] Moscarini, G. and F. Postel-Vinay (2013) “Stochastic Search Equilibrium”, Review of Eco-
nomic Studies, 80(4), 1545-81.
[19] Postel-Vinay, F. and J.-M. Robin (2002) “Equilibrium Wage Dispersion with Worker and
Employer Heterogeneity”, Econometrica, 70(6), 2295-350.
[20] Sanders, C. (2012) “Skill Uncertainty, Skill Accumulation, and Occupational Choice”,
manuscript, Washington University in St. Louis.
[21] Shapiro, A., D. Dentcheva and A. Ruszczyński (2009) Lectures on Stochastic Programming
- Modeling and Theory, MPS-SIAM series on optimization, 9.
[22] Shimer, R., and L. Smith (2000) “Assortative Matching and Search”, Econometrica, 68(2),
343-69.
[23] Smith, L. (2006).“The Marriage Model with Search Frictions," Journal of Political Economy,
114(6), 1124-1146.
[24] Villani, C. (2009) Optimal Transport: Old and New. Springer, Berlin.
[25] Yamaguchi, S. (2012) “Tasks and Heterogeneous Human Capital”, Journal of Labor Eco-
nomics, 30(1), 1-53.
43
APPENDIX
A Generalizations of Theorems 1 and 2
A.1 Monotone Technology with Y ≥ 2
We begin with sufficient conditions for sorting on the EE margin, which was addressed by Theorem 1
for the case of Y = 2 and bilinear technology. The following theorem relaxes both Assumptions 1 and 2,
generalizing Theorem 1 to the case of monotone technologies for Y ≥ 2:
Theorem 5 (EE Sorting, Y ≥ 2, Monotone Technology). If, for all x ∈ X :
1. p(x,y) is three times differentiable in y
2. (a) p(x,y) is strictly increasing in yY (monotonicity)
(b) for all y ∈ Y, limyY →yYp(x,y) < b(x) and limyY →yY
p(x,y) = +∞
(c) for all ℓ ∈ 1, · · · , Y − 1,∂
∂xk
(∂p(x,y)/∂yℓ∂p(x,y)/∂yY
)> 0 (SC-YD)
3. if Y ≥ 3, then for all (i, j) ∈ 1, · · · , Y − 12, i = j, and along all level curves of p(x, ·):
∂p
∂yY
[(∂p
∂yY
)2∂2 ln γ
∂yi∂yj+
∂p
∂yi
∂p
∂yj
∂2 ln γ
∂y2Y− ∂p
∂yj
∂p
∂yY
∂2 ln γ
∂yi∂yY− ∂p
∂yi
∂p
∂yY
∂2 ln γ
∂yj∂yY
]
− ∂ ln γ
∂yY
[(∂p
∂yY
)2∂2p
∂yi∂yj+
∂p
∂yi
∂p
∂yj
∂2p
∂y2Y− ∂p
∂yj
∂p
∂yY
∂2p
∂yi∂yY− ∂p
∂yi
∂p
∂yY
∂2p
∂yj∂yY
]
−(
∂p
∂yY
)2∂3p
∂yi∂yj∂yY+
∂p
∂yi
∂p
∂yY
∂3p
∂yj∂y2Y+
∂p
∂yj
∂p
∂yY
∂3p
∂yi∂y2Y− ∂p
∂yi
∂p
∂yj
∂3p
∂y3Y
+∂p
∂yY
[∂2p
∂yj∂yY
∂2p
∂yi∂yY− ∂2p
∂y2Y
∂2p
∂yi∂yj
]> 0 (EE-YD)
then PAM occurs in all dimensions (yℓ, xk) other than ℓ = Y along the EE margin.
The proof is in Appendix D.7. Conditions 2a (monotonicity) and 2b (yY is an “essential” input) are
technical and ensure that p(x,y) is invertible w.r.t. one of the y’s (which w.l.o.g. we take to be yY ).
Conditions 2a and 2b further ensure that the support of γ keeps a lattice structure under a change of
variables (see proof). Condition 2c is central to the theorem and generalizes the single-crossing condition
(SC-2D) from Theorem 1. It states that there exists a job attribute yY , satisfying Conditions 2a and 2b,
that is among those which are less complementary to xk than all other yℓ.
While Theorem 1 provided distribution-free, necessary and sufficient conditions for EE sorting in the
case of Y = 2, Theorem 5 only provides sufficient conditions that are no longer distribution-free (even if
we assume bilinear technology, see Corollary 4): Condition 3 — or, equivalently, equation (EE-YD) —
restricts both the sampling distribution and its interaction with the production function in a complicated
44
way. In essence, equation (EE-YD) places restrictions on the way the job attributes are associated in
the sampling distribution γ.
The need to restrict the sampling distribution in the case of Y > 2 can be explained as follows.
Because complementarities between skill k with all job attributes ℓ (ℓ = Y ) are stronger than with
attribute Y (Condition 2c), a worker with higher xk would ideally move towards jobs with higher levels
of yℓ for all ℓ = Y (and possibly with lower level of yY ). Yet, if the attributes yℓ are too strongly
negatively associated (and/or there is a positive correlation between yℓ and yY ), then such moves may
not be feasible. The role of condition (EE-YD) is to prevent these distributional obstructions to positive
sorting in dimensions (yℓ, xk), for ℓ = Y .
Theorem 5 is our most general result on EE sorting and is useful because it nests several special
cases of interest. We show in the following corollaries that the sufficient conditions for sorting simplify
considerably for nonlinear, monotone surplus functions with Y = 2 (Corollary 3) as well as bilinear
surplus functions with Y > 2 (Corollary 4).
Corollary 3 (EE Sorting, Y = 2, Monotone Technology). Under Assumption 2 (Y = 2), if for all
x ∈ X , p(x,y) is twice continuously differentiable and quasi-concave in y, strictly increasing in y2 with
miny2∈R p(x, (y1, y2)) < b(x) for all y1 ∈[y1, y1
], and such that:
∂
∂xk
(∂p(x,y)/∂y1∂p(x,y)/∂y2
)> 0
then PAM occurs in the (y1, xk) dimension along the EE margin.
The proof is in Appendix D.8. If we restrict the number of job attributes to two, the sufficient
conditions for EE sorting from Theorem 5 become considerably simpler. Condition (EE-YD) disappears
altogether: the role of that condition was to prevent the sampling distribution from having job attributes
yℓ (ℓ = Y ) associated in a way that countered the force toward PAM in (yℓ, xk) arising from comple-
mentarities. This issue of association between Y − 1 job attributes becomes moot when Y = 2. The
reason is that in this case we focus on conditions for PAM in a single dimension (Y − 1 = 1). Stronger
complementarities between (y1, xk) than between (y2, xk) are then sufficient to ensure PAM in (y1, xk).
This result again highlights that as long as we restrict our attention to two job attributes, we can specify
distribution-free conditions for EE sorting.
Another useful special case on EE sorting that emerges from Theorem 5 is for bilinear production
functions (Assumption 1), which satisfy monotonicity (as, by Assumption 1, qY (x) > 0).
Corollary 4 (EE Sorting, Y > 2, Bilinear Technology). Under Assumption 1, if for all x ∈ X :
1. for all y ∈ Y, limyY →yYp(x,y) < b(x), and yY = +∞
45
2. for all ℓ ∈ 1, · · · , Y − 1,
∂
∂xk
(∂p(x,y)/∂yℓ∂p(x,y)/∂yY
)> 0 or, equivalently:
∂
∂xk
(qℓ(x)
qY (x)
)> 0 (SC-YD)
3. for all (i, j) ∈ 1, · · · , Y − 12, i = j, and along all level curves of p(x, ·):
qY (x)2 ∂
2 ln γ
∂yi∂yj+ qi(x)qj(x)
∂2 ln γ
∂y2Y− qj(x)qY (x)
∂2 ln γ
∂yi∂yY− qi(x)qY (x)
∂2 ln γ
∂yj∂yY> 0 (EE-YD’)
then PAM occurs in all dimensions (yℓ, xk) other than ℓ = Y along the EE margin.
The proof is in Appendix D.9. Condition 1 echoes Condition 2b of Theorem 5 and is of the same tech-
nical nature. Condition 2 parallels the generalized single crossing condition from Theorem 5. In contrast
to Theorem 1 that focuses on sorting under two-dimensional job heterogeneity, the case with general
Y -dimensional heterogeneity requires restrictions on the sampling distribution (Condition 3), which is a
special case of Condition 3 in Theorem 5. This condition is again satisfied if log-supermodularity of γ in
job attribute yi and any other job attribute, yj (i, j = Y ) sufficiently dominates any log-supermodularity
in pairs (yi, yY ) and (yj , yY ) (assuming qi(x) > 0, qj(x) > 0 for all i, j). Like in Theorem 5, this condi-
tion limits distributional barriers to PAM in dimensions (yℓ, xk), ℓ = Y , that could arise from negative
association among the attributes yℓ in γ.
Condition (EE-YD’) is more demanding on γ than condition (UE-2D) since it involves the relative
strength of log-supermodularity in the various dimensions as well as subtle interactions with the tech-
nology. Nevertheless we can show that the set of models satisfying (EE-YD’) is not empty. For instance,
for a (truncated) multivariate normal distribution, (EE-YD’) reads
−qY (x)2Σ−1
ij +Σ−1ji
2 |Σ|− qi(x)qj(x)
Σ−1Y Y
|Σ|+ qj(x)qY (x)
Σ−1iY +Σ−1
Y i
2 |Σ|+ qi(x)qY (x)
Σ−1Y j +Σ−1
jY
2 |Σ|> 0 (7)
where Σ−1ij is element ij of the inverse of the covariance matrix and |Σ| is its determinant. Condition (7)
holds if the correlation between yi and yj is sufficiently strong.35
We finally generalize Theorem 2 on UE sorting to Y -dimensional heterogeneity of job attributes.
Similar to Corollary 4 on EE sorting, in this more general setting sorting requires complex restrictions
on the sampling distribution Γ.
Theorem 6 (UE Sorting, Y > 2, Bilinear Technology). Under Assumption 1 and Conditions 1-3 from
Corollary 4, if for all x ∈ X :
1. qj(x) > 0 for all j ∈ 1, · · · , Y (i.e. p(x,y) is increasing in all job attributes)
35As an example, consider Y = 3 and assume that standard deviations satisfy θi = θj = θ. Then, (7) reads:
(θ4/ |Σ|2)(q23(τ12 − τ13τ23)− q1q2(1− τ2
12)− q2q3(τ13 − τ12τ23)− q1q3(τ23 − τ12τ13))> 0
This inequality holds if, for instance, τ12 = 1 and τ13 = τ23 = 0 (where τij = θij/θiθj = corrΓ(yi, yj)).
46
2. the following condition holds along all level curves of p(x, ·) and for all j = 1, ..., Y − 1:
qY (x)∂2 ln γ
∂yY ∂yj− qj(x)
∂2 ln γ
∂y2Y≥ 0 (UE-YD)
3. denoting the lower support of y by y =(y1, · · · , y
Y
):
Y∑j=1
[qkjqj(x)
−maxj′
qkj′
qj′(x)
]qj(x)
(yj− bj
)≤ 0
then PAM occurs in all dimensions (yℓ, xk) other than ℓ = Y along the UE margin.
The proof is in Appendix D.10. Theorem 6 shows that signing the sorting patterns on the UE margin
is even more involved than signing EE sorting. It requires additional assumptions on the sampling
distribution Γ, similar to those we discussed under two-dimensional heterogeneity of jobs. Condition
(UE-YD) (which again imposes some degree of positive association between job attributes in the sampling
distribution) has the same interpretation as (UE-2D) in Theorem 2 for Y = 2. It prevents distributional
obstructions to PAM that could occur despite sufficient complementarities in production.36 Lastly,
Condition 3 restricts the lower support of the sampling distribution and echoes Condition 3 from Theorem
2, extended to Y -dimensional heterogeneity in job attributes.
A.2 Separable Technology with Y ≥ 2
Arguably the most substantive restriction placed by Theorem 5 on the production technology is strict
monotonicity of p(x,y) w.r.t. at least one job attribute yY . While it covers a wide range of applications, it
excludes some important special cases, such as the popular “bliss point” specification. An example (which
goes back to, at least, Tinbergen, 1956) would be, in the case X = Y , p(x,y) = c0 −∑X
i=1 ci(xi − yi)2,
where the ci’s are strictly positive numbers. In this example, each job has an “ideal” skill bundle given
by y, and output is a decreasing function of the distance between the worker’s skill bundle x and that
ideal skill bundle. This and other related specifications are covered in the following theorem:
Theorem 7 (EE Sorting, Y ≥ 2, Separable Technology). If for all x ∈ X :
1. p(x,y) is continuously differentiable w.r.t. y
2. for a given k and for all ℓ ∈ 2, · · · , Y , ∂2p(x,y)/∂xk∂yℓ = 0 (separability)
3. for all y ∈ RY+, ∂2p(x,y)/∂xk∂y1 > 0
then PAM occurs in the (y1, xk) dimension along the EE margin.
36Note that there can arise a tension between (UE-YD) and (EE-YD’) that is also assumed to hold here.Importantly, however, both conditions can be satisfied simultaneously. E.g., in the case of X = Y = 3 and whereγ is truncated normal, if τ12 is sufficiently high, e.g. τ12 = 1, and τ13 and τ23 are low, e.g. τ13 = τ23 = 0, thencondition (EE-YD’) holds strictly (see Footnote 15) while condition (UE-YD) holds with equality.
47
The proof is in Appendix D.11. The key restriction imposed in Theorem 7 is Condition 2 (separabil-
ity), which states that there are components of the worker’s skill bundle that are only relevant to perform
task 1, i.e. that are neither complement nor substitute with any other task ℓ ∈ 2, · · · , Y . The sign
of sorting on the EE margin between job attribute of task 1 and the skills that are only relevant to task
1 can then be determined: it has the sign of ∂2p(x,y)/∂xk∂y1.37 Assumption 2 is easily satisfied, for
instance, by production functions that only feature within-complementarities of skills and job attributes
but no complementarities across tasks. For example, under the “bliss point” specification mentioned
above (p(x,y) = c0 −∑X
i=1 ci(xi − yi)2), Theorem 7 establishes that there will be positive within-skill
sorting along the EE margin (i.e. H1 (·|x) increases in x1 in the FOSD sense), but says nothing about
between-skill sorting (i.e. it says nothing about the monotonicity of Hℓ (·|x) w.r.t. x1 for ℓ ≥ 2).
Note that the case of separable technologies is special: the characterization of sorting in Theorem
7 is independent of the sampling distribution Γ irrespective of the dimensionality of job heterogeneity.
In particular, the restrictions on the sampling distribution (EE-YD) in Theorem 5 are only needed
if complementarities in (y1, xk) compete with complementarities between xk and other dimensions yℓ,
ℓ = 1. Theorem 5 then guarantees PAM in all but one dimensions (yℓ, xk), ℓ = Y, while Theorem 7 for
separable production functions only ensures sorting in a single dimension (y1, xk).
B Alternative Wage Setting Protocols
In this appendix, we explore generalizations of our results to wage setting protocols other than the pure
Sequential Auction model of Postel-Vinay and Robin (2002). For notational brevity, we first denote the
surplus of a match as S(x,y) = P (x,y) − U(x). Under any wage setting rule, so long as workers and
firms have linear preferences over wages, the surplus and unemployment value functions are defined by:
(ρ+ δ)S(x,y) = p(x,y)− ρU(x) + λ1
∫m(x,y,y′) [Ωm(x,y,y′)− S(x,y)] γ(y′)dy′ (8)
ρU(x) = b(x) + λ0
∫m(x,0Y ,y
′)Ωm(x,0Y ,y′)γ(y′)dy′ (9)
where Ωm(x,y,y′) is the surplus (over the value of unemployed search) achieved by the worker if he
moves from employer y to employer y′ and m(x,y,y′) is the worker’s mobility decision: m(x,y,y′) = 1
if the worker chooses to move from y to y′ and 0 otherwise.
We now consider three alternative wage setting models: sequential auction with worker bargaining
power (Cahuc et al., 2006), fixed-share surplus-splitting (Mortensen and Pissarides, 1994; Moscarini,
2001), and wage posting (Burdett and Mortensen, 1998). In each case, we refer the reader to the
relevant reference for details about the wage setting model at hand.
37Note that, under separability Assumption 2 in Theorem 7, the sign of ∂2p/∂xk∂y1 is the same as the signof ∂
∂xk
(∂p/∂y1∂p/∂yj′
), 1 = j′, which is our single-crossing condition. In that sense, the characterization of sorting in
Theorem 7 parallels the characterization provided by Theorems 1 and 5 under different assumptions on p.
48
B.1 Sequential Auctions with Worker Bargaining Power (Cahuc et al., 2006)
In this case, Ωm(x,y,y′) = S(x,y)+β [S(x,y′)− S(x,y)] and the mobility decision rule is m(x,y,y′) =
1 S(x,y′) > S(x,y). Thus:
ρU(x) = b(x) + λ0β
∫1 S(x,y′) ≥ 0S(x,y′)γ(y′)dy′ (10)
= b(x) + λ0β
∫ +∞
0
sdFS|x(s) = b(x) + λ0β
∫ +∞
0
FS|x(s)ds
where FS|x(s) =∫1 S(x,y′) ≤ s γ(y′)dy′ is the sampling cdf of match surplus faced by a type-x
worker, and where the last equality above obtains from integration by parts. Following similar steps:
(ρ+ δ)S(x,y) = p(x,y)− ρU(x) + λ1β
∫ +∞
S(x,y)
FS|x(s)ds
Substituting (10) into the latter equation:
(ρ+ δ)S(x,y) = p(x,y)−[b(x) + (λ0 − λ1)β
∫ +∞
0
FS|x(s)ds
]− λ1β
∫ S(x,y)
0
FS|x(s)ds
Letting b(x) = b(x) + (λ0 − λ1)β∫ +∞0
FS|x(s)ds and σ(x,y) = p(x,y)− b(x), we thus have:
(ρ+ δ)S(x,y) = σ(x,y)− λ1β
∫ S(x,y)
0
FS|x(s)ds (11)
As is clear from (11), S(x,y) only depends on y through σ(x,y) (and in a differentiable way). With a
slight abuse of notation, we can thus define dS/dσ which, from (11), is given by:
dS(x,y)
dσ(x,y)=
1
ρ+ δ + λ1βFS|x (S(x,y))> 0.
This proves that, for any y1 = y2, S(x,y2) > S(x,y1) ⇐⇒ σ(x,y2) > σ(x,y1). Moreover, it is
also clear from (11) that S(x,y) = 0 ⇐⇒ σ(x,y) = 0. Hence, the job acceptance rule is equivalent
to m(x,y,y′) = 1 σ(x,y′) > σ(x,y) for employed workers and m(x,0Y ,y′) = 1 σ(x,y′) > 0 for
unemployed workers: the acceptance rule is the same as in the pure sequential auction case seen in the
main text, up to the redefinition of σ(x,y) from p(x,y)− b(x) to p(x,y)− b(x).
Crucially, this redefinition of σ preserves monotonicity and linearity of σ w.r.t. y, as well as super-
modularity w.r.t. (x,y). Therefore, any result relying on those properties alone will continue to hold in
this modified model. We take stock of what those are in the last paragraph of this appendix.
B.2 Fixed-Share Surplus-Splitting (Moscarini, 2001)
In this case, Ωm(x,y,y′) = βS(x,y′). But also, the worker’s value in the incumbent match is βS(x,y),
implying that the mobility decision rule is again m(x,y,y′) = 1 S(x,y′) > S(x,y). We thus still have
49
ρU(x) = b(x) + λ0β∫ +∞0
FS|x(s)ds, and now:
(ρ+ δ)S(x,y) = p(x,y)− ρU(x) + λ1
∫1 S(x,y′) > S(x,y) [βS(x,y′)− S(x,y)] γ(y′)dy′
= p(x,y)− ρU(x) + λ1
∫ +∞
S(x,y)
[βs− S(x,y)] dFS|x(s)
⇐⇒[ρ+ δ+λ1(1− β)FS|x (S(x,y))
]S(x,y) = p(x,y)− ρU(x) + λ1β
∫ +∞
S(x,y)
FS|x(s)ds
Substituting U(x):
[ρ+ δ + λ1(1− β)FS|x (S(x,y))
]S(x,y) = σ(x,y)− λ1β
∫ S(x,y)
0
FS|x(s)ds (12)
where b(x) = b(x)+ (λ0 −λ1)β∫ +∞0
FS|x(s)ds and σ(x,y) = p(x,y)− b(x) are defined as before. Then,
using a similar argument as in the Cahuc et al. (2006) case:
(ρ+ δ + λ1FS|x (S(x,y))− λ1(1− β)S(x,y)fS|x (S(x,y))
)dS(x,y) = dσ(x,y)
Thus, a necessary and sufficient condition for S(x,y) to be in a one-to-one increasing relationship with
σ(x,y) is λ1(1− β)SfS|x(S) ≤ ρ+ δ + λ1FS|x(S), which is to say:
λ1(1− β)S
∫1 S(x,y′) = S γ(y′)dy′ ≤ ρ+ δ + λ1
∫1 S(x,y′) ≥ S γ(y′)dy′. (13)
This is an unwieldy condition on γ, but still a condition on the primitives.38 Note that the reason we
need this condition is that, under this particular rent-sharing rule, a share 1− β of the surplus from the
initial match is lost to a third party (the new employer) when the worker changes jobs. More precisely,
when the worker changes jobs, the initial firm-worker collective “gains” β [S(x,y′)− S(x,y)] (a share
β of the rent supplement brought about by the new match, although all of these gains accrue to the
workers), and “loses” (1 − β)S(x,y) to the new employer. Thus, if there is a high concentration of
potential matches with equal surplus (if fS|x (S(x,y)) is large), the initial match stands to lose a lot
and gain very little in case the worker accepts an outside offer. As a result, the surplus may be higher
in a slightly less productive match but with fewer similar potential matches available in the economy.
Condition (13) prevents that from happening.
B.3 Wage Posting (Burdett and Mortensen, 1998)
Assuming that firms post wages and are allowed to make offers contingent on worker type x, the model
becomes one of segmented wage-posting markets (one market for each x), where workers are homogeneous
38Note that a sufficient condition for (13) is that the hazard rate of FS|x be decreasing fast enough, specificallySfS|x(S)/FS|x(S) ≤ ρ+δ+λ1
λ1(1−β)
50
within each market and firms in market x are heterogeneous with (scalar) productivity p(x,y). We then
know from Burdett and Mortensen (1998) — or indeed from standard monotone comparative statics —
that firms with higher p(x,y) will post higher wages for type-x workers (and thus offer higher values to
those workers).
Adjusting the notation from Burdett and Mortensen (1998), posted wages are given by:
w(x,y) = p(x,y)−[δ + λ1F p|x (p(x,y))
]2∫ p(x,y)
b(x)
dt[δ + λ1F p|x(t)
]2 + C(x)
where Fp|x is the sampling distribution of match productivity conditional on x, b(x) is the lowest pro-
ductivity amongst viable matches on the market for type-x workers, and C(x) is the profit of the least
productive match employing a type-x worker.39 Again defining σ(x,y) = p(x,y) − b(x), it is straight-
forward to check that w(x,y) is a strictly increasing function of σ(x,y), so that employed workers move
up the σ(x,y)-ladder. Moreover, by construction, unemployed workers accept offers iff σ(x,y) ≥ 0.
B.4 Taking Stock
In all the cases reviewed above, worker mobility is governed by comparisons of σ(x,y) = p(x,y)− b(x),
where b(x) is a (potentially very complex) function of x only. So any theorem that only uses monotonicity
of σ in y, linearity of σ in y, or supermodularity σ in (x,y) goes through. What fails to go through,
though, is any property that relies on the linearity of σ in x: even when assuming linearity in x of p
and b, the function b(x) is generically nonlinear. What this means in terms of the results in this paper
is that the only properties that are specific to the pure sequential auction model are Theorems 3 and 4.
All of the conditions ensuring PAM in equilibrium are preserved under any of the three alternative wage
setting models covered in this appendix.
C Derivations
C.1 Derivation of h(x,y)
Substituting the definition of Fσ|x (Fσ|x(s) = E [1 σ(x,y′) > s]) into (1), we see that h(x,y) can be
written as h(x,y) = χ (x, σ(x,y)) γ(y), where the function χ solves:
[δ + λ1Fσ|x(s)
]χ(x, s)fσ|x(s) = λ0fσ|x(s)1 s ≥ 0u(x) + λ1fσ|x(s)
∫ s
0
χ(x, s′)dFσ|x (s′) .
39Calling the reservation wage of a type-x unemployed worker R(x) (see Burdett and Mortensen, 1998 for aderivation), we have that b(x) = max R(x);miny′∈Y p(x,y′). Then, the integration constant C(x) is zero ifb(x) = R(x) and some positive function of x otherwise.
51
This ODE solves as:
[δ + λ1Fσ|x(s)
] ∫ s
0
χ(x, s′) dFσ|x (s′) = λ01 s ≥ 0u(x)
[Fσ|x(s)− Fσ|x(0)
].
In other words, by differentiation:
h(x,y) = λ01 σ(x,y) ≥ 0u(x)δ + λ1Fσ|x(0)[
δ + λ1Fσ|x (σ(x,y))]2 γ(y).
Finally, remembering from the flow-balance equations that λ0Fσ|x(0)u(x) = δ (ℓ(x)− u(x)) and substi-
tuting yields the expression of h(x,y) in the main body of the paper.
C.2 Derivation of (DC)
Recalling equation (4):
Hj(y|x) =δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
∫1 σ(x,y′) ≥ 0 × 1
y′j ≤ y
[δ + λ1Fσ|x (σ(x,y′))
]2 γ(y′)dy′
and differentiating yields:
∂Hj(y|x)∂xk
= −δ ∂∂xk
Fσ|x(0)
Fσ|x(0)[δ + λ1Fσ|x(0)
]Hj(y|x)︸ ︷︷ ︸(1)
+δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫ ∂σ(x,y′)
∂xk× 1 σ(x,y′) = 0 × 1
y′j ≤ y
[δ + λ1Fσ|x (σ(x,y′))
]2 γ(y′)dy′
︸ ︷︷ ︸(2)
−δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫
2λ11 σ(x,y′) ≥ 0 × 1y′j ≤ y
[δ + λ1Fσ|x (σ(x,y′))
]3 × d
dxk
[1− Fσ|x (σ(x,y
′))]γ(y′)dy′
︸ ︷︷ ︸(3)
.
We examine those three terms in turn.
First, the definition 1− Fσ|x(s) =∫1 σ(x,y′) ≥ s γ(y′)dy′ implies:
∂
∂xj
[1− Fσ|x(s)
]=
∫1 σ(x,y′) = s ∂σ(x,y′)
∂xjγ(y′)dy′
= EΓ
[∂σ(x,y′)
∂xj| σ(x,y′) = s
]× fσ|x(s). (14)
Replacing into term (1) yields:
(1) = −δfσ|x(0)
Fσ|x(0)[δ + λ1Fσ|x(0)
] × EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]×Hj(y|x)
= gσ|x(0)× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]×Hj(y|x).
52
Next, term (2) can be rewritten as:
(2) =δ
Fσ|x(0)[δ + λ1Fσ|x(0)
] × ∫ ∂σ(x,y′)
∂xk× 1 σ(x,y′) = 0 × 1
y′j ≤ y
γ(y′)dy′
=δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫
1 σ(x,y′) = 0 × 1y′j ≤ y
[δ + λ1Fσ|x(0)
]2 γ(y′)dy′ × EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j ≤ y
]=
∂Kj(y, 0|x)∂s
× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j ≤ y
]
where Kj(y, s|x) is the joint cdf of job attribute yj and match flow surplus s, conditional on worker type
x, in the population of employed workers, given by:
Kj(y, s|x) =∫
1y′j ≤ y
× 1 σ(x,y′) ≤ sh (y′|x) dy′
=δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
∫1 0 ≤ σ(x,y′) ≤ s × 1
y′j ≤ y
[δ + λ1Fσ|x (σ(x,y′))
]2 γ(y′)dy′ (15)
which is the probability that a randomly chosen type-x employed worker is in a job whose jth attribute
is less than y and generates a flow surplus less than s. Note that Hj(y|x) and Gσ|x(s) are the marginal
cdfs of Kj(y, s|x), so that Kj(y,+∞|x) = Hj(y|x) and Kj(+∞, s|x) = Gσ|x(s), and moreover that:
∂Kj(y, s|x)∂s
= gσ|x(s)× PrΓy′j ≤ y|σ(x,y′) = s
.
Therefore, term (2) reads:
(2) = gσ|x(0)× PrΓy′j ≤ y|σ(x,y′) = 0
× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j ≤ y
]
Now on to term (3). Again from (14), we have that:
d
dxk
[1− Fσ|x (σ(x,y
′))]= fσ|x (σ(x,y
′))×EΓ
[∂σ(x,y′′)
∂xk| σ(x,y′′) = σ(x,y′)
]− ∂σ(x,y′)
∂xk
.
Substituting into term (3):
(3) =δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫
2λ11 σ(x,y′) ≥ 0 × 1y′j ≤ y
× fσ|x (σ(x,y
′))[δ + λ1Fσ|x (σ(x,y′))
]3×∂σ(x,y′)
∂xk− EΓ
[∂σ(x,y′′)
∂xk| σ(x,y′′) = σ(x,y′)
]γ(y′)dy′
53
which can be recast as:40
(3) =δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫ +∞
0
2λ1fσ|x(s)[δ + λ1F σ|x(s)
] × ∫ 1 σ(x,y′) = s × 1y′j ≤ y
[δ + λ1Fσ|x(s)
]2 γ(y′)dy′
×
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j ≤ y
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds
=
∫ +∞
0
2λ1fσ|x(s)[δ + λ1Fσ|x(s)
] × ∂Kj(y, s|x)∂s
×
EΓ
[∂σ(x,y)
∂xk| σ(x,y′) = s, y′j ≤ y
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds.
Combining (1), (2) and (3) and substituting the definitions of gσ|x(0) and ∂Kj(y, 0|x)/∂s proves that
∂Hj(y|x)∂xk
= gσ|x(0)×
PrΓ
y′j ≤ y|σ(x,y′) = 0
× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j ≤ y
]
−Hj(y|x)× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]
+
∫ +∞
0
2λ1fσ|x(s)
δ + λ1Fσ|x(s)× gσ|x(s)× PrΓ
y′j ≤ y|σ(x,y′) = s
×
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j ≤ y
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds.
where we incorporated the identity ∂Kj(y, s|x)/∂s = gσ|x(s) × PrΓy′j ≤ y|σ(x,y′) = s
and which,
after adding and subtracting PrΓy′j ≤ y|σ(x,y′) = 0
× EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
], is exactly (DC).
D Proofs
D.1 An Ancillary Lemma
We begin by stating a lemma to better understand the nature of decomposition (DC). This will help us
specify conditions on primitives under which sorting arises.
40A technical note: strictly speaking, the correct integration bounds in the following formula are
s ∈
[max
0, min
y′∈Y,y′j≤y
σ(x,y′)
, maxy′∈Y,y′
j≤yσ(x,y′)
].
rather than [0,+∞). To avoid cluttering the formula with these unwieldy integration bounds, we write it asan integral over all s ≥ 0. As a consequence, it may be that the joint event
(σ(x,y′) = s, y′
j ≤ y)
on whichsome of the expectations are conditioned has zero probability for some values of (s, y). Yet in those cases,∫1 σ(x,y′) = s × 1
y′j ≤ y
γ(y′)dy′ = 0. The formula thus remains correct with [0,+∞) as integration
bounds if we adopt the convention that any expectation conditioned on a zero probability event is equal to zero.
54
Lemma 1. If for all s ≥ 0 and y such that PrΓy′j ≤ y|σ(x,y′) = s
> 0,
y 7→ EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = y
]is increasing [decreasing] (CMP)
then term (2) in (DC) is negative [positive] for all y, i.e. positive [negative] assortative matching occurs
in the (yj , xk) dimension along the EE margin.
Lemma 1 implies that, if the UE margin is shut down (i.e. if σ(x,y) > 0 for all y) and if condition
(CMP) — our label for complementarity — holds, then the marginal distribution of job attribute yj of
employed workers of type x, Hj(·|x), is monotone with respect to worker skill xk in the FOSD sense, i.e.
there is PAM in dimension (yj , xk).
Condition (CMP) can be interpreted as imposing a strong form of complementarity (or substitutabil-
ity, in the decreasing case) between job attribute j and worker skill k, as is typical of models of sorting.
Indeed, in the one-dimensional case (Y = 1), Condition (CMP) collapses to a restriction on the sign of
∂2σ/∂xk∂y = ∂2p/∂xk∂y, which is the familiar super- (or sub) modularity condition on the production
function from one-dimensional frictionless assignment models. In the multi-dimensional case, Condition
(CMP) imposes, loosely speaking, that supermodularity hold along all level curves of σ(x,y), which
amounts to a complex restriction involving not only the technology, but also the sampling distribution
of job types.
D.2 Proof of Theorem 1
The objective is to find conditions for PAM in dimension (y1, xk). Generally, in order to obtain sufficient
conditions we want to specify conditions subject to which (CMP) in Lemma 1 holds, i.e. under which
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = y
]is increasing in y. In the case of bilinear technology with Y = 2,
however, we can circumvent this sufficient condition and compute term (2) in expression (DC) explicitly.
First, notice that Assumption 1 implies explicit invertibility of the surplus function, so that:
σ(x,y) = s ⇔ y2 − b2 =s
q2(x)− q1(x)
q2(x)(y1 − b1) := R(s, y1).
Then, we can express
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′1 = y
]= Eµs,x
(∂σ (x, (y′1, R(s, y′1)))
∂xk| y′1 = y
)(16)
where µs,x(y1) is the sampling density of y1 conditional on x and on σ(x,y) = s:
µs,x(y1) =γ (y1, R(s, y1)) ∂R(s, y1)/∂s∫γ (y′1, R(s, y′1)) ∂R(s, y′1)/∂sdy
′1
.
Note that ∂R(s, y1)/∂s is a constant and thus drops from µs,x. Finally, Assumption 1 implies ∂σ(x,y)/∂xk =
55
∑2j=1 qkj(yj − bj).
Substituting this expression along with R(s, y1) into Eµs,x in (16), we obtain,
Eµs,x
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′1 = y
]=
qk2q2(x)
s+qk1q2(x)− qk2q1(x)
q2(x)Eµs,x [y
′1 − b1|y′1 = y]
implying:
Eµs,x
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′1 ≤ y
]=
qk2q2(x)
s+qk1q2(x)− qk2q1(x)
q2(x)Eµs,x [y
′1 − b1|y′1 ≤ y]
Hence, term (2) in expression (DC) is equal to:
− qk1q2(x)− qk2q1(x)
q2(x)
∫ +∞
0
2λ1fσ|x(s)gσ|x(s)
δ + λ1Fσ|x(s)
× PrΓy′j ≤ y|σ(x,y′) = s
×(Eµs,x [y
′1 − b1]− Eµs,x [y
′1 − b1|y′1 ≤ y]
)ds. (17)
Both q2(x) (by assumption) and the difference in expectations (by construction) are positive. Condition
(SC-2D), which here is equivalent to qk1q2(x)− qk2q1(x) > 0 (or detQ > 0), is therefore necessary and
sufficient for term (2) in (DC) to be negative, i.e. necessary and sufficient for PAM in (y1, xk).
D.3 Proof of Theorem 2
Sufficiency. We have to sign term (1) in decomposition (DC) which, as explained in the main text,
reflects sorting along the UE margin. Whenever gσ|x(0) > 0, said term (1) has the sign of:
PrΓy′j ≤ y|σ(x,y′) = 0
×EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j ≤ y
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]+ EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]×[PrΓ
y′j ≤ y|σ(x,y′) = 0
−Hj(y|x)
](18)
In the case of Y = 2, treated in Theorem 2, the first line in (18) is negative under the assumed condition
(SC-2D) (see proof of Theorem 1). We thus focus on showing that the second line is negative. Using the
identity Hj(y|x) =∫ +∞0
gσ|x(s)PrΓy′j ≤ y|σ(x,y′) = s
ds, the second line can be reformulated as:
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]×∫ +∞
0
gσ|x(s)[PrΓ
y′j ≤ y|σ(x,y′) = 0
− PrΓ
y′j ≤ y|σ(x,y′) = s
]ds
We then proceed in two steps. First, we will find conditions under which PrΓ y′1 ≤ y|σ(x,y′) = s is
decreasing in s, implying that [PrΓ y′1 ≤ y|σ(x,y′) = 0 − PrΓ y′1 ≤ y|σ(x,y′) = s] ≥ 0. Equivalently,
56
we want to derive conditions under which, for any sH > sL ≥ 0,
∫ y
y1
γ(y′1,
sHq2(x)
+ b2 − q1(x)q2(x)
(y′1 − b1))dy′1∫ y1
y1
γ(y′1,
sHq2(x)
+ b2 − q1(x)q2(x)
(y′1 − b1))dy′1
≤
∫ y
y1
γ(y′1,
sLq2(x)
+ b2 − q1(x)q2(x)
(y′1 − b1))dy′1∫ y1
y1
γ(y′1,
sLq2(x)
+ b2 − q1(x)q2(x)
(y′1 − b1))dy′1
Defining
g(y, s) :=
∫ y
y1
γ
(y′1,
s
q2(x)+ b2 −
q1(x)
q2(x)(y′1 − b1)
)dy′1
and rearranging the previous inequality gives g (y1, sL) g(y, sH) ≤ g(y, sL)g (y1, sH). Since y ≤ y1, this
inequality holds if g is log-supermodular in (y, s). To show when this is the case, define — as in the
proof of Theorem 1 — the joint distribution of y1 and s (conditional on x) as
µx(y1, s) = γ
(y1,
s
q2(x)+ b2 −
q1(x)
q2(x)(y1 − b1)
)
and rewrite g(y, s) =∫1y′1 < yµx (y
′1, s) dy
′1. Note that
1. the support of µx(y1, s) is a lattice (for more details, see the proof of Theorem 6 and in particular
footnote 42 below);
2. the joint distribution µx(y1, s) is log-supermodular in (y1, s) if
q2(x)∂2 ln γ
∂y2∂y1− q1(x)
∂2 ln γ
∂y22≥ 0 (19)
3. the indicator function, 1y′1 < y, is log-supermodular in (y, y′1).
Therefore, the product 1y′1 < yµx(y′1, s) is log-supermodular in (y, y′1, s) since the product of log-
supermodular functions is log-supermodular. In turn, this implies that g is log-supermodular in (y, s)
since log-supermodularity is preserved under integration.
We have thus proven that if (19) holds (stated as condition (UE-2D) in Theorem 2), then PrΓ y′1 ≤ y|σ(x,y′) = s
is decreasing in s.
Second, we derive conditions such that EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]≤ 0. Note that under Assumptions
1 and 2:
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]= q1(x)
[qk1q1(x)
− qk2q2(x)
]EΓ (y
′1 − b1 | σ(x,y′) = 0) .
The term outside the expectation is strictly positive by single-crossing condition (SC-2D) and the ex-
pectation is negative if y2 ≥ b2 and q1(x) > 0 (stated as Conditions 1. and 3. in Theorem 2) since
EΓ (y′1 − b1 | σ(x,y′) = 0) = EΓ
[y′1|y′1 = −(y′2 − b2)
q2(x)
q1(x)+ b1
]− b1.
Thus, EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]≤ 0 under the conditions from Theorem 2, proving that those condi-
57
tions are sufficient for (18) (and thus, term (1) in decomposition (DC)) to be negative and hence sufficient
for PAM on the UE margin.
Necessity. To show that single crossing condition (SC-2D) is also necessary for PAM when considering
all possible sampling distributions γ, recall that there is PAM in dimension (y1, xk) on the UE margin
if and only if (18) is negative. It thus suffices to show that there exists a sampling distribution γ, under
which (SC-2D) is necessary for (18) to be negative.
We focus on bilinear technology and Y = 2. First, note the first line in (18) is negative if and only if
(SC-2D) holds (by an analogous argument as in Theorem 1). Second, note that if γ is log-supermodular
with log-concave marginals, then [PrΓ y′1 ≤ y|σ(x,y′) = 0 − PrΓ y′1 ≤ y|σ(x,y′) = s] ds ≥ 0, see the
Sufficiency-part above. Hence, if γ is uniform with independent marginals then ∂2 ln γ∂y1∂y2
= 0 and ∂2 ln γ∂y2
2= 0
so that [PrΓ y′1 ≤ y|σ(x,y′) = 0 − PrΓ y′1 ≤ y|σ(x,y′) = s] ds = 0. Only the first line in (18) remains,
which is negative only if (SC-2D) holds.
D.4 Proof of Theorem 3
To economize on notation, we will use the identity ∂Kj(y, s|x)/∂s = gσ|x(s)×PrΓy′j ≤ y|σ(x,y′) = s
throughout the proof (see (15) above for its derivation).
From decomposition (DC) applied to the case of bilinear production function, we obtain:
(x+ a)⊤∇Hj(y|x) =X∑
k=1
(xk + ak)∂Hj(y|x)
∂xk
= EΓ
[(x+ a)⊤∇xσ(x,y
′) | σ(x,y′) = 0, y′j ≤ y] ∂Kj(y, 0|x)
∂s
− EΓ
[(x+ a)⊤∇xσ(x,y
′) | σ(x,y′) = 0]Hj(y|x)gσ|x(0)
+
∫ +∞
0
2λ1fσ|x(s)
δ + λ1Fσ|x(s)× ∂Kj(y, s|x)
∂s
×
EΓ
[(x+ a)⊤∇xσ(x,y
′) | σ(x,y′) = s, y′j ≤ y]− EΓ
[(x+ a)⊤∇xσ(x,y
′) | σ(x,y′) = s]
ds.
But then linearity in (x+ a) of the flow surplus function σ implies that (x+ a)⊤∇xσ(x,y′) = σ(x,y′).
Substitution into the latter equation yields:
(x+ a)⊤∇Hj(y|x)
= EΓ
[σ(x,y′) | σ(x,y′) = 0, y′j ≤ y
]× ∂Kj(y, 0|x)
∂s− EΓ [σ(x,y
′) | σ(x,y′) = 0]Hj(y|x)gσ|x(0)
+
∫ +∞
0
2λ1fσ|x(s)
δ + λ1Fσ|x(s)×∂Kj(y, s|x)
∂s×
EΓ
[σ(x,y′) | σ(x,y′) = s, y′j ≤ y
]−EΓ [σ(x,y
′) | σ(x,y′) = s]
ds,
all terms of which are equal to zero.
Note that the proof above is virtually unchanged if, instead of assuming that σ(·) is linear in (x+a),
58
one only assumes that it is homogeneous in (x+ a). In that case, (x+ a)⊤∇xσ(x,y) = ασ(x,y), where
α is the degree of homogeneity (a constant), and the proof goes through as above.
D.5 Proof of Theorem 4
We first note that similar steps as were taken in the proof of decomposition (DC) can be used to establish
the following result about the way in which the marginal density of the jth job attribute in the population
of firm-worker matches, hj(y|x) = ∂Hj(y|x)/∂y, responds to a change in the kth worker attribute:
∂hj(y|x)∂xk
= gσ|x(0)
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0, y′j = y
]PrΓ
y′j = y|σ(x,y′) = 0
− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = 0
]hj(y|x)
+
∫ +∞
0
2λ1fσ|x(s)
δ + λ1Fσ|x(s)× ∂2Kj(y, s|x)
∂y∂s
×
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = y
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds.
where Kj is again given by (15). Assuming, as is done in the theorem, that the UE margin is shut down
(σ(x,y) > 0 for all y), implies that the first two terms in the equation above equal zero. Next consider:
(x+ a)⊤Q∂xkE(y|x) =
X∑i=1
Y∑j=1
(xi + ai)qij
∫ yj
yj
y′j∂hj(y|x)
∂xkdy′j
for all k ∈ 1, · · · , X, where ∂xkE(y|x) = (∂E(y1|x)/∂xk, · · · , ∂E(yY |x)/∂xk)
⊤. Using the expression
for ∂hj(y|x)/∂xk derived above, the definition of the bilinear flow surplus function from Assumption 1,
and the definition of Kj(y, s|x) from equation (15), we have that
(x+ a)⊤Q∂xkE(y|x) =
X∑i=1
Y∑j=1
(xi + ai)qij
∫ yj
yj
y′j
∫ +∞
0
2λ1fσ|x(s)
δ + λ1Fσ|x(s)× ∂2Kj(y, s|x)
∂y∂s
×
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = yj
]− EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]dsdy′j
=
X∑i=1
Y∑j=1
(xi + ai)qijδ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫ +∞
0
2λ1fσ|x(s)[δ + λ1Fσ|x(s)
]3×∫ yj
yj
∫1 σ(x,y′) = s1
y′j = yj
y′j
∂σ(x,y′)
∂xkγ(y′)dy′
−∫
1 σ(x,y′) = s1y′j = yj
y′jγ(y
′)dy′ × EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]dy′jds
59
=
X∑i=1
Y∑j=1
(xi + ai)qijδ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫ +∞
0
2λ1fσ|x(s)[δ + λ1Fσ|x(s)
]3×
∫1 σ(x,y′) = s y′j
∂σ(x,y′)
∂xkγ(y′)dy′
−∫
1 σ(x,y′) = s y′jγ(y′)dy′ × EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds
=δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫ +∞
0
2λ1fσ|x(s)[δ + λ1Fσ|x(s)
]3×
∫1 σ(x,y′) = s
X∑i=1
Y∑j=1
(xi + ai)qijy′j
∂σ(x,y′)
∂xkγ(y′)dy′
−∫
1 σ(x,y′) = sX∑i=1
Y∑j=1
(xi + ai)qijy′jγ(y
′)dy′ × EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds
=δ[δ + λ1Fσ|x(0)
]Fσ|x(0)
×∫ +∞
0
2λ1fσ|x(s)[δ + λ1Fσ|x(s)
]3 ×(s+ (x+ a)⊤Qb
)×
∫1 σ(x,y′) = s ∂σ(x,y′)
∂xkγ(y′)dy′
−∫
1 σ(x,y′) = s γ(y′)dy′ × EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s
]ds = 0,
as the term in curly brackets inside the integral is equal to zero.
D.6 Proof of Corollary 2
For the bilinear technology with qj(x) ≥ 0 for all j, vector (x+a)⊤Q has only positive elements. Theorem
4 then implies that if sorting is positive in (yj , xk) for all j ∈ 1, · · · , Y − 1, implying ∂E(yj |x)∂xk
> 0 for
all j ∈ 1, · · · , Y − 1, then it must be that ∂E(yY |x)∂xk
< 0. Since ∂E(yY |x)∂xk
> 0 is necessary for HY (y|x)∂xk
< 0
to hold for all values of yY (i.e. necessary for PAM in dimension (yY , xk)), it must be that HY (y|x)∂xk
> 0
on some set of values yY of positive measure and thus there cannot be PAM in dimension (yY , xk).
D.7 Proof of Theorem 5
The objective is to find conditions for PAM in dimension (yj , xk). Based on Lemma 1 (Appendix D.1) this
means we want to specify conditions under which EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = y
]is increasing in y.
Let y−Y = (y1, · · · , yY−1) denote the (Y − 1)-dimensional vector formed of all components of y
except yY . Note that Y−Y ∈×Y−1
j=1
[yj, yj
]. Fix any x ∈ X and any s ≥ 0, and consider the equation
σ (x, (y−Y , yY )) = s ⇔ f (x, (y−Y , yY )) = b(x) + s (20)
60
Then strict monotonicity of yY 7→ f (x, (y−Y , yY )) (Assumption 2a) guarantees that at most one value
of yY ∈[yY, yY
]solves (20). In turn, assumption 2b ensures that there always exists a unique yY ∈[
yY, yY
]that solves (20). The equation σ(x,y) = s is therefore equivalent to yY = R(s,y−Y ), where
R(s, ·) is a well-defined function over×Y−1
j=1
[yj, yj
].41 Application of the Implicit Function Theorem
further implies that R(·) is differentiable over its domain, with, for all j ∈ 1, · · · , Y − 1:
∂R(S,y−Y )
∂yj= − ∂p/∂yj
∂p/∂yY(x, (y−Y , R(s,y−Y ))) (21)
and:∂R(s,y−Y )
∂s=
1
∂p/∂yY(x, (y−Y , R(s,y−Y ))) (22)
We next notice that, for j ∈ 1, · · · , Y − 1:
d
dyj
(∂σ (x, (y−Y , R(s,y−Y )))
∂xk
)=
∂2p
∂xk∂yj− ∂p/∂yj
∂p/∂yY× ∂2p
∂xk∂yY(23)
where (21) was used and where the arguments of p (x, (y−Y , R(s,y−Y ))) were omitted in the r.h.s. to
reduce notational clutter. Condition (SC-YD) in the theorem then implies, together with (21) that
∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is increasing in all elements of y−Y ∈×Y−1
j=1
[yj, yj
].
Our objective is to exhibit conditions under which condition (CMP) in Lemma 1 holds. Fixing ℓ = Y ,
notice that (CMP) can be expressed as
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′ℓ = y
]= Eµs,x
(∂σ(x,(y′−Y , R(s,y′
−Y )))
∂xk| y′ℓ = y
).
where µs,x(y−Y ) is the joint sampling density of y−Y conditional on x and on σ(x,y) = s:
µs,x(y−Y ) =γ (y−Y , R(s,y−Y )) ∂R(s,y−Y )/∂s∫
γ(y′−Y , R(s,y′
−Y ))∂R(s,y′
−Y )/∂s dy′−Y
with ∂R(s,y−Y )/∂s given in (22). We will now derive conditions under which
y 7→ Eµs,x
(∂σ(x,(y′−Y , R(s,y′
−Y )))
∂xk| y′ℓ = y
)(24)
is increasing in y and proceed in two steps.
First, we show that the support of µs,x is a lattice. The equation σ(x,y) = s has one unique solution
yY = R(s,y−Y ) ∈[yY, yY
]for all y−Y ∈×Y−1
j=1
[yj, yj
]. Thus, for any two points (y′
−Y ,y′′−Y ) ∈(
×Y−1
j=1
[yj, yj
])2, R(s,y′
−Y ∧y′′−Y ) and R(s,y′
−Y ∨y′′−Y ) both exist and are both elements of
[yY, yY
],
41The dependence of R(·) on x, which is fixed for this proof, is omitted.
61
since both y′−Y ∧ y′′
−Y and y′−Y ∨ y′′
−Y are elements of×Y−1
j=1
[yj, yj
]. This establishes that
(y′−Y ∧ y′′
−Y , R(s,y′−Y ∧ y′′
−Y ))∈
Y×j=1
[yj, yj
](y′−Y ∨ y′′
−Y , R(s,y′−Y ∨ y′′
−Y ))∈
Y×j=1
[yj, yj
],
so that Suppµs,x is a lattice, implying that µs,x
(y′−Y ∧ y′′
−Y
)and µs,x
(y′−Y ∨ y′′
−Y
)are strictly positive.
Second, given that ∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is increasing in all elements of y−Y ∈×Y−1
j=1
[yj, yj
]and given that Suppµs,x is a lattice (both established above), a sufficient condition for (24) to be an
increasing function of y is that the density µs,x be such that (the proof is an adaptation of Theorem 4.1
in Karlin and Rinott, 1980):
∀(i, j) ∈ 1, · · · , Y − 12 : i = j,∂2 lnµs,x
∂yi∂yj≥ 0
which translates into condition (EE-YD):
∂p
∂yY
[(∂p
∂yY
)2∂2 ln γ
∂yi∂yj+
∂p
∂yi
∂p
∂yj
∂2 ln γ
∂y2Y− ∂p
∂yj
∂p
∂yY
∂2 ln γ
∂yi∂yY− ∂p
∂yi
∂p
∂yY
∂2 ln γ
∂yj∂yY
]
− ∂ ln γ
∂yY
[(∂p
∂yY
)2∂2p
∂yi∂yj+
∂p
∂yi
∂p
∂yj
∂2p
∂y2Y− ∂p
∂yj
∂p
∂yY
∂2p
∂yi∂yY− ∂p
∂yi
∂p
∂yY
∂2p
∂yj∂yY
]
−(
∂p
∂yY
)2∂3p
∂yi∂yj∂yY− ∂p
∂yi
∂p
∂yj
∂3p
∂y3Y+
∂p
∂yi
∂p
∂yY
∂3p
∂yj∂y2Y+
∂p
∂yj
∂p
∂yY
∂3p
∂yi∂y2Y
+∂p
∂yY
[∂2p
∂yj∂yY
∂2p
∂yi∂yY− ∂2p
∂y2Y
∂2p
∂yi∂yj
]≥ 0.
Remark: The proof for sufficient conditions for NAM is similar. In this case, we need to find conditions
under which EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = y
]is decreasing in y (see Lemma 1), or, equivalently, under
which EΓ
[−∂σ(x,y′)
∂xk| σ(x,y′) = s, y′j = y
]is increasing in y. To apply the results from Karlin and
Rinott (1980) as in the proof for PAM, we thus need a condition under which (using the same change
of variable as above) −∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is increasing in all y−Y or, equivalently, under
which ∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is decreasing in all yj . This is the case if (23) is negative for all
j ∈ 1, · · · , Y − 1, i.e. if the generalized signal crossing condition (SC-YD) reverses its sign for all
j ∈ 1, · · · , Y − 1
∂
∂xk
(∂p(x,y)/∂yj∂p(x,y)/∂yY
)=
∂2p
∂xk∂yj− ∂p/∂yj
∂p/∂yY× ∂2p
∂xk∂yY≤ 0. (25)
62
This is the only change to Theorem 5 if we want to provide sufficient conditions for NAM. If (25) holds
along with the remaining conditions from the theorem, then sorting is NAM in dimensions (yj , xk),
j ∈ 1, · · · , Y − 1, on the EE margin.
D.8 Proof of Corollary 3
When Y = 2, equation (20) writes as p (x, (y1, y2)) = b(x)+s. Strict monotonicity of y2 7→ p (x, (y1, y2))
still guarantees that at most one value of y2 ∈[y2, y2
]solves (20). Moreover, the set of y1 such that
(20) has one solution is an interval (possibly empty). To see this, suppose there exist two distinct
values y′1 < y′′1 such that (20) has a solution. Denote these solutions by y′2 and y′′2 , respectively. Con-
sider a number t ∈ (0, 1) and define y1(t) = ty′1 + (1 − t)y′′1 . Quasi-concavity of p(x, ·) implies that
p (x, (y1(t), ty′2 + (1− t)y′′2 )) ≥ b(x) + s. Moreover, by assumption, miny2∈R p (x, (y1(t), y2)) < b(x) ≤
b(x)+ s. By continuity, (20) has a solution R (s, y1(t)) ≤ ty′2 +(1− t)y′′2 when y1 = y1(t). Note that this
solution can be smaller than y2: in this case, γ (y1(t), R (s, y1(t))) = 0.
Denote the interval of values of y1 for which (20) has one solution by I1(s). Application of the
Implicit Function Theorem (as in the proof of Theorem 5 above) then implies that equation (23) holds
for all y ∈ I1(s):d
dy1
(∂σ (x, (y,R(s, y)))
∂xk
)=
∂2p
∂xk∂y1− ∂p/∂y1
∂p/∂y2× ∂2p
∂xk∂y2
which is positive by the single-crossing condition in Corollary 3.
Consider s ≥ 0 and two values (y′, y′′) ∈[y1, y1
]2such that PrΓy1 = y′|σ(x,y) = s > 0 and
PrΓy1 = y′′|σ(x,y) = s > 0, which implies in particular that y′ and y′′ are both in I1(s). Assume
w.l.o.g. that y′′ > y′. Then, by the results derived above:
EΓ
[∂σ(x,y)
∂xk| σ(x,y) = s, y1 = y′′
]− EΓ
[∂σ(x,y)
∂xk| σ(x,y) = s, y1 = y′
]=
∂σ (x, (y′′, R(s, y′′)))
∂xk− ∂σ (x, (y′, R(s, y′)))
∂xk> 0.
This proves that Condition (CMP) holds for monotone production functions and Y = 2, i.e. there is
PAM along (y1, xk).
Note in passing that Condition (EE-YD) from Theorem 5 becomes irrelevant in the two-dimensional
case Y = 2, which obviates the need to impose conditions on the behavior of p over the support of γ: Con-
dition 2b in Theorem 5 is replaced by the twofold requirement of quasi-concavity and miny2∈R p(x,y) < b (x).
D.9 Proof of Corollary 4
Theorem 5 also nests the case of a bilinear technology from Corollary 4. The bilinear technology is C3
(Condition 1 from Theorem 5). The condition qY (x) > 0 (imposed in Assumption 1) ensures that p(x,y)
is strictly increasing in yY (Condition 2a from Theorem 5). Moreover, with bilinear technology, we can
63
explicitly solve the equation σ(x,y) = s, thus circumventing the need to appeal to the Implicit Function
Theorem. Together, the requirements that yY = +∞ and limyY →yYp(x,y) < b(x) parallel Condition
2b from Theorem 5 and ensure that suppµs,x is a lattice so that we can apply Theorem 4.1 in Karlin
and Rinott (1980) to derive conditions under which (24) is increasing in y. Next, Condition 2 in the
corollary is the generalized single crossing condition (which echoes Condition 2c in Theorem 5). Finally,
Condition (EE-YD’) in point 3 of the corollary is a rewrite of Condition (EE-YD) in Theorem 5 in the
case of a bilinear production function: bilinearity implies that the last three lines in (EE-YD) vanish (as
all the partial derivatives of order greater than one are zero in this case) and that the derivatives of the
flow surplus (or of the production) function can be expressed explicitly.
D.10 Proof of Theorem 6
We now prove Theorem 6 (note that it covers Y = 2 in Theorem 2 as a special case). We have to sign
the first term in decomposition (DC) (which, as explained in the main text, reflects sorting along the UE
margin) which is spelled out in (18). By the same arguments as were used in the proofs of Theorems 1
and 5, the first term in curly brackets in (18) is negative under Condition (EE-YD’). We thus focus on
the second term of (18). We proceed similarly as in the proof of Theorem 2.
First, we seek conditions under which PrΓ y′ℓ ≤ y|σ(x,y′) = s is decreasing in s for any fixed
ℓ ∈ 1, · · · , Y − 1. In particular, we want to derive conditions under which, for any sH > sL ≥ 0,
∫ y
yℓ
∫γ(y′ℓ,y
′−(ℓ,Y ),
sHqY (x) + bY −
∑Y−1j=1
qj(x)qY (x) (y
′j − bj)
)dy′
−(ℓ,Y )dy′ℓ∫ yℓ
yℓ
∫γ(y′ℓ,y
′−(ℓ,Y ),
sHqY (x) + bY −
∑Y−1j=1
qj(x)qY (x) (y
′j − bj)
)dy′
−(ℓ,Y )dy′ℓ
≤
∫ y
yℓ
∫γ(y′ℓ,y
′−(ℓ,Y ),
sLqY (x) + bY −
∑Y−1j=1
qj(x)qY (x) (y
′j − bj)
)dy′
−(ℓ,Y )dy′ℓ∫ yℓ
yℓ
∫γ(y′ℓ,y
′−(ℓ,Y ),
sLqY (x) + bY −
∑Y−1j=1
qj(x)qY (x) (y
′j − bj)
)dy′
−(ℓ,Y )dy′ℓ
where y′−(ℓ,Y ) =
(y′1, · · · , y′ℓ−1, y
′ℓ+1, · · · , y′Y−1
). Defining
g(y, s) =
∫ y
yℓ
∫γ
y′ℓ,y′−(ℓ,Y ),
s
qY (x)+ bY −
Y−1∑j=1
qj(x)
qY (x)(y′j − bj)
dy′−(ℓ,Y )dy
′ℓ
and rearranging the previous inequality gives g (yℓ, sL) g(y, sH) ≤ g(y, sL)g (yℓ, sL). Since y ≤ yℓ, this
inequality holds if g is log-supermodular in (y, s). To show when this is the case, define the joint
distribution of y−Y and s (conditional on x) as
µx(y−Y , s) = γ
y−Y,s
qY (x)+ bY −
Y−1∑j=1
qj(x)
qY (x)(yj − bj)
and rewrite g(y, s) =
∫1y′ℓ < yµx
(y′−Y , s
)dy−Y
′. Note that
64
1. the support of µx(y−Y , s) is a lattice;42
2. the joint distribution µx(y−Y , s) is log-supermodular in (y−Y , s) if it is log-supermodular in all
pairs of arguments. This is the case if:
∀j = 1, ..., Y − 1 : qY (x)∂2 ln γ
∂yY ∂yj− qj(x)
∂2 ln γ
∂y2Y≥ 0 (26)
∀(i, j) ∈ 1, · · · , Y − 12, i = j :
qY (x)2 ∂
2 ln γ
∂yi∂yj+ qi(x)qj(x)
∂2 ln γ
∂y2Y− qj(x)qY (x)
∂2 ln γ
∂yi∂yY− qi(x)qY (x)
∂2 ln γ
∂yj∂yY≥ 0 (27)
3. the indicator function, 1y′ℓ < y, is log-supermodular in (y, y′ℓ).
Therefore, the product 1y′ℓ < yµx(y′−Y , s) is log-supermodular in (y,y′
−Y , s) since the product of log-
supermodular functions is log-supermodular). This implies in turn that g(·) is log-supermodular in (y, s)
since log-supermodularity is preserved under integration.
We have thus proven that if (26) (stated as condition (UE-YD) in Theorem 6) and (27) (stated as
condition (EE-YD’) in Corollary 4) hold, then PrΓ y′ℓ ≤ y|σ(x,y′) = s is decreasing in s.
Second, we will provide a condition guaranteeing that EΓ [∂σ(x,y′)/∂xk | σ(x,y′) = 0] ≤ 0. First
note that σ(x,y) = 0 is equivalent to yY = bY − 1qY (x)
∑Y−1j=1 qj(x)(yj − bj). Thus:
σ(x,y) = 0 =⇒ ∂σ(x,y)
∂xk=
Y−1∑j=1
qkjqY (x)− qkY qj(x)
qY (x)(yj − bj).
Therefore, a sufficient condition for EΓ [∂σ(x,y′)/∂xk | σ(x,y′) = 0] ≤ 0 is that the value of the linear
program
maxy
Y−1∑j=1
qkjqY (x)− qkY qj(x)
qY (x)(yj − bj)
subject to:1
qY (x)
Y−1∑j=1
qj(x)(yj − bj) ≤ bY − yY
yj≤ yj j = 1, · · · , Y − 1
be negative. This is a sufficient condition which ensures that ∂σ(x,y)/∂xk ≤ 0 over the entire set of
42This follows from the proof of Theorem 5 with one extra step to deal with the joint distribution of (y−Y , s)(instead of the conditional distribution of y−Y given s). Note that we have proven that yY = R(s,y−Y ) ∈[yY, yY
]for any s ≥ 0. Hence, for two points s ≥ 0 and s′ ≥ 0, s ∧ s′ ≥ 0 and s ∨ s′ ≥ 0 and
(y−Y ∧ y′
−Y , R(s ∧ s′,y−Y ∧ y′−Y )
)∈
Y×j=1
[yj, yj
]and
(y−Y ∨ y′
−Y , R(s ∨ s′,y−Y ∨ y′−Y )
)∈
Y×j=1
[yj, yj
].
Hence the support of µx is a lattice and µx (y−Y ∧ y′−Y , s ∧ s′) and µx (y−Y ∨ y′
−Y , s ∨ s′) are strictly positive.This is why we can take lnµx in the next step.
65
y’s such that σ(x,y) = 0. Although strong, this condition is the minimal ‘distribution-free’ one. This
program can be rewritten in standard form as:
Y−1∑j=1
qkjqY (x)− qkY qj(x)
qY (x)
(yj− bj
)+max
Y
Y−1∑j=1
qkjqY (x)− qkY qj(x)
qY (x)Yj (28)
subject to:Y−1∑j=1
qj(x)Yj ≤ −σ(x,y
)0 ≤ Yj j = 1, · · · , Y − 1
where Yj = yj−yj. The first thing to note about program (28) is that if there exists a j ∈ 1, · · · , Y −1
such that qj(x) < 0, then (28) is clearly unbounded: in that case, one can set Yj → +∞ whenever
qj(x) < 0 and Yj′ = 0 when qj′(x) ≥ 0, which satisfies the constraints and gives (28) an infinite value.
We thus assume that qj(x) ≥ 0 for all j ∈ 1, · · · , Y − 1.
The dual of (28) is simply:
Y−1∑j=1
qkjqY (x)− qkY qj(x)
qY (x)
(yj− bj
)+min
Z
⟨−Zσ
(x,y
)⟩(29)
subject to: qj(x)Z ≥ qkjqY (x)− qkY qj(x)
qY (x), j = 1, · · · , Y − 1
Z ≥ 0
Assuming that σ(x,y
)< 0, the solution to the latter program is then to set:
Z = maxj′
qkj′
qj′(x)
− qkY
qY (x)
and the value of the linear program (28) (which equals that of its dual) is:
Y−1∑j=1
qkjqY (x)− qkY qj(x)
qY (x)
(yj− bj
)−
Y∑j=1
[maxj′
qkj′
qj′(x)
− qkY
qY (x)
]qj(x)
(yj− bj
)
=
Y∑j=1
[qkjqj(x)
−maxj′
qkj′
qj′(x)
]qj(x)
(yj− bj
). (30)
The requirement that this be negative yields the condition in the theorem.
D.11 Proof of Theorem 7
Under the separability assumption 2 in the Theorem, p(x,y) = p1(x, y1)+p2(x−k,y) where x−k includes
all components of x but xk. Hence:
∂σ(x,y)
∂xk=
∂p1(x, y1)
∂xk− ∂b(x)
∂xk
66
which only depends on the first job attribute y1. Then:
EΓ
[∂σ(x,y′)
∂xk| σ(x,y′) = s, y′1 = y
]=
∂p1(x, y)
∂xk− ∂b(x)
∂xk
which is increasing in y under Assumption 3. Condition (CMP) thus holds, hence the result.
67