multidimensional sorting under random search · dimensional sorting in a frictionless assignment...

Multidimensional Sorting

Under Random Search∗

Ilse Lindenlaub† Fabien Postel-Vinay‡

July 2017

Abstract

We analyze sorting in a standard market environment with search frictions and randomsearch, where both workers and jobs have multi-dimensional characteristics. We offer def-initions of multi-dimensional positive and negative assortative matching in this frictionalenvironment. We say that matching is positive assortative in dimension (j, k) if workerswith higher endowments in skill dimension k are matched to a distribution of jobs withhigher values of job attribute j, in the sense of first-order stochastic dominance, and simi-larly for negative assortative matching. We then provide conditions on the primitives of theeconomy (technology and distributions) under which sorting arises in equilibrium. An essen-tial condition for positive sorting is a single-crossing property on the technology, althoughin general further conditions on type distributions are needed. Guided by our theoreticalframework, we develop an empirical test of the standard assumption that heterogeneity inthe data is one-dimensional. We conduct simulation exercises (i) to show that the test cor-rectly reveals misspecification and (ii) to quantify the errors in assessing sorting, mismatchand policy caused by the restriction to one-dimensional heterogeneity when our test indicatesthat said restriction is not valid.

Keywords. Multidimensional Heterogeneity, Random Search, Sorting, Assortative Matching

∗We would like to thank Hector Chade and Jan Eeckhout for useful comments, as well as Carlos Carrillo-Tudela and Jim Albrecht for their insightful discussions of this paper at the 2016 Cowles Foundation MacroConference and the 2016 Konstanz Search and Matching Conference. We further thank seminar audiences atEssex, Yale, the Chicago Fed, Einaudi, Konstanz, the NBER Summer Institute, MIT, Simon Fraser University,McGill, EUI, and at the SED 2017 for their comments. All remaining errors are ours.

†Yale University, Address: Department of Economics, Yale University, 28 Hillhouse Avenue, New Haven,06520, US. Email: [email protected]

‡UCL and IFS. Address: Department of Economics, University College London, Drayton House, 30 GordonStreet, London WC1H 0AX, UK. Email: [email protected]

1 Introduction

Labor market frictions impede the efficient assignment of heterogeneous workers to heterogeneous

jobs and cause mismatch. Measuring mismatch in frictional markets — its nature, its extent,

and its implications for efficiency, inequality and welfare — is the focus of a growing body of

literature in structural labor economics. With very few exceptions, that literature works under

the assumption that job and worker heterogeneity can both be captured by scalar indexes, i.e.

heterogeneity is one-dimensional.1 While this restriction to one-dimensional heterogeneity is

a natural starting point and convenient for modeling, it is at odds with the fact that typical

data sets describe both workers and jobs in terms of many different productive attributes (e.g.

cognitive skills, manual skills, or psychometric test scores for workers, and numerous task-specific

skill requirements for jobs).2

If the productive characteristics of workers and firms are truly multi-dimensional, what are

the features of the data, relating to sorting and mismatch, that we miss by modeling them as

one-dimensional scalars? This is the question we ask in this paper.

We proceed in two steps. First, we develop a theory of the assignment of workers to jobs

when both workers and jobs differ along several dimensions in a random search environment.

We propose a notion of assortative matching (or sorting) in this setting and analyze how the

economy’s primitives shape equilibrium sorting. Second, we use our theory to develop an empir-

ical test to assess whether the assumption of one-dimensional (1D) heterogeneity in the random

search framework is valid. We show on simulated data that the test accurately reveals misspec-

ification of the one-dimensional model. We then highlight some consequences of approximating

job and worker characteristics by 1D summary indexes when it is not justified, i.e. when the test

indicates that the data is multi-dimensional. We show that using the misspecified 1D model to

measure sorting and mismatch (as opposed to the multi-dimensional one) leads to quantitative

and qualitative errors as well as misguided policy recommendations.

We now describe the two steps of our analysis — theory and application — in more detail.

We begin with developing a theoretical framework of multi-dimensional sorting under random

search. Our environment is a standard random search model, except that workers and firms

(also referred to as jobs) are endowed with vectors of productive attributes, x = (x1, · · · , xX)

1A recent exception is Lise and Postel-Vinay (2015), who focus on the accumulation of skills in various dimen-sions within a model that can otherwise be seen as a special case of the theoretical framework we develop here.

2Beyond search models, a growing applied literature takes explicit account of these multiple dimensions ofproductive heterogeneity. Recent examples include Yamaguchi (2012), Sanders (2012), and Guvenen et al. (2015).

1

for workers and y = (y1, · · · , yY ) for jobs. Employed and unemployed workers receive job offers

drawn at random from an exogenous sampling distribution of job attributes. Employers face no

capacity constraint. Utility is transferable: workers and firms are joint surplus maximizers. The

fact that agents base their match acceptance decisions on a scalar value (i.e. the match surplus

that summarizes all underlying multi-dimensional heterogeneity) makes our multi-dimensional

problem tractable.

In order to interpret the assignment patterns that arise in equilibrium, we define a notion

of positive assortative matching (PAM) and negative assortative matching (NAM) in this en-

vironment. This notion is based on first-order stochastic dominance (FOSD) ordering of the

marginal distributions of job attributes across workers with different skills: if workers with a

higher endowment of some skill xk are matched to jobs with ‘better’ (in the first-order stochastic

dominance sense) attributes yj , then PAM occurs in dimension (xk, yj). For instance, if workers

with more cognitive skills are matched to a distribution of jobs whose cognitive contents stochas-

tically dominates that of less cognitively skilled workers, then we call this sorting pattern ‘PAM

between cognitive skills and cognitive job contents’. It is important to note that sorting is thus

defined dimension by dimension, meaning that PAM can arise in one particular dimension (here

in the ‘cognitive’ dimension) while NAM occurs in another.

Using this definition of sorting, we present three main sets of theoretical results. The first one

is about the sign of sorting. We provide conditions on the economy’s primitives under which pos-

itive or negative sorting arises in equilibrium. For ease of exposition and clarity of interpretation,

we focus in the main body of the paper on a baseline model: it features two-dimensional het-

erogeneity on the job side, bilinear technology, wage setting by sequential auctions, and positive

surplus of any possible match — all assumptions that we can and do relax. In this baseline case,

we find that matching in, say, dimension (x1, y1) is positive assortative if and only if the technol-

ogy satisfies a single crossing condition implying that the complementarity between worker skill

x1 and job attribute y1 dominates the complementarity in the competing dimension (x1, y2).

This condition is distribution-free: it only involves restrictions on the production technology.

We generalize this analysis of the sign of sorting to cases where (1) not all possible matches

generate positive surplus (implying that there also is sorting on the unemployment-to-employment

margin), (2) heterogeneity on the job side is of dimension higher than two, and (3) the technol-

ogy is monotone in at least one job attribute but not necessarily bilinear or it is non-monotone

but separable. We show that in these more general environments the conditions for sorting not

2

only involve single-crossing of the technology but also interactions between the technology and

sampling distribution of jobs. We also show that these results do not hinge on the assumed se-

quential auctions wage setting but hold for several other commonly used wage setting protocols

like Nash bargaining, sequential auctions with worker bargaining power, and wage posting.

Second, our model predicts sorting based on specialization rather than on absolute advantage.

This implies that uniformly more skilled workers do not sort into jobs with uniformly higher

skill requirements. Rather, they sort into jobs with a higher requirement for the skill in which

they are relatively strong, possibly at the cost of a lower requirement for the other skills.

Our third set of results highlights that sorting generally cannot be positive between all skill

and job dimensions. Instead, there are sorting trade-offs. We provide conditions under which

PAM arises in all dimensions except the one that is characterized by the weakest complemen-

tarities in production, where forces push towards NAM.

An important insight from our analysis is that multi-dimensional heterogeneity per se is a

cause of sorting. That is, in the 1D version of this model, where match surplus depends on

scalar worker type x and job type y and is increasing in y, workers all rank jobs in the same

way (there is a single job ladder), regardless of their skill x: their common strategy is to accept

any job with a higher y than their current one, which rules out sorting. By contrast, in the

multi-dimensional world where workers have skill bundles, what matters is not just to match

with a productive job in any component of y. Rather, a worker wants to match with a job that

puts much weight on the skill in which he is strong. Thus, workers with different skill bundles

(and hence different strengths) accept and reject different types of jobs, they climb different job

ladders, which is why sorting arises.

In the second part of our analysis, we show that our theory of multi-dimensional sorting has

important implications for applied work. Based on the theoretical insight that multi-dimensional

job ladders are heterogenous across workers, we design a test of the assumption that job and

worker heterogeneity can be captured by scalar indices (i.e. the single index assumption). If

the data is well-approximated by 1D worker and job types, any two workers with the same

(scalar) type should face the same job ladder. Conversely, if the single index assumption is

not justified, approximating different multi-dimensional worker types by the same 1D worker

type will produce job ladder heterogeneity within 1D worker types, revealing misspecification.

To assess the accuracy of this test, we simulate data from several two-dimensional models that

comply with our theory and fit misspecified 1D models to those data. We show that the test

3

correctly reveals misspecification of the 1D model when the data is multi-dimensional.

We then use our simulations to assess the potential errors in the analysis of sorting and

mismatch caused by imposing the single-index assumption when it is not justified (i.e. when our

test indicates that this restriction is violated). We find that the misspecified model is largely

uninformative on true sorting patterns (it predicts sorting when the true data features none and

vice versa). Perhaps most importantly, a misspecified 1D model suggests a first-best allocation

that is very different from the true first-best under multi-dimensional types. Implementing the

first-best allocation suggested by the misspecified 1D model can cause sizable welfare losses.

While much is known about sorting under one-dimensional heterogeneity without frictions

(Becker, 1973; Legros and Newman, 2007) and one-dimensional heterogeneity with frictions

(Shimer and Smith, 2000; Smith, 2006; Eeckhout and Kircher, 2010), little is known about

sorting on multi -dimensional types. One exception is Lindenlaub (2017) who studies multi-

dimensional sorting in a frictionless assignment game.3 To the best of our knowledge, this

paper is the first to develop a theory of multi -dimensional sorting under random search —

an environment of great importance for applied work since it carries a well-defined notion of

mismatch and allows for policy analysis.

Our work not only shifts the focus to multi-dimensional heterogeneity but also differs in

another important way from that theoretical literature on sorting. All of the aforementioned

papers analyze assignment problems, meaning that agents on either side of the market can be

matched with at most one agent from the other side (matching is one to one). While this

assumption is certainly appropriate in partnership models of the marriage market, it is much

less common in analyses of the labor market, where a firm usually employs many workers. In

this paper we assume, as is almost invariably done in the applied structural search literature,

that firms operate technologies with constant returns to scale and face no capacity constraints.4

The no-capacity-constraint assumption greatly increases the tractability of structural search

models.5 Its cost, though, is that it tends to rule out sorting when job and worker heterogeneity3Our restriction on the technology to obtain sorting is most closely related (but not equivalent) to that

needed in the multi-dimensional, frictionless assignment problem by Lindenlaub (2017), who also has to disciplinecomplementarities across tasks to ensure PAM within tasks. But compared to Lindenlaub, the introduction ofrandom search and no-capacity constraint changes the technical nature of the problem (for instance, we cannotrely on optimal transport theory here) as well as the insights (in our framework with frictions, we can analyzesorting on the UE and EE margin, heterogenous job ladders, mismatch and policy).

4Examples of widely-used models without firm capacity constraint are Burdett and Mortensen (1998) andPostel-Vinay and Robin (2002).

5For instance, it is instrumental in circumventing a complex existence proof involving a fixed point problem inthe distribution of the unmatched agents that is common in one-to-one assignment models under search frictions(see Shimer and Smith, 2000).

4

are one-dimensional. As discussed above, if technology is monotone in firm type, workers share a

common ranking of firms and all seek to climb a single, economy-wide job ladder. The lack of ca-

pacity constraint on the firm side then implies that all workers match with the same distribution

of jobs in equilibrium. This result is independent of the complementarities in production.

Getting the standard structural search framework without firm capacity constraint to pro-

duce a meaningful notion of equilibrium sorting is not straightforward.6 In this paper, we extend

that framework by allowing for multi-dimensional heterogeneity of jobs and workers. This change

drastically alters the model’s predictions. Multi-dimensional skill bundles cause different work-

ers to rank jobs differently and face different job ladders, which is why sorting arises. Our model

and definition of multi-dimensional sorting can guide the measurement of sorting in the data.

It can be used to study efficiency losses due to mismatch or the role of sorting and mismatch in

wage inequality — topics that are of great importance from an applied point of view.

The rest of the paper is organized as follows: Section 2 illustrates our main theoretical

insights via a simple example. Section 3 introduces our model. Section 4 provides a definition

of sorting in multiple dimensions under random search. Section 5 contains our main results

on the sign of sorting, established within our baseline model with bilinear technology and two-

dimensional job attributes. Generalizations are discussed in Section 6 (and presented formally

in Appendix A). Section 7 investigates sorting based on absolute advantage vs specialization

and the interdependence of sorting patterns across different heterogeneity dimensions. Section 8

contains our application, including our test of the single index restriction. Section 9 concludes.

2 An Illustrative Example

We begin by illustrating our main theoretical insights using a simple example, the foundations

of which will be explored later as we introduce our full model. Consider a labor market where

workers are characterized by a two-dimensional skill vector x = (x1, x2) ∈ R2++ (e.g. “cognitive”

and “manual” skills). Workers can either be employed or unemployed. Either way, they face

search frictions in that they sample job offers randomly and sequentially. Jobs are also charac-

terized by a two-dimensional vector of attributes y = (y1, y2) ∈ R2++ describing their contents

6Bagger and Lentz (2016), who build on Lentz (2010), include endogenous search intensity into an otherwisestandard 1D sequential auction model. Under complementarities in production, high worker types have moreto gain from being with high firm types, hence search more intensively, and therefore end up in better firms inequilibrium. Thus, their model features sorting due to endogenous search intensity. Another way to introducesorting is to abandon the assumption that technology is monotone in firm type.

5

in terms of skill dimensions 1 and 2 (e.g. “cognitive” and “manual” skill requirements). A match

between a worker with skills x and a job with attributes y generates a surplus σ(x,y).

We also assume that jobs and workers are joint surplus maximizers. Hence, a meeting

between a type-x unemployed worker and a type-y job will result in a match if and only if

σ(x,y) ≥ 0. And a meeting between a type-x worker, employed in job y, and an alternative

type-y′ job will result in the worker accepting the type-y′ job if and only if σ(x,y′) > σ(x,y).

For illustration, we assume that the match surplus function is bilinear:

σ(x,y) = 2x1y1 + x2y2

(It will become clear in our baseline model below under which conditions it suffices to focus

on this flow surplus σ, which is a primitive.) In this example, complementarities in production

within task dimension 1 and within dimension 2 are positive while between-task complemen-

tarities are zero. This implies that complementarities are larger in dimension (x1, y1) than in

(x1, y2), ensuring that the single crossing condition

∂

∂x1

[∂σ/∂y1∂σ/∂y2

]> 0

holds. Note that, in this example, σ(x,y) ≥ 0 for all pairs (x,y), implying that all matches out

of unemployment will be accepted.

Consider two unemployed workers, one with skills (x1, x2) = (1, 1) and the other one

with (x1, x2) = (2, 1), so the second worker is relatively specialized in skill x1. Assume for

illustration that both workers receive the same sequence of job offers over time, (y1, y2) =

(2, 2), (3, 1), (3.4, 0), implying the following acceptance/rejection decisions:

Offers: y = (2, 2) −→ y = (3, 1) −→ y = (3.4, 0)

Worker x = (1, 1): accept −→ accept −→ reject

Worker x = (2, 1): accept −→ accept −→ accept

The last job offer with high y1 but very low y2 is only accepted by the worker with higher x1

since for him the surplus comparison yields σ ((2, 1), (3.4, 0)) = 13.6 > 13 = σ ((2, 1), (3, 1))

while for the worker with lower x1, σ ((1, 1), (3.4, 0)) = 6.8 < 7 = σ ((1, 1), (3, 1)).

This simple example illustrates our main theoretical insights in an intuitive way. Even

though both workers are inclined to accept jobs with higher y1 at the expense of lower y2 —

6

because complementarities in dimension (x1, y1) are stronger — for workers with higher x1 this

sorting trade-off is more pronounced: they are willing to accept jobs with particularly low

y2 in exchange for a higher y1. This is why in equilibrium, for given x2, higher-x1 workers

will be matched to a distribution of jobs that is ‘better’ (in the FOSD sense) in terms of y1,

compared to workers with lower x1. But, in turn, higher-x1 workers will be matched to jobs

with ‘worse’ y2 (again in the FOSD-sense) relative to workers with lower x1. According to our

definition of PAM, there is PAM in the dimension (x1, y1) (which is the dimension of relatively

strong complementarities in production), while there is NAM in dimension (x1, y2) (which is the

dimension of relatively weak complementarities). These sorting patterns arise because workers

with different skill bundles have different ‘specializations’ and thus rank jobs in different ways.

As a consequence, the two workers in this example climb different job ladders over time.

It is important to note that if there were no differences in the relative complementarities

across dimensions, then both workers in our example would share the same ranking of jobs. To

illustrate this, change the technology to σ(x,y) = 2x1y1 + 2x1y2 + x2y1 + x2y2, which implies

that ∂[∂σ/∂y1∂σ/∂y2

]/∂x1 = 0. One can easily check that both workers would reject the last job offer

y = (3.4, 0) in this case. Independent of their different degrees of specialization, they would

both climb the same job ladder, and no sorting would arise. The labor market would be similar

to one with 1D heterogeneity, where workers have a single skill x and jobs a single attribute y.

In such a 1D world, if the technology is monotone in y, workers share the same ranking of jobs

and climb the same job ladder over time (as is the case, e.g., in Postel-Vinay and Robin, 2002).

3 The Model

3.1 The Environment.

Time t ∈ R+ is continuous. The economy is populated by infinitely lived workers and firms.

There is a fixed unit mass of workers, each characterized by a time-invariant skill bundle x =

(x1, · · · , xX) ∈ X =×Xk=1[xk, xk], where X denotes the number of different skill dimensions.7

We normalize the lower support of worker skills to xk = 0 and allow xk ∈ R+ ∪ +∞. Skills

are distributed with cdf L and strictly positive density ℓ.8 Firms can be thought of as single7For early influential work on the importance of workers’ characteristics being bundled in the labor market

context, see Heckman and Scheinkman (1987).8We adopt the following notational conventions throughout the paper: we denote all CDFs using uppercase

letters (e.g. L), densities by the associated lowercase letter (e.g. ℓ), and survivor functions using bars over CDFs(e.g. L = 1−L). Also, we will state a function is increasing/decreasing or positive/negative if this is the case inthe weak sense. Strict properties will be mentioned explicitly.

7

jobs (possibly vacant), or as collections of independent, perfectly substitutable jobs. Jobs are

characterized by a vector of time-invariant productive attributes, or “skill requirements” y =

(y1, · · · , yY ) ∈ Y =×Yj=1[yj , yj ], where Y denotes the number of different job attributes and

where yj∈ R+ and yj ∈ R+ ∪ +∞.9

Workers search for jobs at random, both on and off the job. They can be matched to a job or

be unemployed. If matched, they lose their job at Poisson rate δ, and sample alternate job offers

drawn from an exogenous sampling distribution distribution Γ at rate λ1. We assume Γ to have

a strictly positive density γ, which is twice continuously differentiable. Unemployed workers

sample job offers from the same sampling distribution at rate λ0. The output flow in a match

between a worker with skills x and a job with attributes y is p(x,y), where p : RX×RY −→ R.10

We denote the income flow of an unemployed worker with skill x by b(x).

We stress that there is no capacity constraint on the firm side (firms are happy to hire any

worker with whom they generate positive surplus) and matched jobs do not search for other

workers. As such, this set-up is really a (partial equilibrium) model of the labor market rather

than one of symmetric, one-to-one matching of the marriage market.

3.2 Rent Sharing and Value Functions

Workers and firms are risk-neutral and have equal time discounting rates ρ > 0. Under those

assumptions, the total present discounted value of a type-(x,y) match is independent of the

way in which it is shared, and only depends on match attributes (x,y). We denote this value

by P (x,y). We further denote the value of unemployment by U(x), and the worker’s value of

being employed under his current wage contract by W , where W ≥ U(x) (otherwise the worker

would quit into unemployment), and W ≤ P (x,y) (otherwise the firm would fire the worker).

Assuming that the employer’s value of a job vacancy is zero, the total surplus generated by a

type-(x,y) match is P (x,y)− U(x).

We assume in the main text that wage contracts are set as in the sequential auction model

without worker bargaining power of Postel-Vinay and Robin (2002). This is mainly to simplify

exposition. We show in Appendix B that most of our results extend to other common wage set-

ting rules, as Nash bargaining (Mortensen and Pissarides, 1994; Moscarini, 2001), wage/contract9The restriction to x and y having positive elements is not necessary but simplifies the economic interpreta-

tions. Also, we can relax that X is a product of intervals but assume it for symmetry with job attributes.10We assume that the production function is defined over the entire space RX ×RY , not just the set X ×Y of

observed (x,y). This is to streamline some proofs and can be relaxed. Details are available upon request.

8

posting (Burdett and Mortensen, 1998; Moscarini and Postel-Vinay, 2013), or sequential auctions

with worker bargaining power (Cahuc, Postel-Vinay and Robin, 2006).

In the sequential auction model, firms offer take-it-or-leave-it wage contracts to workers.

Wage contracts are long-term contracts specifying a fixed wage that can be renegotiated by

mutual agreement only. In particular, when an employed worker receives an outside offer, the

current and outside employers Bertrand-compete for the worker. Consider a type-x worker

employed at a type-y firm and receiving an outside offer from a firm of type y′. Bertrand

competition between the type-y and type-y′ employers results in the worker matching with the

employer where the total match value is higher, while extracting the full surplus from the lower-

surplus match. This implies that he stays in his initial job if P (x,y) ≥ P (x,y′), moves to the

type-y′ job otherwise, and ends up with a new wage contract worth W ′ = min P (x,y), P (x,y′)

(provided that W ′ exceeds the value of the worker’s initial contract, W , as otherwise the worker

would not have initiated the contract renegotiation in the first place).

It follows that the total value of a type-(x,y) match, P (x,y), solves the equation:

ρP (x,y) = p(x,y) + δ [U(x)− P (x,y)] .

The annuity value of the match, ρP (x,y), equals the output flow p(x,y) plus the expected

capital loss δ[U(x)− P (x,y)] of the firm-worker pair from job destruction.11

Given that U(x) is independent of firm type, the optimal mobility choices of workers hinge on

the comparison of match surplus P (x,y)−U(x) across jobs. It solves (ρ+δ) [P (x,y)− U(x)] =

p(x,y)− ρU(x). In what follows, we will mostly reason in terms of the match flow surplus:

σ(x,y) = p(x,y)− ρU(x).

A worker x employed in a job y accepts an offer from a job y′ if and only if P (x,y′)− U(x) >

P (x,y) − U(x). This is equivalent to σ(x,y′) > σ(x,y), so that the optimal strategy to ac-

cept/reject a job is entirely based on the comparison of flow surpluses. Likewise, an unemployed

worker x accepts a job of type y if σ(x,y) ≥ 0. In turn, the optimal strategy of firm y is to

accept any worker x if σ(x,y) ≥ 0. It follows that the dynamic optimization problem of the11Note that, under the sequential auction model, the realization the “other” risk faced by the firm-worker pair,

namely the receipt of an outside job offer by the worker, generates zero capital gain for the match: either theworker rejects the offer and stays, in which case the continuation value of the match is still P (x,y), or the workeraccepts the offer and leaves, in which case he receives P (x,y) while his initial employer is left with a vacant jobworth 0, so that the initial firm-worker pair’s continuation value is again P (x,y).

9

agents is solved simply by flow surplus comparisons (as hinted at in the example of Section 2).

Finally note that, in the sequential auction case, the value of unemployment, U(x), is given

by ρU(x) = b(x), implying σ(x,y) = p(x,y)− b(x), i.e. σ is pinned down by technology. Thus,

optimal mobility decisions are entirely determined by technology.

3.3 Steady-State Distribution of Skills and Skill Requirements

In our analysis of sorting below, the key object will be the steady-state equilibrium measure of

type-(x,y) matches, denoted by h(x,y), which indicates who matches with whom. It is deter-

mined by the following flow-balance equation, which embeds the optimal mobility decisions,12

δ + λ1EΓ

[1σ(x,y′) > σ(x,y)

]h(x,y) = λ0γ(y)1 σ(x,y) ≥ 0u(x)

+ λ1γ(y)

∫1σ(x,y) > σ(x,y′)

h(x,y′)dy′, (1)

where u(x) is the measure of type-x unemployed workers in the economy. Note that we use primes

(′) to denote random variables with respect to which expectations are taken. The left-hand side

(l.h.s.) of (1) is the outflow from the stock of type-(x,y) matches, comprising matches that are

destroyed at rate δ and matches that are dissolved because the worker receives a dominant outside

offer. The flow probability of this latter event is λ1EΓ [1 σ(x,y′) > σ(x,y)], the product of

the arrival rate of offers λ1 and the probability of drawing a job type y′ that yields a higher flow

surplus with the worker than the current type-y job. The right-hand side (r.h.s.) of (1) is the

inflow into the stock of type-(x,y) matches and composed of two groups: unemployed type-x

workers who draw a type-y job with flow probability λ0γ(y) and accept it (which they do if

the flow surplus is positive); and type-x workers employed in any type-y′ job who draw an offer

from a type-y job with flow probability λ1γ(y) and accept it (which they do if the flow surplus

with that job exceeds the one with their initial type-y′ job). The measure of type-x unemployed

workers solves the following flow-balance equation with similar interpretation:

λ0EΓ

[1σ(x,y′) > 0)

]u(x) = δ

∫h(x,y′)dy′. (2)

Finally note that, consistently with (1) and (2), the total measure of workers with skill bundle

x in the economy solves ℓ(x) = u(x) +∫h(x,y′)dy′.

12Throughout the paper, we use the notation EΓ to distinguish expectations taken w.r.t. the sampling distri-bution Γ from expectations w.r.t. the equilibrium distribution of matches, which we simply denote by E.

10

Note that the job acceptance rule of an employed worker in a type-(x,y) match hinges on the

comparison of two scalar random variables, σ(x,y′) and σ(x,y), despite the underlying multi-

dimensional heterogeneity of workers and firms. It is therefore convenient to introduce the con-

ditional sampling distribution Fσ|x of flow surplus σ, given x (with density fσ|x). With this no-

tation, the job acceptance probability for an employed worker x is EΓ [1 σ(x,y′) > σ(x,y)] =

F σ|x (σ(x,y)) and that for an unemployed worker is EΓ [1 σ(x,y′) > 0)] = F σ|x(0).

Substituting these elements into (1), we show in Appendix C that the matching density

h(x,y) has the following closed-form:

h(x,y)

ℓ(x)γ(y)=

δλ01 σ(x,y) ≥ 0δ + λ0F σ|x(0)

×δ + λ1F σ|x(0)[

δ + λ1F σ|x (σ(x,y))]2 .

This equation also implies that the equilibrium conditional density of job types y given worker

types x among employed workers is given by:

h (y|x) = δ1 σ(x,y) ≥ 0F σ|x(0)

×[δ + λ1F σ|x(0)

]γ(y)[

δ + λ1F σ|x (σ(x,y))]2 =

gσ|x (σ(x,y))

fσ|x (σ(x,y))× γ(y), (3)

where for any s ∈ R, the steady-state cross-section cdf of flow surplus among employed workers

of type x is

Gσ|x(s) := 1−δ + λ1F σ|x(0)

F σ|x(0)×

F σ|x(s)

δ + λ1F σ|x(s)

and gσ|x is its density.

4 Equilibrium Sorting

4.1 Measuring Sorting

We first specify a measure of sorting in this multi-dimensional environment under frictions.

A criterion that has been proposed for multi-dimensional positive assortative matching in

a frictionless context is that the Jacobian matrix of the equilibrium matching function be a

P -matrix, meaning that all its principal minors are positive (Lindenlaub, 2017). This criterion

captures the way in which a worker’s job type y improves or deteriorates as one varies the

worker’s skill bundle x when matching is pure, i.e. when any two workers with the same skill

bundle are matched to the exact same type of job. By contrast, in our frictional environment

with random search the equilibrium assignment is generally not pure — there is mismatch. A

11

natural extension of this measure of sorting to our environment is to consider changes in the

quantiles of the conditional matching distribution of job types y as one varies worker type x.13

Formally, let Hj(y|x) denote the marginal cdf of yj (the jth component of job attribute vector

y) conditional on the matched workers having skill bundle x. Using (3), we can express this as

Hj(y|x) =∫

1y′j ≤ y

h(y′|x

)dy′ =

δ[δ + λ1F σ|x(0)

]F σ|x(0)

∫ 1 σ(x,y′) ≥ 0 × 1y′j ≤ y

[δ + λ1F σ|x (σ(x,y′))

]2 γ(y′)dy′.

(4)

We are interested in signing, for each job attribute j, the elements of the gradient ∇Hj(y|x) =

(∂Hj(y|x)/∂x1, · · · , ∂Hj(y|x)/∂xX)⊤, i.e. the Jacobian of H(y|x). A situation of particular

interest is when a component of this Jacobian, ∂Hj(y|x)/∂xk, has a constant sign over the

support of yj . If that sign is negative [positive], then Hj(·|x) is increasing [decreasing] in xk in

the sense of FOSD: positive [negative] assortative matching then occurs in dimension (yj , xk),

as a worker with higher type-k skill is matched to jobs with greater (in the FOSD sense) type-j

skill requirement compared to a worker with lower type-k skill.14 We will thus use the following

formal definition of sorting:

Definition 1 (Positive and Negative Assortative Matching). Matching is positive [negative]

assortative in dimension (yj , xk) if and only if ∂Hj(y|x)/∂xk is negative [positive] for all y ∈

[yj, yj ] and all x ∈ X .

We will use the acronyms PAM and NAM for positive and negative assortative matching.

Alternatively, we will also refer to PAM (or NAM) as positive (or negative) sorting.15 To avoid

duplication of some results, we focus on positive sorting throughout most of the paper.

4.2 A Decomposition Result

We begin our analysis by showing how equilibrium sorting can be decomposed into sorting on

the unemployment-to-employment (UE) margin and on the employment-to-employment (EE)13We choose to analyze the equilibrium matching distribution of y given x and not that of x given y for

the following reason. While workers sample job types from an exogenous sampling distribution γ, jobs ‘sample’workers from an endogenous distribution (the distribution of workers across employment statuses and job types),which in itself is a complex equilibrium object. The acceptance decisions of firms would impact the distributionof x across employment statuses and job types, and the distribution of x, in turn, impacts the acceptancedecisions of the firms. Analyzing the matching distributions of x given y would therefore require us to deal witha complicated fixed point problem, which proved intractable.

14FOSD has been used to characterize sorting under frictions and 1D-heterogeneity (e.g. Chade, 2006).15Note that Definition 1 is “global” in the sense that it imposes a sign restriction on ∂Hj(y|x)/∂xk for all skill

bundles x ∈ X . Alternatively, we could have opted for a “local” definition by only imposing the weaker conditionthat ∂Hj(y|x)/∂xk be positive or negative at a given skill bundle x. In what follow we use the global version ofthis definition as sorting is commonly envisaged as a global property in the literature.

12

margin. As we show in Appendix C, a typical element of the gradient of Hj(y|x), which we use

to characterize sorting patterns (Definition 1), has two parts, indicating that a marginal increase

in the worker’s skill xk affects his equilibrium distribution of job types yj in two ways.16

∂Hj(y|x)∂xk

= CUE︸︷︷︸(1): UE margin

+ CEE︸︷︷︸(2): EE margin

(DC)

where components CUE and CEE are given by

CUE := gσ|x(0)× EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]×[PrΓ

y′j ≤ y|σ(x,y′) = 0

−Hj(y|x)

]+PrΓ

y′j ≤ y|σ(x,y′) = 0

×EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0, y′j ≤ y

]−EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]

and

CEE :=

∫ +∞

0

2λ1fσ|x(s)gσ|x(s)

δ + λ1F σ|x(s)× PrΓ

y′j ≤ y|σ(x,y′) = s

×

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j ≤ y

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds.

First, a marginal increase in skill xk affects the boundary of the set of profitable matches for that

worker, i.e. the set of job types y such that σ(x,y) ≥ 0. An increase in skill may render some

matches between unemployed workers and jobs profitable that were unprofitable before. This

is reflected in the first term of expression (DC) — our label for decomposition — which only

works through selection on the UE margin: term CUE contains (multiplicatively) the density of

marginally profitable matches for type-x workers, gσ|x(0). If the worker’s type x is such that

σ(x,y) > 0 for all job types y (i.e. if the worker accepts any job type when unemployed), then

there are no such marginal matches, gσ|x(0) = 0, and sorting on the UE margin is shut down.

Second, a marginal increase in xk affects the job acceptance probability for employed workers

as well. More specifically, an increase in xk changes the comparison between any two potential

matches involving the worker: for any two job types (y,y′), the difference σ(x,y′) − σ(x,y)

generally varies with xk. This, in turn, changes how employed workers reallocate between jobs

through on-the-job search. This effect, given by term CEE of expression (DC), operates through16Two important technical notes: PrΓA is used to denote the probability of A occurring following a random

draw of a job type y from the sampling distribution γ. Second, it may be that the joint event(σ(x,y′) = s, y′

j ≤ y)

on which some of the expectations in (DC) are conditioned have zero probability in γ. As explained in the detailedderivation, we set expectations conditional on zero-probability events to zero by convention.

13

sorting on the EE margin.

If we shut down sorting on the UE margin (for example by assuming σ(x,y) > 0 for all

(x,y)-pairs), then ∂Hj(y|x)/∂xk = CEE. In turn, if we shut down sorting on the EE margin (for

example by setting λ1 = 0, or by specifying a flow surplus function such that the job acceptance

probability of employed job seekers is independent of skills x), then ∂Hj(y|x)/∂xk = CUE. In

what follows, we will say that PAM [NAM] occurs on the EE margin whenever CEE is negative

[positive], and that PAM [NAM] occurs on the UE margin whenever CUE is negative [positive].

In other words, we analyze sorting on the EE margin “as if” sorting on the UE margin were shut

down, and vice-versa. This is obviously an expositional convention: in general, both margins of

sorting are present, and the contributions to overall sorting of both terms CUE and CEE need to

be signed in order to determine the sign of ∂Hj(y|x)/∂xk, as indicated by (DC).

To offer some economic intuition for (DC), consider the EE margin first. Term CEE is

negative if EΓ

[∂σ(x,y′)/∂xk | σ(x,y′) = s, y′j ≤ y

]− EΓ [∂σ(x,y

′)/∂xk | σ(x,y′) = s] ≤ 0 for

all s. In the one-dimensional case (Y = 1), one can show that this will be true if ∂2σ/∂xk∂y ≥ 0,

i.e. if xk and y are complements in the production technology, echoing the familiar finding from

the literature on one-dimensional, frictionless sorting that supermodularity implies PAM. While

things are more complex in the multi-dimensional world, we show below that complementarities

remain a key determinant of sorting patterns in that case. Note that a similar expression,

EΓ

[∂σ(x,y′)/∂xk | σ(x,y′) = 0, y′j ≤ y

]− EΓ [∂σ(x,y

′)/∂xk | σ(x,y′) = 0], is present in term

CUE , hinting at the importance of productive complementarities for UE sorting as well.

Beyond this basic intuition about the driving forces of sorting on the UE and EE margin,

those two margins involve complex interactions between the technology σ and the sampling

distribution of job types γ. This implies that terms CUE and CEE in (DC) cannot easily be

signed without further assumptions on the primitives. In order to make progress towards a

characterization of the sign of sorting, we focus in the next section on a class of technologies

for which we can derive clean and (with one exception) distribution-free conditions for positive

sorting. We investigate generalizations in the following section and in the appendix.

5 The Sign of Sorting: Bilinear Technology in Two Dimensions

5.1 Assumptions

The following two assumptions simplify decomposition (DC) considerably.

14

Assumption 1. (a) The production function p(x,y) is bilinear in (x,y):

p(x,y) = (x+ a)⊤Qy =

X∑k=1

Y∑j=1

qkj(xk + ak)yj

where Q = (qkj) 1≤k≤X1≤j≤Y

is a X × Y matrix and a = (a1, · · · , aX)⊤ ∈ RX+ is a fixed vector;

(b) the nonemployment income function b(x) is linear in x:

b(x) = (x+ a)⊤Qb =

X∑k=1

Y∑j=1

qkj(xk + ak)bj

where b = (b1, · · · , bY )⊤ ∈ RY is a fixed vector;

(c) there exists j ∈ 1, · · · , Y such that qj(x) := ∂p(x,y)/∂yj =∑X

k=1 qkj(xk + ak) > 0 for

all x ∈ X ; to fix the notation, we will assume w.l.o.g. that qY (x) > 0.

Assumptions 1 (a)-(b) restrict the production technology in such a way that the flow surplus

function σ(x,y) is bilinear in (x,y). Indeed they imply that:

σ(x,y) = p(x,y)− b(x) = (x+ a)⊤Q(y − b) =

X∑k=1

Y∑j=1

qkj(xk + ak)(yj − bj)

The technology matrix Q captures the complementarity structure between all job and worker

characteristics and will be crucial to our analysis of sorting. We interpret the vector b as the

production technology of the unemployed. In turn, we interpret vector a, which is a technological

parameter, as the baseline productivity of workers, noting that a⊤Qy is the output of a type-y

job filled with the least skilled worker, x = 01×X . We assume a to be positive (Assumption 1(a)).

This ensures that the worker’s total input into production, x + a, is positive in all dimensions

(recall that x ∈ RX+ ). While not strictly necessary for our analysis, this restriction ensures that

our sorting results do not change with the sign of x+ a. Finally, Assumption 1(c) ensures that,

for any level of worker skills, there is at least one job attribute, here denoted by yY , that impacts

output positively.Note that we do not impose monotonicity of the production function in all job

attributes. Nor do we restrict the monotonicity of the production or flow surplus function in

worker skills x. Next, we consider:

Assumption 2. Each job has Y = 2 attributes, i.e. y ∈ Y ⊂ R2+.

The sorting results established in the next subsections will rely on Assumptions 1 and 2 (our

15

baseline model). In the following section, we provide generalizations of our results on the sign

of sorting to other surplus functions and to higher dimensions of job heterogeneity. We now

investigate the sign of sorting along both the EE and the UE margin, based on (DC).

5.2 The EE Margin

We begin with the following result on the sign of EE sorting.

Theorem 1 (EE Sorting, Y = 2, Bilinear Technology). Under Assumptions 1 and 2, PAM

occurs in dimension (y1, xk) along the EE margin if and only if, for all y ∈ Y:

∂

∂xk

(∂p(x,y)/∂y1∂p(x,y)/∂y2

)> 0 or, equivalently:

∂

∂xk

(q1(x)

q2(x)

)> 0. (SC-2D)

Condition (SC-2D) is a single-crossing condition of the production function (also known as

Spence-Mirrlees condition, in its differential form). Note that, in the two-dimensional, bilinear

case at hand, (SC-2D) simply amounts to detQ > 0, which does not depend on any skill bundle

x and either holds or fails to hold for all x ∈ X . Single-crossing properties have been shown to

guarantee positive sorting in several one-dimensional matching problems.17

The analysis of our multi -dimensional matching model with search frictions and transferable

utility further highlights the importance of single-crossing as a driving force toward positive

sorting. Condition (SC-2D) states that the marginal rate of substitution between (y1, y2) is

increasing in worker skill xk. This implies that skill xk is more complementary to job attribute

y1 than to y2, which is why positive sorting occurs between between xk and y1 (but negative

sorting between between xk and y2 — in the dimension of relatively weak complementarity).18

To illustrate the single crossing condition in our setting and its implication graphically, we

consider two workers with skill bundles x′ = (x′1, x′2) and x′′ = (x′′1, x

′′2) such that x′′1 > x′1 and

x′′2 = x′2 (the second worker has more of x1 but both have equal amounts of x2). In Figure 1, we

plot for each worker the locus of jobs with which that worker produces the same output as with

the job that has attributes A. Single-crossing condition (SC-2D) implies that these isoquants

cross only once (at point A). Moreover, because the marginal rate of substitution is increasing

in x1, the curve of the more skilled worker is steeper. Consider point A as a benchmark with17In an important paper, Legros and Newman (2007) show that a single crossing property is sufficient to

guarantee PAM in frictionless one-dimensional problems with non-transferable utility (NTU). Chade, Eeckhoutand Smith (2017) then demonstrate that several one-dimensional matching problems with transferable utilityboth in environments with and without frictions can be recast as NTU, frictionless matching problems. Afterfinding the associated NTU problem, the Legros-Newman-condition can be applied and guarantees PAM.

18This statement about NAM in (y2, xk) holds true if ∂p(x,y)/∂y1 > 0 in addition to ∂p(x,y)/∂y2 > 0.

16

y1

y2

AB

p(x′,y) ≡ p(x′,yA)

p(x′′,y) ≡ p(x′′,yA)

Low x′1 High x′′1

Figure 1: Single Crossing Property

no sorting (both workers are matched to the same job). Condition (SC-2D) says the following:

if the lower-skilled worker weakly prefers job B over job A where B has lower y2 but higher y1,

then the worker with higher x1 strictly prefers job B, as is the case in the graph.

5.3 The UE Margin

The next result establishes conditions for positive sorting along the UE margin.

Theorem 2 (UE Sorting, Y = 2, Bilinear Technology). Under Assumptions 1 and 2 and under

single-crossing condition (SC-2D) (from Theorem 1), if for all x ∈ X :

1. q1(x) > 0 (i.e. p(x,y) is not only increasing in y2 but also in y1)

2. along all level curves of p(x, ·) (i.e. at all y such that p(x,y) = C for some fixed C ≥ 0):

q2(x)∂2 ln γ

∂y2∂y1− q1(x)

∂2 ln γ

∂y22≥ 0 (UE-2D)

3. at the lower support of γ, denoted by y = (y1, y

2): y

2≥ b2 and y

1< b1

then PAM occurs in dimension (y1, xk) along the UE margin. Moreover, (SC-2D) is also neces-

sary for PAM to occur generically under any sampling distribution γ.

Theorem 2 highlights the importance of single-crossing also for sorting on the UE margin. It

becomes a necessary condition for sorting if all possible sampling distributions are considered.

However, contrary to the EE margin, single crossing alone is not sufficient for PAM on the

UE margin: additional restrictions on the sampling distribution are needed, given by (UE-2D).

17

Recall that sorting on the UE margin occurs when a marginal increase in skill xk affects the

boundary of the set of profitable matches, i.e. the locus of y’s such that σ(x,y) = 0. Figure 2

shows how this boundary shifts with xk under the assumptions of Theorem 2, and helps visualize

the role of both single crossing and distributional restrictions in that theorem.

Yy2

y1

(b1, b2)

y1

y2

σ(x,y)=0

(highx ′′k )

σ(x,y) =0 (low

x ′k )

AB

C

D

Figure 2: Sorting along the UE margin

Figure 2 represents the (y1, y2) plane, where the origin is placed at b = (b1, b2). The shaded

area materializes Y, the support of γ: the (lower) boundaries of Y are the horizontal line at

y2 = y2

and the vertical line at y1 = y1, which are placed in compliance with Condition 3 in

Theorem 2. The oblique lines are zero level curves of σ(x, ·), which under the assumed linear

technology are given by y2 = b2 − q1(x)q2(x)

(y1 − b1). By Condition 1 in Theorem 2, such lines are

downward sloping and go through point y = b. The boundary of feasible matches for a given

skill bundle x is at the intersection between the zero level curve of σ(x, ·) and Y (the emphasized

line segment). Note that this boundary lies entirely in the region of Y where y1 < b1: because

it is assumed that y2≥ b2, it has to be the case that y1 < b1 for surplus to equal zero.

Those zero level curves are drawn for two workers x′ and x′′ with x′′k > x′k (and with the same

amount of all other skills). The higher-x′′k (blue) curve is steeper than the lower-x′k (black) one,

meaning that for a given y2, the more skilled worker needs a higher y1 to generate non-negative

surplus. The reason is as follows: by the single crossing property (SC-2D), complementarities

in production are stronger between xk and y1 than between xk and y2. Thus, the jobs under

consideration (with y1 < b1) are prone to generate surplus losses especially for those workers

with high xk. Therefore, for a given y2, workers with higher xk need jobs with higher y1 to

generate non-negative surplus, which is clearly a force towards PAM. In the figure, this means

18

that all job types between the black and the blue line can be profitably matched with the low

skilled (x′k) worker, but produce negative surplus with the high-skilled (x′′k) worker. This is why

all those jobs with relatively low attribute y1 drop out of his equilibrium matching set.

However, complementarities in production alone are not enough to ensure PAM on the UE

margin. To see this, consider points A, B, C and D on the figure. After increasing skill k from

x′k to x′′k, a worker no longer breaks even with a job at A. Moreover, jobs around B (with higher

y1 but lower y2) are also made unprofitable while jobs around C with lower y1 but higher y2

compared to A remain profitable. Therefore, if the sampling distribution γ has most of its mass

concentrated around points A, B and C (i.e. there is a negative correlation between y1 and y2)

then workers with higher xk will be matched to jobs with lower y1 (since jobs around B with

higher y1 have too little of y2, leading to negative surplus) — a force towards NAM. To prevent

this, we assume a sufficient degree of positive association between y1 and y2 in γ to ensure that

more mass is concentrated around points A and D. Notice that the distributional barrier to

PAM arising from a negative association of (y1, y2) becomes more severe the larger is the positive

impact of y2 on the surplus (i.e. the larger is q2(x), which makes the zero-surplus lines flatter).

How likely is condition (UE-2D) in Theorem 2, guaranteeing a sufficient positive association

of job attributes, to hold? Sufficient conditions on the sampling distribution are that the density

γ be log-supermodular and its marginals log-concave. This class of distributions is quite broad.

For instance, any bivariate distribution of independent random variables that has log-concave

marginals (e.g. the uniform distribution with independent random variables) satisfies (UE-2D).

Other examples of log-supermodular densities with log-concave marginals are the (truncated)

bivariate Gaussian with positive covariance and the multivariate Gamma distribution.19

5.4 Taking Stock

Both theorems on the sign of sorting show that sorting under multi-dimensional job heterogeneity

is fundamentally different from a comparable model with one-dimensional heterogeneity. In such

a model, there is no sorting on the EE margin (Postel-Vinay and Robin, 2002): the strategy

of firms is to accept any worker that yields positive surplus while the strategy of workers is to

accept all jobs that yield a higher (flow) surplus than the current one. The assumption that

the flow surplus is increasing in y (the one-dimensional version of Assumption 1) implies that19The multivariate Gamma distribution is defined by a linear combination of independent random variables that

have standard gamma distribution. Log-supermodularity of the multivariate Gamma distribution is implied byKarlin and Rinott (1980), Prop. 3.8, and its log-concavity by Shapiro, Dentcheva, Ruszczynski (2009), Th. 4.26.

19

all workers rank firms in the same way and try to climb up a single economy-wide job ladder.

Hence, all workers tend to move into higher-y jobs over time, which rules out sorting. Moreover,

regarding the UE margin, there is positive sorting in our multi-dimensional setting under the

specified conditions. But there would again be no sorting in the 1D model since any match in

which the job productivity is too low (y < b) would not form, independent of the worker’s skill.

Why does sorting arise only in the multi-dimensional model? By contrast to the one-

dimensional case, what matters here is not only to match skilled workers to productive jobs

in any dimension. Instead it is important to match a given worker to a job that requires much

of the skill in which the worker is particularly strong. Thus, workers with different skill bundles

rank firms differently, have different job ladders, and therefore accept and reject different types

of jobs. This heterogeneity of job ladders induced by multiple skill dimensions is one of our

crucial insights and the reason why sorting arises in this multi-dimensional setting.

6 The Sign of Sorting: Generalizations

In this section, we discuss how our results on the sign of sorting generalize to other environ-

ments. We are able to relax both Assumptions 1 (bilinear technology) and 2 (two-dimensional

firm types). First, we extend our results to the case of a nonlinear technology with an unrestricted

number of job attributes Y ≥ 2, provided that the technology be monotone in at least one yj .

Second, we investigate a ‘separable’ technology (possibly non-linear and non-monotone), again

for general Y ≥ 2 dimensions of job attributes.20 In all these cases, we provide sufficient condi-

tions on primitives — technology and distribution of job attributes — that guarantee positive

sorting. Finally, all of our results on the sign of sorting also apply to other commonly used wage

bargaining protocols, beyond the sequential auction setting we have described in detail. These

generalizations are technically more involved than the results from the presented baseline model.

To keep the text focussed on the simplest case that conveys the full intuition, we state these

theorems along with a detailed discussion in Appendix A and only give a brief overview here.

Theorem 5 in Appendix A establishes sufficient conditions for sorting on the EE margin,

generalizing Theorem 1 to the case of non-linear monotone technologies for Y ≥ 2 (where

‘monotone’ holds for at least one dimension yj). Similar to our baseline model, a (generalized)

20The ‘separable’ class includes the popular specification of Tinbergen (1956) p(x,y) = c0 −∑X

i=1 ci(xi − yi)2,

where X = Y and where the ci’s are strictly positive numbers. In this example, each job has an “ideal” skillbundle given by y, and output is a decreasing function of the distance between the worker’s skill bundle x andthat ideal skill bundle.

20

single-crossing condition is at the center of this result. It implies stronger complementarities

between productivity-skill pairs (yj , xk) for all j < Y than between (yY , xk), pushing towards

positive sorting in all pairs (yj , xk) except (yY , xk). Contrary to our baseline result, however, this

theorem reveals that in more general environments the conditions for sorting are not distribution-

free. We need to place restrictions on the correlation of job attributes in the sampling distribution

γ. This is to prevent distributional obstructions to positive sorting between skill xk and all the

yj ’s that could stem from negative correlation among the different job attributes yj .

Theorem 5 is our most general result on EE sorting and is useful because it nests several

special cases of interest. Not only does it nest Theorem 1 above, but also the considerably simpler

conditions for sorting under nonlinear, monotone surplus functions with Y = 2 (Corollary 3) as

well as those under bilinear surplus functions with Y > 2 (Corollary 4).

These generalizations on EE sorting highlight that under richer forms of job heterogeneity

(Y > 2), the conditions for sorting do not only involve single-crossing of the technology but also

distribution γ. We can only circumvent this complication in the case of separable technologies

where there is no interaction between the different job attributes in production: In Theorem 7,

we dispense with the monotonicity assumption from Theorem 5 but impose that the skill under

consideration xk only complement a single yj and be irrelevant to other job dimensions (that is,

we impose that ∂2p(x,y)/∂xk∂yj = 0 and that ∂2p(x,y)/∂xk∂yj′ = 0 for j′ = j). We show that

there is PAM on the EE margin in the pair (yj , xk) that satisfies single-crossing in production.

Lastly, we focussed on surplus splitting via sequential auctions without worker bargaining

power for expositional clarity only. We show in Appendix B that our results on the sign of sorting

hold for other wage setting rules, as Nash bargaining (Mortensen and Pissarides, 1994; Moscarini,

2001), wage/contract posting (Burdett and Mortensen, 1998; Moscarini and Postel-Vinay, 2013),

or sequential auctions with worker bargaining power (Cahuc, Postel-Vinay, and Robin, 2006).

To summarize, we are able to provide sufficient conditions to characterize the sign of sorting

for (i) several broad classes of technology, (ii) all commonly used wage bargaining protocols and

(iii) an arbitrary number of job and worker attributes. However, departing from our baseline

case with bilinear technology and two-dimensional job heterogeneity complicates our analysis

considerably and requires restriction on the sampling distribution. Yet, across all of the environ-

ments that we consider, a single-crossing condition on the production technology that guarantees

sufficiently strong complementarities is the linchpin of positive sorting under random search.

21

7 Interrelation of Sorting Patterns Across Dimensions

We first approached the question of sorting “dimension by dimension”, focusing on the equilib-

rium relation between some skill xk and some job attribute yj (Section 5). Section 6 generalizes

those results to higher dimensions of job heterogeneity. More precisely, we give conditions under

which sorting is positive in dimensions (yj , xk) for a given skill xk and all j ∈ 1, · · · , Y − 1,

Y ≥ 2, keeping the other skills k′ = k fixed. This corresponds to signing all elements of any

column k in the Jacobian matrix of the conditional matching distribution,

∂H(y|x)∂x⊤ =

∂H1(y|x)

∂x1· · · ∂H1(y|x)

∂xX

.... . .

...∂HY (y|x)

∂x1· · · ∂HY (y|x)

∂xX

(5)

except for the last element of a column, ∂HY (y|x)/∂xk.

In this section we further explore the interdependence of sorting patterns across dimensions of

worker skills and job attributes, thereby completing the analysis of sorting. We first investigate

the effect of a simultaneous expansion of all skills, which corresponds to signing the elements

of any row j of (5), i.e. the gradient of Hj(y|x). This enables us to compare the matching

distributions of any two workers in the economy and illustrates how the degree of worker spe-

cialization matters for sorting. Second, we address sorting in the “remaining dimension” (yY , xk)

(that was not addressed in Section 6) by investigating the sign of ∂HY (y|x)/∂xk. The results

in this section are established under our baseline Assumption 1, i.e. for a bilinear technology.21

7.1 Sorting Patterns Across Dimensions of Skills

Our first result is based on the simultaneous expansion of all skills and shows that there is no

sorting on “absolute advantage”: if two workers x and x′ are such that one is twice as productive

as the other in all jobs, x′+a = 2(x+a), then both workers are matched to the same distribution

of jobs in equilibrium, irrespective of the complementarities in production.

Theorem 3. Under Assumption 1, ∀j ∈ 1, · · · , Y :

(x+ a)⊤∇Hj (y|x) = 0

i.e. the function (x+ a) 7→ Hj (y|x) is homogeneous of degree 0 in (x+ a) for all j.21We show in the Appendix that those results are straightforward to generalize to the case of a homogeneous

technology, i.e. a technology such that σ (−a+ t(x+ a),y) = tα · σ(x,y) for any positive scaling factor t.

22

Theorem 3 addresses the case of a simultaneous expansion of all skills such that the sum

x + a is scaled up, i.e. it considers an expansion in the direction of x + a. Since a is only a

fixed productivity parameter that does not differ across workers (see Assumption 1), there is no

obvious reason why workers’ skills should expand along this particular direction. We therefore

focus on a generic skill expansion in x instead. To illustrate its effect on the worker’s assignment,

we focus on the case X = Y = 2, so that x = (x1, x2), y = (y1, y2), and a = (a1, a2), and further

impose assumptions on technology and distributions that guarantee PAM in dimension (y1, x1),

i.e. ∂H1(y|x)/∂x1 ≤ 0 (and NAM in dimension (y1, x2)).22 In that context, we consider a

marginal expansion of a worker’s skills from (x1, x2) to (x1 +∆x1, x2 +∆x2). Using Theorem

3, the resulting change in H1(y|x) is:

∆H1(y|x) ≃ (x1 + a1)∂H1(y|x)

∂x1

[∆x1

x1 + a1− ∆x2

x2 + a2

](6)

Equation (6) has three implications. First, it confirms our earlier finding that ∆H1(y|x) = 0 if

∆x1x1+a1

= ∆x2x2+a2

(i.e. no change in sorting if the worker improves his skills in the direction of x+a).

Second, (6) shows more generally that a marginal (but not necessarily proportional) im-

provement in both skills will cause the worker to match with jobs with stochastically higher y1

attributes if and only if ∆x1x1+a1

> ∆x2x2+a2

. This condition says that his total productive input,

consisting of his individual skills x plus the baseline productivity a, is increased proportionately

more in dimension 1 than in dimension 2. As a result, the simultaneous improvement in both

skills will cause the worker to sort into jobs with stochastically higher y1 (but with lower y2).

A third implication of (6) is that under a proportional increase in both skills (∆x1x1

= ∆x2x2

),

∆H1(y|x) < 0 if and only if x1/x2 > a1/a2. In other words, scaling up all skills leads to a

stochastically better distribution of job matches in the dimension where the worker is specialized

relative to the baseline productivity vector a. By contrast, scaling up all skills leads to a

deterioration of the distribution of job matches in the second dimension, i.e. ∆H2(y|x) > 0.

Scaling up all of a worker’s skills simultaneously thus has a non-uniform effect on the worker’s

distribution of jobs across dimensions, which depends on his specialization. Our interpretation

is that this multi-dimensional model does not feature any hierarchical sorting based on absolute22Assumptions that ensure this are: Q is a positive matrix with detQ > 0, γ is a bivariate normal distribution

truncated over Y with positive covariance, and y2− b2 ≥ 0 > y

1− b1. Condition (SC-2D) from Theorem 1 writes

as (x2 + a2) detQ > 0, which holds here by assumption. Hence, the example has PAM on the EE margin. Next,because x + a is a positive vector and Q a positive matrix, p(x,y) is increasing in both y1 and y2, satisfyingCondition 1 of Theorem 2. The truncated normal with positive covariance satisfies Condition (UE-2D), as shownin Section 5, and Condition 3 from Theorem 2 is satisfied by assumption. PAM thus also occurs on the UE margin.

23

advantage but instead features sorting based on specialization.23

The content of these results is quite different when skills are one-dimensional. Consider

Theorem 3 for the one-dimensional case X = 1 (i.e. x and a are scalars x and a, Q is a

1 × Y row vector so that y = Q(y − b) is a scalar, and the flow surplus function σ(x,y) =

(x+ a)y). In this case, Theorem 3 echoes a known result: there cannot be sorting, in the sense

that ∂Hj(y|x)/∂x = 0 between x and any yj , j ∈ 1, · · · , Y . In other words, x and yj are

independent in the population of job-worker matches (Postel-Vinay and Robin, 2002).

We end this section with a result that follows directly from Theorem 3, addressing the

interrelation of sorting patterns between a given job attribute and different skills. That is, we

focus on signing any row j of (5):

Corollary 1. Under Assumption 1, if PAM occurs between yj and xk for all skill dimensions

but one, i.e. for k ∈ 1, · · · , X\K, then NAM occurs between yj and xK .

The underlying cause of Corollary 1 becomes most transparent in the special case X = Y = 2

with the UE margin shut down. Suppose for example that the single-crossing condition (SC-2D),

∂[q1(x)q2(x)

]/∂x1 > 0, holds, which is equivalent to detQ > 0, implying PAM in dimension (y1, x1).

However, detQ > 0 also implies ∂[q1(x)q2(x)

]/∂x2 < 0, leading to NAM in dimension (x2, y1).

By the same argument, detQ > 0 implies PAM in dimension (x2, y2) but NAM in dimension

(x1, y2). This illustrates that PAM cannot arise simultaneously across those two dimensions

because the (necessary and sufficient) single-crossing conditions cannot hold simultaneously for

x1 and x2. Put differently, detQ > 0 says that complementarities in (x1, y1) and (x2, y2)

“dominate” complementarities in (x2, y1) and (x1, y2), which is why PAM occurs within task

dimensions 1 and 2 but NAM occurs between those dimensions. Again in this case of Y = 2,

this occurs for purely technological reasons, independent of the sampling distribution.24

This section contains two main insights, both of which are empirically testable. First, our

model predicts that multi-dimensional sorting under random search occurs based on special-

ization instead of absolute advantage: uniformly better workers do not match with uniformly

better jobs but instead with jobs that are suited to their mix of skills. Similar to Section 5,

the underlying reason is that workers with different skill bundles rank jobs differently and climb23It is important to note that this result is not due to our assumption of no capacity constraint on the firm

side but appears in models of multi-dimensional sorting more generally. The absence of sorting based on absoluteadvantage can also be obtained in a multi-dimensional model with capacity constraints (e.g. Lindenlaub, 2017).

24Things are much more involved in higher dimensions Y ≥ 3, where the intuition underlying Corollary 1 isless clear. Yet, the result still holds: under Assumption 1, sorting cannot be simultaneously positive between agiven yj and all xk, k ∈ 1, · · · , X. More details are available upon request.

24

different job ladders. Here, we allow all of a worker’s skills to vary simultaneously, which illus-

trates more clearly how his entire skill bundle matters for sorting. Second, Corollary 1 shows

that sorting cannot be simultaneously positive between a given job dimension and all skill di-

mensions. Instead, there are sorting trade-offs and agents need to decide which skill dimension

to ‘sacrifice’. PAM arises between a job attribute and all skill dimensions except the one that is

characterized by the weakest complementarities, where NAM materializes.

7.2 Sorting Patterns Across Dimensions of Job Attributes

We now return to the interrelation of sorting patterns across job attributes for a given skill

xk, i.e. in any column of (5). The following theorem implies that the way the conditional

distributions of job attributes Hj(y|x) (and thus the conditional expectations) co-vary with skill

xk follows a pattern, which is driven by the technology Q:

Theorem 4. Under Assumption 1, if σ(x,y) > 0 for all y ∈ Y and x ∈ X (no UE sorting), then:

(x+ a)⊤Q∂E (y|x)∂x⊤ = 01×X .

For illustration, we consider again the two-dimensional case (X = Y = 2). Figure 3 shows the

locus of the pair (E(y1|x),E(y2|x)) when one of the components of x (say x1) varies. Theorem

4 implies that the normal vector to this locus is Q⊤(x + a) at all points, so the slope of the

E(y|x)-locus is entirely determined by the technology. Moreover, to fix ideas, Figure 3 was

drawn under the assumption that output is increasing in all components of y (all components of

Q⊤(x+a) are positive), so that the slope of the E(y|x)-locus is negative. We consider again our

two workers with skill bundles x′ = (x′1, x′2) and x′′ = (x′′1, x

′′2) such that x′′1 > x′1 and x′′2 = x′2.

If there is PAM within (y1, x1), then Theorem 1 implies ∂E(y1|x)/∂x1 > 0, and the higher-x′′1

worker will be matched on average to a job with higher y1 than the lower-x′1 worker, as shown

on Figure 3. However, the average job of the more skilled worker will have lower y2 than the

average job of the lower skilled worker, i.e. ∂E(y2|x)/∂x1 < 0.

More generally, we can use Theorem 4 to obtain some insights about sorting in dimension

(yY , xk) — the remaining dimension in any column of (5) in which the sign of sorting is not

determined by Corollary 4 (Section 6 and Appendix A).

Corollary 2. Under Assumption 1, assuming qj(x) ≥ 0 for all j ∈ 1, · · · , Y and σ(x,y) > 0

for all y ∈ Y and x ∈ X , then if PAM occurs in dimensions (yj , xk), j ∈ 1, · · · , Y − 1, there

cannot be PAM in dimension (yY , xk).

25

E(y2|x)

E(y1|x)

Q⊤(x′ + a)

E(y|x = x′)Low x′1

Q⊤(x′′ + a)

E(y|x = x′′)

High x′′1

Figure 3: Sorting Across Job Dimensions

Corollary 2 implies that if PAM occurs in dimensions (yj , xk), j ∈ 1, · · · , Y − 1, and if

there is sorting in dimension (yY , xk) — i.e. if ∂HY (y|x)/∂xk has a constant sign across the

entire support of yY — then sorting in this last dimension can only be negative.25 Note that

the requirement that production be increasing in all job attributes yj is essential for Corollary

2: if instead qj(x) < 0 for some j, then any sorting pattern (PAM, NAM, or neither) can arise

in the residual dimension (yY , xk) (example available on request).

Corollary 2 echoes Corollary 1 but focusses on the job attribute dimensions (instead of the

skill dimensions): sorting cannot be simultaneously positive between a given skill and all job

attributes. Once again there are sorting trade-offs. Agents need to decide which dimension of

job attributes to ‘sacrifice’, and base that decision on the relative strength of complementarities

in the technology. We provide conditions under which PAM arises between a skill and all job

attributes except for the pair that is characterized by the weakest complementarities, where

forces push towards NAM.

8 Numerical Application

In this final section, we explore the usefulness of our theoretical framework for applied work.

We have two main objectives. First, based on our theory, we develop an empirical test of the

assumption that job and worker heterogeneity can be captured by scalar indexes, i.e. that job25A stronger result than Corollary 2 holds in the two-dimensional case Y = 2. In this case, it follows directly

from Theorem 1 that, under Assumption 1 and with σ(x,y) > 0 for all y ∈ Y, if PAM arises in dimension(y1, xk), then there must be NAM in dimension (y2, xk), and vice versa. This is the case depicted on Figure 3.

26

and worker heterogeneity are both one-dimensional. Second, we highlight some of the potential

errors in the analysis of sorting and mismatch caused by imposing the 1D assumption in cases

when it is not justified (i.e. when our test indicates that this restriction is violated).

For the purpose of this exercise, we proceed in a controlled environment so that we know

the underlying data generating process (DGP).26 We simulate an economy based on our multi-

dimensional model, which we consider to be the true DGP. We then fit a deliberately misspecified

1D model to these data, based upon which we implement our test and analyze potential errors

regarding sorting and mismatch. We begin with a discussion of identification and estimation of

the 1D model before advancing to our two main exercises.

8.1 Estimation of a Misspecified 1D Model

8.1.1 The Model

The 1D model that we consider in this section is identical to our multi-dimensional model except

for two differences: we restrict heterogeneity to be 1D, X = Y = 1, and we do not impose any

functional form restriction on the flow surplus function σ. Importantly, we do not assume

monotonicity of σ in x and y, as this would limit the ability of the 1D model to produce sorting.

We only make the following identifying assumptions:

Assumption 3. The flow surplus function σ has the following properties:

1. x → maxy∈Y σ(x, y) is well-defined and strictly increasing in x.

2. y → maxx∈X σ(x, y) is well-defined and strictly increasing in y.

Under those assumptions, the 1D model can in principle generate sorting since we do not

assume monotonicity of σ in y. As we show below (and as has been established in the literature

for more sophisticated versions of the same model), this model is non-parametrically identified up

to strictly increasing transforms of x and y. We therefore normalize worker and firm attributes

by specifying the model in terms of their ranks:

Assumption 4.

1. x is uniformly distributed over X = [0, 1] in the population of workers.

2. y is uniformly distributed over Y = [0, 1] in the population of firms.26The natural first step before going to real data is to show on simulated data that our new test works.

27

Below we will estimate the (misspecified) 1D model following the literature on identifying

sorting on unobserved (to economists) scalar attributes of workers and firms.

8.1.2 Identification and Estimation

The simulated sample on which we estimate the misspecified 1D model is a panel of N workers,

indexed i ∈ 1, · · · , N, sorting themselves into M firms, j ∈ 1, · · · ,M. Time is discretized

for the purposes of simulation, and workers are followed over T periods, t ∈ 1, · · · , T. A

typical observation is described as a vector Jit, σi,Jit,t, where Jit ∈ 1, · · · ,M is the identity

of the worker’s employer at date t (we further normalize Jit = 0 if worker i is unemployed at

date t so that Jit also indicates the worker’s employment status), and σi,Jit,t is the flow surplus

achieved in the match between worker i and firm Jit (missing when Jit = 0).27

The steps below establish identification of the 1D model based on this sample. The identifi-

cation proof is constructive, and therefore also provides a practical estimation protocol.

Worker types. The maximum attainable surplus for a type-x worker is maxy∈Y σ(x, y) in this

model which, by Assumption 3, is a strictly increasing function of x. Any worker i’s type xi can

thus be estimated as:

xi = QW (max σi,Jit,t : t = 1, · · · , T)

where QW is the quantile function in the population of workers.

Firm types and surplus. The flow surplus σi,Jit,t generated by a match between worker i and

firm Jit equals σ (xi, yJit). Flow surplus is thus observed for any viable match involving any firm

j in the sample, i.e. for matches between firm j and any type-x worker such that σ(x, yj) ≥ 0:

σ(x, yj) = mean σi,Jit,t : xi = x, Jit = j, t = 1, · · · , T .27The assumption that the flow surplus is observed in the data is obviously a shortcut. What is typically

observed in practice are wages, not match output or surplus. However, in the context of the family of searchmodels considered in this paper, it has been shown elsewhere in the literature that the flow output from a matchbetween a worker i and a firm j is identified from the maximum wage earned in firm j by workers with the sametype as i, while unemployment income is identified from the wages earned by workers hired from unemployment— see for example Lamadon et al. (2015) for details. Because these particular identification issues are peripheralto the question addressed in this section (namely the distinction between one- vs multidimensional heterogeneity),we bypass them by assuming that flow surplus is directly observed.

28

Knowledge of σ(x, yj) then allows estimation of any firm j’s type, yj (by Assumption 3):

yj = QF

(max

σ(x, yj) : x ∈ [0, 1]

)

where QF is the quantile function in the population of firms.

At this stage we have estimates (xi, yj) of the types of every worker and firm in the sample,

as well as estimates σ(xi, yj) of the surplus of every viable match. Together those allow for

non-parametric estimation of σ over the set of viable type-(x, y) matches. They also allow for

the construction of the equilibrium distribution of employer types y conditional on employed

workers of any given type x, which in turn permits the analysis of equilibrium sorting.

Sampling distribution and offer arrival rates. A type-x worker exiting unemployment

draws his employer type from the conditional density γ(y)/γ y′ : σ(x, y′) ≥ 0. Moreover, the

job finding rate of any unemployed worker of type x is λ0γ y′ : σ(x, y′) ≥ 0. Combining those

two properties, one obtains the following estimator of λ0γ(yj) for any employer j in the sample:

λ0γ(yj) = Pr Jit = j | xi = x, ei,t−1 = 0, eit = 1 × JF (xi)

where JF (xi) is the empirical unemployment exit rate of type-xi workers. Note that the above

estimator is valid for any worker type in the sample. This, together with the constraint that γ

must integrate to 1, affords estimates of the sampling distribution γ and the unemployed offer

arrival rate λ0.

Finally, the probability of an employer change occurring for a type-x in a type-y firm is

λ1γ y′ : σ(x, y′) > σ(x, y) for all x ∈ [0, 1]. The term γ y′ : σ(x, y′) > σ(x, y) is known

from previous steps, allowing us to construct the following estimator of λ1:

λ1 = mean

Pr Jit = Ji,t−1 | xi = x, Ji,t−1 = j, ei,t−1 = eit = 1

γy′ : σ(xi, y′) > σ(xi, yj)

.

8.1.3 Parameterization and Basic Estimation Results

All of the examples shown below are based on simulations of 100,000 workers with two dimensions

of skills and (monthly) parameter values of λ0 = 0.3, λ1 = 0.1, and δ = 0.025.28 The population

distribution of skills, L, is Gaussian with mean (0, 0) and correlation corrℓ(x1, x2) = −0.5,28All samples are simulated for 480 months (40 years), after a 100-year burn-in period to reach steady-state.

29

Specification: 0 (1D model) 1 2 3

Q

(1 00 0

) (1 11 1

) (4.5 21.5 4

) (4.5 −2−1.5 4

)

a (1, 1)⊤

b (0, 0)⊤

Table 1: Examples of Technologies

truncated over [0, 1]2. The sampling distribution of job attributes, Γ, is also Gaussian with

mean (1, 1) and correlation corrΓ(y1, y2) = 0.33, truncated over [1, 2]2.

We consider four different examples of the bilinear technology (parameters Q, a and b),

shown in Table 1. All parameterizations have a = (1, 1)⊤ and b = (0, 0)⊤, implying that

unemployed workers accept all job offers and there is no sorting on the UE margin.

In Example 0, the DGP is truly one-dimensional: the technology described by the Q matrix

in this example implies that p(x,y) = x1 · (y1 + 1), meaning that only dimension 1 of worker

and firm heterogeneity matters for production. We will use this example to check that our

estimation procedure returns the true parameter values when the model is correctly specified

and to illustrate the validity of our proposed tests of the single-index assumption.

Examples 1, 2, and 3 are all two-dimensional models. In those examples the three different

Q-matrices are designed to feature different patterns of complementarities that, according to

Theorem 1, produce certain sorting patterns. Specifically, Example 1 is the case of no sort-

ing, within or between dimensions. Both Example 2 and 3 have positive sorting within tasks,

and negative between tasks but differ in the complementarity structure. While the technology

in Example 2 features worker-job complementarities both within and between dimensions, in

Example 3, there are complementarities within but substitutabilities between dimensions.

We next fit a model with 1D heterogeneity to the simulated data generated from our four

examples, where we follow the identification protocol explained above. In the cases of Examples

1, 2 and 3, the 1D fitted model is misspecified, as the true DGP is 2D. Figures 4 and 5 show

contour plots of the estimated scalar attributes x and y as functions of the underlying true

(vector) attributes x and y, respectively.

Panel (a) on each figure relates to Example 0, the 1D model. The “iso-x”-curves (the contour

lines on the graph) on Figure 4(a) are vertical lines in the (x1, x2)-plane: as expected, the

estimated x coincides with the true x1 in this correctly specified 1D model. Figure 5(a) shows

30

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(a) Example 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(b) Example 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(c) Example 2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(d) Example 3

Figure 4: True and Estimated Worker Types

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(a) Example 0

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(b) Example 1

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(c) Example 2

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(d) Example 3

Figure 5: True and Estimated Firm Types

that the same applies to firm attributes, albeit with slightly more estimation noise.

Panels (b) relate to Example 1, the case with no sorting in the ‘real’ 2D world. The contour

lines on Figure 4(b) roughly coincide with lines of slope −1 in the (x1, x2)-plane, as was intuitively

31

(a) Example 0

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

6

7

8

9

10

11

12

13

14

15

(b) Example 1

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

20

25

30

35

40

45

(c) Example 2

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

8

10

12

14

16

18

(d) Example 3

Figure 6: Estimated Surplus Function

expected given the underlying symmetric technology of Example 1. Figure 5(b) suggests the

same pattern for firm attributes, but again with slightly more noise. In turn, panels (c) and (d)

on Figures 4 and 5 plot these results for Examples 2 and 3 from Table 1 that feature varying

complementarities across dimensions. It is no longer true in those examples that the “iso-type”

curves have slope −1 in the (x1, x2) and (y1, y2)-planes. For instance, in Example 2, the “iso-x”

curves are steeper than in Example 1. This means that in the 1D estimation more weight is

put on skill x1 than skill x2, possibly owing to the relatively stronger complementarities within

dimension 1 compared to dimension 2 (see the true technology Q in Example 2). A similar

pattern arises in Example 3, here more pronounced for the estimates of firm types, Figure 5(d).

Figure 6 analyzes the estimated 1D surplus. Figure 6(a), which again pertains to Example

0, plots the estimated surplus σ as a function of the true surplus σ, as a check of the precision of

our estimation procedure. The figure shows that, despite some estimation noise, the estimated σ

is reasonably close to the true σ in this correctly specified model. Finally, Figures 6(b), (c) and

(d) plot the estimated flow surplus function σ for Examples 1, 2 and 3 as functions of estimated

scalar types (x, y). They show that the estimated surplus function increases in x and y (up to

some estimation noise) in all three examples, which implies that no sorting should occur in any

of these examples, were the 1D model correctly specified.

32

We now move on to our two main exercises: first, we design and apply a test of the single-

index assumption to our simulated data to check the validity of the 1D-approximation. Second,

we ask about the consequences of imposing the 1D-assumption in cases that fail our test.

8.2 A Test of the Single Index Restriction

With few exceptions, the random search literature on sorting and mismatch has operated under

the assumption that firm and worker heterogeneity can both be captured by scalar indexes —

what we refer to as the single index restriction. We now use our theory to develop a test of that

restriction. We then show on our simulated data that it correctly reveals misspecification.

8.2.1 Principles

The test is based on a simple observation that holds in the 1D (as well as in the multi-D) version

of our model: Two workers with equal skill types employed in jobs with the same attributes

make identical job mobility decisions, i.e. have equal job transition probabilities and equal job

acceptance sets. Therefore, if the 1D model is correctly specified, then any two workers with the

same estimated 1D skill index x that are currently employed in two jobs with the same estimated

1D job attribute y should have the same job switching probabilities and job acceptance sets.

They should climb the same job ladder. However, if the true data is not 1D, this prediction is

generally not true, revealing misspecification.

We further illustrate the principle of our test on Figure 7, which is based the parameterization

in Example 1. Figure 7(a) shows two workers with true skill bundles (x1, 0) and (0, x2), but

whose estimated 1D skill indexes x are the same (as the two true skill bundles are on the same

“iso-x” line). Figure 7(b) further depicts a situation where those two workers are employed in

jobs with different true attributes y, but equal estimated 1D attributes y on the same ‘iso-y” line.

Thus, from the perspective of the 1D model, those two workers look exactly the same, even

though in reality they have different skill bundles and work in different jobs. If the 1D model

were correctly specified, those two workers should have equal job-switching probabilities and job

acceptance sets. Yet in reality, they differ on both counts: as the figure shows, worker 1, whose

true skill bundle is (x1, 0), will accept all jobs with a higher y1 than his current one, while worker

2, whose true skill bundle is (0, x2), will only take jobs with higher y2. This shows that the two

workers will have different job acceptance sets, and therefore generically different job-switching

33

probabilities.29 The reason is that the underlying 2D-type workers climb different job ladders.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(a) Two workers with equal x

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 21

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

(b) Acceptance sets

Figure 7: Illustration of the Single Index Test

Because acceptance sets are difficult to work with in practice, we implement our test based

on job-switching probabilities. In the 2D model, those equal λ1F σ|x (σ(x,y)) for a type-x worker

employed in a type-y job. In the 1D world, the job-switching probability is λ1F σ|x (σ(x, y)).

Either way, the probability of a worker switching jobs only depends on the worker’s type and

the surplus of the worker’s current match. Based on those observations, we implement our

single-index test in two ways.

8.2.2 Implementation

Observed Multi-D Attributes. If true 2D worker and job attributes, x and y, are ob-

servable, we implement our test by regressing a job-switching indicator on σ(x, y), (a flexible

polynomial in) x, and x1, x2, y1, y2 (or a subset of those latter four variables), in a cross-section

sample of workers. If the 1D model were correctly specified, the coefficients on all variables other

than σ(x, y) and x should equal zero, while the coefficient on σ(x, y) should be negative and

statistically significant. Conversely, finding that any of the true two-dimensional attributes x1,

x2, y1 or y2 is a significant determinant of the job-switching probability conditional on σ(x, y)

and x, would be inconsistent with the 1D restriction.30

Table 2 shows results from our test, run on each of our four simulated examples. In each case,29It also shows that both workers will accept some offers with lower estimated y than they currently have,

and reject some higher y’s. This is also at odds with this simple version of the 1D model. We do not see thisas a strong rejection of the single index restriction, though, as even the 1D model could easily be extended toaccommodate reallocation down the job ladder by allowing for exogenous reallocation shocks. We briefly discussthis issue below.

30An alternative specification of the same test would consist of regressing the job-switching indicator on x, yand (x1, x2, y1, y2) (or a subset of those latter four variables). The results lead to the same conclusions.

34

we first regress a job-switching indicator on σ(x, y) and x. Second, we regress the job-switching

indicator on σ(x, y), x and multi-D attributes x and y.31 The test results are as expected. The

one-dimensional Example 0 passes the test, while all other three examples — whose DGP is two-

dimensional — fail it (at least one, and usually all of the “additional regressors” are estimated

to be significant determinants of job switching).

Unobserved Multi-D Attributes. Implementing the test described in the previous para-

graph requires that one observe at least some of the multi-D worker and job attributes to include

them in the set of “additional regressors”. This may not always be the case. We therefore present

an alternative test in Table 3, which only relies on estimated 1D attributes, x and y.

This alternative test consists of regressing a job-switching indicator on σ(x, y), (a flexible

polynomial in) x, and moments (in this case, the mean and standard deviation) of the individual-

level distributions of past estimated 1D job types. If the 1D model is correctly specified, then

the history of estimated job types should not add any relevant information on the occurrence of

an employer change conditional on σ(x, y) and x. Conversely, if the DGP is really 2D, then the

history of estimated job types y from the misspecified 1D model will convey information about

the worker’s true skill type x and true current job type y, over and above what the current job’s

estimated y conveys. The worker’s job history would thus impact his job-switching probability.

Table 3 shows that, as expected, the one-dimensional Example 0 passes the test. But all

other three examples fail it, as at least one, and usually both of the “additional regressors” are

estimated to be significant determinants of job-switching. This suggests that this alternative

test also correctly detects model misspecification.

As a consistency check, we also test the validity of the 2D assumption in our simulated data.

We apply the second version of our test (i.e. regressing a job-switching indicator on the true flow

surplus σ(x,y), the true two-dimensional skill type x and the mean and standard deviations

of past true job types y). If the 2D assumption is valid, past job types should not provide

any information on job switching conditional on current flow surplus σ(x,y) and worker type x.

Table 4 contains the results, showing that the 2D assumption is not violated in our 2D examples.31In Example 0 (the truly 1D model), x1 and y1 are excluded from the list of additional regressors because

they are the true 1D worker and job attributes, and are therefore equal (up to estimation error) to x and y.

35

Specification: 0 1

σ −0.058[−0.061,−0.056]

−0.058[−0.061,−0.056]

−0.019[−0.020,−0.018]

+0.002[−0.003,−0.006]

x +0.104[0.099,0.109]

+0.104[0.099,0.109]

+0.084[0.079,0.089]

+0.021[0.007,0.034]

x1 — — — −0.019[−0.036,−0.002]

x2 — −0.001[−0.005,0.002]

— −0.022[−0.039,−0.004]

y1 — — — −0.064[−0.079,−0.049]

y2 — +0.001[−0.002,0.004]

— −0.066[−0.081,−0.051]

Specification: 2 3

σ −0.007[−0.007,−0.007]

+0.001[−0.001,−0.002]

−0.016[−0.016,−0.015]

−0.006[−0.010,−0.003]

x +0.090[0.085,0.096]

+0.017[−0.000,0.034]

+0.089[0.083,0.094]

+0.030[0.014,0.046]

x1 — −0.019[−0.033,−0.006]

— +0.010[−0.005,0.024]

x2 — −0.018[−0.030,−0.007]

— +0.003[−0.011,0.016]

y1 — −0.069[−0.082,−0.055]

— −0.047[−0.062,−0.032]

y2 — −0.069[−0.082,−0.055]

— −0.024[−0.034,−0.014]

Note: Dependent variable is a job-switching indicator. All regressions include a constant.95% confidence intervals are reported in small type under coefficients.

Table 2: Test of the 1D Restriction

Specification: 0 1

σ −0.058[−0.061,−0.056]

−0.058[−0.061,−0.056]

−0.019[−0.020,−0.018]

−0.019[−0.020,−0.019]

x +0.104[0.099,0.109]

+0.104[0.099,0.109]

+0.084[0.079,0.089]

+0.084[0.079,0.089]

Mean of past y — +0.005[−0.006,0.016]

— +0.017[0.006,0.027]

SD of past y — +0.015[−0.003,0.033]

— +0.033[0.015,0.052]

Specification: 2 3

σ −0.007[−0.007,−0.007]

−0.007[−0.007,−0.007]

−0.016[−0.016,−0.015]

−0.017[−0.017,−0.015]

x +0.090[0.085,0.096]

+0.091[0.086,0.096]

+0.089[0.083,0.094]

+0.089[0.084,0.094]

Mean of past y — +0.021[0.010,0.017]

— +0.017[0.006,0.027]

SD of past y — +0.036[0.017,0.055]

— +0.029[0.010,0.049]



36

Specification: 1 2 3

σ −0.020[−0.021,−0.019]

−0.020[−0.021,−0.019]

−0.007[−0.007,−0.007]

−0.007[−0.007,−0.007]

−0.016[−0.017,−0.015]

−0.016[−0.017,−0.016]

x1 +0.069[0.064,0.073]

+0.070[0.065,0.074]

+0.078[0.073,0.083]

+0.079[0.074,0.084]

+0.075[0.070,0.079]

+0.076[0.071,0.081]

x2 +0.066[0.062,0.071]

+0.067[0.063,0.072]

+0.064[0.059,0.068]

+0.064[0.060,0.069]

+0.065[0.061,0.067]

+0.065[0.061,0.070]

Mean of past y1 — +0.012[0.001,0.023]

— +0.008[−0.004,0.019]

+0.008[−0.003,0.019]

Mean of past y2 — +0.009[−0.002,0.020]

— +0.012[0.000,0.023]

+0.008[−0.002,0.019]

SD of past y1 — +0.004[−0.022,0.030]

— −0.001[−0.028,0.027]

−0.006[−0.031,0.012]

SD of past y2 — +0.009[−0.017,0.035]

— +0.012[−0.016,0.040]

+0.018[−0.009,0.044]



8.2.3 Additional Remarks

It is important to note that our tests rely on within-worker predictions of our model and, as

such, do not require the 1D production function to be monotone — i.e., they do not require

identical 1D job ladders across worker types.

Also, as mentioned above (footnote 29), the analysis in this section is relevant to a common

practice in the literature, which is to include exogenous job-to-job reallocation shocks (sometimes

referred to as ‘Godfather shocks’) to match the observed share of job-to-job transitions down

the job ladder. According to our theory, what appear to be moves down the 1D job ladder may

in fact be moves up the 2D job ladder. We leave it for future research to study the share of

downward moves explained by misspecification of worker and job heterogeneity.

Finally, yet another alternative version of our test consists of proxying the job acceptance

set by the expected accepted job type after a job switch (instead of the job-switching indicator

used above). The results are qualitatively similar as those above and available on request.

8.3 Consequences of the Single Index Restriction

We now turn to the consequences for sorting and mismatch of imposing the single index assump-

tion when the test indicates that it is violated.

8.3.1 Sorting

We first examine the sorting patterns in the true data. They are in line with the model’s

predictions and reported in the middle column of Table 5, which shows regressions of each yj ,

37

j = 1, 2, on both x1 and x2 in the cross-section of employed workers, for each of our examples.32

Example 0 (1D model) does not feature any sorting due to the monotone technology and the two-

dimensional regressions correctly reflect that, as worker and job attributes are not correlated.

Example 1 is a 2D model where the single-crossing condition is not satisfied and thus there is no

sorting, as shown in the second panel of the table. The technology in Examples 2 and 3 satisfies

the conditions for PAM, and this is picked up in the regressions in the third and fourth panel,

which show a positive within but negative between-correlation of worker and job attributes.

We then examine the predictions of the 1D model in terms of sorting between the estimated

(scalar) job and worker attributes, denoted by y and x. To this end, we simply regress y on x

in a cross section of job-worker matches, as we did in each dimension of heterogeneity for the

correctly specified 2D models. The results are in the third column of Table 5 and suggest that

relying on the misspecified 1D model for inference on sorting patterns is misleading (except for

Example 0 which is 1D and thus not misspecified; it correctly predicts ‘no sorting’). In Example

1, in which there is truly no sorting in any pair (xk, yj) in the data, the 1D model predicts

positive sorting: the coefficient in the regression of y on x is positive and statistically significant.

Similar results hold for Example 2: the 1D model predicts a positive correlation between y and

x, whereas the true sorting patterns are positive within skill dimensions and negative between.

In Example 3, however, where the true 2D model features in fact the strongest positive sorting

within heterogeneity dimensions (as measured by the regression coefficients in the fourth panel,

second column of Table 5), the 1D model fails to predict any clear sorting pattern: the regression

coefficient of y on x is very small and not statistically significant.

These results suggest that the 1D sorting estimates are largely uninformative about the true

underlying sorting patterns. Worse, relying on a 1D model to assess sorting when the true data

is 2D can be misleading: remember that, because the 1D surplus function was estimated to

be increasing in y (Figure 6), no sorting should occur in equilibrium between y and x if agents

actually behaved as per the 1D model. The fact that y and x are found to be positively correlated

in Examples 1 and 2 should therefore alert the researcher to some form of model misspecification.

In Example 3, however, things are even worse: The 1D estimates look perfectly consistent with

the estimated surplus function (y and x are uncorrelated), even though they suggest very different

sorting patterns than the true model. This highlights the importance of testing the validity of32Note that this is not a test of FOSD of the matching distributions in worker skills but FOSD implies the

monotonicity of the conditional means that we report here.

38

Specification: True 2D model Misspecified 1D model

0E(y1|x) = 1.751 +0.004

[−0.002,0.010]x1 −0.000

[−0.006,0.006]x2

E(y2|x) = 1.512 +0.003[−0.003,0.010]

x1 −0.003[−0.010,0.003]

x2E(y|x) = 0.736 −0.001

[−0.007,0.005]x

1E(y1|x) = 1.688 −0.005

[−0.011,0.002]x1 +0.000

[−0.006,0.006]x2

E(y2|x) = 1.686 +0.000[−0.006,0.006]

x1 −0.001[−0.007,0.005]

x2E(y|x) = 0.670 +0.010

[0.004,0.016]x

2E(y1|x) = 1.687 +0.024

[0.018,0.030]x1 −0.027

[−0.033,−0.020]x2

E(y2|x) = 1.686 −0.028[−0.035,−0.022]

x1 +0.026[0.020,0.032]

x2E(y|x) = 0.671 +0.009

[0.003,0.015]x

3E(y1|x) = 1.709 +0.085

[0.079,0.092]x1 −0.094

[−0.100,−0.088]x2

E(y2|x) = 1.650 −0.130[−0.137,−0.124]

x1 +0.127[0.120,0.133]

x2E(y|x) = 0.667 +0.003

[−0.003,0.009]x

Note: 95% confidence intervals are reported in small type under coefficients.

Table 5: Sorting Patterns

the single index assumption separately, as we have done above.

8.3.2 Mismatch

Labor market sorting matters for at least two reasons. First, allocative efficiency requires certain

sorting patterns governed by the technology. Second, positive sorting is a driver of inequality.

In what follows, we will focus on efficiency. Our objective is to asses the efficiency loss from

mismatch and to gauge the magnitude of the welfare loss/gain from implementing the ‘optimal’

allocation suggested by the misspecified 1D model in a world where heterogeneity is really 2D.

In this exercise, we take the populations of workers and active jobs as given and ask to what

extent the equilibrium can be improved by eliminating search frictions (i.e. letting λ0 → ∞

and λ1 → ∞) and implementing the worker-job allocation from the frictionless benchmark. By

focusing on the reallocation of workers into existing jobs, we take a short-term perspective. This

is a common approach in the search literature, which circumvents the issue that, absent capacity

constraints, allowing for free entry and exit of jobs in the frictionless world would cause the job

type distribution to degenerate, with all workers being employed in the most productive firm.33

We compute the allocations (and surpluses) under the frictionless benchmark in the 1D

and 2D models.34 The results are reported in Table 6. For the 2D model in the top part33See, e.g. Bagger and Lentz (2016) or Hagedorn, Law and Manovskii (2016) for a similar approach.34The allocation and sorting of this frictionless benchmark with fixed multivariate distributions of workers

39

Specification: 1 2 3

True 2D model

E[σ(x,y)] 12.298 36.902 15.446E[σ∗(x,y)] 13.458 40.531 17.234E[σ∗

1D(x,y)] 12.415 37.277 15.550Surplus Loss from Mismatch -0.086 -0.090 -0.104Surplus Gain from 1D Optimal Allocation rel. to 2D Allocation 0.010 0.010 0.007Surplus Loss from 1D Optimal Allocation rel. to 2D Optimum -0.077 -0.080 -0.098

Misspecified 1D model

E[σ(x, y)] 12.209 36.830 15.358E[σ∗(x, y)] 13.482 40.564 17.016Surplus Loss from Mismatch -0.094 -0.092 -0.097

Table 6: Expected Surplus and Mismatch

of the table, the rows display, in this order: average actual surplus under search frictions,

E[σ(x,y)], the average optimal surplus from reallocating workers to jobs in a surplus-maximizing

way when frictions vanish, E[σ∗(x,y)], and the average surplus when implementing the 1D

optimal allocation, E[σ∗1D(x,y)] (here we implement the 1D allocation but then, to compute the

average surplus, take into account the agents’ true 2D types). Based on these statistics, we then

compute the percentage surplus loss from mismatch E[σ(x,y)]−E[σ∗(x,y)]E[σ∗(x,y)] , the percentage surplus

gain/loss from implementing the optimal 1D allocation instead of the actual 2D allocationE[σ∗

1D(x,y)]−E[σ(x,y)]E[σ(x,y)] , and the percentage surplus loss from implementing the optimal 1D allocation

instead of the optimal 2D allocation E[σ∗1D(x,y)]−E[σ∗(x,y)]

E[σ∗(x,y)] (rows 4 to 6). Similar notation is used

for the 1D model in the lower panel of Table 6.

Three findings stand out. First, both models indicate that the surplus loss from mismatch

is sizable in all examples (around 10%). Interpreting the surplus loss as a measure of mismatch,

the 1D model overestimates mismatch in two out of three specifications relative to the 2D model.

Second, implementing the 1D frictionless allocation generates essentially no welfare gain (around

1% or less) compared to the equilibrium 2D frictional allocation. Third, and most striking, the

results indicate that welfare losses from implementing the “wrong” optimal allocation (the one

suggested by the 1D model), relative to implementing the correct 2D one can be substantial

(between 7.7% and 9.8% of aggregate surplus). Policy recommendations based on an analysis

that imposes the single index assumption when it is not valid can thus be severely misguided.

and jobs is analyzed in Lindenlaub (2017): Examples 2 and 3 feature PAM while Example 1 yields a matchingfunction that in not one-to-one. We compute these frictionless allocations numerically using a linear program.

40

9 Conclusion

We analyze sorting in a standard random-search setting, but where both workers and jobs have

multi-dimensional characteristics. We first define notions of multi-dimensional positive and

negative assortative matching (PAM and NAM) in this frictional environment, based on FOSD-

monotonicity of the equilibrium matching distribution. For instance, if more skilled workers

are matched to jobs with stochastically better attributes, we call this PAM. We then provide

conditions on the primitives of this economy under which positive sorting arises in equilibrium,

where we distinguish sorting on the UE and the EE margins. In all the environments we study,

the key restriction on the primitives for PAM to occur between a given worker skill and a given

job attribute is a single-crossing condition on the technology. It guarantees that the skill and

job attribute at hand are sufficiently complementary in production relative to other skills and

job attributes. But contrary to well-known results on one-dimensional sorting, our conditions

for multi-dimensional sorting are generally not distribution-free.

Two main insights emerge from our analysis. First, workers with different skill specializations

look for and accept different kinds of jobs, meaning that they move up and down different job

ladders. This is why sorting arises. Remarkably, multi-dimensional heterogeneity is per se a

source of sorting: in comparable one-dimensional models, all workers share the same ranking of

firms (i.e. face the same job ladder), which prevents sorting. Second, there are sorting trade-

offs, in the sense that positive sorting, when it occurs, will do so along the dimensions of skills

and job attributes that are strong complements in production relative to other dimensions. But

negative sorting tends to arise in the dimension of weakest complementarity.

Our theory has important implications for applied work. We use it as the basis for an empir-

ical test of the commonly used single index assumption which states that both worker and job

heterogeneity are one-dimensional. We then show in a series of simulation exercises that — when

the test fails — approximating workers’ and jobs’ true multi-dimensional characteristics by one-

dimensional indices may lead to quantitatively and even qualitatively mistaken conclusions about

the sign and extent of sorting as well as mismatch. Consequently, policies that are based on a

(misspecified) one-dimensional model and aim to reduce allocative inefficiencies can be misguided

and lead to welfare losses. We provide a theoretical framework and, along with it, a definition

of sorting that can (i) guide the measurement of sorting in the data and (ii) be used as a basis

for policies that aim to improve worker-job allocations when heterogeneity is multi-dimensional.

41

References

[1] Bagger, J. and R. Lentz (2016) “An Empirical Model of Wage Dispersion with Sorting”.

Working Paper.

[2] Becker, G. S. (1973) “A Theory of Marriage: Part I”, Journal of Political Economy, 81(4),

813-46.

[3] Burdett, K. and D. T. Mortensen (1998) “Wage Differentials, Employer Size and Unem-

ployment”, International Economic Review, 39, 257-73.

[4] Cahuc, P., F. Postel-Vinay and J.-M. Robin (2006) “Wage Bargaining with On-the-job

Search: Theory and Evidence”, Econometrica, 74(2), 323-64.

[5] Chade, H. (2006) “Matching with Noise and the Acceptance Curse”, Journal of Economic

Theory, 129, 81-113.

[6] Chade, H., J. Eeckhout and L. Smith (2017) “Sorting Through Search and Matching Models

in Economics”, Journal of Economic Literature, forthcoming.

[7] Guvenen, F., B. Kuruscu, S. Tanaka, and D. Wiczer (2015) “Multidimensional Skill Mis-

match”, manuscript, University of Minnesota.

[8] Eeckhout, J. and P. Kircher (2010) “Sorting and Decentralized Price Competition”, Econo-

metrica, 78, 539-74.

[9] Hagedorn, M., T. Law and I. Manovksii (2014) “Identifying Equilibrium Models of Labor

Market Sorting”, NBER Working Paper No. 18661.

[10] Heckman, J. and Scheinkman, J. (1987) “The Importance of Bundling in a Gorman-

Lancaster Model of Earnings”, Review of Economic Studies, 54(2), 243-255.

[11] Karlin, S. and Y. Rinott (1980) “Classes of Orderings of Measures and Related Correla-

tion Inequalities. I - Multivariate Totally Positive Distributions”, Journal of Multivariate

Analysis, 10, 467-98.

[12] Lamadon, T., J. Lise, C. Meghir and J.-M. Robin (2015) “Matching, Sorting, and Wages”,

manuscript, University of Chicago.

42

[13] Legros, P. and A. F. Newman (2007) “Beauty is a Beast, Frog is a Prince: Assortative

Matching with Nontransferabilities”, Econometrica, 75(4), 1073-102.

[14] Lindenlaub, I. (2017) “Sorting Multidimensional Types: Theory and Application”, Review

of Economic Studies, 84(2): 718-789.

[15] Lise, J., and F. Postel-Vinay (2015) “Multidimensional Skills, Sorting, and Human Capital

Accumulation”, manuscript, University College London.

[16] Mortensen, D. T. and C. A. Pissarides (1994) “Job Creation and Job Destruction in the

Theory of Unemployment”, Review of Economic Studies, 61(3), 397-415.

[17] Moscarini, G. (2001) “Excess Worker Reallocation”, Review of Economic Studies, 69(3),

593-612.

[18] Moscarini, G. and F. Postel-Vinay (2013) “Stochastic Search Equilibrium”, Review of Eco-

nomic Studies, 80(4), 1545-81.

[19] Postel-Vinay, F. and J.-M. Robin (2002) “Equilibrium Wage Dispersion with Worker and

Employer Heterogeneity”, Econometrica, 70(6), 2295-350.

[20] Sanders, C. (2012) “Skill Uncertainty, Skill Accumulation, and Occupational Choice”,

manuscript, Washington University in St. Louis.

[21] Shapiro, A., D. Dentcheva and A. Ruszczyński (2009) Lectures on Stochastic Programming

- Modeling and Theory, MPS-SIAM series on optimization, 9.

[22] Shimer, R., and L. Smith (2000) “Assortative Matching and Search”, Econometrica, 68(2),

343-69.

[23] Smith, L. (2006).“The Marriage Model with Search Frictions," Journal of Political Economy,

114(6), 1124-1146.

[24] Villani, C. (2009) Optimal Transport: Old and New. Springer, Berlin.

[25] Yamaguchi, S. (2012) “Tasks and Heterogeneous Human Capital”, Journal of Labor Eco-

nomics, 30(1), 1-53.

43

APPENDIX

A Generalizations of Theorems 1 and 2

A.1 Monotone Technology with Y ≥ 2

We begin with sufficient conditions for sorting on the EE margin, which was addressed by Theorem 1

for the case of Y = 2 and bilinear technology. The following theorem relaxes both Assumptions 1 and 2,

generalizing Theorem 1 to the case of monotone technologies for Y ≥ 2:

Theorem 5 (EE Sorting, Y ≥ 2, Monotone Technology). If, for all x ∈ X :

1. p(x,y) is three times differentiable in y

2. (a) p(x,y) is strictly increasing in yY (monotonicity)

(b) for all y ∈ Y, limyY →yYp(x,y) < b(x) and limyY →yY

p(x,y) = +∞

(c) for all ℓ ∈ 1, · · · , Y − 1,∂

∂xk

(∂p(x,y)/∂yℓ∂p(x,y)/∂yY

)> 0 (SC-YD)

3. if Y ≥ 3, then for all (i, j) ∈ 1, · · · , Y − 12, i = j, and along all level curves of p(x, ·):

∂p

∂yY

[(∂p

∂yY

)2∂2 ln γ

∂yi∂yj+

∂p

∂yi

∂p

∂yj

∂2 ln γ

∂y2Y− ∂p

∂yj

∂p

∂yY

∂2 ln γ

∂yi∂yY− ∂p

∂yi

∂p

∂yY

∂2 ln γ

∂yj∂yY

]

− ∂ ln γ

∂yY

[(∂p

∂yY

)2∂2p

∂yi∂yj+

∂p

∂yi

∂p

∂yj

∂2p

∂y2Y− ∂p

∂yj

∂p

∂yY

∂2p

∂yi∂yY− ∂p

∂yi

∂p

∂yY

∂2p

∂yj∂yY

]

−(

∂p

∂yY

)2∂3p

∂yi∂yj∂yY+

∂p

∂yi

∂p

∂yY

∂3p

∂yj∂y2Y+

∂p

∂yj

∂p

∂yY

∂3p

∂yi∂y2Y− ∂p

∂yi

∂p

∂yj

∂3p

∂y3Y

+∂p

∂yY

[∂2p

∂yj∂yY

∂2p

∂yi∂yY− ∂2p

∂y2Y

∂2p

∂yi∂yj

]> 0 (EE-YD)

then PAM occurs in all dimensions (yℓ, xk) other than ℓ = Y along the EE margin.

The proof is in Appendix D.7. Conditions 2a (monotonicity) and 2b (yY is an “essential” input) are

technical and ensure that p(x,y) is invertible w.r.t. one of the y’s (which w.l.o.g. we take to be yY ).

Conditions 2a and 2b further ensure that the support of γ keeps a lattice structure under a change of

variables (see proof). Condition 2c is central to the theorem and generalizes the single-crossing condition

(SC-2D) from Theorem 1. It states that there exists a job attribute yY , satisfying Conditions 2a and 2b,

that is among those which are less complementary to xk than all other yℓ.

While Theorem 1 provided distribution-free, necessary and sufficient conditions for EE sorting in the

case of Y = 2, Theorem 5 only provides sufficient conditions that are no longer distribution-free (even if

we assume bilinear technology, see Corollary 4): Condition 3 — or, equivalently, equation (EE-YD) —

restricts both the sampling distribution and its interaction with the production function in a complicated

44

way. In essence, equation (EE-YD) places restrictions on the way the job attributes are associated in

the sampling distribution γ.

The need to restrict the sampling distribution in the case of Y > 2 can be explained as follows.

Because complementarities between skill k with all job attributes ℓ (ℓ = Y ) are stronger than with

attribute Y (Condition 2c), a worker with higher xk would ideally move towards jobs with higher levels

of yℓ for all ℓ = Y (and possibly with lower level of yY ). Yet, if the attributes yℓ are too strongly

negatively associated (and/or there is a positive correlation between yℓ and yY ), then such moves may

not be feasible. The role of condition (EE-YD) is to prevent these distributional obstructions to positive

sorting in dimensions (yℓ, xk), for ℓ = Y .

Theorem 5 is our most general result on EE sorting and is useful because it nests several special

cases of interest. We show in the following corollaries that the sufficient conditions for sorting simplify

considerably for nonlinear, monotone surplus functions with Y = 2 (Corollary 3) as well as bilinear

surplus functions with Y > 2 (Corollary 4).

Corollary 3 (EE Sorting, Y = 2, Monotone Technology). Under Assumption 2 (Y = 2), if for all

x ∈ X , p(x,y) is twice continuously differentiable and quasi-concave in y, strictly increasing in y2 with

miny2∈R p(x, (y1, y2)) < b(x) for all y1 ∈[y1, y1

], and such that:

∂

∂xk

(∂p(x,y)/∂y1∂p(x,y)/∂y2

)> 0

then PAM occurs in the (y1, xk) dimension along the EE margin.

The proof is in Appendix D.8. If we restrict the number of job attributes to two, the sufficient

conditions for EE sorting from Theorem 5 become considerably simpler. Condition (EE-YD) disappears

altogether: the role of that condition was to prevent the sampling distribution from having job attributes

yℓ (ℓ = Y ) associated in a way that countered the force toward PAM in (yℓ, xk) arising from comple-

mentarities. This issue of association between Y − 1 job attributes becomes moot when Y = 2. The

reason is that in this case we focus on conditions for PAM in a single dimension (Y − 1 = 1). Stronger

complementarities between (y1, xk) than between (y2, xk) are then sufficient to ensure PAM in (y1, xk).

This result again highlights that as long as we restrict our attention to two job attributes, we can specify

distribution-free conditions for EE sorting.

Another useful special case on EE sorting that emerges from Theorem 5 is for bilinear production

functions (Assumption 1), which satisfy monotonicity (as, by Assumption 1, qY (x) > 0).

Corollary 4 (EE Sorting, Y > 2, Bilinear Technology). Under Assumption 1, if for all x ∈ X :

1. for all y ∈ Y, limyY →yYp(x,y) < b(x), and yY = +∞

45

2. for all ℓ ∈ 1, · · · , Y − 1,

∂

∂xk

(∂p(x,y)/∂yℓ∂p(x,y)/∂yY

)> 0 or, equivalently:

∂

∂xk

(qℓ(x)

qY (x)

)> 0 (SC-YD)

3. for all (i, j) ∈ 1, · · · , Y − 12, i = j, and along all level curves of p(x, ·):

qY (x)2 ∂

2 ln γ

∂yi∂yj+ qi(x)qj(x)

∂2 ln γ

∂y2Y− qj(x)qY (x)

∂2 ln γ

∂yi∂yY− qi(x)qY (x)

∂2 ln γ

∂yj∂yY> 0 (EE-YD’)

then PAM occurs in all dimensions (yℓ, xk) other than ℓ = Y along the EE margin.

The proof is in Appendix D.9. Condition 1 echoes Condition 2b of Theorem 5 and is of the same tech-

nical nature. Condition 2 parallels the generalized single crossing condition from Theorem 5. In contrast

to Theorem 1 that focuses on sorting under two-dimensional job heterogeneity, the case with general

Y -dimensional heterogeneity requires restrictions on the sampling distribution (Condition 3), which is a

special case of Condition 3 in Theorem 5. This condition is again satisfied if log-supermodularity of γ in

job attribute yi and any other job attribute, yj (i, j = Y ) sufficiently dominates any log-supermodularity

in pairs (yi, yY ) and (yj , yY ) (assuming qi(x) > 0, qj(x) > 0 for all i, j). Like in Theorem 5, this condi-

tion limits distributional barriers to PAM in dimensions (yℓ, xk), ℓ = Y , that could arise from negative

association among the attributes yℓ in γ.

Condition (EE-YD’) is more demanding on γ than condition (UE-2D) since it involves the relative

strength of log-supermodularity in the various dimensions as well as subtle interactions with the tech-

nology. Nevertheless we can show that the set of models satisfying (EE-YD’) is not empty. For instance,

for a (truncated) multivariate normal distribution, (EE-YD’) reads

−qY (x)2Σ−1

ij +Σ−1ji

2 |Σ|− qi(x)qj(x)

Σ−1Y Y

|Σ|+ qj(x)qY (x)

Σ−1iY +Σ−1

Y i

2 |Σ|+ qi(x)qY (x)

Σ−1Y j +Σ−1

jY

2 |Σ|> 0 (7)

where Σ−1ij is element ij of the inverse of the covariance matrix and |Σ| is its determinant. Condition (7)

holds if the correlation between yi and yj is sufficiently strong.35

We finally generalize Theorem 2 on UE sorting to Y -dimensional heterogeneity of job attributes.

Similar to Corollary 4 on EE sorting, in this more general setting sorting requires complex restrictions

on the sampling distribution Γ.

Theorem 6 (UE Sorting, Y > 2, Bilinear Technology). Under Assumption 1 and Conditions 1-3 from

Corollary 4, if for all x ∈ X :

1. qj(x) > 0 for all j ∈ 1, · · · , Y (i.e. p(x,y) is increasing in all job attributes)

35As an example, consider Y = 3 and assume that standard deviations satisfy θi = θj = θ. Then, (7) reads:

(θ4/ |Σ|2)(q23(τ12 − τ13τ23)− q1q2(1− τ2

12)− q2q3(τ13 − τ12τ23)− q1q3(τ23 − τ12τ13))> 0

This inequality holds if, for instance, τ12 = 1 and τ13 = τ23 = 0 (where τij = θij/θiθj = corrΓ(yi, yj)).

46

2. the following condition holds along all level curves of p(x, ·) and for all j = 1, ..., Y − 1:

qY (x)∂2 ln γ

∂yY ∂yj− qj(x)

∂2 ln γ

∂y2Y≥ 0 (UE-YD)

3. denoting the lower support of y by y =(y1, · · · , y

Y

):

Y∑j=1

[qkjqj(x)

−maxj′

qkj′

qj′(x)

]qj(x)

(yj− bj

)≤ 0

then PAM occurs in all dimensions (yℓ, xk) other than ℓ = Y along the UE margin.

The proof is in Appendix D.10. Theorem 6 shows that signing the sorting patterns on the UE margin

is even more involved than signing EE sorting. It requires additional assumptions on the sampling

distribution Γ, similar to those we discussed under two-dimensional heterogeneity of jobs. Condition

(UE-YD) (which again imposes some degree of positive association between job attributes in the sampling

distribution) has the same interpretation as (UE-2D) in Theorem 2 for Y = 2. It prevents distributional

obstructions to PAM that could occur despite sufficient complementarities in production.36 Lastly,

Condition 3 restricts the lower support of the sampling distribution and echoes Condition 3 from Theorem

2, extended to Y -dimensional heterogeneity in job attributes.

A.2 Separable Technology with Y ≥ 2

Arguably the most substantive restriction placed by Theorem 5 on the production technology is strict

monotonicity of p(x,y) w.r.t. at least one job attribute yY . While it covers a wide range of applications, it

excludes some important special cases, such as the popular “bliss point” specification. An example (which

goes back to, at least, Tinbergen, 1956) would be, in the case X = Y , p(x,y) = c0 −∑X

i=1 ci(xi − yi)2,

where the ci’s are strictly positive numbers. In this example, each job has an “ideal” skill bundle given

by y, and output is a decreasing function of the distance between the worker’s skill bundle x and that

ideal skill bundle. This and other related specifications are covered in the following theorem:

Theorem 7 (EE Sorting, Y ≥ 2, Separable Technology). If for all x ∈ X :

1. p(x,y) is continuously differentiable w.r.t. y

2. for a given k and for all ℓ ∈ 2, · · · , Y , ∂2p(x,y)/∂xk∂yℓ = 0 (separability)

3. for all y ∈ RY+, ∂2p(x,y)/∂xk∂y1 > 0

then PAM occurs in the (y1, xk) dimension along the EE margin.

36Note that there can arise a tension between (UE-YD) and (EE-YD’) that is also assumed to hold here.Importantly, however, both conditions can be satisfied simultaneously. E.g., in the case of X = Y = 3 and whereγ is truncated normal, if τ12 is sufficiently high, e.g. τ12 = 1, and τ13 and τ23 are low, e.g. τ13 = τ23 = 0, thencondition (EE-YD’) holds strictly (see Footnote 15) while condition (UE-YD) holds with equality.

47

The proof is in Appendix D.11. The key restriction imposed in Theorem 7 is Condition 2 (separabil-

ity), which states that there are components of the worker’s skill bundle that are only relevant to perform

task 1, i.e. that are neither complement nor substitute with any other task ℓ ∈ 2, · · · , Y . The sign

of sorting on the EE margin between job attribute of task 1 and the skills that are only relevant to task

1 can then be determined: it has the sign of ∂2p(x,y)/∂xk∂y1.37 Assumption 2 is easily satisfied, for

instance, by production functions that only feature within-complementarities of skills and job attributes

but no complementarities across tasks. For example, under the “bliss point” specification mentioned

above (p(x,y) = c0 −∑X

i=1 ci(xi − yi)2), Theorem 7 establishes that there will be positive within-skill

sorting along the EE margin (i.e. H1 (·|x) increases in x1 in the FOSD sense), but says nothing about

between-skill sorting (i.e. it says nothing about the monotonicity of Hℓ (·|x) w.r.t. x1 for ℓ ≥ 2).

Note that the case of separable technologies is special: the characterization of sorting in Theorem

7 is independent of the sampling distribution Γ irrespective of the dimensionality of job heterogeneity.

In particular, the restrictions on the sampling distribution (EE-YD) in Theorem 5 are only needed

if complementarities in (y1, xk) compete with complementarities between xk and other dimensions yℓ,

ℓ = 1. Theorem 5 then guarantees PAM in all but one dimensions (yℓ, xk), ℓ = Y, while Theorem 7 for

separable production functions only ensures sorting in a single dimension (y1, xk).

B Alternative Wage Setting Protocols

In this appendix, we explore generalizations of our results to wage setting protocols other than the pure

Sequential Auction model of Postel-Vinay and Robin (2002). For notational brevity, we first denote the

surplus of a match as S(x,y) = P (x,y) − U(x). Under any wage setting rule, so long as workers and

firms have linear preferences over wages, the surplus and unemployment value functions are defined by:

(ρ+ δ)S(x,y) = p(x,y)− ρU(x) + λ1

∫m(x,y,y′) [Ωm(x,y,y′)− S(x,y)] γ(y′)dy′ (8)

ρU(x) = b(x) + λ0

∫m(x,0Y ,y

′)Ωm(x,0Y ,y′)γ(y′)dy′ (9)

where Ωm(x,y,y′) is the surplus (over the value of unemployed search) achieved by the worker if he

moves from employer y to employer y′ and m(x,y,y′) is the worker’s mobility decision: m(x,y,y′) = 1

if the worker chooses to move from y to y′ and 0 otherwise.

We now consider three alternative wage setting models: sequential auction with worker bargaining

power (Cahuc et al., 2006), fixed-share surplus-splitting (Mortensen and Pissarides, 1994; Moscarini,

2001), and wage posting (Burdett and Mortensen, 1998). In each case, we refer the reader to the

relevant reference for details about the wage setting model at hand.

37Note that, under separability Assumption 2 in Theorem 7, the sign of ∂2p/∂xk∂y1 is the same as the signof ∂

∂xk

(∂p/∂y1∂p/∂yj′

), 1 = j′, which is our single-crossing condition. In that sense, the characterization of sorting in

Theorem 7 parallels the characterization provided by Theorems 1 and 5 under different assumptions on p.

48

B.1 Sequential Auctions with Worker Bargaining Power (Cahuc et al., 2006)

In this case, Ωm(x,y,y′) = S(x,y)+β [S(x,y′)− S(x,y)] and the mobility decision rule is m(x,y,y′) =

1 S(x,y′) > S(x,y). Thus:

ρU(x) = b(x) + λ0β

∫1 S(x,y′) ≥ 0S(x,y′)γ(y′)dy′ (10)

= b(x) + λ0β

∫ +∞

0

sdFS|x(s) = b(x) + λ0β

∫ +∞

0

FS|x(s)ds

where FS|x(s) =∫1 S(x,y′) ≤ s γ(y′)dy′ is the sampling cdf of match surplus faced by a type-x

worker, and where the last equality above obtains from integration by parts. Following similar steps:

(ρ+ δ)S(x,y) = p(x,y)− ρU(x) + λ1β

∫ +∞

S(x,y)

FS|x(s)ds

Substituting (10) into the latter equation:

(ρ+ δ)S(x,y) = p(x,y)−[b(x) + (λ0 − λ1)β

∫ +∞

0

FS|x(s)ds

]− λ1β

∫ S(x,y)

0

FS|x(s)ds

Letting b(x) = b(x) + (λ0 − λ1)β∫ +∞0

FS|x(s)ds and σ(x,y) = p(x,y)− b(x), we thus have:

(ρ+ δ)S(x,y) = σ(x,y)− λ1β

∫ S(x,y)

0

FS|x(s)ds (11)

As is clear from (11), S(x,y) only depends on y through σ(x,y) (and in a differentiable way). With a

slight abuse of notation, we can thus define dS/dσ which, from (11), is given by:

dS(x,y)

dσ(x,y)=

1

ρ+ δ + λ1βFS|x (S(x,y))> 0.

This proves that, for any y1 = y2, S(x,y2) > S(x,y1) ⇐⇒ σ(x,y2) > σ(x,y1). Moreover, it is

also clear from (11) that S(x,y) = 0 ⇐⇒ σ(x,y) = 0. Hence, the job acceptance rule is equivalent

to m(x,y,y′) = 1 σ(x,y′) > σ(x,y) for employed workers and m(x,0Y ,y′) = 1 σ(x,y′) > 0 for

unemployed workers: the acceptance rule is the same as in the pure sequential auction case seen in the

main text, up to the redefinition of σ(x,y) from p(x,y)− b(x) to p(x,y)− b(x).

Crucially, this redefinition of σ preserves monotonicity and linearity of σ w.r.t. y, as well as super-

modularity w.r.t. (x,y). Therefore, any result relying on those properties alone will continue to hold in

this modified model. We take stock of what those are in the last paragraph of this appendix.

B.2 Fixed-Share Surplus-Splitting (Moscarini, 2001)

In this case, Ωm(x,y,y′) = βS(x,y′). But also, the worker’s value in the incumbent match is βS(x,y),

implying that the mobility decision rule is again m(x,y,y′) = 1 S(x,y′) > S(x,y). We thus still have

49

ρU(x) = b(x) + λ0β∫ +∞0

FS|x(s)ds, and now:

(ρ+ δ)S(x,y) = p(x,y)− ρU(x) + λ1

∫1 S(x,y′) > S(x,y) [βS(x,y′)− S(x,y)] γ(y′)dy′

= p(x,y)− ρU(x) + λ1

∫ +∞

S(x,y)

[βs− S(x,y)] dFS|x(s)

⇐⇒[ρ+ δ+λ1(1− β)FS|x (S(x,y))

]S(x,y) = p(x,y)− ρU(x) + λ1β

∫ +∞

S(x,y)

FS|x(s)ds

Substituting U(x):

[ρ+ δ + λ1(1− β)FS|x (S(x,y))

]S(x,y) = σ(x,y)− λ1β

∫ S(x,y)

0

FS|x(s)ds (12)

where b(x) = b(x)+ (λ0 −λ1)β∫ +∞0

FS|x(s)ds and σ(x,y) = p(x,y)− b(x) are defined as before. Then,

using a similar argument as in the Cahuc et al. (2006) case:

(ρ+ δ + λ1FS|x (S(x,y))− λ1(1− β)S(x,y)fS|x (S(x,y))

)dS(x,y) = dσ(x,y)

Thus, a necessary and sufficient condition for S(x,y) to be in a one-to-one increasing relationship with

σ(x,y) is λ1(1− β)SfS|x(S) ≤ ρ+ δ + λ1FS|x(S), which is to say:

λ1(1− β)S

∫1 S(x,y′) = S γ(y′)dy′ ≤ ρ+ δ + λ1

∫1 S(x,y′) ≥ S γ(y′)dy′. (13)

This is an unwieldy condition on γ, but still a condition on the primitives.38 Note that the reason we

need this condition is that, under this particular rent-sharing rule, a share 1− β of the surplus from the

initial match is lost to a third party (the new employer) when the worker changes jobs. More precisely,

when the worker changes jobs, the initial firm-worker collective “gains” β [S(x,y′)− S(x,y)] (a share

β of the rent supplement brought about by the new match, although all of these gains accrue to the

workers), and “loses” (1 − β)S(x,y) to the new employer. Thus, if there is a high concentration of

potential matches with equal surplus (if fS|x (S(x,y)) is large), the initial match stands to lose a lot

and gain very little in case the worker accepts an outside offer. As a result, the surplus may be higher

in a slightly less productive match but with fewer similar potential matches available in the economy.

Condition (13) prevents that from happening.

B.3 Wage Posting (Burdett and Mortensen, 1998)

Assuming that firms post wages and are allowed to make offers contingent on worker type x, the model

becomes one of segmented wage-posting markets (one market for each x), where workers are homogeneous

38Note that a sufficient condition for (13) is that the hazard rate of FS|x be decreasing fast enough, specificallySfS|x(S)/FS|x(S) ≤ ρ+δ+λ1

λ1(1−β)

50

within each market and firms in market x are heterogeneous with (scalar) productivity p(x,y). We then

know from Burdett and Mortensen (1998) — or indeed from standard monotone comparative statics —

that firms with higher p(x,y) will post higher wages for type-x workers (and thus offer higher values to

those workers).

Adjusting the notation from Burdett and Mortensen (1998), posted wages are given by:

w(x,y) = p(x,y)−[δ + λ1F p|x (p(x,y))

]2∫ p(x,y)

b(x)

dt[δ + λ1F p|x(t)

]2 + C(x)

where Fp|x is the sampling distribution of match productivity conditional on x, b(x) is the lowest pro-

ductivity amongst viable matches on the market for type-x workers, and C(x) is the profit of the least

productive match employing a type-x worker.39 Again defining σ(x,y) = p(x,y) − b(x), it is straight-

forward to check that w(x,y) is a strictly increasing function of σ(x,y), so that employed workers move

up the σ(x,y)-ladder. Moreover, by construction, unemployed workers accept offers iff σ(x,y) ≥ 0.

B.4 Taking Stock

In all the cases reviewed above, worker mobility is governed by comparisons of σ(x,y) = p(x,y)− b(x),

where b(x) is a (potentially very complex) function of x only. So any theorem that only uses monotonicity

of σ in y, linearity of σ in y, or supermodularity σ in (x,y) goes through. What fails to go through,

though, is any property that relies on the linearity of σ in x: even when assuming linearity in x of p

and b, the function b(x) is generically nonlinear. What this means in terms of the results in this paper

is that the only properties that are specific to the pure sequential auction model are Theorems 3 and 4.

All of the conditions ensuring PAM in equilibrium are preserved under any of the three alternative wage

setting models covered in this appendix.

C Derivations

C.1 Derivation of h(x,y)

Substituting the definition of Fσ|x (Fσ|x(s) = E [1 σ(x,y′) > s]) into (1), we see that h(x,y) can be

written as h(x,y) = χ (x, σ(x,y)) γ(y), where the function χ solves:

[δ + λ1Fσ|x(s)

]χ(x, s)fσ|x(s) = λ0fσ|x(s)1 s ≥ 0u(x) + λ1fσ|x(s)

∫ s

0

χ(x, s′)dFσ|x (s′) .

39Calling the reservation wage of a type-x unemployed worker R(x) (see Burdett and Mortensen, 1998 for aderivation), we have that b(x) = max R(x);miny′∈Y p(x,y′). Then, the integration constant C(x) is zero ifb(x) = R(x) and some positive function of x otherwise.

51

which can be recast as:40

(3) =δ[δ + λ1Fσ|x(0)

]Fσ|x(0)

×∫ +∞

0

2λ1fσ|x(s)[δ + λ1F σ|x(s)

] × ∫ 1 σ(x,y′) = s × 1y′j ≤ y

[δ + λ1Fσ|x(s)

]2 γ(y′)dy′

×

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j ≤ y

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds

=

∫ +∞

0

2λ1fσ|x(s)[δ + λ1Fσ|x(s)

] × ∂Kj(y, s|x)∂s

×

EΓ

[∂σ(x,y)

∂xk| σ(x,y′) = s, y′j ≤ y

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds.

Combining (1), (2) and (3) and substituting the definitions of gσ|x(0) and ∂Kj(y, 0|x)/∂s proves that

∂Hj(y|x)∂xk

= gσ|x(0)×

PrΓ

y′j ≤ y|σ(x,y′) = 0

× EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0, y′j ≤ y

]

−Hj(y|x)× EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]

+

∫ +∞

0

2λ1fσ|x(s)

δ + λ1Fσ|x(s)× gσ|x(s)× PrΓ

y′j ≤ y|σ(x,y′) = s

×

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j ≤ y

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds.

where we incorporated the identity ∂Kj(y, s|x)/∂s = gσ|x(s) × PrΓy′j ≤ y|σ(x,y′) = s

and which,

after adding and subtracting PrΓy′j ≤ y|σ(x,y′) = 0

× EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

], is exactly (DC).

D Proofs

D.1 An Ancillary Lemma

We begin by stating a lemma to better understand the nature of decomposition (DC). This will help us

specify conditions on primitives under which sorting arises.

40A technical note: strictly speaking, the correct integration bounds in the following formula are

s ∈

[max

0, min

y′∈Y,y′j≤y

σ(x,y′)

, maxy′∈Y,y′

j≤yσ(x,y′)

].

rather than [0,+∞). To avoid cluttering the formula with these unwieldy integration bounds, we write it asan integral over all s ≥ 0. As a consequence, it may be that the joint event

(σ(x,y′) = s, y′

j ≤ y)

on whichsome of the expectations are conditioned has zero probability for some values of (s, y). Yet in those cases,∫1 σ(x,y′) = s × 1

y′j ≤ y

γ(y′)dy′ = 0. The formula thus remains correct with [0,+∞) as integration

bounds if we adopt the convention that any expectation conditioned on a zero probability event is equal to zero.

54

Lemma 1. If for all s ≥ 0 and y such that PrΓy′j ≤ y|σ(x,y′) = s

> 0,

y 7→ EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = y

]is increasing [decreasing] (CMP)

then term (2) in (DC) is negative [positive] for all y, i.e. positive [negative] assortative matching occurs

in the (yj , xk) dimension along the EE margin.

Lemma 1 implies that, if the UE margin is shut down (i.e. if σ(x,y) > 0 for all y) and if condition

(CMP) — our label for complementarity — holds, then the marginal distribution of job attribute yj of

employed workers of type x, Hj(·|x), is monotone with respect to worker skill xk in the FOSD sense, i.e.

there is PAM in dimension (yj , xk).

Condition (CMP) can be interpreted as imposing a strong form of complementarity (or substitutabil-

ity, in the decreasing case) between job attribute j and worker skill k, as is typical of models of sorting.

Indeed, in the one-dimensional case (Y = 1), Condition (CMP) collapses to a restriction on the sign of

∂2σ/∂xk∂y = ∂2p/∂xk∂y, which is the familiar super- (or sub) modularity condition on the production

function from one-dimensional frictionless assignment models. In the multi-dimensional case, Condition

(CMP) imposes, loosely speaking, that supermodularity hold along all level curves of σ(x,y), which

amounts to a complex restriction involving not only the technology, but also the sampling distribution

of job types.

D.2 Proof of Theorem 1

The objective is to find conditions for PAM in dimension (y1, xk). Generally, in order to obtain sufficient

conditions we want to specify conditions subject to which (CMP) in Lemma 1 holds, i.e. under which

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = y

]is increasing in y. In the case of bilinear technology with Y = 2,

however, we can circumvent this sufficient condition and compute term (2) in expression (DC) explicitly.

First, notice that Assumption 1 implies explicit invertibility of the surplus function, so that:

σ(x,y) = s ⇔ y2 − b2 =s

q2(x)− q1(x)

q2(x)(y1 − b1) := R(s, y1).

Then, we can express

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′1 = y

]= Eµs,x

(∂σ (x, (y′1, R(s, y′1)))

∂xk| y′1 = y

)(16)

where µs,x(y1) is the sampling density of y1 conditional on x and on σ(x,y) = s:

µs,x(y1) =γ (y1, R(s, y1)) ∂R(s, y1)/∂s∫γ (y′1, R(s, y′1)) ∂R(s, y′1)/∂sdy

′1

.

Note that ∂R(s, y1)/∂s is a constant and thus drops from µs,x. Finally, Assumption 1 implies ∂σ(x,y)/∂xk =

55

∑2j=1 qkj(yj − bj).

Substituting this expression along with R(s, y1) into Eµs,x in (16), we obtain,

Eµs,x

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′1 = y

]=

qk2q2(x)

s+qk1q2(x)− qk2q1(x)

q2(x)Eµs,x [y

′1 − b1|y′1 = y]

implying:

Eµs,x

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′1 ≤ y

]=

qk2q2(x)

s+qk1q2(x)− qk2q1(x)

q2(x)Eµs,x [y

′1 − b1|y′1 ≤ y]

Hence, term (2) in expression (DC) is equal to:

− qk1q2(x)− qk2q1(x)

q2(x)

∫ +∞

0

2λ1fσ|x(s)gσ|x(s)

δ + λ1Fσ|x(s)

× PrΓy′j ≤ y|σ(x,y′) = s

×(Eµs,x [y

′1 − b1]− Eµs,x [y

′1 − b1|y′1 ≤ y]

)ds. (17)

Both q2(x) (by assumption) and the difference in expectations (by construction) are positive. Condition

(SC-2D), which here is equivalent to qk1q2(x)− qk2q1(x) > 0 (or detQ > 0), is therefore necessary and

sufficient for term (2) in (DC) to be negative, i.e. necessary and sufficient for PAM in (y1, xk).


Sufficiency. We have to sign term (1) in decomposition (DC) which, as explained in the main text,

reflects sorting along the UE margin. Whenever gσ|x(0) > 0, said term (1) has the sign of:

PrΓy′j ≤ y|σ(x,y′) = 0

×EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0, y′j ≤ y

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]+ EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]×[PrΓ

y′j ≤ y|σ(x,y′) = 0

−Hj(y|x)

](18)

In the case of Y = 2, treated in Theorem 2, the first line in (18) is negative under the assumed condition

(SC-2D) (see proof of Theorem 1). We thus focus on showing that the second line is negative. Using the

identity Hj(y|x) =∫ +∞0

gσ|x(s)PrΓy′j ≤ y|σ(x,y′) = s

ds, the second line can be reformulated as:

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]×∫ +∞

0

gσ|x(s)[PrΓ

y′j ≤ y|σ(x,y′) = 0

− PrΓ

y′j ≤ y|σ(x,y′) = s

]ds

We then proceed in two steps. First, we will find conditions under which PrΓ y′1 ≤ y|σ(x,y′) = s is

decreasing in s, implying that [PrΓ y′1 ≤ y|σ(x,y′) = 0 − PrΓ y′1 ≤ y|σ(x,y′) = s] ≥ 0. Equivalently,

56

we want to derive conditions under which, for any sH > sL ≥ 0,

∫ y

y1

γ(y′1,

sHq2(x)

+ b2 − q1(x)q2(x)

(y′1 − b1))dy′1∫ y1

y1

γ(y′1,

sHq2(x)

+ b2 − q1(x)q2(x)

(y′1 − b1))dy′1

≤

∫ y

y1

γ(y′1,

sLq2(x)

+ b2 − q1(x)q2(x)

(y′1 − b1))dy′1∫ y1

y1

γ(y′1,

sLq2(x)

+ b2 − q1(x)q2(x)

(y′1 − b1))dy′1

Defining

g(y, s) :=

∫ y

y1

γ

(y′1,

s

q2(x)+ b2 −

q1(x)

q2(x)(y′1 − b1)

)dy′1

and rearranging the previous inequality gives g (y1, sL) g(y, sH) ≤ g(y, sL)g (y1, sH). Since y ≤ y1, this

inequality holds if g is log-supermodular in (y, s). To show when this is the case, define — as in the

proof of Theorem 1 — the joint distribution of y1 and s (conditional on x) as

µx(y1, s) = γ

(y1,

s

q2(x)+ b2 −

q1(x)

q2(x)(y1 − b1)

)

and rewrite g(y, s) =∫1y′1 < yµx (y

′1, s) dy

′1. Note that

1. the support of µx(y1, s) is a lattice (for more details, see the proof of Theorem 6 and in particular

footnote 42 below);

2. the joint distribution µx(y1, s) is log-supermodular in (y1, s) if

q2(x)∂2 ln γ

∂y2∂y1− q1(x)

∂2 ln γ

∂y22≥ 0 (19)

3. the indicator function, 1y′1 < y, is log-supermodular in (y, y′1).

Therefore, the product 1y′1 < yµx(y′1, s) is log-supermodular in (y, y′1, s) since the product of log-

supermodular functions is log-supermodular. In turn, this implies that g is log-supermodular in (y, s)

since log-supermodularity is preserved under integration.

We have thus proven that if (19) holds (stated as condition (UE-2D) in Theorem 2), then PrΓ y′1 ≤ y|σ(x,y′) = s

is decreasing in s.

Second, we derive conditions such that EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]≤ 0. Note that under Assumptions

1 and 2:

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]= q1(x)

[qk1q1(x)

− qk2q2(x)

]EΓ (y

′1 − b1 | σ(x,y′) = 0) .

The term outside the expectation is strictly positive by single-crossing condition (SC-2D) and the ex-

pectation is negative if y2 ≥ b2 and q1(x) > 0 (stated as Conditions 1. and 3. in Theorem 2) since

EΓ (y′1 − b1 | σ(x,y′) = 0) = EΓ

[y′1|y′1 = −(y′2 − b2)

q2(x)

q1(x)+ b1

]− b1.

Thus, EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]≤ 0 under the conditions from Theorem 2, proving that those condi-

57

tions are sufficient for (18) (and thus, term (1) in decomposition (DC)) to be negative and hence sufficient

for PAM on the UE margin.

Necessity. To show that single crossing condition (SC-2D) is also necessary for PAM when considering

all possible sampling distributions γ, recall that there is PAM in dimension (y1, xk) on the UE margin

if and only if (18) is negative. It thus suffices to show that there exists a sampling distribution γ, under

which (SC-2D) is necessary for (18) to be negative.

We focus on bilinear technology and Y = 2. First, note the first line in (18) is negative if and only if

(SC-2D) holds (by an analogous argument as in Theorem 1). Second, note that if γ is log-supermodular

with log-concave marginals, then [PrΓ y′1 ≤ y|σ(x,y′) = 0 − PrΓ y′1 ≤ y|σ(x,y′) = s] ds ≥ 0, see the

Sufficiency-part above. Hence, if γ is uniform with independent marginals then ∂2 ln γ∂y1∂y2

= 0 and ∂2 ln γ∂y2

2= 0

so that [PrΓ y′1 ≤ y|σ(x,y′) = 0 − PrΓ y′1 ≤ y|σ(x,y′) = s] ds = 0. Only the first line in (18) remains,

which is negative only if (SC-2D) holds.


To economize on notation, we will use the identity ∂Kj(y, s|x)/∂s = gσ|x(s)×PrΓy′j ≤ y|σ(x,y′) = s

throughout the proof (see (15) above for its derivation).

From decomposition (DC) applied to the case of bilinear production function, we obtain:

(x+ a)⊤∇Hj(y|x) =X∑

k=1

(xk + ak)∂Hj(y|x)

∂xk

= EΓ

[(x+ a)⊤∇xσ(x,y

′) | σ(x,y′) = 0, y′j ≤ y] ∂Kj(y, 0|x)

∂s

− EΓ


′) | σ(x,y′) = 0]Hj(y|x)gσ|x(0)

+

∫ +∞

0

2λ1fσ|x(s)

δ + λ1Fσ|x(s)× ∂Kj(y, s|x)

∂s

×

EΓ


′) | σ(x,y′) = s, y′j ≤ y]− EΓ


′) | σ(x,y′) = s]

ds.

But then linearity in (x+ a) of the flow surplus function σ implies that (x+ a)⊤∇xσ(x,y′) = σ(x,y′).

Substitution into the latter equation yields:

(x+ a)⊤∇Hj(y|x)

= EΓ

[σ(x,y′) | σ(x,y′) = 0, y′j ≤ y

]× ∂Kj(y, 0|x)

∂s− EΓ [σ(x,y

′) | σ(x,y′) = 0]Hj(y|x)gσ|x(0)

+

∫ +∞

0

2λ1fσ|x(s)

δ + λ1Fσ|x(s)×∂Kj(y, s|x)

∂s×

EΓ

[σ(x,y′) | σ(x,y′) = s, y′j ≤ y

]−EΓ [σ(x,y

′) | σ(x,y′) = s]

ds,

all terms of which are equal to zero.

Note that the proof above is virtually unchanged if, instead of assuming that σ(·) is linear in (x+a),

58

one only assumes that it is homogeneous in (x+ a). In that case, (x+ a)⊤∇xσ(x,y) = ασ(x,y), where

α is the degree of homogeneity (a constant), and the proof goes through as above.


We first note that similar steps as were taken in the proof of decomposition (DC) can be used to establish

the following result about the way in which the marginal density of the jth job attribute in the population

of firm-worker matches, hj(y|x) = ∂Hj(y|x)/∂y, responds to a change in the kth worker attribute:

∂hj(y|x)∂xk

= gσ|x(0)

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0, y′j = y

]PrΓ

y′j = y|σ(x,y′) = 0

− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = 0

]hj(y|x)

+

∫ +∞

0

2λ1fσ|x(s)

δ + λ1Fσ|x(s)× ∂2Kj(y, s|x)

∂y∂s

×

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = y

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds.

where Kj is again given by (15). Assuming, as is done in the theorem, that the UE margin is shut down

(σ(x,y) > 0 for all y), implies that the first two terms in the equation above equal zero. Next consider:

(x+ a)⊤Q∂xkE(y|x) =

X∑i=1

Y∑j=1

(xi + ai)qij

∫ yj

yj

y′j∂hj(y|x)

∂xkdy′j

for all k ∈ 1, · · · , X, where ∂xkE(y|x) = (∂E(y1|x)/∂xk, · · · , ∂E(yY |x)/∂xk)

⊤. Using the expression

for ∂hj(y|x)/∂xk derived above, the definition of the bilinear flow surplus function from Assumption 1,

and the definition of Kj(y, s|x) from equation (15), we have that

(x+ a)⊤Q∂xkE(y|x) =

X∑i=1

Y∑j=1

(xi + ai)qij

∫ yj

yj

y′j

∫ +∞

0

2λ1fσ|x(s)

δ + λ1Fσ|x(s)× ∂2Kj(y, s|x)

∂y∂s

×

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = yj

]− EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]dsdy′j

=

X∑i=1

Y∑j=1

(xi + ai)qijδ[δ + λ1Fσ|x(0)

]Fσ|x(0)

×∫ +∞

0


]3×∫ yj

yj

∫1 σ(x,y′) = s1

y′j = yj

y′j

∂σ(x,y′)

∂xkγ(y′)dy′

−∫

1 σ(x,y′) = s1y′j = yj

y′jγ(y

′)dy′ × EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]dy′jds

59

=

X∑i=1

Y∑j=1

(xi + ai)qijδ[δ + λ1Fσ|x(0)

]Fσ|x(0)

×∫ +∞

0


]3×

∫1 σ(x,y′) = s y′j

∂σ(x,y′)

∂xkγ(y′)dy′

−∫

1 σ(x,y′) = s y′jγ(y′)dy′ × EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds

=δ[δ + λ1Fσ|x(0)

]Fσ|x(0)

×∫ +∞

0


]3×

∫1 σ(x,y′) = s

X∑i=1

Y∑j=1

(xi + ai)qijy′j

∂σ(x,y′)

∂xkγ(y′)dy′

−∫

1 σ(x,y′) = sX∑i=1

Y∑j=1

(xi + ai)qijy′jγ(y

′)dy′ × EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds

=δ[δ + λ1Fσ|x(0)

]Fσ|x(0)

×∫ +∞

0


]3 ×(s+ (x+ a)⊤Qb

)×

∫1 σ(x,y′) = s ∂σ(x,y′)

∂xkγ(y′)dy′

−∫

1 σ(x,y′) = s γ(y′)dy′ × EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s

]ds = 0,

as the term in curly brackets inside the integral is equal to zero.

D.6 Proof of Corollary 2

For the bilinear technology with qj(x) ≥ 0 for all j, vector (x+a)⊤Q has only positive elements. Theorem

4 then implies that if sorting is positive in (yj , xk) for all j ∈ 1, · · · , Y − 1, implying ∂E(yj |x)∂xk

> 0 for

all j ∈ 1, · · · , Y − 1, then it must be that ∂E(yY |x)∂xk

< 0. Since ∂E(yY |x)∂xk

> 0 is necessary for HY (y|x)∂xk

< 0

to hold for all values of yY (i.e. necessary for PAM in dimension (yY , xk)), it must be that HY (y|x)∂xk

> 0

on some set of values yY of positive measure and thus there cannot be PAM in dimension (yY , xk).


The objective is to find conditions for PAM in dimension (yj , xk). Based on Lemma 1 (Appendix D.1) this

means we want to specify conditions under which EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = y

]is increasing in y.

Let y−Y = (y1, · · · , yY−1) denote the (Y − 1)-dimensional vector formed of all components of y

except yY . Note that Y−Y ∈×Y−1

j=1

[yj, yj

]. Fix any x ∈ X and any s ≥ 0, and consider the equation

σ (x, (y−Y , yY )) = s ⇔ f (x, (y−Y , yY )) = b(x) + s (20)

60

Then strict monotonicity of yY 7→ f (x, (y−Y , yY )) (Assumption 2a) guarantees that at most one value

of yY ∈[yY, yY

]solves (20). In turn, assumption 2b ensures that there always exists a unique yY ∈[

yY, yY

]that solves (20). The equation σ(x,y) = s is therefore equivalent to yY = R(s,y−Y ), where

R(s, ·) is a well-defined function over×Y−1

j=1

[yj, yj

].41 Application of the Implicit Function Theorem

further implies that R(·) is differentiable over its domain, with, for all j ∈ 1, · · · , Y − 1:

∂R(S,y−Y )

∂yj= − ∂p/∂yj

∂p/∂yY(x, (y−Y , R(s,y−Y ))) (21)

and:∂R(s,y−Y )

∂s=

1

∂p/∂yY(x, (y−Y , R(s,y−Y ))) (22)

We next notice that, for j ∈ 1, · · · , Y − 1:

d

dyj

(∂σ (x, (y−Y , R(s,y−Y )))

∂xk

)=

∂2p

∂xk∂yj− ∂p/∂yj

∂p/∂yY× ∂2p

∂xk∂yY(23)

where (21) was used and where the arguments of p (x, (y−Y , R(s,y−Y ))) were omitted in the r.h.s. to

reduce notational clutter. Condition (SC-YD) in the theorem then implies, together with (21) that

∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is increasing in all elements of y−Y ∈×Y−1

j=1

[yj, yj

].

Our objective is to exhibit conditions under which condition (CMP) in Lemma 1 holds. Fixing ℓ = Y ,

notice that (CMP) can be expressed as

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′ℓ = y

]= Eµs,x

(∂σ(x,(y′−Y , R(s,y′

−Y )))

∂xk| y′ℓ = y

).

where µs,x(y−Y ) is the joint sampling density of y−Y conditional on x and on σ(x,y) = s:

µs,x(y−Y ) =γ (y−Y , R(s,y−Y )) ∂R(s,y−Y )/∂s∫

γ(y′−Y , R(s,y′

−Y ))∂R(s,y′

−Y )/∂s dy′−Y

with ∂R(s,y−Y )/∂s given in (22). We will now derive conditions under which

y 7→ Eµs,x

(∂σ(x,(y′−Y , R(s,y′

−Y )))

∂xk| y′ℓ = y

)(24)

is increasing in y and proceed in two steps.

First, we show that the support of µs,x is a lattice. The equation σ(x,y) = s has one unique solution

yY = R(s,y−Y ) ∈[yY, yY

]for all y−Y ∈×Y−1

j=1

[yj, yj

]. Thus, for any two points (y′

−Y ,y′′−Y ) ∈(

×Y−1

j=1

[yj, yj

])2, R(s,y′

−Y ∧y′′−Y ) and R(s,y′

−Y ∨y′′−Y ) both exist and are both elements of

[yY, yY

],

41The dependence of R(·) on x, which is fixed for this proof, is omitted.

61

since both y′−Y ∧ y′′

−Y and y′−Y ∨ y′′

−Y are elements of×Y−1

j=1

[yj, yj

]. This establishes that

(y′−Y ∧ y′′

−Y , R(s,y′−Y ∧ y′′

−Y ))∈

Y×j=1

[yj, yj

](y′−Y ∨ y′′

−Y , R(s,y′−Y ∨ y′′

−Y ))∈

Y×j=1

[yj, yj

],

so that Suppµs,x is a lattice, implying that µs,x

(y′−Y ∧ y′′

−Y

)and µs,x

(y′−Y ∨ y′′

−Y

)are strictly positive.

Second, given that ∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is increasing in all elements of y−Y ∈×Y−1

j=1

[yj, yj

]and given that Suppµs,x is a lattice (both established above), a sufficient condition for (24) to be an

increasing function of y is that the density µs,x be such that (the proof is an adaptation of Theorem 4.1

in Karlin and Rinott, 1980):

∀(i, j) ∈ 1, · · · , Y − 12 : i = j,∂2 lnµs,x

∂yi∂yj≥ 0

which translates into condition (EE-YD):

∂p

∂yY

[(∂p

∂yY

)2∂2 ln γ

∂yi∂yj+

∂p

∂yi

∂p

∂yj

∂2 ln γ

∂y2Y− ∂p

∂yj

∂p

∂yY

∂2 ln γ

∂yi∂yY− ∂p

∂yi

∂p

∂yY

∂2 ln γ

∂yj∂yY

]

− ∂ ln γ

∂yY

[(∂p

∂yY

)2∂2p

∂yi∂yj+

∂p

∂yi

∂p

∂yj

∂2p

∂y2Y− ∂p

∂yj

∂p

∂yY

∂2p

∂yi∂yY− ∂p

∂yi

∂p

∂yY

∂2p

∂yj∂yY

]

−(

∂p

∂yY

)2∂3p

∂yi∂yj∂yY− ∂p

∂yi

∂p

∂yj

∂3p

∂y3Y+

∂p

∂yi

∂p

∂yY

∂3p

∂yj∂y2Y+

∂p

∂yj

∂p

∂yY

∂3p

∂yi∂y2Y

+∂p

∂yY

[∂2p

∂yj∂yY

∂2p

∂yi∂yY− ∂2p

∂y2Y

∂2p

∂yi∂yj

]≥ 0.

Remark: The proof for sufficient conditions for NAM is similar. In this case, we need to find conditions

under which EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = y

]is decreasing in y (see Lemma 1), or, equivalently, under

which EΓ

[−∂σ(x,y′)

∂xk| σ(x,y′) = s, y′j = y

]is increasing in y. To apply the results from Karlin and

Rinott (1980) as in the proof for PAM, we thus need a condition under which (using the same change

of variable as above) −∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is increasing in all y−Y or, equivalently, under

which ∂σ (x, (y−Y , R(s,y−Y ))) /∂xk is decreasing in all yj . This is the case if (23) is negative for all

j ∈ 1, · · · , Y − 1, i.e. if the generalized signal crossing condition (SC-YD) reverses its sign for all

j ∈ 1, · · · , Y − 1

∂

∂xk

(∂p(x,y)/∂yj∂p(x,y)/∂yY

)=

∂2p

∂xk∂yj− ∂p/∂yj

∂p/∂yY× ∂2p

∂xk∂yY≤ 0. (25)

62

This is the only change to Theorem 5 if we want to provide sufficient conditions for NAM. If (25) holds

along with the remaining conditions from the theorem, then sorting is NAM in dimensions (yj , xk),

j ∈ 1, · · · , Y − 1, on the EE margin.


When Y = 2, equation (20) writes as p (x, (y1, y2)) = b(x)+s. Strict monotonicity of y2 7→ p (x, (y1, y2))

still guarantees that at most one value of y2 ∈[y2, y2

]solves (20). Moreover, the set of y1 such that

(20) has one solution is an interval (possibly empty). To see this, suppose there exist two distinct

values y′1 < y′′1 such that (20) has a solution. Denote these solutions by y′2 and y′′2 , respectively. Con-

sider a number t ∈ (0, 1) and define y1(t) = ty′1 + (1 − t)y′′1 . Quasi-concavity of p(x, ·) implies that

p (x, (y1(t), ty′2 + (1− t)y′′2 )) ≥ b(x) + s. Moreover, by assumption, miny2∈R p (x, (y1(t), y2)) < b(x) ≤

b(x)+ s. By continuity, (20) has a solution R (s, y1(t)) ≤ ty′2 +(1− t)y′′2 when y1 = y1(t). Note that this

solution can be smaller than y2: in this case, γ (y1(t), R (s, y1(t))) = 0.

Denote the interval of values of y1 for which (20) has one solution by I1(s). Application of the

Implicit Function Theorem (as in the proof of Theorem 5 above) then implies that equation (23) holds

for all y ∈ I1(s):d

dy1

(∂σ (x, (y,R(s, y)))

∂xk

)=

∂2p

∂xk∂y1− ∂p/∂y1

∂p/∂y2× ∂2p

∂xk∂y2

which is positive by the single-crossing condition in Corollary 3.

Consider s ≥ 0 and two values (y′, y′′) ∈[y1, y1

]2such that PrΓy1 = y′|σ(x,y) = s > 0 and

PrΓy1 = y′′|σ(x,y) = s > 0, which implies in particular that y′ and y′′ are both in I1(s). Assume

w.l.o.g. that y′′ > y′. Then, by the results derived above:

EΓ

[∂σ(x,y)

∂xk| σ(x,y) = s, y1 = y′′

]− EΓ

[∂σ(x,y)

∂xk| σ(x,y) = s, y1 = y′

]=

∂σ (x, (y′′, R(s, y′′)))

∂xk− ∂σ (x, (y′, R(s, y′)))

∂xk> 0.

This proves that Condition (CMP) holds for monotone production functions and Y = 2, i.e. there is

PAM along (y1, xk).

Note in passing that Condition (EE-YD) from Theorem 5 becomes irrelevant in the two-dimensional

case Y = 2, which obviates the need to impose conditions on the behavior of p over the support of γ: Con-

dition 2b in Theorem 5 is replaced by the twofold requirement of quasi-concavity and miny2∈R p(x,y) < b (x).


Theorem 5 also nests the case of a bilinear technology from Corollary 4. The bilinear technology is C3

(Condition 1 from Theorem 5). The condition qY (x) > 0 (imposed in Assumption 1) ensures that p(x,y)

is strictly increasing in yY (Condition 2a from Theorem 5). Moreover, with bilinear technology, we can

63

explicitly solve the equation σ(x,y) = s, thus circumventing the need to appeal to the Implicit Function

Theorem. Together, the requirements that yY = +∞ and limyY →yYp(x,y) < b(x) parallel Condition

2b from Theorem 5 and ensure that suppµs,x is a lattice so that we can apply Theorem 4.1 in Karlin

and Rinott (1980) to derive conditions under which (24) is increasing in y. Next, Condition 2 in the

corollary is the generalized single crossing condition (which echoes Condition 2c in Theorem 5). Finally,

Condition (EE-YD’) in point 3 of the corollary is a rewrite of Condition (EE-YD) in Theorem 5 in the

case of a bilinear production function: bilinearity implies that the last three lines in (EE-YD) vanish (as

all the partial derivatives of order greater than one are zero in this case) and that the derivatives of the

flow surplus (or of the production) function can be expressed explicitly.


We now prove Theorem 6 (note that it covers Y = 2 in Theorem 2 as a special case). We have to sign

the first term in decomposition (DC) (which, as explained in the main text, reflects sorting along the UE

margin) which is spelled out in (18). By the same arguments as were used in the proofs of Theorems 1

and 5, the first term in curly brackets in (18) is negative under Condition (EE-YD’). We thus focus on

the second term of (18). We proceed similarly as in the proof of Theorem 2.

First, we seek conditions under which PrΓ y′ℓ ≤ y|σ(x,y′) = s is decreasing in s for any fixed

ℓ ∈ 1, · · · , Y − 1. In particular, we want to derive conditions under which, for any sH > sL ≥ 0,

∫ y

yℓ

∫γ(y′ℓ,y

′−(ℓ,Y ),

sHqY (x) + bY −

∑Y−1j=1

qj(x)qY (x) (y

′j − bj)

)dy′

−(ℓ,Y )dy′ℓ∫ yℓ

yℓ

∫γ(y′ℓ,y

′−(ℓ,Y ),

sHqY (x) + bY −

∑Y−1j=1

qj(x)qY (x) (y

′j − bj)

)dy′

−(ℓ,Y )dy′ℓ

≤

∫ y

yℓ

∫γ(y′ℓ,y

′−(ℓ,Y ),

sLqY (x) + bY −

∑Y−1j=1

qj(x)qY (x) (y

′j − bj)

)dy′

−(ℓ,Y )dy′ℓ∫ yℓ

yℓ

∫γ(y′ℓ,y

′−(ℓ,Y ),

sLqY (x) + bY −

∑Y−1j=1

qj(x)qY (x) (y

′j − bj)

)dy′

−(ℓ,Y )dy′ℓ

where y′−(ℓ,Y ) =

(y′1, · · · , y′ℓ−1, y

′ℓ+1, · · · , y′Y−1

). Defining

g(y, s) =

∫ y

yℓ

∫γ

y′ℓ,y′−(ℓ,Y ),

s

qY (x)+ bY −

Y−1∑j=1

qj(x)

qY (x)(y′j − bj)

dy′−(ℓ,Y )dy

′ℓ

and rearranging the previous inequality gives g (yℓ, sL) g(y, sH) ≤ g(y, sL)g (yℓ, sL). Since y ≤ yℓ, this

inequality holds if g is log-supermodular in (y, s). To show when this is the case, define the joint

distribution of y−Y and s (conditional on x) as

µx(y−Y , s) = γ

y−Y,s

qY (x)+ bY −

Y−1∑j=1

qj(x)

qY (x)(yj − bj)

and rewrite g(y, s) =

∫1y′ℓ < yµx

(y′−Y , s

)dy−Y

′. Note that

64

1. the support of µx(y−Y , s) is a lattice;42

2. the joint distribution µx(y−Y , s) is log-supermodular in (y−Y , s) if it is log-supermodular in all

pairs of arguments. This is the case if:

∀j = 1, ..., Y − 1 : qY (x)∂2 ln γ

∂yY ∂yj− qj(x)

∂2 ln γ

∂y2Y≥ 0 (26)

∀(i, j) ∈ 1, · · · , Y − 12, i = j :

qY (x)2 ∂

2 ln γ

∂yi∂yj+ qi(x)qj(x)

∂2 ln γ

∂y2Y− qj(x)qY (x)

∂2 ln γ

∂yi∂yY− qi(x)qY (x)

∂2 ln γ

∂yj∂yY≥ 0 (27)

3. the indicator function, 1y′ℓ < y, is log-supermodular in (y, y′ℓ).

Therefore, the product 1y′ℓ < yµx(y′−Y , s) is log-supermodular in (y,y′

−Y , s) since the product of log-

supermodular functions is log-supermodular). This implies in turn that g(·) is log-supermodular in (y, s)

since log-supermodularity is preserved under integration.

We have thus proven that if (26) (stated as condition (UE-YD) in Theorem 6) and (27) (stated as

condition (EE-YD’) in Corollary 4) hold, then PrΓ y′ℓ ≤ y|σ(x,y′) = s is decreasing in s.

Second, we will provide a condition guaranteeing that EΓ [∂σ(x,y′)/∂xk | σ(x,y′) = 0] ≤ 0. First

note that σ(x,y) = 0 is equivalent to yY = bY − 1qY (x)

∑Y−1j=1 qj(x)(yj − bj). Thus:

σ(x,y) = 0 =⇒ ∂σ(x,y)

∂xk=

Y−1∑j=1

qkjqY (x)− qkY qj(x)

qY (x)(yj − bj).

Therefore, a sufficient condition for EΓ [∂σ(x,y′)/∂xk | σ(x,y′) = 0] ≤ 0 is that the value of the linear

program

maxy

Y−1∑j=1


qY (x)(yj − bj)

subject to:1

qY (x)

Y−1∑j=1

qj(x)(yj − bj) ≤ bY − yY

yj≤ yj j = 1, · · · , Y − 1

be negative. This is a sufficient condition which ensures that ∂σ(x,y)/∂xk ≤ 0 over the entire set of

42This follows from the proof of Theorem 5 with one extra step to deal with the joint distribution of (y−Y , s)(instead of the conditional distribution of y−Y given s). Note that we have proven that yY = R(s,y−Y ) ∈[yY, yY

]for any s ≥ 0. Hence, for two points s ≥ 0 and s′ ≥ 0, s ∧ s′ ≥ 0 and s ∨ s′ ≥ 0 and

(y−Y ∧ y′

−Y , R(s ∧ s′,y−Y ∧ y′−Y )

)∈

Y×j=1

[yj, yj

]and

(y−Y ∨ y′

−Y , R(s ∨ s′,y−Y ∨ y′−Y )

)∈

Y×j=1

[yj, yj

].

Hence the support of µx is a lattice and µx (y−Y ∧ y′−Y , s ∧ s′) and µx (y−Y ∨ y′

−Y , s ∨ s′) are strictly positive.This is why we can take lnµx in the next step.

65

y’s such that σ(x,y) = 0. Although strong, this condition is the minimal ‘distribution-free’ one. This

program can be rewritten in standard form as:

Y−1∑j=1


qY (x)

(yj− bj

)+max

Y

Y−1∑j=1


qY (x)Yj (28)

subject to:Y−1∑j=1

qj(x)Yj ≤ −σ(x,y

)0 ≤ Yj j = 1, · · · , Y − 1

where Yj = yj−yj. The first thing to note about program (28) is that if there exists a j ∈ 1, · · · , Y −1

such that qj(x) < 0, then (28) is clearly unbounded: in that case, one can set Yj → +∞ whenever

qj(x) < 0 and Yj′ = 0 when qj′(x) ≥ 0, which satisfies the constraints and gives (28) an infinite value.

We thus assume that qj(x) ≥ 0 for all j ∈ 1, · · · , Y − 1.

The dual of (28) is simply:

Y−1∑j=1


qY (x)

(yj− bj

)+min

Z

⟨−Zσ

(x,y

)⟩(29)

subject to: qj(x)Z ≥ qkjqY (x)− qkY qj(x)

qY (x), j = 1, · · · , Y − 1

Z ≥ 0

Assuming that σ(x,y

)< 0, the solution to the latter program is then to set:

Z = maxj′

qkj′

qj′(x)

− qkY

qY (x)

and the value of the linear program (28) (which equals that of its dual) is:

Y−1∑j=1


qY (x)

(yj− bj

)−

Y∑j=1

[maxj′

qkj′

qj′(x)

− qkY

qY (x)

]qj(x)

(yj− bj

)

=

Y∑j=1

[qkjqj(x)

−maxj′

qkj′

qj′(x)

]qj(x)

(yj− bj

). (30)

The requirement that this be negative yields the condition in the theorem.


Under the separability assumption 2 in the Theorem, p(x,y) = p1(x, y1)+p2(x−k,y) where x−k includes

all components of x but xk. Hence:

∂σ(x,y)

∂xk=

∂p1(x, y1)

∂xk− ∂b(x)

∂xk

66

which only depends on the first job attribute y1. Then:

EΓ

[∂σ(x,y′)

∂xk| σ(x,y′) = s, y′1 = y

]=

∂p1(x, y)

∂xk− ∂b(x)

∂xk

which is increasing in y under Assumption 3. Condition (CMP) thus holds, hence the result.

67

multidimensional sorting under random search · dimensional sorting in a frictionless assignment...

Documents