Using Panel Data To Easily Estimate
Hedonic Demand Functions∗
Kelly C. Bishop†
Arizona State University
Christopher Timmins‡
Duke University and NBER
October 2015
Abstract
The hedonics literature has often asserted that if one were able to observe the same
individual make multiple purchasing decisions, one could recover rich estimates of
preferences for a given amenity. In particular, in the face of a changing price schedule,
observing each individual twice is sufficient to recover the underlying (linear) demand
function separately for each individual, with no restrictions on heterogeneity in either
the intercept or the slope. Observing individuals on more than two occasions allows
for greater flexibility in this heterogeneity. Using a rich panel dataset, we estimate the
fully heterogeneous demand functions for clean air in the Bay Area of California. Our
results suggest that such heterogeneity is important from a policy perspective.
Key Words: Hedonic Demand, Valuation, Ozone
JEL Classification Numbers: Q50, Q51, R21, R23
∗The initial version of this work was completed as part of Bishop’s doctoral dissertation at Duke University. We thank seminar participants at the Triangle Resource and Environmental Economics Workshop, Camp Resources, University of Wisconsin, Iowa State University, AERE Winter Meetings, Syracuse University, Montana State University, and XXXX for their helpful comments and suggestions.
†Kelly Bishop, Arizona State University, Tempe, AZ 85287. [email protected]
‡Chris Timmins, Duke University, Box 90097, Durham, NC 27708. [email protected]
1 Introduction
In many applications of the hedonic model, one may wish to value non-marginal changes in
amenity values, requiring the estimation of the underlying demand function. This, however,
is not without costs; in addition to a potentially high computational burden, assumptions
may need to be made which restrict preference heterogeneity. In this paper, we show how to
recover fully-heterogeneous structural preference parameters with a computationally-light,
data-driven estimation approach.
The traditional estimation approach associated with hedonic demand is based on
Rosen’s seminal 1974 paper. However, in addition to suffering from issues related to omitted
variables in the first stage and identification in the second stage, this approach suffers from
a difficult second-stage endogeneity problem: when the hedonic price function is non-linear
in the amenity of interest, home buyers simultaneously choose both the hedonic price and
the quantity of that amenity that they will consume. The problem of consumer choice
subject to a non-linear budget constraint creates a difficult endogeneity problem when
using statistical inference to recover the parameters describing those preferences [Epple
(1987), Bartik (1987)]. Typically, IV approaches in this literature have relied upon some
combination of exclusion restrictions. For example, (i) certain socio-demographic variables
enter directly into the MWTP function while others do not, so that the excluded variables
can be used as instruments for endogenous attribute levels, or (ii) the MWTP function
contains socio-demographic variables only in linear form, so that the excluded higher-order
terms can serve as instruments. These assumptions are not testable and place arbitrary
restrictions on the estimated heterogeneity in MWTP; aside from market dummies, these
instruments are generally hard to justify. These difficulties have led most researchers to
forgo estimating the MWTP function altogether, restricting their analysis to marginal
policy changes only.
In their 2005 paper, Bajari and Benkard demonstrate that this endogeneity problem
may be avoided by replacing the statistical inference used in the second stage of Rosen’s
method with a “preference inversion” procedure which inverts the first-order conditions
of utility maximization to recover demand at the individual level. The strengths of this
approach lie in (i) its admission of any form of preference heterogeneity and (ii) its
avoidance of the endogeneity problems described above. Its primary weakness, however, comes
in the strict functional-form assumptions that are required to perform the inversion
procedure, i.e., when observing each individual on only one purchase occasion (one data point),
it is only possible to recover the hedonic demand function by fully assuming its form.1
When the shape of the hedonic demand function is so strongly dictated by functional-form
assumptions, the value in going beyond the first stage of Rosen’s two-step procedure is
limited.
Figure 1: Recovering Individual Demand using Panel Data
Our method for recovering the hedonic demand function is quite intuitive: observing
the same household make purchase decisions in multiple markets (under different
price schedules) allows one to trace out individual-specific demand functions. Given a
graphical representation of the problem, our empirical approach becomes clear; observing
each household on at least two purchase occasions allows us to “connect the dots” and
recover a unique (linear) demand curve for each individual in the dataset. Figure (1)
describes this approach for a given housing amenity, Z; when facing the associated price
schedules on purchase occasions t and τ , household i chooses to consume Z∗i,t and Z∗i,τ ,
respectively. These two observations are sufficient for recovering a household-specific linear
demand function for household i.
1Bajari and Benkard recognize this, pointing out that these assumptions could be relaxed if the researcher is able to observe the same individual home buyer on multiple purchase occasions.
Importantly, the intuition behind our identification is straightforward. The sched-
ule of market prices (i.e., the hedonic price gradient) exhibits exogenous variation across
markets (e.g., across time) from the point of view of any given household, as we assume
that households are price-takers in the real estate market. Thus, we are observing each
individual household choosing how much of amenity Z to consume under different supply
conditions, allowing us to trace out the hedonic demand curve for each household in the
data. Naturally, the demand function that we recover with two observed purchases is lin-
ear. However, we impose no additional restrictions on the functional form of demand and
no restrictions on the heterogeneity of the individual-specific coefficients for intercept and
slope. We show how these demand coefficients may be decomposed to isolate the effect
of fixed demographic characteristics, such as race. We also show that with three or more
observations for each household, we can estimate the effect of time-varying demographic
characteristics, such as income.2
We apply this methodology to recovering individual-specific demand for air quality
in the Bay Area of California. In particular, we estimate the marginal willingness to pay
to avoid ground-level ozone pollution. For this analysis, we create a rich panel dataset
describing real estate transactions and the associated buyers over the thirteen-year period
of 1991 to 2003. To create this dataset, we first isolate and match individual households
over time, generating a unique “id” for each household in the data. We are then able to
merge household demographic characteristics provided on mortgage applications per the
Home Mortgage Disclosure Act of 1975. Finally, we generate a house-level annual measure
of ozone pollution from the monitor data provided by the California Air Resources Board.
In the implementation of this approach, we allow for the most flexible representation
of preferences possible with the available data. We begin by estimating a non-parametric
regression to recover a flexible set of time-varying hedonic price gradients which includes
a full set of fixed-effects at the Census-tract level. In the second stage, we estimate the
most flexible set of inverse demand functions that may be identified without functional
form assumptions – an individual-specific linear demand curve.
We find considerable heterogeneity in the marginal willingness to pay to avoid ozone
pollution across individuals. The median value to avoid one additional day exceeding the
California state maximum 1-hour ozone concentration is $595. Given our framework,
2With three or more observations per household, we could alternatively allow for greater flexibility in the shape of the demand curve, such as allowing the function to be quadratic in Z.
the heterogeneity becomes evident by reporting the elasticity of this marginal willingness
to pay with respect to ozone; the median value of the elasticity is 0.36.
This paper proceeds as follows. Section 2 describes our methodological approach for
recovering the hedonic demand for air quality in the Bay Area of California. The creation
of our unique two-sided panel dataset and its summary statistics are discussed in Section
3. Section 4 presents our results. Finally, Section 5 concludes.
2 Model
In this section, we describe our data-driven approach to recovering the structural param-
eters of the hedonic model. First, we discuss the non-parametric repeat sales model that
we use to recover the hedonic gradient with respect to a particular amenity. While this ap-
proach allows for the fewest assumptions while still controlling for house-level fixed-effects,
a much simpler specification would be sufficient for the identification of the second stage.
Next, we show how, using a panel of buyers (with implicit prices from the first stage), we
can recover fully heterogeneous inverse demand functions.
2.1 A Non-Parametric Fixed-Effects Approach to Recovering the
Hedonic Gradient
The first stage of the model estimates the hedonic price function, which relates the price
of house j transacted in period t (Pj,t) to its attributes: both those that vary over time,
Zj,t, and those that do not, Xj. The gradient can be estimated in any number of ways,
although a simple, linear framework may impose unrealistic restrictions on the equilibrium
underlying the hedonic price function [Heckman, Ekeland, and Nesheim (2004)]. Additionally,
attention must be paid to the potential bias caused by omitted variables.3 Using
panel data with repeat-sales and controlling for house fixed-effects avoids the bias caused
by time-invariant house characteristics, whether observed or unobserved. Thus, following
3Bias from omitted variables arises when the regressors are not orthogonal to the regression error. This would be the case if data describing important neighborhood or housing attributes (e.g., distance to city center or curb appeal) are not available to the researcher, but those variables are correlated with the attribute of interest.
Lee and Mukherjee (2014), we adopt a non-parametric first-difference approach for the es-
timation of the gradient and begin by writing down a flexible representation of the hedonic
price function:
$$P_{j,t} = f(Z_{j,t}) + X_j + \nu_{j,t} \tag{1}$$
where f(·) is an unspecified, flexible function of time-varying attributes of house j (or its
neighborhood) and Xj represents all time invariant attributes (whether they are observed
by the researcher or not). νj,t is assumed to be distributed i.i.d. with mean zero and
constant variance σ2ν . While we allow νj,t to be correlated with Xj, we assume that it
is uncorrelated with the elements of Zj,t. In our application, Zj,t will consist of (i) a
measure of ground-level ozone pollution at house j in year t and (ii) the year of the housing
transaction.4
We take a first-order Taylor series expansion of f(·) around some vector χ.5 This
vector has the same dimension as the vector Zj,t and is the point at which
the local linear regression is evaluated:
$$P_{j,t} = f(\chi) + (Z_{j,t} - \chi)' f'(\chi) + X_j + \nu_{j,t} \tag{2}$$
We denote each property’s prior year of sale with t−1 and write Equation (2) for
this prior sale:6
$$P_{j,t-1} = f(\chi) + (Z_{j,t-1} - \chi)' f'(\chi) + X_j + \nu_{j,t-1} \tag{3}$$
This allows us to subtract Equation (3) from Equation (2), differencing out both
the time-invariant attributes Xj and the time-invariant term f(χ).7
$$P_{j,t} - P_{j,t-1} = (Z_{j,t} - Z_{j,t-1})' f'(\chi) + (\nu_{j,t} - \nu_{j,t-1}) \tag{4}$$
4In practice, we use the number of days in the course of each year that the state maximum 1-hour ozone concentration of 90 ppb was violated. Other potential measures of ground-level ozone include the 1-hour and 8-hour average maximum concentrations over the course of each year. Ozone exceedances are a better measure of extreme pollution events that house buyers may be more aware of.
5The remainder term associated with the Taylor expansion is ignored.
6This is expanded around the same vector χ.
7Losing f(·) from this expression does not pose a problem, as our interest is only in recovering the hedonic gradient, i.e., the slope of the hedonic price function, which is represented non-parametrically by f′(·).
Denoting first differences with “∼” and replacing f ′(·) with β(·), we arrive at:
$$\tilde{P}_{j,t} = \tilde{Z}_{j,t}' \beta(\chi) + \tilde{\nu}_{j,t} \tag{5}$$
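The differencing step in Equations (2) through (5) can be illustrated with a minimal sketch. It assumes a hypothetical input format, a list of `(house_id, year, price, ozone)` tuples, and pairs each sale with the same house's previous sale; the function name and field names are illustrative only and are not taken from the paper.

```python
from collections import defaultdict

def first_differences(sales):
    """Pair each sale with the same house's previous sale.

    sales: iterable of (house_id, year, price, ozone) tuples (assumed layout).
    Returns one dict per consecutive pair, with Z = (ozone, year).
    """
    by_house = defaultdict(list)
    for house_id, year, price, ozone in sales:
        by_house[house_id].append((year, price, ozone))

    diffs = []
    for house_id, rows in by_house.items():
        rows.sort()  # chronological order within each house
        for (y0, p0, z0), (y1, p1, z1) in zip(rows, rows[1:]):
            diffs.append({
                "house": house_id,
                "dP": p1 - p0,             # first-differenced price
                "dZ": (z1 - z0, y1 - y0),  # first-differenced regressors
                "Z_now": (z1, y1),         # levels, needed for kernel weights
                "Z_prev": (z0, y0),
            })
    return diffs
```

Houses that sell only once contribute no differenced observation, which mirrors the restriction to repeat sales.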
Lee and Mukherjee (2014) show that β(χ) (i.e., the slope of the hedonic price func-
tion at Zj,t = χ) may be recovered with the following least-squares minimization procedure:
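$$\beta(\chi) = \arg\min_{\beta(\chi)} \sum_{j=1}^{J} \sum_{t=1}^{T_j} \left( \tilde{P}_{j,t} - \tilde{Z}_{j,t}' \beta(\chi) \right)^2 K_h(Z_{j,t} - \chi)\, K_h(Z_{j,t-1} - \chi) \tag{6}$$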
where Tj denotes the number of first-differenced observations for each house, j. Important
to the local-linear regression procedure is the choice of weights placed on data as one moves
further from the point of evaluation, χ. For our estimation, Kh(·) is the Gaussian kernel:
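$$K_h(Z_{j,t} - \chi) = \prod_{k=1}^{2} \frac{1}{h \hat{\sigma}_{Z_k}} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{Z_{k,j,t} - \chi_k}{h \hat{\sigma}_{Z_k}} \right)^2 \right\} \tag{7}$$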
where h represents the kernel bandwidth and σ̂Zk is the standard deviation of the kth
element of Zj,t.
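As a rough illustration of Equation (7), the product Gaussian kernel can be written in a few lines. The sketch below allows a separate bandwidth for each element of Z, which is an assumption made here for generality (footnote 18 reports different bandwidths for ozone and years); the function and argument names are hypothetical.

```python
import math

def gaussian_product_kernel(z, chi, h, sigma):
    """Product of one-dimensional Gaussian kernels, as in Equation (7).

    z, chi : observation and evaluation point (same length)
    h      : bandwidth for each dimension (assumed per-dimension here)
    sigma  : standard deviation of each element of Z
    """
    weight = 1.0
    for z_k, chi_k, h_k, s_k in zip(z, chi, h, sigma):
        scale = h_k * s_k
        u = (z_k - chi_k) / scale
        weight *= math.exp(-0.5 * u * u) / (scale * math.sqrt(2.0 * math.pi))
    return weight
```

Observations close to the evaluation point χ receive the largest weight, and the weight decays smoothly with scaled distance in each dimension.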
In practice, this allows us to recover an estimate of β(χ) for all observed values of Zj,t.
We therefore end up with a (potentially) different estimate of β(χ) at each data point; this
is in contrast to a fully parametric estimation procedure, where β would be constrained to
be the same for every value of Zj,t.
Given the linear nature of this problem, estimation of β(χ) can be summarized as
the following weighted least-squares regression:
$$\beta(\chi) = (\tilde{Z}' W_\chi \tilde{Z})^{-1} \tilde{Z}' W_\chi \tilde{P} \tag{8}$$
where $n = \sum_{j=1}^{J} T_j$ is the total number of first-differenced observations, Z̃ is an (n x
2) matrix of first-differenced (within house) regressors, P̃ is an (n x 1) vector of
first-differenced (within house) house prices, and Wχ is an (n x n) matrix of weights, such that
$W_\chi = \mathrm{diag}\left(K_h(Z_{j,t} - \chi)\right)$.
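A minimal sketch of the weighted least-squares step in Equation (8) follows. Because there are only two regressors, the normal equations can be solved in closed form without a linear-algebra library. The kernel weights are taken as given, and the function name and input layout are assumptions for illustration.

```python
def local_linear_beta(dZ, dP, weights):
    """Solve (Z'WZ) b = Z'WP for two regressors and no intercept.

    dZ      : list of (dz1, dz2) first-differenced regressors
    dP      : list of first-differenced prices
    weights : kernel weights for one evaluation point chi (assumed given)
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, x2), y, w in zip(dZ, dP, weights):
        a11 += w * x1 * x1
        a12 += w * x1 * x2
        a22 += w * x2 * x2
        b1 += w * x1 * y
        b2 += w * x2 * y
    det = a11 * a22 - a12 * a12
    if det <= 1e-12 * a11 * a22:
        raise ValueError("weighted regressors are (nearly) collinear")
    # Cramer's rule for the 2x2 system
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

Calling this function once per evaluation point, with weights recomputed for each χ, would give a (potentially) different gradient estimate at each data point.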
2.2 Recovering MWTP Functions using a Panel of Buyers
In this section, we demonstrate that it is straightforward to recover the flexible distribution
of linear MWTP functions with panel data on home purchasers. We begin by specifying
the utility of household i with non-housing consumption (Ci,t) and income (Ii,t), choosing
a house j with attributes (Xj, Zj,t), in period t:
$$U(X_j, Z_{j,t}, C_{i,t}) = \alpha_{0,i} + \alpha_{1,i} X_j + \alpha_{2,i} X_j^2 + \alpha_{3,i} Z_{j,t} + \frac{\alpha_{4,i}}{2} Z_{j,t}^2 + \alpha_{5,i} C_{i,t} \tag{9}$$
We use a monotonic transformation of utility to normalize α5,i to one and incorporate
household i’s budget constraint (Ci,t +Rj,t ≤ Ii,t), to arrive at the indirect utility function:
$$V_{i,j,t} = \alpha_{0,i} + \alpha_{1,i} X_j + \alpha_{2,i} X_j^2 + \alpha_{3,i} Z_{j,t} + \frac{\alpha_{4,i}}{2} Z_{j,t}^2 + (I_{i,t} - R_{j,t}) \tag{10}$$
where Rj is the imputed annual rent or housing expenditure associated with house j. In
practice, we calculate this figure as 5% of the observed transaction price.8 As Rj is a
constant fraction of Pj, the following relationship holds:
$\partial R_j / \partial Z_j = (0.05) \cdot \partial P_j / \partial Z_j$.
We now demonstrate that this functional form will yield the same linear MWTP
specification that is common in the hedonics literature. Moreover, as long as MWTP is not
a function of time-varying individual attributes, the parameters of the demand function
can be identified with just two observations for each home buyer. Consider the first-order
conditions associated with choosing the optimal value of Zj,t:
$$\frac{\partial V_{i,j,t}}{\partial Z_{j,t}} = \alpha_{3,i} + \alpha_{4,i} Z_{j,t} - \frac{\partial R_{j,t}}{\partial Z_{j,t}} = 0 \tag{11}$$
For household i, we need to recover the values of two unknown parameters: (α3,i, α4,i).9
Fortunately, we observe household i in panel data on (at least) two occasions. For households
observed twice, we then have two equations in two unknowns:
$$\alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),1} = \rho_{j^*(i),1} \qquad \alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),2} = \rho_{j^*(i),2} \tag{12}$$
where $\rho_{j^*(i),t} = \left. \frac{\partial R_{j,t}}{\partial Z_{j,t}} \right|_{Z_{j,t} = z_{j^*(i),t}}$ are recovered in the first-stage nonparametric regressions;
t = 1, 2 are the two periods in which individual i purchases.
8This is a commonly used discount in the literature. A similar discount is estimated in Peiser and Smith (1985).
9Murray (1975) takes the approach of using an equation very similar to (15), except without individual heterogeneity in preference parameters, to form an estimating equation. Recovering estimates of $\partial R_{j,t} / \partial Z_{j,t}$ from the first-stage, he then recovers estimates of α̂3 and α̂4 from a two-stage least squares procedure using income and prices as instruments.
Solving these two equations yields:
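$$\hat{\alpha}_{3,i} = \frac{\rho_{j^*(i),2}\, z_{j^*(i),1} - \rho_{j^*(i),1}\, z_{j^*(i),2}}{z_{j^*(i),1} - z_{j^*(i),2}} \qquad \hat{\alpha}_{4,i} = \frac{\rho_{j^*(i),1} - \rho_{j^*(i),2}}{z_{j^*(i),1} - z_{j^*(i),2}} \tag{13}$$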
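The closed-form solution in Equation (13) amounts to fitting a line through two points. A minimal sketch, with hypothetical argument names and purely illustrative numbers in the usage note, might look like this:

```python
def invert_two_purchases(z1, rho1, z2, rho2):
    """Recover the intercept and slope of a linear MWTP function
    from two observed (amenity level, implicit price) pairs."""
    if z1 == z2:
        raise ValueError("the two purchases must differ in the amenity level")
    alpha4 = (rho1 - rho2) / (z1 - z2)            # slope
    alpha3 = (rho2 * z1 - rho1 * z2) / (z1 - z2)  # intercept
    return alpha3, alpha4
```

For example, with made-up values `z1=2, rho1=-500` and `z2=5, rho2=-650`, the function returns an intercept of -400 and a slope of -50, and both first-order conditions in Equation (12) hold exactly. A household that chooses the same amenity level on both occasions provides no slope information, hence the explicit check.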
2.2.1 Allowing MWTP to Vary with Individual Household Attributes
It is possible to determine how these inverse demand functions differ systematically with
fixed individual attributes, Ai, such as race, education level, or gender. This is easily done
by performing the following least-squares regressions in a separate stage:
$$\hat{\alpha}_{3,i} = \delta_{3,0} + A_i' \delta_{3,1} + \varsigma_{3,i} \qquad \hat{\alpha}_{4,i} = \delta_{4,0} + A_i' \delta_{4,1} + \varsigma_{4,i} \tag{14}$$
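To make the decomposition in Equation (14) concrete, the sketch below regresses a recovered coefficient on a single fixed attribute (for instance, a 0/1 indicator). Restricting A_i to one variable is a simplification made here so that the example needs no matrix library; the function name is hypothetical.

```python
def ols_single_attribute(attribute, coefficient):
    """Least-squares intercept and slope of a recovered demand coefficient
    (alpha3 or alpha4) on one fixed household attribute."""
    n = len(attribute)
    if n != len(coefficient) or n < 2:
        raise ValueError("need at least two matched observations")
    a_bar = sum(attribute) / n
    y_bar = sum(coefficient) / n
    sxx = sum((a - a_bar) ** 2 for a in attribute)
    if sxx == 0:
        raise ValueError("attribute has no variation")
    sxy = sum((a - a_bar) * (y - y_bar) for a, y in zip(attribute, coefficient))
    slope = sxy / sxx
    return y_bar - slope * a_bar, slope
```

With a binary attribute, the intercept is the mean coefficient in the reference group and the slope is the difference in group means.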
Income is often included as a determinant of MWTP, but it is usually time-varying
(in panel data) and hence cannot be summarized by the vector of fixed attributes,
Ai. Murray (1983) shows why it is important for the MWTP function to vary with
non-housing consumption expenditure. We demonstrate next how time-varying non-housing
consumption expenditure can be included with access to three observations on each
individual household. We begin by specifying household i’s indirect utility from choosing
house j in period t as:
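$$V_{i,j,t} = \alpha_{0,i} + \alpha_{1,i} X_j + \alpha_{2,i} X_j^2 + \alpha_{3,i} Z_{j,t} + \frac{\alpha_{4,i}}{2} Z_{j,t}^2 + \alpha_{5,i} Z_{j,t} (I_{i,t} - R_{j,t}) + (I_{i,t} - R_{j,t}) \tag{15}$$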
where we now allow the marginal utility of Zj,t to vary with non-housing consumption
expenditure.10 The first-order condition associated with the individual’s optimal choice of
Zj,t is then given by:
$$\frac{\partial V_{i,j,t}}{\partial Z_{j,t}} = \alpha_{3,i} + \alpha_{4,i} Z_{j,t} + \alpha_{5,i}(I_{i,t} - R_{j,t}) \underbrace{- \alpha_{5,i} Z_{j,t} \frac{\partial R_{j,t}}{\partial Z_{j,t}} - \frac{\partial R_{j,t}}{\partial Z_{j,t}}}_{-\rho_{j,t}(1 + \alpha_{5,i} Z_{j,t})} = 0 \tag{16}$$
10Note that our indirect utility specifications could be expanded to include interactions with multiple time-varying attributes. Were we to do so, we would simply need to solve for multiple first-order conditions (using multiple derivatives of the price function) for each observed purchase. In this paper, we keep things relatively simple and avoid this complication.
For households which may be observed on three separate purchase occasions, we
have three equations in as many unknowns (α3,i, α4,i, α5,i):
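$$\begin{aligned}
\alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),1} + \alpha_{5,i}(I_{i,1} - R_{j^*(i),1}) - \rho_{j^*(i),1}(1 + \alpha_{5,i} Z_{j^*(i),1}) &= 0 \\
\alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),2} + \alpha_{5,i}(I_{i,2} - R_{j^*(i),2}) - \rho_{j^*(i),2}(1 + \alpha_{5,i} Z_{j^*(i),2}) &= 0 \\
\alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),3} + \alpha_{5,i}(I_{i,3} - R_{j^*(i),3}) - \rho_{j^*(i),3}(1 + \alpha_{5,i} Z_{j^*(i),3}) &= 0
\end{aligned} \tag{17}$$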
Solving this system provides us with the following equations for household i’s pref-
erence parameters:
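$$\hat{\alpha}_{4,i} = \frac{\theta_{j^*(i),3}(\rho_{j^*(i),1} - \rho_{j^*(i),2}) - \theta_{j^*(i),2}(\rho_{j^*(i),1} - \rho_{j^*(i),3})}{\theta_{j^*(i),3}(Z_{j^*(i),1} - Z_{j^*(i),2}) - \theta_{j^*(i),2}(Z_{j^*(i),1} - Z_{j^*(i),3})}$$
$$\hat{\alpha}_{5,i} = \frac{\hat{\alpha}_{4,i}(Z_{j^*(i),1} - Z_{j^*(i),3}) - \rho_{j^*(i),1} + \rho_{j^*(i),3}}{\rho_{j^*(i),1} Z_{j^*(i),1} - \rho_{j^*(i),3} Z_{j^*(i),3} + (I_{i,3} - R_{j^*(i),3}) - (I_{i,1} - R_{j^*(i),1})}$$
$$\hat{\alpha}_{3,i} = \rho_{j^*(i),1}(1 + \hat{\alpha}_{5,i} Z_{j^*(i),1}) - \hat{\alpha}_{4,i} Z_{j^*(i),1} - \hat{\alpha}_{5,i}(I_{i,1} - R_{j^*(i),1})$$
where:
$$\begin{aligned}
\theta_{j^*(i),2} &= \rho_{j^*(i),1} Z_{j^*(i),1} - \rho_{j^*(i),2} Z_{j^*(i),2} + (I_{i,2} - R_{j^*(i),2}) - (I_{i,1} - R_{j^*(i),1}) \\
\theta_{j^*(i),3} &= \rho_{j^*(i),1} Z_{j^*(i),1} - \rho_{j^*(i),3} Z_{j^*(i),3} + (I_{i,3} - R_{j^*(i),3}) - (I_{i,1} - R_{j^*(i),1})
\end{aligned} \tag{18}$$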
Solving for all three parameters, household i’s MWTP function for z is given by:
$$\rho_{j^*(i),t} = \frac{\hat{\alpha}_{3,i} + \hat{\alpha}_{4,i} Z_{j^*(i),t} + \hat{\alpha}_{5,i}(I_{i,t} - R_{j^*(i),t})}{1 + \hat{\alpha}_{5,i} Z_{j^*(i),t}} \tag{19}$$
which, as α̂5,i is presumably a very small number, closely approximates a linear
function of Zj,t.
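The three-purchase solution can be checked numerically. The sketch below implements the closed-form expressions built from Equations (17) and (18) together with the MWTP function in Equation (19); each observation is assumed to be a tuple `(z, rho, c)`, where `c` stands for non-housing expenditure (I − R). The names are hypothetical.

```python
def invert_three_purchases(obs):
    """obs: three (z, rho, c) tuples, with c = income minus imputed rent.
    Returns (alpha3, alpha4, alpha5)."""
    (z1, r1, c1), (z2, r2, c2), (z3, r3, c3) = obs
    theta2 = r1 * z1 - r2 * z2 + c2 - c1
    theta3 = r1 * z1 - r3 * z3 + c3 - c1
    denom = theta3 * (z1 - z2) - theta2 * (z1 - z3)
    if denom == 0 or theta3 == 0:
        raise ValueError("the three observations do not identify the parameters")
    alpha4 = (theta3 * (r1 - r2) - theta2 * (r1 - r3)) / denom
    alpha5 = (alpha4 * (z1 - z3) - r1 + r3) / theta3
    alpha3 = r1 * (1 + alpha5 * z1) - alpha4 * z1 - alpha5 * c1
    return alpha3, alpha4, alpha5

def mwtp(alpha3, alpha4, alpha5, z, c):
    """Marginal willingness to pay at amenity level z and expenditure c."""
    return (alpha3 + alpha4 * z + alpha5 * c) / (1 + alpha5 * z)
```

One way to sanity-check the algebra is to pick parameters, generate three implicit prices with `mwtp`, and confirm that `invert_three_purchases` returns the same parameters.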
3 Data
To implement our first-stage nonparametric regressions, we use data describing single-
family housing transactions over the period 1991 to 2003 in the Bay Area of California.
For the second-stage estimation, we require data on a panel of home buyers. For this, we
assemble a dataset by combining information from the real estate transactions dataset and
a dataset describing mortgage applicants’ demographic characteristics obtained through
the Home Mortgage Disclosure Act (HMDA).
3.1 Property Transactions Data
The real estate transactions data we employ cover the six core counties of the San Francisco
Bay Area (Alameda, Contra Costa, Marin, San Francisco, San Mateo, Santa Clara). The
data were purchased from Dataquick and include transaction dates, prices, loan amounts,
and buyers’, sellers’, and lenders’ names for all transactions. In addition, the data for the
final observed transaction include housing characteristics, such as exact street address,
square footage, year built, lot size, number of rooms, number of bathrooms, and number
of bedrooms.
As Dataquick publishes housing characteristics for the final assessment of each prop-
erty, we take steps to ensure that the house has not undergone any major changes. First,
to control for land sales or re-builds, we drop all transactions where “year built” is miss-
ing or with a transaction date that is prior to “year built.” Second, to control for major
property improvements that would not present as a re-build, we drop properties that ex-
perience a yearly appreciation/depreciation rate that is more than four times greater than
the average appreciation/depreciation rate (in absolute value).11 Additionally, we drop
transactions where the price is missing, negative, or zero. After using the consumer price
index to convert all transaction prices into 2000 dollars, we drop one percent of observations
from each tail to minimize the effect of outliers. As we merge in the pollution data using
the property’s geographic coordinates, we drop properties where latitude and longitude are
missing.12
11In a repeat-sales analysis, similar in spirit to Case and Shiller (1989), we regress log prices on a set of house and year dummies, which provides us a crude measure of yearly appreciation (or depreciation) rates in the Bay Area.
12Although we use a repeat-sales estimation approach, we make some cuts on property characteristics to
Finally, we restrict our analysis to properties with multiple sales over the thirteen-year
period.13 This yields a final sample of 277,011 transactions (i.e., property-year
observations) comprising 126,227 unique properties. Table (1) describes the data.
Table 1: Housing Transactions Data
                              Full Sample                 Repeat Sales Sample
                              n = 630,384                 n = 277,011
variable                      mean          st.dev.       mean          st.dev.
Price (in year 2000 $)        379,368.70    202,887.70    371,189.30    192,626.00
Sq. Ft. House                 1,716.42      679.78        1,651.01      634.30
Sq. Ft. Lot                   7,148.83      7,962.83      6,523.88      7,068.32
Num. Bedrooms                 3.20          0.90          3.12          0.88
Num. Bath                     2.12          0.73          2.09          0.69
3.2 Buyer Characteristics Data
In order to implement our second-stage estimator, we need to create a panel of buyers, their
characteristics, and their chosen properties over time. This involves first identifying and
matching individuals over time in the property transactions dataset and, second, merging-
in the individual attribute data from the HMDA dataset. This is possible as we have
buyer and seller names in the transaction record. We use the algorithm found in Bayer,
McMillan, Murphy, and Timmins (2015).14
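The linking rule described in footnote 14 (exact last name, matching first initial, transactions within 12 months) might be sketched as follows. This is only an illustration of that rule, not the algorithm of Bayer, McMillan, Murphy, and Timmins (2015); the record layout, the case-insensitive comparison, and the month-level date arithmetic are assumptions.

```python
from datetime import date

def names_match(a, b):
    """a, b: (first_name, last_name). Last names must agree exactly
    (compared case-insensitively here); first names on the first letter."""
    first_a, last_a = a[0].strip().lower(), a[1].strip().lower()
    first_b, last_b = b[0].strip().lower(), b[1].strip().lower()
    if not first_a or not first_b:
        return False
    return last_a == last_b and first_a[0] == first_b[0]

def months_between(d1, d2):
    """Calendar months separating two dates (day of month ignored)."""
    return abs((d1.year - d2.year) * 12 + (d1.month - d2.month))

def link_purchases(sale, purchases, max_months=12):
    """Return the purchases plausibly made by the seller recorded in `sale`."""
    return [p for p in purchases
            if names_match(sale["seller"], p["buyer"])
            and months_between(sale["date"], p["date"]) <= max_months]
```

A production version would also need a rule for breaking ties when several candidate purchases match the same seller, which this sketch leaves open.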
The individual attributes data come from a dataset on mortgage applications pub-
lished through the Home Mortgage Disclosure Act. These data provide information on all
mortgage applications filed in the Bay Area over the relevant time period. Included are all
ensure all single-family homes are trading in the same market: dropping properties where year built is pre-1850, lot size is either zero or greater than three acres, square footage is either less than 400 or greater than 10,000, number of bedrooms or bathrooms is greater than ten, number of total rooms is greater than fifteen, or number of stories is greater than three.
13We drop properties that sell more than once within a calendar year or more than five times over the thirteen-year period.
14To link individuals over time within the transactions data, the algorithm matches on the individual’s name and the date of the transaction. We require that the last name matches exactly and the first name matches on the first letter. We also require that for two properties to be linked to the same buyer, they must have transacted within 12 months of one another. For example, I see A. Anderson sell a home in September 1997 and assume that it is the same A. Anderson who purchases a home within the period September 1996 to September 1998.
applicants’ race and gender, income, loan amount, lender name, and Census tract of the
property.
We are able to merge the individual attribute data in the HMDA dataset to the
buyers in the property transactions dataset using the common variables of lender name,
loan amount, transaction date, and Census tract of the property. We successfully match
approximately two-thirds of individuals in the transactions sample to the uncleaned HMDA
sample.15 Based on the algorithm for tracking buyers through time, we keep only obser-
vations where the buyer is observed on either two or three purchase occasions. Table (2)
provides statistics describing the one-purchase, two-purchase, and three-purchase buyers.
Table 2: One-, Two-, and Three-Purchase Samples
                          One-Purchase             Two-Purchase             Three-Purchase
                          individuals=111,415      individuals=22,190       individuals=1,824
variable                  mean       median        mean       median        mean       median
Asian                     0.21       0             0.22       0             0.22       0
Black                     0.03       0             0.02       0             0.02       0
Hispanic                  0.10       0             0.11       0             0.16       0
White                     0.66       1             0.65       1             0.61       1
Income (in yr 2000 $)     117,098    97,599        121,202    101,013       121,471    97,972
Price (in yr 2000 $)      386,912    339,336       413,934    359,975       388,411    333,936
3.3 Ozone Data
The ozone data we employ are taken from the California Air Resources Board.16 We use
yearly ozone data from all thirty-five monitors in the nine counties of Alameda, Contra
Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Solano, and Sonoma over the
period 1990 to 2004. In particular, we use the monitor data to construct property-specific
measures of the number of days exceeding the one-hour state standard (i.e., 90 parts per
billion).
In addition to the ozone readings, the dataset provides information on the “year
coverage,” or the percent of time (during the relevant high-ozone season) each particular
monitor was available and the geographic coordinates of each monitor.17 Using this
coverage variable, we drop monitors with less than 60 percent coverage in a given year
(amounting to less than 4 percent of the available monitor-year observations).
15We drop observations where either race or income is missing.
16Publicly available at www.arb.ca.gov/adam/.
Using the latitude and longitude coordinates of the monitors and the properties,
we use the “Great Circle” estimator to compute the distance from each property to all
monitors. We then create a three-year weighted average of all monitors’ readings for each
property, weighting each reading by one over distance-squared. In order to mitigate
boundary effects, we include monitor data from the surrounding counties of Napa, Solano, and
Sonoma, in addition to the six counties that appear in our transactions data.
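The spatial interpolation described above can be sketched as a great-circle distance followed by inverse-distance-squared weighting. The haversine formula and the Earth-radius constant are choices made for this illustration and are not confirmed by the text; the readings passed in are assumed to have already been averaged over the relevant three-year window.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def property_ozone(house_lat, house_lon, monitors):
    """Inverse-distance-squared weighted average of monitor readings.

    monitors: list of (lat, lon, reading) tuples (assumed layout).
    """
    num = den = 0.0
    for lat, lon, reading in monitors:
        d = great_circle_km(house_lat, house_lon, lat, lon)
        if d == 0.0:
            return reading  # property sits exactly on a monitor
        w = 1.0 / (d * d)
        num += w * reading
        den += w
    if den == 0.0:
        raise ValueError("no monitors supplied")
    return num / den
```

Because the weights fall off with the square of distance, nearby monitors dominate the property-level measure while distant monitors still contribute a small amount.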
The maps in Figure (2) describe the spatial distribution of ground-level ozone pollu-
tion in the Bay Area. Two important features emerge from these pictures. First, geography
is largely responsible for cross-sectional variation in pollution. San Francisco (on the tip
of the peninsula extending from the South Bay into the Pacific Ocean), Oakland (in the
East Bay), and San Jose (at the southern end of the San Francisco Bay) all face heavy
traffic congestion. Wind patterns, however, mitigate much of the ozone pollution in San
Francisco and Oakland, while worsening it in San Jose. Mountains ringing the southern
end of the Bay Area block air flows and contribute to this effect. The mountains on the
eastern side of the Bay are similarly responsible for high levels of pollution along the I-680
corridor in eastern Contra Costa and Alameda counties.
These maps also make clear that there is significant variation in pollution levels over
time. Figure (3) describes this time variation. Much of this is due to a variety of programs
that were initiated after California passed its Clean Air Act in 1988. After multiple years
of relatively low ozone pollution, while the Bay Area counties were designated as being
out of attainment according to EPA rules, the Bay Area experienced the worst air quality
since the mid-eighties in 1995, corresponding to the time at which it was placed back into
attainment and mandatory ozone reduction programs were eliminated. In 1996, the Vehicle
Buyback Program for cars manufactured in 1975 or before was implemented. This program,
in addition to the Lawn Mower Buyback and the Clean Air Plan of 1997, presumably
contributed to the summer of 1997 being the cleanest season since the early 1960s. The
1998 season experienced considerably more ozone pollution; it corresponded to the Bay
Area being placed back into non-attainment status. With the strongest mandatory ozone
reduction policies (mostly targeted at mobile sources) in place, the remaining years of our sample
returned to relatively low ozone levels. Also during the late 1990s, almost 100 emitting
facilities were reviewed under the Title V Program Major Facility Review. There is no
reason to expect that any of these programs would have had special economic consequences
for housing prices in any particular part of the Bay Area, aside from those coming through
changing amenity values.
17Some monitors opened or closed permanently during this time period.
Figure 2: Ozone Pollution in the Bay Area
Figure 3: Average Number of Days Exceeding CA Ozone Limit
4 Results
In this section, we describe the results of each estimation procedure. First, we illustrate
the results of our non-parametric estimation that is used to recover hedonic gradients with
respect to ground-level ozone pollution. Second, we consider the average MWTP and
heterogeneous MWTPs recovered using the Bajari-Benkard inversion with cross-sectional
data. Third, we use the panel of 22,190 home buyers who are observed purchasing two
houses in our dataset to recover estimates of fully heterogeneous inverse demand functions.
Fourth, we use the sample of 1,824 individuals who buy three times to recover estimates
of inverse demand functions that are allowed to vary with non-housing expenditure.
4.1 Results from the First-stage Hedonic Regressions
The local linear estimation allows the estimate of the slope coefficient, β(χ), to differ for
each observed value of ozone pollution in each year of the panel. Thus, for each year, the
gradient can be graphically represented as a flexible function of ozone pollution.
In the current estimation, we oversmooth these gradients to be monotonic.18 When
these gradients are fully flexible, an additional issue arises: individuals who choose a level
of z that lies on a downward sloping part of the gradient do not satisfy the second order
condition for utility maximization unless the WTP function is sufficiently steep. In the case
of the Bajari-Benkard inversion (where the WTP function is flat), this sufficient steepness
cannot be achieved.
For ease of exposition, we show the hedonic gradient for three years of the panel
(1992, 1997, and 2002) in Figure (4). These gradients are everywhere negative in sign
(i.e., an increase in the level of pollution at a house leads to a reduction in its sale price,
ceteris paribus). Moreover, these negative gradients are increasing in the level of pollution.
This suggests that individuals may sort based on preferences (i.e., those with a lower
willingness to pay tend to live in places with greater levels of pollution). This result alone
has important consequences for policy-makers. It suggests that using the average MWTP
to value pollution reductions in the most polluted places may overstate the value of those
reductions.
18 This is achieved by setting the bandwidth over ozone to 5.92 and the bandwidth over years to 2.01.
Figure 4: Hedonic Price Gradients (1992, 1997, 2002)
4.2 Results from the Second Stage
4.2.1 Preference Inversion with Cross-Sectional Data
Using the full sample of one-, two-, and three-time buyers (treated as a cross-section),
we perform the BB preference inversion. The trimmed distribution of MWTP using the
time-varying gradients is shown in Figure (5). The median MWTP is −594.60. By assumption,
the elasticity of each individual’s MWTP function with respect to ozone exceedances
is zero.
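In the cross-sectional case the inversion reduces to reading off the gradient at each buyer's chosen ozone level. The sketch below shows that step together with percentile trimming; the gradient values and the trimming cutoffs are hypothetical, since the cutoffs used for Figure (5) are not stated here.

```python
import numpy as np

def trimmed(values, lower_pct=1.0, upper_pct=99.0):
    """Drop observations outside the given percentiles (cutoffs are assumed)."""
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return values[(values >= lo) & (values <= hi)]

# Hypothetical gradient values evaluated at each buyer's chosen ozone level.
gradient_at_choice = np.array([-900.0, -650.0, -600.0, -590.0, -400.0, -50.0, -20_000.0])

# With a flat MWTP function, each buyer's MWTP equals the gradient at the chosen point.
mwtp = gradient_at_choice
median_mwtp = np.median(mwtp)
mwtp_trimmed = trimmed(mwtp, lower_pct=10.0, upper_pct=90.0)
```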
Figure 5: Trimmed Distribution of MWTP
4.2.2 Demand Estimation with Two-Purchase Panel
Employing our approach with a panel of buyers who are observed on two separate purchase
occasions allows for the recovery of both an intercept and a slope coefficient for the MWTP
function for each individual. Thus, each individual’s MWTP function is allowed to vary
with the level of ozone pollution, which can have important implications for valuing large
changes in the disamenity. We find evidence of considerable heterogeneity, with a median
slope value of −$81.97.
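The identification logic can be sketched as follows, assuming a linear MWTP function a_i + b_i * z and taking the hedonic gradient evaluated at each observed choice as given. With two purchases, the two first-order conditions form two equations in the two unknowns (a_i, b_i). The function name recover_linear_mwtp and the numbers below are hypothetical.

```python
import numpy as np

def recover_linear_mwtp(z1, g1, z2, g2):
    """Solve g_t = a + b * z_t (t = 1, 2) for each individual's intercept a and slope b.

    z_t: ozone level chosen at purchase t.
    g_t: hedonic gradient evaluated at z_t in the year of purchase t.
    """
    z1, g1, z2, g2 = (np.asarray(v, dtype=float) for v in (z1, g1, z2, g2))
    dz = z2 - z1
    if np.any(dz == 0):
        raise ValueError("ozone levels must differ across the two purchases")
    b = (g2 - g1) / dz
    a = g1 - b * z1
    return a, b

# Hypothetical data for three buyers.
z1 = np.array([4.0, 10.0, 6.0])
g1 = np.array([-500.0, -700.0, -450.0])
z2 = np.array([6.0, 8.0, 9.0])
g2 = np.array([-660.0, -540.0, -300.0])

a, b = recover_linear_mwtp(z1, g1, z2, g2)
```

No restriction links (a_i, b_i) across buyers, which is why the intercept and slope are unrestricted in their heterogeneity. A buyer who chooses the same ozone level at both purchases does not identify a slope, so the sketch raises an error in that case.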
It is difficult to summarize the joint distributions of intercepts and slopes in a single
figure. We instead report the distribution of MWTP elasticities (with respect to ozone
exceedances). See Figure (6).
Figure 6: Trimmed Distribution of Price Elasticities
The distribution of these elasticities also shows significant heterogeneity. With a
median value of 0.36, the distribution has an inter-quartile range of 1.07. The elasticity of
MWTP with respect to ozone pollution is not, however, strongly correlated with the level
of ozone pollution people are experiencing (i.e., a correlation of 0.01).
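For a linear MWTP function a_i + b_i * z, the elasticity with respect to ozone at level z is b_i * z / (a_i + b_i * z). The sketch below computes this elasticity and the summary statistics discussed above (median, inter-quartile range, correlation with the ozone level). The inputs are hypothetical, and the choice of which observed ozone level to evaluate at is an assumption.

```python
import numpy as np

def mwtp_elasticity(a, b, z):
    """Elasticity of the linear MWTP function a + b*z with respect to z, evaluated at z."""
    a, b, z = (np.asarray(v, dtype=float) for v in (a, b, z))
    return b * z / (a + b * z)

# Hypothetical intercepts, slopes, and ozone levels for four buyers.
a = np.array([-180.0, -400.0, -750.0, -300.0])
b = np.array([-80.0, -20.0, 50.0, 0.0])
z = np.array([6.0, 10.0, 9.0, 5.0])

elasticity = mwtp_elasticity(a, b, z)
q25, q50, q75 = np.percentile(elasticity, [25, 50, 75])
iqr = q75 - q25
corr_with_ozone = np.corrcoef(elasticity, z)[0, 1]
```

A buyer with a zero slope has a zero elasticity, which is the case imposed on everyone by the cross-sectional inversion.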
4.2.3 Demand Estimation with Three-Purchase Panel
As with the two-purchase panel, we report the distribution of MWTP elasticities in Figure
(7). In this case, with access to data describing three separate purchase occasions, we
additionally report the distribution of elasticities with respect to income.
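With a third purchase, one additional parameter per individual can be recovered. The sketch below assumes the MWTP function is linear in both ozone and non-housing expenditure, a + b*z + c*x, so that three first-order conditions give a 3-by-3 linear system for each buyer. The linear-in-expenditure form, the function name, and the numbers are assumptions made for illustration.

```python
import numpy as np

def recover_mwtp_with_expenditure(z, x, g):
    """Solve g_t = a + b*z_t + c*x_t, t = 1, 2, 3, for one individual.

    z, x, g: length-3 sequences of ozone, non-housing expenditure, and the
    hedonic gradient evaluated at each purchase. Returns (a, b, c).
    """
    z, x, g = (np.asarray(v, dtype=float) for v in (z, x, g))
    design = np.column_stack([np.ones(3), z, x])
    # np.linalg.solve raises LinAlgError when the system is exactly singular,
    # i.e. when the three purchases cannot separate a, b, and c.
    a, b, c = np.linalg.solve(design, g)
    return a, b, c

# Hypothetical buyer generated from a = -200, b = -50, c = -0.01.
z = [4.0, 8.0, 6.0]
x = [30_000.0, 40_000.0, 55_000.0]
g = [-200.0 - 50.0 * zi - 0.01 * xi for zi, xi in zip(z, x)]

a, b, c = recover_mwtp_with_expenditure(z, x, g)

# Elasticities of MWTP at the third purchase, with respect to ozone and expenditure.
mwtp_3 = a + b * z[2] + c * x[2]
ozone_elasticity = b * z[2] / mwtp_3
expenditure_elasticity = c * x[2] / mwtp_3
```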
Figure 7: Trimmed Distributions of Elasticities for Three-Purchase Panel
5 Conclusion
With the growing availability of panel datasets that match household attributes to the
prices and characteristics of the properties they purchase, our method is increasingly
implementable and is applicable to a wide variety of non-marginal policy analyses, such as those
associated with the welfare implications of changing school quality, crime rates, or local
pollutants. Importantly, this method allows for the evaluation of heterogeneous preferences
(and taste-based sorting) and for the analysis of non-marginal policy changes without
encountering the difficult endogeneity issues associated with the recovery of hedonic demand.
6 References
Anselin, L. and J. Le Gallo (2006). “Interpolation of Air Quality Measures in Hedonic House
Price Models: Spatial Aspects.” Spatial Economic Analysis. 1(1):31-52.
Bartik, T.J. (1987). “The Estimation of Demand Parameters in Hedonic Price Models.”
Journal of Political Economy. 95(1):81-88.
Bajari, P. and L. Benkard (2005). “Demand Estimation With Heterogeneous Consumers
and Unobserved Product Characteristics: A Hedonic Approach.” Journal of Political
Economy. 113(6):1239-1276.
Bajari, P. and M. Kahn (2005). “Estimating Housing Demand With an Application to
Explaining Racial Segregation in Cities.” Journal of Business and Economic Statistics.
23(1):20-33.
Bayer, P., McMillan, R., Murphy, A., and Timmins, C. (2008). “A Dynamic Model of
Demand for Houses and Neighborhoods.” Mimeo.
Bishop, K. (2008). “A Dynamic Model of Location Choice and Hedonic Valuation.” Mimeo.
Bound, J., Jaeger, D.A. and Baker, R.M. (1995). “Problems With Instrumental Variables
Estimation when the Correlation Between the Instruments and the Endogenous Explanatory
Variable is Weak.” Journal of the American Statistical Association. 90:443-450.
Boyle, K.J., Poor, P.J. and Taylor, L.O. (1999). “Estimating the Demand for Protecting
Freshwater Lakes From Eutrophication.” American Journal of Agricultural Economics.
81:1118-1122.
Brown, J.N. and H.S. Rosen (1982). “On the Estimation of Structural Hedonic Price Models.”
Econometrica. 50(3):765-768.
Case, K.E., and Shiller, R.J. (1989). “The Efficiency of the Market for Single-Family
Homes.” American Economic Review. 79(1):125-137.
Chattopadhyay, S. (1999). “Estimating Demand for Air Quality: New Evidence Based on
the Chicago Housing Market.” Land Economics. 75:22-38.
Epple, D. (1987). “Hedonic Prices and Implicit Markets: Estimating Demand and Supply
Functions for Differentiated Products.” Journal of Political Economy. 95(1):59-80.
Fan, J. and Gijbels, I. (1996). “Local Polynomial Modeling and Its Applications.” In
Monographs on Statistics and Applied Probability (66). Chapman and Hall. London, U.K.
Greenstone, M. and Gallagher, J. (2008). “Does Hazardous Waste Matter? Evidence
from the Housing Market and the Superfund Program.” Quarterly Journal of Economics.
123(3):951-1003.
Heckman, J., I. Ekeland, and L. Nesheim (2004). “Identification and Estimation of Hedonic
Models.” Journal of Political Economy. 112(1.2):S60-S109.
Kuminoff, N. (2007). “Recovering Preferences from a Dual-Market Locational Equilibrium.”
Manuscript. Department of Agricultural and Applied Economics. Virginia Tech
University.
Mendelsohn, R. (1985). “Identifying Structural Equations with Single Market Data.”
Review of Economics and Statistics. 67(3):525-529.
Mendelsohn, R., Hellerstein, D., Huguenin, M., Unsworth, R., and Brazee, R. (1992).
“Measuring Hazardous Waste Damages with Panel Models.” Journal of Environmental
Economics and Management. 22:259-271.
Murray, M. (1975). “The Distribution of Tenant Benefits in Public Housing.”
Econometrica. 43:771-788.
Murray, M. (1983). “Mythical Demands and Mythical Supplies for Proper Estimation of Rosen’s
Hedonic Price Model.” Journal of Urban Economics. 14:327-337.
Nakamura, A. and Nakamura, M. (1998). “Model Specification and Endogeneity.” Journal
of Econometrics. 83:213-237.
Palmquist, R. (1982). “Measuring Environmental Effects on Property Values Without
Hedonic Regressions.” Journal of Urban Economics. 11:333-347.
Palmquist, R. (1984). “Estimating the Demand for the Characteristics of Housing.” Review of
Economics and Statistics. 66:394-404.
Palmquist, R. (2005). “Property Value Models,” in K.G. Maler and J.R. Vincent (eds.),
Handbook of Environmental Economics: Valuing Environmental Changes. Vol.2. Elsevier B.V.,
Amsterdam, The Netherlands. pp.763-820.
Parsons, G.R. (1992). “The Effect of Coastal Land Use Restrictions on Housing Prices: A
Repeat Sales Analysis.” Journal of Environmental Economics and Management. 22:25-37.
Peiser, R. and Smith, L. (1985). “Home Ownership Returns, Tenure Choice and Inflation.”
Journal of the American Real Estate and Urban Economics Association. 13(4): 343-360.
Rosen, S. (1974). “Hedonic Prices and Implicit Markets: Product Differentiation in Pure
Competition.” Journal of Political Economy. 82(1):34-55.
Sieg, H., Smith, K., Banzhaf, S., and Walsh, R. (2004). “Estimating the General Equilibrium
Benefits of Large Changes in Spatially Delineated Public Goods.” International
Economic Review. 45(4):1047-1077.
Sieg, H., Smith, K., Banzhaf, S., and Walsh, R. (2004). “General Equilibrium Benefits for
Environmental Improvements: Projected Ozone Reductions under EPA’s Prospective Analysis
for the Los Angeles Air Basin.” Journal of Environmental Economics and Management.
47(3):559-584.
Silverman, B. (1986). “Density Estimation for Statistics and Data Analysis.” In
Monographs on Statistics and Applied Probability (26). Chapman and Hall. London, U.K.
Taylor, L.O. (2003). “The Hedonic Method,” in Champ, P.A., K.J. Boyle, and T.C. Brown
(eds.), A Primer on Nonmarket Valuation. Kluwer Academic Publishers. Dordrecht, The
Netherlands. pp.331-393.
Timmins, C. (2007). “If You Can’t Take the Heat, Get out of the Cerrado... Recovering
the Equilibrium Amenity Cost of Non-Marginal Climate Change in Brazil.” Journal of
Regional Science. 47(1):1-25.
Timmins, C. and Murdock, J. (2007). “A Revealed Preference Approach to the Measurement
of Congestion in Travel Cost Models.” Journal of Environmental Economics and
Management. 53(2):230-249.
Ullah, A. and N. Roy (1998). “Nonparametric and Semiparametric Econometrics of Panel
Data,” in A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics.
Marcel Dekker. New York, New York. pp.579-604.