
Using Panel Data To Easily Estimate

Hedonic Demand Functions∗

Kelly C. Bishop†

Arizona State University

Christopher Timmins‡

Duke University and NBER

October 2015

Abstract

The hedonics literature has often asserted that if one were able to observe the same

individual make multiple purchasing decisions, one could recover rich estimates of

preferences for a given amenity. In particular, in the face of a changing price schedule,

observing each individual twice is sufficient to recover the underlying (linear) demand

function separately for each individual, with no restrictions on heterogeneity in either

the intercept or the slope. Observing individuals on more than two occasions allows

for greater flexibility in this heterogeneity. Using a rich panel dataset, we estimate the

fully heterogeneous demand functions for clean air in the Bay Area of California. Our

results suggest that such heterogeneity is important from a policy perspective.

Key Words: Hedonic Demand, Valuation, Ozone

JEL Classification Numbers: Q50, Q51, R21, R23

∗The initial version of this work was completed as part of Bishop’s doctoral dissertation at Duke University. We thank seminar participants at the Triangle Resource and Environmental Economics Workshop, Camp Resources, University of Wisconsin, Iowa State University, AERE Winter Meetings, Syracuse University, Montana State University, and XXXX for their helpful comments and suggestions.

†Kelly Bishop, Arizona State University, Tempe, AZ 85287. [email protected]

‡Chris Timmins, Duke University, Box 90097, Durham, NC 27708. [email protected]


1 Introduction

In many applications of the hedonic model, one may wish to value non-marginal changes in

amenity values, requiring the estimation of the underlying demand function. This, however,

is not without costs; in addition to a potentially high computational burden, assumptions

may need to be made which restrict preference heterogeneity. In this paper, we show how to

recover fully-heterogeneous structural preference parameters with a computationally-light,

data-driven estimation approach.

The traditional estimation approach associated with hedonic demand is based on

Rosen’s seminal 1974 paper. However, in addition to suffering from issues related to omitted

variables in the first stage and identification in the second stage, this approach suffers from

a difficult second-stage endogeneity problem: when the hedonic price function is non-linear

in the amenity of interest, home buyers simultaneously choose both the hedonic price and

the quantity of that amenity that they will consume. The problem of consumer choice

subject to a non-linear budget constraint creates a difficult endogeneity problem when

using statistical inference to recover the parameters describing those preferences [Epple

(1987), Bartik (1987)]. Typically, IV approaches in this literature have relied upon some combination of exclusion restrictions. For example, certain socio-demographic variables may enter directly into the MWTP function while others do not, so that the excluded variables can be used as instruments for endogenous attribute levels; alternatively, the MWTP function may contain socio-demographic variables only in linear form, so that the excluded higher-order terms can serve as instruments. These assumptions are not testable and place arbitrary

restrictions on the estimated heterogeneity in MWTP; aside from market dummies, these

instruments are generally hard to justify. These difficulties have led most researchers to

forgo estimating the MWTP function altogether, restricting their analysis to marginal

policy changes only.

In their 2005 paper, Bajari and Benkard demonstrate that this endogeneity problem

may be avoided by replacing the statistical inference used in the second stage of Rosen’s

method with a “preference inversion” procedure which inverts the first-order conditions

of utility maximization to recover demand at the individual level. The strengths of this

approach lie in (i) its admission of any form of preference heterogeneity and (ii) its avoid-

ance of the endogeneity problems described above. Its primary weakness, however, comes

in the strict functional-form assumptions that are required to perform the inversion procedure, i.e., when observing each individual on only one purchase occasion (one data point),

it is only possible to recover the hedonic demand function by fully assuming its form.1

When the shape of the hedonic demand function is so strongly dictated by functional-form

assumptions, the value in going beyond the first stage of Rosen’s two-step procedure is

limited.

Figure 1: Recovering Individual Demand using Panel Data

Our method for recovering the hedonic demand function is quite intuitive: observing the exact same household make purchase decisions in multiple markets (under different price schedules) allows one to trace out individual-specific demand functions. Given a

graphical representation of the problem, our empirical approach becomes clear; observing

each household on at least two purchase occasions allows us to “connect the dots” and

recover a unique (linear) demand curve for each individual in the dataset. Figure (1)

describes this approach for a given housing amenity, Z; when facing the associated price

schedules on purchase occasions t and τ, household i chooses to consume $Z^*_{i,t}$ and $Z^*_{i,\tau}$,

respectively. These two observations are sufficient for recovering a household-specific linear

demand function for household i.

1Bajari and Benkard recognize this, pointing out that these assumptions could be relaxed if the researcher is able to observe the same individual home buyer on multiple purchase occasions.


Importantly, the intuition behind our identification is straightforward. The sched-

ule of market prices (i.e., the hedonic price gradient) exhibits exogenous variation across

markets (e.g., across time) from the point of view of any given household, as we assume

that households are price-takers in the real estate market. Thus, we are observing each

individual household choosing how much of amenity Z to consume under different supply

conditions, allowing us to trace out the hedonic demand curve for each household in the

data. Naturally, the demand function that we recover with two observed purchases is lin-

ear. However, we impose no additional restrictions on the functional form of demand and

no restrictions on the heterogeneity of the individual-specific coefficients for intercept and

slope. We show how these demand coefficients may be decomposed to isolate the effect

of fixed demographic characteristics, such as race. We also show that with three or more

observations for each household, we can estimate the effect of time-varying demographic

characteristics, such as income.2

We apply this methodology to recovering individual-specific demand for air quality

in the Bay Area of California. In particular, we estimate the marginal willingness to pay

to avoid ground-level ozone pollution. For this analysis, we create a rich panel dataset

describing real estate transactions and the associated buyers over the thirteen-year period

of 1991 to 2003. To create this dataset, we first isolate and match individual households

over time, generating a unique “id” for each household in the data. We are then able to

merge household demographic characteristics provided on mortgage applications per the

Home Mortgage Disclosure Act of 1975. Finally, we generate a house-level annual measure

of ozone pollution from the monitor data provided by the California Air Resources Board.

In the implementation of this approach, we allow for the most flexible representation

of preferences possible with the available data. We begin by estimating a non-parametric

regression to recover a flexible set of time-varying hedonic price gradients which includes

a full set of fixed-effects at the Census-tract level. In the second stage, we estimate the

most flexible set of inverse demand functions that may be identified without functional

form assumptions – an individual-specific linear demand curve.

We find considerable heterogeneity in the marginal willingness to pay to avoid ozone

pollution across individuals. The median value to avoid one additional day exceeding the

California state maximum 1-hour ozone concentration is $595. Given our framework,

2With three or more observations per household, we could alternatively allow for greater flexibility in the shape of the demand curve, such as allowing the function to be quadratic in Z.


the heterogeneity becomes evident by reporting the elasticity of this marginal willingness

to pay with respect to ozone; the median value of the elasticity is 0.36.

This paper proceeds as follows. Section 2 describes our methodological approach for

recovering the hedonic demand for air quality in the Bay Area of California. The creation

of our unique two-sided panel dataset and its summary statistics are discussed in Section

3. Section 4 presents our results. Finally, Section 5 concludes.

2 Model

In this section, we describe our data-driven approach to recovering the structural param-

eters of the hedonic model. First, we discuss the non-parametric repeat sales model that

we use to recover the hedonic gradient with respect to a particular amenity. While this ap-

proach allows for the fewest assumptions while still controlling for house-level fixed-effects,

a much simpler specification would be sufficient for the identification of the second stage.

Next, we show how using a panel of buyers (with implicit prices from the first stage) we

can recover fully heterogeneous inverse demand functions.

2.1 A Non-Parametric Fixed-Effects Approach to Recovering the

Hedonic Gradient

The first stage of the model estimates the hedonic price function, which relates the price

of house j transacted in period t (Pj,t) to its attributes: both those that vary over time,

Zj,t, and those that do not, Xj. The gradient can be estimated in any number of ways,

although a simple, linear framework may impose unrealistic restrictions on the equilibrium

underlying the hedonic price function [Heckman, Ekeland, and Nesheim (2004)]. Addi-

tionally, concern must be paid to the potential bias caused by omitted variables.3 Using

panel data with repeat-sales and controlling for house fixed-effects avoids the bias caused

by time-invariant house characteristics, whether observed or unobserved. Thus, following

3Bias from omitted variables arises when the regressors are not orthogonal to the regression error. This would be the case if data describing important neighborhood or housing attributes (e.g., distance to city center or curb appeal) are not available to the researcher, but those variables are correlated with the attribute of interest.


Lee and Mukherjee (2014), we adopt a non-parametric first-difference approach for the es-

timation of the gradient and begin by writing down a flexible representation of the hedonic

price function:

$$P_{j,t} = f(Z_{j,t}) + X_j + \nu_{j,t} \tag{1}$$

where f(·) is an unspecified, flexible function of time-varying attributes of house j (or its

neighborhood) and Xj represents all time invariant attributes (whether they are observed

by the researcher or not). νj,t is assumed to be distributed i.i.d. with mean zero and

constant variance $\sigma^2_\nu$. While we allow νj,t to be correlated with Xj, we assume that it

is uncorrelated with the elements of Zj,t. In our application, Zj,t will consist of (i) a

measure of ground-level ozone pollution at house j in year t and (ii) the year of the housing

transaction.4

We take a first-order Taylor series expansion of f(·) around some vector χ.5 This

vector has the same dimension as the vector Zj,t and is comprised of the set of points that

the linear regression is evaluated at:

$$P_{j,t} = f(\chi) + (Z_{j,t} - \chi)' f'(\chi) + X_j + \nu_{j,t} \tag{2}$$

We denote each property’s prior year of sale with t−1 and write Equation (2) for

this prior sale:6

$$P_{j,t-1} = f(\chi) + (Z_{j,t-1} - \chi)' f'(\chi) + X_j + \nu_{j,t-1} \tag{3}$$

This allows us to subtract Equation (3) from Equation (2), differencing out both

the time-invariant attributes Xj and the time-invariant term f(χ).7

$$P_{j,t} - P_{j,t-1} = (Z_{j,t} - Z_{j,t-1})' f'(\chi) + (\nu_{j,t} - \nu_{j,t-1}) \tag{4}$$

4In practice, we use the number of days in the course of each year that the state maximum 1-hour ozone concentration of 90 ppb was violated. Other potential measures of ground-level ozone include the 1-hour and 8-hour average maximum concentrations over the course of each year. Ozone exceedances are a better measure of extreme pollution events that house buyers may be more aware of.

5The remainder term associated with the Taylor expansion is ignored.

6This is expanded around the same vector χ.

7Losing f(·) from this expression does not pose a problem, as our interest is only in recovering the hedonic gradient, i.e., the slope of the hedonic price function, which is represented non-parametrically by f′(·).


Denoting first differences with “∼” and replacing f ′(·) with β(·), we arrive at:

$$\tilde{P}_{j,t} = \tilde{Z}'_{j,t}\,\beta(\chi) + \tilde{\nu}_{j,t} \tag{5}$$

Lee and Mukherjee (2014) show that β(χ) (i.e., the slope of the hedonic price func-

tion at Zj,t = χ) may be recovered with the following least-squares minimization procedure:

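$$\beta(\chi) = \arg\min_{\beta(\chi)} \sum_{j=1}^{J} \sum_{t=1}^{T_j} \left(\tilde{P}_{j,t} - \tilde{Z}'_{j,t}\,\beta(\chi)\right)^2 K_h(Z_{j,t} - \chi)\, K_h(Z_{j,t-1} - \chi) \tag{6}$$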

where Tj denotes the number of first-differenced observations for each house, j. Important

to the local-linear regression procedure is the choice of weights placed on data as one moves

further from the point of evaluation, χ. For our estimation, Kh(·) is the Gaussian kernel:

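$$K_h(Z_{j,t} - \chi) = \prod_{k=1}^{2} \frac{1}{h\hat{\sigma}_{Z_k}} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{Z_{k,j,t} - \chi_k}{h\hat{\sigma}_{Z_k}} \right)^2 \right\} \tag{7}$$

where h represents the kernel bandwidth and $\hat{\sigma}_{Z_k}$ is the standard deviation of the kth element of Zj,t.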

In practice, this allows us to recover an estimate of β(χ) for all observed values of Zj,t.

We therefore end up with a (potentially) different estimate of β(χ) at each data point; this

is in contrast to a fully parametric estimation procedure, where β would be constrained to

be the same for every value of Zj,t.

Given the linear nature of this problem, estimation of β(χ) can be summarized as

the following weighted least-squares regression:

$$\beta(\chi) = (\tilde{Z}' W_\chi \tilde{Z})^{-1} \tilde{Z}' W_\chi \tilde{P} \tag{8}$$

where $n = \sum_{j=1}^{J} T_j$ is the total number of first-differenced observations, $\tilde{Z}$ is an (n x 2) matrix of first-differenced (within house) regressors, $\tilde{P}$ is an (n x 1) vector of first-differenced (within house) house prices, and $W_\chi$ is an (n x n) matrix of weights, such that $W_\chi = \mathrm{diag}\left(K_h(Z_{j,t} - \chi)\right)$.
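The weighted least-squares step in Equations (6) to (8) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the estimation code used in the paper: the function names, the synthetic data, the single bandwidth h = 1.0, and the values passed for the standard deviations are all hypothetical, and each observation's weight is taken to be the product of the two kernels that appears in the objective of Equation (6).

```python
import math
import random

def gaussian_kernel(z, chi, h, sigma):
    """Product Gaussian kernel over the elements of z, as in Equation (7)."""
    k = 1.0
    for z_k, chi_k, s_k in zip(z, chi, sigma):
        u = (z_k - chi_k) / (h * s_k)
        k *= math.exp(-0.5 * u * u) / (h * s_k * math.sqrt(2.0 * math.pi))
    return k

def local_gradient(obs, chi, h, sigma):
    """Weighted least-squares estimate of beta(chi) with two regressors.

    obs holds tuples (p_diff, z_now, z_prev): the first-differenced price and
    the (ozone, year) pairs at the current and the prior sale of a house.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for p_diff, z_now, z_prev in obs:
        # Assumption: the weight is the product of both kernels in Equation (6).
        w = gaussian_kernel(z_now, chi, h, sigma) * gaussian_kernel(z_prev, chi, h, sigma)
        d1, d2 = z_now[0] - z_prev[0], z_now[1] - z_prev[1]
        a11 += w * d1 * d1
        a12 += w * d1 * d2
        a22 += w * d2 * d2
        b1 += w * d1 * p_diff
        b2 += w * d2 * p_diff
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("weighted regressors are collinear at this evaluation point")
    # Closed-form solution of the 2x2 normal equations.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Hypothetical noise-free data generated from a known linear gradient.
rng = random.Random(0)
true_beta = (-1500.0, 4000.0)
obs = []
for _ in range(200):
    z_prev = (rng.uniform(0.0, 10.0), float(rng.randint(1991, 2000)))
    z_now = (z_prev[0] + rng.gauss(0.0, 2.0), z_prev[1] + rng.randint(1, 3))
    p_diff = true_beta[0] * (z_now[0] - z_prev[0]) + true_beta[1] * (z_now[1] - z_prev[1])
    obs.append((p_diff, z_now, z_prev))

beta_hat = local_gradient(obs, chi=(5.0, 1996.0), h=1.0, sigma=(3.0, 3.0))
```

Because the synthetic prices are exactly linear in the regressors, the estimate coincides with the true gradient at any evaluation point; with real data, repeating the call over a grid of χ values would trace out a gradient that varies with ozone and year.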


2.2 Recovering MWTP Functions using a Panel of Buyers

In this section, we demonstrate that it is straightforward to recover the flexible distribution

of linear MWTP functions with panel data on home purchasers. We begin by specifying

the utility of household i with non-housing consumption (Ci,t) and income (Ii,t), choosing

a house j with attributes (Xj, Zj,t), in period t:

$$U(X_j, Z_{j,t}, C_{i,t}) = \alpha_{0,i} + \alpha_{1,i} X_j + \alpha_{2,i} X_j^2 + \alpha_{3,i} Z_{j,t} + \frac{\alpha_{4,i}}{2} Z_{j,t}^2 + \alpha_{5,i} C_{i,t} \tag{9}$$

We use a monotonic transformation of utility to normalize α5,i to one and incorporate

household i’s budget constraint (Ci,t +Rj,t ≤ Ii,t), to arrive at the indirect utility function:

$$V_{i,j,t} = \alpha_{0,i} + \alpha_{1,i} X_j + \alpha_{2,i} X_j^2 + \alpha_{3,i} Z_{j,t} + \frac{\alpha_{4,i}}{2} Z_{j,t}^2 + (I_{i,t} - R_{j,t}) \tag{10}$$

where Rj is the imputed annual rent or housing expenditure associated with house j. In

practice, we calculate this figure as 5% of the observed transaction price.8 As Rj is a

constant fraction of Pj, the following relationship holds: $\frac{\partial R_j}{\partial Z_j} = (0.05) \cdot \frac{\partial P_j}{\partial Z_j}$.

We now demonstrate that this functional form will yield the same linear MWTP

specification that is common in the hedonics literature. Moreover, as long as MWTP is not

a function of time-varying individual attributes, the parameters of the demand function

can be identified with just two observations for each home buyer. Consider the first-order

conditions associated with choosing the optimal value of Zj,t:

$$\frac{\partial V_{i,j,t}}{\partial Z_{j,t}} = \alpha_{3,i} + \alpha_{4,i} Z_{j,t} - \frac{\partial R_{j,t}}{\partial Z_{j,t}} = 0 \tag{11}$$

For household i, we need to recover the values of two unknown parameters: (α3,i, α4,i).9

8This is a commonly used discount in the literature. A similar discount is estimated in Peiser and Smith (1985).

9Murray (1975) takes the approach of using an equation very similar to (15), except without individual heterogeneity in preference parameters, to form an estimating equation. Recovering estimates of $\frac{\partial R_{j,t}}{\partial Z_{j,t}}$ from the first-stage, he then recovers estimates of $\hat{\alpha}_3$ and $\hat{\alpha}_4$ from a two-stage least squares procedure using income and prices as instruments.

Fortunately, we observe household i in panel data on (at least) two occasions. For households observed twice, we then have two equations in two unknowns:

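$$\alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),1} = \rho_{j^*(i),1} \qquad \alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),2} = \rho_{j^*(i),2} \tag{12}$$

where $\rho_{j^*(i),t} = \left. \frac{\partial R_{j,t}}{\partial Z_{j,t}} \right|_{Z_{j,t} = z_{j^*(i),t}}$ are recovered in the first-stage nonparametric regressions; t = 1, 2 are the two periods in which individual i purchases.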

Solving these two equations yields:

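$$\hat{\alpha}_{3,i} = \frac{\rho_{j^*(i),2}\, z_{j^*(i),1} - \rho_{j^*(i),1}\, z_{j^*(i),2}}{z_{j^*(i),1} - z_{j^*(i),2}} \qquad \hat{\alpha}_{4,i} = \frac{\rho_{j^*(i),1} - \rho_{j^*(i),2}}{z_{j^*(i),1} - z_{j^*(i),2}} \tag{13}$$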
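As a concrete illustration of Equation (13), the following minimal sketch solves the two first-order conditions for one household. The function name and the numerical inputs are hypothetical and chosen only so that the arithmetic is easy to follow; they are not estimates from the paper.

```python
def solve_two_purchase(rho1, z1, rho2, z2):
    """Return (alpha3, alpha4) for a household observed on two purchases.

    rho1, rho2: implicit prices of the amenity at the two chosen houses.
    z1, z2: chosen amenity levels on the two purchase occasions.
    """
    if z1 == z2:
        # With identical amenity levels the two points do not pin down a slope.
        raise ValueError("the two purchases must differ in the amenity level")
    alpha4 = (rho1 - rho2) / (z1 - z2)
    alpha3 = (rho2 * z1 - rho1 * z2) / (z1 - z2)
    return alpha3, alpha4

# Hypothetical household: implicit price -600 at z = 4 and -500 at z = 2.
alpha3, alpha4 = solve_two_purchase(rho1=-600.0, z1=4.0, rho2=-500.0, z2=2.0)

# Both first-order conditions in Equation (12) hold at the solution.
check1 = alpha3 + alpha4 * 4.0   # equals rho1
check2 = alpha3 + alpha4 * 2.0   # equals rho2
```

In this example the intercept is -400 and the slope is -50, so the recovered line passes through both observed (z, ρ) points, which is the "connect the dots" logic described in the introduction.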

2.2.1 Allowing MWTP to Vary with Individual Household Attributes

It is possible to determine how these inverse demand functions differ systematically with

fixed individual attributes, Ai, such as race, education level, or gender. This is easily done

by performing the following least-squares regressions in a separate stage:

$$\hat{\alpha}_{3,i} = \delta_{3,0} + A_i' \delta_{3,1} + \varsigma_{3,i} \qquad \hat{\alpha}_{4,i} = \delta_{4,0} + A_i' \delta_{4,1} + \varsigma_{4,i} \tag{14}$$
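A minimal sketch of one of the regressions in Equation (14) follows, under the simplifying assumption that the attribute vector contains a single indicator variable, so that least squares reduces to the familiar covariance-over-variance formula. The function name and the four data points are hypothetical.

```python
def ols_one_regressor(y, a):
    """Least-squares intercept and slope of y on a single regressor a."""
    n = len(y)
    a_bar = sum(a) / n
    y_bar = sum(y) / n
    s_aa = sum((a_i - a_bar) ** 2 for a_i in a)
    if s_aa == 0.0:
        raise ValueError("the regressor has no variation")
    s_ay = sum((a_i - a_bar) * (y_i - y_bar) for a_i, y_i in zip(a, y))
    delta1 = s_ay / s_aa
    delta0 = y_bar - delta1 * a_bar
    return delta0, delta1

# Hypothetical recovered intercepts for four households and a 0/1 attribute.
alpha3_hat = [-400.0, -380.0, -520.0, -500.0]
attribute = [0, 0, 1, 1]
delta0, delta1 = ols_one_regressor(alpha3_hat, attribute)
```

With a single indicator, the intercept equals the mean of the recovered coefficient in the reference group (-390 here) and the slope equals the difference in group means (-120 here). A vector of attributes would require a multivariate least-squares routine instead.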

However, income is often included as a determinant of MWTP, but is usually time-

varying (in panel data) and hence cannot be summarized by the vector of fixed attributes,

Ai. Murray (1983) shows why it is important for the MWTP function to vary with non-

housing consumption expenditure. We demonstrate next how time varying non-housing

consumption expenditure can be included with access to three observations on each in-

dividual household. We begin by specifying household i’s indirect utility from choosing

house j in period t as:

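$$V_{i,j,t} = \alpha_{0,i} + \alpha_{1,i} X_j + \alpha_{2,i} X_j^2 + \alpha_{3,i} Z_{j,t} + \frac{\alpha_{4,i}}{2} Z_{j,t}^2 + \alpha_{5,i} Z_{j,t} (I_{i,t} - R_{j,t}) + (I_{i,t} - R_{j,t}) \tag{15}$$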

where we now allow the marginal utility of Zj,t to vary with non-housing consumption

expenditure.10 The first-order condition associated with the individual’s optimal choice of

10Note that our indirect utility specifications could be expanded to include interactions with multiple time-varying attributes. Were we to do so, we would simply need to solve for multiple first-order conditions (using multiple derivatives of the price function) for each observed purchase. In this paper, we keep things relatively simple and avoid this complication.


Zj,t is then given by:

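$$\frac{\partial V_{i,j,t}}{\partial Z_{j,t}} = \alpha_{3,i} + \alpha_{4,i} Z_{j,t} + \alpha_{5,i} (I_{i,t} - R_{j,t}) \underbrace{- \alpha_{5,i} Z_{j,t} \frac{\partial R_{j,t}}{\partial Z_{j,t}} - \frac{\partial R_{j,t}}{\partial Z_{j,t}}}_{-\rho_{j,t}(1 + \alpha_{5,i} Z_{j,t})} = 0 \tag{16}$$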

For households which may be observed on three separate purchase occasions, we

have three equations in as many unknowns (α3,i, α4,i, α5,i):

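$$\begin{aligned} \alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),1} + \alpha_{5,i} (I_{i,1} - R_{j^*(i),1}) - \rho_{j^*(i),1} (1 + \alpha_{5,i} Z_{j^*(i),1}) &= 0 \\ \alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),2} + \alpha_{5,i} (I_{i,2} - R_{j^*(i),2}) - \rho_{j^*(i),2} (1 + \alpha_{5,i} Z_{j^*(i),2}) &= 0 \\ \alpha_{3,i} + \alpha_{4,i} Z_{j^*(i),3} + \alpha_{5,i} (I_{i,3} - R_{j^*(i),3}) - \rho_{j^*(i),3} (1 + \alpha_{5,i} Z_{j^*(i),3}) &= 0 \end{aligned} \tag{17}$$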

Solving this system provides us with the following equations for household i’s pref-

erence parameters:

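$$\hat{\alpha}_{4,i} = \frac{\theta_{j^*(i),3} (\rho_{j^*(i),1} - \rho_{j^*(i),2}) - \theta_{j^*(i),2} (\rho_{j^*(i),1} - \rho_{j^*(i),3})}{\theta_{j^*(i),3} (Z_{j^*(i),1} - Z_{j^*(i),2}) - \theta_{j^*(i),2} (Z_{j^*(i),1} - Z_{j^*(i),3})}$$

$$\hat{\alpha}_{5,i} = \frac{\hat{\alpha}_{4,i} (Z_{j^*(i),1} - Z_{j^*(i),3}) - \rho_{j^*(i),1} + \rho_{j^*(i),3}}{\rho_{j^*(i),1} Z_{j^*(i),1} - \rho_{j^*(i),3} Z_{j^*(i),3} + (I_{i,3} - R_{j^*(i),3}) - (I_{i,1} - R_{j^*(i),1})}$$

$$\hat{\alpha}_{3,i} = \rho_{j^*(i),1} (1 + \hat{\alpha}_{5,i} Z_{j^*(i),1}) - \hat{\alpha}_{4,i} Z_{j^*(i),1} - \hat{\alpha}_{5,i} (I_{i,1} - R_{j^*(i),1})$$

where:

$$\begin{aligned} \theta_{j^*(i),2} &= \rho_{j^*(i),1} z_{j^*(i),1} - \rho_{j^*(i),2} z_{j^*(i),2} + (I_{i,2} - R_{j^*(i),2}) - (I_{i,1} - R_{j^*(i),1}) \\ \theta_{j^*(i),3} &= \rho_{j^*(i),1} z_{j^*(i),1} - \rho_{j^*(i),3} z_{j^*(i),3} + (I_{i,3} - R_{j^*(i),3}) - (I_{i,1} - R_{j^*(i),1}) \end{aligned} \tag{18}$$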

Solving for all three parameters, household i’s MWTP function for z is given by:

$$\rho_{j^*(i),t} = \frac{\hat{\alpha}_{3,i} + \hat{\alpha}_{4,i} Z_{j^*(i),t} + \hat{\alpha}_{5,i} (I_{i,t} - R_{j^*(i),t})}{1 + \hat{\alpha}_{5,i} Z_{j^*(i),t}} \tag{19}$$

which, as α̂5,i is presumably a very small number, closely approximates a linear

function of Zj,t.
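The closed-form solution above can be checked with a short round-trip sketch: implicit prices are generated from Equation (19) at known parameter values, and the three-purchase formulas are then used to recover those values. The function names and all numbers are hypothetical, chosen purely for illustration; c denotes non-housing expenditure, I minus R.

```python
def mwtp(alpha3, alpha4, alpha5, z, c):
    """MWTP implied by Equation (19) at amenity level z and expenditure c."""
    return (alpha3 + alpha4 * z + alpha5 * c) / (1.0 + alpha5 * z)

def solve_three_purchase(rho, z, c):
    """Recover (alpha3, alpha4, alpha5) from three observed purchases.

    rho, z, c are length-3 sequences of implicit prices, chosen amenity
    levels, and non-housing expenditure (I - R) on each purchase occasion.
    """
    theta2 = rho[0] * z[0] - rho[1] * z[1] + c[1] - c[0]
    theta3 = rho[0] * z[0] - rho[2] * z[2] + c[2] - c[0]
    denom = theta3 * (z[0] - z[1]) - theta2 * (z[0] - z[2])
    if denom == 0.0 or theta3 == 0.0:
        raise ValueError("the three purchases do not identify the parameters")
    alpha4 = (theta3 * (rho[0] - rho[1]) - theta2 * (rho[0] - rho[2])) / denom
    alpha5 = (alpha4 * (z[0] - z[2]) - rho[0] + rho[2]) / theta3
    alpha3 = rho[0] * (1.0 + alpha5 * z[0]) - alpha4 * z[0] - alpha5 * c[0]
    return alpha3, alpha4, alpha5

# Hypothetical "true" preferences and purchase histories.
true_params = (-400.0, -50.0, -0.001)
z = (4.0, 2.0, 6.0)
c = (90000.0, 95000.0, 100000.0)
rho = tuple(mwtp(*true_params, z_t, c_t) for z_t, c_t in zip(z, c))

a3, a4, a5 = solve_three_purchase(rho, z, c)
```

The recovered parameters match the values used to generate the implicit prices, which confirms that the formulas invert the system in Equation (17). The guard on the denominators reflects that identification fails when the three purchases do not provide independent variation.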


3 Data

To implement our first-stage nonparametric regressions, we use data describing single-

family housing transactions over the period 1991 to 2003 in the Bay Area of California.

For the second-stage estimation, we require data on a panel of home buyers. For this, we

assemble a dataset by combining information from the real estate transactions dataset and

a dataset describing mortgage applicants’ demographic characteristics obtained through

the Home Mortgage Disclosure Act (HMDA).

3.1 Property Transactions Data

The real estate transactions data we employ cover the six core counties of the San Francisco

Bay Area (Alameda, Contra Costa, Marin, San Francisco, San Mateo, Santa Clara). The

data are purchased from Dataquick and include transaction dates, prices, loan amounts,

and buyers’, sellers’ and lenders’ names for all transactions. In addition, the data for the

final observed transaction include housing characteristics, such as exact street address,

square footage, year built, lot size, number of rooms, number of bathrooms, and number of bedrooms.

As Dataquick publishes housing characteristics for the final assessment of each prop-

erty, we take steps to ensure that the house has not undergone any major changes. First,

to control for land sales or re-builds, we drop all transactions where “year built” is miss-

ing or with a transaction date that is prior to “year built.” Second, to control for major

property improvements that would not present as a re-build, we drop properties that ex-

perience a yearly appreciation/depreciation rate that is more than four times greater than

the average appreciation/depreciation rate (in absolute value).11 Additionally, we drop

transactions where the price is missing, negative, or zero. After using the consumer price

index to convert all transaction prices into 2000 dollars, we drop one percent of observations

from each tail to minimize the effect of outliers. As we merge in the pollution data using

the property’s geographic coordinates, we drop properties where latitude and longitude are

missing.12

11In a repeat-sales analysis, similar in spirit to Case and Shiller (1989), we regress log prices on a set of house and year dummies, which provides us a crude measure of yearly appreciation (or depreciation) rates in the Bay Area.

12Although we use a repeat-sales estimation approach, we make some cuts on property characteristics to


Finally, we restrict our analysis to properties with multiple sales over the thirteen

year period.13 This yields a final sample of 277,011 transactions (i.e., property-year observations) comprised of 126,227 unique properties. Table (1) describes the data.

Table 1: Housing Transactions Data

| variable | Full Sample (n = 630,384): mean | Full Sample: st.dev. | Repeat Sales Sample (n = 277,011): mean | Repeat Sales Sample: st.dev. |
| --- | --- | --- | --- | --- |
| Price (in year 2000 $) | 379,368.70 | 202,887.70 | 371,189.30 | 192,626.00 |
| Sq. Ft. House | 1,716.42 | 679.78 | 1,651.01 | 634.30 |
| Sq. Ft. Lot | 7,148.83 | 7,962.83 | 6,523.88 | 7,068.32 |
| Num. Bedrooms | 3.20 | 0.90 | 3.12 | 0.88 |
| Num. Bath | 2.12 | 0.73 | 2.09 | 0.69 |

3.2 Buyer Characteristics Data

In order to implement our second-stage estimator, we need to create a panel of buyers, their

characteristics, and their chosen properties over time. This involves first identifying and

matching individuals over time in the property transactions dataset and, second, merging-

in the individual attribute data from the HMDA dataset. This is possible as we have

buyer and seller names in the transaction record. We use the algorithm found in Bayer,

McMillan, Murphy, and Timmins (2015).14

The individual attributes data come from a dataset on mortgage applications pub-

lished through the Home Mortgage Disclosure Act. These data provide information on all

mortgage applications filed in the Bay Area over the relevant time period. Included are all

ensure all single-family homes are trading in the same market: dropping properties where year built is pre-1850, lot size is either zero or greater than three acres, square footage is either less than 400 or greater than 10,000, number of bedrooms or bathrooms is greater than ten, number of total rooms is greater than fifteen, or number of stories is greater than three.

13We drop properties that sell more than once within a calendar year or more than five times over the thirteen year period.

14To link individuals over time within the transactions data, the algorithm matches on the individual’s name and the date of the transaction. We require that the last name matches exactly and the first name matches on the first letter. We also require that for two properties to be linked to the same buyer, they must have transacted within 12 months of one another. For example, if we see A. Anderson sell a home in September 1997, we assume that it is the same A. Anderson who purchases a home within the period September 1996 to September 1998.


applicants’ race and gender, income, loan amount, lender name, and Census tract of the

property.

We are able to merge the individual attribute data in the HMDA dataset to the

buyers in the property transactions dataset using the common variables of lender name,

loan amount, transaction date, and Census tract of the property. We successfully match

approximately two-thirds of individuals in the transactions sample to the uncleaned HMDA

sample.15 Based on the algorithm for tracking buyers through time, we keep only obser-

vations where the buyer is observed on either two or three purchase occasions. Table (2)

provides statistics describing the one-purchase, two-purchase, and three-purchase buyers.

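Table 2: One-, Two-, and Three-Purchase Samples

| variable | One-Purchase (individuals = 111,415): mean | One-Purchase: median | Two-Purchase (individuals = 22,190): mean | Two-Purchase: median | Three-Purchase (individuals = 1,824): mean | Three-Purchase: median |
| --- | --- | --- | --- | --- | --- | --- |
| Asian | 0.21 | 0 | 0.22 | 0 | 0.22 | 0 |
| Black | 0.03 | 0 | 0.02 | 0 | 0.02 | 0 |
| Hispanic | 0.10 | 0 | 0.11 | 0 | 0.16 | 0 |
| White | 0.66 | 1 | 0.65 | 1 | 0.61 | 1 |
| Income (in yr 2000 $) | 117,098 | 97,599 | 121,202 | 101,013 | 121,471 | 97,972 |
| Price (in yr 2000 $) | 386,912 | 339,336 | 413,934 | 359,975 | 388,411 | 333,936 |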

3.3 Ozone Data

The ozone data we employ are taken from the California Air Resources Board.16 We use

yearly ozone data from all thirty-five monitors in the nine counties of Alameda, Contra

Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Solano, and Sonoma over the

period 1990 to 2004. In particular, we use the monitor data to construct property-specific

measures of the number of days exceeding the one-hour state standard (i.e., 90 parts per

billion).
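The exceedance measure can be illustrated with a very small sketch. It assumes, hypothetically, that the monitor data are available as a list of daily maximum 1-hour concentrations in parts per billion and that an exceedance means a value strictly above the standard; the function name and the readings are made up for illustration.

```python
STATE_1HR_STANDARD_PPB = 90.0  # one-hour state standard cited in the text

def exceedance_days(daily_max_ppb):
    """Count the days whose maximum 1-hour ozone reading exceeds the standard."""
    # Assumption: a reading exactly at the standard is not an exceedance.
    return sum(1 for reading in daily_max_ppb if reading > STATE_1HR_STANDARD_PPB)

# Hypothetical five-day record for one monitor.
days = exceedance_days([72.0, 91.5, 90.0, 104.0, 88.0])
```

In this made-up record, two days (91.5 and 104.0 ppb) count as exceedances.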

In addition to the ozone readings, the dataset provides information on the “year

coverage,” or the percent of time (during the relevant high-ozone season) each particu-

15We drop observations where either race or income is missing.

16Publicly available at www.arb.ca.gov/adam/.


lar monitor was available and the geographic coordinates of each monitor.17 Using this

coverage variable, we drop monitors with less than 60 percent coverage in a given year

(amounting to less than 4 percent of the available monitor-year observations).

Using the latitudinal and longitudinal coordinates of the monitors and the proper-

ties, we use the “Great Circle” estimator to compute the distance to all monitors from each

property. We then create a three-year weighted average for each property of all monitors’ readings, weighting each reading by one over distance-squared. In order to mitigate boundary effects, we include monitor data from the surrounding counties of Napa, Solano, and

Sonoma, in addition to the six counties that appear in our transactions data.
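The spatial interpolation step can be sketched as follows. This is a minimal illustration under stated assumptions: the haversine formula stands in for the great-circle distance computation, the Earth radius, function names, coordinates, and readings are hypothetical, and only the cross-monitor weighting for a single period is shown (the averaging over three years is not reproduced here).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def weighted_ozone(house, monitors):
    """Average of monitor readings weighted by one over distance squared.

    house: (lat, lon); monitors: list of (lat, lon, reading).
    """
    num = den = 0.0
    for lat, lon, reading in monitors:
        d = great_circle_km(house[0], house[1], lat, lon)
        if d == 0.0:
            return reading  # the house sits exactly at a monitor
        w = 1.0 / (d * d)
        num += w * reading
        den += w
    return num / den

# Hypothetical house located midway between two monitors on the same meridian.
house = (37.5, -122.0)
monitors = [(37.6, -122.0, 2.0), (37.4, -122.0, 6.0)]
ozone_at_house = weighted_ozone(house, monitors)
```

Because the two hypothetical monitors are equidistant from the house, they receive equal weight and the interpolated value is the simple average of their readings; a closer monitor would dominate quickly, since the weights fall with the square of distance.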

The maps in Figure (2) describe the spatial distribution of ground-level ozone pollu-

tion in the Bay Area. Two important features emerge from these pictures. First, geography

is largely responsible for cross-sectional variation in pollution. San Francisco (on the tip

of the peninsula extending from the South Bay into the Pacific Ocean), Oakland (in the

East Bay), and San Jose (at the southern end of the San Francisco Bay) all face heavy

traffic congestion. Wind patterns, however, mitigate much of the ozone pollution in San

Francisco and Oakland, while worsening it in San Jose. Mountains ringing the southern

end of the Bay Area block air flows and contribute to this effect. The mountains on the

eastern side of the Bay are similarly responsible for high levels of pollution along the I-680

corridor in eastern Contra Costa and Alameda counties.

These maps also make clear that there is significant variation in pollution levels over

time. Figure (3) describes this time variation. Much of this is due to a variety of programs

that were initiated after California passed its Clean Air Act in 1988. After multiple years

of relatively low ozone pollution, while the Bay Area counties were designated as being

out of attainment according to EPA rules, the Bay Area experienced the worst air quality

since the mid-eighties in 1995, corresponding to the time at which it was placed back into

attainment and mandatory ozone reduction programs were eliminated. In 1996, the Vehicle

Buyback Program for cars manufactured in 1975 or before was implemented. This program,

in addition to the Lawn Mower Buyback and the Clean Air Plan of 1997, presumably

contributed to the summer of 1997 being the cleanest season since the early 1960s. The

1998 season experienced considerably more ozone pollution; it corresponded to the Bay

Area being placed back into non-attainment status. With the strongest mandatory ozone

reduction policies (mostly targeted at mobile sources), the remaining years of our sample

17Some monitors opened or closed permanently during this time period.


Figure 2: Ozone Pollution in the Bay Area

returned to relatively low ozone levels. Also during the late 1990s, almost 100 emitting

facilities were reviewed under the Title V Program Major Facility Review. There is no

reason to expect that any of these programs would have had special economic consequences

for housing prices in any particular part of the Bay Area, aside from those coming through

changing amenity values.


Figure 3: Average Number of Days Exceeding CA Ozone Limit


4 Results

In this section, we describe the results of each estimation procedure. First, we illustrate

the results of our non-parametric estimation that is used to recover hedonic gradients with

respect to ground-level ozone pollution. Second, we consider the average MWTP and

heterogeneous MWTPs recovered using the Bajari-Benkard inversion with cross-sectional

data. Third, we use the panel of 22,190 home buyers who are observed purchasing two

houses in our dataset to recover estimates of fully-heterogeneous inverse demand functions.

Fourth, we use the sample of 1,824 individuals who buy three times to recover estimates

of inverse demand functions that are allowed to vary with non-housing expenditure.

4.1 Results from the First-stage Hedonic Regressions

The local linear estimation allows the estimate of the slope coefficient, β(χ), to differ for

each observed value of ozone pollution in each year of the panel. Thus, for each year, the

gradient can be graphically represented as a flexible function of ozone pollution.

In the current estimation, we oversmooth these gradients to be monotonic.18 When

these gradients are fully flexible, an additional issue arises: individuals who choose a level

of z that lies on a downward sloping part of the gradient do not satisfy the second order

condition for utility maximization unless the WTP function is sufficiently steep. In the case

of the Bajari-Benkard inversion (where the WTP function is flat), this sufficient steepness

cannot be achieved.

For ease of exposition, we show the hedonic gradient for three years of the panel

(1992, 1997, and 2002) in Figure (4). These gradients are everywhere negative in sign

(i.e., an increase in the level of pollution at a house leads to a reduction in its sale price,

ceteris paribus). Moreover, these negative gradients are increasing in the level of pollution.

This suggests that individuals may sort based on preferences (i.e., those with a lower

willingness to pay tend to live in places with greater levels of pollution). This result alone

has important consequences for policy-makers. It suggests that using the average MWTP

to value pollution reductions in the most polluted places may overstate the value of those

reductions.

18 This is achieved by setting the bandwidth over ozone to 5.92 and the bandwidth over years to 2.01.


Figure 4: Hedonic Price Gradients (1992, 1997, 2002)

4.2 Results from the Second Stage

4.2.1 Preference Inversion with Cross-Sectional Data

Using the full sample of one-, two-, and three-time buyers (treated as a cross-section),

we perform the BB preference inversion. The trimmed distribution of MWTP using the

time-varying gradients is shown in Figure (5). The median MWTP is −594.60. By assumption,

the elasticity of each individual’s MWTP function with respect to ozone exceedances

is zero.


Figure 5: Trimmed Distribution of MWTP


4.2.2 Demand Estimation with Two-Purchase Panel

Employing our approach with a panel of buyers who are observed on two separate purchase

occasions allows for the recovery of both an intercept and a slope coefficient for the MWTP

function for each individual. Thus, each individual’s MWTP function is allowed to vary

with the level of ozone pollution, which can have important implications for valuing large

changes in the disamenity. We find evidence of considerable heterogeneity, with a median

slope value of −$81.97.
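
The two-purchase logic can be sketched in Python as follows. This minimal sketch assumes a linear MWTP function that equals the hedonic gradient at each chosen ozone level; the function name and the numbers in the usage note are hypothetical and are not estimates from the data.

```python
def recover_linear_mwtp(z1, g1, z2, g2):
    """Solve g_t = intercept + slope * z_t for one buyer observed twice.

    z_t: ozone level chosen at purchase t.
    g_t: hedonic gradient at z_t in the year of purchase t, taken to
         equal the buyer's MWTP at that point (first-order condition).
    """
    if z1 == z2:
        # Identical ozone levels leave the slope unidentified.
        raise ValueError("the two purchases must differ in ozone")
    slope = (g2 - g1) / (z2 - z1)
    intercept = g1 - slope * z1
    return intercept, slope
```

For a hypothetical buyer with `(z1, g1) = (4.0, -500.0)` and `(z2, g2) = (6.0, -660.0)`, the function returns an intercept of -180.0 and a slope of -80.0. Because the line passes through exactly two points, nothing restricts how the intercept and slope vary across buyers.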

It is difficult to summarize the joint distributions of intercepts and slopes in a single

figure. We instead report the distribution of MWTP elasticities (with respect to ozone

exceedances). See Figure (6).

Figure 6: Trimmed Distribution of Price Elasticities

The distribution of these elasticities also shows significant heterogeneity. With a

median value of 0.36, the distribution has an inter-quartile range of 1.07. The elasticity of

MWTP with respect to ozone pollution is not, however, strongly correlated with the level

of ozone pollution people are experiencing (i.e., a correlation of 0.01).
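
Summary statistics of this kind could be computed along the following lines. The sketch assumes a linear MWTP function with the elasticity evaluated at the ozone level the individual actually chose; the function names are hypothetical, and the trimming applied in Figure (6) is not reproduced.

```python
import numpy as np

def mwtp_elasticity(intercept, slope, z):
    """Elasticity of MWTP with respect to ozone at level z.

    Uses (dMWTP/dz) * z / MWTP with MWTP = intercept + slope * z.
    """
    mwtp = intercept + slope * z
    return slope * z / mwtp

def median_and_iqr(elasticities):
    """Return the median and inter-quartile range of an array."""
    q25, q50, q75 = np.percentile(elasticities, [25, 50, 75])
    return q50, q75 - q25
```

Both functions accept NumPy arrays, so `mwtp_elasticity` can be applied to the full vectors of individual intercepts, slopes, and ozone levels before the result is passed to `median_and_iqr`.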


4.2.3 Demand Estimation with Three-Purchase Panel

As with the two-purchase panel, we report the distribution of MWTP elasticities in Figure

(7). In this case, with access to data describing three separate purchase occasions, we

additionally report the distribution of elasticities with respect to income.
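
With three purchases per individual, one more parameter can be recovered for each person. The sketch below assumes, purely for illustration, an MWTP function that is linear in both ozone and non-housing expenditure, which gives three equations in three unknowns; the function name and this functional form are assumptions, not the specification estimated here.

```python
import numpy as np

def recover_mwtp_three(z, x, g):
    """Solve g_t = a + b * z_t + c * x_t for t = 1, 2, 3.

    z: ozone levels at the three purchases.
    x: non-housing expenditure at the three purchases.
    g: hedonic gradients at the chosen ozone levels.
    Returns the array [a, b, c]; np.linalg.solve raises LinAlgError
    when the three observations do not identify the parameters.
    """
    A = np.column_stack([np.ones(3), np.asarray(z, dtype=float),
                         np.asarray(x, dtype=float)])
    return np.linalg.solve(A, np.asarray(g, dtype=float))
```

An elasticity with respect to expenditure would then follow as `c * x / mwtp`, in the same way as the ozone elasticity.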

Figure 7: Trimmed Distributions of Elasticities for Three-Purchase Panel


5 Conclusion

With the growing availability of panel datasets that match household attributes to the

prices and characteristics of the properties they purchase, our method is increasingly implementable

and is applicable to a wide variety of non-marginal policy analyses, such as those

associated with the welfare implications of changing school quality, crime rates, or local

pollutants. Importantly, this method allows for the evaluation of heterogeneous preferences

(and taste-based sorting) and for the analysis of non-marginal policy changes without en-

countering the difficult endogeneity issues associated with the recovery of hedonic demand.


6 References

Anselin, L. and J.L. Gallo (2006). “Interpolation of Air Quality Measures in Hedonic House

Price Models: Spatial Aspects.” Spatial Economic Analysis. 1(1):31-52.

Bartik, T.J. (1987). “The Estimation of Demand Parameters in Hedonic Price Models.”

Journal of Political Economy. 95(1):81-88.

Bajari, P. and L. Benkard (2005). “Demand Estimation With Heterogeneous Consumers

and Unobserved Product Characteristics: A Hedonic Approach.” Journal of Political

Economy. 113(6):1239-1276.

Bajari, P. and M. Kahn (2005). “Estimating Housing Demand With an Application to

Explaining Racial Segregation in Cities.” Journal of Business and Economic Statistics.

23(1):20-33.

Bayer, P., McMillan, R., Murphy, A., and Timmins, C. (2008). “A Dynamic Model of

Demand for Houses and Neighborhoods.” Mimeo.

Bishop, K. (2008). “A Dynamic Model of Location Choice and Hedonic Valuation.” Mimeo.

Bound, J., Jaeger, D.A. and Baker, R.M. (1995). “Problems With Instrumental Variables

Estimation when the Correlation Between the Instruments and the Endogenous Explana-

tory Variable is Weak.” Journal of the American Statistical Association. 90:443-450.

Boyle, K.J., Poor, P.J. and Taylor, L.O. (1999). “Estimating the Demand for Protecting

Freshwater Lakes From Eutrophication.” American Journal of Agricultural Economics.

81:1118-1122.

Brown, J. and H. Rosen (1982). “On the Estimation of Structural Hedonic Price Models.”

Econometrica. 50(3):765-768.


Case, K.E., and Shiller, R.J. (1989). “The Efficiency of the Market for Single-Family

Homes.” American Economic Review. 79(1):125-137.

Chattopadhyay, S. (1999). “Estimating Demand for Air Quality: New Evidence Based on

the Chicago Housing Market.” Land Economics. 75:22-38.

Epple, D. (1987). “Hedonic Prices and Implicit Markets: Estimating Demand and Supply

Functions for Differentiated Products.” Journal of Political Economy. 95(1):59-80.

Fan, J. and Gijbels, I. (1996). “Local Polynomial Modeling and Its Applications.” In

Monographs on Statistics and Applied Probability (66). Chapman and Hall. London, U.K.

Greenstone, M. and Gallagher, J. (2008). “Does Hazardous Waste Matter? Evidence

from the Housing Market and the Superfund Program.” Quarterly Journal of Economics.

123(3):951-1003.

Heckman, J., I. Ekeland, and L. Nesheim (2004). “Identification and Estimation of Hedonic

Models.” Journal of Political Economy. 112(1.2):S60-S109.

Kuminoff, N. (2007). “Recovering Preferences from a Dual-Market Locational Equilib-

rium.” Manuscript. Department of Agricultural and Applied Economics. Virginia Tech

University.

Mendelsohn, R. (1985). “Identifying Structural Equations with Single Market Data.” Re-

view of Economics and Statistics. 67(3):525-529.

Mendelsohn, R., Hellerstein, D., Huguenin, M., Unsworth, R., and Brazee, R. (1992).

“Measuring Hazardous Waste Damages with Panel Models.” Journal of Environmental

Economics and Management. 22:259-271.

Murray, M. (1975). “The Distribution of Tenant Benefits in Public Housing.” Economet-

rica. 43:771-788.


Murray, M. (1983). “Mythical Demands and Mythical Supplies for Proper Estimation of Rosen’s

Hedonic Price Model.” Journal of Urban Economics. 14:327-337.

Nakamura, A. and Nakamura, M. (1998). “Model Specification and Endogeneity.” Journal

of Econometrics. 83:213-237.

Palmquist, R. (1982). “Measuring Environmental Effects on Property Values Without

Hedonic Regressions.” Journal of Urban Economics. 11:333-347.

Palmquist, R. (1984). “Estimating the Demand for the Characteristics of Housing.” Review of

Economics and Statistics. 66:394-404.

Palmquist, R. (2005). “Property Value Models,” in K.-G. Mäler and J.R. Vincent (eds.), Handbook

of Environmental Economics: Valuing Environmental Changes. Vol.2. Elsevier B.V.,

Amsterdam, The Netherlands. pp.763-820.

Parsons, G.R. (1992). “The Effect of Coastal Land Use Restrictions on Housing Prices: A

Repeat Sales Analysis.” Journal of Environmental Economics and Management. 22:25-37.

Peiser, R. and Smith, L. (1985). “Home Ownership Returns, Tenure Choice and Inflation.”

Journal of the American Real Estate and Urban Economics Association. 13(4): 343-360.

Rosen, S. (1974). “Hedonic Prices and Implicit Markets: Product Differentiation in Perfect

Competition.” Journal of Political Economy. 82(1):34-55.

Sieg, H., Smith, K., Banzhaf, S., and Walsh, R. (2004). “Estimating the General Equi-

librium Benefits of Large Changes in Spatially Delineated Public Goods.” International

Economic Review. 45(4):1047-1077.

Sieg, H., Smith, K., Banzhaf, S., and Walsh, R. (2004). “General equilibrium benefits for

environmental improvements: projected ozone reductions under EPA’s Prospective Analy-

sis for the Los Angeles air basin.” Journal of Environmental Economics and Management.

47(3):559-584.


Silverman, B. (1986). “Density Estimation for Statistics and Data Analysis.” In Mono-

graphs on Statistics and Applied Probability (26). Chapman and Hall. London, U.K.

Taylor, L.O. (2003). “The Hedonic Method,” in Champ, P.A., K.J. Boyle, and T.C. Brown

(eds.), A Primer on Nonmarket Valuation. Kluwer Academic Publishers. Dordrecht, The

Netherlands. pp.331-393.

Timmins, C. (2007). “If You Can’t Take the Heat, Get out of the Cerrado... Recovering

the Equilibrium Amenity Cost of Non-Marginal Climate Change in Brazil.” Journal of

Regional Science. 47(1):1-25.

Timmins, C. and Murdock, J. (2007). “A Revealed Preference Approach to the Measurement

of Congestion in Travel Cost Models.” Journal of Environmental Economics and

Management. 53(2):230-249.

Ullah, A. and N. Roy (1998). “Nonparametric and Semiparametric Econometrics of Panel

Data,” in A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics.

Marcel Dekker. New York, New York. pp.579-604.
