-
Introduction to Pooled Cross Sections
EMET 8002, Lecture 5
August 13, 2009
-
Administrative Matters
Consultation hours: Tuesdays 3 pm to 5 pm
I'm going to be away again next Tuesday, so I'll hold consultation hours next week on Wednesday, August 19th from 3 pm to 5 pm
Case Studies projects have now been assigned. If you did not receive the email that I sent on Tuesday, you can view the assignments on the course website
If you have not already done so, please contact your supervisor immediately. A few of the projects will require you to apply for data access, which should be done immediately!
-
Outline
Introduce pooled cross sections regression analysis (Chapter 13 in the text)
A potential pitfall of the difference-in-differences estimation strategy
Introduction to two-period panel data and the first difference estimator
-
Pooling Independent Cross Sections across Time
What is it? It is obtained by sampling randomly from a large population at two or more points in time
For example, randomly sampling households from ACT residents in 2000 and in 2008
-
A Spreadsheet view

Pooled cross sections      Panel
Unit   Period              Unit   Period
1      1                   1      1
2      1                   2      1
3      1                   3      1
4      2                   1      2
5      2                   2      2
6      2                   3      2
-
Formal notation
We can denote the pooled cross section as a random sample:

$\{(y_{it}, x_{1it}, x_{2it}, \dots, x_{kit})\}$, with
$i = \underbrace{1, 2, \dots, N_1}_{\text{Period 1}},\ \underbrace{N_1 + 1, \dots, N_1 + N_2}_{\text{Period 2}},\ \dots,\ \underbrace{\dots, N_1 + N_2 + \dots + N_T}_{\text{Period } T}$ and $t = 1, 2, \dots, T$
-
Pooling Independent Cross Sections across Time
Consider the regression model:

$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + u_{it}$,
$i = 1, \dots, N_1, N_1 + 1, \dots, N_1 + N_2, \dots$ (observations grouped by period), $t = 1, 2, \dots, T$

Benefits: Pooling can lead to larger sample sizes. This leads to more precise estimators and test statistics with more power. However, this is only true if the relationship between the dependent variable and at least some of the explanatory variables remains constant over time
If the x's are changing over time, pooling can also provide additional variation in x with which to estimate its effect on y
Note: the error term may have the structure $u_{it} = \delta_t + \varepsilon_{it}$
For now, we'll make the following assumptions: $\delta_t \mid X \sim (0, \sigma_\delta^2)$ and $\varepsilon_{it} \mid X \sim (0, \sigma_\varepsilon^2)$
And that the two components of the error term are independent
-
Pooling Independent Cross Sections across Time
Suppose the true error structure does include a year component, but we ignore that and run OLS on the following equation:

$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + u_{it}$

Does it matter? Yes! This introduces serial correlation between observations within the same time period: for $i \neq j$,

$E[u_{it} u_{jt} \mid X] = E[(\delta_t + \varepsilon_{it})(\delta_t + \varepsilon_{jt}) \mid X] = E[\delta_t^2 \mid X] = \sigma_\delta^2 \neq 0$
-
Pooling Independent Cross Sections across Time
This violates one of our assumptions for OLS! The OLS coefficient estimates are still unbiased and consistent
However, the variance-covariance matrix estimate is biased/inconsistent, which leads to incorrect standard errors and incorrect inference
This is similar to the problem of serial correlation in time series models, which we saw previously.
Thus, it makes sense to include time dummies (a.k.a. year effects):

$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \sum_{t=2}^{T} \delta_t D_t + \varepsilon_{it}$
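The within-year correlation can be seen in a small simulation. This is a minimal numpy sketch on hypothetical simulated data (not the lecture's dataset): a random year shock shared by all observations in a year makes the average cross-product of two same-year errors approximate $\sigma_\delta^2$ instead of zero, while stripping out the year component (which is what the year dummies estimate) removes it.

```python
import numpy as np

# Simulated illustration (hypothetical data): u_it = delta_t + eps_it,
# with sigma_delta = sigma_eps = 1.
rng = np.random.default_rng(0)
T, n = 20000, 2                         # many "years", two observations each
delta = rng.normal(0.0, 1.0, T)         # random year effects delta_t
eps = rng.normal(0.0, 1.0, (T, n))      # idiosyncratic component eps_it
u = delta[:, None] + eps                # composite error

# Two different observations in the same year share delta_t, so the
# average cross-product approximates sigma_delta^2 = 1, not 0:
cross = np.mean(u[:, 0] * u[:, 1])

# Subtracting the year component (what year dummies estimate) removes it:
cross_no_year = np.mean(eps[:, 0] * eps[:, 1])
print(cross, cross_no_year)
```

With the year component removed, the cross-product averages to roughly zero, which is why adding year dummies restores valid inference here.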
-
Interpretation of the year effects
How do we interpret the year dummies?

$E[y_{it} \mid t = 1, \mathbf{x}] = \beta_0 + \beta_1 x_{1it} + \dots + \beta_k x_{kit}$
$E[y_{it} \mid t = j, \mathbf{x}] = \beta_0 + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + \delta_j$
$E[y_{it} \mid t = j, \mathbf{x}] - E[y_{it} \mid t = 1, \mathbf{x}] = \delta_j$

In words, each time dummy coefficient $\delta_j$ is the difference in the conditional expected value of y between the year t = j and the base year (t = 1)
-
Pooling Independent Cross Sections across Time
Further benefits: A further benefit of these data is that we can explore changes in the coefficients over time
This amounts to allowing some or all of the β's to have t-subscripts
In a two-period pooled cross section dataset with one explanatory variable:

$y_{it} = \beta_0 + \delta_0 D2_t + \beta_1 x_{it} + \delta_1 (D2_t \times x_{it}) + u_{it}$

We can then use F-tests (including Chow tests) to test for changes in the regression model over time
Note: While changes in the coefficients may be interesting, one has to be very cautious in interpreting the source of the changes (e.g., as the impact of a policy or of changing economic structure)
-
Example 13.1: Women's Fertility over Time
Dependent variable is the number of children born to a woman.
Explanatory variables include socio-demographic characteristics.
Pooled time series of cross sections (GSS: 1972, 1974, 1976, 1978, 1980, 1982, 1984).
N = 1,129. Dataset: FERTIL1.RAW
One question of interest is: After controlling for other observable factors, what happened to the fertility rate over time?
-
Example 13.1: Women's Fertility over Time
In Stata:
sort year
by year: summarize kids

Year                      1972    1974    1976    1978    1980    1982    1984
Mean number of children   3.026   3.208   2.803   2.804   2.817   2.403   2.237
Year t - 1972                     0.182  -0.223  -0.222  -0.209  -0.623  -0.789
-
Example 13.1: Women's Fertility over Time
In Stata: reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84

kids        Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
educ        -.1284268   .0183486    -7.00   0.000   -.1644286   -.092425
age          .5321346   .1383863     3.85   0.000    .2606065    .8036626
agesq       -.005804    .0015643    -3.71   0.000   -.0088733   -.0027347
black       1.075658    .1735356     6.20   0.000    .7351631   1.416152
east         .217324    .1327878     1.64   0.102   -.0432192    .4778672
northcen     .363114    .1208969     3.00   0.003    .125902     .6003261
west         .1976032   .1669134     1.18   0.237   -.1298978    .5251041
farm        -.0525575   .14719      -0.36   0.721   -.3413592    .2362443
othrural    -.1628537   .175442     -0.93   0.353   -.5070887    .1813814
town         .0843532   .124531      0.68   0.498   -.1599893    .3286957
smcity       .2118791   .160296      1.32   0.187   -.1026379    .5263961
y74          .2681825   .172716      1.55   0.121   -.0707039    .6070689
y76         -.0973795   .1790456    -0.54   0.587   -.448685     .2539261
y78         -.0686665   .1816837    -0.38   0.706   -.4251483    .2878154
y80         -.0713053   .1827707    -0.39   0.697   -.42992      .2873093
y82         -.5224842   .1724361    -3.03   0.003   -.8608214   -.184147
y84         -.5451661   .1745162    -3.12   0.002   -.8875846   -.2027477
_cons       -7.742457   3.051767    -2.54   0.011  -13.73033    -1.754579
-
Example 13.1: Women's Fertility over Time
There may be heteroskedasticity in the previous model. This could be related to the observed characteristics, or it could simply be that the error variance is changing over time
Nonetheless, the usual heteroskedasticity-robust standard errors and t statistics are still valid
Just use the robust option with the regress command in Stata
-
Allowing the effect to change across periods
We can also interact year dummy variables with key explanatory variables to see if the effect of that variable changed over time
-
Example 13.2: Changes in the returns to education and the gender wage gap
Consider the following regression model pooled over the years 1978 and 1985:

$\log(wage) = \beta_0 + \delta_0\, y85 + \beta_1\, educ + \delta_1 (y85 \times educ) + \beta_2\, exper + \beta_3\, exper^2 + \beta_4\, union + \beta_5\, female + \delta_5 (y85 \times female) + u$

The dataset is CPS78_85.RAW
reg lwage y85 educ y85educ exper expersq union female y85fem
-
Example 13.2: Changes in the returns to education and the gender wage gap

lwage       Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
y85          .1178062   .1237817     0.95   0.341   -.125075     .3606874
educ         .0747209   .0066764    11.19   0.000    .0616206    .0878212
y85educ      .0184605   .0093542     1.97   0.049    .000106     .036815
exper        .0295843   .0035673     8.29   0.000    .0225846    .036584
expersq     -.0003994   .0000775    -5.15   0.000   -.0005516   -.0002473
union        .2021319   .0302945     6.67   0.000    .1426888    .2615749
female      -.3167086   .0366215    -8.65   0.000   -.3885663   -.244851
y85fem       .085052    .051309      1.66   0.098   -.0156251    .185729
_cons        .4589329   .0934485     4.91   0.000    .2755707    .642295
-
Chow test for structural change across time
We can apply the Chow test to see if a multiple regression function differs across two time periods
We can do this in pooled cross sections by interacting all explanatory variables with time dummies and performing an F-test that the interactions are jointly insignificant
Usually, we allow the intercept to change over time and only test whether the slope parameters have changed
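As a concrete illustration, here is a minimal Chow-type test in numpy on simulated two-period data (all numbers are hypothetical, not from the text): interact x with the period-2 dummy, then form the F statistic from the restricted and unrestricted sums of squared residuals.

```python
import numpy as np

# Hypothetical simulated data: the slope on x changes between periods.
rng = np.random.default_rng(1)
n = 400
d2 = np.repeat([0.0, 1.0], n // 2)      # period-2 dummy
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * d2 + 1.0 * x + 0.8 * (d2 * x) + rng.normal(0.0, 1.0, n)

def ssr(X, y):
    """Sum of squared residuals from OLS of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ones = np.ones(n)
X_r = np.column_stack([ones, d2, x])            # restricted: constant slope
X_u = np.column_stack([ones, d2, x, d2 * x])    # unrestricted: slope may shift
q = 1                                           # number of restrictions
k = X_u.shape[1]
F = (ssr(X_r, y) - ssr(X_u, y)) / q / (ssr(X_u, y) / (n - k))
print(F)    # a large F rejects "no change in slope"
```

The intercept is allowed to differ in both models (d2 appears in each), so only the slope change is being tested, matching the usual practice described above.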
-
Policy Analysis with Pooled Cross Sections
This type of data can be useful in identifying the impacts of policies (or government programs) on various outcomes
They are especially helpful if the policy experiment has before & after and treatment & control dimensions
Consider a simple example: We wish to estimate the impact of participating in a government program (i.e., the treatment) on an outcome, y
Let participation in the program be captured by the dummy variable:

$D_i = 1$ if affected ("treatment"); $D_i = 0$ if unaffected ("control")
-
A simple estimate of the treatment effect
One estimator of the treatment effect is given by the difference of means: $\bar{y}_T - \bar{y}_C$
In a regression context, we could estimate this difference by:

$y_i = \alpha + \beta D_i + u_i$

Such a regression would work if:

$E[y_i \mid D_i = 0] = \alpha + E[u_i \mid D_i = 0]$
$E[y_i \mid D_i = 1] = \alpha + \beta + E[u_i \mid D_i = 1]$
$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = \beta$ iff $E[u_i \mid D_i = 1] = E[u_i \mid D_i = 0]$
-
A simple estimate of the treatment effect
The difference in means between the treatment and control groups and the OLS estimate of $\beta$ are consistent estimates of the treatment effect ONLY if there are no other differences between the treatment and control groups
Sometimes we can add covariates (X's) to help control for differences between the treatment and control groups
Nonetheless, this is often an implausible assumption to make
However, we may be able to use time variation in the application of the program (before & after) combined with variation in treatment
-
Empirical example: The effect of building an incinerator on house prices
The hypothesis that we are interested in testing is that the announcement of the pending construction of an incinerator would cause the prices of houses located nearby to fall, relative to houses further away.
A house is considered to be close if it is within 3 miles of the incinerator.
We have data on house prices for houses that sold in 1978, before the announcement of the incinerator, and in 1981, after the announcement.
We begin by regressing the real house price on a dummy variable for whether the house is close to the incinerator, using data from 1981 (dataset KIELMC.RAW)
-
Empirical example: The effect of building an incinerator on house prices
Coefficients, with standard errors in parentheses and t statistics in brackets:

Data:                     1981                      1978                      1978 & 1981
Near Incinerator          -30,688 (5,828) [-5.27]   -18,824 (4,745) [-3.97]   -18,824 (4,875) [-3.86]
Year 1981                                                                      18,790 (4,050) [4.64]
Near Incinerator x 1981                                                       -11,864 (7,457) [-1.59]
-
The Difference-in-Differences Model
Consider the following simple example where we allow:

$E[u_i \mid D_i = 1] \neq E[u_i \mid D_i = 0]$

The model is:

$y_{it} = \alpha + \beta D_{it} + \gamma_t + u_{it}$

Suppose that the differences between treatment and control groups can be written:

$E[u_{it} \mid D_{i1} = 0, D_{i2} = 0] = \eta_C$
$E[u_{it} \mid D_{i1} = 0, D_{i2} = 1] = \eta_T$

Also assume that the time effects can be written (normalized) as:

$\gamma_t = 0$ for $t = 1$; $\gamma_t = \gamma$ for $t = 2$
-
The Difference-in-Differences Model
The expected outcomes in the before period (period 1) are:

$E[y_{i1} \mid D_{i1} = 0, D_{i2} = 0] = \alpha + \eta_C$
$E[y_{i1} \mid D_{i1} = 0, D_{i2} = 1] = \alpha + \eta_T$

In the after period (period 2):

$E[y_{i2} \mid D_{i1} = 0, D_{i2} = 0] = \alpha + \eta_C + \gamma$
$E[y_{i2} \mid D_{i1} = 0, D_{i2} = 1] = \alpha + \eta_T + \gamma + \beta$

An estimate of $\beta$ can then be recovered by comparing:

$\left(E[y_{i2} \mid D_{i1} = 0, D_{i2} = 1] - E[y_{i1} \mid D_{i1} = 0, D_{i2} = 1]\right) - \left(E[y_{i2} \mid D_{i1} = 0, D_{i2} = 0] - E[y_{i1} \mid D_{i1} = 0, D_{i2} = 0]\right) = \beta$
-
The Difference-in-Differences Model
The difference-in-differences estimator would then be based on:

$(\bar{y}_{T,2} - \bar{y}_{T,1}) - (\bar{y}_{C,2} - \bar{y}_{C,1})$, since $\bar{y}_{T,2} - \bar{y}_{T,1} \approx \gamma + \beta$ and $\bar{y}_{C,2} - \bar{y}_{C,1} \approx \gamma$

Or, alternatively:

$(\bar{y}_{T,2} - \bar{y}_{C,2}) - (\bar{y}_{T,1} - \bar{y}_{C,1})$, since $\bar{y}_{T,2} - \bar{y}_{C,2} \approx (\eta_T - \eta_C) + \beta$ and $\bar{y}_{T,1} - \bar{y}_{C,1} \approx (\eta_T - \eta_C)$

In a regression framework, we would estimate this as:

$y_{it} = \alpha + \gamma\, AFTER_{it} + \eta\, D_{it} + \beta\,(D_{it} \times AFTER_{it}) + u_{it}$
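In the saturated two-group, two-period case, the regression and the four-means computation coincide exactly. A small numpy sketch on simulated data (hypothetical numbers) verifies that the OLS coefficient on the D x AFTER interaction equals the difference-in-differences of the group means:

```python
import numpy as np

# Hypothetical simulated data with a true treatment effect of 2.0.
rng = np.random.default_rng(2)
n = 1000
D = rng.integers(0, 2, n).astype(float)        # treatment-group dummy
after = rng.integers(0, 2, n).astype(float)    # post-period dummy
y = 1.0 + 0.5 * after + 0.3 * D + 2.0 * (D * after) + rng.normal(0.0, 1.0, n)

# Difference-in-differences of the four cell means:
did = ((y[(D == 1) & (after == 1)].mean() - y[(D == 1) & (after == 0)].mean())
       - (y[(D == 0) & (after == 1)].mean() - y[(D == 0) & (after == 0)].mean()))

# OLS of y on a constant, AFTER, D, and D*AFTER:
X = np.column_stack([np.ones(n), after, D, D * after])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(did, b[3])    # identical up to floating-point error
```

Because the regression is saturated (one parameter per cell), the interaction coefficient is algebraically the diff-in-diff of cell means, not just a close approximation.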
-
The Difference-in-Differences Model
Of course, we could also add covariates. In this specification, we denote:

$\gamma$ → common time effect
$\eta$ → permanent differences (across T, C)
$\beta$ → treatment effect (difference-in-differences)
-
Difference-in-difference
The key distinction between the difference-in-differences estimator and the difference in means is that we have relaxed the assumption about the distribution of the error terms across the treatment and control groups.
We no longer require the conditional expectation of the error term to be equal across groups; we only require the conditional expectation for each group to be constant over time
This may still be a strong assumption! You need to think about the validity of making this assumption.
-
Another Example: The Effect of Workers' Comp. on Injury Duration (Kentucky)
Notes: Dependent variable is log duration of workers' comp benefits. Controls include: age, sex, married, whether a hospital stay was required, indicators for the type of injury, and industry of job. "After" corresponds to the increase in the cap of weekly WC benefits.

Data:              Before and After   Before and After   After           After
High Earner        0.115 (0.048)      0.233 (0.049)      0.274 (0.054)   0.462 (0.051)
After              0.047 (0.041)      0.014 (0.045)
After*HighEarner   0.175 (0.064)      0.229 (0.070)
Controls           Yes                No                 Yes             No
R-squared          0.188              0.022              0.214           0.031
Sample size        5347               5347               2567            2567
-
Examples of difference-in-difference
For a good example of a paper using this strategy, see Duflo, Esther. (2001). Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment. American Economic Review. Vol. 91, No. 4, pp. 795-813.
For a good example of when the impact of the policy/program might have spillover effects on the control group, see Miguel, Edward and Michael Kremer. (2004). Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica. Vol. 72, No. 1, pp. 159-217.
-
How much should we trust diff-in-diff estimates?
The following discussion is based on an excellent paper by Bertrand, Duflo, and Mullainathan (2004) in the Quarterly Journal of Economics
Many papers that employ difference-in-differences estimators use many years of data and focus on serially correlated outcomes, but ignore that the resulting standard errors are inconsistent
Diff-in-diff estimates are usually based on estimating an equation of the form:

$Y_{ist} = A_s + B_t + c X_{ist} + \beta I_{st} + \varepsilon_{ist}$

where i denotes individuals, s denotes a state or group membership, and t denotes the time period
-
How much should we trust diff-in-diff estimates?
An important point, which we will address later in the course, is possible correlation of the error terms across individuals within a state/group in a given year.
We are going to ignore this potential problem for now and assume that the econometricians have appropriately dealt with correlation within state-year cells. Hence, let's think of the data as being averaged over individuals within a state in each given year
Three factors make serial correlation an important issue in the difference-in-differences context:
Estimation often relies on long time series
The most commonly used dependent variables are typically highly serially correlated
The treatment variable, I_st, changes very little within a state over time
-
How much should we trust diff-in-diff estimates?
How severe is the problem? They examine how diff-in-diff performs on placebo laws, where treated states (in the U.S.) are chosen at random, as is the year of passage of the placebo law
Since the laws are fictitious, a significant effect should only be found 5% of the time (i.e., the true null hypothesis of no effect is falsely rejected 5% of the time)
They use wages as the dependent variable over 21 years
They find rejection rates of the null hypothesis as high as 45%! In other words, there is statistical evidence that these fake laws affected wages in close to half of the simulations
-
How much should we trust diff-in-diff estimates?
Does this matter practically? They find 92 diff-in-diff papers published between 1990 and 2000 in the following journals: the American Economic Review, the Industrial and Labor Relations Review, the Journal of Labor Economics, the Journal of Political Economy, the Journal of Public Economics, and the Quarterly Journal of Economics
69 of these papers have more than 2 time periods
Only 4 papers collapse the data into before-after; thus, 65 papers have a potential serial correlation problem
Only 5 provide a serial correlation correction
-
How much should we trust diff-in-diff estimates?
Some results: When the treatment variable is not serially correlated, rejection rates of H0: no effect are close to 5%
The overrejection problem worsens with the degree of serial correlation in the dependent variable
-
How much should we trust diff-in-diff estimates?
Solutions: Parametric methods
Specify an autocorrelation structure for the error term, estimate its parameters, and use these parameters to compute standard errors. This does not do a very good job of remedying the problem:
With short time series, the OLS estimate of the autocorrelation parameter is downward biased
The autocorrelation structure may be incorrectly specified
Block bootstrap
Bootstrapping is an advanced technique
It does poorly when the number of states/groups becomes small
-
How much should we trust diff-in-diff estimates?
Solutions (continued): Ignore time series information: average the before and after data and estimate the model on two periods
This is difficult when treatment occurs at different times across states, since there is no longer a uniform before and after, and it is not even defined for control states. This can be corrected for, though
This procedure works well, even when there are a small number of states/groups
Arbitrary variance-covariance matrix
Does quite well in general, although the rejection rate increases above 5% when the number of states/groups is small
Can be implemented in Stata using the cluster option (at the state/group level, not the state-time cell)
-
How much should we trust diff-in-diff estimates?
Main message: There is no one preferred correction mechanism
Collapsing the data into pre- and post-periods produces consistent standard errors, even when the number of states is small (although the power of this procedure declines fast)
Allowing for an arbitrary autocorrelation process is also viable when the number of groups is sufficiently large
Doing good econometrics is not easy! Be very, very careful that all your assumptions are met!
-
Panel data
If we have repeated observations on the same individuals (units, i) then we have longitudinal, or panel, data:

$\{(y_{it}, \mathbf{x}_{it})\}, \quad i = 1, 2, 3, \dots, N; \quad t = 1, 2, \dots, T$

Benefits of panel data: Similar to repeated cross sections; BUT most importantly, we can exploit repeated observations on the same individual in order to control for certain types of unobserved heterogeneity, which otherwise might contaminate OLS estimation
Panel data allow for richer controls for unobserved heterogeneity than just systematic differences between treatment and control.
-
Panel Data
Begin with two periods, for simplicity
Of course, we can do all the same stuff with panel data as with pooled cross sections. However, we can do more,
and we will also have an additional statistical consideration, with the loss of independence across observations
Consider the simple regression model:

$y_{it} = \beta_0 + \beta_1 x_{it} + v_{it}$

Omitted variables bias for $\beta_1$ arises when $corr(x_{it}, v_{it}) \neq 0$

Under certain assumptions, we will be able to exploit panel data in order to fix this bias (unobserved heterogeneity)
-
Panel Data
When $corr(x_{it}, v_{it}) \neq 0$, this violates assumption MLR.4 (zero conditional mean)
Hence, OLS is no longer valid
Under some circumstances we can cope with this problem using panel data. This is another example of when one of the core OLS assumptions fails to hold.
-
Fixed Effects Error Structure
Imagine we can write the error term as:

$v_{it} = \delta_t + a_i + u_{it}$, where
$v_{it}$ → composite error
$\delta_t$ → time effect
$a_i$ → fixed effect
$u_{it}$ → idiosyncratic effect

Furthermore, assume that ALL of the omitted variables bias is due to $a_i$, i.e., the correlation of x with fixed (and unobserved) individual characteristics:

$corr(x_{it}, a_i) \neq 0$
-
First Difference Estimator
Consider the First Differenced (FD) estimator, based on:

$y_{i2} = \beta_0 + \delta_0 + \beta_1 x_{i2} + a_i + u_{i2}$
$y_{i1} = \beta_0 + \beta_1 x_{i1} + a_i + u_{i1}$
$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i$

The key point is that the fixed effects fall out. By assumption, we also require:

$corr(\Delta x_i, \Delta u_i) = 0$

Thus, by differencing we have eliminated the heterogeneity bias.
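A simulated two-period sketch (hypothetical numbers, not from the text) of why differencing helps: x is built to be correlated with the fixed effect $a_i$, so pooled OLS in levels is biased upward, while the FD regression recovers $\beta$.

```python
import numpy as np

# Hypothetical simulation: true beta = 1, but corr(x, a) > 0 biases levels OLS.
rng = np.random.default_rng(3)
n = 20000
a = rng.normal(0.0, 1.0, n)                  # unobserved fixed effect
x1 = a + rng.normal(0.0, 1.0, n)             # period-1 x, correlated with a
x2 = a + rng.normal(0.0, 1.0, n)             # period-2 x, correlated with a
beta, delta0 = 1.0, 0.3                      # slope and period-2 time effect
y1 = beta * x1 + a + rng.normal(0.0, 1.0, n)
y2 = delta0 + beta * x2 + a + rng.normal(0.0, 1.0, n)

# Pooled OLS in levels: the slope absorbs cov(x, a)/var(x) = 0.5 of bias.
X = np.column_stack([np.ones(2 * n), np.concatenate([x1, x2])])
b_pool = np.linalg.lstsq(X, np.concatenate([y1, y2]), rcond=None)[0][1]

# First differences: Delta y = delta0 + beta * Delta x + Delta u,
# and a_i has dropped out.
Xd = np.column_stack([np.ones(n), x2 - x1])
b_fd = np.linalg.lstsq(Xd, y2 - y1, rcond=None)[0][1]
print(b_pool, b_fd)    # roughly 1.5 versus 1.0
```

The size of the levels bias here follows directly from the omitted-variable formula: $cov(x, a)/var(x) = 1/2$ under these simulation settings.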
-
Example: Crime and unemployment
We have data on crime and unemployment rates for 46 cities in 1982 and 1987
Cities are the unit of observation i; 1982 and 1987 are the periods of observation t
We'll try three specifications:

$crmrte_{it} = \beta_0 + \beta_1 unem_{it} + u_{it}, \quad t = 1987$
$crmrte_{it} = \beta_0 + \delta_0 d87_t + \beta_1 unem_{it} + u_{it}, \quad t = 1982, 1987$
$\Delta crmrte_i = \delta_0 + \beta_1 \Delta unem_i + \Delta u_i$
-
Simple Example: Unemployment and crime (standard errors in parentheses)

Data:               1987 (Levels)    1982, 1987 (Levels)   1982, 1987 (FD)
Constant            128.38 (20.76)   93.42 (12.74)         15.40 (4.70)
Unemployment Rate   -4.16 (3.42)     0.427 (1.19)          2.22 (0.88)
Y87                                  7.94 (7.98)
N                   46               92                    46
R-squared           0.033            0.012                 0.127
-
Interpretation
Controlling for unemployment, crime rose between 1982 and 1987 in these cities
Using just cross-sectional data (i.e., only the 1987 data) would suggest that higher unemployment is associated with lower crime rates; this is certainly not what we expect!
Using the first-differencing specification suggests that the partial correlation between crime and unemployment is positive when we control for city fixed effects (i.e., the negative partial correlation we observed in the 1987 cross section was biased)
-
Caveats to the first difference estimator
It may be incorrect to assume that $corr(\Delta x_i, \Delta u_i) = 0$
We need variation in the x's. This means we cannot include variables that do not change over time across observations (e.g., race, country of birth, etc.)
It also means we cannot include variables for which the change would be the same for all observations (e.g., age)
Also, we cannot expect to get precise estimates on variables, such as education, which will tend to change for relatively few observations in a dataset
-
The FD Estimator more generally
The FD estimator provides a powerful strategy for dealing with omitted variables bias when panel data are available.
More generally, we can apply the model to multiple time periods (not just two):

$y_{it} = \beta x_{it} + \sum_{t=2}^{T} \delta_t D_t + a_i + u_{it}$

In which case the FD estimator is based on:

$\Delta y_{it} = \beta \Delta x_{it} + \sum_{t=2}^{T} \delta_t \Delta D_t + \Delta u_{it}$
-
The FD Estimator and Program Evaluation
Of course, we can also use this framework for policy evaluation (difference-in-differences, as before).
The added benefit is that we can control for the unobserved fixed effects at the level of the individual unit.
We do not require the same simple structure as with the pooled cross sections.
But the framework is no panacea, since there may be very good reasons why

$cov(\Delta x_{it}, \Delta u_{it}) \neq 0$

For example, unexplained changes in y (the error term) may be correlated with changes in policy.
-
Additional Considerations
Given that there is a time-series dimension to the FD estimator (and panel data more generally), we may need to account for serial correlation.
In addition, we may need to deal with heteroskedasticity.
While there are GLS (serial correlation) procedures available, the easiest solution would be to use a Newey-West variance-covariance matrix.
-
Example: County Crime Rates (NC)
Panel of North Carolina counties, 1981-1987. How do various law enforcement variables affect the crime rate?
Base specification includes (in logs):
Probability of arrest; probability of conviction (conditional on arrest); probability of prison (conditional on conviction); average sentence (conditional on prison); police per capita
Covariates: Region, urban, pop density, tax revenues; year effects
Estimated in levels and FD (ignoring serial correlation, etc.)
-
Example: County crime rates in North Carolina, 1981-1987.
                   Pooled Cross Sections   First Differencing
log(prbarr)        -0.720 (0.037)          -0.327 (0.030)
log(prbconv)       -0.546 (0.026)          -0.238 (0.018)
log(prbpris)        0.248 (0.067)          -0.165 (0.026)
log(avgsen)        -0.087 (0.058)          -0.022 (0.022)
log(polpc)          0.366 (0.030)           0.398 (0.027)
Year effects        Yes                     Yes
No. observations    630                     540
R-squared           0.57                    0.43
-
Interpretation
Consider the impact of the probability of being arrested: The first-differencing estimates suggest that we were overestimating the negative impact on the crime rate (i.e., increasing the probability of arrest has less of an impact once you remove county fixed effects)
-
Potential pitfalls
It can be worse than pooled OLS if one or more of the explanatory variables is subject to measurement error
Differencing a poorly measured regressor reduces its variation relative to its correlation with the differenced error (see Wooldridge, 2002, Chapter 11 for more details)
This could be a problem with explanatory variables from household or firm surveys, especially ones in developing countries
-
Differencing with More Than Two Time Periods
More on the error structure: When doing FD estimation with more than two time periods, we must assume that $\Delta u_{it}$ is uncorrelated over time (no serial correlation)
This assumption is sometimes reasonable, but it will not hold if we instead assume that the $u_{it}$ are uncorrelated over time:
If the $u_{it}$ are serially uncorrelated with constant variance, then $\Delta u_{it}$ and $\Delta u_{i,t-1}$ are negatively correlated (correlation = -0.5)
If $u_{it}$ follows a stable AR(1) process, then $\Delta u_{it}$ will be serially correlated
Only when $u_{it}$ follows a random walk will $\Delta u_{it}$ be serially uncorrelated
-
Differencing with More Than Two Time Periods
Testing for serial correlation in the first-differenced equation:
First, we estimate our first-differenced equation and obtain the residuals $\hat{r}_{it} = \widehat{\Delta u}_{it}$
Then run a simple pooled OLS regression of the residual on the lagged residual for t = 3,…,T, i = 1,…,N, and compute a standard t test for the coefficient on the lagged residual
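The mechanics can be sketched with simulated errors (hypothetical numbers; for brevity this uses the true $\Delta u_{it}$ in place of the FD residuals). When $u_{it}$ is i.i.d., regressing $\Delta u_{it}$ on $\Delta u_{i,t-1}$ should give a slope near $-0.5$, the negative correlation induced by differencing that was noted above.

```python
import numpy as np

# Hypothetical simulation: serially uncorrelated u_it, N units, T periods.
rng = np.random.default_rng(4)
N, T = 500, 6
u = rng.normal(0.0, 1.0, (N, T))      # i.i.d. idiosyncratic errors
du = np.diff(u, axis=1)               # Delta u_it for t = 2, ..., T

# Pool the (Delta u_it, Delta u_i,t-1) pairs over i and t = 3, ..., T and
# run OLS through the origin; the slope estimates the correlation of -0.5.
r = du[:, 1:].ravel()
r_lag = du[:, :-1].ravel()
rho = (r_lag @ r) / (r_lag @ r_lag)
print(rho)    # close to -0.5
```

A slope near zero, by contrast, would be consistent with $u_{it}$ following a random walk, in which case $\Delta u_{it}$ is serially uncorrelated.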
-
Differencing with More Than Two Time Periods
Correcting for serial correlation: In the presence of AR(1) serial correlation we can use the Prais-Winsten FGLS estimator
The Cochrane-Orcutt procedure is less preferred, since we lose N observations by dropping the first time period
However, standard PW procedures will treat the observations as if they followed an AR(1) process over both i and t, which makes no sense in this situation since we have assumed independence across i
A detailed treatment of how to do this can be found in Wooldridge (2002)
-
Assumptions for Pooled OLS Using First Differences
Assumption FD.1: For each i, the model is

$y_{it} = \beta_1 x_{it1} + \dots + \beta_k x_{itk} + a_i + u_{it}, \quad t = 1, \dots, T$

where the $\beta_j$ are the parameters to be estimated and $a_i$ is the unobserved effect.
Assumption FD.2: We have a random sample from the cross section.
Assumption FD.3: Each explanatory variable changes over time (for at least some i), and no perfect linear relationships exist among the explanatory variables.
-
Assumptions for Pooled OLS Using First Differences
Assumption FD.4: For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: $E(u_{it} \mid X_i, a_i) = 0$
As stated, this assumption is stronger than is necessary for consistency (it suffices that $\Delta u_{it}$ is uncorrelated with $\Delta x_{itj}$ for all j = 1,…,k and for all t = 2,…,T).
Under assumptions FD.1 through FD.4, the first-difference estimator is unbiased.
-
Assumptions for Pooled OLS Using First Differences
Assumption FD.5: The variance of the differenced errors, conditional on all explanatory variables, is constant (i.e., homoskedastic): $var(\Delta u_{it} \mid X_i) = \sigma^2$, t = 2,…,T.
Assumption FD.6: For all $t \neq s$, the differences in the idiosyncratic errors are uncorrelated (conditional on all explanatory variables): $cov(\Delta u_{it}, \Delta u_{is} \mid X_i) = 0$, $t \neq s$.
Under assumptions FD.1 through FD.6, the FD estimator of $\beta_j$ is the best linear unbiased estimator (conditional on the explanatory variables).
-
Comparison of assumptions with standard OLS
Notice the strong similarities between the first differencing assumptions (FD) and those for standard OLS (MLR):
MLR.1 and FD.1 are basically the same, except we've now added repeated observations and an unobserved effect for each cross-sectional observation
MLR.2 and FD.2 are the same
MLR.3 and FD.3 are the same, except we've added the condition that there has to be at least some time variation for each of the explanatory variables
MLR.4 and FD.4 are the same, except that the condition is across all time periods (clearly FD.4 is the same as MLR.4 if T = 1)
-
Comparison of assumptions with standard OLS
FD.5 is the same as MLR.5: homoskedasticity, but of the differenced error terms
FD.6 is new. It assumes that there is no correlation over time of the differenced error terms (clearly this was not an issue when T = 1)
But we had a no-serial-correlation assumption in time series models
-
Practice questions
In-chapter questions: 13.1, 13.3, 13.4, 13.5
End-of-chapter questions: C13.2, C13.7, C13.11 (i-iv)