
  • Introduction to Pooled Cross Sections

    EMET 8002, Lecture 5

    August 13, 2009

  • Administrative Matters

    Consultation hours: Tuesdays, 3 pm to 5 pm

    I'm going to be away again next Tuesday, so I'll hold consultation hours next week on Wednesday, August 19th, from 3 pm to 5 pm

    Case Studies projects have now been assigned. If you did not receive the email that I sent on Tuesday you can view the assignments on the course website

    If you have not already done so, please contact your supervisor immediately. A few of the projects will require you to apply for data access, which should be done immediately!

  • Outline

    Introduce regression analysis with pooled cross sections (Chapter 13 in the text)

    Potential pitfalls of the difference-in-differences estimation strategy

    Introduction to two-period panel data and the first difference estimator

  • Pooling Independent Cross Sections across Time

    What is it? A pooled cross section is obtained by sampling randomly from a large population at two or more points in time. For example, randomly sampling households from ACT residents in 2000 and 2008.

  • A Spreadsheet view

        Pooled cross sections        Panel
        Unit    Period               Unit    Period
        1       1                    1       1
        2       1                    2       1
        3       1                    3       1
        4       2                    1       2
        5       2                    2       2
        6       2                    3       2

    (In a panel, the same units are re-observed in period 2; in pooled cross sections, period 2 brings a fresh set of units.)

  • Formal notation

    We can denote the pooled cross section as a random sample:

        { (y_it, x_1it, x_2it, ..., x_kit) },
            i = 1, 2, ..., N_1, N_1 + 1, ..., N_1 + N_2, ..., N_1 + ... + N_T
            t = 1, 2, ..., T

    where observations i = 1, ..., N_1 come from period 1, the next N_2 from period 2, and so on.

  • Pooling Independent Cross Sections across Time

    Consider the regression model:

        y_it = β_0 + β_1 x_1it + β_2 x_2it + ... + β_k x_kit + u_it,
            i = 1, 2, ..., N_1 + N_2 + ... + N_T;   t = 1, 2, ..., T

    Benefits: Pooling can lead to larger sample sizes, which gives more precise estimators and test statistics with more power. However, this is only true if the relationship between the dependent variable and at least some of the explanatory variables remains constant over time.

    If the x's are changing over time, pooling can also provide additional variation in x with which to estimate its effect on y.

    Note: the error term may have the structure

        u_it = θ_t + ε_it

    For now, we'll make the following assumptions:

        θ_t | X ~ (0, σ_θ²)   and   ε_it | X ~ (0, σ_ε²)

    and that the two components of the error term are independent.

  • Pooling Independent Cross Sections across Time

    Suppose the true error structure does include a year component, but we ignore that and run OLS on the following equation:

        y_it = β_0 + β_1 x_1it + β_2 x_2it + ... + β_k x_kit + u_it

    Does it matter? Yes! This introduces serial correlation between observations within the same time period. For i ≠ j:

        E(u_it u_jt | X) = E[(θ_t + ε_it)(θ_t + ε_jt) | X] = E(θ_t² | X) = σ_θ² ≠ 0
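    The within-period correlation above can be illustrated with a small simulation, a sketch with made-up variances σ_θ² = σ_ε² = 1, so the implied correlation between two units' errors in the same year is σ_θ²/(σ_θ² + σ_ε²) = 0.5:

```python
import random

random.seed(1)
n_periods = 5_000

# u_it = theta_t + eps_it: a common year shock plus an idiosyncratic part.
# For each year, draw the errors of two distinct units i and j.
pairs = []
for _ in range(n_periods):
    theta = random.gauss(0, 1)                      # year effect, sigma_theta = 1
    pairs.append((theta + random.gauss(0, 1),       # u_it, sigma_eps = 1
                  theta + random.gauss(0, 1)))      # u_jt

# Sample correlation across the same-year pairs
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]
n = len(pairs)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
vx = sum((a - mx) ** 2 for a in x) / n
vy = sum((b - my) ** 2 for b in y) / n
corr = cov / (vx * vy) ** 0.5
print(round(corr, 2))  # close to 0.5, not 0
```

    The shared year shock θ_t makes same-year errors correlated even though the idiosyncratic parts are independent.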

  • Pooling Independent Cross Sections across Time

    This violates one of our assumptions for OLS! The OLS coefficient estimates are still unbiased and consistent. However, the variance-covariance matrix is biased/inconsistent, which leads to incorrect standard errors and incorrect inference.

    This is similar to the problem of serial correlation in time-series models, which we saw previously.

    Thus, it makes sense to include time dummies (a.k.a. year effects):

        y_it = β_0 + β_1 x_1it + β_2 x_2it + ... + β_k x_kit + Σ_{t=2}^{T} δ_t d_t + u_it

  • Interpretation of the year effects

    How do we interpret the year dummies?

    In words, each time dummy coefficient δ_j is the difference in the conditional expected value of y between the base year (t = 1) and the year t = j:

        E(y_it | t = 1, x) = β_0 + β_1 x_1it + ... + β_k x_kit

        E(y_it | t = j, x) = β_0 + δ_j + β_1 x_1it + ... + β_k x_kit

        δ_j = E(y_it | t = j, x) - E(y_it | t = 1, x)

  • Pooling Independent Cross Sections across Time

    Further benefits: We can also explore changes in the coefficients over time. This amounts to allowing some or all of the β's to have t-subscripts.

    In a two-period pooled cross section dataset with one explanatory variable:

        y_it = β_0 + δ_0 D2_t + β_1 x_it + δ_1 (D2_t × x_it) + u_it

    where D2_t is a dummy for the second period.

    We can then use F-tests (including Chow tests) to test for changes in the regression model over time.

    Note: While changes in the coefficients may be interesting, one has to be very cautious in interpreting the source of the changes (e.g., as the impact of a policy or changing economic structure).

  • Example 13.1: Women's Fertility over Time

    Dependent variable is the number of children born to a woman.

    Explanatory variables include socio-demographic characteristics.

    Pooled time series of cross-sections (GSS: 1972, 1974, 1976, 1978, 1980, 1982, 1984).

    N = 1,129. Dataset: FERTIL1.RAW.

    One question of interest is: after controlling for other observable factors, what happened to the fertility rate over time?

  • Example 13.1: Women's Fertility over Time

    In Stata:
        sort year
        by year: summarize kids

    Mean number of children by survey year:

        Year             1972    1974    1976    1978    1980    1982    1984
        Mean             3.026   3.208   2.803   2.804   2.817   2.403   2.237
        Year t - 1972      --    0.182  -0.223  -0.222  -0.209  -0.623  -0.789
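    As a quick arithmetic check, the "Year t - 1972" row is just each year's mean minus the 1972 base-year mean. A minimal sketch in Python, with the year/mean pairs copied from the table:

```python
# Mean number of children by survey year (from the table above)
means = {1972: 3.026, 1974: 3.208, 1976: 2.803, 1978: 2.804,
         1980: 2.817, 1982: 2.403, 1984: 2.237}

# Raw difference of each year's mean from the 1972 base year
diffs = {yr: round(m - means[1972], 3) for yr, m in means.items() if yr != 1972}
print(diffs)  # {1974: 0.182, 1976: -0.223, ..., 1984: -0.789}
```

    Note these raw differences are not the same as the regression-adjusted year dummies on the next slide, which hold the other covariates fixed.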

  • Example 13.1: Women's Fertility over Time

    In Stata:
        reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84

        kids      |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
        ----------+------------------------------------------------------------
        educ      | -.1284268    .0183486   -7.00   0.000   -.1644286   -.092425
        age       |  .5321346    .1383863    3.85   0.000    .2606065   .8036626
        agesq     |  -.005804    .0015643   -3.71   0.000   -.0088733  -.0027347
        black     |  1.075658    .1735356    6.20   0.000    .7351631   1.416152
        east      |   .217324    .1327878    1.64   0.102   -.0432192   .4778672
        northcen  |   .363114    .1208969    3.00   0.003     .125902   .6003261
        west      |  .1976032    .1669134    1.18   0.237   -.1298978   .5251041
        farm      | -.0525575      .14719   -0.36   0.721   -.3413592   .2362443
        othrural  | -.1628537     .175442   -0.93   0.353   -.5070887   .1813814
        town      |  .0843532     .124531    0.68   0.498   -.1599893   .3286957
        smcity    |  .2118791     .160296    1.32   0.187   -.1026379   .5263961
        y74       |  .2681825     .172716    1.55   0.121   -.0707039   .6070689
        y76       | -.0973795    .1790456   -0.54   0.587    -.448685   .2539261
        y78       | -.0686665    .1816837   -0.38   0.706   -.4251483   .2878154
        y80       | -.0713053    .1827707   -0.39   0.697     -.42992   .2873093
        y82       | -.5224842    .1724361   -3.03   0.003   -.8608214    -.184147
        y84       | -.5451661    .1745162   -3.12   0.002   -.8875846  -.2027477
        _cons     | -7.742457    3.051767   -2.54   0.011   -13.73033  -1.754579

  • Example 13.1: Women's Fertility over Time

    There may be heteroskedasticity in the previous model. This could be related to the observed characteristics, or it could simply be that the error variance is changing over time.

    Nonetheless, the usual heteroskedasticity-robust standard errors and t statistics are still valid. Just use the robust option with the regress command in Stata.

  • Allowing the effect to change across periods

    We can also interact year dummy variables with key explanatory variables to see if the effect of that variable changed over time

  • Example 13.2: Changes in the returns to education and the gender wage gap

    Consider the following regression model pooled over the years 1978 and 1985:

        log(wage) = β_0 + δ_0 y85 + β_1 educ + δ_1 (y85 × educ) + β_2 exper
                    + β_3 exper² + β_4 union + β_5 female + δ_5 (y85 × female) + u

    The dataset is CPS78_85.RAW. In Stata:

        reg lwage y85 educ y85educ exper expersq union female y85fem

  • Example 13.2: Changes in the returns to education and the gender wage gap

        lwage     |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
        ----------+------------------------------------------------------------
        y85       |  .1178062    .1237817    0.95   0.341    -.125075   .3606874
        educ      |  .0747209    .0066764   11.19   0.000    .0616206   .0878212
        y85educ   |  .0184605    .0093542    1.97   0.049     .000106    .036815
        exper     |  .0295843    .0035673    8.29   0.000    .0225846    .036584
        expersq   | -.0003994    .0000775   -5.15   0.000   -.0005516  -.0002473
        union     |  .2021319    .0302945    6.67   0.000    .1426888   .2615749
        female    | -.3167086    .0366215   -8.65   0.000   -.3885663    -.244851
        y85fem    |   .085052     .051309    1.66   0.098   -.0156251    .185729
        _cons     |  .4589329    .0934485    4.91   0.000    .2755707    .642295

  • Chow test for structural change across time

    We can apply the Chow test to see if a multiple regression function differs across two time periods

    We can do this in pooled cross sections by interacting all explanatory variables with time dummies and performing an F-test of the null hypothesis that the interaction coefficients are jointly zero. Usually, we allow the intercept to change over time and only test whether the slope parameters have changed.

  • Policy Analysis with Pooled Cross Sections

    This type of data can be useful in identifying the impacts of policies (or government programs) on various outcomes

    They are especially helpful if the policy experiment has before & after and treatment & control dimensions

    Consider a simple example: We wish to estimate the impact of participating in a government program (i.e., the treatment) on an outcome, y. Let participation in the program be captured by the dummy variable:

        D_i = 1 if affected ("treatment")
            = 0 if unaffected ("control")

  • A simple estimate of the treatment effect

    One estimator of the treatment effect is given by the difference of means:

        ȳ_T - ȳ_C

    In a regression context, we could estimate this difference by:

        y_i = α + β D_i + u_i

    Such a regression would work if the error term has the same conditional mean in both groups:

        E(y_i | D_i = 0) = α + E(u_i | D_i = 0)
        E(y_i | D_i = 1) = α + β + E(u_i | D_i = 1)

        E(y_i | D_i = 1) - E(y_i | D_i = 0) = β   iff   E(u_i | D_i = 1) = E(u_i | D_i = 0)
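    A minimal numerical sketch of this estimator, on purely illustrative made-up data: with a binary treatment dummy, the OLS slope coincides with the difference of group means.

```python
# Hypothetical outcomes for control (D=0) and treatment (D=1) groups
y = [3.0, 4.0, 5.0, 6.0, 7.0, 9.0]
D = [0,   0,   0,   1,   1,   1]

y_treat = [yi for yi, di in zip(y, D) if di == 1]
y_ctrl  = [yi for yi, di in zip(y, D) if di == 0]

# Difference-of-means estimator of the treatment effect
diff_means = sum(y_treat) / len(y_treat) - sum(y_ctrl) / len(y_ctrl)

# OLS slope of y on D: beta_hat = cov(D, y) / var(D)
n = len(y)
Dbar, ybar = sum(D) / n, sum(y) / n
beta_hat = sum((d - Dbar) * (yi - ybar) for d, yi in zip(D, y)) / \
           sum((d - Dbar) ** 2 for d in D)

print(diff_means, beta_hat)  # the two estimates coincide
```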

  • A simple estimate of the treatment effect

    The difference in means between the treatment and control groups and the OLS estimate of β are consistent estimates of the treatment effect ONLY if there are no other differences between the treatment and control groups

    Sometimes we can add covariates (Xs) to help control for differences between the treatment and control group

    Nonetheless, this is often an impractical assumption to make

    However, we may be able to use time variation in the application of the program (before & after) combined with variation in treatment

  • Empirical example: The effect of building an incinerator on house prices

    The hypothesis that we are interested in testing is that the announcement of the pending construction of an incinerator would cause the prices of houses located nearby to fall, relative to houses further away.

    A house is considered to be close if it is within 3 miles of the incinerator.

    We have data on house prices for houses that sold in 1978, before the announcement of the incinerator, and in 1981, after the announcement.

    We begin by regressing the real house price on a dummy variable for whether the house is close to the incinerator, using the 1981 data from the dataset KIELMC.RAW

  • Empirical example: The effect of building an incinerator on house prices

    Coefficients, with standard errors in parentheses and t statistics in brackets:

                                1981                  1978                  1978 & 1981
    Near Incinerator            -30,688               -18,824               -18,824
                                (5,828) [-5.27]       (4,745) [-3.97]       (4,875) [-3.86]
    Year 1981                                                               18,790
                                                                            (4,050) [4.64]
    Near Incinerator × 1981                                                 -11,864
                                                                            (7,457) [-1.59]
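    The interaction coefficient in the pooled column is exactly the difference-in-differences of the two cross-sectional price gaps. A quick arithmetic check in Python, using the numbers from the table:

```python
# Near-incinerator price gap in each cross section (from the table)
gap_1981 = -30_688   # after the announcement
gap_1978 = -18_824   # before the announcement

# Difference-in-differences: change in the gap after the announcement
did = gap_1981 - gap_1978
print(did)  # -11864, matching the Near Incinerator x 1981 coefficient
```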

  • The Difference-in-Differences Model

    Consider the following simple example where we allow:

        E[u_i | D_i = 1] ≠ E[u_i | D_i = 0]

    The model is:

        y_it = α + β D_it + γ_t + u_it

    Suppose that the differences between treatment and control groups can be written:

        E[u_it | D_i1 = 0, D_i2 = 0] = η_C
        E[u_it | D_i1 = 0, D_i2 = 1] = η_T

    Also assume that the time effects can be written (normalized) as:

        γ_t = 0,   t = 1
        γ_t = γ,   t = 2

  • The Difference-in-Differences Model

    The expected outcomes in the before period (period 1) are:

        E[y_i1 | D_i1 = 0, D_i2 = 0] = α + η_C
        E[y_i1 | D_i1 = 0, D_i2 = 1] = α + η_T

    In the after period (period 2):

        E[y_i2 | D_i1 = 0, D_i2 = 0] = α + η_C + γ
        E[y_i2 | D_i1 = 0, D_i2 = 1] = α + η_T + γ + β

    An estimate of β can then be recovered by comparing:

        (E[y_i2 | D_i1 = 0, D_i2 = 1] - E[y_i1 | D_i1 = 0, D_i2 = 1])
      - (E[y_i2 | D_i1 = 0, D_i2 = 0] - E[y_i1 | D_i1 = 0, D_i2 = 0]) = β

  • The Difference-in-Differences Model

    The difference-in-differences estimator would then be based on:

        ȳ_{T,2} - ȳ_{T,1} → γ + β
        ȳ_{C,2} - ȳ_{C,1} → γ

    Or, alternatively,

        ȳ_{T,2} - ȳ_{C,2} → (η_T - η_C) + β
        ȳ_{T,1} - ȳ_{C,1} → (η_T - η_C)

    In a regression framework, we would estimate this as:

        y_it = α + γ AFTER_it + δ D_it + β (D_it × AFTER_it) + u_it
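    A minimal numerical sketch, with purely illustrative group-by-period means, showing that the two ways of forming the estimator give the same number:

```python
# Hypothetical group-by-period mean outcomes
y_T1, y_T2 = 10.0, 15.0   # treatment group: before, after
y_C1, y_C2 = 8.0, 11.0    # control group: before, after

# Difference of within-group changes over time
did_time = (y_T2 - y_T1) - (y_C2 - y_C1)

# Difference of between-group gaps in each period
did_group = (y_T2 - y_C2) - (y_T1 - y_C1)

print(did_time, did_group)  # identical: the order of differencing doesn't matter
```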

  • The Difference-in-Differences Model

    Of course, we could also add covariates. In this specification, we denote:

        γ → common time effect
        δ → permanent differences (across T, C)
        β → treatment effect (diff-in-diff)

  • Difference-in-difference

    The key distinction between the difference-in-differences estimator and the difference in means is that we have relaxed the assumption about the distribution of the error terms across the treatment and control groups.

    We no longer require the conditional expectation of the error term to be equal across groups; we only require the conditional expectation for each group to be constant over time.

    This may still be a strong assumption! You need to think about the validity of making this assumption.

  • Another Example: The Effect of Workers' Comp. on Injury Duration (Kentucky)

    Notes: Dependent variable is log duration of workers' comp benefits. Controls include: age, sex, married, whether a hospital stay was required, indicators for the type of injury, and industry of job. "After" corresponds to the increase in the cap on weekly WC benefits. Standard errors in parentheses.

    Data:                After     After     Before and After   Before and After
    High Earner          0.462     0.274     0.233              0.115
                         (0.051)   (0.054)   (0.049)            (0.048)
    After                                    0.014              0.047
                                             (0.045)            (0.041)
    After*HighEarner                         0.229              0.175
                                             (0.070)            (0.064)
    Controls             No        Yes       No                 Yes
    R-squared            0.031     0.214     0.022              0.188
    Sample size          2567      2567      5347               5347

  • Examples of difference-in-difference

    For a good example of a paper using this strategy, see Duflo, Esther. (2001). Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment. American Economic Review. Vol. 91, No. 4, pp. 795-813.

    For a good example of when the impact of the policy/program might have spillover effects on the control group, see Miguel, Edward and Michael Kremer. (2004). Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica. Vol. 72, No. 1, pp. 159-217.

  • How much should we trust diff-in-diff estimates?

    The following discussion is based on an excellent paper by Bertrand, Duflo, and Mullainathan (2004) in the Quarterly Journal of Economics

    Many papers that employ difference-in-differences estimators use many years of data and focus on serially correlated outcomes, but ignore that the resulting standard errors are inconsistent

    Diff-in-diff estimates are usually based on estimating an equation of the form:

        Y_ist = A_s + B_t + c X_ist + β I_st + ε_ist

    where i denotes individuals, s denotes a state or group membership, and t denotes the time period

  • How much should we trust diff-in-diff estimates?

    An important point, which we will address later in the course, is possible correlation of the error terms across individuals within a state/group in a given year.

    We are going to ignore this potential problem for now and assume that the econometricians have appropriately dealt with correlation within state-year cells. Hence, let's think of the data as being averaged over individuals within a state in each given year.

    Three factors make serial correlation an important issue in the difference-in-differences context:

        Estimation often relies on long time series

        The most commonly used dependent variables are typically highly serially correlated

        The treatment variable, I_st, changes very little within a state over time

  • How much should we trust diff-in-diff estimates?

    How severe is the problem? They examine how diff-in-diff performs on placebo laws, where treated states (in the U.S.) are chosen at random, as is the year of passage of the placebo law.

    Since the laws are fictitious, a significant effect should only be found 5% of the time (i.e., the true null hypothesis of no effect is falsely rejected 5% of the time).

    They use wages as the dependent variable over 21 years. They find rejection rates of the null hypothesis as high as 45%! In other words, there is statistical evidence that these fake laws affected wages in close to half of the simulations.

  • How much should we trust diff-in-diff estimates?

    Does this matter practically? They find 92 diff-in-diff papers published between 1990 and 2000 in the following journals: the American Economic Review, the Industrial and Labor Relations Review, the Journal of Labor Economics, the Journal of Political Economy, the Journal of Public Economics, and the Quarterly Journal of Economics.

    69 of these papers have more than 2 time periods. Only 4 papers collapse the data into before-after. Thus, 65 papers have a potential serial correlation problem.

    Only 5 provide a serial correlation correction.

  • How much should we trust diff-in-diff estimates?

    Some results: When the treatment variable is not serially correlated, rejection rates of H0 (no effect) are close to 5%. The over-rejection problem diminishes as the serial correlation in the dependent variable falls.

  • How much should we trust diff-in-diff estimates?

    Solutions: Parametric methods

        Specify an autocorrelation structure for the error term, estimate its parameters, and use these parameters to compute standard errors. This does not do a very good job of remedying the problem:

            With short time series, the OLS estimate of the autocorrelation parameter is downward biased

            The autocorrelation structure may be incorrectly specified

    Block bootstrap

        Bootstrapping is an advanced technique

        It does poorly when the number of states/groups becomes small

  • How much should we trust diff-in-diff estimates?

    Solutions (continued): Ignore time series information: average the before and after data and estimate the model on two periods

        This is difficult when treatment occurs at different times across states, since there is no longer a uniform before and after, and it is not even defined for control states. This can be corrected for, though.

        This procedure works well, even when there are a small number of states/groups

    Arbitrary variance-covariance matrix

        Does quite well in general, although the rejection rate increases above 5% when the number of states/groups is small

        Can be implemented in Stata using the cluster option (at the state/group level, not the state-time cell)

  • How much should we trust diff-in-diff estimates?

    Main message: There is not one preferred correction mechanism.

    Collapsing the data into pre- and post-periods produces consistent standard errors, even when the number of states is small (although the power of this procedure declines fast).

    Allowing for an arbitrary autocorrelation process is also viable when the number of groups is sufficiently large.

    Doing good econometrics is not easy! Be very, very careful that all your assumptions are met!

  • Panel data

    If we have repeated observations on the same individuals (units, i), then we have longitudinal, or panel, data:

        { (y_it, x_it) },   i = 1, 2, 3, ..., N;   t = 1, 2, ..., T

    Benefits of panel data: Similar to repeated cross-sections, BUT, most importantly, we can exploit repeated observations on the same individual in order to control for certain types of unobserved heterogeneity, which otherwise might contaminate OLS estimation.

    Panel data allows for richer controls for unobserved heterogeneity than just systematic differences between treatment and control.

  • Panel Data

    Begin with two periods, for simplicity. Of course, we can do all the same stuff with panel data as with pooled cross-sections. However, we can do more, and we will also have an additional statistical consideration, with the loss of independence across observations.

    Consider the simple regression model:

        y_it = β_0 + β_1 x_it + v_it

    Omitted variables bias for β_1 arises when:

        corr(x_it, v_it) ≠ 0

    Under certain assumptions, we will be able to exploit panel data in order to fix this bias (unobserved heterogeneity).

  • Panel Data

    When corr(x_it, v_it) ≠ 0, this violates assumption MLR.4 (and its weaker zero-correlation version, MLR.4′)

    Hence, OLS is no longer valid

    Under some circumstances we can cope with this problem using panel data. This is another example of when one of the core OLS assumptions fails to hold.

  • Fixed Effects Error Structure

    Imagine we can write the error term as:

        v_it = θ_t + a_i + u_it

        v_it → composite error
        θ_t  → time effect
        a_i  → fixed effect
        u_it → idiosyncratic effect

    Furthermore, assume that ALL of the omitted variables bias was due to:

        corr(x_it, a_i) ≠ 0

    i.e., the correlation of x with fixed (and unobserved) individual characteristics.

  • First Difference Estimator

    Consider the First Differenced (FD) estimator, based on:

        y_i2 = β_0 + δ_0 + β_1 x_i2 + a_i + u_i2     (t = 2)
        y_i1 = β_0 + β_1 x_i1 + a_i + u_i1           (t = 1)

        Δy_i = δ_0 + β_1 Δx_i + Δu_i

    The key point is that the fixed effects fall out. By assumption, we also require:

        corr(Δx_i, Δu_i) = 0

    Thus, by differencing we have eliminated the heterogeneity bias.
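    A minimal two-period sketch of first differencing on made-up data: the fixed effect a_i is deliberately correlated with x, so a levels regression would be contaminated, but the FD slope recovers the true β (the idiosyncratic errors are set to zero for clarity).

```python
# Two-period panel: y_it = beta * x_it + a_i, with a_i correlated with x.
# All numbers are made up for illustration.
beta = 2.0
a  = [0.0, 1.0, 2.0, 3.0]          # unit fixed effects
x1 = [1.0, 2.0, 3.0, 4.0]          # x in period 1 (rises with a_i)
x2 = [2.0, 2.5, 4.0, 6.0]          # x in period 2
y1 = [beta * x + ai for x, ai in zip(x1, a)]
y2 = [beta * x + ai for x, ai in zip(x2, a)]

# First differences: the fixed effect a_i cancels out
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]

# OLS slope of dy on dx
n = len(dx)
dxbar, dybar = sum(dx) / n, sum(dy) / n
beta_fd = sum((d - dxbar) * (e - dybar) for d, e in zip(dx, dy)) / \
          sum((d - dxbar) ** 2 for d in dx)
print(beta_fd)  # recovers the true beta = 2.0 exactly
```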

  • Example: Crime and unemployment

    We have data on crime and unemployment rates for 46 cities in 1982 and 1987 Cities are the unit of observation i 1982 and 1987 are the period of observation t

    Well try three specifications:

    0 1

    0 0 1

    0 1

    198787 1982,1987

    it it it

    it t it it

    i i i

    crmrte unem u tcrmrte d unem u tcrmrte unem u

    = + + == + + + =

    = + +

  • Simple Example: Unemployment andcrime (standard errors in parentheses)

    0.1270.0120.033R-squared

    469246N

    7.94(7.98)

    Y87

    2.22(0.88)

    0.427(1.19)

    -4.16(3.42)

    UnemploymentRate

    15.40(4.70)

    93.42(12.74)

    128.38(20.76)

    Constant

    1982,1987(FD)

    1982,1987(Levels)

    1987(Levels)

    Data:

  • Interpretation

    Controlling for unemployment, crime has risen between 1982 and 1987 in these cities

    Using just cross-sectional data (i.e., only the 1987 data) would suggest that higher unemployment is associated with lower crime rates, which is certainly not what we expect!

    Using the first-differencing specification suggests that the partial correlation between crime and unemployment is positive when we control for city fixed effects (i.e., the negative partial correlation we observed in the 1987 cross section was biased)

  • Caveats to the first difference estimator

    It may be incorrect to assume that corr(Δx_i, Δu_i) = 0

    We need variation in the x's. This means we cannot include variables that do not change over time across observations (e.g., race, country of birth, etc.).

    It also means we cannot include variables for which the change would be the same for all observations (e.g., age).

    Also, we cannot expect to get precise estimates on variables, such as education, which will tend to change for relatively few observations in a dataset.

  • The FD Estimator more generally

    The FD estimator provides a powerful strategy for dealing with omitted variables bias when panel data are available.

    More generally, we can apply the model to multiple time periods (not just two):

        y_it = β x_it + Σ_{t=2}^{T} δ_t D_t + a_i + u_it

    In which case the FD estimator is based on:

        Δy_it = β Δx_it + Σ_{t=2}^{T} δ_t ΔD_t + Δu_it

  • The FD Estimator and Program Evaluation

    Of course, we can also use this framework for policy evaluation (difference-in-differences, as before).

    The added benefit is that we can control for unobserved fixed effects at the level of the individual unit.

    We do not require the same simple error structure as with the pooled cross-sections.

    But the framework is no panacea, since there may be very good reasons why

        cov(Δx_it, Δu_it) ≠ 0

    For example, unexplained changes in y (the error term) may be correlated with changes in policy.

  • Additional Considerations

    Given that there is a time-series dimension to the FD estimator (and panel data more generally), we may need to account for serial correlation.

    In addition, we may need to deal with heteroskedasticity. While there are GLS (serial correlation) procedures available, the easiest solution would be to use the Newey-West variance-covariance matrix.

  • Example: County Crime Rates (NC)

    Panel of North Carolina counties, 1981-1987. How do various law enforcement variables affect the crime rate?

    Base specification includes (in logs):

        Probability of arrest
        Probability of conviction (conditional on arrest)
        Probability of prison (conditional on conviction)
        Average sentence (conditional on prison)
        Police per capita

    Covariates: region, urban, population density, tax revenues, year effects

    Estimated in levels and FD (ignoring serial correlation, etc.)

  • Example: County crime rates in North Carolina, 1981-1987.

                       Pooled Cross Sections   First Differencing
    log(prbarr)        -0.720 (0.037)          -0.327 (0.030)
    log(prbconv)       -0.546 (0.026)          -0.238 (0.018)
    log(prbpris)        0.248 (0.067)          -0.165 (0.026)
    log(avgsen)        -0.087 (0.058)          -0.022 (0.022)
    log(polpc)          0.366 (0.030)           0.398 (0.027)
    Year effects        Yes                     Yes
    No. observations    630                     540
    R-squared           0.57                    0.43

  • Interpretation

    Consider the impact of the probability of being arrested: the first-differencing estimates suggest that we were overestimating the negative impact on the crime rate (i.e., increasing the probability of arrest has less of an impact once you remove county fixed effects)

  • Potential pitfalls

    It can be worse than pooled OLS if one or more of the explanatory variables is subject to measurement error. Differencing a poorly measured regressor reduces its variation relative to its correlation with the differenced error (see Wooldridge, 2002, Chapter 11 for more details).

    This could be a problem with explanatory variables from household or firm surveys, especially ones in developing countries

  • Differencing with More Than Two Time Periods

    More on the error structure: When doing FD estimation with more than two time periods, we must assume that Δu_it is uncorrelated over time (no serial correlation). This assumption is sometimes reasonable, but it will not hold if the u_it themselves are uncorrelated over time:

        If the u_it are serially uncorrelated with constant variance, then Δu_it and Δu_i,t-1 are negatively correlated (correlation = -0.5)

        If u_it follows a stable AR(1) process, then Δu_it will be serially correlated

        Only when u_it follows a random walk will Δu_it be serially uncorrelated
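    A small simulation sketch of the first bullet: with i.i.d. u_it, successive differences Δu_t = u_t - u_{t-1} share the term u_{t-1} with opposite signs, so corr(Δu_t, Δu_{t-1}) ≈ -0.5.

```python
import random

random.seed(0)
T = 100_000
u = [random.gauss(0, 1) for _ in range(T)]       # i.i.d. idiosyncratic errors
du = [u[t] - u[t - 1] for t in range(1, T)]      # first differences

# Sample correlation between du_t and its own lag du_{t-1}
x, y = du[1:], du[:-1]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
vx = sum((a - mx) ** 2 for a in x) / n
vy = sum((b - my) ** 2 for b in y) / n
corr = cov / (vx * vy) ** 0.5
print(round(corr, 2))  # close to -0.5
```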

  • Differencing with More Than Two Time Periods

    Testing for serial correlation in the first-differenced equation: First, estimate the first-differenced equation and obtain the residuals, r̂_it. Then run a simple pooled OLS regression of r̂_it on the lagged residual r̂_i,t-1 for t = 3, ..., T and i = 1, ..., N, and compute a standard t test for the coefficient on the lagged residual.

  • Differencing with More Than Two Time Periods

    Correcting for serial correlation: In the presence of AR(1) serial correlation we can use the Prais-Winsten FGLS estimator. The Cochrane-Orcutt procedure is less preferred, since we lose N observations by dropping the first time period.

    However, standard PW procedures will treat the observations as if they followed an AR(1) process over both i and t, which makes no sense in this situation since we have assumed independence across i.

    A detailed treatment of how to do this can be found in Wooldridge (2002).

  • Assumptions for Pooled OLS Using First Differences

    Assumption FD.1: For each i, the model is

        y_it = β_1 x_it1 + ... + β_k x_itk + a_i + u_it,   t = 1, ..., T

    where the β_j are the parameters to be estimated and a_i is the unobserved effect.

    Assumption FD.2: We have a random sample from the cross section.

    Assumption FD.3: Each explanatory variable changes over time (for at least some i) and no perfect linear relationships exist among the explanatory variables.

  • Assumptions for Pooled OLS Using First Differences

    Assumption FD.4: For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: E(u_it | X_i, a_i) = 0.

    As stated, this assumption is stronger than is necessary for consistency (it suffices that u_it is uncorrelated with x_itj for all j = 1, ..., k and for all t = 2, ..., T).

    Under assumptions FD.1 through FD.4, the first-difference estimator is unbiased.

  • Assumptions for Pooled OLS Using First Differences

    Assumption FD.5: The variance of the differenced errors, conditional on all explanatory variables, is constant (i.e., homoskedastic): var(Δu_it | X_i) = σ², t = 2, ..., T.

    Assumption FD.6: For all t ≠ s, the differences in the idiosyncratic errors are uncorrelated (conditional on all explanatory variables): cov(Δu_it, Δu_is | X_i) = 0, t ≠ s.

    Under assumptions FD.1 through FD.6, the FD estimator of β_j is the best linear unbiased estimator (conditional on the explanatory variables).

  • Comparison of assumptions with standard OLS

    Notice the strong similarities between the first differencing assumptions (FD) and those for standard OLS (MLR):

        MLR.1 and FD.1 are basically the same, except we've now added repeated observations and an unobserved effect for each cross-sectional observation

        MLR.2 and FD.2 are the same

        MLR.3 and FD.3 are the same, except we've added the condition that there has to be at least some time variation for each of the explanatory variables

        MLR.4 and FD.4 are the same, except that the condition is across all time periods (clearly FD.4 is the same as MLR.4 if T = 1)

  • Comparison of assumptions with standard OLS

    FD.5 is the same as MLR.5: homoskedasticity, but of the differenced error terms.

    FD.6 is new. It assumes that there is no correlation over time of the error terms (clearly this was not an issue when T = 1). But we had a no-serial-correlation assumption in time series models.

  • Practice questions

    In-chapter questions: 13.1, 13.3, 13.4, 13.5

    End-of-chapter questions: C13.2, C13.7, C13.11 (i-iv)
