topic1 panel

8/3/2019 Topic1 Panel

1/16

Panel Data


2/16

Outline

Panel Data

Fixed-effects vs. random-effects

First-differencing or fixed-effects

Strict Exogeneity Assumption


3/16

Panel Data (or Longitudinal Data)

A typical panel data set has both a cross-sectional dimension and a time-series dimension. In particular, the same cross-sectional units (e.g. individuals, families, firms, cities, states) are observed over time.

Panel data is different from pooling independent cross sections across time (or pooled OLS). Estimating the latter is a simple extension of OLS.


4/16

Large N or Large T?

N is the number of cross-sectional units and T is the number of time periods.

Small N and small T (of little use)

* Large N and small T (Traditional Panel Data)

N is large enough for the Law of Large Numbers to apply while T is not.

Convenient to use if cross-sectional units are independent.

Small N and Large T

T is large enough for the Law of Large Numbers to apply while N is not.

Autocorrelation has to be addressed.

Large N and Large T (Still under exploration)


5/16

Fixed Effects Panel-data Model

(individual-specific intercepts)

$y_{it} = \beta_0 + \delta_t + \beta_1 x_{it1} + \beta_2 x_{it2} + a_i + u_{it}$

Strict Exogeneity Assumption

$\mathrm{Cov}(X_{it}, u_{is}) = 0$ for all $t$ and $s$.

This rules out dynamic models, which have lagged dependent variables (e.g. $y_{i,t-1}$) as explanatory variables. Models with lags of the independent variables as explanatory variables are still fine.

The effects of time-constant independent variables cannot be directly estimated because they are absorbed into $a_i$.

$\delta_t$ (time-specific intercepts) controls for common shocks to all agents in period $t$.
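A short derivation may help show why time-constant variables cannot be estimated separately from $a_i$. It is only a sketch under stated assumptions: the slopes are written as $\beta_j$ and the time intercepts as $\delta_t$ (notation assumed here), the panel is balanced, and $z_i$ with coefficient $\gamma$ is a hypothetical time-constant regressor that does not appear in the slide. Averaging the equation over time for each unit and subtracting removes everything that does not vary over time:

```latex
\begin{align*}
y_{it} &= \beta_0 + \delta_t + \beta_1 x_{it1} + \beta_2 x_{it2} + \gamma z_i + a_i + u_{it} \\
\bar{y}_i &= \beta_0 + \bar{\delta} + \beta_1 \bar{x}_{i1} + \beta_2 \bar{x}_{i2} + \gamma z_i + a_i + \bar{u}_i \\
y_{it} - \bar{y}_i &= (\delta_t - \bar{\delta}) + \beta_1 (x_{it1} - \bar{x}_{i1}) + \beta_2 (x_{it2} - \bar{x}_{i2}) + (u_{it} - \bar{u}_i)
\end{align*}
```

Both $a_i$ and $\gamma z_i$ vanish from the last line, so $\gamma$ is not identified once $a_i$ is removed this way.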


6/16

Names

The individual-specific intercept $a_i$ may be called a fixed effect or unobserved heterogeneity.

The term $u_{it}$ is called the idiosyncratic error.

The sum $a_i + u_{it}$ is often called the composite error.

If $\mathrm{Cov}(X_{it}, a_i)$ is nonzero but the pooled OLS method is used, estimates of all parameters might be biased. This bias is called heterogeneity bias.

A balanced panel is a panel data set with observations for the same time periods for all individuals. Otherwise, the data are unbalanced.
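The heterogeneity bias described above can be illustrated with a small simulation. This is a minimal sketch, not a definitive implementation: the data-generating process, the true slope of 1.0, the sample sizes, and all function names are assumptions made for illustration, and only one regressor is used so that the Python standard library suffices.

```python
import random


def simulate_panel(n_units=500, n_periods=4, beta=1.0, seed=0):
    """Simulate y_it = beta * x_it + a_i + u_it with Cov(x_it, a_i) != 0."""
    rng = random.Random(seed)
    panel = {}
    for i in range(n_units):
        a_i = rng.gauss(0.0, 1.0)            # unobserved heterogeneity
        obs = []
        for _ in range(n_periods):
            x = a_i + rng.gauss(0.0, 1.0)    # regressor correlated with a_i
            u = rng.gauss(0.0, 1.0)          # idiosyncratic error
            obs.append((x, beta * x + a_i + u))
        panel[i] = obs
    return panel


def slope(pairs):
    """OLS slope of y on x (with an intercept) from a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


def pooled_ols(panel):
    """Ignore the panel structure and stack all observations."""
    return slope([xy for obs in panel.values() for xy in obs])


def within(panel):
    """Demean x and y by unit, which removes a_i, then run OLS."""
    demeaned = []
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        demeaned.extend((x - x_bar, y - y_bar) for x, y in obs)
    return slope(demeaned)


panel = simulate_panel()
b_pooled = pooled_ols(panel)
b_within = within(panel)
print("pooled OLS slope:", round(b_pooled, 3))
print("within (FE) slope:", round(b_within, 3))
```

Under this assumed design the pooled OLS slope converges to 1.5 rather than 1.0, because Cov(x, a) / Var(x) = 1/2, while the within estimate stays close to the true value.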


7/16

Random Effects Models

$y_{it} = \beta_0 + \delta_t + \beta_1 x_{it1} + \beta_2 x_{it2} + a_i + u_{it}$

Key assumption: $a_i$ is uncorrelated with each explanatory variable in all time periods.

Difference between RE and FE estimators

In FE, we effectively control for $a_i$ using dummy variables.

In RE, $a_i$ is omitted and is part of the disturbance.

RE estimates are more efficient (or more precise) if the RE assumption is valid.


8/16

Random Effects Models

(continued)

Difference between RE and pooled OLS

Since $a_i$ is in the error term, observations over time are correlated for the same individual $i$.

In the RE approach, the correlation over time is accounted for using a GLS (generalized least squares) transformation.

In pooled OLS, the GLS correction is not used.

Hausman test

Compare the RE and FE estimates. If the estimates are very different, then the RE assumption is probably invalid, and FE has to be used. Otherwise, RE is preferred because it is more efficient.
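The GLS correction and the Hausman comparison can be written out as follows. This is a sketch in textbook notation that the slides do not spell out: $\sigma_a^2$ and $\sigma_u^2$ denote the variances of $a_i$ and $u_{it}$, $v_{it} = a_i + u_{it}$ is the composite error, $\theta$ is the quasi-demeaning weight, the panel is assumed balanced, and time intercepts are omitted for brevity.

```latex
\begin{align*}
\theta &= 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2}} \\
y_{it} - \theta \bar{y}_i &= \beta_0 (1 - \theta)
  + \beta_1 (x_{it1} - \theta \bar{x}_{i1})
  + \beta_2 (x_{it2} - \theta \bar{x}_{i2})
  + (v_{it} - \theta \bar{v}_i) \\
H &= (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
  \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1}
  (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \;\overset{a}{\sim}\; \chi^2_k
\end{align*}
```

With $\theta = 0$ the transformation reduces to pooled OLS, and with $\theta = 1$ it reduces to the within (FE) transformation. In the Hausman statistic, $k$ is the number of time-varying coefficients being compared, and a large $H$ is evidence against the RE assumption.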


9/16

Estimation of the Fixed-effect Panel Data Model

Fixed-effects (or Within) Estimator

Each variable is demeaned (i.e. its average over time for the same unit is subtracted).

Dummy Variable Regression (i.e. put in a dummy variable for each cross-sectional unit, along with other explanatory variables). This may cause estimation difficulty when N is large.

First-difference Estimator

Each variable is differenced once over time, so we are effectively estimating the relationship between changes of variables.
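The within and first-difference transformations above can be sketched for a single regressor as follows. The toy data and function names are hypothetical, and the sketch assumes no time-specific intercepts. With exactly two periods per unit, the two estimators give the same slope, which the example checks numerically.

```python
def within_slope(panel):
    """Within (FE) estimator: demean x and y by unit, then OLS on the result."""
    sxy = sxx = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


def first_difference_slope(panel):
    """FD estimator: regress changes in y on changes in x (no intercept)."""
    sxy = sxx = 0.0
    for obs in panel.values():
        # Pair each period with the next one for the same unit.
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            dx, dy = x1 - x0, y1 - y0
            sxy += dx * dy
            sxx += dx * dx
    return sxy / sxx


# Hypothetical two-period panel: unit -> [(x, y) in period 1, (x, y) in period 2]
toy_panel = {
    "A": [(1.0, 2.0), (2.0, 4.5)],
    "B": [(0.0, 1.0), (3.0, 6.0)],
    "C": [(2.0, 1.0), (1.0, 0.5)],
}

b_fe = within_slope(toy_panel)
b_fd = first_difference_slope(toy_panel)
print("FE slope:", b_fe)
print("FD slope:", b_fd)
```

Both functions loop over each unit's own observations, so they also accept units with different numbers of periods, although the FE and FD slopes then generally differ.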


10/16

First Differencing or Fixed-Effect?

Theoretically, when N is large and T is small but greater than 2, FE is more efficient when $u_{it}$ are serially uncorrelated, while FD is more efficient when $u_{it}$ follows a random walk. (When T = 2, the two estimators are identical.)

When T is large and N is small

FD has an advantage for processes with large positive autocorrelation.

FE is more sensitive to nonnormality, heteroskedasticity, and serial correlation in the idiosyncratic errors.

On the other hand, FE is less sensitive to violation of the strict exogeneity assumption, so FE is preferred when the processes are weakly dependent over time.
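The efficiency comparison between FE and FD can be motivated by looking at the differenced error in the two polar cases. The following is a sketch assuming homoskedastic errors; $e_{it}$ and $\sigma_u^2$ are notation introduced here, with $e_{it}$ a serially uncorrelated innovation.

```latex
\begin{align*}
\text{Random walk: } u_{it} = u_{i,t-1} + e_{it}
  &\;\Rightarrow\; \Delta u_{it} = e_{it}
  \quad \text{(serially uncorrelated after differencing)} \\
\text{Serially uncorrelated } u_{it}
  &\;\Rightarrow\; \mathrm{Cov}(\Delta u_{it}, \Delta u_{i,t-1})
  = -\mathrm{Var}(u_{i,t-1}) = -\sigma_u^2, \\
  &\;\phantom{\Rightarrow}\; \mathrm{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) = -\tfrac{1}{2}
\end{align*}
```

In the first case differencing produces well-behaved errors, which favors FD. In the second case differencing introduces negative serial correlation, which favors FE.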


11/16

With Classical Measurement Errors

When T > 2, the measurement error bias using the FE estimator may be smaller than that with the FD approach but higher than that with OLS (Griliches and Hausman, 1986).

Natural IV for Measurement Error: Lagged dependent variables


12/16

Violation of the Strict Exogeneity Assumption

Parameter estimates are inconsistent; a natural experiment approach (e.g. IV) is needed.


13/16

With Strict Exogeneity and Dependent Observations

Parameter estimates are consistent.

Standard error estimates could still be biased:

Cross-sectional correlation or serial correlation (over time) in error terms

Heteroskedasticity


14/16

Possible Solutions (Need Large N and Zero Cross-Sectional Correlation)

Heteroskedasticity

Use White robust standard errors

Autocorrelation

Group the sample time dimension into two periods and apply the first-difference estimator (need large N). (Performs best with the difference-in-differences approach; Bertrand et al., 2004)

Clustered robust errors

Newey-West standard errors (which also account for heteroskedasticity)

Cross-sectional Correlations

Clustered robust errors


15/16

Clustered Standard Errors

Key Assumption

Correlations within a cluster (a group of firms, a region, different years for the same firm, different years for the same region) are the same for different observations.

Procedure

Identify clusters using economic theory (clustered by industry, year, industry and year)

Let the computer calculate clustered standard errors

Try different ways of defining clusters and see how estimated standard errors are affected.
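The procedure above can be sketched for a simple regression with one regressor and an intercept. This is a minimal illustration, not a replacement for a statistics package: the data, the cluster labels, and the function names are hypothetical, and no small-sample correction is applied. The cluster-robust variance sums the scores (demeaned x times residual) within each cluster before squaring. Treating every observation as its own cluster reproduces the White (HC0) standard error.

```python
from collections import defaultdict
from math import sqrt


def ols_slope_and_residuals(x, y):
    """Simple regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]
    yd = [yi - y_bar for yi in y]
    sxx = sum(v * v for v in xd)
    slope = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - slope * a for a, b in zip(xd, yd)]
    return slope, xd, resid, sxx


def clustered_se(x, y, clusters):
    """Cluster-robust standard error of the slope (no small-sample correction)."""
    _, xd, resid, sxx = ols_slope_and_residuals(x, y)
    score = defaultdict(float)
    for xi, ei, g in zip(xd, resid, clusters):
        score[g] += xi * ei               # sum scores within each cluster
    meat = sum(s * s for s in score.values())
    return sqrt(meat) / sxx


# Hypothetical firm-year data: two observations per firm.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.9, 5.3, 5.8]
firm = ["A", "A", "B", "B", "C", "C"]
year = [1, 2, 1, 2, 1, 2]

se_white = clustered_se(x, y, range(len(x)))   # each observation is its own cluster
se_firm = clustered_se(x, y, firm)
se_year = clustered_se(x, y, year)
print("White SE:", se_white)
print("clustered by firm:", se_firm)
print("clustered by year:", se_year)
```

Comparing `se_white`, `se_firm`, and `se_year` mirrors the last step of the procedure: redefine the clusters and see how the estimated standard error changes. With only a handful of clusters, as in this toy data, cluster-robust standard errors are unreliable; the example is meant only to show the mechanics.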


16/16

Unbalanced Panels

If a panel data set is unbalanced for reasons uncorrelated with $u_{it}$, the consistency of the FE estimator is not affected.

The attrition problem: if an unbalanced panel is the result of some selection process related to $u_{it}$, then an endogeneity problem is present and needs to be dealt with using some correction method.
