1
Panel Data Analysis – Advantages and Challenges
Cheng Hsiao
2
Introduction
Year    SSCI
1986    29
2003    580
2004    687
2005    773
3
Three factors contributing to the phenomenal growth
(i) Data availability
(ii) Greater capacity for modeling the complexity of human behavior
(iii) Challenging methodology
4
Data Availability
US: National Longitudinal Surveys of Labor Market Experience (NLS); Michigan Panel Study of Income Dynamics (PSID)
Eurostat: The European Community Household Panel (ECHP)
Kenya: Primary School Deworming Project (PSDP)
China: Township & Village Enterprises Survey; Financial Institutions Survey (1984-1990)
Taiwan: Household Demographic Survey
5
6
Advantages
• Cross-sectional data may reflect inter-individual differences
• Time series data may suffer from multicollinearity and shortages of degrees of freedom
• Panel data, by blending inter-individual differences with intra-individual dynamics, allow a researcher to specify more complicated behavioral hypotheses than a single cross-section or a single time series
7
(i) More degrees of freedom, more sample variability, less multicollinearity
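$y = X\beta + u$, where $y$ is $n \times 1$, $X$ is $n \times k$, and $\beta$ is $k \times 1$
$\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
$y_i = \alpha + \beta x_i + u_i, \quad i = 1, \ldots, n$
$\mathrm{Var}(\hat{\beta}) = \sigma_u^2 \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$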
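The variance formula for the simple-regression slope can be checked numerically. The following is a minimal sketch, assuming homoskedastic, uncorrelated errors and a hypothetical regressor made of an individual component plus within-individual variation; the sizes `N`, `T` and the helper `slope_variance` are illustrative and not from the slides.

```python
import numpy as np

def slope_variance(x, sigma_u=1.0):
    """Var(beta_hat) = sigma_u^2 / sum_i (x_i - x_bar)^2 for a simple regression."""
    x = np.asarray(x, dtype=float)
    return sigma_u ** 2 / np.sum((x - x.mean()) ** 2)

rng = np.random.default_rng(0)
N, T = 100, 5  # hypothetical panel dimensions

# Hypothetical regressor: individual-level component plus variation over time.
x_panel = rng.normal(size=(N, 1)) + 0.5 * rng.normal(size=(N, T))

var_cross_section = slope_variance(x_panel[:, 0])  # one wave: N observations
var_time_series = slope_variance(x_panel[0, :])    # one individual: T observations
var_pooled = slope_variance(x_panel.ravel())       # panel: N*T observations

print("cross-section:", var_cross_section)
print("time series:  ", var_time_series)
print("pooled panel: ", var_pooled)
```

Under these assumptions the pooled denominator can only be larger than either sub-sample's, because it adds squared deviations from more observations, so the slope variance is smaller for the panel.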
8
9
10
(ii) Greater capacity for capturing the complexity of human behavior
(a) Constructing and testing more complicated
behavioral hypotheses
- Homogeneous vs. heterogeneous population
Ben-Porath (1973)
- Program Evaluation
Difference-in-Difference method
11
$y_{1i} = g_1(x_i) + u_{1i}$ if $d_i = 1$ (treatment)
$y_{0i} = g_0(x_i) + u_{0i}$ if $d_i = 0$ (control)
Treatment effect $= y_{1i} - y_{0i}$
Average treatment effect $= E[y_{1i} - y_{0i}]$
Data: $y_i = d_i y_{1i} + (1 - d_i) y_{0i}$
Confounding of the treatment effect with differences in covariates $(x_i)$ between the control group and the treatment group
12
Bias due to selection on unobservables
$E(u_{1i}) = 0 \neq E(u_{1i} \mid d_i = 1)$
$E(u_{0i}) = 0 \neq E(u_{0i} \mid d_i = 0)$
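Selection on unobservables, and how the difference-in-difference method listed under program evaluation can address it, can be illustrated with simulated data. This is a sketch under strong assumptions that are not in the slides: the unobservable `c` is constant over time, treatment depends on `c`, both groups share a common time trend, and the true effect is fixed at 1.0. All variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
effect, trend = 1.0, 0.3            # hypothetical true effect and common time trend

x = rng.normal(size=n)              # observed covariate
c = rng.normal(size=n)              # time-invariant unobservable
d = (c + rng.normal(size=n) > 0).astype(int)   # selection depends on c

# Before period: nobody is treated.
y_before = 1.0 + 0.5 * x + c + rng.normal(size=n)

# After period: potential outcomes y0 (control) and y1 (treatment).
y0 = 1.0 + trend + 0.5 * x + c + rng.normal(size=n)
y1 = 1.0 + trend + effect + 0.5 * x + c + rng.normal(size=n)
y_after = d * y1 + (1 - d) * y0     # observed outcome

# Naive cross-sectional comparison: confounds the effect with E(c|d=1) - E(c|d=0).
naive = y_after[d == 1].mean() - y_after[d == 0].mean()

# Difference-in-difference: E(y_ta - y_tb) - E(y_ca - y_cb).
change = y_after - y_before
did = change[d == 1].mean() - change[d == 0].mean()

print("naive:", naive)
print("DiD:  ", did)
```

In this setup the naive comparison overstates the effect because treated individuals have systematically higher `c`, while differencing over time removes `c` before the groups are compared. If the trend differed between groups, the DiD estimate would be biased as well.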
13
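Difference-in-Difference method
$E(y_{ta} - y_{tb}) - E(y_{ca} - y_{cb})$ (t = treatment group, c = control group; a = after, b = before)
(b) Controlling the impact of omitted variables
$y_{it} = \alpha + \beta x_{it} + \gamma z_i + u_{it}$, with $z_i$ unobservable
$E(\hat{\beta}) = \beta + \gamma \, \mathrm{Cov}(x_i, z_i) / \mathrm{Var}(x_i)$
$y_{it} - y_{i,t-1} = \beta (x_{it} - x_{i,t-1}) + (u_{it} - u_{i,t-1})$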
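The effect of an omitted time-invariant variable, and its removal by first differencing, can be sketched with a small simulation. The setup below is an assumption for illustration only: a scalar regressor correlated with an unobserved `z_i`, two periods, no intercept, and made-up parameter values.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5000, 2
beta, gamma = 1.0, 2.0                             # hypothetical true coefficients

z = rng.normal(size=N)                             # unobserved, constant over time
x = 0.8 * z[:, None] + rng.normal(size=(N, T))     # x is correlated with z
u = rng.normal(size=(N, T))
y = beta * x + gamma * z[:, None] + u

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sum(xc ** 2)

# Regression of y on x alone: slope picks up gamma * Cov(x, z) / Var(x).
b_pooled = ols_slope(x.ravel(), y.ravel())

# First differences: z_i drops out of y_it - y_i,t-1.
b_fd = ols_slope(x[:, 1] - x[:, 0], y[:, 1] - y[:, 0])

print("pooled OLS:      ", b_pooled)
print("first difference:", b_fd)
```

With these values the pooled slope converges to roughly 1 + 2(0.8)/1.64, well above the true 1.0, while the differenced regression recovers the true slope. The same differencing would not help if the omitted variable changed over time.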
14
(c) Uncovering dynamic relationships
$y_t = \sum_j \beta_j x_{t-j} + u_t$
multicollinearity: $x_t \approx x_{t-1} + (x_{t-1} - x_{t-2})$
(d) Generating more accurate predictions for individual outcomes (exchangeability)
(e) Providing micro foundation for aggregate data analysis
“representative agent” heterogeneity
15
16
(iii) Simplifying Statistical Inference and Computation
(a) Time-series inference
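$y_t = \rho y_{t-1} + \varepsilon_t$, $\varepsilon_t$ i.i.d.
$\sqrt{T}(\hat{\rho} - \rho) \sim N(0, 1 - \rho^2)$ if $|\rho| < 1$
$T(\hat{\rho} - 1) \to \dfrac{\frac{1}{2}[W(1)^2 - 1]}{\int_0^1 W(r)^2 \, dr}$ if $\rho = 1$
$\sqrt{N} \, T(\hat{\rho} - 1) \to N(0, 2)$ if $\rho = 1$, with panel data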
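The contrast between the non-standard unit-root limit for a single time series and the normal limit for a panel can be explored by Monte Carlo. The sketch below assumes a pure random walk (rho = 1) with standard normal innovations, a zero initial value, and pooled least squares without intercept; the sample sizes and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def pooled_rho_hat(y):
    """Pooled OLS of y_it on y_i,t-1 without intercept; y has shape (N, T + 1)."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    return np.sum(y_lag * y_cur) / np.sum(y_lag ** 2)

def simulate(N, T, reps):
    """Draws of sqrt(N) * T * (rho_hat - 1) when the true rho equals 1."""
    stats = np.empty(reps)
    for r in range(reps):
        eps = rng.normal(size=(N, T))
        # Random walks starting at y_i0 = 0.
        y = np.concatenate([np.zeros((N, 1)), np.cumsum(eps, axis=1)], axis=1)
        stats[r] = np.sqrt(N) * T * (pooled_rho_hat(y) - 1.0)
    return stats

ts_stat = simulate(N=1, T=200, reps=2000)      # single series: T (rho_hat - 1)
panel_stat = simulate(N=50, T=50, reps=2000)   # panel: sqrt(N) T (rho_hat - 1)

print("time series mean, median:", ts_stat.mean(), np.median(ts_stat))
print("panel mean, variance:    ", panel_stat.mean(), panel_stat.var())
```

In runs of this kind the single-series statistic is skewed to the left, so its mean lies below its median, while the panel statistic is roughly centered with variance near 2. Finite N and T leave some bias, so the match to N(0, 2) is only approximate.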
17
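(b) Measurement errors
$y_{it} = \beta x_{it} + u_{it}$, $z_{it} = x_{it} + \tau_{it}$
$y_{it} = \beta z_{it} + (u_{it} - \beta \tau_{it})$
$(y_{it} - y_{i,t-j}) = \beta (z_{it} - z_{i,t-j}) + [(u_{it} - u_{i,t-j}) - \beta (\tau_{it} - \tau_{i,t-j})]$
(c) Dynamic sample selection models
$y^*_{it} = \gamma y^*_{i,t-1} + \beta x_{it} + u_{it}$
$y_{it} = y^*_{it}$ if $y^*_{it} > 0$; $y_{it} = 0$ if $y^*_{it} \le 0$
$f(y_{it} \mid y_{i,t-1}, x_{it}) = f(u_{it})$ if $y_{i,t-1} > 0$
If $y_{i,t-1} = 0$, $f(y_{it} \mid y_{i,t-1} = 0, x_{it})$ requires integrating over $u_{i,t-1} \le -\gamma y^*_{i,t-2} - \beta x_{i,t-1}$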
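For the measurement-error case in (b), a simulation can show how differences taken over different spans j are attenuated by different, predictable amounts. This is a sketch under assumptions that are not in the slides: the true regressor follows a stationary AR(1) with unit variance and coefficient `phi`, the measurement error `tau` is white noise, and the observed regressor is `z = x + tau`. All parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 20_000, 6
beta, phi, sigma_tau = 1.0, 0.8, 0.5   # hypothetical slope, AR(1) coefficient, error s.d.

# True regressor: stationary AR(1) with unit variance (an assumption of this sketch).
x = np.empty((N, T))
x[:, 0] = rng.normal(size=N)
for t in range(1, T):
    x[:, t] = phi * x[:, t - 1] + np.sqrt(1 - phi ** 2) * rng.normal(size=N)

u = rng.normal(size=(N, T))
tau = sigma_tau * rng.normal(size=(N, T))  # measurement error
y = beta * x + u
z = x + tau                                # observed, error-ridden regressor

def diff_slope(j):
    """Least-squares slope of (y_it - y_i,t-j) on (z_it - z_i,t-j), no intercept."""
    dz = (z[:, j:] - z[:, :-j]).ravel()
    dy = (y[:, j:] - y[:, :-j]).ravel()
    return np.sum(dz * dy) / np.sum(dz ** 2)

def attenuated_limit(j):
    """Probability limit of diff_slope(j) under the assumptions above."""
    signal = 1 - phi ** j
    return beta * signal / (signal + sigma_tau ** 2)

b1, b4 = diff_slope(1), diff_slope(4)
print("j=1:", b1, "limit:", attenuated_limit(1))
print("j=4:", b4, "limit:", attenuated_limit(4))
```

Because the two spans are biased by different known factors, comparing them carries information about both the slope and the error variance, which is not available from a single cross-section. The exact factors depend on the assumed AR(1) structure.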
18
Methodology Challenges
Panel data also raise the issue of how best to model unobserved heterogeneity.
Standard statistical procedures are developed based on the assumption that y conditional on x is randomly distributed with a common mean
$f(y \mid x; \theta)$
e.g. $y = x'\beta + u$, $E(y \mid x) = x'\beta$, $\mathrm{Var}(y \mid x) = \sigma^2$
19
Panel data, by their nature, focus on individual outcomes. The factors affecting individual outcomes are numerous.
One way to restore homogeneity is to add conditioning variables, say $z$, $w$, …, and work with $f(y_{it} \mid x_{it}, z_{it}, w_{it}, \ldots)$ instead of $f(y_{it} \mid x_{it})$.
However
(a) A model is a simplification of reality, not a mimic of reality. Multicollinearity, shortages of degrees of freedom, etc. may obscure the fundamental relationship between $y$ and $x$.
(b) $z$, $w$, … may not be observable.
20
Another way is to let the parameters characterizing the conditional density of $y$ given $x$ vary across $i$ and/or over $t$: $f(y_{it} \mid x_{it}; \theta_{it})$.
Meaningful inference on $\theta_{it}$ can be made only if we assume a certain structure on $\theta_{it}$.
21
Let $\theta_{it}' = (\beta', \gamma_i')$
$\beta$ - structural parameters
$\gamma_i$ - incidental parameters (increase with N)
$\alpha_i$ - individual-specific effects represent the effects of those variables that vary across individuals but stay constant over time, at least over a short time span, e.g. ability, socio-economic background variables, marginal utility of initial wealth, etc.
$\alpha_i$ - fixed constant, Fixed Effects Model (FE)
$\alpha_i$ - random variable, Random Effects Model (RE)
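One common way to handle individual-specific effects treated as fixed constants is the within transformation, which subtracts each individual's time mean so the effects drop out without estimating N incidental parameters. The sketch below assumes a linear model with a scalar regressor and individual effects (called `alpha` here, an assumed name) that are correlated with the regressor; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 2000, 4
beta = 1.0                                           # hypothetical structural parameter

alpha = rng.normal(size=N)                           # individual-specific effects
x = 0.7 * alpha[:, None] + rng.normal(size=(N, T))   # regressor correlated with alpha_i
y = alpha[:, None] + beta * x + rng.normal(size=(N, T))

# Pooled OLS with a common intercept: ignores alpha_i, so the slope is biased here.
xc, yc = x - x.mean(), y - y.mean()
b_pooled = np.sum(xc * yc) / np.sum(xc ** 2)

# Within (fixed-effects) estimator: deviations from individual means remove alpha_i.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_within = np.sum(xw * yw) / np.sum(xw ** 2)

print("pooled OLS:", b_pooled)
print("within:    ", b_within)
```

If the effects were uncorrelated with the regressor, as a random-effects specification typically assumes, pooled and within estimates would both be consistent and the choice would turn on efficiency rather than bias.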
22
23
24
Concluding Remarks
The power of panel data to isolate the effects of specific actions, treatments or more general policies depends on the compatibility of the assumptions of statistical tools with the data generating process
Factors to consider:
(1) Advantages
(2) Limitations
(3) Compatibility between assumptions and data generating process
(4) Efficiency