econometrics review #1

Econometrics Review for ECON215
Seonghoon Kim (courtesy of Bryant Kim at Cornell)
School of Economics, Singapore Management University
August 2015


Introduction

- What is econometrics?
  - A set of statistical techniques that allow us to examine empirical relationships between variables.
  - “Empirical”: based on data
  - “Relationship”:
    - Causal: assess the consequence that changing the value of one variable (X) has on the value of another variable (Y).
    - Association: assess the extent to which two variables move together in the data. This affects how well one variable predicts the other, and vice versa.
- Key idea: Throughout the course, we will constantly be asking whether the relationship being examined is causal or an association.
- QUESTION: Why is this important for public policy?


Introduction

Example: A study published in Nature (Rauscher, Shaw and Ky, 1993) suggested that listening to Mozart for 10-15 minutes could temporarily raise your IQ by 8 or 9 points. In fact, shortly after the study was published and reported in the popular press, the U.S. state of Georgia began handing out classical-music CDs to the parents of all infants, and there were similar but less official programs in Colorado, Florida and elsewhere.

QUESTION: Would this be a good or a bad policy to increase the intelligence of children?


Correlation vs. Causality


Introduction

There are two types of causal relationships commonly assessed in the policy world:

- Effect of one variable on another
  - How much does maternal nutrition affect infant birth weight?
  - How much do political campaign expenditures affect voting outcomes?
  - Does increasing the minimum wage increase unemployment?
    - A 1% increase in state-level minimum wages reduces employment of young blacks and Hispanics by 0.5% to 0.6% (Neumark and Wascher, 2007).
  - Other?


Introduction

- Effect of a social program
  - Does giving deworming drugs to children in Kenya improve their health status? Their school attendance?
    - Reduced school absenteeism by 25%, but no evidence of an effect on test scores (Miguel and Kremer, 2004).
  - Does an extended school day program affect student outcomes? Test scores? Drop-out rates? Other measures of achievement? By how much? Does this vary across different student populations?
  - What is the effect of a housing subsidy program on employment?
    - Among working-age, able-bodied adults, housing voucher use reduces quarterly labor force participation rates by 4 percentage points (Jacob and Ludwig, 2008).
  - Other?


Causality

Causality: A specific action leads to a specific, measurable consequence

- We are often interested in assessing the magnitude of the causal effect that a certain factor (X) has on an outcome (Y).
- Y may be a function of many factors other than X.
- We want to establish the causal link between X and Y. We want to measure any changes in Y that are directly attributable to X, not to other factors.
- Changes in X may be ‘linked with’ (correlated with) changes in Y, but this alone is not sufficient to establish a causal link.
- Examples:
  - Do car seatbelts (X) save lives (Y)?
  - Do industrial emissions (X) cause the temperature of the planet (Y) to rise?
  - Do smaller classes (X) increase student learning (Y)?


Causality

- A key concept needed to answer these causal questions is the counterfactual
  - What would have happened otherwise
  - Compare outcome Y when X occurs and when X does not occur

QUESTION: What is the counterfactual for each of the 3 examples above?

- Ideally, we want to compare the “state of the world” at a given point in time with the “alternate state of the world” or counterfactual at that same point in time.

QUESTION: Conceptually, how can we use the concept of the counterfactual to assess the causal effect of being in a small class on the test score of a given individual?


Causality

- The difficulty of inferring causality is that we do not observe the counterfactual
  - We do not observe what would have happened to seatbelt wearers if they had not been wearing seatbelts.
  - We do not observe what the temperature of the planet would have been at the current time, had there been a different level of industrial emissions.
  - We do not observe how students in large classes would have fared at the same point in time had they been in small classes.


Causality

- While the counterfactual cannot be directly observed, the goal of empirical analysis aimed at uncovering causal relationships is to mimic the counterfactual using data and statistical techniques.
  - Ideal situation: randomized controlled experiment
  - More common situation: observational data analyzed with econometric techniques


Validity

Validity represents a set of criteria by which the credibility of research may be judged.

- A key goal of any research study should be to achieve high validity
- Our focus: internal validity and external validity


Validity

Internal Validity

A study has strong internal validity if it estimates the causal effect of interest for the population represented by our sample.

- To what extent does the evidence presented support a causal link between X and Y, for this population?
- Key question: Is there any factor other than X that could be partly responsible for the observed association between X and Y?
  - Generally not a problem in a properly conducted randomized experiment
  - Questions more likely to arise when using observational data
  - Linked to assumptions behind regression models (more on this later in the course)


Validity

External Validity

A study has strong external validity if its findings can be generalized to other settings (e.g., other people, time periods, locations, or age groups).

- Do the conclusions hold for other geographic locations, socioeconomic conditions, time periods?
- Do other empirical studies investigating the same (or similar) research questions yield the same (or similar) results?
- Tools to improve external validity:
  - Selection of the sample
  - Replications of a given evaluation in other times/locations
- Internal and external validity in the context of an empirical study:
  - Random assignment
  - Random sampling


Randomized Experiments

- Conceptually, this is the ideal method to estimate the causal effect of a “treatment” (program, intervention, etc.): the gold standard.
- Also known as random assignment studies, social experiments, randomized controlled trials, randomized trials, etc.
- A randomized experiment compares two groups that are alike except for the “treatment”.
- Participants volunteer for the experiment. This constitutes the sample.
- The sample is randomly divided into “treatment” and “control” groups:
  - Coin flip, random number generator (assigned by a computer), etc.
  - The “treatment” group is offered the “treatment”.
  - The “control” group does not get the “treatment”, and may get a placebo.
  - The control group is meant to mimic the counterfactual.
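The random division of a volunteer sample can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular study: the function name `randomly_assign`, the participant IDs, the fixed seed, and the equal split are all assumptions made for the example.

```python
import random


def randomly_assign(participants, seed=0):
    """Shuffle the sample and split it into treatment and control halves."""
    rng = random.Random(seed)      # fixed seed so the assignment can be reproduced
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical volunteer IDs; a real study would use its own roster.
volunteers = [f"participant_{i}" for i in range(10)]
treatment, control = randomly_assign(volunteers)
print("treatment:", treatment)
print("control:  ", control)
```

Because assignment depends only on the random shuffle, nothing about a participant can influence which group they land in, which is what lets the control group stand in for the counterfactual.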


Randomized Experiments

- Isolate the causal effect of one factor on outcomes
  - Only difference between the two groups is the “treatment”.
  - Any difference in outcomes must be due to the “treatment”.
- Randomized Experiments in the Social Sciences
  - About 11,000 known experiments in social sciences
  - Contrast with over 250,000 in medicine
  - Wide variety of areas: Poverty, Labor, Health, Education, Crime, etc.
  - Examples of randomized experiments:
    - Effect of a deworming medication on health outcomes (Kenya)
    - Effect of cost of health insurance (premiums, co-pays) on health outcomes (RAND)
    - Effect of class size on student outcomes (Tennessee STAR)


Randomized Experiments

- Randomized experiments
  - Methodologically, they are the ideal way to mimic the counterfactual and draw causal inference.
  - This does not mean that all policy-relevant questions can be addressed with experiments or that experiments always yield valid answers to causal questions.
- Today we will examine a policy-relevant causal question using both experimental and non-experimental (i.e., observational) methods.

QUESTION: Is random assignment (to treatment vs. control group) the same as random sampling?


Experiments vs. Observational Studies

Tennessee STAR Experiment:

- Project STAR (Student-Teacher Achievement Ratio)
  - 1980s
  - Four-year study
  - Examined the effect of class size in grades K-3
- Students entering the school system were randomly assigned to one of three groups:
  - Small class (13-17 students)
  - Regular class (22-25 students)
  - Regular class with teacher’s aide
- Compare outcomes of children in small classes with outcomes of children in regular classes.
- Our focus will be kindergarten

QUESTION: What are the treatment and control groups in this case?


Experimental Data

- The great benefit of an experiment to address this question is the internal validity we gain by randomization.
  - Internal validity in this context means that we can be confident that any differences in the outcomes between the treatment and control groups indicate a causal effect of the treatment.
  - Key: The treatment and control groups are alike except for their treatment status.
- Did randomization work?
  - We can review the protocol for the experiment.
  - We can look for evidence to assess whether the protocol was followed.
  - We can look at the treatment and control groups to see if they are comparable.
  - This evidence is very important since the control group can mimic the counterfactual only if the treatment and control groups are alike before the treatment.


Experimental Data

Mean Background Characteristics

                  Treatment   Control   Difference   P-Value+
Free Lunch (%)       47.2       48.5       -1.3        0.325
Male (%)             51.5       51.3        0.2        0.883
Black (%)            31.1       32.5       -1.4        0.302

+: Corresponds to a Z test where the null hypothesis is that the proportions in the treatment and control groups are the same.

QUESTION: Based on the table above, did randomization work?
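The Z test named in the table footnote can be sketched as a pooled two-proportion test. The slide does not report the group sizes behind this table, so the sketch borrows the kindergarten group sizes (1,738 and 4,048) from the test-score output later in the deck. That borrowing is an assumption, and the resulting p-value need not match the 0.325 shown in the table.

```python
import math


def two_proportion_z_test(p_t, n_t, p_c, n_c):
    """Pooled two-sided Z test of H0: the two population proportions are equal."""
    pooled = (p_t * n_t + p_c * n_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: group sizes taken from the test-score sample, not from this table.
N_TREATMENT, N_CONTROL = 1738, 4048

# Free-lunch shares from the table: 47.2% (treatment) vs. 48.5% (control).
z, p = two_proportion_z_test(0.472, N_TREATMENT, 0.485, N_CONTROL)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

A large p-value here means the data give no reason to think the two groups differed in free-lunch eligibility before treatment, which is what successful randomization should produce.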


Experimental Data

- Return to our policy question: What is the effect of small class sizes on test scores?
- How do we address this question explicitly?
  - Estimation: Compare average test scores in small classes with average test scores in regular classes.
  - Hypothesis Testing: Test the null hypothesis that average test scores are the same between small and regular size classes.
  - Confidence Interval: Estimate a confidence interval for the difference of mean test scores in small vs. regular size classes.


Experimental Data

Estimation

. tab small, sum(tscorek)

small class |        Summary of tscorek
       in K |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   918.20133   72.214225        4048
          1 |   931.94189   76.358633        1738
------------+------------------------------------
      Total |   922.32872   73.746597        5786

QUESTION: What is the difference in average test scores between small and regular classes?


Experimental Data

Hypothesis Testing

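\[ H_0 : \mu_T = \mu_C \]

\[ H_A : \mu_T \neq \mu_C \]

\[ t = \frac{\bar{Y}_T - \bar{Y}_C}{SE(\bar{Y}_T - \bar{Y}_C)} \]

where \( SE(\bar{Y}_T - \bar{Y}_C) \) is the standard error of \( \bar{Y}_T - \bar{Y}_C \):

\[ SE(\bar{Y}_T - \bar{Y}_C) = \sqrt{\frac{(\sigma_{Y_T})^2}{n_T} + \frac{(\sigma_{Y_C})^2}{n_C}} \]

\[ t = \frac{\bar{Y}_T - \bar{Y}_C}{SE(\bar{Y}_T - \bar{Y}_C)} = \frac{931.94 - 918.20}{\sqrt{\frac{(76.36)^2}{1738} + \frac{(72.21)^2}{4048}}} = \frac{13.74}{2.15} = 6.38 \]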
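The arithmetic on this slide can be checked from the summary statistics in the Stata output. The sketch below is one way to do that in Python; the function name `difference_t_stat` is made up for the example, and the inputs are the means, standard deviations, and group sizes reported for the kindergarten sample.

```python
import math


def difference_t_stat(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in means, its unequal-variance standard error, and the t statistic."""
    diff = mean_t - mean_c
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return diff, se, diff / se


# Small classes (treatment) vs. regular classes (control), from the Stata table.
diff, se, t = difference_t_stat(931.94189, 76.358633, 1738,
                                918.20133, 72.214225, 4048)
print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t:.2f}")
# prints: difference = 13.74, SE = 2.15, t = 6.38
```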


Experimental Data

QUESTION: Is this difference in mean test scores statistically significant at the 5% level?

95% Confidence Interval

\[ (\bar{Y}_T - \bar{Y}_C) \pm 1.96 \, SE(\bar{Y}_T - \bar{Y}_C) \]
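As a worked instance of this formula, plugging in the rounded difference (13.74) and standard error (2.15) from the hypothesis-testing slide gives approximately:

\[ 13.74 \pm 1.96 \times 2.15 = 13.74 \pm 4.21 \quad\Rightarrow\quad [9.53,\ 17.95] \]

The interval excludes zero, which is consistent with a t statistic of 6.38 exceeding the 5% critical value of 1.96.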


Experimental Data

Conclusions from analysis of STAR Experiment (Kindergarten Result)

- Being in a small class rather than a regular-sized class
  - Increases average test score by 13.74 points.
  - Difference is highly statistically significant.
  - What about significance from a policy perspective?
- Effect size = (effect on Y)/(standard deviation of Y)
  - Effect size = 13.74/73.75
  - Effect size = 0.19 standard deviations of test score
  - Useful for cost-benefit analysis
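The effect-size calculation is a single division, sketched here with the unrounded values from the Stata output (the variable names are chosen for the example).

```python
# Values from the kindergarten Stata output.
mean_small = 931.94189    # average score, small classes
mean_regular = 918.20133  # average score, regular classes
sd_all = 73.746597        # standard deviation of scores in the full sample

effect = mean_small - mean_regular
effect_size = effect / sd_all  # effect expressed in standard deviations of Y
print(f"effect = {effect:.2f} points, effect size = {effect_size:.2f} SD")
```

Expressing the effect in standard-deviation units makes it comparable with interventions measured on other test scales, which is why it is useful for cost-benefit comparisons.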


Observational Data

Observational Data for the Class Size and Student Outcome Question: California Test Score Data Set

- All K-6 and K-8 California school districts (n=420)
- Class size measured by student-teacher ratio (STR) = number of students in the district divided by the number of full-time equivalent teachers.
- This measure of class size says nothing about what is going on in individual classrooms.


Observational Data

Look at the data in a scatterplot:

QUESTION: What does this figure show?


Observational Data

How do we address the question of whether students do better in small classes, using these data?

- Divide the school districts into those with small classes (STR less than 20) and those with larger classes (STR of 20 or more).
- As before, use estimation, hypothesis testing, and a confidence interval to ask the same question we did with the experimental data.


Observational Data

Estimation:

. tab small, sum(testscr)

            |        Summary of testscr
      small |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   649.97885   17.853364         182
          1 |   657.35126   19.358012         238
------------+------------------------------------
      Total |   654.15655   19.053348         420

QUESTION: What is the difference in average test scores between small and regular classes?


Observational Data

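- In test score points:

Hypothesis Testing:

\[ H_0 : \mu_{\text{small}} = \mu_{\text{large}} \]

\[ H_A : \mu_{\text{small}} \neq \mu_{\text{large}} \]

\[ t = \frac{\bar{Y}_{\text{small}} - \bar{Y}_{\text{large}}}{SE(\bar{Y}_{\text{small}} - \bar{Y}_{\text{large}})} = \frac{7.4}{1.8} = 4.05 \]

95% Confidence Interval:

\[ (\bar{Y}_{\text{small}} - \bar{Y}_{\text{large}}) \pm 1.96 \, SE(\bar{Y}_{\text{small}} - \bar{Y}_{\text{large}}) \]

[3.8, 11.0]

QUESTION: Have we answered the key question: What is the effect of small classes on test scores in California?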
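The California figures can be recomputed from the district-level summary statistics in the Stata output. The sketch below is one way to do so; `mean_difference_ci` is a name made up for the example, and 1.96 is the usual large-sample critical value. Working from unrounded inputs, the results agree with the slide only up to rounding.

```python
import math


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, critical=1.96):
    """Difference in means, standard error, t statistic and confidence interval."""
    diff = mean_a - mean_b
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return diff, se, diff / se, (diff - critical * se, diff + critical * se)


# Districts with small classes vs. districts with larger classes.
diff, se, t, (low, high) = mean_difference_ci(657.35126, 19.358012, 238,
                                              649.97885, 17.853364, 182)
print(f"difference = {diff:.1f}, SE = {se:.1f}, t = {t:.2f}")
print(f"95% CI = [{low:.1f}, {high:.1f}]")
```

The statistics are computed exactly as in the experiment. Whether the resulting number can be read as a causal effect is a separate question that the arithmetic cannot answer.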


Observational Data

Compare characteristics of “treatment” and “control” groups:

                                    Small   Regular   Difference   P-Value+
Free Lunch (%)                       41.6     48.7       -7.1        0.001
English as a second language (%)     12.5     20.0       -7.5        0.001
Average Income (thousand $)          16.3     14.0        2.3        0.001

+: Corresponds to a Z test (t test) where the null hypothesis is that the proportions (means) in the treatment and control groups are the same.

QUESTION: Is the control group here a good estimate of the counterfactual for the treatment group? Why or why not?


Observational Data

Conclusions from analysis of non-experimental data:

- In the context of an ideal experiment, the difference in mean outcomes between treatment and control groups can give us a good estimate of the causal effect of the intervention.
- In the context of an observational study, the difference in mean outcomes across groups does not usually give us a good estimate of the causal effect because the groups are rarely alike.
- We need another tool since experiments are not always available.
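The contrast between the two settings can be illustrated with a small simulation. Everything in it is synthetic and assumed for the example: the data-generating process, the true effect of 5 points, and the way a background characteristic (think family income) drives selection into treatment in the observational case.

```python
import random
import statistics


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return statistics.fmean(t) - statistics.fmean(c)


def simulate(n=20000, true_effect=5.0, seed=1):
    rng = random.Random(seed)
    # Synthetic background characteristic that also raises the outcome.
    background = [rng.gauss(0, 10) for _ in range(n)]

    # Experiment: treatment decided by a coin flip, unrelated to background.
    d_rand = [rng.random() < 0.5 for _ in range(n)]
    # Observational: units with better background are more likely to be treated.
    d_obs = [b + rng.gauss(0, 10) > 0 for b in background]

    def outcome(b, d):
        return 50 + b + true_effect * d + rng.gauss(0, 5)

    y_rand = [outcome(b, d) for b, d in zip(background, d_rand)]
    y_obs = [outcome(b, d) for b, d in zip(background, d_obs)]
    return diff_in_means(y_rand, d_rand), diff_in_means(y_obs, d_obs)


experimental, observational = simulate()
print(f"true effect: 5.0, experimental estimate: {experimental:.2f}, "
      f"observational estimate: {observational:.2f}")
```

Under these assumptions the experimental difference in means lands close to the true effect, while the observational one is far too large because it also picks up the background difference between the groups.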


Observational Data

- Regression analysis can be a powerful tool for assessing causal effects.
- The simplest form of regression (bivariate regression), which we will start with, is not much better for internal validity than this difference of means.
- But regression more generally is a powerful way to get around some of the problems with internal validity when using observational data.
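One way to see why bivariate regression adds little here: when the only regressor is a 0/1 indicator (such as a small-class dummy), the OLS slope equals the difference in group means exactly. The sketch below checks this special case on made-up scores; the data and the helper `ols_slope` are hypothetical and are not taken from the California data set.

```python
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Made-up illustration: 1 = small class, 0 = larger class.
small = [1, 1, 1, 0, 0, 0, 0]
score = [660.0, 655.0, 662.0, 648.0, 651.0, 645.0, 652.0]

slope = ols_slope(small, score)
mean_small = statistics.fmean(s for s, d in zip(score, small) if d == 1)
mean_large = statistics.fmean(s for s, d in zip(score, small) if d == 0)
print(f"OLS slope = {slope:.2f}, difference in means = {mean_small - mean_large:.2f}")
```

Since the two numbers coincide, the regression inherits whatever bias the difference in means has. The gain in internal validity comes only once additional control variables enter the regression.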


Why We Need Econometrics

- Key goal of this course is to develop your ability to assess critically the quality/credibility of empirical studies on health economics.
- Randomized experiments are a benchmark
  - Not always available or possible (expensive, difficult to administer, ethical issues)
  - Not always flawless in providing an answer to the important policy question
  - BUT provide a good estimate of the causal effect when designed and conducted properly


Why We Need Econometrics

- In the absence of a randomized experiment, what can we do?
  - Analyze observational data
    - Surveys (CPS), administrative records
    - Key distinction: Variable or factor of interest not randomly assigned
- Use econometric techniques to conduct regression analysis and look for causal effects


Why We Need Econometrics

- Additional Topics in Evaluating Causal Relationships
  - Issues with randomized experiments
  - Regression-based techniques to assess causal effects
    - Differences-in-differences
    - Fixed effects
    - Instrumental variables
    - Regression discontinuity
