publ0055:introductiontoquantitativemethods lecture2 ... · insured health age female non_white...

PUBL0055: Introduction to Quantitative Methods

Lecture 2: Causality

Jack Blumenau and Benjamin Lauderdale

1 / 49

Motivation

Does health insurance improve health outcomes?A critical issue in public policy is whether and how governments shouldprovide health care to their citizens. But is there a causal effect ofhealth insurance on actual levels of health? We will evaluate thisquestion, and use it as an example to illustrate strengths andweaknesses of experimental and observational research.

• Y (Dependent variable): Health

• What is the self-assessed health of an individual?

• X (Independent variable, or ”treatment”): Health insurance

• Does the individual have health insurance, or not?

2 / 49

Lecture Outline

Causality and counterfactuals

Difference in means and confounding

Randomized experiments and causality

Observational studies and causality

Conclusions

3 / 49

Causality and counterfactuals

Why causality?

• Cause-and-effect relationships are at the heart of some of the mostinteresting social science theories

• We can often express important research topics as a simplequestion: Does X cause Y?

• Does increasing the minimum wage decrease employment?• Does economic development lead to democracy?• Does the race of a politician affect their chances of being elected?

• Establishing causal effects using quantitative data is difficult!

• Today we focus on defining what causal effects are, and how wemight think about measuring them using quantitative data

4 / 49

Counterfactuals

Whenever a person makes a causal statement, they are contrasting whatthey observe (something factual) with what they believe they wouldobserve if a key condition was different (something counterfactual).

• “Austerity caused Brexit” (Fetzer, 2019)

• Observation: Austerity policy in the UK and a vote to leave the EU• Belief: Without austerity, smaller Leave vote

• “Oil wealth inhibits democracy” (Ross, 2013)

• Observation: Oil-rich countries tend to be less democratic• Belief: If countries had less oil they would be more democratic

Problem: We can’t observe counterfactual outcomes! Causal inferencerequires estimating counterfactuals for comparison to realised outcomes.

5 / 49

https://warwick.ac.uk/fac/soc/economics/research/centres/cage/manage/publications/381-2018_fetzer.pdf

https://press.princeton.edu/titles/9686.html

Potential outcomes and causal effects

We can define the causal effect of an independent variable𝑋 on anoutcome variable 𝑌 by considering the potential outcomes of 𝑌Potential outcomesThe potential outcomes of 𝑌 are the values of 𝑌 that would be realisedfor different values of𝑋. e.g.

• 𝑌𝑖(1) = the value that 𝑌𝑖 would have been if𝑋𝑖 was equal to 1

• 𝑌𝑖(0) = the value that 𝑌𝑖 would have been if𝑋𝑖 was equal to 0

Treatment effectFor any given individual, if we could observe both potential outcomes,the treatment effect of X on Y for that individual can be calculated as

𝑌𝑖(1) − 𝑌𝑖(0)

6 / 49

Potential outcomes and causal effects (example)

What are the potential outcomes in our health insurance example?

• 𝑌𝑖(1) = Health of individual 𝑖 if they have health insurance• 𝑌𝑖(0) = Health of individual 𝑖 if they do not have health insurance

What are the treatment effects?

• If 𝑌𝑖(1) > 𝑌𝑖(0) then insurance improves health• If 𝑌𝑖(1) < 𝑌𝑖(0) then insurance worsens health• If 𝑌𝑖(1) = 𝑌𝑖(0) then insurance has no effect on health

7 / 49


• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1

5 3 2

2 1

5 4 1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49




1 1 5

3 2

2 1

5 4 1

3 0

3 3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3

2

2 1

5 4 1

3 0

3 3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1

5 4 1

3 0

3 3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5

4 1

3 0

3 3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4

1

3 0

3 3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0

3 3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0 3

3 0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0 3 3

0

4 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0

4 3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4

3 1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3

1


4 = 1

8 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1


4 = 1

8 / 49

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1

5 ? ?

2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49




1 1 5

? ?

2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ?

?

2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5

? ?

3 0

? 3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ?

?

3 0

? 3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0

? 3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0 ?

3 ?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0 ? 3

?

4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0

? 3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ?

3 ?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ? 3

?


4 = ?

9 / 49




1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ? 3 ?


4 = ?

9 / 49


The Fundamental Problem of Causal InfereceWe only ever observe one potential outcome for a given individual, andour observed outcome depends on the status of our explanatoryvariable. We can never directly observe individual causal effects.

10 / 49

11 / 49

Difference in means and confounding

Quantity of Interest

We want to estimate the average treatment effect (ATE):

1𝑁

𝑁∑𝑖=1

{𝑌𝑖(1) − 𝑌𝑖(0)}

But we can’t observe 𝑌𝑖(1) and 𝑌𝑖(0) for any given unit!One alternative is to use the difference-in-means to estimate the ATE:

𝑌𝑋=1 − 𝑌𝑋=0

where 𝑌𝑋=1 and 𝑌𝑋=0 are the average observed outcomes for treatedand control units, respectively.

Question: Is the difference-in-means equal to the ATE?

12 / 49

Using the difference-in-means to estimate the ATE

Question: Is the difference-in-means likely to equal the ATE?


1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1


4 = 1Difference-in-means= 5+5

2 − 3+32 = 5 − 3 = 2

No! The difference-in-means is larger than the ATE. Why?

13 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1


4 = 1

Difference-in-means= 5+52 − 3+3

2 = 5 − 3 = 2No! The difference-in-means is larger than the ATE. Why?

13 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1



2 − 3+32 = 5 − 3 = 2


13 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1



2 − 3+32 = 5 − 3 = 2

No! The difference-in-means is larger than the ATE.

Why?

13 / 49




1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1



2 − 3+32 = 5 − 3 = 2


13 / 49

Confounding

Confounding differences between treatment and control groups meanthat the difference-in-means can give a biased estimate of the ATE.

1. Causal effect of job training on employment

• Finding: People with training are more likely to be employed• Confounding: People with training may be more motivated• Direction of bias: Positive, we overstate the effectiveness of training

2. Causal effect of aspirin on headache symptoms

• Finding: People who take aspirin report worse headache symptoms• Confounding: People who took aspirin had headaches to begin with!• Direction of bias: Negative, we understate the effectiveness of aspirin

→ confounding bias can be positive or negative!

Question: Is confounding likely in the health care example?

14 / 49

Confounding bias example

Does health insurance improve health outcomes?The National Health Interview Survey (NHIS) is an annual survey of theUS population that asks questions about health and health insurance.We are interested in two questions in particular:

• Y (Dependent variable): health

• “Would you say your health in general is excelent (5), very good (4),good (3), fair (2), or poor (1)?”

• X (Independent variable): insured

• “Do you have health insurance?” TRUE = Insured, FALSE = Notinsured

We will also use information from some of the other questions on thesurvey (gender, income, race, etc).

15 / 49

Confounding bias example

Here are the first six rows of the NHIS data:

Table 4: NHIS data

id insured health age female years_educ non_white income

1 FALSE 4 29 TRUE 14 FALSE 19282.932 FALSE 4 35 FALSE 11 FALSE 19282.933 TRUE 3 32 FALSE 12 FALSE 167844.534 TRUE 3 34 TRUE 16 FALSE 167844.535 TRUE 4 45 FALSE 12 FALSE 85985.786 TRUE 4 44 TRUE 12 FALSE 85985.78

16 / 49

Difference-in-means

To calculate the difference in means between the two groups we will use:

• the mean() function• the subsetting ([]) operator

## Mean health level for insured individualsmean(nhis$health[nhis$insured == TRUE])

## [1] 3.943847

## Mean health level for non-insured individualsmean(nhis$health[nhis$insured == FALSE])

## [1] 3.617647

Insured individuals report health levels that are approximately 10%better than non-insured individuals.

17 / 49

Difference-in-means

Questions:

1. Can we interpret the difference in means here as causal?2. How might we assess the possibility of confounding bias here?

18 / 49

Checking “balance”

One implication of confounding: treatment and control units will bedifferent with respect to characteristics other than the treatment.

We can check this by evaluating the degree of balance for treated andcontrol units on different pre-treatment variables.

1. Calculate ��𝑇=1 (average value of pre-treatment variable for treated)2. Calculate ��𝑇=0 (average value of pre-treatment variable for control)3. Large differences between ��𝑇=1 and ��𝑇=0 suggest confounding

For example, for age:mean(nhis$age[nhis$insured == TRUE]) -

mean(nhis$age[nhis$insured == FALSE])

## [1] 2.318222

→ The insured are, on average, 2.3 years older than the uninsured

19 / 49

Checking “balance”

insured health age female non_white years_educ income

Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6Insured 4.0 43.3 49.2 15.3 14.3 101315.4Difference 0.3 2.3 2.8 -2.4 2.7 58422.9

Insured individuals are• less likely to be non-white (15.3%) than the non-insured (17.8%)• more likely to be female (49.2%) than the non-insured (46.4%)• older (43) than non-insured (41)• more educated (14.3 years) than the non-insured (12 years)• significantly richer ($101315) than the non-insured ($42893)

Implications:1. Race, gender, age, education and income are all imbalanced withrespect to insurance status.

2. Confounding is very likely a problem in this data.20 / 49

21 / 49

Randomized experiments and causality

Observational studies and randomized experiments

Randomized experiments• Units are randomly assigned todifferent X values

• Researchers directly intervenein the world they study

• Gold standard for causalinference

Observational studies• Units are assigned to X values“by nature”

• Researchers observe, but donot intervene in, the world

• Very commonly used in socialscience research

22 / 49

Randomization and selection bias

We saw previously that, in general, the difference-in-means will not be anunbiased estimate of the ATE because of confounding.

Question: How can we avoid this problem?

Answer: Randomize!

Key intuitionRandomization of treatment doesn’t eliminate differences betweenindividuals, but ensures that the mix of individuals being compared isthe same on average.

23 / 49


We saw previously that, in general, the difference-in-means will not be anunbiased estimate of the ATE because of confounding.

Question: How can we avoid this problem? Answer: Randomize!

Key intuitionRandomization of treatment doesn’t eliminate differences betweenindividuals, but ensures that the mix of individuals being compared isthe same on average.

23 / 49


When the treatment,𝑋, is randomly assigned to units…

• … treated and untreated groups will be similar, on average, in termsof all characteristics (both observed and unobserved)

• … the only systematic difference between the two will be the receiptof treatment

• … the average outcome for controls will be similar to what wouldhave happened for the treatment group if they had not been treated

→ we can use the difference in means to estimate the ATE!

24 / 49

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

All units

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


0 5 10 15 20

010

2030

40


Cou

nt

0 5 10 15 20

05

1015

2025

3035

1 sample


Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


0 5 10 15 20

010

2030

40


Cou

nt

0 5 10 15 20

05

1015

2025

3035

2 samples


Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


0 5 10 15 20

010

2030

40


Cou

nt

0 5 10 15 20

05

1015

2025

3035

5 samples


Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


0 5 10 15 20

010

2030

40


Cou

nt

0 5 10 15 20

05

1015

2025

3035

100 samples


Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


0 5 10 15 20

010

2030

40


Cou

nt

0 5 10 15 20

05

1015

2025

3035

1000 samples


Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


0 5 10 15 20

010

2030

40


Cou

nt

0 5 10 15 20

05

1015

2025

3035

6000 samples


Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49


• Randomization of the treatment makes the difference in groupmeans an unbiased estimator of the true ATE

• On average, you can expect a randomized experiment to get theright answer

• This does not guarantee that the answer you get from any particularrandomization will be exactly correct!

26 / 49

Example – Experimental data

Does health insurance improve health outcomes?The RAND Health Insurance Experiment (RAND) was an experimentconducted between 1974 and 1982 in the US. In this experiment,researchers randomly allocated individuals to receive health insurance.

• Y (Dependent variable): health

• “Would you say your health in general is excelent (5), very good (4),good (3), fair (2), or poor (1)?”

• X (Independent variable): insured

• Was the participant randomly allocated to receive health insurance?TRUE = Insured, FALSE = Not insured

We will again use information from some of the other questions on thesurvey (gender, income, race, etc).

27 / 49

Experimental data

Here are the first six rows of the RAND data:

Table 6: RAND data

id insured health age female years_educ non_white income

1 FALSE 4 42 FALSE 12 TRUE 67486.4842 FALSE 4 43 TRUE 12 TRUE 67486.4843 TRUE 2 38 TRUE 12 TRUE 27608.1074 TRUE 3 24 FALSE 12 TRUE 4322.2035 TRUE 4 60 TRUE 9 TRUE 24540.5416 TRUE 4 59 FALSE 4 TRUE 24540.541

28 / 49

Difference-in-means - Experimental data

Let’s calculate the difference in means again using the experimental data:

## Mean health level for insured individualsmean(rand$health[rand$insured == TRUE])

## [1] 3.405373

## Mean health level for non-insured individualsmean(rand$health[rand$insured == FALSE])

## [1] 3.423304

The difference between insured and non-insured individuals almostcompletely disappears in the experimental data!

29 / 49

Results - Experimental data

insured health age female non_white years_educ income

Uninsured 3.4 33.1 56.0 16.9 12.1 32597.2Insured 3.4 33.2 53.4 17.1 12.0 31220.9Difference 0.0 0.1 -2.6 0.2 -0.1 -1376.3

Insured and non-insured individuals• are equally likely to be non-white• are equally likely to be female• have similar ages• have similar years of education• have similar income levels.• and also report very similar health outcomes!

Implications:1. There is less evidence of imbalance in the experimental data2. The experimental effect of insurance is much smaller

30 / 49

Observational and Experimental Results Compared

Health

Age

Female

Non−White

Education

Income

−60 −40 −20 0% difference between non−insured and insured

ExperimentalObservational

31 / 49

Why not always use experiments?

If randomized experiments are the “gold standard” for causal inference,why not always use them?

1. Practical concerns

• It is often infeasible to randomly vary the “treatment” you care about• e.g. Could you randomize the type of electoral system in a country?

2. Ethical concerns

• Experiments involving real people can be ethically dubious• e.g. Manipulating the type of messages people see on Facebook

3. Cost concerns

• Experiments can be expensive in both money and time• Particularly relevant to planning dissertation projects

32 / 49

https://journals.sagepub.com/doi/full/10.1177/1747016115599568

33 / 49

Observational studies and causality

Observational research designs

Researchers using observational data cannot rely on randomization tosolve the selection bias problem so they must employ alternative designs:

1. Cross-sectional designs

• Compare outcomes for treated and control units, possibly adjustingfor pre-treatment characteristics

2. Before and after designs

• Compare outcomes before and after treatment for treated unitsusing over-time data

3. Difference-in-differences designs

• Compare outcomes before and after treatment for treated units, andcompare to the same change over time for control units

34 / 49

Cross-sectional designs

• Data: One observation per unit (i.e. individuals, countries, firms, etc),many units

• Approach: Compare average outcomes between treatment andcontrol units, often “controlling” for potential confounders

• Assumption: No unmeasured confounding between treated andcontrol units

35 / 49

Statistical control (example)

Our observational difference-in-means estimate is relatively large:

mean(nhis$health[nhis$insured == T]) -mean(nhis$health[nhis$insured == F])

## [1] 0.3262003

But insurance status is imbalanced with respect to income:

mean(nhis$income[nhis$insured == T]) -mean(nhis$income[nhis$insured == F])

## [1] 58422.87

How can we control for income?

36 / 49

Subclassification (example)

Subclassification: calculate differences between treatment and controlwithin levels of a confounding variable.

Imagine that we have just 3 levels of income (low, mid and high).

Caluclate the average health of insured and uninsured for each level:

insured_low_mean <- mean(nhis$health[nhis$insured == T &nhis$income_cat == "Low"])

uninsured_low_mean <- mean(nhis$health[nhis$insured == F &nhis$income_cat == "Low"])

insured_low_mean - uninsured_low_mean

## [1] 0.1010293

37 / 49

Subclassification




insured_mid_mean <- mean(nhis$health[nhis$insured == T &nhis$income_cat == "Mid"])

uninsured_mid_mean <- mean(nhis$health[nhis$insured == F &nhis$income_cat == "Mid"])

insured_mid_mean - uninsured_mid_mean

## [1] 0.04954519

37 / 49

Subclassification




insured_high_mean <- mean(nhis$health[nhis$insured == T &nhis$income_cat == "High"])

uninsured_high_mean <- mean(nhis$health[nhis$insured == F &nhis$income_cat == "High"])

insured_high_mean - uninsured_high_mean

## [1] 0.1542666

37 / 49

Subclassification

insured_low_mean - uninsured_low_mean

## [1] 0.1010293

insured_mid_mean - uninsured_mid_mean

## [1] 0.04954519

insured_high_mean - uninsured_high_mean

## [1] 0.1542666

Consequence: Once we control for income, the effects of insurance onhealth are much smaller than in the naive difference-in-meanscomparison.

38 / 49

Cross-sectional designs summary

1. Cross-sectional data (one observation per unit) is very common inthe social sciences

2. Using statistical “control” strategies is a common approach forattempting to overcome confounding bias

3. Strong assumptions required: we must control for all potentialconfounders to be confident in the causal claims we are making

4. We will revisit these designs in weeks 4, 5 and 6, particularly in thecontext of regression models

39 / 49

Before and after designs

• Data: Many observations of treated units over time

• Approach: Difference in average outcomes before and aftertreatment

• Assumption: No time varying confounding/selection bias

• Example: A health care reform that provides all citizens withcoverage at a given point in time.

40 / 49


0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

41 / 49


0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth


Treatment effect?

41 / 49


Advantages:

• Holds unit-specific features “constant” by comparing outcomeswithin units over time

• Only requires data on treated units

Disadvantages:

• Requires assuming that nothing else changed for treated units

• E.g. Maybe health outcomes are just getting better over time?

• Requires data from multiple time points

42 / 49

Difference-in-differences designs

• Data: Many observations of many units, some of which are treatedat some point in time

• Approach: Difference in average outcomes before and aftertreatment for treated group, compared to same difference forcontrol group

• Assumption: Without treatment, treatment group outcomes wouldhave mirrored those in control group (“parallel trends”)

• Example: A health care reform that provides citizens of some stateswith health care coverage at a given point in time. Healthcare ofcitizens in non-treated states is not affected.

43 / 49


0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth


Treated states

44 / 49


0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth


Treated states

Control states

44 / 49


0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth


Treated states

Control states

Counterfactual

44 / 49


0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth


Treatment effect

44 / 49

Difference in differences designs

Advantages:

• Holds unit-specific features “constant” by comparing outcomeswithin units over time

• Removes confounding by shocks that are common to all units(treatment and control)

• Does not assume that outcomes remain constant over time

Disadvantages:

• Requires assuming that nothing else changed only for treated units

• E.g. Did other policies change in treatment states at the same time?

• Requires data from multiple time points and multiple types of units

45 / 49

Observational designs and assumptions

Each of these approaches to using observational data requireassumptions.

• Cross-sectional: No confounding

• Treatment and control groups should be similar in observed andunobserved characteristics

• Before/after: No time-varying confounding

• Without treatment, treatment units would not have changedbetween pre- and post-treatment periods

• Difference-in-differences: Parallel trends

• Without treatment, treatment units would have followed a similartrajectory as control units

46 / 49

External and internal validity

Internal validity: Can we interpret estimates in a study as causal?

• Randomized experiments have high internal validity→ treatmentand control groups are similar on average

• Observational studies have lower internal validity→ pre-treatmentvariables may differ between treatment and control

External validity: How generalizable are the conclusions of a study?

• Randomized experiments tend to have lower external validity→difficult to conduct on representative samples

• Observational studies tend to have higher external validity→easier to use representative samples or the population itself

Picking a research design requires trading off between these goals.

47 / 49

Conclusions

What have we covered?

• Causality can be understood as a process of counterfactualreasoning

• We can understand causal effects as being the differences inpotential outcomes for units with different treatment statuses

• Randomized experiments are the “gold standard” for causalinference

• Observational approaches can also be used for making causalclaims, but they require strong assumptions

• Causal inference is hard. But at least now you know that it is hard.

48 / 49

Seminar

In seminars this week, you will learn about …

1. … working directories

2. … loading data into R

3. … calculating causal effects

4. … thinking critically about causality in empirical research

49 / 49

publ0055:introductiontoquantitativemethods lecture2 ... · insured health age female non_white...

Documents