Case Studies and Regression Analysis
Jason Seawright
August 11, 2010
J. Seawright (PolSci) Essex 2010 August 11, 2010 1 / 44
Examples of Strengths and Weaknesses
Case Selection
1. Study the entire population.
2. Take a random sample.
3. Follow some rule for deliberate case selection.
Mill’s Methods
Method of Agreement
Method of Difference
Etc.
Crucial Cases
A theory assigns high likelihood to an
outcome in a particular case, while all,
most, or at least one major competing
theory assigns that outcome low likelihood.
Regression Analysis
In a survey of 1000 articles in 10 leading
political science journals, 49% used
statistics (Bennett, Barth, and Rutherford
2003).
Presumably, most of them involve some
variant of regression.
A search in JSTOR for the word “regression”
finds 10,404 relevant articles.
Regression Analysis
Almost no matter what you work on, you
will have to interact with regression-based
studies.
Choosing Cases
Case-selection rules:
Random sampling
Typical cases
Diverse cases
Extreme cases
Deviant cases
Influential cases
Most-similar cases
Running Example
Typical Cases
$$\text{Typicality}_i = -\left| y_i - E(y_i \mid x_{1,i}, x_{2,i}, \ldots, x_{k,i}) \right| \tag{1}$$
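As a rough illustration of equation (1), the sketch below scores cases by the negative absolute residual. It assumes that the conditional expectation is estimated by a bivariate least-squares fit; the helper names (`ols_fit`, `typicality`) and the toy data are hypothetical, not taken from the lecture.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of a bivariate least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def typicality(x, y):
    """Typicality_i = -|y_i - fitted_i|; scores closest to 0 are most typical."""
    intercept, slope = ols_fit(x, y)
    return [-abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y)]

# Hypothetical toy data: five cases, one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 13.0]

scores = typicality(x, y)
most_typical = max(range(len(x)), key=lambda i: scores[i])
most_deviant = min(range(len(x)), key=lambda i: scores[i])
print(most_typical, most_deviant)
```

With more than one predictor, the same scoring would apply to the residuals of a multiple regression.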
Extreme Cases
$$\text{Extremity}_i = \left| \frac{x_i - \bar{x}}{s} \right| \tag{2}$$
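Equation (2) is an absolute z-score. A minimal sketch follows, assuming that s is the sample standard deviation; the variable names and the growth figures are invented for illustration.

```python
from statistics import mean, stdev

def extremity(values):
    """Extremity_i = |(x_i - mean) / s|, with s the sample standard deviation."""
    x_bar, s = mean(values), stdev(values)
    return [abs((v - x_bar) / s) for v in values]

# Hypothetical toy data: growth rates (percent) for six cases.
growth = [1.5, 2.0, 2.5, 3.0, 2.0, -6.0]

scores = extremity(growth)
most_extreme = max(range(len(growth)), key=lambda i: scores[i])
print(most_extreme, round(scores[most_extreme], 2))
```

The same function can be applied to X or to Y, depending on which variable the extreme-case rule targets.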
Deviant Cases
$$\text{Deviantness}_i = -\text{Typicality}_i \tag{3}$$
Influential Cases
Cook’s distance is a statistical measure of
how much the overall regression result
would change if a given case were deleted.
A Cook’s distance of 1 or more is usually
regarded as indicating substantial
influence.
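A minimal sketch of this idea for a regression with one predictor, using the standard closed form D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2) with p = 2 estimated coefficients. The function name and the toy data are hypothetical, and the cutoff of 1 is the rule of thumb quoted above.

```python
from statistics import mean

def cooks_distance(x, y):
    """Cook's distance for each case in a bivariate least-squares fit."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]

# Hypothetical toy data: the last case sits far from the others on x.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.8, 4.0]

d = cooks_distance(x, y)
influential = [i for i, di in enumerate(d) if di >= 1.0]
print(influential)
```

Note that the flagged case has a modest residual but very high leverage, which is why it dominates the fit.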
Most-Similar Cases
Matching techniques are an automated way
of finding most-similar cases.
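One simple way to automate this is nearest-neighbor matching. The sketch below assumes a binary treatment indicator and uses Euclidean distance on standardized covariates; this distance is a stand-in chosen for brevity (Mahalanobis or propensity-score matching are common alternatives), and all names and data are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

def most_similar_pair(treated, covariates):
    """Return (distance, i, j) for the closest treated/control pair."""
    cols = list(zip(*covariates))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]
    # Standardize each covariate so that no single scale dominates.
    z = [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in covariates]
    t_idx = [i for i, t in enumerate(treated) if t]
    c_idx = [i for i, t in enumerate(treated) if not t]
    return min(
        (sqrt(sum((a - b) ** 2 for a, b in zip(z[i], z[j]))), i, j)
        for i, j in product(t_idx, c_idx)
    )

# Hypothetical toy data: (income index, years of democracy) per case.
treated = [1, 1, 0, 0, 0]
covariates = [
    (2.0, 10.0),
    (8.0, 40.0),
    (2.2, 12.0),
    (5.0, 25.0),
    (9.5, 15.0),
]

dist, i, j = most_similar_pair(treated, covariates)
print(i, j, round(dist, 3))
```

The returned pair differs on the treatment but is as alike as possible on the measured background variables.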
Measurement Error in Y
$$Y^*_i = Y_i + \delta_{Y,i}$$
Random Sampling
Measurement Error in Y
Typical/Deviant Cases:
$$e_i = Y_i - H_{i,\cdot} Y + \delta_{Y,i}$$
Measurement Error in Y
Influential Cases Strategy:
Maximizes the product of the error term and the
weighted average distance of the right-hand-side
variables from their means.
Measurement Error in Y
Extreme Cases:
$$Y^*_i = Y_i + \delta_{Y,i}$$
Measurement Error in Y
Most-Similar Cases
Most-Different Cases
Measurement Error in X
$$X^*_i = X_i + \delta_{X,i}$$
Random Sampling
Measurement Error in X
Typical/Deviant Cases:
$$e_i = Y_i - X_i \beta^* - \delta_{X,i} \beta^*$$
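The residual expression above follows in one step if, as this sketch assumes, β* denotes the coefficient obtained by regressing Y on the mismeasured X*:

```latex
\begin{align*}
e_i &= Y_i - X^*_i \beta^* \\
    &= Y_i - (X_i + \delta_{X,i}) \beta^* \\
    &= Y_i - X_i \beta^* - \delta_{X,i} \beta^*
\end{align*}
```

Under that reading, a case with a large residual may owe it partly to the measurement error term scaled by β*.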
Measurement Error in X
Influential Cases Strategy:
Maximizes the product of the error term and the
weighted average distance of the right-hand-side
variables from their means.
Measurement Error in X
Extreme Cases:
$$X^*_i = X_i + \delta_{X,i}$$
Measurement Error in X
Most-Similar Cases
Most-Different Cases
Omitted Variables
$$e_i = d_i + \gamma \tilde{Z}_i, \quad \text{where } \tilde{Z}_i = Z_i - E(Z_i \mid X_i)$$
Random Sampling
Omitted Variables
Typical/Deviant Cases:
$$e_i = d_i + \gamma \tilde{Z}_i$$
Omitted Variables
Influential Cases Strategy:
Maximizes the product of the error term and the
weighted average distance of the right-hand-side
variables from their means.
Omitted Variables
Extreme Cases:
For confounders, extreme on X may be a good
strategy.
Extreme on Y maximizes:
$$\hat{Y}_i + d_i + \gamma \tilde{Z}_i$$
Omitted Variables
Most-Similar Cases
Most-Different Cases
Pathway Variables
$$W_i = \nu + \mu X_i + \omega_i$$
$$Y_i = \alpha + \tau W_i + \sigma_i$$
Random Sampling
Pathway Variables
Typical/Deviant Cases:
$$e_i = \tau \omega_i + \sigma_i$$
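This residual can be recovered by substituting the equation for W into the equation for Y, assuming the analyst regresses Y on X alone and leaves the pathway variable W out:

```latex
\begin{align*}
Y_i &= \alpha + \tau W_i + \sigma_i \\
    &= \alpha + \tau (\nu + \mu X_i + \omega_i) + \sigma_i \\
    &= (\alpha + \tau \nu) + \tau \mu X_i + \underbrace{\tau \omega_i + \sigma_i}_{e_i}
\end{align*}
```

The intercept and slope absorb the systematic part, so the error term carries the pathway disturbance scaled by τ plus the outcome disturbance.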
Pathway Variables
Influential Cases Strategy:
Maximizes the product of the error term and the
weighted average distance of the right-hand-side
variables from their means.
Pathway Variables
Extreme Cases:
$$W_i = \nu + \mu X_i + \omega_i$$
Extreme on Y maximizes:
$$Y_i = \alpha + \tau W_i + \sigma_i$$
Pathway Variables
Most-Similar Cases
Most-Different Cases
Summary: Analytic Arguments
             Deviant  Influential  Ext. X  Ext. Y
Error in Y   Good     Mixed        Poor    Good
Error in X   Mixed    Mixed        Good    Poor
Confound     Good     Mixed        Mixed   Good
Pathway      Good     Mixed        Good    Mixed
Monte Carlo for Case Selection
Simulate case selection for the same problem
10,000 times.
Analysis of presidential vote shares and the
economy in Latin America, 1980-2000.
Add measurement error, omitted variables,
etc.
2 SD Rule
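A minimal sketch of this kind of simulation, under several assumptions that are not from the lecture: the data are synthetic rather than the Latin American vote-share data, only the random and deviant rules are compared, the number of replications is reduced to 2,000 to keep the run short, and the "2 SD Rule" is read as counting a selection as successful when the chosen case lies more than two standard deviations from the mean on the omitted variable.

```python
import random
from statistics import mean, stdev

def simulate(reps=2000, n=50, seed=1):
    """Share of replications in which each rule picks a case that is
    more than 2 SD from the mean on the omitted variable z."""
    rng = random.Random(seed)
    hits = {"random": 0, "deviant": 0}
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        z = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable
        y = [1 + 0.5 * xi + zi + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]
        # Least-squares fit of y on x only, leaving z out of the model.
        x_bar, y_bar = mean(x), mean(y)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        intercept = y_bar - slope * x_bar
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        picks = {
            "random": rng.randrange(n),
            "deviant": max(range(n), key=lambda i: abs(resid[i])),
        }
        z_bar, z_sd = mean(z), stdev(z)
        for rule, i in picks.items():
            if abs(z[i] - z_bar) > 2 * z_sd:  # assumed reading of the 2 SD rule
                hits[rule] += 1
    return {rule: h / reps for rule, h in hits.items()}

rates = simulate()
print(rates)
```

Other rules (extreme on X or Y, influential, most-similar) could be added as further entries in `picks`, and measurement error could be injected before the fit.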
Simulation Results
Six plots (images not recovered) compare the case-selection rules Random, Typical, Influential, Deviant, Extreme Y, Extreme X, Similar, Different, and Contrast:
Figure: Case Selection for Finding Confounder. (axis ticks 0.00 to 0.20)
Figure: Case Selection for Other Causes. (axis ticks 0.00 to 0.30)
Figure: Case Selection for Exploring Mechanisms. (axis ticks 0.00 to 0.35)
Figure: Case Selection for Error in X. (axis ticks 0.00 to 0.30)
Figure: Case Selection for Error in Y. (axis ticks 0.0 to 0.8)
Figure: Case Selection for Estimating Overall Slope. (axis ticks 0.0 to 1.2)
Case-selection software in R
Assignment
Implement each case-selection technique for a
data set of interest, or for one from my website.
Be prepared to discuss tomorrow what kinds of
cases you get and whether they seem, at first
glance, to be useful.