
Quantitative Methods I: Regression diagnostics

Johan A. Elkink

University College Dublin

10 December 2014


Outline

1 Assumptions and errors

2 Leverage and influence

3 Multicollinearity

4 Heteroscedasticity


Assumptions: specification

- Linear in parameters, i.e. f(Xβ) = Xβ and E[y] = Xβ
- No extraneous variables in X
- No omitted independent variables
- Parameters to be estimated are constant
- Number of parameters is less than the number of cases, k < n


Assumptions: errors

- Errors have an expected value of zero, E[ε|X] = 0
- Errors are normally distributed, ε ∼ N(0, σ²)
- Errors have a constant variance, Var(ε|X) = σ² < ∞
- Errors are not autocorrelated, Cov(εi, εj|X) = 0 for all i ≠ j
- Errors and X are uncorrelated, Cov(X, ε) = 0


Assumptions: regressors

- X varies
- X is of full column rank (note: requires k < n)
- No measurement error in X
- No endogenous variables in X


Assumptions for unbiasedness

If

- the population regression model is linear in its parameters;
- the sample is a random sample from the population;
- there is no perfect collinearity, r(X) = k;
- and the expected value of the error term is zero conditional on the explanatory variables, E(ε|X) = 0 and Cov(ε, X) = 0,

then the OLS estimators of β are unbiased.

(Glynn, 2007)


Non-constant error variance

Consequences:

- σ̂² is a biased estimator of σ²;
- the probability of a Type I error will not be α;
- the least squares estimator is no longer the best linear unbiased estimator,

but:

- the severity depends on the level of heteroscedasticity;
- β̂ is still an unbiased estimator of β.

(King, 2007)


Non-normal errors

Consequences:

- the sampling distribution of β̂ is not normal;
- test statistics will not have t- and F-distributions;
- the probability of a Type I error will not be α,

but the estimates are still consistent: as n increases, the above problems disappear.

(King, 2007)


Exercise: film reviews

Open the films.dta data set. Create a new variable highrating, which is 1 for films rated 3 or higher, 0 otherwise.

Using matrix formulas,

1 regress desclength on a constant
2 regress desclength on castsize
3 regress desclength on castsize, highrating, length
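A minimal R sketch of these matrix computations (the Stata file is read with the foreign package; the name of the raw rating variable, here rating, is an assumption about films.dta):

library(foreign)                            # read.dta() reads Stata .dta files
films <- read.dta("films.dta")
films$highrating <- ifelse(films$rating >= 3, 1, 0)   # "rating" is an assumed column name

y  <- films$desclength
X1 <- matrix(1, nrow = length(y))                     # constant only
X2 <- cbind(1, films$castsize)                        # constant + castsize
X3 <- cbind(1, films$castsize, films$highrating, films$length)

solve(t(X1) %*% X1) %*% t(X1) %*% y    # (X'X)^(-1) X'y; equals mean(desclength)
solve(t(X2) %*% X2) %*% t(X2) %*% y
solve(t(X3) %*% X3) %*% t(X3) %*% y    # coefficients for the full model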


Exercise: film reviews

Based on the last regression:

1 Which observation has the largest residual?
2 Compute mean and median of residuals
3 Compute correlation between residuals and fitted values
4 Compute correlation between residuals and length
5 All other predictors held constant, what would be the difference in predicted description length between high and low rated movies?
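A sketch of these computations in R, assuming the last regression has been fitted with lm() as m (e.g. m <- lm(desclength ~ castsize + highrating + length, data = films)):

e <- residuals(m)
which.max(abs(e))           # 1: observation with the largest (absolute) residual
mean(e); median(e)          # 2: the mean is zero by construction; the median need not be
cor(e, fitted(m))           # 3: zero by construction
cor(e, films$length)        # 4: zero by construction, since length is in the model
coef(m)["highrating"]       # 5: predicted difference, other predictors held constant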


Non-linearity

If there is non-linearity in the variables, but not in the parameters, there is no problem. E.g.

yi = β0 + β1xi + β2xi² + εi

can be estimated with OLS.

If there are other non-linearities, sometimes the equation can be transformed. E.g.

yi = β0 xi^β1 εi

log(yi) = log(β0) + β1 log(xi) + log(εi)

yi* = β0* + β1xi* + εi*
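A small simulated R example of this transformation: data generated from yi = β0 xi^β1 εi can be fitted by OLS after taking logs (all names and numbers here are illustrative placeholders).

set.seed(1)
x <- runif(200, 1, 10)
y <- 2 * x^1.5 * exp(rnorm(200, sd = 0.2))   # beta0 = 2, beta1 = 1.5, multiplicative error
m <- lm(log(y) ~ log(x))
coef(m)                                      # intercept estimates log(2), slope estimates 1.5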


Functional forms for additional non-linear transformations

- log-linear: as with the previous example
- semi-log has two forms:
  yi = β0 + β1 log(xi), where β1 is the ∆y due to a %∆x
  log(yi) = β0 + β1xi, where β1 is the %∆y due to a ∆x
- inverse or reciprocal: yi = β0 + β1 (1/xi)
- polynomial: yi = β0 + β1xi + β2xi²
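In R, each of these forms can be fitted with lm() and a transformed formula; a sketch with placeholder names x, y and dat:

lm(y ~ log(x), data = dat)        # semi-log: change in y for a relative change in x
lm(log(y) ~ x, data = dat)        # semi-log: relative change in y for a unit change in x
lm(y ~ I(1/x), data = dat)        # inverse / reciprocal
lm(y ~ x + I(x^2), data = dat)    # polynomial (quadratic)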


Leverage

A high leverage point i is one where xi is far from the mean of X. These points can be identified using the so-called hat matrix, the matrix that puts a hat on y:

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy,

the diagonal of which is a measure of leverage.

(King, 2007)
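In R, the diagonal of the hat matrix for a fitted lm object m can be extracted directly; the 2k/n cut-off below is a common rule of thumb, not from the slides:

h <- hatvalues(m)                  # leverage h_i = diagonal of H
# equivalently: diag(X %*% solve(t(X) %*% X) %*% t(X)) for the design matrix X
which(h > 2 * mean(h))             # mean(h) = k/n, so this flags h_i > 2k/n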


Outliers

An outlier is a point whose residual from the regression line is large.

To account for differences in the sampling variances of the residuals, we calculate externally studentized residuals (or studentized deleted residuals), where a large absolute value indicates an outlier. A test could be based on the fact that in a model without outliers, they should follow a t(n − k) distribution.

(Kutner et al., 2005, 390–398)
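In R, externally studentized (deleted) residuals are given by rstudent(); a sketch of a Bonferroni-style outlier check along the lines described above, for a fitted lm object m (the exact degrees of freedom and cut-off are one common choice, not taken from the slides):

t_del <- rstudent(m)                                        # studentized deleted residuals
n <- nrow(model.frame(m))
cutoff <- qt(1 - 0.05 / (2 * n), df = df.residual(m) - 1)   # Bonferroni-adjusted t quantile
which(abs(t_del) > cutoff)                                  # flagged outliers (possibly none)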


Influence

An influential point is one which has a strong impact on the estimation of β: it has high leverage and is also an outlier.

We typically look at Cook’s Distance to assess the level of influence of each observation.

(King, 2007)


Leverage and influence

A point with high leverage is located far from the other points. A high leverage point that strongly influences the regression line is called an influential point.


Outlier, low leverage, low influence

[scatterplot of y against x]


High leverage, low influence

[scatterplot of y against x]


High leverage, high influence

[scatterplot of y against x]


Cook’s Distance

Di ≡ Σ_{j=1}^{n} (ŷj − ŷj(−i))² / (ks²)
   = (β̂OLS(−i) − β̂OLS)′ X′X (β̂OLS(−i) − β̂OLS) / (ks²)
   = (ei / (s√(1 − hi)))² · hi / (k(1 − hi))
   = (ti²/k) · var(ŷi)/var(ei)
   ∼ F(k, n − k)

The F-test here refers to whether β̂OLS would be significantly different if observation i were to be removed (H0: β = β(−i)).

(Cook, 1979, 168)


Cook’s Distance

Di = (ti²/k) · var(ŷi)/var(ei)

“ti² is a measure of the degree to which the ith observation can be considered as an outlier from the assumed model.”

“The ratios var(ŷi)/var(ei) measure the relative sensitivity of the estimate, β̂OLS, to potential outlying values at each data point.”

(Cook, 1977, 16)
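In R, Cook’s distances for a fitted lm object m are available directly; comparing Di with F(k, n − k) percentiles, as in Cook (1977), is one way to judge their size (the 50th percentile below is one common choice):

d <- cooks.distance(m)
plot(m, which = 4)                                   # Cook's distance per observation
k <- length(coef(m))
which(d > qf(0.50, df1 = k, df2 = df.residual(m)))   # observations beyond the F(k, n - k) median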


What to do with outliers?

Options:

1 Ignore the problem
2 Investigate why the data are outliers: what makes them unusual?
3 Consider respecifying the model, either by transforming a variable or by including an additional variable (but beware of overfitting)
4 Consider a variant of “robust regression” that downweights outliers (a sketch follows below)
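One such variant is M-estimation, available as rlm() in the MASS package; a sketch with placeholder formula and data:

library(MASS)
m_rob <- rlm(y ~ x1 + x2, data = dat)   # iteratively downweights observations with large residuals
summary(m_rob)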


Diagnosing problems in R

A very easy set of diagnostic plots can be accessed by plotting an lm object, using plot.lm().

This produces, in order:
1 residuals against fitted values
2 normal Q-Q plot
3 scale-location plot of √|ei| against fitted values
4 Cook’s distances versus row labels
5 residuals against leverages
6 Cook’s distances against leverage/(1 − leverage)

Note that by default, plot.lm() only gives you 1, 2, 3, 5.
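For example, all six plots can be requested explicitly for a fitted lm object m:

par(mfrow = c(2, 3))        # 2 x 3 grid of panels
plot(m, which = 1:6)        # plot(m) alone shows panels 1, 2, 3 and 5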


Exercise

Open the uswages.dta data set and regress log(wage) on educ, exper and race.

Check for leverage, outliers, influential points and nonlinearities.
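A possible starting point in R (reading the Stata file with the foreign package):

library(foreign)
uswages <- read.dta("uswages.dta")
m <- lm(log(wage) ~ educ + exper + race, data = uswages)

plot(m)                                           # residual, Q-Q, scale-location and leverage plots
head(sort(hatvalues(m), decreasing = TRUE))       # highest-leverage observations
head(sort(abs(rstudent(m)), decreasing = TRUE))   # largest studentized residuals
head(sort(cooks.distance(m), decreasing = TRUE))  # most influential observations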


Collinearity

When some variables are linear combinations of others then we have exact (or perfect) collinearity, and there is no unique least squares estimate of β.

When X variables are highly correlated, we have multicollinearity.

Detecting multicollinearity:

- look at the correlation matrix of predictors for pairwise correlations
- regress each independent variable on all other independent variables to produce Rj², and look for high values (close to 1.0)
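A sketch of both checks in R, for a data frame dat with predictors x1, x2 and x3 (placeholder names):

cor(dat[, c("x1", "x2", "x3")])                    # pairwise correlations
summary(lm(x1 ~ x2 + x3, data = dat))$r.squared    # auxiliary R^2 for x1; repeat for each predictor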


Multicollinearity

The extent to which multicollinearity is a problem is debatable.

The issue is comparable to that of sample size: if n is too small, we have difficulty picking up effects even if they really exist; the same holds for variables that are highly multicollinear, making it difficult to separate their effects on y.


Multicollinearity

However, some problems with high multicollinearity:

- Small changes in data can lead to large changes in estimates
- High standard errors but joint significance
- Coefficients may have “wrong” sign or implausible magnitudes

(Greene, 2002, 57)


Variance of β̂OLS

var(β̂OLSk) = σ² / [(1 − Rk²) Σi (xik − x̄k)²]

- σ²: all else equal, the better the fit, the lower the variance
- (1 − Rk²): all else equal, the lower the R² from regressing the kth independent variable on all other independent variables, the lower the variance
- Σi (xik − x̄k)²: all else equal, the more variation in x, the lower the variance

(Greene, 2002, 57)


Variance Inflation Factor

var(β̂OLSk) = σ² / [(1 − Rk²) Σi (xik − x̄k)²]

VIFk = 1 / (1 − Rk²),

thus VIFk shows the increase in var(β̂OLSk) due to the variable being collinear with other independent variables.

library(faraway)

vif(lm(...))
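The same quantity can be computed by hand from the auxiliary regression, e.g. for a predictor x1 regressed on the other predictors x2 and x3 (placeholder names):

r2_1 <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared
1 / (1 - r2_1)                                     # VIF for x1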


Multicollinearity: solutions

- Check for coding or logical mistakes (esp. in cases of perfect multicollinearity)
- Increase n
- Remove one of the collinear variables (apparently not adding much)
- Combine multiple variables in indices or underlying dimensions
- Formalise the relationship


Exercise

Using demdev.dta data and model

polity2i = β0 + β1 cwari + β3 laggdppci + β4 propdemi + β5 energy2i + εi,

check whether there are any multicollinearity problems.
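A possible starting point in R, assuming demdev.dta contains columns named as in the model above:

library(foreign)
library(faraway)
demdev <- read.dta("demdev.dta")
m <- lm(polity2 ~ cwar + laggdppc + propdem + energy2, data = demdev)

cor(demdev[, c("cwar", "laggdppc", "propdem", "energy2")], use = "pairwise.complete.obs")
vif(m)                  # variance inflation factors for each predictor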


Homoscedasticity


Heteroscedasticity


Heteroscedasticity

Regression disturbances whose variances are not constant across observations are heteroscedastic.

Under heteroscedasticity, the OLS estimators remain unbiased and consistent, but are no longer BLUE or asymptotically efficient.

(Thomas, 1985, 94)


Causes of heteroscedasticity

- More variation for larger sizes (e.g. profits of firms vary more for larger firms)
- More variation across different groups in the sample
- Learning effects in time series
- Variation in data collection quality (e.g. historical data)
- Turbulence after shocks in time series (e.g. financial markets)
- Omitted variable
- Wrong functional form
- Aggregation with varying sizes of populations
- etc.


Heteroscedasticity

Since OLS is no longer BLUE or asymptotically efficient,

- other linear unbiased estimators exist which have smaller sampling variances;
- other consistent estimators exist which collapse more quickly to the true values as n increases;
- we can no longer trust hypothesis tests, because var(β̂OLS) is biased:
  if cov(Xi², σi²) > 0, then var(β̂OLS) is underestimated
  if cov(Xi², σi²) = 0, then there is no bias in var(β̂OLS)
  if cov(Xi², σi²) < 0, then var(β̂OLS) is overestimated (inefficient)

(Thomas, 1985, 94–95)

(Judge et al., 1985, 422)
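A small simulation sketch of the first case above (error variance increasing with x, so cov(Xi², σi²) > 0): the usual OLS standard error understates the true sampling variation of the slope estimate. All numbers here are arbitrary.

set.seed(42)
R <- 2000; n <- 100
x <- runif(n, 0, 10)
b_hat <- se_hat <- numeric(R)
for (r in 1:R) {
  y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)    # error sd grows with x
  m <- lm(y ~ x)
  b_hat[r]  <- coef(m)[2]
  se_hat[r] <- summary(m)$coefficients[2, 2]
}
sd(b_hat)         # true sampling standard deviation of the slope estimate
mean(se_hat)      # average reported OLS standard error (smaller here)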


Residual plots: heteroscedasticity

To detect heteroscedasticity (unequal variances), it is useful to plot:

- Residuals against fitted values
- Residuals against the dependent variable
- Residuals against the independent variable(s)

Usually, the first one is sufficient to detect heteroscedasticity, and can simply be found by:

m <- lm(y ~ x)

plot(m)
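The plots can be complemented with a formal check; a sketch using the Breusch–Pagan test from the lmtest package (a supplement, not covered on these slides):

library(lmtest)
bptest(m)          # Breusch-Pagan test: H0 of constant error variance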


Residual plots: heteroscedasticity

[scatterplot of y against x]

m <- lm(y ~ x)


Residual plots: heteroscedasticity


plot(residuals(m) ~ fitted(m))


Residual plots: heteroscedasticity


plot(residuals(m) ~ y)


Residual plots: heteroscedasticity


plot(residuals(m) ~ x)


Residual plots: homoscedasticity

[scatterplot of y against x]

m <- lm(y ~ x)


Residual plots: homoscedasticity


plot(residuals(m) ~ fitted(m))


Residual plots: homoscedasticity


plot(residuals(m) ~ y)


Residual plots: homoscedasticity


plot(residuals(m) ~ x)


Residual plots: heteroscedasticity

[scatterplot of y against x]

m <- lm(y ~ x)


Residual plots: heteroscedasticity


plot(residuals(m) ~ fitted(m))


Residual plots: heteroscedasticity


plot(residuals(m) ~ y)


Residual plots: heteroscedasticity


plot(residuals(m) ~ x)


References

Cook, R. Dennis. 1977. “Detection of influential observation in linear regression.” Technometrics pp. 15–18.

Cook, R. Dennis. 1979. “Influential observations in linear regression.” Journal of the American Statistical Association 74(365):169–174.

Glynn, Adam. 2007. “GOV 2000: Quantitative Methodology for Political Science I.” Lecture slides, Harvard University.

Greene, William H. 2002. Econometric analysis. London: Prentice Hall.

Judge, George G., William E. Griffiths, R. Carter Hill, Helmut Lütkepohl and Tsoung-Chao Lee. 1985. The Theory and Practice of Econometrics. New York: John Wiley and Sons.

King, Gary. 2007. “GOV 2000: Quantitative Methodology for Political Science I.” Lecture slides, Harvard University.

Kutner, Michael H., Christopher J. Nachtsheim, John Neter and William Li. 2005. Applied linear statistical models. 5th ed. McGraw-Hill.

Thomas, R. Leighton. 1985. Introductory econometrics: theory and applications. Harlow, Essex: Longman.
