
  • Gov 2000 - 8. Regression with Two Independent Variables

    Matthew Blackwell, November 3, 2015

    1 / 62

  • 1. Why add variables to the regression?

    2. Adding a binary variable

    3. Adding a continuous variable

    4. OLS mechanics with 2 variables

    5. OLS assumptions & inference with 2 variables

    6. Omitted Variable Bias

    7. Multicollinearity

    2 / 62

  • Announcements

    • Midterm:
      ▶ Mean: 23 out of 30 (rescaled for 1000/2000)
      ▶ SD: 5.84
      ▶ Excellent job! We're really happy with the scores!
      ▶ Grades coming this week

    • Matrix algebra (Wooldridge, Appendix D)

    3 / 62

  • Where are we? Where are we going?

    [Figure: axis of "Number of Covariates in Our Regressions" from 0 to 10, marking Last Week]

    4 / 62

  • Where are we? Where are we going?

    [Figure: axis of "Number of Covariates in Our Regressions" from 0 to 10, marking Last Week and This Week]

    5 / 62

  • Where are we? Where are we going?

    [Figure: axis of "Number of Covariates in Our Regressions" from 0 to 10, marking Last Week, This Week, and Next Week]

    6 / 62

  • 1/ Why add variables to the regression?

    7 / 62

  • 8 / 62

  • Berkeley gender bias

    • Graduate admissions data from Berkeley, 1973
    • Acceptance rates:
      ▶ Men: 8442 applicants, 44% admission rate
      ▶ Women: 4321 applicants, 35% admission rate

    • Evidence of discrimination toward women in admissions?
    • This is a marginal relationship
    • What about the conditional relationship within departments?

    9 / 62

  • Berkeley gender bias, II

    • Within departments:

      Dept   Men Applied   Men Admitted   Women Applied   Women Admitted
      A          825           62%             108             82%
      B          560           63%              25             68%
      C          325           37%             593             34%
      D          417           33%             375             35%
      E          191           28%             393             24%
      F          373            6%             341              7%

    • Within departments, women do somewhat better than men!
    • Women apply to more challenging departments.
    • Marginal relationship (admissions and gender) ≠ conditional relationship given a third variable (department)

    10 / 62
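
    As a rough check on the pattern above, here is a minimal R sketch that pools only the six departments in the table (the campus-wide totals of 8442 and 4321 applicants cover additional departments, so the pooled rates are illustrative): women do as well as or better than men within most departments, yet their pooled admission rate is much lower because they apply where admission rates are low.

    ## admission figures for the six departments shown above
    men   <- data.frame(applied = c(825, 560, 325, 417, 191, 373),
                        rate    = c(.62, .63, .37, .33, .28, .06))
    women <- data.frame(applied = c(108,  25, 593, 375, 393, 341),
                        rate    = c(.82, .68, .34, .35, .24, .07))

    ## marginal (pooled) rates: weight each department's rate by its applications
    weighted.mean(men$rate, men$applied)      # roughly 0.45
    weighted.mean(women$rate, women$applied)  # roughly 0.30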

  • Simpson's paradox

    [Figure: scatterplot of Y against X (X from 0 to 4, Y from -3 to 1), with points split into groups Z = 0 and Z = 1]

    • Overall a positive relationship between Y_i and X_i here

    11 / 62

  • Simpson's paradox

    [Figure: same scatterplot of Y against X, split into groups Z = 0 and Z = 1]

    • Overall a positive relationship between Y_i and X_i here
    • But within levels of Z_i, the opposite

    12 / 62

  • Basic idea

    • Old goal: estimate the mean of Y as a function of some independent variable, X:

      E[Y_i | X_i]

    • For continuous X's, we modeled the CEF/regression function with a line:

      Y_i = β0 + β1 X_i + u_i

    • New goal: estimate the relationship of two variables, Y_i and X_i, conditional on a third variable, Z_i:

      Y_i = β0 + β1 X_i + β2 Z_i + u_i

    • The β's are the population parameters we want to estimate

    13 / 62

  • Why control for another variable?

    • Descriptive
      ▶ Get a sense for the relationships in the data.
      ▶ Conditional on the number of steps I've taken, do higher activity levels correlate with less weight?

    • Predictive
      ▶ We can usually make better predictions about the dependent variable with more information on independent variables.

    • Causal
      ▶ Block potential confounding, which is when X doesn't cause Y but only appears to because a third variable Z causally affects both of them.
      ▶ X_i: ice cream sales on day i
      ▶ Y_i: drowning deaths on day i
      ▶ Z_i: ??

    14 / 62

  • Plan of attack

    1. Adding a binary Z_i
    2. Adding a continuous Z_i
    3. Mechanics of OLS with 2 covariates
    4. OLS assumptions with 2 covariates:
       ▶ Omitted variable bias
       ▶ Multicollinearity

    15 / 62

  • What we wonā€™t cover in lecture

    1. The OLS formulas for 2 covariates
    2. Proofs
    3. The second covariate being a function of the first: Z_i = X_i²
    4. Goodness of fit

    16 / 62

  • 2/ Adding a binary variable

    17 / 62

  • Example

    [Figure: scatterplot of Log GDP per capita (4 to 11) against Strength of Property Rights (0 to 10), distinguishing African and non-African countries]

    18 / 62

  • Basics

    • Ye olde model:

      Y_i = β0 + β1 X_i

    • Z_i = 1 to indicate that i is an African country
    • Z_i = 0 to indicate that i is a non-African country
    • Concern: AJR might be picking up an "African effect":
      ▶ African countries might have low incomes and weak property rights
      ▶ "Control for" a country being in Africa or not to remove this
      ▶ Effects are now within Africa or within non-Africa, not between

    • New model:

      Y_i = β0 + β1 X_i + β2 Z_i

    19 / 62

  • AJR model

    • Let's see an example with the AJR data:

    ## outcome: log GDP per capita (variable name assumed here)
    ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
    summary(ajr.mod)
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   5.6556     0.3134   18.04
    ## ...

    20 / 62

  • Two lines, one regression

    • How can we interpret this model?
    • Plug in the two possible values of Z_i and rearrange
    • When Z_i = 0:

      Y_i = β0 + β1 X_i + β2 Z_i
          = β0 + β1 X_i + β2 × 0
          = β0 + β1 X_i

    • When Z_i = 1:

      Y_i = β0 + β1 X_i + β2 Z_i
          = β0 + β1 X_i + β2 × 1
          = (β0 + β2) + β1 X_i

    • Two different intercepts, same slope (see the sketch after this slide)

    21 / 62
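
    One way to see the "two intercepts, same slope" point is to fit the model on simulated data and compare predictions at Z_i = 0 and Z_i = 1; this is a minimal sketch with made-up data and hypothetical names, not the AJR regression itself.

    set.seed(1)
    n <- 200
    z <- rbinom(n, 1, 0.5)                    # binary covariate
    x <- rnorm(n)                             # continuous covariate
    y <- 5.7 + 0.4 * x - 0.9 * z + rnorm(n)   # two intercepts, one slope by construction

    fit <- lm(y ~ x + z)
    coef(fit)                                 # estimates of beta0, beta1, beta2

    ## predictions at x = 0 and x = 1: the two lines share a slope,
    ## and their intercepts differ by the estimate of beta2
    predict(fit, newdata = data.frame(x = c(0, 1), z = 0))
    predict(fit, newdata = data.frame(x = c(0, 1), z = 1))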

  • Interpretation of the coefficients

    • Let's review what we've seen so far:

                                      Intercept for X_i   Slope for X_i
      Non-African country (Z_i = 0)   β0                  β1
      African country (Z_i = 1)       β0 + β2             β1

    • In this example, we have:

      Y_i = 5.656 + 0.424 × X_i − 0.878 × Z_i

    • We can read these as:
      ▶ β0: average log income for a non-African country (Z_i = 0) with property rights measured at 0 is 5.656
      ▶ β1: a one-unit increase in property rights is associated with a 0.424 increase in average log incomes for two African countries (or for two non-African countries)
      ▶ β2: there is a −0.878 average difference in log income per capita between African and non-African countries, conditional on property rights

    22 / 62

  • General interpretation of the coefficients

    Y_i = β0 + β1 X_i + β2 Z_i

    • β0: average value of Y_i when both X_i and Z_i are equal to 0
    • β1: a one-unit increase in X_i is associated with a β1-unit change in Y_i, conditional on Z_i
    • β2: average difference in Y_i between the Z_i = 1 group and the Z_i = 0 group, conditional on X_i

    23 / 62

  • Adding a binary variable, visually

    [Figure: Log GDP per capita against Strength of Property Rights with the fitted line for non-African countries; β0 = 5.656, β1 = 0.424]

    24 / 62

  • Adding a binary variable, visually

    [Figure: same plot with both fitted lines; the African-country line has intercept β0 + β2; β0 = 5.656, β1 = 0.424, β2 = -0.878]

    25 / 62

  • 3/ Adding a continuous variable

    26 / 62

  • Adding a continuous variable

    • Ye olde model:

      Y_i = β0 + β1 X_i

    • Z_i: mean temperature in country i (continuous)
    • Concern: geography is confounding the effect
      ▶ geography might affect political institutions
      ▶ geography might affect average incomes (through diseases like malaria)

    • New model:

      Y_i = β0 + β1 X_i + β2 Z_i

    27 / 62

  • AJR model, revisited

    ## same outcome as before (variable name assumed)
    ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
    summary(ajr.mod2)
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   6.8063     0.7518    9.05  1.3e-12 ***
    ## avexpr        0.4057     0.0640    6.34  3.9e-08 ***
    ## meantemp     -0.0602     0.0194   -3.11    0.003 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.643 on 57 degrees of freedom
    ##   (103 observations deleted due to missingness)
    ## Multiple R-squared:  0.615, Adjusted R-squared:  0.602
    ## F-statistic: 45.6 on 2 and 57 DF,  p-value: 1.48e-12

    28 / 62

  • Interpretation with a continuous Z

                    Intercept for X_i   Slope for X_i
      Z_i = 0 °C     β0                  β1
      Z_i = 21 °C    β0 + β2 × 21        β1
      Z_i = 24 °C    β0 + β2 × 24        β1
      Z_i = 26 °C    β0 + β2 × 26        β1

    • In this example we have:

      Y_i = 6.806 + 0.406 × X_i − 0.06 × Z_i

    • β0: average log income for a country with property rights measured at 0 and a mean temperature of 0 is 6.806
    • β1: a one-unit increase in property rights is associated with a 0.406 change in average log incomes, conditional on a country's mean temperature
    • β2: a one-degree increase in mean temperature is associated with a −0.06 change in average log incomes, conditional on strength of property rights

    29 / 62
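
    The intercepts in the table can be computed directly from the reported estimates, since the intercept at a given temperature is β0 + β2 × Z. A small sketch using the rounded coefficients above:

    b0 <- 6.806; b2 <- -0.06

    ## intercept of the property-rights line at mean temperatures of 0, 21, 24, and 26 degrees
    b0 + b2 * c(0, 21, 24, 26)   # roughly 6.81, 5.55, 5.37, 5.25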

  • General interpretation

    Y_i = β0 + β1 X_i + β2 Z_i

    • The coefficient β1 measures how the predicted outcome varies in X_i for a fixed value of Z_i.
    • The coefficient β2 measures how the predicted outcome varies in Z_i for a fixed value of X_i.

    30 / 62

  • 4/ OLS mechanics with 2 variables

    31 / 62

  • Fitted values and residuals

    • Where do we get our hats, β̂0, β̂1, β̂2?
    • To answer this, we first need to redefine some terms from simple linear regression.
    • Fitted values for i = 1, …, n:

      Ŷ_i = Ê[Y_i | X_i, Z_i] = β̂0 + β̂1 X_i + β̂2 Z_i

    • Residuals for i = 1, …, n:

      û_i = Y_i − Ŷ_i

    32 / 62

  • Least squares is still least squares

    • How do we estimate β̂0, β̂1, and β̂2?
    • Minimize the sum of the squared residuals, just like before:

      (β̂0, β̂1, β̂2) = argmin over (b0, b1, b2) of Σ_{i=1}^n (Y_i − b0 − b1 X_i − b2 Z_i)²

    • The calculus is the same as last week, with 3 partial derivatives instead of 2

    33 / 62
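
    To emphasize that only the number of arguments has changed, here is a sketch on simulated data that minimizes the sum of squared residuals numerically and recovers (up to optimization error) the same estimates as lm().

    set.seed(2)
    n <- 100
    x <- rnorm(n); z <- rnorm(n)
    y <- 1 + 2 * x - 1 * z + rnorm(n)

    ## sum of squared residuals as a function of (b0, b1, b2)
    ssr <- function(b) sum((y - b[1] - b[2] * x - b[3] * z)^2)

    optim(c(0, 0, 0), ssr)$par   # numerical minimizer of the SSR
    coef(lm(y ~ x + z))          # OLS estimates: essentially the same values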

  • OLS estimator recipe using two steps

    • No explicit OLS formulas this week, but a recipe instead
    • "Partialling out" OLS recipe (see the sketch after this slide):

      1. Run a regression of X_i on Z_i:

         X̂_i = Ê[X_i | Z_i] = δ̂0 + δ̂1 Z_i

      2. Calculate the residuals from this regression:

         r̂_xz,i = X_i − X̂_i

      3. Run a simple regression of Y_i on the residuals r̂_xz,i:

         Y_i = β0 + β1 r̂_xz,i

    • The estimate of β1 will be the same as from running:

      Y_i = β0 + β1 X_i + β2 Z_i

    34 / 62
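
    A sketch of the partialling-out recipe on simulated data; the slope on the residualized X_i matches the coefficient on X_i from the full regression.

    set.seed(3)
    n <- 500
    z <- rnorm(n)
    x <- 0.5 * z + rnorm(n)               # X is correlated with Z
    y <- 1 + 2 * x + 3 * z + rnorm(n)

    r.xz <- residuals(lm(x ~ z))          # steps 1-2: residualize X with respect to Z

    coef(lm(y ~ r.xz))["r.xz"]            # step 3: slope on the residuals
    coef(lm(y ~ x + z))["x"]              # same as the multiple-regression coefficient on X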

  • First regression

    • Regress X_i on Z_i:

    ## when missing data exists, need the na.action in order
    ## to place residuals or fitted values back into the data
    ajr.first <- lm(avexpr ~ meantemp, data = ajr, na.action = na.exclude)
    summary(ajr.first)
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   9.9568     0.8202    12.1  < 2e-16 ***
    ## meantemp     -0.1490     0.0347    -4.3 0.000067 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 1.32 on 58 degrees of freedom
    ##   (103 observations deleted due to missingness)
    ## Multiple R-squared:  0.241, Adjusted R-squared:  0.228
    ## F-statistic: 18.4 on 1 and 58 DF,  p-value: 0.0000673

    35 / 62

  • Regression of log income on the residuals

    • Save the residuals from the first regression:

    ## store the residuals
    ajr$avexpr.res <- residuals(ajr.first)

    36 / 62

  • Residual/partial regression plot

    • Can plot the conditional relationship between property rights and income given temperature:

    [Figure: Log GDP per capita (6 to 10) plotted against Residuals(Property Rights ~ Mean Temperature) (-3 to 3)]

    37 / 62

  • 5/ OLS assumptions & inference with 2 variables

    38 / 62

  • OLS assumptions for unbiasedness

    • Last week we made Assumptions 1-4 for unbiasedness of OLS:

      1. Linearity
      2. Random/iid sample
      3. Variation in X_i
      4. Zero conditional mean error: E[u_i | X_i] = 0

    • Small modifications to these with 2 covariates:

      1. Linearity:  Y_i = β0 + β1 X_i + β2 Z_i + u_i
      2. Random/iid sample
      3. No perfect collinearity
      4. Zero conditional mean error:  E[u_i | X_i, Z_i] = 0

    39 / 62

  • New assumption

    Assumption 3: No perfect collinearity
    (1) No independent variable is constant in the sample, and (2) there are no exactly linear relationships among the independent variables.

    • Two components:
      1. Both X_i and Z_i have to vary.
      2. Z_i cannot be a deterministic, linear function of X_i.

    • Part 2 rules out anything of the form:

      Z_i = a + b X_i

    • Notice how this is linear (the equation of a line) and there is no error, so it is deterministic.
    • What's the correlation between Z_i and X_i? 1!

    40 / 62

  • Perfect collinearity example (I)

    • Simple example:
      ▶ X_i = 1 if a country is not in Africa and 0 otherwise.
      ▶ Z_i = 1 if a country is in Africa and 0 otherwise.

    • But, clearly, we have the following:

      Z_i = 1 − X_i

    • These two variables are perfectly collinear.
    • What about the following:
      ▶ X_i = property rights
      ▶ Z_i = X_i²

    • Do we have to worry about collinearity here?
    • No! While Z_i is a deterministic function of X_i, it is a nonlinear function of X_i.

    41 / 62

  • R and perfect collinearity

    • R, Stata, et al. will drop one of the variables if there is perfect collinearity:

    ajr$nonafrica <- 1 - ajr$africa
    ## outcome variable name assumed, as above
    summary(lm(logpgp95 ~ africa + nonafrica, data = ajr))
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   8.7164     0.0899   96.94  < 2e-16 ***
    ## africa       -1.3612     0.1631   -8.35  4.9e-14 ***
    ## nonafrica         NA         NA      NA       NA
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.913 on 146 degrees of freedom
    ##   (15 observations deleted due to missingness)
    ## Multiple R-squared:  0.323, Adjusted R-squared:  0.318
    ## F-statistic: 69.7 on 1 and 146 DF,  p-value: 4.87e-14

    42 / 62

  • Perfect collinearity example (II)

    • Another example:
      ▶ X_i = mean temperature in Celsius
      ▶ Z_i = 1.8 X_i + 32 (mean temperature in Fahrenheit)

    ajr$meantemp.f <- 1.8 * ajr$meantemp + 32

    43 / 62

  • OLS assumptions for large-sample inference

    • For large-sample inference and calculating SEs, we need the two-variable version of the Gauss-Markov assumptions:

      1. Linearity:  Y_i = β0 + β1 X_i + β2 Z_i + u_i
      2. Random/iid sample
      3. No perfect collinearity
      4. Zero conditional mean error:  E[u_i | X_i, Z_i] = 0
      5. Homoskedasticity:  var[u_i | X_i, Z_i] = σ²_u

    44 / 62

  • Large-sample inference with 2 covariates

    • We have our OLS estimate β̂1 and an estimated SE, SE[β̂1].
    • Under Assumptions 1-5, in large samples, we'll have the following:

      (β̂1 − β1) / SE[β̂1] ~ N(0, 1)

    • The same holds for the other coefficient:

      (β̂2 − β2) / SE[β̂2] ~ N(0, 1)

    • Inference is exactly the same in large samples!
    • Hypothesis tests and CIs are good to go
    • The SEs will change, though

    45 / 62

  • OLS assumptions for small-sample inference

    • For small-sample inference, we need the Gauss-Markov assumptions plus Normal errors:

      1. Linearity:  Y_i = β0 + β1 X_i + β2 Z_i + u_i
      2. Random/iid sample
      3. No perfect collinearity
      4. Zero conditional mean error:  E[u_i | X_i, Z_i] = 0
      5. Homoskedasticity:  var[u_i | X_i, Z_i] = σ²_u
      6. Normal conditional errors:  u_i ~ N(0, σ²_u)

    46 / 62

  • Small-sample inference with 2 covariates

    • Under Assumptions 1-6, we have the following small change to our small-n sampling distribution:

      (β̂1 − β1) / SE[β̂1] ~ t_{n−3}

    • The same is true for the other coefficient:

      (β̂2 − β2) / SE[β̂2] ~ t_{n−3}

    • Why n − 3?
      ▶ We've estimated another parameter, so we need to take off another degree of freedom.

    • ⇒ small adjustments to the critical values and the t-values for our hypothesis tests and confidence intervals.

    47 / 62
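
    For instance, the AJR regression above has 60 observations (57 residual degrees of freedom), so a 95% confidence interval uses the t critical value with n − 3 = 57 degrees of freedom; a quick check in R:

    n <- 60
    qt(0.975, df = n - 3)   # roughly 2.00, versus 1.96 from the normal approximation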

  • 6/ Omitted Variable Bias

    48 / 62

  • Unbiasedness revisited

    • True model:

      Y_i = β0 + β1 X_i + β2 Z_i + u_i

    • Assumptions 1-4 ⇒ we get unbiased estimates of the coefficients
    • What happens if we ignore Z_i and just run the simple linear regression with just X_i?
    • Misspecified model:

      Y_i = β0 + β1 X_i + u*_i,   where u*_i = β2 Z_i + u_i

    • OLS estimates from the misspecified model:

      Y_i = β̃0 + β̃1 X_i

    49 / 62

  • Omitted variable bias

    Y_i = β0 + β1 X_i + u*_i,   where u*_i = β2 Z_i + u_i

    • Which of the four assumptions might we violate?
      ▶ Zero conditional mean error!
      ▶ E[u*_i | X_i] ≠ 0 because E[Z_i | X_i] might not be zero (show on the board, Blackwell)

    • Intuition: correlation between X_i and Z_i ⇒ correlation between X_i and the misspecified error u*_i
    • Question: will E[β̃1] = β1? If not, what will the bias be?

    50 / 62

  • Omitted variable bias, derivation

    • Bias of the misspecified estimator (derived in the notes):

      Bias(β̃1) = E[β̃1] − β1 = β2 δ1

    • Here δ1 is the coefficient on X_i from a regression of Z_i on X_i:

      Z_i = δ0 + δ1 X_i + e_i

    • In other words:

      omitted variable bias = (effect of Z_i on Y_i) × (effect of X_i on Z_i)

    51 / 62

  • Omitted variable bias, summary

    • Remember that, by OLS, the effect of X_i on Z_i is:

      δ1 = cov(Z_i, X_i) / var(X_i)

    • We can summarize the direction of the bias like so:

                 cov(X_i, Z_i) > 0   cov(X_i, Z_i) < 0   cov(X_i, Z_i) = 0
      β2 > 0     Positive bias       Negative bias       No bias
      β2 < 0     Negative bias       Positive bias       No bias
      β2 = 0     No bias             No bias             No bias

    • Very relevant if Z_i is unobserved for some reason!

    52 / 62
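
    A small simulation, consistent with the formula above: omitting Z_i shifts the slope on X_i by roughly β2 × δ1 (made-up data, with β1 = 2, β2 = 3, and δ1 = 0.5 by construction).

    set.seed(4)
    n <- 10000
    x <- rnorm(n)
    z <- 0.5 * x + rnorm(n)              # delta1 is about 0.5
    y <- 1 + 2 * x + 3 * z + rnorm(n)    # beta1 = 2, beta2 = 3

    coef(lm(y ~ x))["x"]                 # misspecified: roughly 2 + 3 * 0.5 = 3.5
    coef(lm(y ~ x + z))["x"]             # correctly specified: roughly 2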

  • Including irrelevant variables

    • What if we do the opposite and include an irrelevant variable?
    • What would it mean for Z_i to be an irrelevant variable? Basically, that we have:

      Y_i = β0 + β1 X_i + 0 × Z_i + u_i

    • So in this case, the true value of β2 = 0. But under Assumptions 1-4, OLS is unbiased for all the parameters:

      E[β̂0] = β0
      E[β̂1] = β1
      E[β̂2] = 0

    • Including an irrelevant variable will increase the standard errors for β̂1.

    53 / 62

  • 7/ Multicollinearity

    54 / 62

  • Sampling variance for bivariate regression

    • Under simple linear regression, we found that the sampling variance of the slope was the following:

      var(β̂1) = σ²_u / Σ_{i=1}^n (X_i − X̄)²

    • Question: What do we call the square root of the sampling variance? The standard error!
    • Factors affecting the standard errors:
      ▶ The error variance σ²_u (higher conditional variance of Y_i leads to bigger SEs)
      ▶ The total variation in X_i, Σ_{i=1}^n (X_i − X̄)² (lower variation in X_i leads to bigger SEs)

    55 / 62

  • Sampling variation with 2 covariates

    • Regression with an additional independent variable:

      var(β̂1) = σ²_u / [(1 − R²_1) Σ_{i=1}^n (X_i − X̄)²]

    • Here, R²_1 is the R² from the regression of X_i on Z_i:

      X_i = δ0 + δ1 Z_i

    • Factors now affecting the standard errors:
      ▶ The error variance (higher conditional variance of Y_i leads to bigger SEs)
      ▶ The total variation in X_i (lower variation in X_i leads to bigger SEs)
      ▶ The strength of the relationship between X_i and Z_i (stronger relationships mean a higher R²_1 and thus bigger SEs)

    • What happens with perfect collinearity?

    56 / 62

  • Multicollinearity

    Definition
    Multicollinearity is defined to be high, but not perfect, correlation between two independent variables in a regression.

    • With multicollinearity, we'll have R²_1 ≈ 1, but not exactly.
    • The stronger the relationship between X_i and Z_i, the closer R²_1 will be to 1, and the higher the SEs will be:

      var(β̂1) = σ²_u / [(1 − R²_1) Σ_{i=1}^n (X_i − X̄)²]

    • Given the symmetry, it will increase var(β̂2) as well.

    57 / 62
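
    A sketch of how the standard error on the coefficient for X_i grows as the correlation between X_i and Z_i approaches 1, holding the sample size fixed (simulated data):

    set.seed(5)
    n <- 200
    se.b1 <- sapply(c(0, 0.5, 0.9, 0.99), function(rho) {
      z <- rnorm(n)
      x <- rho * z + sqrt(1 - rho^2) * rnorm(n)   # corr(x, z) is roughly rho
      y <- 1 + 2 * x + 3 * z + rnorm(n)
      summary(lm(y ~ x + z))$coefficients["x", "Std. Error"]
    })
    se.b1   # the standard errors rise sharply as the correlation approaches 1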

  • Intuition for multicollinearity

    • Remember the OLS recipe:
      ▶ β̂1 comes from the regression of Y_i on r̂_xz,i
      ▶ r̂_xz,i are the residuals from the regression of X_i on Z_i

    • Estimated coefficient:

      β̂1 = Σ_{i=1}^n r̂_xz,i Y_i / Σ_{i=1}^n (r̂_xz,i)²

    • When Z_i and X_i have a strong relationship, the residuals will have low variation (draw this)
    • We explain away a lot of the variation in X_i through Z_i.
    • Low variation in an independent variable (here, r̂_xz,i) ⇒ high SEs
    • Basically, there is less residual variation left in X_i after "partialling out" the effect of Z_i

    58 / 62

  • Effects of multicollinearity

    • No effect on the bias of OLS.
    • Only increases the standard errors.
    • Really just a sample size problem:
      ▶ If X_i and Z_i are extremely highly correlated, you're going to need a much bigger sample to accurately differentiate between their effects.

    59 / 62

  • Conclusion

    • In this brave new world with 2 independent variables:
      1. The β's have slightly different interpretations
      2. OLS still minimizes the sum of the squared residuals
      3. Small adjustments to the OLS assumptions and inference
      4. Adding or omitting variables in a regression can affect the bias and the variance of OLS

    • Remainder of the class:
      1. Regression in its most general glory (matrices)
      2. How to diagnose and fix violations of the OLS assumptions

    60 / 62

  • Deriving the misspecified coefficient

    • Here we'll use ĉov to mean the sample covariance and v̂ar to mean the sample variance.

      β̃1 = ĉov(Y_i, X_i) / v̂ar(X_i)                                                   (OLS formulas)

          = ĉov(β0 + β1 X_i + β2 Z_i + u_i, X_i) / v̂ar(X_i)                            (linearity in the correct model)

          = ĉov(β0, X_i)/v̂ar(X_i) + ĉov(β1 X_i, X_i)/v̂ar(X_i)
            + ĉov(β2 Z_i, X_i)/v̂ar(X_i) + ĉov(u_i, X_i)/v̂ar(X_i)                       (covariance properties)

          = 0 + ĉov(β1 X_i, X_i)/v̂ar(X_i) + ĉov(β2 Z_i, X_i)/v̂ar(X_i) + 0              (zero mean error)

          = β1 v̂ar(X_i)/v̂ar(X_i) + β2 ĉov(Z_i, X_i)/v̂ar(X_i)                           (properties of covariance)

          = β1 + β2 ĉov(Z_i, X_i)/v̂ar(X_i)

          = β1 + β2 δ̂1                                                                 (OLS formulas)

    61 / 62

  • Next step

    • Let's take expectations:

      E[β̃1] = E[β1 + β2 δ̂1]
             = β1 + β2 E[δ̂1]
             = β1 + β2 δ1

    • Thus, we can calculate the bias here:

      Bias(β̃1) = E[β̃1] − β1 = β2 δ1

    62 / 62
