TRANSCRIPT
-
Gov 2000 - 8. Regression with Two Independent Variables
Matthew Blackwell
November 3, 2015
-
1. Why add variables to the regression?
2. Adding a binary variable
3. Adding a continuous variable
4. OLS mechanics with 2 variables
5. OLS assumptions & inference with 2 variables
6. Omitted Variable Bias
7. Multicollinearity
-
Announcements
• Midterm:
  ▶ Mean: 23 out of 30 (rescaled for 1000/2000)
  ▶ SD: 5.84
  ▶ Excellent job! We're really happy with the scores!
  ▶ Grades coming this week
• Matrix algebra (Wooldridge, Appendix D)
-
Where are we? Where are we going?
[Figure: bar chart of the number of covariates in our regressions, on a scale of 0 to 10, built up across three slides for Last Week, This Week, and Next Week.]
-
1/ Why add variables to the regression?
-
Berkeley gender bias
• Graduate admissions data from Berkeley, 1973
• Acceptance rates:
  ▶ Men: 8442 applicants, 44% admission rate
  ▶ Women: 4321 applicants, 35% admission rate
• Evidence of discrimination toward women in admissions?
• This is a marginal relationship
• What about the conditional relationship within departments?
-
Berkeley gender bias, II
• Within departments:

                 Men                Women
  Dept    Applied  Admitted   Applied  Admitted
  A         825      62%        108      82%
  B         560      63%         25      68%
  C         325      37%        593      34%
  D         417      33%        375      35%
  E         191      28%        393      24%
  F         373       6%        341       7%

• Within departments, women do somewhat better than men!
• Women apply to more challenging departments.
• Marginal relationship (admissions and gender) ≠ conditional relationship given a third variable (department)
-
Simpson's paradox
[Figure: scatterplot of Y against X (X from 0 to 4, Y from -3 to 1), with separate clusters of points for Z = 0 and Z = 1.]
• Overall a positive relationship between $X_i$ and $Y_i$ here
• But within levels of $Z_i$, the opposite
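• A quick R sketch (not from the slides; all numbers invented) that reproduces this reversal:

## simulate Simpson's paradox: marginal slope positive, conditional negative
set.seed(1234)
n <- 200
z <- rbinom(n, 1, 0.5)                         # binary third variable
x <- rnorm(n, mean = 1 + 2 * z)                # Z = 1 group has higher X
y <- -2 + 3 * z - 0.5 * x + rnorm(n, sd = 0.5) # within-Z slope on X is -0.5
coef(lm(y ~ x))["x"]     # marginal slope: positive
coef(lm(y ~ x + z))["x"] # conditional slope: negative, about -0.5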
-
Basic idea
• Old goal: estimate the mean of $Y$ as a function of some independent variable, $X$:
$$\mathbb{E}[Y_i \mid X_i]$$
• For continuous $X$'s, we modeled the CEF/regression function with a line:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
• New goal: estimate the relationship of two variables, $Y_i$ and $X_i$, conditional on a third variable, $Z_i$:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$$
• The $\beta$'s are the population parameters we want to estimate
-
Why control for another variable
• Descriptive
  ▶ Get a sense for the relationships in the data.
  ▶ Conditional on the number of steps I've taken, do higher activity levels correlate with lower weight?
• Predictive
  ▶ We can usually make better predictions about the dependent variable with more information on independent variables.
• Causal
  ▶ Block potential confounding, which is when $X$ doesn't cause $Y$, but only appears to because a third variable $Z$ causally affects both of them.
  ▶ $X_i$: ice cream sales on day $i$
  ▶ $Y_i$: drowning deaths on day $i$
  ▶ $Z_i$: ??
-
Plan of attack
1. Adding a binary $Z_i$
2. Adding a continuous $Z_i$
3. Mechanics of OLS with 2 covariates
4. OLS assumptions with 2 covariates:
  ▶ Omitted variable bias
  ▶ Multicollinearity
-
What we wonāt cover in lecture
1. The OLS formulas for 2 covariates
2. Proofs
3. The second covariate being a function of the first: $Z_i = X_i^2$
4. Goodness of fit
-
2/ Adding a binary variable
-
Example
[Figure: scatterplot of Log GDP per capita (4 to 11) against Strength of Property Rights (0 to 10), with African and non-African countries marked separately.]
-
Basics
• Ye olde model:
$$Y_i = \beta_0 + \beta_1 X_i$$
• $Z_i = 1$ to indicate that $i$ is an African country
• $Z_i = 0$ to indicate that $i$ is a non-African country
• Concern: AJR might be picking up an "African effect":
  ▶ African countries might have low incomes and weak property rights
  ▶ "Control for" a country being in Africa or not to remove this
  ▶ Effects are now within Africa or within non-Africa, not between
• New model:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$$
-
AJR model
• Let's see an example with the AJR data:

## reconstructed fit; assumes the log GDP per capita variable is named logpgp95
ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
summary(ajr.mod)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.6556     0.3134   18.04
## ...
-
Two lines, one regression
• How can we interpret this model?
• Plug in two possible values for $Z_i$ and rearrange
• When $Z_i = 0$:
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 \times 0 = \hat\beta_0 + \hat\beta_1 X_i$$
• When $Z_i = 1$:
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 \times 1 = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 X_i$$
• Two different intercepts, same slope
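• In R, we can trace out the two lines with predict() (a sketch, assuming the ajr.mod fit above and a 0/1 africa variable):

## same property rights value, two different intercepts
predict(ajr.mod, newdata = data.frame(avexpr = 5, africa = 0)) # non-African line
predict(ajr.mod, newdata = data.frame(avexpr = 5, africa = 1)) # shifted down by beta2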
-
Interpretation of the coefficients
• Let's review what we've seen so far:

                                    Intercept for $X_i$           Slope for $X_i$
  Non-African country ($Z_i = 0$)   $\hat\beta_0$                 $\hat\beta_1$
  African country ($Z_i = 1$)       $\hat\beta_0 + \hat\beta_2$   $\hat\beta_1$

• In this example, we have:
$$\hat{Y}_i = 5.656 + 0.424 \times X_i - 0.878 \times Z_i$$
• We can read these as:
  ▶ $\hat\beta_0$: the average log income for a non-African country ($Z_i = 0$) with property rights measured at 0 is 5.656
  ▶ $\hat\beta_1$: a one-unit increase in property rights is associated with a 0.424 increase in average log incomes when comparing two African countries (or two non-African countries)
  ▶ $\hat\beta_2$: there is a -0.878 average difference in log income per capita between African and non-African countries, conditional on property rights
-
General interpretation of the coefficients

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$$

• $\beta_0$: average value of $Y_i$ when both $X_i$ and $Z_i$ are equal to 0
• $\beta_1$: a one-unit increase in $X_i$ is associated with a $\beta_1$-unit change in $Y_i$, conditional on $Z_i$
• $\beta_2$: average difference in $Y_i$ between the $Z_i = 1$ group and the $Z_i = 0$ group, conditional on $X_i$
-
Adding a binary variable, visually
[Figure: the property rights scatterplot with two parallel fitted lines, one for non-African countries with intercept $\hat\beta_0$ and one for African countries with intercept $\hat\beta_0 + \hat\beta_2$; annotated with $\hat\beta_0 = 5.656$, $\hat\beta_1 = 0.424$, $\hat\beta_2 = -0.878$.]
-
3/ Adding a continuous variable
-
Adding a continuous variable
• Ye olde model:
$$Y_i = \beta_0 + \beta_1 X_i$$
• $Z_i$: mean temperature in country $i$ (continuous)
• Concern: geography is confounding the effect
  ▶ geography might affect political institutions
  ▶ geography might affect average incomes (through diseases like malaria)
• New model:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$$
-
AJR model, revisited
## reconstructed fit; again assumes the outcome is named logpgp95
ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
summary(ajr.mod2)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   6.8063     0.7518    9.05  1.3e-12 ***
## avexpr        0.4057     0.0640    6.34  3.9e-08 ***
## meantemp     -0.0602     0.0194   -3.11    0.003 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.643 on 57 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.615, Adjusted R-squared: 0.602
## F-statistic: 45.6 on 2 and 57 DF, p-value: 1.48e-12
-
Interpretation with a continuous Z

                 Intercept for $X_i$                     Slope for $X_i$
  $Z_i = 0$ °C    $\hat\beta_0$                           $\hat\beta_1$
  $Z_i = 21$ °C   $\hat\beta_0 + \hat\beta_2 \times 21$   $\hat\beta_1$
  $Z_i = 24$ °C   $\hat\beta_0 + \hat\beta_2 \times 24$   $\hat\beta_1$
  $Z_i = 26$ °C   $\hat\beta_0 + \hat\beta_2 \times 26$   $\hat\beta_1$

• In this example we have:
$$\hat{Y}_i = 6.806 + 0.406 \times X_i - 0.06 \times Z_i$$
• $\hat\beta_0$: the average log income for a country with property rights measured at 0 and a mean temperature of 0 is 6.806
• $\hat\beta_1$: a one-unit increase in property rights is associated with a 0.406 change in average log incomes, conditional on a country's mean temperature
• $\hat\beta_2$: a one-degree increase in mean temperature is associated with a -0.06 change in average log incomes, conditional on the strength of property rights
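• A sketch of these shifting intercepts in R, assuming the ajr.mod2 fit above:

## predicted log income at avexpr = 5 for the temperatures in the table
predict(ajr.mod2, newdata = data.frame(avexpr = 5, meantemp = c(0, 21, 24, 26)))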
-
General interpretation

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$$

• The coefficient $\beta_1$ measures how the predicted outcome varies in $X_i$ for a fixed value of $Z_i$.
• The coefficient $\beta_2$ measures how the predicted outcome varies in $Z_i$ for a fixed value of $X_i$.
-
4/ OLS mechanics with 2 variables
-
Fitted values and residuals
• Where do we get our hats, $\hat\beta_0, \hat\beta_1, \hat\beta_2$?
• To answer this, we first need to redefine some terms from simple linear regression.
• Fitted values for $i = 1, \dots, n$:
$$\hat{Y}_i = \hat{\mathbb{E}}[Y_i \mid X_i, Z_i] = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i$$
• Residuals for $i = 1, \dots, n$:
$$\hat{u}_i = Y_i - \hat{Y}_i$$
-
Least squares is still least squares
• How do we estimate $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$?
• Minimize the sum of the squared residuals, just like before:
$$(\hat\beta_0, \hat\beta_1, \hat\beta_2) = \operatorname*{arg\,min}_{b_0, b_1, b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i - b_2 Z_i)^2$$
• The calculus is the same as last week, with 3 partial derivatives instead of 2
-
OLS estimator recipe using two steps
• No explicit OLS formulas this week, but a recipe instead
• "Partialling out" OLS recipe:
1. Run a regression of $X_i$ on $Z_i$:
$$\hat{X}_i = \hat{\mathbb{E}}[X_i \mid Z_i] = \hat\delta_0 + \hat\delta_1 Z_i$$
2. Calculate the residuals from this regression:
$$\hat{r}_{xz,i} = X_i - \hat{X}_i$$
3. Run a simple regression of $Y_i$ on the residuals $\hat{r}_{xz,i}$:
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 \hat{r}_{xz,i}$$
• The estimate $\hat\beta_1$ will be the same as from running:
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i$$
-
First regression
• Regress $X_i$ (property rights) on $Z_i$ (mean temperature):

## when missing data exist, we need the na.action in order
## to place residuals or fitted values back into the data
ajr.first <- lm(avexpr ~ meantemp, data = ajr, na.action = na.exclude)
summary(ajr.first)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.9568     0.8202    12.1  < 2e-16 ***
## meantemp     -0.1490     0.0347    -4.3 0.000067 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.32 on 58 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.241, Adjusted R-squared: 0.228
## F-statistic: 18.4 on 1 and 58 DF, p-value: 0.0000673
-
Regression of log income on the residuals
• Save the residuals:

## store the residuals
ajr$avexpr.res <- residuals(ajr.first)
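• A minimal sketch of step 3 of the recipe (again assuming the log income variable is named logpgp95):

## regress log income on the partialled-out property rights;
## the slope should match the avexpr coefficient from ajr.mod2 (0.4057)
coef(lm(logpgp95 ~ avexpr.res, data = ajr))["avexpr.res"]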
-
Residual/partial regression plot
• Can plot the conditional relationship between property rights and income given temperature:
[Figure: Log GDP per capita (6 to 10) plotted against the residuals from the regression of property rights on mean temperature (-3 to 3).]
-
5/ OLS assumptions & inference with 2 variables
-
OLS assumptions for unbiasedness
• Last week we made Assumptions 1-4 for unbiasedness of OLS:
1. Linearity
2. Random/iid sample
3. Variation in $X_i$
4. Zero conditional mean error: $\mathbb{E}[u_i \mid X_i] = 0$
• Small modification to these with 2 covariates:
1. Linearity: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
2. Random/iid sample
3. No perfect collinearity
4. Zero conditional mean error: $\mathbb{E}[u_i \mid X_i, Z_i] = 0$
-
New assumption
Assumption 3: No perfect collinearity
(1) No independent variable is constant in the sample, and (2) there are no exact linear relationships among the independent variables.
• Two components:
1. Both $X_i$ and $Z_i$ have to vary.
2. $Z_i$ cannot be a deterministic, linear function of $X_i$.
• Part 2 rules out anything of the form:
$$Z_i = a + b X_i$$
• Notice how this is linear (the equation of a line) and there is no error, so it is deterministic.
• What's the correlation between $Z_i$ and $X_i$? 1!
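• A quick R check of both points (a sketch with simulated data):

## any linear function of x is perfectly correlated with x
x <- rnorm(100)
cor(x, 3 + 2 * x) # exactly 1
cor(x, x^2)       # far from 1: deterministic but nonlinear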
-
Perfect collinearity example (I)
• Simple example:
  ▶ $X_i = 1$ if a country is not in Africa and 0 otherwise.
  ▶ $Z_i = 1$ if a country is in Africa and 0 otherwise.
• But clearly we have the following:
$$Z_i = 1 - X_i$$
• These two variables are perfectly collinear.
• What about the following:
  ▶ $X_i$ = property rights
  ▶ $Z_i = X_i^2$
• Do we have to worry about collinearity here?
• No! Because while $Z_i$ is a deterministic function of $X_i$, it is a nonlinear function of $X_i$.
-
R and perfect collinearity
• R, Stata, et al. will drop one of the variables if there is perfect collinearity:

## reconstructed; again assumes the outcome is named logpgp95
ajr$nonafrica <- 1 - ajr$africa
summary(lm(logpgp95 ~ africa + nonafrica, data = ajr))

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   8.7164     0.0899   96.94  < 2e-16 ***
## africa       -1.3612     0.1631   -8.35  4.9e-14 ***
## nonafrica         NA         NA      NA       NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.913 on 146 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared: 0.323, Adjusted R-squared: 0.318
## F-statistic: 69.7 on 1 and 146 DF, p-value: 4.87e-14
-
Perfect collinearity example (II)
• Another example:
  ▶ $Z_i$ = mean temperature in Celsius
  ▶ $X_i = 1.8 Z_i + 32$ (mean temperature in Fahrenheit)

## a linear transformation, so perfectly collinear with meantemp;
## R would again report NA for one of the two in a regression
ajr$meantemp.f <- 1.8 * ajr$meantemp + 32
-
OLS assumptions for large-sample inference
• For large-sample inference and calculating SEs, we need the two-variable version of the Gauss-Markov assumptions:
1. Linearity: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
2. Random/iid sample
3. No perfect collinearity
4. Zero conditional mean error: $\mathbb{E}[u_i \mid X_i, Z_i] = 0$
5. Homoskedasticity: $\operatorname{var}[u_i \mid X_i, Z_i] = \sigma^2_u$
-
Large-sample inference with 2 covariates
• We have our OLS estimate $\hat\beta_1$ and an estimated SE, $\widehat{SE}[\hat\beta_1]$.
• Under Assumptions 1-5, in large samples, we'll have the following:
$$\frac{\hat\beta_1 - \beta_1}{\widehat{SE}[\hat\beta_1]} \sim N(0, 1)$$
• The same holds for the other coefficient:
$$\frac{\hat\beta_2 - \beta_2}{\widehat{SE}[\hat\beta_2]} \sim N(0, 1)$$
• Inference is exactly the same in large samples!
• Hypothesis tests and CIs are good to go
• The SEs will change, though
-
OLS assumptions for small-sample inference
• For small-sample inference, we need the Gauss-Markov assumptions plus Normal errors:
1. Linearity: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
2. Random/iid sample
3. No perfect collinearity
4. Zero conditional mean error: $\mathbb{E}[u_i \mid X_i, Z_i] = 0$
5. Homoskedasticity: $\operatorname{var}[u_i \mid X_i, Z_i] = \sigma^2_u$
6. Normal conditional errors: $u_i \sim N(0, \sigma^2_u)$
-
Small-sample inference with 2 covariates
• Under Assumptions 1-6, we have the following small change to our small-$n$ sampling distribution:
$$\frac{\hat\beta_1 - \beta_1}{\widehat{SE}[\hat\beta_1]} \sim t_{n-3}$$
• The same is true for the other coefficient:
$$\frac{\hat\beta_2 - \beta_2}{\widehat{SE}[\hat\beta_2]} \sim t_{n-3}$$
• Why $n - 3$?
  ▶ We've estimated another parameter, so we need to take off another degree of freedom.
• ⇒ small adjustments to the critical values and the t-values for our hypothesis tests and confidence intervals.
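• For example, a sketch of the critical value in R, using the 60 complete observations from the AJR fit above:

## 95% critical value with n = 60 and two covariates
qt(0.975, df = 60 - 3) # about 2.00, slightly bigger than the Normal's 1.96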
-
6/ Omitted Variable Bias
-
Unbiasedness revisited
• True model:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$$
• Assumptions 1-4 ⇒ we get unbiased estimates of the coefficients
• What happens if we ignore $Z_i$ and just run the simple linear regression with $X_i$ alone?
• Misspecified model:
$$Y_i = \beta_0 + \beta_1 X_i + u_i^*, \qquad u_i^* = \beta_2 Z_i + u_i$$
• OLS estimates from the misspecified model:
$$\hat{Y}_i = \tilde\beta_0 + \tilde\beta_1 X_i$$
-
Omitted variable bias

$$Y_i = \beta_0 + \beta_1 X_i + u_i^*, \qquad u_i^* = \beta_2 Z_i + u_i$$

• Which of the four assumptions might we violate?
  ▶ Zero conditional mean error!
  ▶ $\mathbb{E}[u_i^* \mid X_i] \neq 0$ because $\mathbb{E}[Z_i \mid X_i]$ might not be zero (show on the board, Blackwell)
• Intuition: correlation between $X_i$ and $Z_i$ ⇒ correlation between $X_i$ and the misspecified error $u_i^*$
• Question: will $\mathbb{E}[\tilde\beta_1] = \beta_1$? If not, what will be the bias?
-
Omitted variable bias, derivation
• Bias for the misspecified estimator (derived in the notes):
$$\text{Bias}(\tilde\beta_1) = \mathbb{E}[\tilde\beta_1] - \beta_1 = \beta_2 \delta_1$$
• where $\delta_1$ is the coefficient on $X_i$ from a regression of $Z_i$ on $X_i$:
$$Z_i = \delta_0 + \delta_1 X_i + e_i$$
• In other words:
$$\text{omitted variable bias} = (\text{effect of } Z_i \text{ on } Y_i) \times (\text{effect of } X_i \text{ on } Z_i)$$
-
Omitted variable bias, summary
• Remember that by OLS, the effect of $X_i$ on $Z_i$ is:
$$\delta_1 = \frac{\operatorname{cov}(Z_i, X_i)}{\operatorname{var}(X_i)}$$
• We can summarize the direction of the bias like so:

                  cov(Z, X) > 0    cov(Z, X) < 0    cov(Z, X) = 0
  $\beta_2 > 0$   Positive bias    Negative bias    No bias
  $\beta_2 < 0$   Negative bias    Positive bias    No bias
  $\beta_2 = 0$   No bias          No bias          No bias

• Very relevant if $Z_i$ is unobserved for some reason!
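• A simulation sketch in R (not from the slides; all numbers invented) that checks the $\beta_2 \delta_1$ formula:

## omitted variable bias: misspecified slope converges to beta1 + beta2 * delta1
set.seed(2138)
n <- 10000
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)           # delta1 = 0.5
y <- 1 + 2 * x + 3 * z + rnorm(n) # beta1 = 2, beta2 = 3
coef(lm(y ~ x))["x"]     # about 3.5 = 2 + 3 * 0.5
coef(lm(y ~ x + z))["x"] # about 2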
-
Including irrelevant variables
• What if we do the opposite and include an irrelevant variable?
• What would it mean for $Z_i$ to be an irrelevant variable? Basically, that we have
$$Y_i = \beta_0 + \beta_1 X_i + 0 \times Z_i + u_i$$
• So in this case the true value of $\beta_2 = 0$. But under Assumptions 1-4, OLS is unbiased for all the parameters:
$$\mathbb{E}[\hat\beta_0] = \beta_0, \qquad \mathbb{E}[\hat\beta_1] = \beta_1, \qquad \mathbb{E}[\hat\beta_2] = 0$$
• Including an irrelevant variable will increase the standard errors for $\hat\beta_1$.
-
7/ Multicollinearity
-
Sampling variance for bivariate regression
• Under simple linear regression, we found that the sampling variance of the slope was the following:
$$\operatorname{var}(\hat\beta_1) = \frac{\sigma^2_u}{\sum_{i=1}^{n}(X_i - \overline{X})^2}$$
• Question: what do we call the square root of the sampling variance? The standard error!
• Factors affecting the standard errors:
  ▶ the error variance $\sigma^2_u$ (higher conditional variance of $Y_i$ leads to bigger SEs)
  ▶ the total variation in $X_i$, $\sum_{i=1}^{n}(X_i - \overline{X})^2$ (lower variation in $X_i$ leads to bigger SEs)
-
Sampling variation with 2 covariates
• Regression with an additional independent variable:
$$\operatorname{var}(\hat\beta_1) = \frac{\sigma^2_u}{(1 - R_1^2)\sum_{i=1}^{n}(X_i - \overline{X})^2}$$
• Here, $R_1^2$ is the $R^2$ from the regression of $X_i$ on $Z_i$:
$$\hat{X}_i = \hat\delta_0 + \hat\delta_1 Z_i$$
• Factors now affecting the standard errors:
  ▶ the error variance (higher conditional variance of $Y_i$ leads to bigger SEs)
  ▶ the total variation of $X_i$ (lower variation in $X_i$ leads to bigger SEs)
  ▶ the strength of the relationship between $X_i$ and $Z_i$ (stronger relationships mean a higher $R_1^2$ and thus bigger SEs)
• What happens with perfect collinearity?
-
Multicollinearity
Definition
Multicollinearity is defined to be high, but not perfect, correlation between two independent variables in a regression.
• With multicollinearity, we'll have $R_1^2 \approx 1$, but not exactly.
• The stronger the relationship between $X_i$ and $Z_i$, the closer $R_1^2$ will be to 1, and the higher the SEs will be:
$$\operatorname{var}(\hat\beta_1) = \frac{\sigma^2_u}{(1 - R_1^2)\sum_{i=1}^{n}(X_i - \overline{X})^2}$$
• Given the symmetry, it will increase $\operatorname{var}(\hat\beta_2)$ as well.
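• A small R sketch (simulated data, numbers invented) showing the SE inflation:

## compare SEs on x when z is nearly uncorrelated vs. highly correlated with x
set.seed(2138)
n <- 100
x <- rnorm(n)
z.lo <- rnorm(n)               # nearly uncorrelated with x
z.hi <- x + rnorm(n, sd = 0.1) # highly correlated with x
y.lo <- x + z.lo + rnorm(n)
y.hi <- x + z.hi + rnorm(n)
summary(lm(y.lo ~ x + z.lo))$coef["x", "Std. Error"] # small
summary(lm(y.hi ~ x + z.hi))$coef["x", "Std. Error"] # roughly 10 times bigger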
-
Intuition for multicollinearity
• Remember the OLS recipe:
  ▶ $\hat\beta_1$ comes from a regression of $Y_i$ on $\hat{r}_{xz,i}$
  ▶ $\hat{r}_{xz,i}$ are the residuals from the regression of $X_i$ on $Z_i$
• Estimated coefficient:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat{r}_{xz,i} Y_i}{\sum_{i=1}^{n} \hat{r}_{xz,i}^2}$$
• When $X_i$ and $Z_i$ have a strong relationship, the residuals will have low variation (draw this)
• We explain away a lot of the variation in $X_i$ through $Z_i$.
• Low variation in an independent variable (here, $\hat{r}_{xz,i}$) ⇒ high SEs
• Basically, there is less residual variation left in $X_i$ after "partialling out" the effect of $Z_i$
-
Effects of multicollinearity
• No effect on the bias of OLS.
• Only increases the standard errors.
• Really just a sample size problem:
  ▶ If $X_i$ and $Z_i$ are extremely highly correlated, you're going to need a much bigger sample to accurately differentiate between their effects.
-
Conclusion
• In this brave new world with 2 independent variables:
1. The $\beta$'s have slightly different interpretations
2. OLS still minimizes the sum of the squared residuals
3. Small adjustments to the OLS assumptions and inference
4. Adding or omitting variables in a regression can affect the bias and the variance of OLS
• Remainder of class:
1. Regression in its most general glory (matrices)
2. How to diagnose and fix violations of the OLS assumptions
-
Deriving the misspecified coefficient
• Here we'll use $\widehat{\operatorname{cov}}$ to mean the sample covariance and $\widehat{\operatorname{var}}$ to mean the sample variance.
$$
\begin{aligned}
\tilde\beta_1 &= \frac{\widehat{\operatorname{cov}}(Y_i, X_i)}{\widehat{\operatorname{var}}(X_i)} && \text{(OLS formulas)} \\
&= \frac{\widehat{\operatorname{cov}}(\beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i,\, X_i)}{\widehat{\operatorname{var}}(X_i)} && \text{(linearity in the correct model)} \\
&= \frac{\widehat{\operatorname{cov}}(\beta_0, X_i)}{\widehat{\operatorname{var}}(X_i)} + \frac{\widehat{\operatorname{cov}}(\beta_1 X_i, X_i)}{\widehat{\operatorname{var}}(X_i)} + \frac{\widehat{\operatorname{cov}}(\beta_2 Z_i, X_i)}{\widehat{\operatorname{var}}(X_i)} + \frac{\widehat{\operatorname{cov}}(u_i, X_i)}{\widehat{\operatorname{var}}(X_i)} && \text{(covariance properties)} \\
&= 0 + \frac{\widehat{\operatorname{cov}}(\beta_1 X_i, X_i)}{\widehat{\operatorname{var}}(X_i)} + \frac{\widehat{\operatorname{cov}}(\beta_2 Z_i, X_i)}{\widehat{\operatorname{var}}(X_i)} + 0 && \text{(zero mean error)} \\
&= \beta_1 \frac{\widehat{\operatorname{var}}(X_i)}{\widehat{\operatorname{var}}(X_i)} + \beta_2 \frac{\widehat{\operatorname{cov}}(Z_i, X_i)}{\widehat{\operatorname{var}}(X_i)} && \text{(properties of covariance)} \\
&= \beta_1 + \beta_2 \frac{\widehat{\operatorname{cov}}(Z_i, X_i)}{\widehat{\operatorname{var}}(X_i)} = \beta_1 + \beta_2 \hat\delta_1 && \text{(OLS formulas)}
\end{aligned}
$$
-
Next step
• Let's take expectations:
$$\mathbb{E}[\tilde\beta_1] = \mathbb{E}[\beta_1 + \beta_2 \hat\delta_1] = \beta_1 + \beta_2 \mathbb{E}[\hat\delta_1] = \beta_1 + \beta_2 \delta_1$$
• Thus, we can calculate the bias here:
$$\text{Bias}(\tilde\beta_1) = \mathbb{E}[\tilde\beta_1] - \beta_1 = \beta_2 \delta_1$$